Run business automating tasks with python programming.

Small business requires many tasks to be completed on PC. With python language you need to learn to integrate many libraries which can run your daily tasks or to code own solutions to outpace competitors. There are billions lines of python codes, so the limit is integrating knowledge and efficiency. To run business you can predict demand, use artificial intelligence to bring leads, automate routines tasks, integrate other programs. From salesmen to scientists use coding. Become curious!

Sunday, October 29, 2017

Python String and Loop methods

Python Sting methods can be used for file automation split(), lsplit, rsplit and other methods. For expample with lsplit('1234567890') method the numbers from the beginning of files name can removed.

With interactive examples you can use test methods https://www.programiz.com/python-programming/methods/string/

Python String Methods
Python has quite a few methods that string objects can call to perform frequency occurring task (related to string). For example, if you want to capitalize the first letter of a string, you can use capitalize() method.

The page contains all methods of string objects. Also, the page includes built-in functions that can take string as a parameter and perform some task. For example, len() method returns the length of the string passed as a parameter to this function.

Python String Method Description
capitalize() Converts first character to Capital Letter
center() Pads string with specified character
casefold() converts to casefolded strings
count() returns occurrences of substring in string
endswith() Checks if String Ends with the Specified Suffix
expandtabs() Replaces Tab character With Spaces
encode() returns encoded string of given string
find() Returns the Highest Index of Substring
format() formats string into nicer output
index() Returns Index of Substring
isalnum() Checks Alphanumeric Character
isalpha() Checks if All Characters are Alphabets
isdecimal() Checks Decimal Characters
isdigit() Checks Digit Characters
isidentifier() Checks for Valid Identifier
islower() Checks if all Alphabets in a String are Lowercase
isnumeric() Checks Numeric Characters
isprintable() Checks Printable Character
isspace() Checks Whitespace Characters
istitle() Checks for Titlecased String
isupper() returns if all characters are uppercase characters
join() Returns a Concatenated String
ljust() returns left-justified string of given width
rjust() returns right-justified string of given width
lower() returns lowercased string
upper() returns uppercased string
swapcase() swap uppercase characters to lowercase; vice versa
lstrip() Removes Leading Characters
rstrip() Removes Trailing Characters
strip() Removes Both Leading and Trailing Characters
partition() Returns a Tuple
maketrans() returns a translation table
rpartition() Returns a Tuple
translate() returns mapped charactered string
replace() Replaces Substring Inside
rfind() Returns the Highest Index of Substring
rindex() Returns Highest Index of Substring
split() Splits String from Left
rsplit() Splits String From Right
splitlines() Splits String at Line Boundaries
startswith() Checks if String Starts with the Specified String
title() Returns a Title Cased String
zfill() Returns a Copy of The String Padded With Zeros
format_map() Formats the String Using Dictionary
Python any() Checks if any Element of an Iterable is True
Python all() returns true when all elements in iterable is true
Python ascii() Returns String Containing Printable Representation
Python bool() Coverts a Value to Boolean
Python bytearray() returns array of given byte size
Python bytes() returns immutable bytes object
Python compile() Returns a Python code object
Python complex() Creates a Complex Number
Python enumerate() Returns an Enumerate Object
Python filter() constructs iterator from elements which are true
Python float() returns floating point number from number, string
Python input() reads and returns a line of string
Python int() returns integer from a number or string
Python iter() returns iterator for an object
Python len() Returns Length of an Object
Python max() returns largest element
Python min() returns smallest element
Python map() Applies Function and Returns a List
Python ord() returns Unicode code point for Unicode character
Python reversed() returns reversed iterator of a sequence
Python slice() creates a slice object specified by range()
Python sorted() returns sorted list from a given iterable
Python sum() Add items of an Iterable
Python zip() Returns an Iterator of Tuples

With search "pypi bulk file renaming", there are two projects to automate bulk file renames
batch-rename 1.0.6 : Python Package Index
rename 1.2 : Python Package Index
Search "github file bulk file renaming: first page bring 5-7 projects that you can use.

Today I coded from shared Youtube videos two files renaming automation codes. What for I need? To understand and to program integrated or tailored solutions to automate codes. Today I tested also argparse library with cmd parameters.

With search " python github automate first file in folder in selection" I found interesting project to download from cmd Youtube videos.
youtube-dl - download videos from youtube.com or other video platforms. Not test yet.; AND

autopep8 1.3.3 PEP8 is code layout and notes standard which programmers use. Like code line shall not exceed 79 characters.

autopep8 automatically formats Python code to conform to the PEP 8 style guide. It uses the pycodestyle utility to determine what parts of the

Search "github python copy files in sequence" shows "download files and renames them in sequence · GitHub". So I looked at code and copy code which rename files from first one file till end of folder.
I will adapt code if needed and use to automate files - numerate and move first, numerate and move first, there fore I will need write loop for this and other tasks functions . This code uses files list iteration.

def read_in():

lines = sys.stdin.readlines()

for i in range(len(lines)):

lines[i] = lines[i].replace('\n','')

#print lines

return lines

Also same search shows Python-programming-exercises 100+ Python challenging programming exercises. I looked through codes and excercises and many are about loops and iterations, that is worthy to know and to look for reference. Important coding to become efficient and to integrate other libraries for files automation.

Solution not to loop infinitely with variables are below:

Python Sequence

https://www.python.org/dev/peps/pep-0212/#methods-for-sequence-objects

PEP 212 -- Loop Counter Iteration

Methods for sequence objects
This solution proposes the addition of indices, items and values methods to sequences, which enable looping over indices only, both indices and elements, and elements only respectively.

Loop counter iteration
The current idiom for looping over the indices makes use of the built-in range function:

for i in range(len(sequence)):
# work with index i
Looping over both elements and indices can be achieved either by the old idiom or by using the new zip built-in function [2]:

for i in range(len(sequence)):
e = sequence[i]
# work with index i and element e
or:

for i, e in zip(range(len(sequence)), sequence):
# work with index i and element e

This would immensely simplify the idioms for looping over indices and for looping over both elements and indices:

for i in sequence.indices():
# work with index i

for i, e in sequence.items():
# work with index i and element e
Additionally it would allow to do looping over the elements of sequences and dictionaries in a consistent way:

for e in sequence_or_dict.values():
# do something with element e

Saturday, October 28, 2017

Pandas operations with data

After pdf.read_csv('anyfile.txt') I need to sort and selct data. With search " pandas dataframe select data lines will value more than" google shows link Selecting Pandas DataFrame Rows Based On Conditions https://chrisalbon.com/python/pandas_selecting_rows_on_conditions.html

# Import modules
import pandas as pd
import numpy as np

# Create a dataframe

# Create a dataframe
raw_data = {'first_name': ['Jason', 'Molly', np.nan, np.nan, np.nan], 
        'nationality': ['USA', 'USA', 'France', 'UK', 'UK'], 
        'age': [42, 52, 36, 24, 70]}
df = pd.DataFrame(raw_data, columns = ['first_name', 'nationality', 'age'])
df

	first_name	nationality	age
0	Jason	USA	42
1	Molly	USA	52
2	NaN	France	36
3	NaN	UK	24
4	NaN	UK	70

Method 1: Using Boolean Variables

# Create variable with TRUE if nationality is USA
american = df['nationality'] == "USA"

# Create variable with TRUE if age is greater than 50
elderly = df['age'] > 50

# Select all casess where nationality is USA and age is greater than 50
df[american & elderly]

first_name	nationality	age
1	Molly	USA	52

Method 2: Using variable attributes

# Select all cases where the first name is not missing and nationality is USA 
df[df['first_name'].notnull() & (df['nationality'] == "USA")]

	first_name	nationality	age
0	Jason	USA	42
1	Molly	USA	52

And with my data set I code simple code with works and on cmd screen I see:

import pandas as pd

import numpy as np

dataset = pd.read_csv("C:\\Users\\ANTRAS\\Documents\\PANDASTEST.txt", delimiter="\t")

#print(dataset)
numerous = dataset['Values'] > 30000

dataset[numerous]

forprint = dataset[numerous]

print(forprint)

C:\Python27\python.exe C:/Users/ANTRAS/.PyCharmCE2017.2/config/scratches/scratch_11.py

Name ID Values Rules

1 FIRST 4152.0 50000.0 SELL

2 SECOND 4153.0 70000.0 BUY

3 THIRD 4154.0 40000.0 OFFER

Process finished with exit code 0

And with value less than 30000 I get

numerous = dataset['Values'] < 30000

C:\Python27\python.exe C:/Users/ANTRAS/.PyCharmCE2017.2/config/scratches/scratch_11.py
Name ID Values Rules
4 FOURTH 4155.0 1500.0 SWAP
5 FIFTH 4156.0 2000.0 FREE
6 SIXTH 4157.0 500.0 GET
7 SEVENTH 4158.0 1000.0 GO

Process finished with exit code 0

Another operations needed are to delete rows which names contain only underscores (since with regex I remove all non-alphanumeric characters) and values cell contain non-numerical values. Afterwards I will extract data using keywords list to new data frame and export it to csv file.

Here again I copy Chris Alon https://chrisalbon.com/

Dropping Rows And Columns In pandas Dataframe

Import modules

import pandas as pd

Create a dataframe

data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'year': [2012, 2012, 2013, 2014, 2014], 
        'reports': [4, 24, 31, 2, 3]}
df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
df

	name	reports	year
Cochice	Jason	4	2012
Pima	Molly	24	2012
Santa Cruz	Tina	31	2013
Maricopa	Jake	2	2014
Yuma	Amy	3	2014

Drop an observation (row)

df.drop(['Cochice', 'Pima'])

	name	reports	year
Santa Cruz	Tina	31	2013
Maricopa	Jake	2	2014
Yuma	Amy	3	2014

Drop a variable (column)

Note: axis=1 denotes that we are referring to a column, not a row

df.drop('reports', axis=1)

	name	year
Cochice	Jason	2012
Pima	Molly	2012
Santa Cruz	Tina	2013
Maricopa	Jake	2014
Yuma	Amy	2014

Drop a row if it contains a certain value (in this case, "Tina")

Specifically: Create a new dataframe called df that includes all rows where the value of a cell in the name column does not equal "Tina"

df[df.name != 'Tina']

	name	reports	year
Cochice	Jason	4	2012
Pima	Molly	24	2012
Maricopa	Jake	2	2014
Yuma	Amy	3	2014

To find wheather it is numeric or non-numeric value requires more sophisticated methods.

Finding non-numeric rows in dataframe in pandas?

On stackoverflow I found solutions how to find with numpy isreal method numeric and non-numeric values. Here we come with Biblical alpha numeric notions. But main method is Pandas applymap

DataFrame.applymap(func)[source]

Apply a function to a DataFrame that is intended to operate elementwise, i.e. like doing map(func, series) for each series in the DataFrame

I make google search "pandas data frame select rows containing keyword attributes":

pandas + dataframe - select by partial string

def.loc[:, df.columns.str.contains('a').tolist()]

https://stackoverflow.com/questions/11350770/pandas-dataframe-select-by-partial-string

Same search brings solution of list of arguments from stackoverflow.

Select rows containing certain values from pandas dataframe

For single search value

df[df.values  == "banana"]

 df[df.isin(['banana'])]

For multiple search terms:

  df[(df.values  == "banana")|(df.values  == "apple" ) ]

df[df.isin(['banana', "apple"])]

  #         A       B      C
  #  1   apple  banana    NaN
  #  2     NaN     NaN  apple
  #  3  banana     NaN    NaN
  #  4   apple   apple    NaN

From Divakar: lines with both are returned.

select_rows(df,['apple','banana'])

 #         A       B     C
 #   0  apple  banana  pear

To delete rows with specific columns not having only intengers seems complex tasks.

So I tested many codes for Boolean how determine "Values" column to find if values are intenger.

pandas.DataFrame.dropna

also

https://stackoverflow.com/questions/14991195/how-to-remove-rows-with-null-values-from-kth-column-onward-

df2.dropna(subset=['three', 'four', 'five'], how='all')

dropna drops rows if in specified column there is NA and NaN since it is default value for null generated by Pandas.

import pandas as pd
import numpy as np
dataset = pd.read_csv("C:\\Users\\ANTRAS\\Documents\\PANDASTEST.txt", delimiter="\t")
result3 = dataset.dropna(subset=['Values'], how='all') #OKprint(result3)

Pandas remove rows which any string. On stackoverflow great solutions to integrate regex library, ix method is not known to me,

import re

regex = re.compile("[a-zA-Z]+")

result3 = dataset.ix[dataset.Values.map(lambda x: regex.search(x) is None)]

print(result3)

Before:

Name ID Values Rules

FIRST 4152 50000 SELL

SECOND 4153 70000 BUY

second45 7896 sduisl zet

THIRD 4154 40000 OFFER

FOURTH1 4161 540 SELLING

inser 4786 htpp// deit

FOURTH 4155 1500 SWAP

FIFTH 4156 2000 FREE

SIXTH 4157 500 GET

SEVENTH 4158 1000 GO

sweasd 7896 rjasj REX

After:
C:\Python27\python.exe C:/Users/ANTRAS/.PyCharmCE2017.2/config/scratches/scratch_11.py
Name ID Values Rules
0 FIRST 4152 50000 SELL
1 SECOND 4153 70000 BUY
3 THIRD 4154 40000 OFFER
4 FOURTH1 4161 540 SELLING
6 FOURTH 4155 1500 SWAP
7 FIFTH 4156 2000 FREE
8 SIXTH 4157 500 GET
9 SEVENTH 4158 1000 GO

Process finished with exit code 0

And got but which I can not explain, after I cleaned with Pandas data frame I can not make selection with values in Values column. Though bug fix research was rewarding and I found Programming with Python Python lessons for CU libraries blog. Software Carpentry.

Indexing, slicing and subsetting DataFrames with pandas

The blog is solely for stepby step automation with Pandas.

The bug with lambda function appears when there is no intenger fields in "Values column", so you need to write condition boolean if field is just with numeric exit, i.e. do not execute ix method.

Remove non-numeric rows in one column with pandas

On stackoverflow I found other solution. To be tested. Use Pandas data frames and numpy methods.

I tested these code, they not make any change.

dataset[dataset['Values'].apply(lambda x: isinstance(x, (int, np.int64)))]dataset[dataset['Values'].apply(lambda x: type(x) in [int, np.int64, float, np.float64])]

#testui = senabaze = pd.read_table('test.txt', col_index='keys_ids', col=names) #testui.to_csv('testui.csv')

I was looking to pandas cheatsheet and made searches using cheatsheet terminology "boolean indexing", objects and etc.

So I test pandas methods.

pandas.DataFrame.select_dtypes¶

C:\Python27\python.exe C:/Users/ANTRAS/.PyCharmCE2017.2/config/scratches/scratch_11.py

Name object

ID int64

Values object

Rules object

dtype: object

I there strong or object in column cell value, you get object type.

The search for advance Pandas Data Manipulation methods was rewarding. I found Python Pandas tutorial for business analysis.

So my problem can be solved by addtional function to check fo column values and exit on float64 type, o investigate and test with numpy for variablve string (object values) in specific column values.

So the problem with dtypes is that for mathematical comparison you have to have float64 type. You can convert int64 type to float64 using astype(float64) method pd (your data name) column(name of colum) plus astype(float64), i.e. example.columninexample.astype(float64). And below working data frame method to concanate data frames, remove duplicates and select rows with values more than 10.000. Note that if you read text file having add header, the data frame digits will be object type and you will not be able to convert directly to float64 (not tested for int64) type/

import pandas as pd

user_cols = ['keys_ids', 'Name', 'Members', 'Rules']
senabaze = pd.read_table('test.txt', sep=',', header=None, index_col='keys_ids', names=user_cols)
senabaze.head()

#senabaze.dtypes
#reikia su dtypes tikrimti, skaiciai del header txt file buna object, be header int64 kuriuos per astype reikia padaryti float64
senabaze['Members'] = senabaze.Members.astype(float)
#senabaze.dtypes

naujiduomenys = pd.read_table('test1.txt', sep=',', header=None, index_col='keys_ids', names=user_cols)
naujiduomenys.head()

#reikia su dtypes tikrimti, skaiciai del header txt file buna object, be header int64 kuriuos per astype reikia padaryti float64
naujiduomenys['Members'] = naujiduomenys.Members.astype(float)
naujiduomenys.dtypes

atnaujinta = pd.concat([senabaze, naujiduomenys])
atnaujinta.drop_duplicates()
netoatnaujinta = atnaujinta.drop_duplicates()

#netoatnaujinta.Members
#SSS =  netoatnaujinta.Members
#print SSS

type(False)
bool
booleans = []
for svarbios in netoatnaujinta.Members:

    if svarbios >= 10000:
        booleans.append(True)
    else:
        booleans.append(False)

daugiauneidesimttukst = pd.Series(booleans)
daugiauneidesimttukst.head()

daugiauneidesimttukst = netoatnaujinta.Members >= 10000
daugiauneidesimttukst.head()

#netoatnaujinta[daugiauneidesimttukst]

netoatnaujinta.loc[netoatnaujinta.Members >= 10000, 'Name']
TTT = netoatnaujinta.loc[netoatnaujinta.Members >= 10000, 'Name']

TTT.to_csv('Morethan10000.csv')



#testui = senabaze = pd.read_table('test.txt', col_index='keys_ids', col=names)
#testui.to_csv('testui.csv')

Python Pandas tutorial for business analysis http://modeanalytics.com
Creating Pandas DataFrames & Selecting Data

I encountered problem with reading Adwords data csv file to Pandas Data Frame. Main problem I do not see numeric data when data import, only NaN and astype(float) neither convert(objects)work. The file also is corrupted to convert to text file, null value. I also had to to choose python=engine parameter. I need to import to data frame to match with pytrends Google Trends data. Google trends data bring most relevant to category keyword whether search, picture, youtube or shopper. 100 average most relevant keyword. Google Trends data from cmd has text markups, so to automate export of five keywords data is necessary to csv or text files for further merging with glob library.

The problem to read csv file downloaded from Adwords is UTF-16 coding. Therefore you need to recode csv file to UTF-8. With UTF-8 csv file decreases in size twice.

import codecs

import csv

#for Adwords files

with codecs.open('TOTAL.csv', 'rU', 'UTF-16') as infile:

with open('TOTAL8.csv', 'wb') as outfile:

for line in infile:

outfile.write(line.encode('utf8'))

https://stackoverflow.com/questions/13875566/python-csv-file-utf-16-to-utf-8-print-error

So this pandas match of Adwords, Google Trends category relevance for images and searches. The match suggest to use 5 main keywords. Note that Adwords keywords were scrapped from competing sites urls.




 
 
 
 
 
 
 
 
 
 
 
 
  

  Keyword
  AvgMonth
  Competition
  Suggestedbid
  Inplan
  Number
  Traffic_x
  Region_x
  Traffic_y
  Region_y
 

  0
  freesia
  550
  0.64
  0.7
  N
  48
  1
  England
  2
  England
 

  1
  plants
  550
  0.59
  0.96
  Y
  52
  100
   England
  100
   England
 

  2
  alstroemeria
  55
  0.58
  0.74
  Y
  53
  0
  England
  0
  England
 

  3
  roses
  550
  0.57
  0.78
  Y
  54
  95
   England
  95
   England
 

  4
  florist
  550
  0.23
  2.12
  Y
  4
  85
   England
  67
   London
 

  5
  carnation
  550
  0.16
  0.39
  N
  8
  2
  England
  3
  London
 

  6
  horseshoe
  55
  0.16
  0.05
  N
  9
  3
  England
  3
  England
 

  7
  weddings
  55
  0.14
  0.25
  Y
  10
  38
   England
  38
   England
 

  8
  gerberas
  55
  0.02
  

  Y
  17
  0
  England
  0
  England
 

  9
  butterfly
  550
  0.01
  

  Y
  19
  1
  England
  1
  England
 

  10
  congratulations
  550
  0
  0.08
  Y
  22
  8
   England
  8
   England
 

  11
  germini
  55
  0
  

  Y
  26
  13
  England
  13
  England