Python Basic foundation understanding – How to get help
This is for beginner of python learner
Python is an object oriented programming
Keyword = Objects
Objects have properties
We can apply methods (@actions) on objects
Basic objects
– data types
– string = “a”, “b”, “text”
– integer = 1,2,3
– float = 1.0,2.0,3.0
– Boolean True, False
– data containers
– variables e.g. file = ‘test.csv’, a =1, b =2.0
– lists e.g. list_name = [‘test.csv’, 1, 2.0]
– dictionary e.g. dict ={ key: value} => dict = { ‘apple’:1, ‘banana’: 2 , ‘coconut’: 3}
String methods
e.g. .strip(), upper, lower()
Integer/float methods
Math operations
List methods
e.g. append(), remove()
Dictionary methods
e.g. keys()
How to get help
1. Find out the data types => type(object)
2. Find out the built-in methods => dir(object)
3. Get documentation of built in methods => help (object.method)
Personally, I would recommend the beginner to familiarize themselves with below 4 frequently used python packages for data science:
1. Numpy – http://www.numpy.org/
Cheat Sheet – from DataCamp. One of the best place to learn data science skill, try it out at https://www.datacamp.com/
2. Scipy – https://www.scipy.org/
3. Matplotlib – https://matplotlib.org/
Cheat Sheet – from DataCamp. One of the best place to learn data science skill, try it out at https://www.datacamp.com/
4. Pandas – https://pandas.pydata.org/
Cheat Sheet 1 – from DataCamp. One of the best place to learn data science skill, try it out at https://www.datacamp.com/
Cheat Sheet 2 – from https://pandas.pydata.org/
Additional data containers in Pandas:
-
Series
-
series = pd.Series ([1,-3,5,-7],index = [‘a’,’b’,’c’,’d’])
-
DataFrame
-
dataframe = pd.DataFrame({‘column_header1’:[‘column1_data1′,’column1_data2′,’column1_data3′],’column_header2’:[‘column2_data1′,’column2_data2′,’column2_data3′],’column_header3’:[‘column3_data1′,’column3_data2′,’column3_data3’] }, columns = [‘column_header1′,’column_header2′,’column_header3’])
Pandas I/O
-
CSV
-
pd.csv_read(‘file.csv’, header = none, nrows =5)
-
pd.to_csv(‘myDataFrame.csv’)
-
Excel
Exploring Data Sets
.shape
.head()
.tail()
.sample()
.ndim
.columns
.iloc([:,:])
.describe
.ix
.value_counts
.hist()
Each of the packages already has its own perfectly written tutorial. Make good use of the type(), dir() and help() and practice with real data, this is how I get started in python.
Python – ETL
I have recently done a small project in a financial institution. The Oracle SQL query has became so complex and took very long query execution time due to the schema design and other restriction of the Oracle database configuration. Therefore, I used the python Pandas to do the ETL instead.
Below is just a simplified version of the original Python script for the project, but the thought process should be similar in general:
import cx_Oracle
import pandas as pd
from pandas import DataFrame
con = cx_Oracle.connect(‘user/pass@host”)
cur = con.cursor()
rows = cur.execute(“select * from table”)
result = rows.fetchall()
result[10]
df = DataFrame(result, columns = [‘a’, ‘b’, ‘c’])
/# alternatively use below pandas method to import sql query result directly into dataframe #/
sql1 = “SELECT * from table ” [Note: if the sql query is very long, use “”” to wrap the sql to avoid syntax error.]
df1 = pd.read_sql(sql1, con)
sql2 = “””SELECT * from table “””
df2 = pd.read_sql(sql2, con)
sql3 = “””SELECT * from table “””
df3 = pd.read_sql(sql3, con)
sql4 = “””SELECT * from table “””
df4 = pd.read_sql(sql4, con)
sql5 = “””SELECT * from table “””
df5 = pd.read_sql(sql5, con)
wip2 = pd.merge(txn_df, price_df, how= ‘outer’, left_on =[‘stock_ID’], right_on =[‘stock_ID’])
wip2[‘PERIOD’] = wip2.DATA_MONTH.dt.to_period(‘M’) – wip2.TRADE_MONTH.dt.to_period(‘M’)
wip2 = wip2[(wip2.PERIOD =-18)]
pd.set_option(‘display.max_columns’, None)
wip3 = pd.merge(wip2, df3, how =’left’, left_on = [‘Month’,’CODE’,’ID’], right_on =[‘MONTH’,’code’,’ID’])
wip4 = pd.merge(wip3, df4, how =’left’, left_on = [‘Month’,’code’], right_on =[‘MONTH’,’code’])
wip5 = pd.merge(wip4, df5, how =’left’, left_on = [‘MONTH2′,’Code’], right_on =[‘MONTH’,’code’])
wip5 = wip5.drop(wip5.columns[38], axis=1)
wip5 = wip5.rename(columns={‘old_column_name’: ‘new_column_name’})
df = df.rename(columns={“””TRUNC(DATA_DATE,’MM’)”””: ‘DATA_DATE’})
wip5 = pd.merge(wip4, df5, how =’left’, left_on = [‘Date’,’CODE’], right_on =[‘DATE’,’code’])
wip5[‘BUY_BASE’] = wip5.BUY_AMT / wip5.BUY_QTN
wip5[‘SELL_BASE’] = wip5.SELL_AMT / wip5.SELL_QTN
conditions = [
(wip5[‘TRANSACTION_TYPES’] == ‘BUY’) & (wip5[‘TRADED_QUANTITY’] == wip5[‘HOLDING_QUANTITY’]),
(wip5[‘TRANSACTION_TYPES’] == ‘SELL’) & (wip5[‘TRADED_QUANTITY’] == wip5[‘HOLDING_QUANTITY’]),
(wip5[‘TRANSACTION_TYPES’] == ‘BUY’) & (wip5[‘TRADED_QUANTITY’] wip5[‘HOLDING_QUANTITY’]),
(wip5[‘TRANSACTION_TYPES’] == ‘SELL’) & (wip5[‘TRADED_QUANTITY’] wip5[‘HOLDING_QUANTITY’])]
choices = [‘X_BUY’, ‘X_SELL’, ‘X_x’, ‘X_t”]
wip5[‘ACTION’] = np.select(conditions, choices)
wip5[‘BUY_RETURN’] = (((wip5.PRICE * wip5.MONTHLY_EXCHANGE_RATE) – (wip5.BUY_BASE*wip5.BASE_EXCHANGE_RATE))/ (wip5.BUY_BASE*wip5.BASE_EXCHANGE_RATE))
wip5[‘SELL_RETURN’] = (((wip5.PRICE * wip5.MONTHLY_EXCHANGE_RATE) – (wip5.SELL_BASE*wip5.BASE_EXCHANGE_RATE))/ (wip5.SELL_BASE*wip5.BASE_EXCHANGE_RATE))
wip5.loc[:,’BUY_BASE’:’SELL_RETURN’]
exchange_df.EXCHANGE_MONTH.value_counts().sort_index()
wip5.groupby([‘PORTFOLIO_CODE’,’ACTION’,’TRADE_MONTH’,’DATA_MONTH’,’PERIOD’])[‘BUY_RETURN’,’SELL_RETURN’].mean()
FINAL=wip5.groupby([‘PORTFOLIO_CODE’,’ACTION’,’TRADE_MONTH’,’DATA_MONTH’,’PERIOD’])[‘BUY_RETURN’,’SELL_RETURN’].mean()
FINAL.to_csv(‘result.csv’, sep=’\t’)
DATA = wip5.loc[wip5[‘code’] == ‘xxxxx’]
DATA.to_csv(‘xxxxx_DATA.csv’,sep=’\t’)
I like your blog, I read this blog please update more content on python, further check it once at python online course
Thanks for sharing valuable information. Your blogs were helpful to tableau learners. I request to update the blog through step-by-step. Also, find the tableau news at <a href="https://onlineitguru.com/tableau-online-training-placement.html" title="Tableau Online Training | Tableau Online Course in India | Online IT Guru\” rel=\”nofollow\”>Tableau Online Training Blog.