Python Basic, Cheat Sheets & ETL

Python Basic foundation understanding – How to get help

This is for beginner of python learner

Python is an object oriented programming

Keyword = Objects

Objects have properties

We can apply methods (@actions) on objects

Basic objects

– data types

– string = “a”, “b”, “text”

– integer = 1,2,3

– float = 1.0,2.0,3.0

– Boolean True, False

– data containers

– variables e.g. file = ‘test.csv’, a =1, b =2.0

– lists e.g. list_name = [‘test.csv’, 1, 2.0]

– dictionary e.g. dict ={ key: value} => dict = { ‘apple’:1, ‘banana’: 2 , ‘coconut’: 3}

String methods

e.g. .strip(), upper, lower()

Integer/float methods

Math operations

List methods

e.g. append(), remove()

Dictionary methods

e.g. keys()

How to get help

1. Find out the data types => type(object)

2. Find out the built-in methods => dir(object)

3. Get documentation of built in methods => help (object.method)

Personally, I would recommend the beginner to familiarize themselves with below 4 frequently used python packages for data science:

1. Numpy – http://www.numpy.org/
Cheat Sheet – from DataCamp. One of the best place to learn data science skill, try it out at https://www.datacamp.com/

2. Scipy – https://www.scipy.org/

3. Matplotlib – https://matplotlib.org/
Cheat Sheet – from DataCamp. One of the best place to learn data science skill, try it out at https://www.datacamp.com/

4. Pandas – https://pandas.pydata.org/
Cheat Sheet 1 – from DataCamp. One of the best place to learn data science skill, try it out at https://www.datacamp.com/

Cheat Sheet 2 – from https://pandas.pydata.org/

Additional data containers in Pandas:

Series

series = pd.Series ([1,-3,5,-7],index = [‘a’,’b’,’c’,’d’])

DataFrame

dataframe = pd.DataFrame({‘column_header1’:[‘column1_data1′,’column1_data2′,’column1_data3′],’column_header2’:[‘column2_data1′,’column2_data2′,’column2_data3′],’column_header3’:[‘column3_data1′,’column3_data2′,’column3_data3’] }, columns = [‘column_header1′,’column_header2′,’column_header3’])

Pandas I/O

pd.csv_read(‘file.csv’, header = none, nrows =5)
pd.to_csv(‘myDataFrame.csv’)

Excel

pd.read_excel(‘file.xlsx’)
pd.to_excel(‘dir/myDataFrame.xlsx’, sheet_name =’sheet1′)

Exploring Data Sets

.shape

.head()

.tail()

.sample()

.ndim

.columns

.iloc([:,:])

.describe

.ix

.value_counts

.hist()

Each of the packages already has its own perfectly written tutorial. Make good use of the type(), dir() and help() and practice with real data, this is how I get started in python.

Python – ETL

I have recently done a small project in a financial institution. The Oracle SQL query has became so complex and took very long query execution time due to the schema design and other restriction of the Oracle database configuration. Therefore, I used the python Pandas to do the ETL instead.

Below is just a simplified version of the original Python script for the project, but the thought process should be similar in general:

import cx_Oracle

import pandas as pd

from pandas import DataFrame

con = cx_Oracle.connect(‘user/pass@host”)

cur = con.cursor()

rows = cur.execute(“select * from table”)

result = rows.fetchall()

result[10]

df = DataFrame(result, columns = [‘a’, ‘b’, ‘c’])

/# alternatively use below pandas method to import sql query result directly into dataframe #/

sql1 = “SELECT * from table ” [Note: if the sql query is very long, use “”” to wrap the sql to avoid syntax error.]

df1 = pd.read_sql(sql1, con)

sql2 = “””SELECT * from table “””

df2 = pd.read_sql(sql2, con)

sql3 = “””SELECT * from table “””

df3 = pd.read_sql(sql3, con)

sql4 = “””SELECT * from table “””

df4 = pd.read_sql(sql4, con)

sql5 = “””SELECT * from table “””

df5 = pd.read_sql(sql5, con)

wip2 = pd.merge(txn_df, price_df, how= ‘outer’, left_on =[‘stock_ID’], right_on =[‘stock_ID’])

wip2[‘PERIOD’] = wip2.DATA_MONTH.dt.to_period(‘M’) – wip2.TRADE_MONTH.dt.to_period(‘M’)

wip2 = wip2[(wip2.PERIOD =-18)]

pd.set_option(‘display.max_columns’, None)

wip3 = pd.merge(wip2, df3, how =’left’, left_on = [‘Month’,’CODE’,’ID’], right_on =[‘MONTH’,’code’,’ID’])

wip4 = pd.merge(wip3, df4, how =’left’, left_on = [‘Month’,’code’], right_on =[‘MONTH’,’code’])

wip5 = pd.merge(wip4, df5, how =’left’, left_on = [‘MONTH2′,’Code’], right_on =[‘MONTH’,’code’])

wip5 = wip5.drop(wip5.columns[38], axis=1)

wip5 = wip5.rename(columns={‘old_column_name’: ‘new_column_name’})

df = df.rename(columns={“””TRUNC(DATA_DATE,’MM’)”””: ‘DATA_DATE’})

wip5 = pd.merge(wip4, df5, how =’left’, left_on = [‘Date’,’CODE’], right_on =[‘DATE’,’code’])

wip5[‘BUY_BASE’] = wip5.BUY_AMT / wip5.BUY_QTN

wip5[‘SELL_BASE’] = wip5.SELL_AMT / wip5.SELL_QTN

conditions = [

(wip5[‘TRANSACTION_TYPES’] == ‘BUY’) & (wip5[‘TRADED_QUANTITY’] == wip5[‘HOLDING_QUANTITY’]),

(wip5[‘TRANSACTION_TYPES’] == ‘SELL’) & (wip5[‘TRADED_QUANTITY’] == wip5[‘HOLDING_QUANTITY’]),

(wip5[‘TRANSACTION_TYPES’] == ‘BUY’) & (wip5[‘TRADED_QUANTITY’] wip5[‘HOLDING_QUANTITY’]),

(wip5[‘TRANSACTION_TYPES’] == ‘SELL’) & (wip5[‘TRADED_QUANTITY’] wip5[‘HOLDING_QUANTITY’])]

choices = [‘X_BUY’, ‘X_SELL’, ‘X_x’, ‘X_t”]

wip5[‘ACTION’] = np.select(conditions, choices)

wip5[‘BUY_RETURN’] = (((wip5.PRICE * wip5.MONTHLY_EXCHANGE_RATE) – (wip5.BUY_BASE*wip5.BASE_EXCHANGE_RATE))/ (wip5.BUY_BASE*wip5.BASE_EXCHANGE_RATE))

wip5[‘SELL_RETURN’] = (((wip5.PRICE * wip5.MONTHLY_EXCHANGE_RATE) – (wip5.SELL_BASE*wip5.BASE_EXCHANGE_RATE))/ (wip5.SELL_BASE*wip5.BASE_EXCHANGE_RATE))

wip5.loc[:,’BUY_BASE’:’SELL_RETURN’]

exchange_df.EXCHANGE_MONTH.value_counts().sort_index()

wip5.groupby([‘PORTFOLIO_CODE’,’ACTION’,’TRADE_MONTH’,’DATA_MONTH’,’PERIOD’])[‘BUY_RETURN’,’SELL_RETURN’].mean()

FINAL=wip5.groupby([‘PORTFOLIO_CODE’,’ACTION’,’TRADE_MONTH’,’DATA_MONTH’,’PERIOD’])[‘BUY_RETURN’,’SELL_RETURN’].mean()

FINAL.to_csv(‘result.csv’, sep=’\t’)

DATA = wip5.loc[wip5[‘code’] == ‘xxxxx’]

DATA.to_csv(‘xxxxx_DATA.csv’,sep=’\t’)

2 thoughts on “Python Basic, Cheat Sheets & ETL”

narayana p says:

November 15, 2018 at 6:09 am

I like your blog, I read this blog please update more content on python, further check it once at python online course
mounika r says:

November 22, 2018 at 5:11 am

Thanks for sharing valuable information. Your blogs were helpful to tableau learners. I request to update the blog through step-by-step. Also, find the tableau news at <a href="https://onlineitguru.com/tableau-online-training-placement.html" title="Tableau Online Training | Tableau Online Course in India | Online IT Guru\” rel=\”nofollow\”>Tableau Online Training Blog.

Comments are closed.

2 thoughts on “Python Basic, Cheat Sheets & ETL”

You May Also Like

Building a Personal Career Coach AI Agent with Open AI Chatgpt 4.o

LLMOps Teckstack for GenAI

Data Strategy & Data Governance Framework

Learning Journey of having GAN within LLM

Python Basic, Cheat Sheets & ETL

Related Posts

Building a Personal Career Coach AI Agent with Open AI Chatgpt 4.o

LLMOps Teckstack for GenAI

2 thoughts on “Python Basic, Cheat Sheets & ETL”

You May Also Like

Building a Personal Career Coach AI Agent with Open AI Chatgpt 4.o

LLMOps Teckstack for GenAI

Data Strategy & Data Governance Framework

Learning Journey of having GAN within LLM