Today, while working on a project for a Udacity online course, I ran into several obstacles that I struggled with for a while before realizing they could be solved easily. Let’s get started!
Lesson 1
I want to create a new column that holds the result of comparing two other columns. Let’s say the two columns are called LoanAmount and InterestAmount. The new column will be called LoanHigherThanInterest, and it should be filled with True if LoanAmount is higher than InterestAmount and False otherwise.
Solution
df['LoanHigherThanInterest'] = df['LoanAmount'] > df['InterestAmount']
Lesson learned
The solution looks pretty simple. However, in the beginning I struggled a lot with using if else to fill the new column LoanHigherThanInterest. I eventually realized that if else does not work here because each column holds many values: the comparison df['LoanAmount'] > df['InterestAmount'] returns a whole Series of True/False values, one per row, while an if statement needs a single True or False. The code below does not go through the values row by row, so Python cannot decide which branch to take, and pandas raises a ValueError ("The truth value of a Series is ambiguous").
if df['LoanAmount'] > df['InterestAmount']:
    df['LoanHigherThanInterest'] = True
else:
    df['LoanHigherThanInterest'] = False
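To make the difference concrete, here is a minimal sketch that runs both approaches on a tiny made-up table. Only the column names LoanAmount, InterestAmount and LoanHigherThanInterest come from this post; the sample values and the extra Comparison column are assumptions for illustration. It also shows np.where, which may be handy when the new column should hold something other than True/False.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the post.
df = pd.DataFrame({
    "LoanAmount": [1000, 50, 300],
    "InterestAmount": [100, 80, 300],
})

# The comparison yields one boolean per row, not a single True/False.
mask = df["LoanAmount"] > df["InterestAmount"]

# Using that Series as an `if` condition raises a ValueError,
# because pandas cannot reduce several booleans to one.
try:
    if mask:
        pass
except ValueError as err:
    error_message = str(err)

# Assigning the Series directly fills the new column row by row.
df["LoanHigherThanInterest"] = mask

# np.where picks a value per row; "Comparison" is a hypothetical column name.
df["Comparison"] = np.where(mask, "higher", "not higher")

print(error_message)
print(df)
```

The comparison is strict, so a row where both amounts are equal ends up as False.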
Lesson 2
I wanted to consolidate two columns that contain the same type of information, but are each only partly filled, into a new column with the complete information. Let’s say these two existing columns are called Grade and Rating. The new column is called NewGrade.
Solution
df['NewGrade'] = df['Grade'].fillna(df['Rating'])
Lesson learned
In the beginning, I tried the merge function with an inner join on the primary key (the unique key of each loan). However, pd.merge is meant for joining two tables on shared keys, not for combining two columns of the same DataFrame, so it is not applicable in this case. Instead we can simply use the fillna function.
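However, fillna only takes a value from Rating where Grade is missing. Wherever both columns are filled, the Rating value is silently ignored, and wherever both are empty, NewGrade stays empty. So this solution is only safe when we are sure that the two columns do not overlap, or that they agree where they do. Before doing fillna, we should therefore look at the rows where both columns are null and the rows where both are not null, using the isnull and notnull methods: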
df[df['Grade'].isnull() & df['Rating'].isnull()]
df[df['Grade'].notnull() & df['Rating'].notnull()]
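Putting the checks and the fill together, a minimal sketch could look like the following. The sample values are made up, and the variable names both_missing, both_filled and conflicts are hypothetical helpers, not something from the original project. The extra conflicts step is one possible way to see whether overlapping rows actually disagree.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the post.
df = pd.DataFrame({
    "Grade":  ["A", np.nan, np.nan, "C"],
    "Rating": [np.nan, "B", np.nan, "D"],
})

# Rows where neither column has a value: NewGrade will stay empty here.
both_missing = df[df["Grade"].isnull() & df["Rating"].isnull()]

# Rows where both columns have a value: fillna will keep Grade and drop Rating.
both_filled = df[df["Grade"].notnull() & df["Rating"].notnull()]

# Of those overlapping rows, the ones where the two values disagree.
conflicts = both_filled[both_filled["Grade"] != both_filled["Rating"]]

# Take Grade where available, otherwise fall back to Rating.
df["NewGrade"] = df["Grade"].fillna(df["Rating"])

print(len(both_missing), len(both_filled), len(conflicts))
print(df)
```

If conflicts is not empty, it is worth deciding which column should win before filling, since fillna always keeps the value from the column it is called on.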