Working with NULL values in a Pandas Data Frame
To get a true sense of numerical data, it is important to understand the NULL values present in a dataset and the ways to act on them. In this post, we discuss ways of discovering and acting upon NULL data points in a Pandas data frame.
You may download the code from the link
Dataset
This is a very small dataset, EPLREC, with just 8 rows. As is visible, there are 3 NULL values in its SAL column.
Understand NULL values in a Dataset
Before starting to work on a dataset, it is important to check whether it contains NULL data points. Built-in functions in pandas make discovering NULL data points easy.
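As a minimal sketch of this discovery step, the snippet below builds a stand-in for the EPLREC data frame and inspects its NULL values with `isnull()`. The NAME column and all values are invented for illustration; only the shape (8 rows, 3 NULLs in SAL) follows the description above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for EPLREC: 8 rows, 3 NULLs in SAL.
# The NAME column and every value here are assumptions for illustration.
eplrec = pd.DataFrame({
    "NAME": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "SAL": [1000.0, np.nan, 1500.0, 2000.0, np.nan, 2500.0, np.nan, 3000.0],
})

# True if the data frame contains at least one NULL value.
has_nulls = eplrec.isnull().values.any()

# Number of NULL values in each column.
null_counts = eplrec.isnull().sum()

# Share of NULL values in each column, relative to the number of rows.
null_share = eplrec.isnull().mean()

# The rows where SAL is NULL.
null_rows = eplrec[eplrec["SAL"].isnull()]

print(null_counts)
print(null_share)
print(null_rows)
```

`isnull()` returns a data frame of booleans, so summing it counts the NULL values and averaging it gives their proportion.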
Action NULL Data points
Once we understand the NULL values in a dataset, how often they occur relative to the size of the dataset, and the problem statement at hand, we can make an informed decision on how to act upon them.
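One common way to act on NULL values is to drop the affected rows or columns with `dropna()`. The sketch below assumes the same hypothetical stand-in for EPLREC (invented NAME column and values) and is only one possible approach, not necessarily the one used in the downloadable code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for EPLREC: 8 rows, 3 NULLs in SAL.
# The NAME column and every value here are assumptions for illustration.
eplrec = pd.DataFrame({
    "NAME": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "SAL": [1000.0, np.nan, 1500.0, 2000.0, np.nan, 2500.0, np.nan, 3000.0],
})

# Drop every row that has a NULL in any column.
rows_dropped = eplrec.dropna()

# Drop rows only when the SAL column is NULL.
rows_dropped_sal = eplrec.dropna(subset=["SAL"])

# Drop every column that contains at least one NULL.
cols_dropped = eplrec.dropna(axis=1)

print(rows_dropped)
print(cols_dropped)
```

`dropna()` returns a new data frame and leaves `eplrec` unchanged. With 3 of 8 rows affected, dropping rows discards a large share of this small dataset.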
Populate missing values with a Custom Formula
In scenarios where NULL values make up a significant share of the data, dropping rows or columns may not be the best approach, because too much information is lost. Pandas provides ways to apply a custom formula for populating NULL values.
In the case below, the salary column (SAL) has NULL values in 3 of its 8 rows, so I prefer to populate the missing values with the mean of the available data points in that column.
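A minimal sketch of mean imputation with `fillna()` follows. It assumes the same hypothetical stand-in for EPLREC (invented NAME column and values), so the resulting mean is illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for EPLREC: 8 rows, 3 NULLs in SAL.
# The NAME column and every value here are assumptions for illustration.
eplrec = pd.DataFrame({
    "NAME": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "SAL": [1000.0, np.nan, 1500.0, 2000.0, np.nan, 2500.0, np.nan, 3000.0],
})

# Series.mean() skips NULL values by default, so this is the mean
# of the 5 available salaries.
sal_mean = eplrec["SAL"].mean()

# Replace each NULL in SAL with that mean.
eplrec["SAL"] = eplrec["SAL"].fillna(sal_mean)

print(sal_mean)
print(eplrec)
```

Any other formula can be substituted for the mean here, for example `eplrec["SAL"].median()`, depending on the problem statement.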