PAIR PLOTS (in Seaborn)
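A pair plot is a grid of 2D scatter plots that shows the pairwise relationships between the numerical variables in a data frame, with the distribution of each individual variable on the diagonal.
For a dataset with N numerical variables, there are C(N, 2) = N!/(2!(N-2)!) unique pairs, and hence that many distinct scatter plots. Pair plots are useful for getting an overview when a data frame has many numerical variables, though each panel shows only two variables at a time, so they cannot represent higher-dimensional relationships directly.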
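As a quick sanity check on the arithmetic, the sketch below counts the panels in a full N x N pair-plot grid. The helper name `pair_plot_counts` is hypothetical; only `math.comb` comes from the standard library. It also shows how quickly the grid grows as N increases.

```python
from math import comb


def pair_plot_counts(n_numeric: int) -> dict:
    """Count the panels in a full n x n pair-plot grid (hypothetical helper)."""
    return {
        "unique_pairs": comb(n_numeric, 2),           # N!/(2!(N-2)!)
        "off_diagonal": n_numeric * (n_numeric - 1),  # each pair appears twice, mirrored
        "diagonal": n_numeric,                        # one distribution plot per variable
        "total_cells": n_numeric ** 2,                # the whole grid
    }


for n in (3, 5, 10):
    print(n, pair_plot_counts(n))
```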
We explore pair plots using the tips dataset, which is available through the Seaborn library.
The Dataset
The tips dataset contains data from a restaurant. It has 7 columns: total_bill, tip (the tip left), sex (of the person leaving the tip), smoker (whether that person was a smoker), day (day of the week), time (time of day) and size (the number of people at the table).
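A minimal sketch for loading and inspecting the dataset, assuming Seaborn and pandas are installed. `sns.load_dataset` fetches the CSV from the online seaborn-data repository, so it needs internet access on first use.

```python
import seaborn as sns

# Fetch the tips data as a pandas DataFrame (downloaded and cached on first use)
tips = sns.load_dataset("tips")

print(tips.shape)   # (244, 7): 244 rows, 7 columns
print(tips.dtypes)  # sex, smoker, day and time are categorical

# Keep only the numerical columns, which are the ones pairplot draws by default
numeric_cols = tips.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['total_bill', 'tip', 'size']
```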
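The data frame has 3 numerical columns (total_bill, tip and size), so there are C(3, 2) = 3 unique pairs and hence 3 distinct scatter plots. You may notice that the grid actually shows 6 scatter plots, 3 above and 3 below the diagonal: the 3 plots below the diagonal are the same as the 3 above, with the axes interchanged. The 3 panels on the diagonal show the distribution of each individual variable.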
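The three unique pairs can be listed directly with `itertools.combinations`. This small sketch hard-codes the numerical column names of the tips dataset rather than loading it, so it runs without Seaborn.

```python
from itertools import combinations

# Numerical columns of the tips dataset
numeric_cols = ["total_bill", "tip", "size"]

# Each unique (unordered) pair corresponds to one distinct scatter plot
unique_pairs = list(combinations(numeric_cols, 2))
print(unique_pairs)  # [('total_bill', 'tip'), ('total_bill', 'size'), ('tip', 'size')]

# The other triangle of the grid repeats each pair with the axes swapped
mirrored_pairs = [(y, x) for x, y in unique_pairs]
print(mirrored_pairs)  # [('tip', 'total_bill'), ('size', 'total_bill'), ('size', 'tip')]
```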
Plot in Seaborn
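A minimal sketch of the basic plot, assuming Seaborn 0.11 or later and internet access for `load_dataset`. `sns.pairplot` picks up all numerical columns automatically and, when no `hue` is given, draws histograms on the diagonal.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")  # downloads from the seaborn-data repo on first use

# pairplot uses every numerical column by default: total_bill, tip and size,
# giving a 3 x 3 grid of axes
grid = sns.pairplot(tips)
plt.show()
```

`sns.pairplot` returns a `PairGrid`, whose `axes` attribute holds the individual subplots if you want to adjust them afterwards.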
Customization
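One possible customization, sketched with parameters that `sns.pairplot` accepts: `hue` colors points by a categorical column, `markers` assigns one marker per hue level, `diag_kind="kde"` swaps the histograms for density curves, `corner=True` drops the redundant upper triangle, and `plot_kws` is forwarded to the off-diagonal scatter plots. Using `sex` as the hue and the specific styling values are illustrative choices, not the only options; `corner` assumes Seaborn 0.10 or later.

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

grid = sns.pairplot(
    tips,
    vars=["total_bill", "tip", "size"],  # explicit column selection and order
    hue="sex",                           # color points by the tipper's sex
    markers=["o", "s"],                  # one marker per hue level (Male, Female)
    diag_kind="kde",                     # density curves instead of histograms
    corner=True,                         # hide the mirrored upper triangle
    plot_kws={"alpha": 0.6},             # passed to the off-diagonal scatter plots
    height=2.5,                          # height of each panel in inches
)
plt.show()
```

Passing `kind="reg"` instead of the default scatter overlays a linear fit on each off-diagonal panel, which is a quick way to eyeball linear relationships.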
The scatter plot of total_bill against tip suggests a roughly linear, positive relationship between the bill on a table and the tip left. To understand the strength of this correlation further, we would need to plot a correlation heatmap, which I've discussed in a different post.
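If you only want a single number for the apparent linear relationship before building a heatmap, pandas can compute the Pearson correlation coefficient directly. A minimal sketch, assuming the same tips data; the correlation matrix of the numerical columns it prints is the kind of input a correlation heatmap would visualize.

```python
import seaborn as sns

tips = sns.load_dataset("tips")

# Pearson correlation coefficient between bill and tip (ranges from -1 to 1)
r = tips["total_bill"].corr(tips["tip"])
print(f"Pearson r(total_bill, tip) = {r:.2f}")

# Correlation matrix of all numerical columns (total_bill, tip, size);
# select_dtypes skips the categorical columns, which corr() cannot handle
corr = tips.select_dtypes(include="number").corr()
print(corr.round(2))
```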