Box & Whisker Plots (in Seaborn)
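A box and whisker plot summarizes a distribution with five statistics: the minimum, the first quartile (Q1), the median, the third quartile (Q3) and the maximum, where the minimum and maximum exclude outliers. It is also referred to as a five-number summary plot, or simply a box plot.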
Import Seaborn and download a dataset
The tips dataset records information from a restaurant. Its columns are: total_bill, the bill paid by a table; tip, the tip left; sex, whether the person is male or female; smoker, whether the person is a smoker; day, the day of the week; time, the time of day; and size, the number of people at the table. The dataset has 244 rows and 7 columns.
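A minimal sketch of the loading step, assuming seaborn is installed and the machine has internet access the first time `sns.load_dataset` runs (it fetches the CSV from the seaborn-data repository and caches it locally):

```python
import seaborn as sns

# Fetches tips.csv from the seaborn-data repository on first use and
# caches it locally (by default under ~/seaborn-data).
tips = sns.load_dataset("tips")

print(tips.shape)             # (244, 7)
print(tips.columns.tolist())  # ['total_bill', 'tip', 'sex', 'smoker', 'day', 'time', 'size']
print(tips.head())
```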
Plot the relationship between day and total bill for the entire dataset using a box and whisker plot.
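One way to draw this plot, as a sketch: the helper name `plot_bill_by_day` is my own, and it assumes a DataFrame with `day` and `total_bill` columns, such as the one returned by `sns.load_dataset("tips")`.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_bill_by_day(tips: pd.DataFrame) -> plt.Axes:
    """Draw one box per day, each summarizing the distribution of total_bill."""
    fig, ax = plt.subplots(figsize=(8, 5))
    sns.boxplot(data=tips, x="day", y="total_bill", ax=ax)
    ax.set_title("Total bill by day")
    return ax
```

Usage: `plot_bill_by_day(sns.load_dataset("tips"))`, followed by `plt.show()` outside a notebook.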
There is a lot of information here, so let us break it down.
Plot the data for a single day. For ease of explanation, I pick the day with the fewest customers visiting the restaurant, i.e. Friday.
Summary statistics of the total_bill column for Friday.
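A sketch of both steps, using hypothetical helper names of my own. Note that `bills_per_day` counts rows (one per bill), not people; summing the `size` column per day would count people instead.

```python
import pandas as pd


def bills_per_day(tips: pd.DataFrame) -> pd.Series:
    """Count the bills (rows) recorded on each day, fewest first."""
    return tips["day"].value_counts().sort_values()


def friday_bill_summary(tips: pd.DataFrame) -> pd.Series:
    """Count, mean, std, min, quartiles and max of Friday's total_bill."""
    friday = tips[tips["day"] == "Fri"]
    return friday["total_bill"].describe()
```

Called on the tips DataFrame, the `25%` and `75%` entries of the summary are the Q1 = 12.095 and Q3 = 21.75 used in the outlier calculation, and `max` is 40.17.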
Draw a box and whisker plot of the Friday data.
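A sketch of the single-day plot, assuming the same tips DataFrame; `plot_friday_bills` is a hypothetical name. Passing only `y` makes seaborn draw one vertical box for the whole column.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_friday_bills(tips: pd.DataFrame) -> plt.Axes:
    """Draw a single box and whisker plot of total_bill for Friday only."""
    friday = tips[tips["day"] == "Fri"]
    fig, ax = plt.subplots(figsize=(4, 6))
    # Only y is given, so seaborn draws one box for the whole column
    sns.boxplot(data=friday, y="total_bill", ax=ax)
    ax.set_title("Total bill on Friday")
    return ax
```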
Break down the above plot
Tally the data points and map them to the plot.
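One way to do the tally in code (my own approach, not necessarily how the original figure was built) is to count how many values fall in each region the box marks out. `tally_by_quartile` is a hypothetical helper; it uses pandas' default linear interpolation for quantiles, the same default that produced the summary statistics.

```python
import pandas as pd


def tally_by_quartile(values: pd.Series) -> dict[str, int]:
    """Count how many values fall in each region a box plot marks out."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    return {
        # Lower whisker side, including any low outliers
        "below Q1": int((values < q1).sum()),
        # Lower half of the box
        "Q1 to median": int(((values >= q1) & (values < median)).sum()),
        # Upper half of the box
        "median to Q3": int(((values >= median) & (values < q3)).sum()),
        # Upper whisker side, including any high outliers
        "Q3 and above": int((values >= q3).sum()),
    }
```

Usage: `tally_by_quartile(tips.loc[tips["day"] == "Fri", "total_bill"])`.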
How are outliers calculated?
The floor and ceiling beyond which data points are treated as outliers are calculated as follows:
Outliers on the higher side: values above Q3 + (1.5 × IQR)
Outliers on the lower side: values below Q1 − (1.5 × IQR)
IQR stands for interquartile range, the difference between Q3 and Q1. In this case IQR = 21.75 − 12.095 = 9.655, so 1.5 × IQR = 14.4825. Any value above 21.75 + 14.4825 = 36.2325 is treated as an outlier. On the lower side the fence is 12.095 − 14.4825 = −2.3875, which no bill can fall below.
At 40.17, the largest Friday bill lies well above 36.2325, so it is treated as an outlier.
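The fence arithmetic can be checked with a short sketch. `iqr_fences` and `flag_outliers` are hypothetical helper names; `k = 1.5` is the multiplier from the formulas above, which is also the default whisker length in seaborn's box plots.

```python
import pandas as pd


def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values that lie beyond the IQR fences."""
    q1, q3 = values.quantile([0.25, 0.75])
    lower, upper = iqr_fences(q1, q3, k)
    return values[(values < lower) | (values > upper)]


# Friday's quartiles as quoted in the text
lower, upper = iqr_fences(12.095, 21.75)
print(f"lower fence = {lower:.4f}")  # lower fence = -2.3875
print(f"upper fence = {upper:.4f}")  # upper fence = 36.2325
print(40.17 > upper)                 # True
```

Applied to the data directly, `flag_outliers(tips.loc[tips["day"] == "Fri", "total_bill"])` should return just the 40.17 bill.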
Split the plot by a categorical column
The hue argument splits the plot by the values of a categorical column. We now have two box and whisker plots, one for each sex: male and female.
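A sketch of the split plot, assuming the day-versus-total-bill layout from earlier with `hue="sex"` added; `plot_bill_by_day_and_sex` is a hypothetical name. Each day then gets one box per sex, drawn side by side, with a legend mapping colours to values.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_bill_by_day_and_sex(tips: pd.DataFrame) -> plt.Axes:
    """Draw side-by-side boxes per day, one colour per value of sex."""
    fig, ax = plt.subplots(figsize=(9, 5))
    # hue splits each day's box by the categorical sex column
    sns.boxplot(data=tips, x="day", y="total_bill", hue="sex", ax=ax)
    ax.set_title("Total bill by day and sex")
    return ax
```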