A pair plot is a grid of 2D scatter plots that shows the pairwise relationships between the numerical variables in a data frame.

For a dataset with N numerical variables, it draws NC2, i.e. N! / (2! * (N - 2)!), distinct pairwise plots. …
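As a quick check of the NC2 count, here is a minimal sketch that enumerates the distinct variable pairs. The three column names are used purely for illustration (they happen to be numerical columns of the Tips dataset); any list of numerical columns would do.

```python
from itertools import combinations
from math import comb

# Illustrative numerical columns; swap in the columns of your own data frame.
numeric_columns = ["total_bill", "tip", "size"]

# Every unordered pair of distinct columns corresponds to one pairwise plot.
pairs = list(combinations(numeric_columns, 2))
n = len(numeric_columns)

print(f"{n} variables give {comb(n, 2)} distinct pairs:")
for x_name, y_name in pairs:
    print(f"  {x_name} vs {y_name}")
```

Note that this counts unordered pairs only; a plotting grid that also shows the mirrored pairs and the diagonal will contain more panels than this.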

A distribution plot combines a histogram, a kernel density estimate (KDE) and a rug plot in a single frame!
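To make the three ingredients concrete, the sketch below computes each of them by hand in plain Python rather than through Seaborn. The sample values, bin count and bandwidth are arbitrary assumptions chosen for illustration; a plotting library would normally pick the bins and bandwidth for you.

```python
import math

def histogram(values, bins, lo, hi):
    """Count how many values fall into each of `bins` equal-width bins on [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # A value equal to `hi` is placed in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

def gaussian_kde(values, bandwidth):
    """Return a density function built as an average of Gaussian bumps."""
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    def density(x):
        return sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values) / norm
    return density

# Made-up sample, for illustration only.
sample = [1.0, 1.5, 2.0, 2.0, 3.5, 5.0]

counts = histogram(sample, bins=4, lo=1.0, hi=5.0)   # the histogram bars
density = gaussian_kde(sample, bandwidth=0.5)        # the smooth KDE curve
rug = sorted(sample)                                 # one tick per observation

print("histogram counts:", counts)
print("density at x=2.0:", round(density(2.0), 3))
print("rug ticks:", rug)
```

The histogram shows counts per bin, the KDE smooths the same observations into a continuous curve, and the rug marks where each individual observation lies.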

In this post, we discuss the distribution plot in Seaborn using the Tips dataset as an example. The dataset is available as part of the Seaborn library.


Tips dataset contains…

For the regular definition, this link should suffice. In this post, I discuss the cycle of a short-sell trade and the obligations of an exchange participant when executing short-sell trades.

Cycle of a Short-sell trade

A short trade involves an analyst (hedge fund) predicting the decline of a security (could…

There is plenty of material online that covers the theoretical aspects of this topic, so I’ve instead written about how things actually work!


Once the sales team (sell side) onboards a client, the required infrastructure and provisioning are set up based on the client's trading preferences. …

This article illustrates:

  1. How to connect Python (Jupyter Notebook) with a Snowflake database.
  2. How to retrieve the results of a SQL query into a Pandas Data Frame.

Once the data is in a Pandas Data Frame, any operation supported by Pandas can be performed on the data set.
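The two steps can be sketched roughly as follows, assuming the `snowflake-connector-python` package is installed with its pandas extra. The helper name `query_to_dataframe`, the table name `my_table` and the credential placeholders are hypothetical and not taken from the article.

```python
def query_to_dataframe(sql, **connection_params):
    """Run `sql` on Snowflake and return the result as a pandas DataFrame.

    `connection_params` are passed straight to snowflake.connector.connect
    (for example user, password and account).
    """
    # Imported lazily so this module still loads where the connector is missing.
    import snowflake.connector

    conn = snowflake.connector.connect(**connection_params)
    try:
        cur = conn.cursor()
        try:
            cur.execute(sql)
            # fetch_pandas_all needs the connector's pandas extra to be installed.
            return cur.fetch_pandas_all()
        finally:
            cur.close()
    finally:
        conn.close()

# Hypothetical usage; replace the placeholders with your own values:
# df = query_to_dataframe(
#     "SELECT * FROM my_table",
#     user="<user>", password="<password>", account="<account_identifier>",
# )
# df.head()
```

Closing the cursor and connection in `finally` blocks is a design choice for notebooks, where a failed query would otherwise leave a session open.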

Software Requirements

  1. A table in a Snowflake database with some data in it

Data infrastructure within firms has evolved continuously. Processes like ETL have enabled firms to transform, store and utilize zettabytes of transactional data. I recently came across a blog post by Census that discusses a relatively new concept: Reverse ETL. Unlike ETL infrastructure where data from a warehouse…

A linear regression model is used to predict the value of a variable based on the value of another variable. The variable you want to predict (‘Y’) is referred to as the target or output variable; the variable used to predict the value of Y, i.e. …
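For a single predictor, the idea can be sketched with ordinary least squares in plain Python. The function name and the toy data (which lie exactly on the line y = 2x + 1) are assumptions made for illustration; they are not from the article.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Toy data for illustration: the input variable and the target Y.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_simple_linear_regression(xs, ys)
prediction = slope * 5 + intercept  # predict Y for a new input value of 5

print("slope:", slope, "intercept:", intercept, "prediction at 5:", prediction)
```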

Named windows may be thought of as an alias for the specification within the OVER clause of a SQL query. This makes the code readable, user-friendly and reusable.

To illustrate, I use the data set below, which contains the columns educational course, category and quantity sold.
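The original data set is not reproduced here, so the sketch below uses a made-up `course_sales` table with the same three columns. It runs on SQLite through Python's built-in `sqlite3` module purely as a stand-in engine (window functions need SQLite 3.25 or newer); the article's own database may use slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course_sales (course TEXT, category TEXT, quantity_sold INTEGER)"
)
# Hypothetical rows, for illustration only.
conn.executemany(
    "INSERT INTO course_sales VALUES (?, ?, ?)",
    [
        ("Python Basics", "Programming", 120),
        ("SQL Fundamentals", "Programming", 90),
        ("Watercolour", "Art", 40),
        ("Sketching", "Art", 60),
    ],
)

# The window is defined once as `w` and reused by two analytic functions.
query = """
SELECT course,
       category,
       quantity_sold,
       SUM(quantity_sold) OVER w AS category_total,
       MAX(quantity_sold) OVER w AS category_best
FROM course_sales
WINDOW w AS (PARTITION BY category)
ORDER BY category, course
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Without the named window, the same `PARTITION BY category` specification would have to be repeated inside each OVER clause.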

A box plot provides five key summary statistics of any given distribution. It is also referred to as a five-point summary plot.
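The five values are commonly taken to be the minimum, first quartile, median, third quartile and maximum. A minimal sketch with Python's `statistics` module follows; the sample values are made up, and the "inclusive" quartile method is one of several conventions, so a plotting library may report slightly different quartiles.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # n=4 splits the data into quartiles and returns the three cut points.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))

# Made-up sample, for illustration only.
bills = [10, 12, 15, 18, 20, 24, 30]
summary = five_number_summary(bills)
print(summary)
```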

Import Seaborn and download a Dataset

The Tips dataset captures information from a restaurant. Each column describes one aspect of a visit: total_bill is the bill paid by a table, tip is the tip given, sex indicates whether the…

The PARTITION BY clause restricts an analytic function to operate within a partition boundary.

To illustrate, I use the data set below, which contains the columns educational course, category and quantity sold.
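As a rough illustration of the partition boundary, the sketch below compares the same aggregate computed with and without PARTITION BY. The `course_sales` table and its rows are hypothetical, and SQLite (via Python's built-in `sqlite3`, version 3.25 or newer) is used only as a convenient stand-in for whichever database the article targets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course_sales (course TEXT, category TEXT, quantity_sold INTEGER)"
)
# Hypothetical rows, for illustration only.
conn.executemany(
    "INSERT INTO course_sales VALUES (?, ?, ?)",
    [
        ("Python Basics", "Programming", 120),
        ("SQL Fundamentals", "Programming", 90),
        ("Watercolour", "Art", 40),
        ("Sketching", "Art", 60),
    ],
)

# category_total is confined to rows sharing the same category;
# grand_total has no partition, so it spans the whole result set.
query = """
SELECT course,
       category,
       quantity_sold,
       SUM(quantity_sold) OVER (PARTITION BY category) AS category_total,
       SUM(quantity_sold) OVER () AS grand_total
FROM course_sales
ORDER BY category, course
"""
partition_rows = conn.execute(query).fetchall()
for row in partition_rows:
    print(row)
```

Unlike GROUP BY, the partitioned aggregate keeps every input row and simply attaches the per-category value to each one.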

