Data Visualization Using Seaborn

Seaborn is a Python data visualization library, built on top of matplotlib, for drawing attractive and informative statistical graphics. In this article I will briefly cover three of its plotting functions:

  • seaborn.jointplot()
  • seaborn.distplot()
  • seaborn.boxplot()


The seaborn.jointplot() function displays the relationship between two variables, x and y (a bivariate plot), together with the distribution of each variable on its own (univariate plots) in the margins.

Univariate data describes a single attribute or characteristic, while bivariate data pairs two attributes that are usually related. For example, the number of tweets posted in a day versus the number of engagements on those tweets is bivariate. If you observed only one of the two, the data would be univariate.

seaborn.jointplot() is intended to be a fairly lightweight wrapper around the JointGrid class; if you need more flexibility, use JointGrid directly [1].

Many more parameters are available (see the official Seaborn documentation), but these are some of the important ones:

  1. x, y: vectors or keys in data
    • Variables that specify positions on the x and y axes
  2. data: pandas.DataFrame, numpy.ndarray, mapping or sequence
    • Input data structure. Either long-form or wide-form
  3. kind: {“scatter”, “kde”, “hist”, “hex”, “reg”, “resid”}
    • Kind of plot to draw

Here is an example of what a standard jointplot() looks like when data is plotted: a scatterplot with marginal histograms.

[Figure: seaborn jointplot]

You can load a data set of your own and pass its values to the parameters x and y. In this example, to show what a jointplot() looks like, I assigned values generated by randn(1000) to each of the variables data1 and data2, then passed those variables as x and y.

You can change the kind of plot by passing one of “scatter”, “kde”, “hist”, “hex”, “reg”, or “resid” to the kind parameter.


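The seaborn.distplot() function displays univariate data as a histogram with a kernel density estimate (KDE) curve drawn over it. Note that distplot() is deprecated as of seaborn 0.11; displot() and histplot() are its replacements.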

This function combines the matplotlib hist function (with automatic calculation of a good default bin size) with the seaborn kdeplot() and rugplot() functions [2].

[Figure: seaborn distplot]

Many more parameters are available (see the official Seaborn documentation), but for this example I passed only data1 (our observed data) to distplot() to show what it looks like.


The seaborn.boxplot() function displays how the values in the data are spread out. It divides the data into four sections that each contain approximately 25% of the values.

A boxplot displays the distribution of data based on the five-number summary: minimum, first (lower) quartile (Q1), median, third (upper) quartile (Q3), and maximum [3].

  1. Minimum score: lowest score, end of the left (or bottom) whisker
  2. Lower quartile (Q1): value below which about 25% of the data fall; the lower edge of the box
  3. Median: mid-point of the data, drawn as a line inside the box
    • If the median is in the middle of the box and the whiskers are the same length on both sides, then the distribution is symmetric;
    • If the median is closer to the bottom of the box and the whisker is shorter on the lower end, then the distribution is positively skewed;
    • If the median is closer to the top of the box and the whisker is shorter on the upper end, then the distribution is negatively skewed.
  4. Upper quartile (Q3): value below which about 75% of the data fall; the upper edge of the box
  5. Maximum score: highest score, end of the right (or top) whisker

The box spans Q1 to Q3 (the interquartile range), so the longer the box, the more dispersed the data, and the shorter the box, the less dispersed.
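To make the five-number summary concrete, here is a minimal sketch that computes it with NumPy. The scores are made up for illustration, and NumPy's default linear interpolation is assumed for the quartiles.

```python
import numpy as np

# Made-up scores, purely for illustration.
scores = np.array([2, 4, 4, 5, 7, 8, 9, 10, 12])

# Five-number summary: the 0th, 25th, 50th, 75th and 100th percentiles.
minimum, q1, median, q3, maximum = np.percentile(scores, [0, 25, 50, 75, 100])

# The box in a boxplot spans Q1 to Q3; its length is the interquartile range.
iqr = q3 - q1

print(minimum, q1, median, q3, maximum)  # 2.0 4.0 7.0 9.0 12.0
print(iqr)                               # 5.0
```

One caveat: by default, seaborn's boxplot() draws each whisker only as far as the most extreme point within 1.5 × IQR of the box and marks anything beyond as an outlier, so the whisker ends match the true minimum and maximum only when no such outliers exist.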

Important parameters (see Seaborn official documentation for more):

  • x, y, hue: names of variables in data or vector data, optional
  • data: DataFrame, array, or list of arrays, optional

The example output of boxplot() shows how the CombMPG values are distributed for each Mfr Name, with both columns taken from a DataFrame called df.
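A call of that shape might look like the sketch below. Only the column names Mfr Name and CombMPG and the DataFrame name df come from the article; the rows are invented placeholders, and putting Mfr Name on the x axis is an assumption.

```python
import pandas as pd
import seaborn as sns

# Hypothetical rows: the manufacturer labels and MPG values are made up.
df = pd.DataFrame({
    "Mfr Name": ["Maker A"] * 4 + ["Maker B"] * 4 + ["Maker C"] * 4,
    "CombMPG": [22, 25, 27, 30, 18, 19, 21, 24, 28, 31, 33, 40],
})

# One box per manufacturer, showing the spread of its CombMPG values.
ax = sns.boxplot(x="Mfr Name", y="CombMPG", data=df)
```

Swapping the x and y arguments would draw the same boxes horizontally, which can be easier to read when there are many manufacturer names.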


[1] seaborn.jointplot()

[2] seaborn.distplot()

[3] Understanding Boxplots