Seaborn: Line Charts, Bar Charts and Heatmaps


Trends - A trend is defined as a pattern of change.

  • sns.lineplot - Line charts are best to show trends over a period of time, and multiple lines can be used to show trends in more than one group.

Relationship - There are many different chart types that you can use to understand relationships between variables in your data.

  • sns.barplot - Bar charts are useful for comparing quantities corresponding to different groups.
  • sns.heatmap - Heatmaps can be used to find color-coded patterns in tables of numbers.
  • sns.scatterplot - Scatter plots show the relationship between two continuous variables; if color-coded, we can also show the relationship with a third categorical variable.
  • sns.regplot - Including a regression line in the scatter plot makes it easier to see any linear relationship between two variables.
  • sns.lmplot - This command is useful for drawing multiple regression lines, if the scatter plot contains multiple, color-coded groups.
  • sns.swarmplot - Categorical scatter plots show the relationship between a continuous variable and a categorical variable.

Distribution - We visualize distributions to show the possible values that we can expect to see in a variable, along with how likely they are.

  • sns.histplot - Histograms show the distribution of a single numerical variable.
  • sns.kdeplot - KDE plots (or 2D KDE plots) show an estimated, smooth distribution of a single numerical variable (or two numerical variables).
  • sns.jointplot - This command is useful for simultaneously displaying a 2D KDE plot with the corresponding KDE plots for each individual variable.

Line Charts

Visualize trends over time

Plot the data

Now that the dataset is loaded into the notebook, we need only one line of code to make a line chart!

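import matplotlib.pyplot as plt
import seaborn as sns

# Line chart showing daily global streams of each song

# Set the width and height of the figure (in inches; adjust as needed)
plt.figure(figsize=(14, 6))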

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of 'Shape of You'
sns.lineplot(data=spotify_data['Shape of You'], label="Shape of You")

# Line chart showing daily global streams of 'Despacito'
sns.lineplot(data=spotify_data['Despacito'], label="Despacito")

# Add label for horizontal axis
plt.xlabel("Date")

Bar Charts and Heatmaps

Use color or length to compare categories in a dataset

Bar chart

Say we’d like to create a bar chart showing the average arrival delay for Spirit Airlines (airline code: NK) flights, by month.

# Set the width and height of the figure (in inches; adjust as needed)
plt.figure(figsize=(10, 6))

# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")

# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])

# Add label for vertical axis
plt.ylabel("Arrival delay (in minutes)")
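For a version that runs on its own, here is a minimal sketch. It assumes flight_data is a DataFrame whose index holds the month number and whose columns are airline codes. The delay values are invented, not taken from the real dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for flight_data: rows are months, columns are
# airline codes, values are average arrival delays in minutes (invented).
flight_data = pd.DataFrame(
    {"NK": [11.4, 5.2, 3.3, 2.9], "AA": [7.0, -1.0, -3.5, -2.1]},
    index=pd.Index([1, 2, 3, 4], name="Month"),
)

plt.figure(figsize=(8, 4))
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")

# One bar per month; the bar height is that month's average delay for NK.
ax = sns.barplot(x=flight_data.index, y=flight_data["NK"])
plt.ylabel("Arrival delay (in minutes)")
```

Selecting a different column, for example flight_data["AA"], would draw the same chart for another airline.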

We have one more plot type to learn about: heatmaps!

In the code cell below, we create a heatmap to quickly visualize patterns in flight_data. Each cell is color-coded according to its corresponding value.

# Set the width and height of the figure (in inches; adjust as needed)
plt.figure(figsize=(14, 7))

# Add title
plt.title("Average Arrival Delay for Each Airline, by Month")

# Heatmap showing average arrival delay for each airline by month
sns.heatmap(data=flight_data, annot=True)

# Add label for horizontal axis
plt.xlabel("Airline")

This code has three main components:

  • sns.heatmap - This tells the notebook that we want to create a heatmap.
  • data=flight_data - This tells the notebook to use all of the entries in flight_data to create the heatmap.
  • annot=True - This ensures that the values for each cell appear on the chart. (Leaving this out removes the numbers from each of the cells!)
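To see what annot changes, the sketch below draws the same table twice, side by side. The table is a hypothetical stand-in for flight_data with invented values, and the ax= argument is used only to place the two heatmaps in one figure.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical 3-month, 2-airline table of average delays (invented values).
flight_data = pd.DataFrame(
    {"AA": [7.0, -1.0, -3.5], "NK": [11.4, 5.2, 3.3]},
    index=pd.Index([1, 2, 3], name="Month"),
)

fig, (ax_plain, ax_annot) = plt.subplots(1, 2, figsize=(10, 4))

# Without annot, the cells are only color-coded.
sns.heatmap(data=flight_data, ax=ax_plain)
ax_plain.set_title("annot omitted")

# With annot=True, each cell also shows its value as text.
sns.heatmap(data=flight_data, annot=True, ax=ax_annot)
ax_annot.set_title("annot=True")
ax_annot.set_xlabel("Airline")
```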

Scatter Plots

Leverage the coordinate plane to explore relationships between variables

Scatter plots

To create a simple scatter plot, we use the sns.scatterplot command and specify the values for:

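  • the horizontal x-axis (x=insurance_data['bmi']), and
  • the vertical y-axis (y=insurance_data['charges']).

sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])

To make any linear relationship easier to see, we can add a regression line (the line that best fits the data) by changing the command to sns.regplot.

sns.regplot(x=insurance_data['bmi'], y=insurance_data['charges'])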
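The two commands can be compared side by side. This is a minimal sketch that assumes insurance_data has numeric bmi and charges columns. The data here is randomly generated with an assumed upward trend, so it only imitates the real dataset.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for insurance_data (values generated, not real).
rng = np.random.default_rng(0)
bmi = rng.uniform(18, 40, size=50)
charges = 300 * bmi + rng.normal(0, 2000, size=50)
insurance_data = pd.DataFrame({"bmi": bmi, "charges": charges})

fig, (ax_scatter, ax_reg) = plt.subplots(1, 2, figsize=(10, 4))

# Plain scatter plot: one point per row.
sns.scatterplot(x=insurance_data["bmi"], y=insurance_data["charges"], ax=ax_scatter)

# Same points, plus a fitted regression line and its confidence band.
sns.regplot(x=insurance_data["bmi"], y=insurance_data["charges"], ax=ax_reg)
```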

Color-coded scatter plots

We can use scatter plots to display the relationships between (not two, but…) three variables! One way of doing this is by color-coding the points.

For instance, to understand how smoking affects the relationship between BMI and insurance costs, we can color-code the points by 'smoker', and plot the other two columns ('bmi', 'charges') on the axes.

sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'], hue=insurance_data['smoker'])

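To draw one regression line per color-coded group, we use the sns.lmplot command. Unlike the commands above, it takes column names as strings, together with data= to say which dataset they come from.

sns.lmplot(x="bmi", y="charges", hue="smoker", data=insurance_data)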
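A self-contained sketch of the grouped regression follows, together with sns.swarmplot for the categorical scatter plot listed in the overview. The dataset is generated at random, and the steeper trend for smokers is an assumption built into the fake data for demonstration, not a finding.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical stand-in for insurance_data (values generated, not real).
rng = np.random.default_rng(1)
n = 60
smoker = rng.choice(["yes", "no"], size=n)
bmi = rng.uniform(18, 40, size=n)
# Assumed pattern for the demo: charges rise faster with BMI for smokers.
slope = np.where(smoker == "yes", 1500, 300)
charges = slope * bmi + rng.normal(0, 3000, size=n)
insurance_data = pd.DataFrame({"bmi": bmi, "charges": charges, "smoker": smoker})

# lmplot is figure-level: it takes column names plus data=, creates its
# own figure, and draws one regression line per hue group.
grid = sns.lmplot(x="bmi", y="charges", hue="smoker", data=insurance_data)

# swarmplot is axes-level: a categorical scatter plot of charges by group.
plt.figure(figsize=(6, 4))
ax_swarm = sns.swarmplot(x=insurance_data["smoker"], y=insurance_data["charges"])
```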

Distributions

Create histograms and density plots

Histograms

Say we would like to create a histogram to see how petal length varies in iris flowers. We can do this with the sns.histplot command.

# Histogram 
sns.histplot(iris_data['Petal Length (cm)'])

Density plots

The next type of plot is a kernel density estimate (KDE) plot. In case you’re not familiar with KDE plots, you can think of it as a smoothed histogram.

To make a KDE plot, we use the sns.kdeplot command. Setting fill=True colors the area below the curve (and data= chooses the column we would like to plot). Older seaborn versions called this parameter shade; it has been deprecated in favor of fill.

# KDE plot 
sns.kdeplot(data=iris_data['Petal Length (cm)'], fill=True)

2D KDE plots

We’re not restricted to a single column when creating a KDE plot. We can create a two-dimensional (2D) KDE plot with the sns.jointplot command.

In the plot below, the color-coding shows us how likely we are to see different combinations of sepal width and petal length, where darker parts of the figure are more likely.

# 2D KDE plot
sns.jointplot(x=iris_data['Petal Length (cm)'], y=iris_data['Sepal Width (cm)'], kind="kde")

Color-coded plots

# Histograms for each species
sns.histplot(data=iris_data, x='Petal Length (cm)', hue='Species')

# Add title
plt.title("Histogram of Petal Lengths, by Species")

# KDE plots for each species
sns.kdeplot(data=iris_data, x='Petal Length (cm)', hue='Species', fill=True)

# Add title
plt.title("Distribution of Petal Lengths, by Species")

