Seaborn: Line Charts, Bar Charts and Heatmaps

Abstract

Trends - A trend is defined as a pattern of change.

  • sns.lineplot - Line charts are best to show trends over a period of time, and multiple lines can be used to show trends in more than one group.

Relationship - There are many different chart types that you can use to understand relationships between variables in your data.

  • sns.barplot - Bar charts are useful for comparing quantities corresponding to different groups.
  • sns.heatmap - Heatmaps can be used to find color-coded patterns in tables of numbers.
  • sns.scatterplot - Scatter plots show the relationship between two continuous variables; if color-coded, we can also show the relationship with a third categorical variable.
  • sns.regplot - Including a regression line in the scatter plot makes it easier to see any linear relationship between two variables.
  • sns.lmplot - This command is useful for drawing multiple regression lines, if the scatter plot contains multiple, color-coded groups.
  • sns.swarmplot - Categorical scatter plots show the relationship between a continuous variable and a categorical variable.

Distribution - We visualize distributions to show the possible values that we can expect to see in a variable, along with how likely they are.

  • sns.histplot - Histograms show the distribution of a single numerical variable.
  • sns.kdeplot - KDE plots (or 2D KDE plots) show an estimated, smooth distribution of a single numerical variable (or two numerical variables).
  • sns.jointplot - This command is useful for simultaneously displaying a 2D KDE plot with the corresponding KDE plots for each individual variable.

Line Charts

Visualize trends over time

Plot the data

Now that the dataset is loaded into the notebook, we need only one line of code to make a line chart!

# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)

# Set the width and height of the figure
plt.figure(figsize=(14,6))

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of each song 
sns.lineplot(data=spotify_data)

# Set the width and height of the figure
plt.figure(figsize=(14,6))

# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")

# Line chart showing daily global streams of 'Shape of You'
sns.lineplot(data=spotify_data['Shape of You'], label="Shape of You")

# Line chart showing daily global streams of 'Despacito'
sns.lineplot(data=spotify_data['Despacito'], label="Despacito")

# Add label for horizontal axis
plt.xlabel("Date")

Bar Charts and Heatmaps

Use color or length to compare categories in a dataset

Bar chart

Say we’d like to create a bar chart showing the average arrival delay for Spirit Airlines (airline code: NK) flights, by month.

# Set the width and height of the figure
plt.figure(figsize=(10,6))

# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")

# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])

# Add label for vertical axis
plt.ylabel("Arrival delay (in minutes)")
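To try the bar chart without the real flight data, here is a small sketch under stated assumptions: toy_delays is a made-up table shaped like flight_data (months as the index, airline codes as columns), and the delay values are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for flight_data: index = month, columns = airline codes
toy_delays = pd.DataFrame(
    {"NK": [11.4, 10.5, 3.2], "AA": [7.0, -0.5, -3.3]},
    index=pd.Index([1, 2, 3], name="Month"),
)

plt.figure(figsize=(6, 4))
# One bar per month; the bar length is that month's value in the NK column
ax = sns.barplot(x=toy_delays.index, y=toy_delays["NK"])
ax.set_ylabel("Arrival delay (in minutes)")

# Read the bar heights back from the drawn rectangles
heights = [bar.get_height() for bar in ax.patches]
```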

Heatmap

We have one more plot type to learn about: heatmaps!

In the code cell below, we create a heatmap to quickly visualize patterns in flight_data. Each cell is color-coded according to its corresponding value.

# Set the width and height of the figure
plt.figure(figsize=(14,7))

# Add title
plt.title("Average Arrival Delay for Each Airline, by Month")

# Heatmap showing average arrival delay for each airline by month
sns.heatmap(data=flight_data, annot=True)

# Add label for horizontal axis
plt.xlabel("Airline")

This code has three main components:

  • sns.heatmap - This tells the notebook that we want to create a heatmap.
  • data=flight_data - This tells the notebook to use all of the entries in flight_data to create the heatmap.
  • annot=True - This ensures that the values for each cell appear on the chart. (Leaving this out removes the numbers from each of the cells!)
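The three components can be seen together in a self-contained sketch. The table toy_delays below is an invented stand-in for flight_data, so the numbers carry no meaning beyond illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for flight_data: index = month, columns = airline codes
toy_delays = pd.DataFrame(
    {"NK": [11.4, 10.5, 3.2], "AA": [7.0, -0.5, -3.3]},
    index=pd.Index([1, 2, 3], name="Month"),
)

plt.figure(figsize=(6, 4))
# annot=True writes each cell's value on top of its colored block
ax = sns.heatmap(data=toy_delays, annot=True)
ax.set_xlabel("Airline")

# One text annotation is created per cell of the table
n_annotations = len(ax.texts)
```

With annot=False (the default), the same call draws only the colored cells and n_annotations would be 0.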

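A heatmap is a statistical chart that displays data by coloring blocks. When drawing one, you need to specify the color-mapping rule. For example, larger values can be shown in darker colors and smaller values in lighter colors, or larger values in warmer colors and smaller values in cooler colors, and so on.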
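One possible color-mapping rule, "larger values are darker", can be sketched in plain Python. The helper value_to_gray and the small table are hypothetical and only illustrate the idea; seaborn applies its own colormaps internally.

```python
def value_to_gray(value, vmin, vmax):
    """Map a value linearly onto a gray hex color; larger values are darker."""
    if vmax == vmin:
        t = 0.0  # all values equal: use the lightest shade
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(max(t, 0.0), 1.0)  # clamp to [0, 1]
    level = round(255 * (1.0 - t))  # 255 = white, 0 = black
    return f"#{level:02x}{level:02x}{level:02x}"


# Made-up 2x2 table of numbers
table = [[1.0, 4.0], [7.0, 10.0]]
flat = [v for row in table for v in row]
lo, hi = min(flat), max(flat)

# Color of each cell, in the same layout as the table
colors = [[value_to_gray(v, lo, hi) for v in row] for row in table]
```

Swapping the gray ramp for a blue-to-red ramp would give the "warmer means larger" rule instead.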

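Types

First, the tabular heatmap, also called a color-block chart. It needs two categorical fields and one numerical field: the categorical fields define the x- and y-axes and divide the chart into regular rectangular blocks, and the numerical field determines each block's color.
Second, the non-tabular heatmap, or smooth heatmap. It needs three numerical fields and can be drawn in a Cartesian coordinate system (two numerical fields define the x- and y-axes, and the third determines the coloring).

Advantages

(1) The strength of a heatmap is its efficient use of space: it can hold a fairly large amount of data. Heatmaps not only help reveal relationships in the data and locate extreme values, they are also often used to portray the overall shape of a dataset, which makes comparison between datasets easier (for example, condensing each athlete's results over the years into one heatmap and then comparing them).
(2) If a row or column is set to a time variable, a heatmap can also show how data changes over time. For example, a heatmap of a city's temperatures over a year makes the warming and cooling trends clear at a glance.

Disadvantages

(1) Although a heatmap can hold a lot of data, it is hard for people to convert its color blocks back into precise numbers. When exact values matter, extra annotations may be needed.
(2) In addition, there is a polar-coordinate variant of the heatmap, the ring-shaped heatmap. Note that this chart looks similar to charts such as the sunburst chart, but its function is completely different, so use it with care.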

Scatter Plots

Leverage the coordinate plane to explore relationships between variables

Scatter plots

To create a simple scatter plot, we use the sns.scatterplot command and specify the values for:

  • the horizontal x-axis (x=insurance_data['bmi']), and
  • the vertical y-axis (y=insurance_data['charges']).

sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])

# Scatter plot with a fitted regression line
sns.regplot(x=insurance_data['bmi'], y=insurance_data['charges'])
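To make the regression line less of a black box, here is a rough sketch of an ordinary least-squares fit in plain Python. To my understanding this is the kind of straight-line fit sns.regplot draws by default, but the helper fit_line and the sample numbers are assumptions for illustration, not seaborn's own code or the real insurance data.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of co-deviations of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie exactly on a line
bmi = [20.0, 25.0, 30.0, 35.0]
charges = [3000.0, 4000.0, 5000.0, 6000.0]

slope, intercept = fit_line(bmi, charges)
```

A positive slope means charges tend to rise with BMI; a slope near zero would suggest little linear relationship.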

Color-coded scatter plots

We can use scatter plots to display the relationships between (not two, but…) three variables! One way of doing this is by color-coding the points.

For instance, to understand how smoking affects the relationship between BMI and insurance costs, we can color-code the points by ‘smoker’, and plot the other two columns (‘bmi’, ‘charges’) on the axes.

sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'], hue=insurance_data['smoker'])

# One regression line per color-coded group (smokers and non-smokers)
sns.lmplot(x="bmi", y="charges", hue="smoker", data=insurance_data)

# Categorical scatter plot: charges for smokers vs. non-smokers
sns.swarmplot(x=insurance_data['smoker'],
              y=insurance_data['charges'])

Distributions

Create histograms and density plots

Histograms

Say we would like to create a histogram to see how petal length varies in iris flowers. We can do this with the sns.histplot command.

# Histogram 
sns.histplot(iris_data['Petal Length (cm)'])
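What a histogram counts can be sketched in a few lines of plain Python. This is a simplified illustration with a fixed number of equal-width bins; seaborn chooses its bins automatically, and both the helper histogram_counts and the sample petal lengths are made up.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate case: every value is identical, so use the first bin
        return [len(values)] + [0] * (n_bins - 1)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than past it
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Invented petal lengths in cm
petal_lengths = [1.0, 1.5, 1.4, 4.0, 4.5, 5.0, 5.5, 6.0]
counts = histogram_counts(petal_lengths, n_bins=5)
```

Each count corresponds to the height of one bar in the histogram.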

Density plots

The next type of plot is a kernel density estimate (KDE) plot. In case you’re not familiar with KDE plots, you can think of it as a smoothed histogram.

To make a KDE plot, we use the sns.kdeplot command. Setting fill=True colors the area below the curve (and data= chooses the column we would like to plot). Older seaborn releases spelled this option shade=True, which has since been deprecated in favor of fill=True.

# KDE plot
sns.kdeplot(data=iris_data['Petal Length (cm)'], fill=True)
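The "smoothed histogram" idea can be sketched directly: place a small Gaussian bump on every data point and average the bumps. The helper make_gaussian_kde, the hand-picked bandwidth, and the sample values are all assumptions for illustration; seaborn selects a bandwidth automatically.

```python
import math


def make_gaussian_kde(samples, bandwidth):
    """Return a density function built from one Gaussian bump per sample."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian kernels centered on each sample
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Invented petal lengths in cm
petal_lengths = [1.4, 1.5, 4.5, 5.0]
density = make_gaussian_kde(petal_lengths, bandwidth=0.5)

# The curve is high near clusters of points and low in the gaps between them
near_cluster = density(1.45)
in_gap = density(3.0)
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows the individual points more closely.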

2D KDE plots

We’re not restricted to a single column when creating a KDE plot. We can create a two-dimensional (2D) KDE plot with the sns.jointplot command.

In the plot below, the color-coding shows us how likely we are to see different combinations of sepal width and petal length, where darker parts of the figure are more likely.

# 2D KDE plot
sns.jointplot(x=iris_data['Petal Length (cm)'], y=iris_data['Sepal Width (cm)'], kind="kde")

Color-coded plots

# Histograms for each species
sns.histplot(data=iris_data, x='Petal Length (cm)', hue='Species')

# Add title
plt.title("Histogram of Petal Lengths, by Species")

# KDE plots for each species
sns.kdeplot(data=iris_data, x='Petal Length (cm)', hue='Species', fill=True)

# Add title
plt.title("Distribution of Petal Lengths, by Species")
