Visualize trends over time
Now that the dataset is loaded into the notebook, we need only one line of code to make a line chart!
# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)
# Set the width and height of the figure
plt.figure(figsize=(14,6))
# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")
# Line chart showing daily global streams of each song
sns.lineplot(data=spotify_data)
# Set the width and height of the figure
plt.figure(figsize=(14,6))
# Add title
plt.title("Daily Global Streams of Popular Songs in 2017-2018")
# Line chart showing daily global streams of 'Shape of You'
sns.lineplot(data=spotify_data['Shape of You'], label="Shape of You")
# Line chart showing daily global streams of 'Despacito'
sns.lineplot(data=spotify_data['Despacito'], label="Despacito")
# Add label for horizontal axis
plt.xlabel("Date")
Use color or length to compare categories in a dataset
Say we’d like to create a bar chart showing the average arrival delay for Spirit Airlines (airline code: NK) flights, by month.
# Set the width and height of the figure
plt.figure(figsize=(10,6))
# Add title
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")
# Bar chart showing average arrival delay for Spirit Airlines flights by month
sns.barplot(x=flight_data.index, y=flight_data['NK'])
# Add label for vertical axis
plt.ylabel("Arrival delay (in minutes)")
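For a version of this bar chart that runs on its own, here is a sketch with a tiny stand-in for flight_data. It assumes the table is indexed by month number with one column per airline code; the delay values are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up average delays in minutes (rows: months, columns: airline codes)
flight_data = pd.DataFrame(
    {"AA": [6.9, 7.5, -1.0], "NK": [11.4, 10.9, 3.2]},
    index=pd.Index([1, 2, 3], name="Month"),
)

plt.figure(figsize=(10, 6))
plt.title("Average Arrival Delay for Spirit Airlines Flights, by Month")
# One bar per month; the bar height is the value in the NK column
ax = sns.barplot(x=flight_data.index, y=flight_data["NK"])
plt.ylabel("Arrival delay (in minutes)")
```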
We have one more plot type to learn about: heatmaps!
In the code cell below, we create a heatmap to quickly visualize patterns in flight_data. Each cell is color-coded according to its corresponding value.
# Set the width and height of the figure
plt.figure(figsize=(14,7))
# Add title
plt.title("Average Arrival Delay for Each Airline, by Month")
# Heatmap showing average arrival delay for each airline by month
sns.heatmap(data=flight_data, annot=True)
# Add label for horizontal axis
plt.xlabel("Airline")
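This code has three main components: sns.heatmap tells the notebook that we want a heatmap, data=flight_data tells it to use every entry in flight_data, and annot=True makes the value of each cell appear on the chart.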
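A self-contained sketch of the same heatmap call follows. The small flight_data table is a made-up stand-in (months as rows, airline codes as columns) so that the snippet runs without the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up average delays in minutes (rows: months, columns: airline codes)
flight_data = pd.DataFrame(
    {"AA": [6.9, 7.5, -1.0], "NK": [11.4, 10.9, 3.2], "UA": [6.4, 3.7, 1.9]},
    index=pd.Index([1, 2, 3], name="Month"),
)

plt.figure(figsize=(14, 7))
plt.title("Average Arrival Delay for Each Airline, by Month")
# Each cell is colored by its value; annot=True also writes the number into the cell
ax = sns.heatmap(data=flight_data, annot=True)
plt.xlabel("Airline")
```

Leaving out annot=True keeps the colors but drops the numbers, which is easier to read on large tables.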
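A heatmap is a statistical chart that displays data by coloring blocks. When drawing one, you must specify the color-mapping rule: for example, larger values shown in darker colors and smaller values in lighter colors, or larger values in warmer colors and smaller values in cooler colors.
Types
First, the tabular heatmap, also called a color-block chart. It requires two categorical fields and one numeric field. The categorical fields define the x- and y-axes and divide the chart into regular rectangular cells; the numeric field determines the color of each cell.
Second, the non-tabular heatmap, or smooth heatmap. It requires three numeric fields and can be drawn in a Cartesian coordinate system (two numeric fields define the x- and y-axes, and one numeric field determines the coloring).
Advantages
(1) Heatmaps use space efficiently and can hold fairly large amounts of data. They help reveal relationships in the data and locate extreme values, and they are also often used to portray the overall shape of a dataset, which makes it easier to compare datasets (for example, condensing each athlete's results over the years into one heatmap and then comparing the heatmaps).
(2) If a row or column is set to a time variable, a heatmap can also show how data changes over time. For example, a heatmap of a city's temperatures over one year makes the warming and cooling trend clear at a glance.
Disadvantages
(1) Although a heatmap can hold a lot of data, it is hard for readers to convert the colored cells back into precise numbers. When exact values matter, additional annotations may be needed.
(2) There is also a polar-coordinate variant of the heatmap, the ring-shaped heatmap. It looks similar to charts such as the sunburst chart, but its function is entirely different, so use it with care.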
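The tabular case (two categorical fields plus one numeric field, with an explicit color-mapping rule) can be sketched as follows. The column names month, airline and delay and all the values are hypothetical; cmap="coolwarm" is just one possible rule, mapping larger values to warmer colors.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Long-form records: two categorical fields and one numeric field (made-up values)
records = pd.DataFrame(
    {
        "month": [1, 1, 1, 2, 2, 2],
        "airline": ["AA", "NK", "UA", "AA", "NK", "UA"],
        "delay": [6.9, 11.4, 6.4, 7.5, 10.9, 3.7],
    }
)

# The categorical fields become the rows and columns of the grid,
# and the numeric field fills the cells
grid = records.pivot(index="month", columns="airline", values="delay")

plt.figure(figsize=(6, 3))
# The colormap is the color-mapping rule; annot=True adds the exact numbers
ax = sns.heatmap(grid, cmap="coolwarm", annot=True)
```

The annot=True argument addresses the first disadvantage listed above, since readers no longer have to estimate values from color alone.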
Leverage the coordinate plane to explore relationships between variables
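To create a simple scatter plot, we use the sns.scatterplot command and specify the values for the horizontal x-axis (x=insurance_data['bmi']) and the vertical y-axis (y=insurance_data['charges']).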
sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'])
sns.regplot(x=insurance_data['bmi'], y=insurance_data['charges'])
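The two calls above differ in one respect: sns.regplot draws the same points and adds a fitted regression line. A runnable sketch with a made-up stand-in for insurance_data follows; the np.polyfit call is an addition for illustration that reports the slope of an ordinary least-squares line, which, as far as I know, is what regplot draws by default.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up values for illustration only
insurance_data = pd.DataFrame(
    {
        "bmi": [20.0, 25.0, 30.0, 35.0, 40.0],
        "charges": [3000.0, 4500.0, 7000.0, 9000.0, 12000.0],
    }
)

# Plain scatter plot
plt.figure()
ax_scatter = sns.scatterplot(x=insurance_data["bmi"], y=insurance_data["charges"])

# Scatter plot plus a fitted regression line, on a separate figure
plt.figure()
ax_reg = sns.regplot(x=insurance_data["bmi"], y=insurance_data["charges"])

# Slope of a straight-line least-squares fit; a positive value means charges rise with BMI
slope, intercept = np.polyfit(insurance_data["bmi"], insurance_data["charges"], deg=1)
```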
We can use scatter plots to display the relationships between (not two, but…) three variables! One way of doing this is by color-coding the points.
For instance, to understand how smoking affects the relationship between BMI and insurance costs, we can color-code the points by ‘smoker’, and plot the other two columns (‘bmi’, ‘charges’) on the axes.
sns.scatterplot(x=insurance_data['bmi'], y=insurance_data['charges'], hue=insurance_data['smoker'])
sns.lmplot(x="bmi", y="charges", hue="smoker", data=insurance_data)
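To try the color-coding idea without the real dataset, here is a sketch on a small made-up table. The column names match the ones used above; the values are invented so that smokers sit higher on the charges axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values for illustration only
insurance_data = pd.DataFrame(
    {
        "bmi": [20.0, 25.0, 30.0, 35.0, 22.0, 27.0, 32.0, 38.0],
        "charges": [3000.0, 3500.0, 4200.0, 5000.0, 15000.0, 22000.0, 30000.0, 41000.0],
        "smoker": ["no", "no", "no", "no", "yes", "yes", "yes", "yes"],
    }
)

# Third variable as color: one color per value of the smoker column
plt.figure()
ax = sns.scatterplot(x="bmi", y="charges", hue="smoker", data=insurance_data)

# lmplot creates its own figure and fits one regression line per smoker group
grid = sns.lmplot(x="bmi", y="charges", hue="smoker", data=insurance_data)
```

Note that sns.lmplot returns a FacetGrid rather than an Axes, which is why it takes column names plus data= instead of individual Series.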
sns.swarmplot(x=insurance_data['smoker'],
y=insurance_data['charges'])
Create histograms and density plots
Say we would like to create a histogram to see how petal length varies in iris flowers. We can do this with the sns.histplot command.
# Histogram
sns.histplot(iris_data['Petal Length (cm)'])
The next type of plot is a kernel density estimate (KDE) plot. In case you’re not familiar with KDE plots, you can think of one as a smoothed histogram.
To make a KDE plot, we use the sns.kdeplot command. Setting fill=True colors the area below the curve (and data= chooses the column we would like to plot). Older seaborn releases spelled this option shade=True, which is now deprecated in favor of fill=True.
# KDE plot
sns.kdeplot(data=iris_data['Petal Length (cm)'], fill=True)
We’re not restricted to a single column when creating a KDE plot. We can create a two-dimensional (2D) KDE plot with the sns.jointplot command.
In the plot below, the color-coding shows us how likely we are to see different combinations of sepal width and petal length, where darker parts of the figure are more likely.
# 2D KDE plot
sns.jointplot(x=iris_data['Petal Length (cm)'], y=iris_data['Sepal Width (cm)'], kind="kde")
# Histograms for each species
sns.histplot(data=iris_data, x='Petal Length (cm)', hue='Species')
# Add title
plt.title("Histogram of Petal Lengths, by Species")
# KDE plots for each species
sns.kdeplot(data=iris_data, x='Petal Length (cm)', hue='Species', fill=True)
# Add title
plt.title("Distribution of Petal Lengths, by Species")