Generating Climate Temperature Spirals in Python
Ed Hawkins, a climate scientist, tweeted the following animated visualization in 2016 and captivated the world:
This visualization shows how global temperatures deviate from the 1850-1900 average. It was reshared millions of times on Twitter and Facebook, and a version of it was even shown at the opening ceremony of the Rio Olympics.
The visualization is compelling because it helps viewers understand both the year-to-year fluctuations in temperature and the sharp overall rise in average temperatures over the last 30 years.
You can read more about the motivation behind this visualization on Ed Hawkins’ website.
In this blog post, we’ll walk through how to recreate this animated visualization in Python. We’ll specifically be working with pandas (for representing and munging the data) and matplotlib (for visualizing the data). If you’re unfamiliar with matplotlib, we recommend going through the Exploratory Data Visualization and Storytelling Through Data Visualization courses.
We’ll use the following libraries in this post:
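The library list itself was lost from this copy of the post. Based on the names the code below relies on (pd, np, plt, and FuncAnimation), a likely set of imports is the following; treat it as an inference, not the author's exact list:

```python
# Libraries used in this post (inferred from the code that follows)
import numpy as np                               # numeric arrays and angle generation
import pandas as pd                              # loading and reshaping the HadCRUT data
import matplotlib.pyplot as plt                  # plotting
from matplotlib.animation import FuncAnimation   # building the animated GIF
```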
The underlying data was released by the Met Office in the United Kingdom, which does excellent work on weather and climate forecasting. The dataset can be downloaded directly here.
The openclimatedata repo on Github contains some helpful data-cleaning code in this notebook. You’ll need to scroll down to the section titled Global Temperatures.
The following code reads the text file into a pandas data frame:
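import pandas as pd

# Read the date column (0) and the global temperature anomaly column (1)
hadcrut = pd.read_csv(
    "HadCRUT.4.5.0.0.monthly_ns_avg.txt",
    sep=r"\s+",  # whitespace-delimited; replaces the deprecated delim_whitespace=True
    usecols=[0, 1],
    header=None,
)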
Then, we need to:
- split the first column (0) into separate month and year columns
- rename the 1 column to value
- select and save all but the first column (0)
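A minimal sketch of these three steps, assuming (our assumption about the file layout) that column 0 holds dates formatted as "YYYY/MM". To keep it runnable without the download, it builds a tiny stand-in frame from the first three rows shown in the table below:

```python
import pandas as pd

# Stand-in for the raw rows: column 0 is "YYYY/MM", column 1 is the anomaly
hadcrut = pd.DataFrame({0: ["1850/01", "1850/02", "1850/03"],
                        1: [-0.700, -0.286, -0.732]})

# Split "YYYY/MM" into two integer columns
date_parts = hadcrut[0].str.split("/", expand=True).astype(int)
hadcrut["year"] = date_parts[0]
hadcrut["month"] = date_parts[1]

# Rename column 1 to "value" and keep everything except column 0
hadcrut = hadcrut.rename(columns={1: "value"})
hadcrut = hadcrut[["value", "year", "month"]]
print(hadcrut)
```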
|   | value | year | month |
|---|---|---|---|
| 0 | -0.700 | 1850 | 1 |
| 1 | -0.286 | 1850 | 2 |
| 2 | -0.732 | 1850 | 3 |
| 3 | -0.563 | 1850 | 4 |
| 4 | -0.327 | 1850 | 5 |
To keep our data tidy, let's remove rows containing data from 2018 (since it's the only year with data for just 3 months rather than all 12).
hadcrut = hadcrut.drop(hadcrut[hadcrut['year'] == 2018].index)
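Hardcoding 2018 works for this particular download. A slightly more general alternative (our own variation, not the post's code) keeps only years that have all 12 months, so a partial final year is dropped whichever year it is. The toy frame below uses placeholder values:

```python
import pandas as pd

# Toy frame: 1850 has all 12 months, 1851 only 3 (mimicking the partial 2018)
hadcrut = pd.DataFrame({
    "year": [1850] * 12 + [1851] * 3,
    "month": list(range(1, 13)) + [1, 2, 3],
    "value": [0.0] * 15,
})

# For each row, count how many months its year has; keep only complete years
months_per_year = hadcrut.groupby("year")["month"].transform("count")
hadcrut = hadcrut[months_per_year == 12]
print(hadcrut["year"].unique())
```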
Lastly, let's compute the mean of the global temperatures from 1850 to 1900 and subtract that value from the entire dataset. To make this easier, we'll create a multiindex using the year and month columns:
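The code for this step is missing from this copy. A sketch of what it likely looks like, using set_index() on a toy frame with placeholder values, also shows why the MultiIndex helps: .loc[1850:1900] then selects whole years by the first index level.

```python
import pandas as pd

# Toy rows in the post's (value, year, month) layout; values are placeholders
hadcrut = pd.DataFrame({
    "value": [0.1, 0.2, 0.3, 0.4],
    "year": [1850, 1850, 1901, 1901],
    "month": [1, 2, 1, 2],
})

# Move year and month into a MultiIndex so only "value" remains a column
hadcrut = hadcrut.set_index(["year", "month"])

# Slicing on the first level selects whole years (bounds are inclusive)
baseline_rows = hadcrut.loc[1850:1900]
print(list(hadcrut.columns))
print(len(baseline_rows))
```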
This way, we only modify values in the value column (the actual temperature values). Finally, we calculate the mean temperature from 1850 to 1900, subtract it, and reset the index back to the way it was before.
# Subtract the 1850-1900 mean from every value
hadcrut -= hadcrut.loc[1850:1900].mean()
# Turn year and month back into regular columns
hadcrut = hadcrut.reset_index()
hadcrut.head()
|   | year | month | value |
|---|---|---|---|
| 0 | 1850 | 1 | -0.386559 |
| 1 | 1850 | 2 | 0.027441 |
| 2 | 1850 | 3 | -0.418559 |
| 3 | 1850 | 4 | -0.249559 |
| 4 | 1850 | 5 | -0.013559 |
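If you'd rather skip the MultiIndex, an equivalent approach (an alternative, not what the post uses) filters the baseline years with Series.between() and subtracts their mean from the value column directly. The toy values below are placeholders:

```python
import pandas as pd

hadcrut = pd.DataFrame({
    "year": [1850, 1850, 1901, 1901],
    "month": [1, 2, 1, 2],
    "value": [0.1, 0.3, 0.5, 0.7],
})

# Mean of all monthly values from 1850 through 1900 (inclusive)
in_baseline = hadcrut["year"].between(1850, 1900)
baseline = hadcrut.loc[in_baseline, "value"].mean()

# Express every month as a deviation from that baseline
hadcrut["value"] = hadcrut["value"] - baseline
print(hadcrut)
```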
There are a few key phases to recreating Ed’s GIF:
We’ll start by diving into plotting in a polar coordinate system.
Most of the plots you've probably seen (bar plots, box plots, scatter plots, etc.) live in the Cartesian coordinate system. In this system:
- x and y (and z) can range from negative infinity to positive infinity (if we're sticking with real numbers)

In contrast, the polar coordinate system is circular and uses r and theta. The r coordinate specifies the distance from the center (the origin) and can range from 0 to infinity. The theta coordinate specifies the angle, measured from a fixed reference direction, and can range from 0 to 2*pi.
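To connect the two systems: a polar point (r, theta) corresponds to the Cartesian point x = r*cos(theta), y = r*sin(theta), with theta measured counterclockwise from the positive x-axis. A small NumPy sketch of that standard conversion:

```python
import numpy as np

def polar_to_cartesian(r, theta):
    """Convert polar (r, theta) coordinates to Cartesian (x, y)."""
    return r * np.cos(theta), r * np.sin(theta)

# A point 2 units from the origin at a quarter turn (pi/2) lies on the positive y-axis
x, y = polar_to_cartesian(2.0, np.pi / 2)
print(round(x, 10), round(y, 10))  # 0.0 2.0
```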
To learn more about the polar coordinate system, I suggest diving into the following links:
Let’s first understand how the data was plotted in Ed Hawkins’ original climate spirals plot.
The temperature values for a single year span almost an entire spiral / circle. You’ll notice how the line spans from January to December, but doesn’t connect to January again. Here’s just the 1850 frame from the GIF:
This means that we need to subset the data by year and use the following coordinates:
- r: temperature value for a given month, adjusted to contain no negative values.
- theta: 12 equally spaced angle values around the full circle (0 to 2*pi), one per month.

Let's dive into how to plot just the data for the year 1850 in matplotlib, then scale up to all years. If you're unfamiliar with creating Figure and Axes objects in matplotlib, I recommend our Exploratory Data Visualization course.
To generate a matplotlib Axes object that uses the polar system, we need to set the projection parameter to "polar" when creating it.
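The creation code isn't shown here. A minimal sketch, using the same plt.subplot(111, projection='polar') call that appears later in the post (the Agg backend line is only there so it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 8))
# projection="polar" makes this Axes use (theta, r) instead of (x, y)
ax1 = plt.subplot(111, projection="polar")
print(ax1.name)  # polar
```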
Here’s what the default polar plot looks like:
To adjust the data to contain no negative temperature values, we need to first calculate the minimum temperature value:
hadcrut['value'].min()
Let's add 1 to all temperature values, so they'll be positive but there's still some space reserved around the origin for displaying text:
Let's also generate 13 evenly spaced values from 0 to 2*pi and use the first 12 as the theta values (the 13th would sit at 2*pi, the same direction as January):
import numpy as np
import matplotlib.pyplot as plt

# Data for a single year
hc_1850 = hadcrut[hadcrut['year'] == 1850]

fig = plt.figure(figsize=(8,8))
ax1 = plt.subplot(111, projection='polar')

# Shift every anomaly up by 1 so all radii are positive
r = hc_1850['value'] + 1
# First 12 of 13 evenly spaced angles: one per month
theta = np.linspace(0, 2*np.pi, 13)[:-1]
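Why 13 values and not 12? np.linspace() includes both endpoints by default, so 12 points from 0 to 2*pi are spaced 2*pi/11 apart and the last one lands at 2*pi, exactly where January is. This quick comparison shows the difference; np.linspace(0, 2*np.pi, 12, endpoint=False) is an equivalent alternative:

```python
import numpy as np

# 12 points including both endpoints: step is 2*pi/11 and December lands on 2*pi
closed = np.linspace(0, 2 * np.pi, 12)
# First 12 of 13 points: step is 2*pi/12 (30 degrees), December at 330 degrees
months = np.linspace(0, 2 * np.pi, 13)[:-1]

print(np.degrees(closed[1]))  # about 32.7 degrees between months
print(np.degrees(months[1]))  # about 30 degrees between months
```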
To plot data on a polar projection, we still use the Axes.plot() method, but now the first argument is the list of theta values and the second is the list of r values.
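The plotting call itself is missing from this copy. Here is a self-contained sketch of it; because the full set of 1850 values isn't reproduced here, it uses 12 random placeholder anomalies (not real HadCRUT data):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder anomalies for one year (NOT real HadCRUT values)
anomalies = np.random.default_rng(0).normal(0.0, 0.3, 12)

fig = plt.figure(figsize=(8, 8))
ax1 = plt.subplot(111, projection="polar")

theta = np.linspace(0, 2 * np.pi, 13)[:-1]  # one angle per month
r = anomalies + 1                           # shift so the radii are positive

# On a polar Axes, plot() takes the angles first and the radii second
(line,) = ax1.plot(theta, r)
```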
Here’s what this plot looks like:
To make our plot closer to Ed Hawkins' version, let's tweak the aesthetics. Most of the matplotlib methods we're used to using when plotting on a Cartesian coordinate system carry over. Internally, matplotlib treats theta as x and r as y.
To see this in action, we can hide all of the tick labels for both axes using:
# Hide the r tick labels (matplotlib's y-axis)
ax1.axes.get_yaxis().set_ticklabels([])
# Hide the theta tick labels (matplotlib's x-axis)
ax1.axes.get_xaxis().set_ticklabels([])
Now, let's tweak the colors. We need the background color within the polar plot to be black, and the color surrounding the polar plot to be gray. We used an image editing tool to find the exact black and gray color values, as hex values:
- black: #000100
- gray: #323331
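We can use fig.set_facecolor() to set the color surrounding the plot (the figure background) and Axes.set_facecolor() to set the background color inside the plot. Older matplotlib versions used Axes.set_axis_bgcolor() for the latter, but that method was deprecated in matplotlib 2.0 and has since been removed.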
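A minimal sketch using the two hex values above (Axes.set_facecolor() stands in for the removed set_axis_bgcolor()):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

fig = plt.figure(figsize=(8, 8))
ax1 = plt.subplot(111, projection="polar")

fig.set_facecolor("#323331")   # gray area surrounding the polar plot
ax1.set_facecolor("#000100")   # black background inside the polar plot

print(to_hex(fig.get_facecolor()), to_hex(ax1.get_facecolor()))
```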
Next, let's add the title using Axes.set_title():
ax1.set_title("Global Temperature Change (1850-2017)", color='white', fontdict={'fontsize': 30})
Lastly, let's add the text in the center that shows the year currently being visualized. We want this text to be at the origin (0,0), and we want it to be white, in a large font size, and horizontally center-aligned.
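A sketch of that call, matching the commented-out ax1.text(...) line in the full loop later in the post; the "1850" string stands in for str(year):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(8, 8))
ax1 = plt.subplot(111, projection="polar")

# (0, 0) in data coordinates is the origin: theta = 0, r = 0
year_label = ax1.text(0, 0, "1850", color="white", size=30, ha="center")
```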
Here’s what the plot looks like now (recall that this is just for the year 1850).
To plot the spirals for the remaining years, we need to repeat what we just did for every year in the dataset. The one tweak we should make here is to manually set the axis limit for r (y in matplotlib).
This is because matplotlib scales the size of the plot automatically based on the data that’s used. This is why, in the last step, we observed that the data for just 1850 was displayed at the edge of the plotting area. Let’s calculate the maximum temperature value in the entire dataset and add a generous amount of padding (to match what Ed did).
hadcrut['value'].max()
We can manually set the y-axis (r) limit using Axes.set_ylim():
ax1.set_ylim(0, 3.25)
Now we can use a for loop to plot the remaining years. Let's leave out the code that generates the center text for now (otherwise every year's text would be drawn at the same point, which gets very messy):
Here’s what that plot looks like:
Right now, the colors feel a bit random and don't reflect the gradual warming of the climate that the original visualization conveys so well. In the original visualization, the colors transition from blue/purple, to green, to yellow. This color scheme is known as a sequential colormap, because the progression of colors reflects some meaning from the data.
While it's easy to specify a colormap when creating a scatter plot in matplotlib (using the cmap parameter of Axes.scatter()), there's no direct parameter for specifying a colormap when creating a line plot. Tony Yu has an excellent short post on how to use a colormap when generating scatter plots, which we'll use here.
Essentially, we pass the color (or c) parameter when calling the Axes.plot() method and draw its values from a colormap in plt.cm. Here's how we'd use the viridis colormap:
ax1.plot(theta, r, c=plt.cm.viridis(index)) # Index is a counter variable
This results in colors that run sequentially from blue to green. To reach yellow, we can multiply the counter variable by 2:
ax1.plot(theta, r, c=plt.cm.viridis(index*2))
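One caveat worth checking, based on how matplotlib colormaps treat their input: integers index the colormap's 256-entry lookup table directly, while floats in [0, 1] are scaled across it. With 168 years (1850-2017), index*2 passes 255 at index 128, so every year from 1978 onward is drawn in the same final yellow. Dividing by the number of years instead spreads the colors evenly; a sketch of that alternative (our suggestion, not the post's code):

```python
import matplotlib.pyplot as plt

n_years = 2017 - 1850 + 1   # 168 years in the cleaned data
viridis = plt.cm.viridis    # 256-entry lookup table

# Integer inputs index the table; anything above 255 clamps to the last color
print(viridis(2 * 128) == viridis(255))            # True: index*2 saturates from index 128
print(viridis(2 * (n_years - 1)) == viridis(255))  # True

# Float inputs in [0, 1] are spread over the whole colormap instead
colors = [viridis(i / (n_years - 1)) for i in range(n_years)]
print(colors[0] == viridis(0), colors[-1] == viridis(255))  # True True
```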
Let’s reformat our code to incorporate this sequential colormap.
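import numpy as np
import matplotlib.pyplot as plt

# One spiral loop per year in the cleaned data
years = hadcrut['year'].unique()

fig = plt.figure(figsize=(14,14))
ax1 = plt.subplot(111, projection='polar')

# Hide the r (y) and theta (x) tick labels
ax1.axes.get_yaxis().set_ticklabels([])
ax1.axes.get_xaxis().set_ticklabels([])

# Gray surround, black plotting area, no grid, fixed r-limit, title
fig.set_facecolor("#323331")
ax1.set_facecolor('#000100')
ax1.grid(False)
ax1.set_ylim(0, 3.25)
ax1.set_title("Global Temperature Change (1850-2017)", color='white', fontdict={'fontsize': 20})

# First 12 of 13 evenly spaced angles, so December doesn't overlap January
theta = np.linspace(0, 2*np.pi, 13)[:-1]

for index, year in enumerate(years):
    r = hadcrut[hadcrut['year'] == year]['value'] + 1
    # ax1.text(0,0, str(year), color='white', size=30, ha='center')
    ax1.plot(theta, r, c=plt.cm.viridis(index*2))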
Here’s what the resulting plot looks like:
While the plot we have right now is pretty, a viewer can't actually understand the underlying data at all. There's no indication of the underlying temperature values anywhere in the visualization.
The original visualization had full, uniform rings at 0.0, 1.5, and 2.0 degrees Celsius to help with this. Because we added 1 to every temperature value, we need to do the same thing when plotting these uniform rings.
The blue ring was originally at 0.0 degrees Celsius, so we need to generate a ring where r=1. The first red ring was originally at 1.5, so we need to plot it at 2.5. The last one was at 2.0, so it needs to be at 3.0.
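The ring-drawing code is missing here. One way to draw them, sketched under our own choices (1000 sample angles and np.full_like() for the constant radii), is to plot a constant radius against a densely sampled set of angles:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(8, 8))
ax1 = plt.subplot(111, projection="polar")

# Densely sampled angles so each constant-radius line looks like a smooth circle
full_circle_thetas = np.linspace(0, 2 * np.pi, 1000)

# Rings at 0.0, 1.5 and 2.0 degrees C, shifted by +1 like the data
for radius, color in [(1.0, "blue"), (2.5, "red"), (3.0, "red")]:
    ax1.plot(full_circle_thetas, np.full_like(full_circle_thetas, radius), c=color)
```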
Lastly, we can add the text specifying the ring’s temperature values. All 3 of these text values are at the 0.5*pi angle, at varying distance values:
ax1.text(np.pi/2, 1.0, "0.0 C", color="blue", ha='center', fontdict={'fontsize': 20})
ax1.text(np.pi/2, 2.5, "1.5 C", color="red", ha='center', fontdict={'fontsize': 20})
ax1.text(np.pi/2, 3.0, "2.0 C", color="red", ha='center', fontdict={'fontsize': 20})
Because the text for “0.0 C” gets obscured by the data, we may want to consider hiding it for the static plot version.
Now we’re ready to generate a GIF animation from the plot. An animation is a series of images that are displayed in rapid succession.
We'll use the matplotlib.animation.FuncAnimation class to help us with this. To take advantage of it, we need to write code that:
We'll pass the following parameters when calling FuncAnimation() (only fig and func are required):
- fig: the matplotlib Figure object
- func: the update function that's called to draw each frame
- frames: the number of frames (we want one for each year)
- interval: the number of milliseconds each frame is displayed (there are 1000 milliseconds in a second)

FuncAnimation() returns a matplotlib.animation.FuncAnimation object, which has a save() method we can use to write the animation to a GIF file.
Here’s some skeleton code that reflects the workflow we’ll use:
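The skeleton itself didn't survive in this copy. Below is a minimal sketch of that workflow with hypothetical stand-ins: three years of random placeholder data instead of hadcrut, an update() function name of our choosing, and matplotlib's PillowWriter (which needs the Pillow package) writing to a file named climate_spiral.gif:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter

years = [1850, 1851, 1852]           # hypothetical stand-in for hadcrut['year'].unique()
rng = np.random.default_rng(0)       # placeholder data source, NOT HadCRUT values
theta = np.linspace(0, 2 * np.pi, 13)[:-1]

fig = plt.figure(figsize=(4, 4))
ax1 = plt.subplot(111, projection="polar")
ax1.set_ylim(0, 3.25)                # fixed r-limit so the plot doesn't rescale

def update(i):
    """Draw the spiral loop for the i-th year (called once per frame)."""
    r = rng.normal(0.0, 0.3, 12) + 1  # placeholder anomalies, shifted by +1
    ax1.plot(theta, r, c=plt.cm.viridis(i / (len(years) - 1)))

anim = FuncAnimation(fig, update, frames=len(years), interval=200)
anim.save("climate_spiral.gif", writer=PillowWriter(fps=5))
```

Swapping the placeholder r for hadcrut[hadcrut['year'] == years[i]]['value'] + 1 connects this skeleton to the real data.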
All that’s left now is to re-format our previous code and add it to the skeleton above. We encourage you to do this on your own, to practice programming using matplotlib. Here’s what the final animation looks like in lower resolution (to decrease loading time).
In this post, we explored:
You can get most of the way to recreating the excellent climate spiral GIF that Ed Hawkins originally released. Here are a few key things that we didn't explore, but we strongly encourage you to try on your own:
- If you try to add the year text in the center using the FuncAnimation() method, you'll notice that the year values are stacked on top of each other (instead of clearing out the previous year value and displaying a new year value).
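One common fix, sketched here as an assumption about how you might structure your code rather than the post's solution: create the center Text object once, outside the update function, and only call its set_text() method on each frame. The years and the year_label.gif file name are placeholders:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; no display needed
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation, PillowWriter

years = [1850, 1851, 1852]  # hypothetical stand-in for hadcrut['year'].unique()

fig = plt.figure(figsize=(4, 4))
ax1 = plt.subplot(111, projection="polar")
ax1.set_facecolor("#000100")

# Create the center label once, before the animation starts
year_text = ax1.text(0, 0, "", color="white", size=30, ha="center")

def update(i):
    """Replace the label's string instead of adding a new Text each frame."""
    year_text.set_text(str(years[i]))

anim = FuncAnimation(fig, update, frames=len(years), interval=200)
anim.save("year_label.gif", writer=PillowWriter(fps=5))
```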
Source: https://www.pybloggers.com/2018/05/generating-climate-temperature-spirals-in-python/