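matplotlib is actually a fairly low-level tool. To draw a chart, you assemble it from basic components: the data display (that is, the chart type: line plot, bar chart, box plot, scatter plot, contour plot, and so on), the legend, the title, the tick labels, and any other annotations.
In pandas, we typically have several columns of data along with row and column labels. pandas has built-in methods that simplify plotting from DataFrame and Series objects. Another library is seaborn (https://seaborn.pydata.org/), a statistical graphics library created by Michael Waskom. seaborn simplifies the creation of many common visualization types.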
In [194]: import matplotlib.pyplot as plt
In [195]: plt.figure() # create a new figure (the canvas)
Out[195]:
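Note: in a plain IPython session (without interactive plotting enabled), the figure only appears once plt.show() is called, so the code should be written as follows: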
import matplotlib.pyplot as plt
plt.figure()
df.plot()
plt.show()
When a Series is plotted, its index is passed to matplotlib and used for the x-axis. You can disable this by passing use_index=False.
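The effect of use_index can be checked by reading back the x values that matplotlib received. The following is a minimal sketch, not part of the original session: the Series `s` and its index are made up for illustration, and the Agg backend is selected only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical Series whose index is 0, 10, ..., 90
s = pd.Series(np.random.randn(10).cumsum(), index=np.arange(0, 100, 10))

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
s.plot(ax=axes[0], title="use_index=True (default)")
s.plot(ax=axes[1], use_index=False, title="use_index=False")

# Read back the x values of the line drawn on each Axes
x_with_index = axes[0].lines[0].get_xdata()
x_without_index = axes[1].lines[0].get_xdata()
print(x_with_index)     # the index values 0, 10, ..., 90
print(x_without_index)  # plain positions 0, 1, ..., 9
```

With the default setting the x-axis carries the index values; with use_index=False the points are simply placed at positions 0 to 9.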
The plot method of a DataFrame draws one line per column in a single subplot and creates the legend automatically (see Figure 9-14):
In [200]: df = pd.DataFrame(np.random.rand(10,4).cumsum(0),columns=['A','B','C','D'],index=np.arange(0,100,10))
In [201]: df
Out[201]:
A B C D
0 0.685788 0.297513 0.651090 0.445658
10 1.557753 0.813536 1.338988 0.993341
20 2.433134 1.193840 1.840116 1.262187
30 3.356239 1.889819 2.715777 1.779696
40 3.831967 2.843798 3.342842 2.269416
50 4.688547 3.843593 3.982264 2.882778
60 5.546727 4.640172 4.857665 3.075847
70 6.233858 5.135775 5.343059 3.692292
80 6.597572 5.355745 5.709290 3.919498
90 7.127534 6.151853 5.731094 4.640323
In [202]: df.plot()
Out[202]:
In [203]: plt.show()
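As a script, the same example might look like the sketch below. It also checks the "one line per column, automatic legend" behaviour by inspecting the Axes object that df.plot() returns. The axis label and title are illustrative additions, and the Agg backend is an assumption made so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 4).cumsum(0),
                  columns=["A", "B", "C", "D"],
                  index=np.arange(0, 100, 10))

ax = df.plot()  # returns the matplotlib Axes that was drawn on

# One Line2D per column, and one legend entry per column
n_lines = len(ax.lines)
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(n_lines, legend_labels)

# The returned Axes can be customized further with plain matplotlib calls
ax.set_xlabel("index")
ax.set_title("Cumulative sums per column")
```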
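plot.bar() and plot.barh() draw vertical and horizontal bar charts, respectively. In this case, the index of the Series or DataFrame is used as the x (bar) or y (barh) ticks (see Figure 9-15):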
In [206]: fig,axes = plt.subplots(2,1)
In [207]: data = pd.Series(np.random.randn(16),index = list('abcdefghijklmnop'))
In [208]: data.plot.bar(ax=axes[0],color='k',alpha=0.7) # alpha sets the fill opacity of the bars
Out[208]:
In [209]: data.plot.barh(ax=axes[1],color='k',alpha=0.7)
Out[209]:
In [210]: plt.show()
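Note: plt.subplots(2,1) creates a 2x1 grid of subplots (two rows, one column) in a single call and returns them in axes as a NumPy array, while fig is still the Figure object for the whole image. Each subplot can then be accessed by indexing into axes.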
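The shape of the returned array depends on the grid that was requested. A small sketch of this (the variable names fig2, axes2, fig3 and ax3 are illustrative, and the Agg backend is an assumption so the script runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np

# Two rows, one column: a 1-D array holding two Axes
fig, axes = plt.subplots(2, 1)
print(type(axes).__name__, axes.shape)  # ndarray (2,)

# Two rows, three columns: a 2-D array indexed as [row, col]
fig2, axes2 = plt.subplots(2, 3)
print(axes2.shape)  # (2, 3)

# No arguments: a single Axes object rather than an array
fig3, ax3 = plt.subplots()

axes[0].set_title("top")
axes[1].set_title("bottom")
axes2[1, 2].set_title("row 1, col 2")
```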
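color='k' and alpha=0.7 set the bar color to black and make the fill partially transparent. For a DataFrame, a bar chart groups the values of each row together, with one bar per column shown side by side: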
In [211]: df = pd.DataFrame(np.random.rand(6, 4), index=['one', 'two', 'three', 'four', 'five', 'six'],columns=pd.Index(['A', 'B', 'C', 'D'], name='Genus'))
In [212]: df
Out[212]:
Genus A B C D
one 0.783765 0.973372 0.397304 0.668468
two 0.849481 0.883813 0.059709 0.620467
three 0.188168 0.387766 0.975336 0.781791
four 0.996210 0.188114 0.205050 0.492547
five 0.404493 0.192918 0.305952 0.436618
six 0.475883 0.312828 0.720343 0.637083
In [213]: df.plot.bar()
Out[213]:
In [215]: plt.show()
Passing stacked=True produces stacked bars:
In [216]: df.plot.bar(stacked=True)
Out[216]:
In [217]: plt.show()
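A common follow-up, not shown in the session above, is to normalize each row so that every stacked bar adds up to 1 and shows proportions instead of raw values. The sketch below assumes that goal; the name `shares` is illustrative, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(6, 4),
                  index=["one", "two", "three", "four", "five", "six"],
                  columns=pd.Index(["A", "B", "C", "D"], name="Genus"))

# Divide each row by its own sum so that the four values in a row add up to 1
shares = df.div(df.sum(axis=1), axis=0)

ax = shares.plot.bar(stacked=True)

# Each column becomes one bar container; the last container sits on top of the stack
tops = [rect.get_y() + rect.get_height() for rect in ax.containers[-1]]
print(len(ax.containers), tops)
```

Every stack then ends at a height of 1, which makes the rows directly comparable.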
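seaborn's distplot method makes histograms and density plots even simpler, and it can draw a histogram and a continuous density estimate at the same time (distplot has been deprecated since seaborn 0.11 in favor of histplot and displot). As an example, consider a bimodal distribution made up of draws from two different normal distributions: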
In [219]: import seaborn as sns
In [220]: comp1 = np.random.normal(0,1,size=200)
In [221]: comp2 = np.random.normal(10,2,size=200)
In [222]: values = pd.Series(np.concatenate([comp1,comp2]))
In [223]: sns.distplot(values,bins=100,color='k')
Out[223]:
In [224]: plt.show()
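The following introduction comes from another source:
As you may know, Seaborn is a free library that sits at a higher level than Matplotlib and is aimed specifically at data visualization. It goes further than that, though: it addresses the two biggest frustrations of working with Matplotlib. As Michael Waskom puts it, if Matplotlib "tries to make easy things easy and hard things possible", Seaborn tries to make a well-defined set of hard things easy too.
One of the biggest difficulties with Matplotlib is its default parameter settings, and Seaborn largely sidesteps this problem by shipping its own defaults.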
In [227]: tips = pd.read_csv("https://raw.githubusercontent.com/mwaskom/seaborn-data/master/tips.csv")
In [233]: tips
Out[233]:
total_bill tip sex smoker day time size
0 16.99 1.01 Female No Sun Dinner 2
1 10.34 1.66 Male No Sun Dinner 3
2 21.01 3.50 Male No Sun Dinner 3
3 23.68 3.31 Male No Sun Dinner 2
4 24.59 3.61 Female No Sun Dinner 4
5 25.29 4.71 Male No Sun Dinner 4
6 8.77 2.00 Male No Sun Dinner 2
7 26.88 3.12 Male No Sun Dinner 4
8 15.04 1.96 Male No Sun Dinner 2
9 14.78 3.23 Male No Sun Dinner 2
10 10.27 1.71 Male No Sun Dinner 2
11 35.26 5.00 Female No Sun Dinner 4
12 15.42 1.57 Male No Sun Dinner 2
13 18.43 3.00 Male No Sun Dinner 4
14 14.83 3.02 Female No Sun Dinner 2
15 21.58 3.92 Male No Sun Dinner 2
16 10.33 1.67 Female No Sun Dinner 3
17 16.29 3.71 Male No Sun Dinner 3
18 16.97 3.50 Female No Sun Dinner 3
19 20.65 3.35 Male No Sat Dinner 3
20 17.92 4.08 Male No Sat Dinner 2
21 20.29 2.75 Female No Sat Dinner 2
22 15.77 2.23 Female No Sat Dinner 2
23 39.42 7.58 Male No Sat Dinner 4
24 19.82 3.18 Male No Sat Dinner 2
25 17.81 2.34 Male No Sat Dinner 4
26 13.37 2.00 Male No Sat Dinner 2
27 12.69 2.00 Male No Sat Dinner 2
28 21.70 4.30 Male No Sat Dinner 2
29 19.65 3.00 Female No Sat Dinner 2
.. ... ... ... ... ... ... ...
214 28.17 6.50 Female Yes Sat Dinner 3
215 12.90 1.10 Female Yes Sat Dinner 2
216 28.15 3.00 Male Yes Sat Dinner 5
217 11.59 1.50 Male Yes Sat Dinner 2
218 7.74 1.44 Male Yes Sat Dinner 2
219 30.14 3.09 Female Yes Sat Dinner 4
220 12.16 2.20 Male Yes Fri Lunch 2
221 13.42 3.48 Female Yes Fri Lunch 2
222 8.58 1.92 Male Yes Fri Lunch 1
223 15.98 3.00 Female No Fri Lunch 3
224 13.42 1.58 Male Yes Fri Lunch 2
225 16.27 2.50 Female Yes Fri Lunch 2
226 10.09 2.00 Female Yes Fri Lunch 2
227 20.45 3.00 Male No Sat Dinner 4
228 13.28 2.72 Male No Sat Dinner 2
229 22.12 2.88 Female Yes Sat Dinner 2
230 24.01 2.00 Male Yes Sat Dinner 4
231 15.69 3.00 Male Yes Sat Dinner 3
232 11.61 3.39 Male No Sat Dinner 2
233 10.77 1.47 Male No Sat Dinner 2
234 15.53 3.00 Male Yes Sat Dinner 2
235 10.07 1.25 Male No Sat Dinner 2
236 12.60 1.00 Male Yes Sat Dinner 2
237 32.83 1.17 Male Yes Sat Dinner 2
238 35.83 4.67 Female No Sat Dinner 3
239 29.03 5.92 Male No Sat Dinner 3
240 27.18 2.00 Female Yes Sat Dinner 2
241 22.67 2.00 Male Yes Sat Dinner 2
242 17.82 1.75 Male No Sat Dinner 2
243 18.78 3.00 Female No Thur Dinner 2
[244 rows x 7 columns]
In [226]: fig,ax = plt.subplots()
In [229]: ax.violinplot(tips['total_bill'],vert=False)
Out[229]:
{'bodies': [],
'cmaxes': ,
'cmins': ,
'cbars': }
In [230]: plt.show()
In [231]: sns.violinplot(x ='total_bill',data=tips)
Out[231]:
In [232]: plt.show()
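To compare the two violinplot calls side by side without downloading the tips file, the sketch below uses synthetic data. The DataFrame `bills` and its gamma-distributed values are a made-up stand-in for the total_bill column, not real data, and the Agg backend is selected only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to get a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
# Synthetic stand-in for tips['total_bill'] (hypothetical values, not the real dataset)
bills = pd.DataFrame({"total_bill": rng.gamma(shape=4.0, scale=5.0, size=200)})

fig, (ax_mpl, ax_sns) = plt.subplots(1, 2, figsize=(9, 3))

# matplotlib: returns a dict of the artists it created, with default styling
parts = ax_mpl.violinplot(bills["total_bill"].to_numpy())
print(sorted(parts))  # ['bodies', 'cbars', 'cmaxes', 'cmins']
ax_mpl.set_title("matplotlib")

# seaborn: takes a column name plus the DataFrame and draws onto the given Axes
sns.violinplot(x="total_bill", data=bills, ax=ax_sns)
ax_sns.set_title("seaborn")
```

The matplotlib call works on raw arrays and hands back the drawn pieces for manual styling, while the seaborn call works directly on DataFrame columns and adds an inner box plot by default.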
In [235]: iris = sns.load_dataset('iris')
In [236]: sns.swarmplot(x='species',y='petal_length',data=iris)
Out[236]:
In [237]: plt.show()