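An Introduction to Plotting in Python

Setting the Plot Output Mode
Tool Overview
- Matplotlib
- Seaborn
Basic Usage
- Configuring the Axes
  - Adjusting Labels and Ranges
- Legend
  - Adding a Legend
  - Adjusting Position and Labels
Basic Data Exploration with Python Plots
- Scatter Plot (scatter)
- Grouped Plots (FacetGrid)
- Box Plot (boxplot)
- Count Plot (countplot)
- Histogram and Density Plot
- Pie Chart
- Heatmap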

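Setting the Plot Output Mode

If your IDE is Spyder, you can change this setting at the location shown in the screenshots below. Restart Spyder afterwards for the change to take effect.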

image

image

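Alternatively, you can switch quickly by running one of the following IPython magic commands:

# Render figures inline in the console
%matplotlib inline

# Render figures in a separate window
%matplotlib qt5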
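
The % commands above are IPython magics, so they are not available in a plain Python script. A rough equivalent there is to choose the backend with matplotlib.use() before importing pyplot. The sketch below assumes the non-interactive "Agg" backend and a hypothetical output file named fig1.png in the system temp directory; both are illustrative choices, not part of the original setup.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend that only renders to files

import matplotlib.pyplot as plt

# Hypothetical output location, chosen for illustration
out_path = os.path.join(tempfile.gettempdir(), "fig1.png")

plt.figure()
plt.plot([0, 1, 2], [1, 3, 5])
plt.savefig(out_path)  # Agg cannot open a window, so the figure is written to disk

print(matplotlib.get_backend())  # reports which backend is active
```

Swapping "Agg" for an interactive backend and calling plt.show() instead of plt.savefig() should give the separate-window behaviour.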

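Tool Overview

The rest of this post introduces basic plotting in Python using two tools.

Matplotlib

Matplotlib is a very powerful plotting library for Python.

It can draw:

line plots;
scatter plots;
contour plots;
bar charts;
column charts;
3D figures;
and even animations.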

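Seaborn

Seaborn is a higher-level API built on top of matplotlib that makes plotting easier.
In most cases Seaborn alone is enough to produce attractive plots, while matplotlib allows more heavily customized ones.
Seaborn should be seen as a complement to matplotlib, not a replacement for it.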
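
One way to see how the two libraries complement each other: Seaborn's axes-level functions return an ordinary matplotlib Axes, which can then be adjusted with matplotlib methods. The sketch below uses a small made-up DataFrame (the column names x and y are illustrative).

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up data set, for illustration only
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2, 4, 5, 8]})

# Seaborn draws the plot and hands back a regular matplotlib Axes object
ax = sns.scatterplot(data=df, x="x", y="y")

# Fine-tune the result with plain matplotlib calls
ax.set_title("seaborn plot, matplotlib title")
ax.set_xlim(0, 5)
```

Calling plt.show() afterwards displays the figure.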

Basic Usage

import matplotlib.pyplot as plt  # import matplotlib.pyplot under the alias plt
import numpy as np  # import numpy under the alias np

x = np.linspace(-1, 1, 50)  # 50 evenly spaced points from -1 to 1
y = 2*x + 1  # sample data (x, y) for line 1

plt.figure()  # create a figure window
plt.plot(x, y)  # plot y against x
plt.title('fig1')  # set the title
plt.show()  # display the figure

image

Configuring the Axes

Adjusting Labels and Ranges

x = np.linspace(-3, 3, 50)  # 50 evenly spaced points from -3 to 3
y1 = 2*x + 1  # curve 1: a straight line
y2 = x**2  # curve 2: a parabola

plt.figure()  # create a figure window
plt.plot(x, y2)  # plot curve 2 with the default style
# Plot curve 1 in red (color), with line width 1.0 (linewidth)
# and a dashed line style (linestyle).
plt.plot(x, y1, color='red', linewidth=1.0, linestyle='--')

plt.xlim((-1, 2))  # limit the x axis to (-1, 2)
plt.ylim((-2, 3))  # limit the y axis to (-2, 3)
plt.xlabel('I am x')  # label the x axis 'I am x'
plt.ylabel('I am y')  # label the y axis 'I am y'
plt.show()  # display the figure

image
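
Besides the axis range and name, the spacing of the tick marks can also be set. A minimal sketch using plt.xticks and plt.yticks follows; the tick positions and the text labels 'low', 'zero', 'high' are arbitrary choices for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-3, 3, 50)
y = 2 * x + 1

plt.figure()
plt.plot(x, y)
plt.xlim((-1, 2))
plt.ylim((-2, 3))

# Five evenly spaced ticks on the x axis: -1, -0.25, 0.5, 1.25, 2
plt.xticks(np.linspace(-1, 2, 5))
# Explicit tick positions with custom text labels on the y axis
plt.yticks([-2, 0, 3], ['low', 'zero', 'high'])

ax = plt.gca()  # current Axes, kept so the ticks can be inspected
```

Calling plt.show() afterwards displays the figure with the new ticks.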

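Legend

Adding a Legend

A legend in matplotlib shows the name attached to each plotted data series,
which helps readers tell the curves apart.

First we set the style of the two lines (a blue solid line and a red dashed line).

# set line styles
l1, = plt.plot(x, y1, label='linear line')
l2, = plt.plot(x, y2, color='red', linewidth=1.0, linestyle='--', label='square line')

# The legend text comes from the label arguments above, so the single call below is enough for plt to add the legend automatically.
plt.legend(loc='upper right')
plt.show()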

image

The argument loc='upper right' places the legend in the upper right corner of the figure.

Adjusting Position and Labels

To override the label text set earlier and give each line its own legend entry,
we can pass more arguments to plt.legend. When adding the legend this way,
the return values of plt.plot(x, y1, label='linear line') and
plt.plot(x, y2, label='square line') must be stored in the variables l1 and l2.
Note that l1 and l2 are each followed by a comma on the left-hand side: plt.plot() returns a list, and the comma unpacks its single element.

l1, = plt.plot(x, y1, label='linear line')
l2, = plt.plot(x, y2, color='red', linewidth=1.0, linestyle='--', label='square line')
plt.legend(handles=[l1, l2], labels=['up', 'down'],  loc='best')
plt.show()

image

This lets us set a new label for each line individually.

The result is a figure with the relabeled legend.
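
The role of the trailing comma can be checked directly. The short sketch below compares the raw return value of plt.plot() with the unpacked one; the data points are arbitrary.

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

plt.figure()
lines = plt.plot([0, 1], [0, 1])  # a list holding one Line2D object
l1, = plt.plot([0, 1], [1, 0])    # the trailing comma unpacks the single element

print(type(lines).__name__, len(lines))  # list 1
print(type(l1).__name__)                 # Line2D
```

Passing the unpacked Line2D objects is what handles=[l1, l2] expects; the un-unpacked lists would not work as handles.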

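The loc parameter accepts several values. 'best' lets matplotlib pick the position automatically; the full set of location strings and their integer codes is: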

'best' : 0,
'upper right' : 1,
'upper left' : 2,
'lower left' : 3,
'lower right' : 4,
'right' : 5,
'center left' : 6,
'center right' : 7,
'lower center' : 8,
'upper center' : 9,
'center' : 10
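
As a quick check that a location string and its integer code are interchangeable, the sketch below draws the same legend with loc=2 and loc='upper left' and compares where it ends up. The helper legend_bounds is hypothetical, written only for this comparison.

```python
import matplotlib.pyplot as plt
import numpy as np

def legend_bounds(loc):
    """Draw a small figure with a legend at `loc` and return the legend's pixel bounds."""
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, 1], label='line')
    legend = ax.legend(loc=loc)
    fig.canvas.draw()  # the legend position is only computed at draw time
    bounds = legend.get_window_extent().bounds
    plt.close(fig)
    return bounds

# The code 2 and the string 'upper left' should place the legend identically
same = np.allclose(legend_bounds(2), legend_bounds('upper left'))
# A different location should give different bounds
different = not np.allclose(legend_bounds(2), legend_bounds('lower right'))
print(same, different)
```

The strings are usually the better choice in practice because they are self-explanatory.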

Basic Data Exploration with Python Plots

from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

We use the diamonds data set for the exploration

diamonds=pd.read_csv('https://raw.githubusercontent.com/huangchaosp/hc-blog-attachment/master/data/diamonds.csv')

Use head() and shape to inspect the structure of the data

diamonds.head()

	carat	cut	color	clarity	depth	table	price	x	y	z
0	0.23	Ideal	E	SI2	61.5	55	326	3.95	3.98	2.43
1	0.21	Premium	E	SI1	59.8	61	326	3.89	3.84	2.31
2	0.23	Good	E	VS1	56.9	65	327	4.05	4.07	2.31
3	0.29	Premium	I	VS2	62.4	58	334	4.2	4.23	2.63
4	0.31	Good	J	SI2	63.3	58	335	4.34	4.35	2.75

diamonds.shape

(53940, 10)

Use sample() to draw a random sample of the data

ds=diamonds.sample(frac=0.1)  # randomly sample 10% of the rows
ds.shape

(5394, 10)
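
Because sample() draws different rows on every run, the plots that follow will vary slightly between runs. If reproducible output is wanted, passing random_state fixes the draw. The sketch below uses a stand-in DataFrame rather than the diamonds data, and the seed 42 is arbitrary.

```python
import pandas as pd

# Stand-in frame; the diamonds DataFrame would be used the same way
df = pd.DataFrame({"carat": range(100), "price": range(100, 200)})

s1 = df.sample(frac=0.1, random_state=42)  # random_state fixes the random draw
s2 = df.sample(frac=0.1, random_state=42)

print(s1.shape)       # (10, 2)
print(s1.equals(s2))  # True
```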

Scatter Plot (scatter)

plt.scatter(ds.carat,ds.price)
plt.show()

image

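From the scatter plot, we guess that price follows a power law in carat weight (price ∝ carat^k).
Let's check this guess.

plt.scatter(np.log(ds.carat),np.log(ds.price))
plt.show()

image

After taking logs, log(price) is approximately linear in log(carat), which is what a power-law relationship predicts. (An exponential relationship would instead be linear when only price is log-transformed.)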
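
The slope of that straight line is the exponent k of the power law, and it can be estimated with a degree-1 np.polyfit on the logged values. The sketch below runs on synthetic data, not on the diamonds set: the constants 4000 and 1.7 and the noise level are made up so that the recovered values can be compared with known ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: price = 4000 * carat**1.7 with multiplicative noise.
# The constants 4000 and 1.7 are invented for illustration.
carat = rng.uniform(0.2, 3.0, size=500)
price = 4000 * carat**1.7 * np.exp(rng.normal(0, 0.1, size=500))

# Fit log(price) = k * log(carat) + b; k estimates the power-law exponent
k, b = np.polyfit(np.log(carat), np.log(price), 1)
print(k, np.exp(b))  # close to 1.7 and 4000, up to noise
```

Applied to the sample used here, the same call would be np.polyfit(np.log(ds.carat), np.log(ds.price), 1).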

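Grouped Plots (FacetGrid)

How can a single figure show more than two dimensions? One option is color.

g = sns.FacetGrid(ds, hue='color', height=7.5)  # height replaces the old size argument (renamed in seaborn 0.9)
g.map(plt.scatter, 'carat', 'price').add_legend()
plt.show()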

image
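
To make clearer what the hue grouping does, roughly the same picture can be built by hand: split the data with groupby and issue one scatter call per group. This is a sketch of the idea on a tiny made-up sample that reuses the diamonds column names, not a description of how FacetGrid is implemented.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Tiny made-up sample with the same column names as the diamonds data
df = pd.DataFrame({
    "carat": [0.3, 0.5, 0.7, 1.0, 1.2, 1.5],
    "price": [400, 1200, 2500, 5000, 7000, 11000],
    "color": ["E", "E", "G", "G", "J", "J"],
})

fig, ax = plt.subplots()
# One scatter call per group; matplotlib moves to the next cycle color on each call
for color_grade, group in df.groupby("color"):
    ax.scatter(group["carat"], group["price"], label=color_grade)
ax.set_xlabel("carat")
ax.set_ylabel("price")
ax.legend(title="color")
```

Calling plt.show() afterwards displays the figure.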

Box Plot (boxplot)

A box plot shows how the price is distributed for each color grade

sns.boxplot(x="color", y="price", data=ds) 
plt.show()

image
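
The box in each column spans the 25% to 75% quantiles of the group, and the line inside it marks the median. Those numbers can also be computed directly with groupby and quantile. The sketch below uses a small made-up table so the results are easy to verify by hand.

```python
import pandas as pd

# Made-up prices for two color grades
df = pd.DataFrame({
    "color": ["D"] * 5 + ["E"] * 4,
    "price": [1, 2, 3, 4, 5, 10, 20, 30, 40],
})

# One row per color, one column per quantile (0.25, 0.5, 0.75)
quartiles = df.groupby("color")["price"].quantile([0.25, 0.5, 0.75]).unstack()
print(quartiles)
```

For the sample used here, the equivalent call would be ds.groupby('color')['price'].quantile([0.25, 0.5, 0.75]).unstack().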

Count Plot (countplot)

countplot shows which color grade has the most diamonds

sns.countplot(x='color',data=ds)
plt.show()

image
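
The numbers behind a count plot are plain category counts, which pandas exposes as value_counts(). A minimal sketch on a made-up Series:

```python
import pandas as pd

# Made-up sample of color grades
colors = pd.Series(["G", "E", "G", "D", "E", "G"], name="color")

counts = colors.value_counts()  # rows per category, sorted from most to least frequent
print(counts.idxmax())          # G
print(counts.to_dict())         # {'G': 3, 'E': 2, 'D': 1}
```

On the sample used here, ds['color'].value_counts() gives the bar heights of the plot above.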

Histogram and Density Plot

histplot shows how the carat weights are distributed (it replaces distplot, which has been deprecated since seaborn 0.11)

# Histogram
sns.histplot(ds.carat)
plt.show()

image

# Density plot: histogram normalized to a density, with a KDE curve on top
sns.histplot(ds.carat, kde=True, stat='density')
plt.show()

image

# Per-group view
g=sns.FacetGrid(data=ds,col='color',col_wrap=3)   # one panel per color, similar to a groupby
g=g.map(sns.histplot,'carat',kde=True,stat='density')
plt.show()

image
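
The difference between a count histogram and a density histogram is only the scaling of the bars: counts add up to the number of samples, while density bars are scaled so that their total area is one. The sketch below shows this with matplotlib's hist on synthetic values standing in for carat; the distribution parameters are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
values = rng.normal(loc=0.8, scale=0.3, size=1000)  # synthetic stand-in for carat

fig, (ax_count, ax_density) = plt.subplots(1, 2, figsize=(8, 3))
# Bar height = number of samples in the bin
counts, edges, _ = ax_count.hist(values, bins=20)
# Bar height = probability density (same bins, rescaled)
heights, _, _ = ax_density.hist(values, bins=20, density=True)

print(counts.sum())                      # 1000.0, every sample lands in one bin
print(np.sum(heights * np.diff(edges)))  # 1.0 up to rounding, the bars integrate to one
```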

Pie Chart

# The slices will be ordered and plotted counter-clockwise.
labels = 'Frogs', 'Hogs', 'Dogs', 'Logs'  # slice labels
sizes = [15, 30, 45, 10]  # relative size of each slice
colors = ['yellowgreen', 'gold', 'lightskyblue', 'lightcoral']  # color of each slice
explode = (0, 0.1, 0, 0)  # pull a slice outward to highlight it; here only the second one ('Hogs')
plt.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.1f%%',
        shadow=True, startangle=90)
plt.axis('equal')  # equal aspect ratio, so the pie is drawn as a circle rather than an ellipse
plt.show()

image
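
The sizes do not have to sum to 100: plt.pie normalizes them, and autopct formats each slice's share as a percentage. A small sketch with made-up counts and category names:

```python
import matplotlib.pyplot as plt

sizes = [2, 1, 1]         # raw counts; they do not need to sum to 100
labels = ['A', 'B', 'C']  # made-up category names

fig, ax = plt.subplots()
# With autopct set, pie returns the wedges, the label texts and the percentage texts
patches, texts, autotexts = ax.pie(sizes, labels=labels, autopct='%1.1f%%')
ax.axis('equal')

print([t.get_text() for t in autotexts])  # ['50.0%', '25.0%', '25.0%']
```

This means category counts, for example from value_counts(), can be passed to pie directly.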

Heatmap

flights = sns.load_dataset('flights')
flights.head()

	year	month	passengers
0	1949	January	112
1	1949	February	118
2	1949	March	132
3	1949	April	129
4	1949	May	121

# pivot() reshapes the long-format DataFrame into a month-by-year matrix, with passengers as the cell values
flights = flights.pivot(index='month', columns='year',  values='passengers')
flights.head()

year	1949	1950	1951	1952	1953	1954	1955	1956	1957	1958	1959	1960
month
January	112	115	145	171	196	204	242	284	315	340	360	417
February	118	126	150	180	196	188	233	277	301	318	342	391
March	132	141	178	193	236	235	267	317	356	362	406	419
April	129	135	163	181	235	227	269	313	348	348	396	461
May	121	125	172	183	229	234	270	318	355	363	420	472

plt.figure(figsize=(10,6))
sns.heatmap(flights, fmt='d', linewidths=.5)
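# fmt is the format string for cell annotations (it only takes effect with annot=True); linewidths sets the width of the lines separating the cells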
plt.show()

image
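
To see both the pivot step and the effect of fmt on a scale that can be checked by eye, the sketch below rebuilds a 2x2 corner of the flights table by hand (values copied from the rows shown above) and turns on annot=True so that each cell displays its number.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Long format: one row per (year, month) pair
long_df = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["January", "February", "January", "February"],
    "passengers": [112, 118, 115, 126],
})

# Wide format: months as rows, years as columns
wide_df = long_df.pivot(index="month", columns="year", values="passengers")

fig, ax = plt.subplots()
# annot=True writes each value into its cell; fmt="d" formats it as an integer
sns.heatmap(wide_df, annot=True, fmt="d", linewidths=0.5, ax=ax)
```

Calling plt.show() afterwards displays the annotated heatmap.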
