Data Science Technology Exercise 4: Data Visualization

Think and Practice 1

1. China's per-capita disposable income for 2012 to 2020 was [1.47, 1.62, 1.78, 1.94, 2.38, 2.60, 2.82, 3.07, 3.21] (unit: 10,000 yuan). Draw the following figures as required.

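1) Following Examples 4-1 and 4-3, draw a line chart of per-capita disposable income. Mark the data points with small squares, use a red dashed line, annotate the highest point, set the figure title to "Income chart", set the axis titles, and finally save the figure as a JPG file.

2) Following Example 4-2, use multiple subplots to draw a line chart, a box plot, and a bar chart of per-capita disposable income.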

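[Hints]

1) Create the three subplots using (2,2,1), (2,2,2), and (2,1,2) as the arguments.

2) Use plt.subplots_adjust() to adjust the spacing between the subplots so that there is room for the titles.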
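The arguments in hint 1) can be read as (rows, columns, index): the first two subplots occupy the top row of a 2x2 grid, while (2,1,2) takes the whole bottom row of a 2x1 grid. Here is a minimal sketch of just that layout with empty axes. The titles, the hspace value, the Agg backend, and the output file name are illustrative assumptions, not part of the exercise:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(9, 6))
ax_top_left = fig.add_subplot(2, 2, 1)   # 2x2 grid, cell 1
ax_top_right = fig.add_subplot(2, 2, 2)  # 2x2 grid, cell 2
ax_bottom = fig.add_subplot(2, 1, 2)     # 2x1 grid, cell 2: spans the full width

# Label each axes with the arguments that created it
for ax, name in [(ax_top_left, "(2,2,1)"), (ax_top_right, "(2,2,2)"), (ax_bottom, "(2,1,2)")]:
    ax.set_title(name)

# Widen the vertical gap so the bottom title does not collide with the top row
plt.subplots_adjust(hspace=0.4)
fig.savefig("layout_sketch.png")  # hypothetical output file name
```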

import matplotlib.pyplot as plt
import pandas as pd
from pandas import DataFrame

#1)
income = [1.47,1.62,1.78,1.94,2.38,2.60,2.82,3.07,3.21]
data = DataFrame({'Income':income},index = ['2012','2013','2014','2015','2016','2017','2018','2019','2020'])
# Pass the data explicitly; 's' gives square markers and '--' a dashed line
plt.plot(data.index, data['Income'], marker = 's', linestyle = '--', color = 'r')
# The highest point is the last one: x position 8 (year 2020), y = 3.21
plt.annotate('Largest!',xy = (8,3.21), xytext = (5,3.1), arrowprops = dict(arrowstyle = '->'), fontsize = 16)
plt.title('Income chart')
plt.xlabel('Year',fontsize = 9)
plt.ylabel('Income (10,000 yuan)',fontsize = 8)
# Save before plt.show(), which may clear the current figure
plt.savefig('2012-2020人均可支配收入.jpg')
plt.show()

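#2)
fig = plt.figure(figsize = (9,6))
ax1 = fig.add_subplot(2,2,1)
ax1.plot(data)
ax1.set_title('Line chart')
ax2 = fig.add_subplot(2,2,2)
data.plot(kind = 'box',fontsize = 'small',xticks = [],ax = ax2, title = 'Box plot')
ax3 = fig.add_subplot(2,1,2)
data.plot(kind = 'bar',use_index = True,ax = ax3, title = 'Bar chart')
# Leave vertical room between the rows so the bottom title does not overlap the plots above
plt.subplots_adjust(hspace = 0.4)
plt.show()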
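
The annotation in 1) needs the coordinates of the highest point. Rather than reading them off the list by eye, they can be looked up. A small sketch using only the income values from the exercise (the variable names are illustrative):

```python
income = [1.47, 1.62, 1.78, 1.94, 2.38, 2.60, 2.82, 3.07, 3.21]
years = [str(year) for year in range(2012, 2021)]  # '2012' .. '2020'

peak_value = max(income)             # y coordinate of the highest point
peak_pos = income.index(peak_value)  # 0-based x position on a categorical axis
peak_year = years[peak_pos]

print(peak_pos, peak_year, peak_value)  # 8 2020 3.21
```

The pair (peak_pos, peak_value) can then be passed as xy to plt.annotate().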

Think and Practice 2

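1. The data file High-speed rail.csv stores information about high-speed railways in countries around the world.

1) A bar chart comparing the operating mileage of each country, with China annotated as "Longest"

2) A stacked bar chart of the current status and development of each country's operating mileage

3) A pie chart of each country's share of operating mileage, with the China wedge pulled away from the center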

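[Hint] When reading the data from the file, use the first column as the index: data = pd.read_csv('High-speed rail.csv', index_col = 'Country'). To get the row for China, use data.loc['China']; with Country as the index, data['China'] would look for a column named China and fail.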
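
To illustrate the difference between row and column lookup, here is a sketch on a tiny in-memory table. The Country and Operation names come from the exercise; the Planned column, the second country, and all numbers are made-up placeholders, not values from High-speed rail.csv:

```python
import io
import pandas as pd

# Placeholder CSV text standing in for the real file (values are NOT real data)
csv_text = "Country,Operation,Planned\nChina,100,50\nExampleland,10,5\n"
data = pd.read_csv(io.StringIO(csv_text), index_col="Country")

china_row = data.loc["China"]      # row lookup by index label
operation_col = data["Operation"]  # column lookup by column name

print(china_row["Operation"])   # 100
print("China" in data.columns)  # False, so data['China'] would raise KeyError
```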

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame
# Adjust the path to wherever the CSV file is stored
data = pd.read_csv('D:/20222023第一学期/数据科学技术/High-speed rail.csv', index_col = 'Country')

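#1)
data['Operation'].plot(kind = 'bar', title = 'Operation Mileage')
# Anchor the label on China's bar: x is its position in the index, y its mileage
x, y = data.index.get_loc('China'), data.loc['China', 'Operation']
plt.annotate('Longest', xy = (x, y), xytext = (x + 1, y), arrowprops = dict(arrowstyle = '->', color = 'r'), fontsize = 16, color = 'r')
plt.show()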

#2)
data.plot(kind = 'barh', stacked = True, title = 'Global trends of high-speed rail')
plt.show()

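#3)
# explode has one entry per row; 0.2 pulls out the first wedge (China is assumed to be the first row)
data['Operation'].plot(kind = 'pie', title = 'Operation Mileage', autopct = '%1.1f%%', shadow = True, explode = [0.2,0,0,0,0,0])
plt.show()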
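
The explode list above assumes six countries with China in the first row. If the row order or count is not known in advance, the list can be built from the index instead. A sketch with placeholder country names (only China is taken from the exercise):

```python
# Hypothetical index order; in the exercise this would be list(data.index)
countries = ["Exampleland", "China", "Otherland"]

# 0.2 pulls the China wedge outward, 0 leaves the other wedges in place
explode = [0.2 if name == "China" else 0 for name in countries]

print(explode)  # [0, 0.2, 0]
```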

Homework

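The file bankpep.csv stores basic information about bank customers.

Use plots to carry out an exploratory analysis of this customer data.

1) A histogram and a density plot of the customer age distribution

2) A scatter plot of customer age against income

3) A scatter-plot matrix showing the relationships among (age, income, number of children), with histograms on the diagonal

4) A bar chart of average income by region, with the standard deviation shown

5) Multiple subplots: a pie chart of the sex split across accounts, a pie chart of the sex split among car owners, and a pie chart of accounts by number of children

6) A box plot of income for each sex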

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame
# Adjust the path to wherever the CSV file is stored
stData = pd.read_csv('D:/20222023第一学期/数据科学技术/bankpep.csv')

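#1)
stData['age'].plot(kind = 'hist', bins = 9, title = 'Customer Age')
plt.show()
# The density curve has a different y scale, so draw it in its own figure (kind = 'kde' needs SciPy)
stData['age'].plot(kind = 'kde', title = 'Customer Age', xlim = [0,80], style = 'k--')
plt.show()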

#2)
stData.plot(kind = 'scatter', x = 'age', y = 'income' , title = 'Customer Income', xlim = [0,80], ylim = [0,65000], label = '(age, income)')
plt.show()

#3)
data = stData[['age','income','children']]
pd.plotting.scatter_matrix(data, diagonal = 'hist', color = 'k')
plt.show()

#4)
mean = stData.groupby('region')['income'].mean()
std = stData.groupby('region')['income'].std()
fig = plt.figure(figsize = (6,6)) # set the figure size
mean.plot(kind = 'bar', title = 'Customer Income', color = 'r', yerr = std)
plt.show()
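
Step 4) rests on a per-region mean and standard deviation. Here is a sketch of that aggregation on a made-up frame, computing both statistics in one groupby pass. The region names and incomes are placeholders, not rows from bankpep.csv:

```python
import pandas as pd

# Placeholder data, NOT taken from bankpep.csv
df = pd.DataFrame({
    "region": ["A", "A", "B", "B", "B"],
    "income": [10.0, 20.0, 30.0, 30.0, 30.0],
})

# One pass over the groups; std is the sample standard deviation (ddof=1)
stats = df.groupby("region")["income"].agg(["mean", "std"])
print(stats)
```

With this table, stats['mean'].plot(kind = 'bar', yerr = stats['std']) would draw the bars with their error bars.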

#5)
sex_data = stData.groupby(['sex'])['sex'].count()
car_data = stData[stData['car'] =='YES'].groupby(['sex'])['sex'].count()
children_data = stData.groupby(['children'])['children'].count()
fig = plt.figure(figsize = (6,6))
ax1 = fig.add_subplot(2,2,1)
sex_data.plot(kind = 'pie', title = 'Customer Sex', autopct = '%1.1f%%', ax = ax1)
ax2 = fig.add_subplot(2,2,2)
car_data.plot(kind = 'pie', title = 'Customer Car Sex', autopct = '%1.1f%%', ax = ax2)
ax3 = fig.add_subplot(2,2,3)
children_data.plot(kind = 'pie', title = 'Customer Children', autopct = '%1.1f%%', ax = ax3)
plt.show()
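
The group sizes in 5) can also be obtained with value_counts(), which counts each distinct value directly. A sketch on placeholder data: the FEMALE and MALE labels are an assumption about how the sex column is coded, while YES for car follows the filter used above.

```python
import pandas as pd

# Placeholder rows, NOT taken from bankpep.csv
df = pd.DataFrame({
    "sex": ["FEMALE", "MALE", "FEMALE", "FEMALE"],
    "car": ["YES", "YES", "NO", "YES"],
})

sex_counts = df["sex"].value_counts()                          # all accounts
car_counts = df.loc[df["car"] == "YES", "sex"].value_counts()  # car owners only

print(sex_counts["FEMALE"], sex_counts["MALE"])  # 3 1
print(car_counts["FEMALE"], car_counts["MALE"])  # 2 1
```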

#6)
# DataFrame.boxplot groups the income values by sex and draws a grid by default
stData.boxplot(column = 'income', by = 'sex', figsize = (6,6))
plt.show()
