Plotting Distribution Histograms in Python

(I) Plotting a frequency (count) histogram

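A frequency histogram is drawn with the hist function in matplotlib.pyplot.
(1) hist(a, bins): a is the list of data; bins is either the number of bins or a sequence of bin edges.
(2) Calculating the number of bins:
a) Bin width: the distance between the two endpoints of each bin
b) Number of bins = range / bin width, where the range is max(a) - min(a); round up when the division is not exact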
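
The bin-count rule above can be sketched in a few lines. This is only an illustration: the helper name bin_count and the sample values are hypothetical, not part of the original post.

```python
import math

def bin_count(data, bin_width):
    """Number of equal-width bins needed to cover data (hypothetical helper)."""
    data_range = max(data) - min(data)
    # Round up so the largest value still falls inside the last bin,
    # and keep at least one bin when all values are equal.
    return max(1, math.ceil(data_range / bin_width))

sample = [78, 95, 110, 131, 154]  # hypothetical sample values
n_bins = bin_count(sample, 3)
print(n_bins)  # 26, because the range is 76 and 76 / 3 rounds up to 26
```

Rounding up matters whenever the range is not a multiple of the bin width; rounding down would leave the largest values outside every bin.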

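Problem: suppose you have scraped the runtimes of 250 movies (the list a below). Describe how the data are distributed, for example how many movies (or what fraction of them) run 100-120 minutes.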
a=[131,98,125,131,124,139,131,117,128,108,135,138,131,102,107
,114,119,128,121,142,127,130,124,101,110,116,117,110,128,128,115,99,136,126,134,95,138,117,111,78,132,124,113,150,110,117,86,95,144,105,126,130,126,130,126,116,123,106,112,138,123,86,101,99,136,123,117,119,105,137,123,128,125,104,109,134,125,127,105,120,107,129,116,108,132,103,136,118,112,135,115,146,137,116,103,144,83,123,111,110,111,100,154,136,100,118,119,133,134,106,129,126,110,111,109,141,120,117,106,149,122,122,110,118,127,121,114,125,126,114,140,103,130,141,117,106,114,121,114,133,137,92,121,112,146,97,137,105,98,117,112,81,97,139,113,134,106,144,110,137,137,111,104,117,100,111,101,110,105,129,137,112,120,113,133,112,83,94,146,133,101,131,116,111,84,137,115,122,106,144,109,123,116,111,111,133,150]

The code is as follows:

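# Plot a frequency (count) histogram
from matplotlib import pyplot as plt
from matplotlib import font_manager

# Font used for the axis labels and title; point fname at a font file on your system
my_font = font_manager.FontProperties(fname=r'./shuxing.TTF')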

a = [131,98,125,131,124,139,131,117,128,108,135,138,131,102,107,114,119,128,121,142,127,130,124,101,110,
     116,117,110,128,128,115,99,136,126,134,95,138,117,111,78,132,124,113,150,110,117,86,95,144,105,126,
     130,126,130,126,116,123,106,112,138,123,86,101,99,136,123,117,119,105,137,123,128,125,104,109,134,
     125,127,105,120,107,129,116,108,132,103,136,118,112,135,115,146,137,116,103,144,83,123,111,110,111,
     100,154,136,100,118,119,133,134,106,129,126,110,111,109,141,120,117,106,149,122,122,110,118,127,121,114,
     125,126,114,140,103,130,141,117,106,114,121,114,133,137,92,121,112,146,97,137,105,98,117,112,81,97,
     139,113,134,106,144,110,137,137,111,104,117,100,111,101,110,105,129,137,112,120,113,133,112,83,94,146,
     133,101,131,116,111,84,137,115,122,106,144,109,123,116,111,111,133,150]
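# Bin width
d = 3
# Bin edges: range() excludes its stop value, so stop at max(a) + d
# to keep the largest value inside the last bin
num_bin = range(min(a), max(a) + d, d)
#print(max(a)-min(a),max(a),min(a))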
plt.figure(figsize=(20,8),dpi=80)

plt.hist(a, num_bin)
plt.xticks(num_bin)
plt.xlabel('Runtime (minutes)',fontproperties=my_font)
plt.ylabel('Count',fontproperties=my_font)
plt.title('Frequency histogram of 250 movie runtimes',fontproperties=my_font,size=20)
plt.grid(alpha=0.4)


plt.show()
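
To check the bar heights numerically rather than by eye, the counts per bin can be tallied without plotting. The sketch below is an assumption-laden illustration: bin_counts and the sample list are hypothetical, and every bin is treated as half-open [left, left + width), whereas matplotlib also includes the right edge in its last bin.

```python
from collections import Counter

def bin_counts(data, bin_width):
    """Map each bin's left edge to the number of values in that bin (hypothetical helper)."""
    lo = min(data)
    # Integer-divide the offset from the minimum to find each value's bin,
    # then convert the bin index back to the bin's left edge.
    counts = Counter(lo + ((x - lo) // bin_width) * bin_width for x in data)
    return dict(sorted(counts.items()))

sample = [78, 80, 81, 83, 83, 86]  # hypothetical sample values
print(bin_counts(sample, 3))  # {78: 2, 81: 3, 84: 1}
```

Running the same helper on the list a with a width of 3 gives the numbers behind each bar in the figure.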

Result: Figure 1 (frequency histogram of the movie runtimes)

(II) Plotting a relative-frequency histogram

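As with the frequency histogram, this uses hist; the only difference is one extra normalization argument. Older matplotlib releases called this argument normed. It was deprecated in favor of density and has since been removed, so current versions need density=True. Note that density=True scales the bars so that their total area is 1, which means each bar's height is the relative frequency divided by the bin width.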

The code is as follows:

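# Plot a relative-frequency (density) histogram
from matplotlib import pyplot as plt
from matplotlib import font_manager

# Font used for the axis labels and title; point fname at a font file on your system
my_font = font_manager.FontProperties(fname=r'./shuxing.TTF')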

a = [131,98,125,131,124,139,131,117,128,108,135,138,131,102,107,114,119,128,121,142,127,130,124,101,110,
     116,117,110,128,128,115,99,136,126,134,95,138,117,111,78,132,124,113,150,110,117,86,95,144,105,126,
     130,126,130,126,116,123,106,112,138,123,86,101,99,136,123,117,119,105,137,123,128,125,104,109,134,
     125,127,105,120,107,129,116,108,132,103,136,118,112,135,115,146,137,116,103,144,83,123,111,110,111,
     100,154,136,100,118,119,133,134,106,129,126,110,111,109,141,120,117,106,149,122,122,110,118,127,121,114,
     125,126,114,140,103,130,141,117,106,114,121,114,133,137,92,121,112,146,97,137,105,98,117,112,81,97,
     139,113,134,106,144,110,137,137,111,104,117,100,111,101,110,105,129,137,112,120,113,133,112,83,94,146,
     133,101,131,116,111,84,137,115,122,106,144,109,123,116,111,111,133,150]
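# Bin width
d = 3
# Bin edges: range() excludes its stop value, so stop at max(a) + d
# to keep the largest value inside the last bin
num_bin = range(min(a), max(a) + d, d)
#print(max(a)-min(a),max(a),min(a))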
plt.figure(figsize=(20,8),dpi=80)

# density=True replaces the removed normed argument; the bar areas sum to 1
plt.hist(a, num_bin, density=True)
plt.xticks(num_bin)
plt.xlabel('Runtime (minutes)',fontproperties=my_font)
plt.ylabel('Frequency density',fontproperties=my_font)
plt.title('Frequency density histogram of 250 movie runtimes',fontproperties=my_font,size=20)
plt.grid(alpha=0.4)


plt.show()
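
The gap between relative frequency and density is easy to verify by hand. The following is a minimal sketch with hypothetical counts and a hypothetical helper name, assuming equal-width bins: relative frequency is count / total, while density additionally divides by the bin width.

```python
def normalize_counts(counts, bin_width):
    """Return (relative frequencies, densities) for equal-width bins (hypothetical helper)."""
    total = sum(counts)
    # Relative frequency: the share of all values that fall in each bin.
    relative = [c / total for c in counts]
    # Density: relative frequency per unit of bin width, so bar areas sum to 1.
    density = [c / (total * bin_width) for c in counts]
    return relative, density

counts = [2, 3, 5]  # hypothetical counts for three bins of width 2
relative, density = normalize_counts(counts, bin_width=2)
print(relative)  # heights that sum to 1
print(density)   # heights whose areas (height * width) sum to 1
```

If bar heights that sum to 1 are wanted instead of areas, one option is to pass weights=[1 / len(a)] * len(a) to hist rather than density=True.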

Result: Figure 2 (frequency density histogram of the movie runtimes)
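Data analysis
The histogram shows clearly that most of the 250 runtimes fall between 100 and 140 minutes, and that the most common bin is 111-114 minutes. In other words, most of these movies keep their runtime to around two hours.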
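
A claim such as "most runtimes fall between 100 and 140 minutes" can also be backed by a number. Here is a small sketch, with a hypothetical helper name and hypothetical sample values, that computes the share of values inside a closed interval:

```python
def share_in_interval(data, low, high):
    """Fraction of values x with low <= x <= high (hypothetical helper)."""
    inside = sum(1 for x in data if low <= x <= high)
    return inside / len(data)

sample = [95, 101, 111, 112, 125, 139, 150, 154]  # hypothetical sample values
print(share_in_interval(sample, 100, 140))  # 0.625, i.e. 5 of the 8 values
```

Calling share_in_interval(a, 100, 140) on the full list gives the corresponding figure for the scraped runtimes.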
