4. Data Visualization: Visualizing Earnings Based on College Majors (2010-2012)

The dataset is stored in the recent-grads.csv file. It contains information on the earnings of college majors in the US from 2010 to 2012.

It can be downloaded from here: https://github.com/fivethirtyeight/data/tree/master/college-majors

In this project, I will explore the dataset, look for patterns in the earnings of different majors, and plot them using the matplotlib library.

The code was written in Jupyter.

Read the data:

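import pandas as pd

# Load the dataset and take a first look at it
recent_grads = pd.read_csv('./data/recent-grads.csv')
print(recent_grads.columns)     # column names
recent_grads.info()             # dtypes and non-null counts (info() prints by itself)
print(recent_grads.describe())  # summary statistics of the numeric columns
print(recent_grads.head(1))     # first row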

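Handle missing values:

# Count the rows before and after dropping rows that contain missing values
raw_data_count = recent_grads.shape[0]
print(raw_data_count)
# dropna() returns a new DataFrame; recent_grads itself is left unchanged here
cleaned_data_count = recent_grads.dropna().shape[0]
print(cleaned_data_count)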

Output:
173
172

One of the 173 rows contains a missing value.

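To see which row is affected rather than only counting, a minimal sketch along the following lines may help. The helper name `rows_with_missing` and the toy frame are assumptions made for illustration; the values are invented and do not come from recent-grads.csv.

```python
import numpy as np
import pandas as pd

def rows_with_missing(df):
    """Return only the rows of df that contain at least one missing value."""
    return df[df.isnull().any(axis=1)]

# Toy frame with invented values, standing in for recent_grads
toy = pd.DataFrame({
    "Major": ["A", "B", "C"],
    "Men": [100.0, np.nan, 300.0],
    "Median": [50000, 40000, 30000],
})

missing = rows_with_missing(toy)
print(missing)             # the rows that dropna() would remove
print(toy.isnull().sum())  # number of missing values per column
```

On the real data, `rows_with_missing(recent_grads)` should list the row that `dropna()` removes.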
Draw scatter plots to examine the relationships between attributes:

import matplotlib.pyplot as plt
%matplotlib inline

# Scatter plots of median earnings against several other columns
recent_grads.plot(x='Full_time', y='Median', kind='scatter')
recent_grads.plot(x='Unemployed', y='Median', kind='scatter')
recent_grads.plot(x='Men', y='Median', kind='scatter')
recent_grads.plot(x='Women', y='Median', kind='scatter')

This produces:

[Figure 1: scatter plot output]
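Scatter plots show the relationships visually; a correlation coefficient is one way to put a number on them. The sketch below is an illustration only: the helper `correlation_with` is a hypothetical name, and the toy values are invented so that the result is easy to check by eye.

```python
import pandas as pd

def correlation_with(df, target, columns):
    """Pearson correlation of each column in `columns` with the `target` column."""
    return df[columns].corrwith(df[target])

# Toy frame with invented values, standing in for recent_grads
toy = pd.DataFrame({
    "Full_time": [10, 20, 30, 40],
    "Unemployed": [4, 3, 2, 1],
    "Median": [100, 200, 300, 400],
})

corr = correlation_with(toy, "Median", ["Full_time", "Unemployed"])
print(corr)
```

With the real data, the call would be `correlation_with(recent_grads, 'Median', ['Full_time', 'Unemployed', 'Men', 'Women'])`. Values near 0 suggest no linear relationship with median earnings.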

Next, we plot histograms to see how each attribute is distributed:

# The duplicate 'Employed' entry and the stray ['Men'].hist() line have been removed
columns = ['Median', 'Employed', 'Unemployment_rate', 'Women', 'Men']
fig = plt.figure(figsize=(6, 18))
for i, col in enumerate(columns):
    # One subplot per column, stacked vertically
    ax = fig.add_subplot(len(columns), 1, i + 1)
    recent_grads[col].hist(ax=ax, color='orange')
    ax.set_title(col)
plt.show()

[Figure 2: histogram output]
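As an alternative to the explicit loop, pandas can draw one histogram per column in a single call with `DataFrame.hist`. The sketch below assumes a toy frame with invented values and an arbitrary choice of 5 bins. The `Agg` backend line is only there so the code also runs outside a notebook and can be dropped in Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import pandas as pd

# Toy frame with invented values, standing in for recent_grads
toy = pd.DataFrame({
    "Median": [30000, 35000, 40000, 60000, 110000],
    "Unemployment_rate": [0.02, 0.05, 0.06, 0.08, 0.10],
})

# One histogram per column, stacked vertically, 5 bins each
axes = toy.hist(bins=5, layout=(2, 1), figsize=(6, 6), color="orange")
for ax in axes.ravel():
    print(ax.get_title(), len(ax.patches))  # column name and number of bars drawn
```

The `bins` argument controls how coarse the histogram is, so it is worth trying a few values before reading too much into the shape of a distribution.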

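To make the relationship between the number of employed graduates and salary easier to see, we use the scatter_matrix function to build a scatter plot matrix: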

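# scatter_matrix lives in pandas.plotting; the old pandas.tools.plotting path has been removed
from pandas.plotting import scatter_matrix
# A single color is used for all points: recent matplotlib versions reject a
# color list whose length differs from the number of rows
scatter_matrix(recent_grads[['Employed', 'Median']], figsize=(10, 10), c='red')

[Figure 3: scatter matrix of Employed and Median]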

Notes on this matrix:

[Figure 4: explanation of the scatter matrix]
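For reference, the layout of a scatter matrix can also be read off in code. With the default `diagonal='hist'`, the diagonal cells are histograms of a single column and the off-diagonal cells are pairwise scatter plots. The sketch below uses a toy frame with invented values to list what each cell contains; the `Agg` backend line is only needed outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import pandas as pd
from pandas.plotting import scatter_matrix

# Toy frame with invented values, standing in for recent_grads[['Employed', 'Median']]
toy = pd.DataFrame({
    "Employed": [10, 20, 30, 40, 50],
    "Median": [5, 4, 3, 2, 1],
})

axes = scatter_matrix(toy, figsize=(6, 6), diagonal="hist")

# axes[i, j] puts column j on the x-axis and column i on the y-axis
for i, y_name in enumerate(toy.columns):
    for j, x_name in enumerate(toy.columns):
        if i == j:
            print(f"axes[{i}, {j}]: histogram of {x_name}")
        else:
            print(f"axes[{i}, {j}]: scatter of {x_name} (x) vs {y_name} (y)")
```

The two off-diagonal cells show the same pair of columns with the axes swapped, so for two columns only one of them carries new information.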

Next, let's do something more interesting and look at the share of women in the 10 highest-paying and the 10 lowest-paying majors:

# These slices rely on recent_grads being ordered from highest to lowest pay
recent_grads.head(10).plot.bar(x='Major', y='ShareWomen')
plt.legend(loc='upper left')
plt.title('The 10 highest paying majors.')
# tail(10) selects exactly the last 10 rows; [162:] would select 11 of the 173 rows
recent_grads.tail(10).plot(x='Major', y='ShareWomen', kind='bar')
plt.title('The 10 lowest paying majors.')

[Figure 5: share of women in the 10 highest-paying majors]
[Figure 6: share of women in the 10 lowest-paying majors]
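The two bar charts can be summarized with a single number each, for example the mean share of women in each group. The following is a minimal sketch under the same assumption as the plots, namely that the rows are ordered from highest to lowest pay. The helper `share_women_summary` is a hypothetical name and the toy values are invented.

```python
import pandas as pd

def share_women_summary(df, n=10):
    """Mean ShareWomen of the first n and last n rows.

    Assumes df is sorted by pay, highest first.
    """
    return {
        "top": df.head(n)["ShareWomen"].mean(),
        "bottom": df.tail(n)["ShareWomen"].mean(),
    }

# Toy frame with invented values, standing in for recent_grads
toy = pd.DataFrame({
    "Major": ["A", "B", "C", "D"],
    "ShareWomen": [0.1, 0.3, 0.6, 0.8],
})

summary = share_women_summary(toy, n=2)
print(summary)
```

On the real data, `share_women_summary(recent_grads)` compares the first and last 10 majors directly.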

Compare the numbers of men and women in the higher-paying majors:

recent_grads[:10].plot.bar(x='Major', y=['Men', 'Women'])

[Figure 7: bar chart of Men and Women for the first 10 majors]
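Raw counts are dominated by how large each major is, so it can be easier to compare majors after converting the counts to shares. The sketch below shows one possible way to do that. The helper `gender_shares` and the output column name `ShareMen` are assumptions introduced here, and the toy values are invented.

```python
import pandas as pd

def gender_shares(df):
    """Convert the Men and Women counts of each major into shares that sum to 1."""
    total = df["Men"] + df["Women"]
    return pd.DataFrame({
        "Major": df["Major"],
        "ShareMen": df["Men"] / total,
        "ShareWomen": df["Women"] / total,
    })

# Toy frame with invented values, standing in for recent_grads[:10]
toy = pd.DataFrame({
    "Major": ["A", "B"],
    "Men": [300, 50],
    "Women": [100, 150],
})

shares = gender_shares(toy)
print(shares)
```

Calling `gender_shares(recent_grads[:10]).plot.bar(x='Major', stacked=True)` would then draw one full-height bar per major, split by gender.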
