S&P 100 Case Study: Python for Finance (5)

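This article walks through a simple analysis of S&P 100 data in Python. You will learn:

  • NumPy arrays and array arithmetic
  • Filtering data with boolean indexing
  • Drawing scatter plots and histograms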

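The S&P 100 Data

The S&P 100 index measures the stock performance of large companies. It is made up of 100 major companies drawn from multiple sectors. The figure below shows the sector breakdown of the S&P 100 in 2017.

The data analyzed in this article is shown in the table below. It has four columns: company name (Name), sector (Sector), share price (Price), and earnings per share (EPS).

We store these four columns in four separate Python lists.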

names = ['Apple Inc', 'Abbvie Inc', 'Abbott Laboratories', 'Accenture Plc', 'Allergan Plc', 'American International Group', 'Allstate Corp', 'Amgen', 'Amazon.Com Inc.', 'American Express Company', 'Boeing Company', 'Bank of America Corp', 'Biogen Inc', 'Bank of New York Mellon Corp', 'Blackrock', 'Bristol-Myers Squibb Company', 'Berkshire Hath Hld B', 'Citigroup Inc', 'Caterpillar Inc', 'Celgene Corp', 'Charter Communicatio', 'Colgate-Palmolive Company', 'Comcast Corp A', 'Capital One Financial Corp', 'Conocophillips', 'Costco Wholesale', 'Cisco Systems Inc', 'CVS Corp', 'Chevron Corp', 'Danaher Corp', 'Walt Disney Company', 'Duke Energy Corp', 'Dowdupont Inc.', 'Emerson Electric Company', 'Exelon Corp', 'Ford Motor Company', 'Facebook Inc', 'Fedex Corp', '21st Centry Fox Class B', '21st Centry Fox Class A', 'General Dynamics Corp', 'General Electric Company', 'Gilead Sciences Inc', 'General Motors Company', 'Alphabet Class C', 'Alphabet Class A', 'Goldman Sachs Group', 'Halliburton Company', 'Home Depot', 'Honeywell International Inc', 'International Business Machines', 'Intel Corp', 'Johnson & Johnson', 'JP Morgan Chase & Co', 'Kraft Heinz Co', 'Kinder Morgan', 'Coca-Cola Company', 'Eli Lilly and Company', 'Lockheed Martin Corp', "Lowe's Companies", 'Mastercard Inc', "McDonald's Corp", 'Mondelez Intl Cmn A', 'Medtronic Inc', 'Metlife Inc', '3M Company', 'Altria Group', 'Monsanto Company', 'Merck & Company', 'Morgan Stanley', 'Microsoft Corp', 'Nextera Energy', 'Nike Inc', 'Oracle Corp', 'Occidental Petroleum Corp', 'Priceline Group', 'Pepsico Inc', 'Pfizer Inc', 'Procter & Gamble Company', 'Philip Morris International Inc', 'Paypal Holdings', 'Qualcomm Inc', 'Raytheon Company', 'Starbucks Corp', 'Schlumberger N.V.', 'Southern Company', 'Simon Property Group', 'AT&T Inc', 'Target Corp', 'Time Warner Inc', 'Texas Instruments', 'Unitedhealth Group Inc', 'Union Pacific Corp', 'United Parcel Service', 'U.S. Bancorp', 'United Technologies Corp', 'Visa Inc', 'Verizon Communications Inc', 'Walgreens Boots Alliance', 'Wells Fargo & Company', 'Wal-Mart Stores', 'Exxon Mobil Corp']
prices = [170.12, 93.29, 55.28, 145.3, 171.81, 59.5, 100.5, 168.93, 1126.82, 93.92, 265.04, 26.7, 311.92, 52.73, 474.05, 60.48, 181.27, 71.87, 137.37, 102.88, 346.2, 72.16, 36.13, 88.26, 49.89, 171.22, 36.38, 70.18, 114.84, 93.45, 103.02, 88.61, 71.12, 60.14, 41.32, 12.11, 179.14, 217.75, 30.42, 31.14, 198.7, 17.91, 71.63, 44.74, 1018.48, 1034.09, 238.05, 41.57, 170.13, 148.04, 151.4, 44.88, 138.54, 98.58, 80.59, 17.04, 45.6, 82.97, 312.93, 81.43, 149.93, 167.01, 42.49, 79.52, 51.85, 232.49, 66.51, 118.19, 53.74, 49.06, 82.49, 155.7, 59.46, 48.97, 68.17, 1762.23, 115.5, 35.38, 88.33, 103.35, 76.55, 66.83, 184.22, 56.83, 61.53, 51.12, 159.25, 34.59, 57.77, 88.62, 98.59, 209.75, 115.58, 113.2, 51.88, 117.05, 110.27, 45.85, 70.25, 54.02, 96.08, 80.31]
earnings = [9.2, 5.31, 2.41, 5.91, 15.42, 2.51, 6.79, 12.58, 3.94, 5.22, 9.75, 1.75, 21.59, 3.47, 21.55, 2.96, 6.29, 5.19, 5.55, 6.4, 1.61, 2.87, 2.02, 7.58, 0.02, 5.82, 2.17, 5.71, 3.57, 3.89, 5.7, 4.45, 3.66, 2.58, 2.48, 1.68, 5.19, 11.91, 1.92, 1.92, 10.07, 1.24, 9.58, 6.19, 29.87, 29.87, 19.2, 0.73, 6.96, 6.95, 13.66, 3.18, 7.14, 6.94, 3.56, 0.65, 1.89, 4.09, 12.72, 4.34, 4.31, 6.4, 2.05, 4.69, 5.2, 8.95, 3.16, 5.53, 3.89, 3.61, 3.38, 6.67, 2.35, 2.55, 0.35, 74.45, 5.12, 2.5, 3.98, 4.49, 1.4, 3.78, 7.56, 2.07, 1.29, 2.75, 6.05, 2.93, 4.93, 6.06, 4.06, 9.6, 5.66, 5.98, 3.37, 6.62, 3.48, 3.75, 5.1, 4.14, 4.36, 3.56]
sectors = ['Information Technology', 'Health Care', 'Health Care', 'Information Technology', 'Health Care', 'Financials', 'Financials', 'Health Care', 'Consumer Discretionary', 'Financials', 'Industrials', 'Financials', 'Health Care', 'Financials', 'Financials', 'Health Care', 'Financials', 'Financials', 'Industrials', 'Health Care', 'Consumer Discretionary', 'Consumer Staples', 'Consumer Discretionary', 'Financials', 'Energy', 'Consumer Staples', 'Information Technology', 'Consumer Staples', 'Energy', 'Health Care', 'Consumer Discretionary', 'Utilities', 'Materials', 'Industrials', 'Utilities', 'Consumer Discretionary', 'Information Technology', 'Industrials', 'Consumer Discretionary', 'Consumer Discretionary', 'Industrials', 'Industrials', 'Health Care', 'Consumer Discretionary', 'Information Technology', 'Information Technology', 'Financials', 'Energy', 'Consumer Discretionary', 'Industrials', 'Information Technology', 'Information Technology', 'Health Care', 'Financials', 'Consumer Staples', 'Energy', 'Consumer Staples', 'Health Care', 'Industrials', 'Consumer Discretionary', 'Information Technology', 'Consumer Discretionary', 'Consumer Staples', 'Health Care', 'Financials', 'Industrials', 'Consumer Staples', 'Materials', 'Health Care', 'Financials', 'Information Technology', 'Utilities', 'Consumer Discretionary', 'Information Technology', 'Energy', 'Consumer Discretionary', 'Consumer Staples', 'Health Care', 'Consumer Staples', 'Consumer Staples', 'Information Technology', 'Information Technology', 'Industrials', 'Consumer Discretionary', 'Energy', 'Utilities', 'Real Estate', 'Telecommunications', 'Consumer Discretionary', 'Consumer Discretionary', 'Information Technology', 'Health Care', 'Industrials', 'Industrials', 'Financials', 'Industrials', 'Information Technology', 'Telecommunications', 'Consumer Staples', 'Financials', 'Consumer Staples', 'Energy']

Let's start by inspecting the data with slicing. For example, here are the names of the first four companies.

print(names[:4])
['Apple Inc', 'Abbvie Inc', 'Abbott Laboratories', 'Accenture Plc']

Or print all the information for the last company.

print("Name:", names[-1])
print("Price:", prices[-1])
print("EPS:", earnings[-1])
print("Sector:", sectors[-1])
Name: Exxon Mobil Corp
Price: 80.31
EPS: 3.56
Sector: Energy

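Calculating the P/E Ratio

The price-to-earnings ratio (P/E) is the share price divided by the annual earnings per share (EPS). It is one of the indicators used to judge whether a stock's price level is reasonable.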
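As a quick worked example of the formula above, the sketch below applies it to two rows of the data set. The helper name price_to_earnings is made up for illustration and does not come from the original notes.

```python
def price_to_earnings(price, eps):
    """Return the share price divided by earnings per share.

    Hypothetical helper, introduced only to illustrate the formula.
    """
    return price / eps


# Values taken from the lists above
apple_pe = price_to_earnings(170.12, 9.2)   # Apple Inc
exxon_pe = price_to_earnings(80.31, 3.56)   # Exxon Mobil Corp

print(round(apple_pe, 2))  # 18.49
print(round(exxon_pe, 2))  # 22.56
```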

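To make the P/E calculation easier, we first convert the data from Python lists to NumPy arrays.

The numpy.array() function creates a NumPy array.

# Import the scientific computing package NumPy
import numpy as np

# Convert the lists to NumPy arrays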
names = np.array(names)
prices = np.array(prices)
earnings = np.array(earnings)
sectors = np.array(sectors)

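The advantage of NumPy arrays is that arithmetic operators work element by element on whole arrays, which Python lists do not support. For example, to compute the P/E ratio pe, we can divide the array prices by the array earnings directly.

# Compute the P/E ratio
pe = prices / earnings

# Print the first 5 P/E values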
print(pe[:5])
[ 18.49130435  17.56873823  22.93775934  24.58544839  11.14202335]
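To make the contrast with lists concrete, here is a minimal sketch using the first three prices and EPS values. The variable names prices_list, earnings_list, pe_list and pe_array are introduced only for this illustration.

```python
import numpy as np

prices_list = [170.12, 93.29, 55.28]
earnings_list = [9.2, 5.31, 2.41]

# Dividing one plain list by another is not defined and raises a TypeError
try:
    prices_list / earnings_list
except TypeError as err:
    print("Lists:", err)

# With lists, the division has to be spelled out element by element
pe_list = [p / e for p, e in zip(prices_list, earnings_list)]

# With arrays, a single expression divides the matching elements
pe_array = np.array(prices_list) / np.array(earnings_list)

print(np.allclose(pe_list, pe_array))  # True
```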

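Next we analyze specific sectors. For the IT sector, we first need to pick out the companies that belong to it.

This calls for boolean indexing. For example, to find the numbers greater than 3 in the array numbers, first use numbers > 3 to get a boolean array containing only True and False.
numbers = np.array([1,2,3,4,5])
boolean_array = (numbers > 3)
print(boolean_array)
Output: [False False False  True  True]
Then use this boolean array to select the elements where it is True, which gives the numbers greater than 3.
large_number = numbers[boolean_array]
print(large_number)
Output: [4 5]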
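A boolean array built from one array can also index any other array of the same length, which is what the sector filtering relies on. The sketch below shows this with a made-up labels array that is not part of the original data.

```python
import numpy as np

numbers = np.array([1, 2, 3, 4, 5])
labels = np.array(['a', 'b', 'c', 'd', 'e'])  # hypothetical labels, one per number

mask = numbers > 3

# True counts as 1, so the sum is the number of matches
print(mask.sum())    # 2

# The mask built from numbers selects the aligned entries of labels
print(labels[mask])  # ['d' 'e']
```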

# Create a boolean array for the IT sector
boolean_array = (sectors == 'Information Technology')

# Select the IT subset of the data
it_names = names[boolean_array]
it_pe = pe[boolean_array]

# Print the IT company names and P/E ratios
print(it_names)
print(it_pe)
['Apple Inc' 'Accenture Plc' 'Cisco Systems Inc' 'Facebook Inc'
 'Alphabet Class C' 'Alphabet Class A' 'International Business Machines'
 'Intel Corp' 'Mastercard Inc' 'Microsoft Corp' 'Oracle Corp'
 'Paypal Holdings' 'Qualcomm Inc' 'Texas Instruments' 'Visa Inc']
[ 18.49130435  24.58544839  16.76497696  34.51637765  34.09708738
  34.6196853   11.08345534  14.11320755  34.78654292  24.40532544
  19.20392157  54.67857143  17.67989418  24.28325123  31.68678161]

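In the same way, select the companies and P/E ratios of the Consumer Staples sector.

# Create a boolean array for the Consumer Staples (CS) sector
boolean_array = (sectors == 'Consumer Staples')

# Select the CS subset of the data
cs_names = names[boolean_array]
cs_pe = pe[boolean_array]

# Print the CS company names and P/E ratios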
print(cs_names)
print(cs_pe)
['Colgate-Palmolive Company' 'Costco Wholesale' 'CVS Corp' 'Kraft Heinz Co'
 'Coca-Cola Company' 'Mondelez Intl Cmn A' 'Altria Group' 'Pepsico Inc'
 'Procter & Gamble Company' 'Philip Morris International Inc'
 'Walgreens Boots Alliance' 'Wal-Mart Stores']
[ 25.14285714  29.41924399  12.29071804  22.63764045  24.12698413
  20.72682927  21.04746835  22.55859375  22.19346734  23.01781737
  13.7745098   22.03669725]
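The same filtering step can be repeated for every sector instead of writing it out by hand. The following is a rough sketch that assumes a small six-company sample taken from the start of the lists above. The loop over np.unique() and the dictionary mean_pe_by_sector are illustrative choices, not part of the original notes.

```python
import numpy as np

# First six rows of the data set, used here as a small sample
sample_sectors = np.array(['Information Technology', 'Health Care', 'Health Care',
                           'Information Technology', 'Health Care', 'Financials'])
sample_prices = np.array([170.12, 93.29, 55.28, 145.3, 171.81, 59.5])
sample_earnings = np.array([9.2, 5.31, 2.41, 5.91, 15.42, 2.51])

sample_pe = sample_prices / sample_earnings

# Build one boolean mask per distinct sector and average the selected P/E values
mean_pe_by_sector = {}
for sector in np.unique(sample_sectors):
    mask = (sample_sectors == sector)
    mean_pe_by_sector[str(sector)] = float(sample_pe[mask].mean())

for sector, mean_pe in mean_pe_by_sector.items():
    print(sector, round(mean_pe, 2))
```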

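With the IT and Consumer Staples data selected, we can compute the mean and standard deviation of the P/E ratio for each of the two sectors.

The numpy.mean(array) function computes the mean of array.
The numpy.std(array) function computes the standard deviation of array.

# Compute the mean and standard deviation of the IT sector's P/E ratio
it_pe_mean = np.mean(it_pe)
it_pe_std = np.std(it_pe)

print("Mean P/E ratio of the IT sector:", it_pe_mean)
print("Standard deviation of the IT sector's P/E ratio:", it_pe_std)
Mean P/E ratio of the IT sector: 26.3330554204
Standard deviation of the IT sector's P/E ratio: 10.8661467927

# Compute the mean and standard deviation of the Consumer Staples sector's P/E ratio
cs_pe_mean = np.mean(cs_pe)
cs_pe_std = np.std(cs_pe)

print("Mean P/E ratio of the Consumer Staples sector:", cs_pe_mean)
print("Standard deviation of the Consumer Staples sector's P/E ratio:", cs_pe_std)
Mean P/E ratio of the Consumer Staples sector: 21.5810689064
Standard deviation of the Consumer Staples sector's P/E ratio: 4.41202165427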
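One detail worth knowing: by default numpy.std() divides by N (the population standard deviation, ddof=0), which is what the figures above report. Passing ddof=1 divides by N - 1 instead and gives the sample standard deviation. The sketch below compares the two on the Consumer Staples P/E values copied from the printed output, so they are rounded to eight decimals.

```python
import numpy as np

# Consumer Staples P/E ratios, copied from the printed output above
cs_pe = np.array([25.14285714, 29.41924399, 12.29071804, 22.63764045,
                  24.12698413, 20.72682927, 21.04746835, 22.55859375,
                  22.19346734, 23.01781737, 13.7745098, 22.03669725])

population_std = np.std(cs_pe)       # default ddof=0: divide by N
sample_std = np.std(cs_pe, ddof=1)   # ddof=1: divide by N - 1

print("Population std:", population_std)
print("Sample std:", sample_std)
```

With only 12 companies the sample figure comes out somewhat larger than the population one. Which of the two is appropriate depends on whether the 12 companies are treated as the whole sector or as a sample of it.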

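Plotting

First, we use a scatter plot to look at the P/E ratio of each company in the two sectors. Here we use matplotlib, a plotting package commonly used in Python.

The matplotlib.pyplot.scatter() function draws a scatter plot.

# Import the matplotlib.pyplot module
import matplotlib.pyplot as plt

# Assign an id to each company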
it_id = np.arange(len(it_pe))
cs_id = np.arange(len(cs_pe))

# Draw the scatter plot of P/E ratios
plt.scatter(it_id, it_pe, color='red', label='IT')
plt.scatter(cs_id, cs_pe, color='green', label='CS')

# Add a legend
plt.legend()

# Add axis labels
plt.xlabel('Company ID')
plt.ylabel('P/E Ratio')

# Show the plot
plt.show()

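Notice that one IT company in the upper right of the plot above has a particularly high P/E ratio. When a stock's P/E ratio is higher than that of comparable stocks, it often means the market expects higher growth from it. So let's look more closely at the distribution of P/E ratios in the IT sector. A histogram is a convenient way to see how data is distributed.

The matplotlib.pyplot.hist() function draws a histogram.

# Draw a histogram of the IT sector's P/E ratios, splitting the values into 8 bins
plt.hist(it_pe, bins=8)

# Add axis labels
plt.xlabel('P/E ratio')
plt.ylabel('Frequency')

# Show the plot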
plt.show()
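To read the bin counts as numbers rather than from the picture, numpy.histogram() performs the same binning without drawing anything. This is a minimal sketch using the IT P/E values copied from the printed output above; the names counts and edges are chosen for this example.

```python
import numpy as np

# IT sector P/E ratios, copied from the printed output above
it_pe = np.array([18.49130435, 24.58544839, 16.76497696, 34.51637765,
                  34.09708738, 34.6196853, 11.08345534, 14.11320755,
                  34.78654292, 24.40532544, 19.20392157, 54.67857143,
                  17.67989418, 24.28325123, 31.68678161])

# counts has one entry per bin; edges has one more entry than counts
counts, edges = np.histogram(it_pe, bins=8)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {count}")
```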

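The histogram makes it easier to see an outlier on the right with a very high P/E ratio. We can use boolean indexing to find this company.

# Find the P/E values greater than 50
outlier_price = it_pe[it_pe > 50]

# Find the companies whose P/E ratio is greater than 50
outlier_name = it_names[it_pe > 50]

# Print the result; round() rounds to the given number of decimal places
print(str(outlier_name[0]) + " has a P/E ratio of " + str(round(outlier_price[0], 2)))
Paypal Holdings has a P/E ratio of 54.68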
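The cutoff of 50 above was read off the histogram by eye. One possible alternative, sketched here as an assumption and not taken from the course, is to flag values more than two standard deviations above the mean. The names threshold, flagged_names and flagged_pe are made up for this example.

```python
import numpy as np

# IT sector names and P/E ratios, copied from the printed output above
it_names = np.array(['Apple Inc', 'Accenture Plc', 'Cisco Systems Inc', 'Facebook Inc',
                     'Alphabet Class C', 'Alphabet Class A',
                     'International Business Machines', 'Intel Corp', 'Mastercard Inc',
                     'Microsoft Corp', 'Oracle Corp', 'Paypal Holdings', 'Qualcomm Inc',
                     'Texas Instruments', 'Visa Inc'])
it_pe = np.array([18.49130435, 24.58544839, 16.76497696, 34.51637765,
                  34.09708738, 34.6196853, 11.08345534, 14.11320755,
                  34.78654292, 24.40532544, 19.20392157, 54.67857143,
                  17.67989418, 24.28325123, 31.68678161])

# Assumed rule of thumb: mean plus two standard deviations
threshold = np.mean(it_pe) + 2 * np.std(it_pe)

mask = it_pe > threshold
flagged_names = it_names[mask]
flagged_pe = it_pe[mask]

print("Threshold:", round(float(threshold), 2))
for name, value in zip(flagged_names, flagged_pe):
    print(name, round(float(value), 2))
```

On this data the rule flags the same single company as the fixed cutoff of 50. With only 15 values, the outlier itself inflates the standard deviation, so the rule is best treated as a rough screen.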

Note: this article is a set of study notes for the DataCamp course Intro to Python for Finance.
