chardman

Python Data Visualization: Learning Matplotlib

Matplotlib is the most fundamental visualization tool in Python. Compare how a person and Matplotlib each produce a drawing. A person needs three steps:

Find a drawing board
Prepare a palette
Paint

Matplotlib mimics a similar process, also in three steps:

FigureCanvas
Renderer
Artist

These are the three API layers of Matplotlib:

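FigureCanvas determines where the drawing happens
Renderer knows how to render what you want to draw onto the canvas, and from there onto the screen
Artist uses the Renderer to draw on the Canvas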

Most users only need to work with Artists to draw whatever they like.

import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline

Note: %matplotlib inline is a Jupyter notebook magic command that displays plots inline in the notebook.

# Define your favorite colors as hex codes up front so they can be reused later.
r_hex = '#dc2624'     # red,       RGB = 220,38,36
dt_hex = '#2b4750'    # dark teal, RGB = 43,71,80
tl_hex = '#45a0a2'    # teal,      RGB = 69,160,162
r1_hex = '#e87a59'    # red,       RGB = 232,122,89
tl1_hex = '#7dcaa9'   # teal,      RGB = 125,202,169
g_hex = '#649E7D'     # green,     RGB = 100,158,125
o_hex = '#dc8018'     # orange,    RGB = 220,128,24
tn_hex = '#C89F91'    # tan,       RGB = 200,159,145
g50_hex = '#6c6d6c'   # grey-50,   RGB = 108,109,108
bg_hex = '#4f6268'    # blue grey, RGB = 79,98,104
g25_hex = '#c7cccf'   # grey-25,   RGB = 199,204,207
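
You can check the RGB comments above with a small sketch. The helper `hex_to_rgb255` is our own illustrative name and is not a Matplotlib function. It calls `matplotlib.colors.to_rgb`, which returns floats in [0, 1], and scales them back to the 0..255 range:

```python
import matplotlib.colors as mcolors

def hex_to_rgb255(hex_code):
    """Hypothetical helper: convert '#rrggbb' into an (R, G, B) tuple of ints in 0..255."""
    r, g, b = mcolors.to_rgb(hex_code)            # floats in [0, 1]
    return tuple(round(c * 255) for c in (r, g, b))

print(hex_to_rgb255('#dc2624'))  # red
print(hex_to_rgb255('#649E7D'))  # green (upper-case hex is accepted too)
```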

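1 Matplotlib Basics

1.1 Overview

Matplotlib has two kinds of elements:

Primitives: line, marker, text, legend, grid, title, image, etc.
Containers: figure, axes, axis and tick

Primitives are the standard objects we want to draw. Containers are where the primitives live, and containers themselves form a hierarchy:

    Figure → Axes → Axis → Tick

From this hierarchy:

A figure contains one or more Axes
An Axes is made up of Axis objects (the horizontal XAxis and the vertical YAxis)
An Axis carries ticks (major ticks and minor ticks)

In Python "everything is an object", and Axes, Axis and Tick are all objects too.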

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
plt.show()

xax = ax.xaxis
yax = ax.yaxis

print( 'fig.axes:', fig.axes, '\n')
print( 'ax.xaxis:', xax )
print( 'ax.yaxis:', yax, '\n' )
print( 'ax.xaxis.majorTicks:', xax.majorTicks, '\n' )
print( 'ax.yaxis.majorTicks:', yax.majorTicks, '\n')
print( 'ax.xaxis.minorTicks:', xax.minorTicks )
print( 'ax.yaxis.minorTicks:', yax.minorTicks )

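fig.axes: [<AxesSubplot:>] 

ax.xaxis: XAxis(54.0,36.0)
ax.yaxis: YAxis(54.0,36.0) 

ax.xaxis.majorTicks: [<matplotlib.axis.XTick object at 0x...>, ...] (6 XTick objects)

ax.yaxis.majorTicks: [<matplotlib.axis.YTick object at 0x...>, ...] (6 YTick objects)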

ax.xaxis.minorTicks: []
ax.yaxis.minorTicks: []

The Axes and both Axis objects refer to the same Figure. This indirectly confirms the Figure → Axes → Axis hierarchy.

print( 'axes.figure:', ax.figure )
print( 'xaxis.figure:', xax.figure )
print( 'yaxis.figure:', yax.figure )

axes.figure: Figure(432x288)
xaxis.figure: Figure(432x288)
yaxis.figure: Figure(432x288)

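Once these four containers exist, we can add all kinds of primitives to them, for example:

add labels to the axes and ticks
add lines, markers, grids, legends and text to an Axes
add a legend to the figure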

1.2 Figure

# The figure sits at the top of the hierarchy; here we add the primitive "text" to it.
plt.figure()
plt.text( 0.5, 0.5, 'Figure', ha='center', 
          va='center', size=20, alpha=.5 )
plt.xticks([]), plt.yticks([])
plt.show()

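The parameters of plt.text() used here are:

the first and second arguments are the x and y coordinates of the text
the third argument is the string to display
ha and va set the horizontal and vertical alignment, i.e. whether the text is left-aligned, right-aligned or centered relative to its (x, y) position
size sets the font size
alpha sets the text transparency (0.5 is half transparent)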

# We can also add the primitive "line" to the figure.
plt.figure()
plt.plot( [0,1],[0,1] )
plt.show()

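Whenever we "draw" something, it seems to happen in the Figure, but it actually happens in an Axes. A figure can hold several Axes, so drawing in an Axes is more convenient, and some settings are also more flexible there.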
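
The minimal sketch below shows this. A fresh figure has no Axes. The first pyplot drawing call creates one on demand and puts the line inside it. The `matplotlib.use("Agg")` line only lets the snippet run outside Jupyter, so leave it out in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only needed outside a notebook
import matplotlib.pyplot as plt

fig = plt.figure()
n_before = len(fig.axes)             # a brand-new figure has no Axes
lines = plt.plot([0, 1], [0, 1])     # pyplot creates an Axes on demand
n_after = len(fig.axes)
print(n_before, n_after)
print(lines[0].axes is fig.axes[0])  # the line lives in that Axes
```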

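1.3 Axes and Subplots

A figure can contain several Axes. Does that mean it contains several subplots, i.e. are Axes and subplots the same concept?
In most cases yes, with one subtle difference:

Subplots always sit on a regular grid within the parent figure
Axes can sit on an irregular grid within the parent figure

Subplots

Think of the figure as a matrix and of the subplots as its elements. A subplot is then specified the way you would describe a matrix: (number of rows, number of columns, which subplot).

subplot(nrows, ncols, index), where index starts at 1 and counts along the rows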

plt.subplot(2, 1, 1)
plt.xticks([])
plt.yticks([])
plt.text(0.5, 0.5, 'subplot(2, 1, 1)', ha='center', va='center', size=20, alpha=0.5)

plt.subplot(2, 1, 2)
plt.xticks([])
plt.yticks([])
plt.text(0.5, 0.5, 'subplot(2, 1, 2)', ha='center', va='center', size=20, alpha=0.5)

Text(0.5, 0.5, 'subplot(2, 1, 2)')

These two subplots are arranged like a column vector:

subplot(2,1,1) is the first one
subplot(2,1,2) is the second one

plt.subplot(1, 2, 1)
plt.xticks([])
plt.yticks([])
plt.text(0.5, 0.5, 'subplot(1, 2, 1)', ha='center', va='center', size=20, alpha=0.5)

plt.subplot(1, 2, 2)
plt.xticks([])
plt.yticks([])
plt.text(0.5, 0.5, 'subplot(1, 2, 2)', ha='center', va='center', size=20, alpha=0.5)

Text(0.5, 0.5, 'subplot(1, 2, 2)')

These two subplots are arranged like a row vector:

subplot(1,2,1) is the first one
subplot(1,2,2) is the second one

fig, axes = plt.subplots(nrows=2, ncols=2)

for i, ax in enumerate(axes.flat):
    ax.set(xticks=[], yticks=[])
    s = f'subplot(2, 2, {i + 1})'  # subplot indices start at 1, enumerate starts at 0
    ax.text(0.5, 0.5, s, ha='center', va='center', size=20, alpha=0.5)
plt.show()

This time we create the subplots through Axes (a subplot is just a special case of an Axes). The key is line 1:

fig, axes = plt.subplots(nrows=2, ncols=2)

This returns `axes` as a 2×2 array of Axes objects. The for loop on line 3 flattens it with axes.flat and then writes the text into each ax.
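
You can inspect the same 2×2 layout with the object-oriented `Figure.subplots` method. It is used here instead of `plt.subplots` so that no display backend is needed. This minimal sketch shows that `axes` is a NumPy array and that `axes.flat` walks it row by row:

```python
from matplotlib.figure import Figure

fig = Figure()
axes = fig.subplots(nrows=2, ncols=2)    # same layout as plt.subplots(nrows=2, ncols=2)
print(type(axes).__name__, axes.shape)   # a NumPy array of Axes

flat = list(axes.flat)                   # row-major: top-left, top-right, bottom-left, bottom-right
for i, ax in enumerate(flat, start=1):   # start=1 to match subplot numbering
    ax.set_title(f'subplot(2, 2, {i})')

print(flat[1] is axes[0, 1])             # second item is the top-right Axes
```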

Axes

Axes are more general than subplots and can be created in two ways:

with the gridspec module plus subplot()
with plt.axes()

Irregular grid

import matplotlib.gridspec as gridspec
G = gridspec.GridSpec(3, 3) # split the whole figure into a 3×3 grid and assign it to G

ax1 = plt.subplot(G[0, :])
plt.xticks([]), plt.yticks([])
plt.text(0.5, 0.5, 'Axes 1', ha='center', va='center', size=20, alpha=0.5)

ax2 = plt.subplot(G[1,:-1])
plt.xticks([]), plt.yticks([])
plt.text(0.5, 0.5, 'Axes 2', ha='center', va='center', size=20, alpha=0.5)

ax3 = plt.subplot(G[1:,-1])
plt.xticks([]), plt.yticks([])
plt.text(0.5, 0.5, 'Axes 3', ha='center', va='center', size=20, alpha=0.5)

ax4 = plt.subplot(G[-1, 0])
plt.xticks([]), plt.yticks([])
plt.text(0.5, 0.5, 'Axes 4', ha='center', va='center', size=20, alpha=0.5)

ax5 = plt.subplot(G[-1,1])
plt.xticks([]), plt.yticks([])
plt.text(0.5, 0.5, 'Axes 5', ha='center', va='center', size=20, alpha=0.5)

plt.show()

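plt.subplot(G[...]) creates the five Axes. Slicing G[...] works just like slicing a NumPy array:

G[0, :] = first row of the figure (Axes 1)
G[1, :-1] = second row, first and second columns (Axes 2)
G[1:, -1] = second and third rows, third column (Axes 3)
G[-1, 0] = third row, first column (Axes 4)
G[-1, 1] = third row, second column (Axes 5)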
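
You can confirm the slices above without drawing anything. Each `G[...]` item is a `SubplotSpec`, and its `rowspan` and `colspan` properties report the 0-based grid rows and columns it covers. This sketch assumes Matplotlib 3.2 or later, where those properties exist:

```python
import matplotlib.gridspec as gridspec

G = gridspec.GridSpec(3, 3)
specs = [('Axes 1', G[0, :]), ('Axes 2', G[1, :-1]), ('Axes 3', G[1:, -1]),
         ('Axes 4', G[-1, 0]), ('Axes 5', G[-1, 1])]

for name, spec in specs:
    # rowspan / colspan are range objects of 0-based grid indices
    print(name, 'rows', list(spec.rowspan), 'cols', list(spec.colspan))
```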

Axes inside an Axes

plt.axes([0.1, 0.1, 0.8, 0.8])
plt.xticks([]), plt.yticks([])
plt.text(0.6, 0.6, 'axes([0.1,0.1,0.8,0.8])', ha='center', va='center', size=20, alpha=0.5)

plt.axes([0.2, 0.2, 0.3, 0.3])
plt.xticks([]), plt.yticks([])
plt.text(0.5, 0.5, 'axes([0.2,0.2,0.3,0.3])', ha='center', va='center', size=10, alpha=0.5)

plt.show()

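In plt.axes([l, b, w, h]), the list [l, b, w, h] defines the Axes rectangle:

l is the horizontal distance from the left edge of the Figure to the left edge of the Axes
b is the vertical distance from the bottom edge of the Figure to the bottom edge of the Axes
w is the width of the Axes
h is the height of the Axes

All four values are normalized figure coordinates. They are fractions of the figure's width (l, w) or height (b, h), so the whole figure spans 0 to 1. For example, suppose the Figure is 10 units tall and the bottom edge of the Axes sits 2 units above the bottom edge of the Figure. Then b = 2/10 = 0.2.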
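
Here is a sketch of these normalized coordinates. It assumes a 10 × 5 inch figure at 100 dpi, i.e. 1000 × 500 pixels. The Axes position stays in figure fractions, and `fig.transFigure` converts fractions into display pixels:

```python
import numpy as np
from matplotlib.figure import Figure

fig = Figure(figsize=(10, 5), dpi=100)           # 1000 x 500 pixels
ax = fig.add_axes([0.2, 0.2, 0.3, 0.3])          # [left, bottom, width, height] as fractions

bounds = ax.get_position().bounds                # still expressed in figure fractions
lower_left = fig.transFigure.transform([0.2, 0.2])  # fractions -> pixels
upper_right = fig.transFigure.transform([0.5, 0.5])  # 0.2 + 0.3 on each axis
print(bounds)
print(lower_left, upper_right)
```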

Two ways to create Axes

# 1. Create the figure and the Axes at the same time
fig, ax = plt.subplots()
plt.xticks([]), plt.yticks([])
s = 'Style 1\n\nfig,ax=plt.subplots()\nax.plot()'
ax.text(0.5, 0.5, s, ha='center', va='center',size=20,alpha=0.5)
plt.show()

# 2. Create the figure first, then add the Axes
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.set(xticks=[],yticks=[])
s = 'Style 2\n\nfig=plt.figure()\nax=fig.add_subplot()\nax.plot()'
ax.text(0.5,0.5,s,ha='center',va='center',size=20,alpha=0.5)
plt.show()

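1.4 Axis

An Axes, usually two-dimensional, has two Axis objects:

Horizontal axis: XAxis
Vertical axis: YAxis

Each Axis contains two kinds of elements:

the container "tick", which in turn holds the tick line itself and the tick label
the primitive "label", i.e. the axis label

Both ticks and labels are objects.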


fig, ax = plt.subplots()
ax.set_xlabel('Label on x-axis')
ax.set_ylabel('Label on y-axis')

for label in ax.xaxis.get_ticklabels():
    # each tick label is a Text instance
    label.set_color(dt_hex)
    label.set_rotation(45)
    label.set_fontsize(20)

for line in ax.yaxis.get_ticklines():
    # each tick mark is a Line2D instance
    line.set_markersize(20)
    line.set_markeredgewidth(3)

plt.show()

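Lines 2 and 3 set the x-axis and y-axis labels.

The first for loop handles the tick labels inside the tick objects. It sets their color to dark teal, their font size to 20 and their rotation to 45 degrees.

The second for loop handles the tick marks themselves on the y-axis, i.e. the short line segments. It sets their marker size (the tick length) to 20 and their marker edge width to 3.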
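
The claims in the comments can be checked directly. This minimal sketch builds a bare `Figure` without pyplot. It verifies that tick labels are `Text` artists and that tick marks are `Line2D` artists, whose marker size acts as the tick length:

```python
from matplotlib.figure import Figure
from matplotlib.lines import Line2D
from matplotlib.text import Text

fig = Figure()
ax = fig.add_subplot(1, 1, 1)

labels = ax.xaxis.get_ticklabels()   # tick labels are Text artists
ticklines = ax.yaxis.get_ticklines() # tick marks are Line2D artists drawn as markers
print(all(isinstance(t, Text) for t in labels))
print(all(isinstance(ln, Line2D) for ln in ticklines))

for ln in ticklines:
    ln.set_markersize(20)            # tick length, in points
print(ticklines[0].get_markersize())
```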

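1.5 Ticks

A Tick essentially consists of

a short line (the tick itself)
a string (the tick label)

First, define a setup(ax) function that does the following:

remove the left spine (the y-axis line), the right spine and the top spine
remove the ticks on the y-axis
place the x-axis ticks at the bottom
set the length and width of the major and minor ticks
set the x-axis and y-axis limits
make the Axes patch (its background rectangle) fully transparent

Together, these steps make setup(ax) strip everything from the Axes except the x-axis. The x-axis is kept so we can demonstrate different kinds of ticks on it.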

import matplotlib.ticker as ticker

def setup(ax):
    ax.spines['right'].set_color('none')  # remove the right spine
    ax.spines['left'].set_color('none')   # remove the left spine (the y-axis line)
    ax.spines['top'].set_color('none')    # remove the top spine
    ax.yaxis.set_major_locator(ticker.NullLocator())  # remove the ticks on the y-axis
    ax.xaxis.set_ticks_position('bottom')  # put the x-axis ticks at the bottom

    ax.tick_params(which='major', width=2.00)  # width and length of major and minor ticks
    ax.tick_params(which='major', length=10)
    ax.tick_params(which='minor', width=0.75)
    ax.tick_params(which='minor', length=2.5)

    ax.set_xlim(0, 5)  # x-axis and y-axis limits
    ax.set_ylim(0, 1)

    ax.patch.set_alpha(0.0)  # make the Axes background patch fully transparent

fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(8, 4))

axes[0, 0].set_title('Original')

axes[0, 1].spines['right'].set_color('none')
axes[0, 1].spines['left'].set_color('none')
axes[0, 1].spines['top'].set_color('none')
axes[0, 1].set_title('Handle Spines')

axes[0, 2].yaxis.set_major_locator(ticker.NullLocator())
axes[0, 2].xaxis.set_ticks_position('bottom')
axes[0, 2].set_title('Handle Tick Labels')

axes[1, 0].tick_params(which='major', width=2.00)
axes[1, 0].tick_params(which='major', length=10)
axes[1, 0].tick_params(which='minor', width=0.75)
axes[1, 0].tick_params(which='minor', length=2.5)
axes[1, 0].set_title('Handle Tick Width/Length')

axes[1, 1].set_xlim(0, 5)
axes[1, 1].set_ylim(0, 1)
axes[1, 1].set_title('Handle Axis Limit')

axes[1, 2].patch.set_color('black')
axes[1, 2].patch.set_alpha(0.3)
axes[1, 2].set_title('Handle Patch Color')

plt.tight_layout()
plt.show()

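Tick locators

Different locators produce different tick positions. We will look at the following 8:

NullLocator(): no ticks
MultipleLocator(a): a tick at every multiple of the scalar a
FixedLocator(a): tick positions given by the array a
LinearLocator(a): a evenly spaced ticks, where a is a scalar
IndexLocator(b, o): tick spacing = scalar b, offset = scalar o
AutoLocator(): positions chosen by the default settings
MaxNLocator(a): at most a intervals (so at most a + 1 ticks), where a is a scalar
LogLocator(b, n): base = scalar b, maximum number of ticks = scalar n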
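
You can inspect a locator on its own before drawing anything. Every locator has a `tick_values(vmin, vmax)` method that returns the positions it would choose. The sketch below uses the same 0 to 5 range as `setup(ax)`. `MultipleLocator` may return extra ticks just outside the range, and the exact behavior depends on the Matplotlib version, so only the in-range ticks are kept:

```python
import numpy as np
import matplotlib.ticker as ticker

vmin, vmax = 0, 5   # same x range that setup(ax) uses

print(ticker.NullLocator().tick_values(vmin, vmax))          # no ticks
print(ticker.FixedLocator([0, 1, 5]).tick_values(vmin, vmax))  # exactly the given positions
print(ticker.LinearLocator(3).tick_values(vmin, vmax))       # 3 evenly spaced ticks

locs = ticker.MultipleLocator(0.5).tick_values(vmin, vmax)
# keep only the ticks that fall inside the view range (the axis clips the rest)
print(locs[(locs >= vmin) & (locs <= vmax)])
```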

plt.figure(figsize=(8,6))
n = 8

# NullLocator: no ticks
ax = plt.subplot(n, 1, 1)
setup(ax)
ax.xaxis.set_major_locator(ticker.NullLocator())
ax.xaxis.set_minor_locator(ticker.NullLocator())
ax.text(0.0, 0.1, 'NullLocator()', fontsize=14, transform=ax.transAxes)

# MultipleLocator: a major tick every 0.5 (and a minor tick every 0.1)
ax = plt.subplot(n, 1, 2)
setup(ax)
ax.xaxis.set_major_locator(ticker.MultipleLocator(0.5))
ax.xaxis.set_minor_locator(ticker.MultipleLocator(0.1))
ax.text(0.0, 0.1, 'MultipleLocator(0.5)', fontsize=14, transform=ax.transAxes)

# FixedLocator: ticks at the positions given by a list or array
ax = plt.subplot(n, 1, 3)
setup(ax)
majors = [0, 1, 5]
ax.xaxis.set_major_locator(ticker.FixedLocator(majors))
import numpy as np
minors = np.linspace(0,1,11)[1:-1]
ax.xaxis.set_minor_locator(ticker.FixedLocator(minors))
ax.text(0.0, 0.1, 'FixedLocator([0,1,5])', fontsize=14, transform=ax.transAxes)

# LinearLocator: a given number of evenly spaced ticks
ax = plt.subplot(n, 1, 4)
setup(ax)
ax.xaxis.set_major_locator(ticker.LinearLocator(3))
ax.xaxis.set_minor_locator(ticker.LinearLocator(31))
ax.text(0.0, 0.1, 'LinearLocator(numticks=3)', fontsize=14, transform=ax.transAxes)

# IndexLocator: a tick every `base`, starting at `offset`
ax = plt.subplot(n, 1, 5)
setup(ax)
ax.plot(range(0, 5), [0]*5, color='white')
ax.xaxis.set_major_locator(ticker.IndexLocator(base=0.5, offset=0.25))
ax.text(0.0, 0.1, 'IndexLocator(base=0.5, offset=0.25)', fontsize=14, transform=ax.transAxes)

# AutoLocator: automatic ticks
ax = plt.subplot(n,1,6)
setup(ax)
ax.xaxis.set_major_locator(ticker.AutoLocator())
ax.xaxis.set_minor_locator(ticker.AutoLocator())
ax.text(0.0, 0.1, 'AutoLocator()', fontsize=14, transform=ax.transAxes)

# MaxNLocator: at most n intervals
ax = plt.subplot(n, 1, 7)
setup(ax)
ax.xaxis.set_major_locator(ticker.MaxNLocator(4))
ax.xaxis.set_minor_locator(ticker.MaxNLocator(40))
ax.text(0.0, 0.1, 'MaxNLocator(n=4)', fontsize=14, transform=ax.transAxes)

# Log Locator
ax = plt.subplot(n, 1, 8)
setup(ax)
ax.set_xlim(10**3, 10**10)
ax.set_xscale('log')
ax.xaxis.set_major_locator(ticker.LogLocator(base=10.0, numticks=15))
ax.text(0.0, 0.1, 'LogLocator(base=10, numticks=15)', fontsize=14, transform=ax.transAxes)

# Only the bottom x-axis matters here, so stretch the subplots to fill the figure.
plt.subplots_adjust(left=0.05, right=0.95, bottom=0.05, top=1.05)
plt.show()

1.6 Primitives

We have now introduced the four most important containers and their hierarchy:

Figure → Axes → Axis → Ticks

To draw a figure with actual content, we still need to add primitives to the containers. These include line, marker, text, legend, grid, title and image. Specifically:

to draw a line, use plt.plot() or ax.plot()
to draw markers, use plt.scatter() or ax.scatter()
to add text, use plt.text() or ax.text()
to add a legend, use plt.legend() or ax.legend()
to add an image, use plt.imshow() or ax.imshow()
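
The minimal sketch below puts these primitives on one Axes. It then checks which container list each one ends up in. The data is made up for illustration:

```python
import numpy as np
from matplotlib.figure import Figure

fig = Figure()
ax = fig.add_subplot(1, 1, 1)
x = np.linspace(0, 1, 5)

ax.plot(x, x, label='line')                # Line2D         -> ax.lines
ax.scatter(x, x ** 2, label='markers')     # PathCollection -> ax.collections
ax.text(0.1, 0.8, 'some text')             # Text           -> ax.texts
ax.legend()                                # Legend         -> ax.get_legend()
ax.imshow(np.eye(3), extent=(0, 1, 0, 1))  # AxesImage      -> ax.images

print(len(ax.lines), len(ax.collections), len(ax.texts), len(ax.images))
print(ax.get_legend() is not None)
```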

2 Plotting

2.1 The first plot

Plot the S&P 500 index from 2007 to 2010.

# First read S&P500.csv with pd.read_csv
import pandas as pd
data = pd.read_csv('S&P500.csv', parse_dates=True, 
                   index_col='Date', dayfirst=True)
pd.concat([data.head(3), data.tail(3)])  # DataFrame.append was removed in pandas 2.0

	Open	High	Low	Close	Adj Close	Volume
Date
1950-01-03	16.660000	16.660000	16.660000	16.660000	16.660000	1260000
1950-01-04	16.850000	16.850000	16.850000	16.850000	16.850000	1890000
1950-01-05	16.930000	16.930000	16.930000	16.930000	16.930000	2550000
2019-04-22	2898.780029	2909.510010	2896.350098	2907.969971	2907.969971	2997950000
2019-04-23	2909.989990	2936.310059	2908.530029	2933.679932	2933.679932	3635030000
2019-04-24	2934.000000	2936.830078	2926.050049	2927.250000	2927.250000	3448960000

# Slice out the data from 2007 to 2010
spx = data.loc['2007-01-01':'2010-12-31', 'Adj Close'] 
# 'Adj Close' without [] returns a Series; with [] (['Adj Close']) it returns a DataFrame
pd.concat([spx.head(3), spx.tail(3)])
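
Here is a self-contained sketch of the same slicing. It uses a small made-up DataFrame instead of S&P500.csv. A single column label gives a Series, and a list of labels gives a DataFrame. `pd.concat` replaces the `.append` calls that pandas 2.0 removed:

```python
import pandas as pd

# Made-up stand-in for S&P500.csv (values are illustrative only)
idx = pd.to_datetime(['2006-12-29', '2007-01-03', '2007-01-04', '2010-12-31', '2011-01-03'])
data = pd.DataFrame({'Close': [1.0, 2.0, 3.0, 4.0, 5.0],
                     'Adj Close': [1.5, 2.5, 3.5, 4.5, 5.5]}, index=idx)

s = data.loc['2007-01-01':'2010-12-31', 'Adj Close']     # one label -> Series
df = data.loc['2007-01-01':'2010-12-31', ['Adj Close']]  # list of labels -> DataFrame
print(type(s).__name__, type(df).__name__, len(s))

# first and last rows, without the removed Series.append
ends = pd.concat([s.head(1), s.tail(1)])
print(ends)
```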

Date
2007-01-03    1416.599976
2007-01-04    1418.339966
2007-01-05    1409.709961
2010-12-29    1259.780029
2010-12-30    1257.880005
2010-12-31    1257.640015
Name: Adj Close, dtype: float64

plt.plot(spx.values)
plt.show()

**Note:** when plot() receives only y (here y = spx.values), the x values default to range(len(y)).
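
Here is a minimal check of the default x values, using made-up y data:

```python
import numpy as np
from matplotlib.figure import Figure

y = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
ax = Figure().add_subplot(1, 1, 1)
(line,) = ax.plot(y)       # only y is given
print(line.get_xdata())    # x defaults to 0, 1, ..., len(y) - 1
```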

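We also did not set the figure size, the resolution, the line color and width, the axis ticks and labels, a legend, a title and so on. Everything uses Matplotlib's default settings.

2.2 Default settings

plt.rcParams # shows all the default properties behind the plot above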

RcParams({'_internal.classic_mode': False,
          'agg.path.chunksize': 0,
          'animation.avconv_args': [],
          'animation.avconv_path': 'avconv',
          'animation.bitrate': -1,
          'animation.codec': 'h264',
          'animation.convert_args': [],
          'animation.convert_path': 'convert',
          'animation.embed_limit': 20.0,
          'animation.ffmpeg_args': [],
          'animation.ffmpeg_path': 'ffmpeg',
          'animation.frame_format': 'png',
          'animation.html': 'none',
          'animation.html_args': [],
          'animation.writer': 'ffmpeg',
          'axes.autolimit_mode': 'data',
          'axes.axisbelow': 'line',
          'axes.edgecolor': 'black',
          'axes.facecolor': 'white',
          'axes.formatter.limits': [-5, 6],
          'axes.formatter.min_exponent': 0,
          'axes.formatter.offset_threshold': 4,
          'axes.formatter.use_locale': False,
          'axes.formatter.use_mathtext': False,
          'axes.formatter.useoffset': True,
          'axes.grid': False,
          'axes.grid.axis': 'both',
          'axes.grid.which': 'major',
          'axes.labelcolor': 'black',
          'axes.labelpad': 4.0,
          'axes.labelsize': 'medium',
          'axes.labelweight': 'normal',
          'axes.linewidth': 0.8,
          'axes.prop_cycle': cycler('color', ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']),
          'axes.spines.bottom': True,
          'axes.spines.left': True,
          'axes.spines.right': True,
          'axes.spines.top': True,
          'axes.titlecolor': 'auto',
          'axes.titlelocation': 'center',
          'axes.titlepad': 6.0,
          'axes.titlesize': 'large',
          'axes.titleweight': 'normal',
          'axes.titley': None,
          'axes.unicode_minus': True,
          'axes.xmargin': 0.05,
          'axes.ymargin': 0.05,
          'axes3d.grid': True,
          'backend': 'module://ipykernel.pylab.backend_inline',
          'backend_fallback': True,
          'boxplot.bootstrap': None,
          'boxplot.boxprops.color': 'black',
          'boxplot.boxprops.linestyle': '-',
          'boxplot.boxprops.linewidth': 1.0,
          'boxplot.capprops.color': 'black',
          'boxplot.capprops.linestyle': '-',
          'boxplot.capprops.linewidth': 1.0,
          'boxplot.flierprops.color': 'black',
          'boxplot.flierprops.linestyle': 'none',
          'boxplot.flierprops.linewidth': 1.0,
          'boxplot.flierprops.marker': 'o',
          'boxplot.flierprops.markeredgecolor': 'black',
          'boxplot.flierprops.markeredgewidth': 1.0,
          'boxplot.flierprops.markerfacecolor': 'none',
          'boxplot.flierprops.markersize': 6.0,
          'boxplot.meanline': False,
          'boxplot.meanprops.color': 'C2',
          'boxplot.meanprops.linestyle': '--',
          'boxplot.meanprops.linewidth': 1.0,
          'boxplot.meanprops.marker': '^',
          'boxplot.meanprops.markeredgecolor': 'C2',
          'boxplot.meanprops.markerfacecolor': 'C2',
          'boxplot.meanprops.markersize': 6.0,
          'boxplot.medianprops.color': 'C1',
          'boxplot.medianprops.linestyle': '-',
          'boxplot.medianprops.linewidth': 1.0,
          'boxplot.notch': False,
          'boxplot.patchartist': False,
          'boxplot.showbox': True,
          'boxplot.showcaps': True,
          'boxplot.showfliers': True,
          'boxplot.showmeans': False,
          'boxplot.vertical': True,
          'boxplot.whiskerprops.color': 'black',
          'boxplot.whiskerprops.linestyle': '-',
          'boxplot.whiskerprops.linewidth': 1.0,
          'boxplot.whiskers': 1.5,
          'contour.corner_mask': True,
          'contour.linewidth': None,
          'contour.negative_linestyle': 'dashed',
          'date.autoformatter.day': '%Y-%m-%d',
          'date.autoformatter.hour': '%m-%d %H',
          'date.autoformatter.microsecond': '%M:%S.%f',
          'date.autoformatter.minute': '%d %H:%M',
          'date.autoformatter.month': '%Y-%m',
          'date.autoformatter.second': '%H:%M:%S',
          'date.autoformatter.year': '%Y',
          'date.epoch': '1970-01-01T00:00:00',
          'docstring.hardcopy': False,
          'errorbar.capsize': 0.0,
          'figure.autolayout': False,
          'figure.constrained_layout.h_pad': 0.04167,
          'figure.constrained_layout.hspace': 0.02,
          'figure.constrained_layout.use': False,
          'figure.constrained_layout.w_pad': 0.04167,
          'figure.constrained_layout.wspace': 0.02,
          'figure.dpi': 72.0,
          'figure.edgecolor': (1, 1, 1, 0),
          'figure.facecolor': (1, 1, 1, 0),
          'figure.figsize': [6.0, 4.0],
          'figure.frameon': True,
          'figure.max_open_warning': 20,
          'figure.raise_window': True,
          'figure.subplot.bottom': 0.125,
          'figure.subplot.hspace': 0.2,
          'figure.subplot.left': 0.125,
          'figure.subplot.right': 0.9,
          'figure.subplot.top': 0.88,
          'figure.subplot.wspace': 0.2,
          'figure.titlesize': 'large',
          'figure.titleweight': 'normal',
          'font.cursive': ['Apple Chancery',
                           'Textile',
                           'Zapf Chancery',
                           'Sand',
                           'Script MT',
                           'Felipa',
                           'cursive'],
          'font.family': ['sans-serif'],
          'font.fantasy': ['Comic Neue',
                           'Comic Sans MS',
                           'Chicago',
                           'Charcoal',
                           'ImpactWestern',
                           'Humor Sans',
                           'xkcd',
                           'fantasy'],
          'font.monospace': ['DejaVu Sans Mono',
                             'Bitstream Vera Sans Mono',
                             'Computer Modern Typewriter',
                             'Andale Mono',
                             'Nimbus Mono L',
                             'Courier New',
                             'Courier',
                             'Fixed',
                             'Terminal',
                             'monospace'],
          'font.sans-serif': ['DejaVu Sans',
                              'Bitstream Vera Sans',
                              'Computer Modern Sans Serif',
                              'Lucida Grande',
                              'Verdana',
                              'Geneva',
                              'Lucid',
                              'Arial',
                              'Helvetica',
                              'Avant Garde',
                              'sans-serif'],
          'font.serif': ['DejaVu Serif',
                         'Bitstream Vera Serif',
                         'Computer Modern Roman',
                         'New Century Schoolbook',
                         'Century Schoolbook L',
                         'Utopia',
                         'ITC Bookman',
                         'Bookman',
                         'Nimbus Roman No9 L',
                         'Times New Roman',
                         'Times',
                         'Palatino',
                         'Charter',
                         'serif'],
          'font.size': 10.0,
          'font.stretch': 'normal',
          'font.style': 'normal',
          'font.variant': 'normal',
          'font.weight': 'normal',
          'grid.alpha': 1.0,
          'grid.color': '#b0b0b0',
          'grid.linestyle': '-',
          'grid.linewidth': 0.8,
          'hatch.color': 'black',
          'hatch.linewidth': 1.0,
          'hist.bins': 10,
          'image.aspect': 'equal',
          'image.cmap': 'viridis',
          'image.composite_image': True,
          'image.interpolation': 'antialiased',
          'image.lut': 256,
          'image.origin': 'upper',
          'image.resample': True,
          'interactive': True,
          'keymap.all_axes': ['a'],
          'keymap.back': ['left', 'c', 'backspace', 'MouseButton.BACK'],
          'keymap.copy': ['ctrl+c', 'cmd+c'],
          'keymap.forward': ['right', 'v', 'MouseButton.FORWARD'],
          'keymap.fullscreen': ['f', 'ctrl+f'],
          'keymap.grid': ['g'],
          'keymap.grid_minor': ['G'],
          'keymap.help': ['f1'],
          'keymap.home': ['h', 'r', 'home'],
          'keymap.pan': ['p'],
          'keymap.quit': ['ctrl+w', 'cmd+w', 'q'],
          'keymap.quit_all': [],
          'keymap.save': ['s', 'ctrl+s'],
          'keymap.xscale': ['k', 'L'],
          'keymap.yscale': ['l'],
          'keymap.zoom': ['o'],
          'legend.borderaxespad': 0.5,
          'legend.borderpad': 0.4,
          'legend.columnspacing': 2.0,
          'legend.edgecolor': '0.8',
          'legend.facecolor': 'inherit',
          'legend.fancybox': True,
          'legend.fontsize': 'medium',
          'legend.framealpha': 0.8,
          'legend.frameon': True,
          'legend.handleheight': 0.7,
          'legend.handlelength': 2.0,
          'legend.handletextpad': 0.8,
          'legend.labelspacing': 0.5,
          'legend.loc': 'best',
          'legend.markerscale': 1.0,
          'legend.numpoints': 1,
          'legend.scatterpoints': 1,
          'legend.shadow': False,
          'legend.title_fontsize': None,
          'lines.antialiased': True,
          'lines.color': 'C0',
          'lines.dash_capstyle': 'butt',
          'lines.dash_joinstyle': 'round',
          'lines.dashdot_pattern': [6.4, 1.6, 1.0, 1.6],
          'lines.dashed_pattern': [3.7, 1.6],
          'lines.dotted_pattern': [1.0, 1.65],
          'lines.linestyle': '-',
          'lines.linewidth': 1.5,
          'lines.marker': 'None',
          'lines.markeredgecolor': 'auto',
          'lines.markeredgewidth': 1.0,
          'lines.markerfacecolor': 'auto',
          'lines.markersize': 6.0,
          'lines.scale_dashes': True,
          'lines.solid_capstyle': 'projecting',
          'lines.solid_joinstyle': 'round',
          'markers.fillstyle': 'full',
          'mathtext.bf': 'sans:bold',
          'mathtext.cal': 'cursive',
          'mathtext.default': 'it',
          'mathtext.fallback': 'cm',
          'mathtext.fallback_to_cm': None,
          'mathtext.fontset': 'dejavusans',
          'mathtext.it': 'sans:italic',
          'mathtext.rm': 'sans',
          'mathtext.sf': 'sans',
          'mathtext.tt': 'monospace',
          'mpl_toolkits.legacy_colorbar': True,
          'patch.antialiased': True,
          'patch.edgecolor': 'black',
          'patch.facecolor': 'C0',
          'patch.force_edgecolor': False,
          'patch.linewidth': 1.0,
          'path.effects': [],
          'path.simplify': True,
          'path.simplify_threshold': 0.111111111111,
          'path.sketch': None,
          'path.snap': True,
          'pcolor.shading': 'flat',
          'pdf.compression': 6,
          'pdf.fonttype': 3,
          'pdf.inheritcolor': False,
          'pdf.use14corefonts': False,
          'pgf.preamble': '',
          'pgf.rcfonts': True,
          'pgf.texsystem': 'xelatex',
          'polaraxes.grid': True,
          'ps.distiller.res': 6000,
          'ps.fonttype': 3,
          'ps.papersize': 'letter',
          'ps.useafm': False,
          'ps.usedistiller': None,
          'savefig.bbox': None,
          'savefig.directory': '~',
          'savefig.dpi': 'figure',
          'savefig.edgecolor': 'auto',
          'savefig.facecolor': 'auto',
          'savefig.format': 'png',
          'savefig.jpeg_quality': 95,
          'savefig.orientation': 'portrait',
          'savefig.pad_inches': 0.1,
          'savefig.transparent': False,
          'scatter.edgecolors': 'face',
          'scatter.marker': 'o',
          'svg.fonttype': 'path',
          'svg.hashsalt': None,
          'svg.image_inline': True,
          'text.antialiased': True,
          'text.color': 'black',
          'text.hinting': 'force_autohint',
          'text.hinting_factor': 8,
          'text.kerning_factor': 0,
          'text.latex.preamble': '',
          'text.latex.preview': False,
          'text.usetex': False,
          'timezone': 'UTC',
          'tk.window_focus': False,
          'toolbar': 'toolbar2',
          'webagg.address': '127.0.0.1',
          'webagg.open_in_browser': True,
          'webagg.port': 8988,
          'webagg.port_retries': 50,
          'xaxis.labellocation': 'center',
          'xtick.alignment': 'center',
          'xtick.bottom': True,
          'xtick.color': 'black',
          'xtick.direction': 'out',
          'xtick.labelbottom': True,
          'xtick.labelsize': 'medium',
          'xtick.labeltop': False,
          'xtick.major.bottom': True,
          'xtick.major.pad': 3.5,
          'xtick.major.size': 3.5,
          'xtick.major.top': True,
          'xtick.major.width': 0.8,
          'xtick.minor.bottom': True,
          'xtick.minor.pad': 3.4,
          'xtick.minor.size': 2.0,
          'xtick.minor.top': True,
          'xtick.minor.visible': False,
          'xtick.minor.width': 0.6,
          'xtick.top': False,
          'yaxis.labellocation': 'center',
          'ytick.alignment': 'center_baseline',
          'ytick.color': 'black',
          'ytick.direction': 'out',
          'ytick.labelleft': True,
          'ytick.labelright': False,
          'ytick.labelsize': 'medium',
          'ytick.left': True,
          'ytick.major.left': True,
          'ytick.major.pad': 3.5,
          'ytick.major.right': True,
          'ytick.major.size': 3.5,
          'ytick.major.width': 0.8,
          'ytick.minor.left': True,
          'ytick.minor.pad': 3.4,
          'ytick.minor.right': True,
          'ytick.minor.size': 2.0,
          'ytick.minor.visible': False,
          'ytick.minor.width': 0.6,
          'ytick.right': False})

We will improve on the following settings:

figure size (figsize)
dots per inch (dpi)
line color (color)
line style (linestyle)
line width (linewidth)
x and y ticks (xticks, yticks)
x and y limits (xlim, ylim)

print( 'figure size:', plt.rcParams['figure.figsize'] )
print( 'figure dpi:',plt.rcParams['figure.dpi'] )
print( 'line color:',plt.rcParams['lines.color'] )
print( 'line style:',plt.rcParams['lines.linestyle'] )
print( 'line width:',plt.rcParams['lines.linewidth'] )

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot( spx.values )

print( 'xticks:', ax.get_xticks() )
print( 'yticks:', ax.get_yticks() )
print( 'xlim:', ax.get_xlim() )
print( 'ylim:', ax.get_ylim() )

figure size: [6.0, 4.0]
figure dpi: 72.0
line color: C0
line style: -
line width: 1.5
xticks: [-200.    0.  200.  400.  600.  800. 1000. 1200.]
yticks: [ 600.  800. 1000. 1200. 1400. 1600. 1800.]
xlim: (-50.35, 1057.35)
ylim: (632.0990292500001, 1609.58102375)

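Reading the printed values alongside the plot makes everything clear. The figure is 6×4 inches with 72 dots per inch. The line color C0 (the first color of the default color cycle) is blue, the style '-' is a solid line, the width is 1.5, and so on.

If we write these default values explicitly in the code, the resulting plot should look the same as one drawn with no settings at all. Doing so helps us understand what each property means.

# Create a new figure of size 6×4 inches, using 72 dots per inch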
plt.figure(figsize=(6, 4), dpi=72)

# Plot using blue color (C0) with a continuous line of width 1.5 (points)
plt.plot(spx.values, color='C0', linewidth=1.5, linestyle='-')

# Set x ticks (the default values printed above)
plt.xticks(np.linspace(-200, 1200, 8))

# Set y ticks
plt.yticks(np.linspace(600, 1800, 7))

# Set x limits
plt.xlim(-50.35, 1057.35)

# Set y limits
plt.ylim(632.0990292500001, 1609.58102375)

# Show result on screen
plt.show()

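2.3 Setting size and DPI

Used together, figsize and dpi control the size and resolution of the figure.

The figsize=(w, h) argument sets the width and height of the figure, in inches
dpi stands for dots per inch and sets how many pixels make up one inch. Combined with figsize, it gives the figure's size in pixels: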
```
  (w*dpi, h*dpi)
```

In the code below, the figure is 16 × 6 inches. Because dpi = 100, its size in pixels is (1600, 600).
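
To verify this pixel arithmetic, the sketch below renders an empty 16 × 6 inch figure at 100 dpi with the Agg backend. It then reads back the pixel buffer. It assumes Matplotlib 3.1 or later, where `buffer_rgba()` returns a buffer that NumPy can wrap:

```python
import numpy as np
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg

fig = Figure(figsize=(16, 6), dpi=100)
canvas = FigureCanvasAgg(fig)

size_px = fig.get_size_inches() * fig.dpi   # (w*dpi, h*dpi)
print(size_px)

canvas.draw()                               # render with the Agg backend
pixels = np.asarray(canvas.buffer_rgba())
print(pixels.shape)                         # (height, width, RGBA channels)
```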

plt.figure( figsize=(16,6), dpi=100 )
plt.plot( spx.values )
plt.show()

2.4 Setting color, style and width

In plt.plot(), the color, linewidth and linestyle arguments together control the line's color, width (2 points here) and style (a solid line).
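
Note that `linewidth` is measured in points (1/72 inch), not pixels, so its on-screen thickness depends on dpi. The minimal sketch below uses made-up data. It reads the settings back and converts 2 points into pixels at dpi = 100:

```python
from matplotlib.figure import Figure
import matplotlib.colors as mcolors

fig = Figure(figsize=(16, 6), dpi=100)
ax = fig.add_subplot(1, 1, 1)
(line,) = ax.plot([0, 1, 2], [1, 3, 2], color='#2b4750', linewidth=2, linestyle='-')

print(mcolors.to_hex(line.get_color()), line.get_linewidth(), line.get_linestyle())

# 1 point = 1/72 inch, so the width in pixels is points * dpi / 72
width_px = line.get_linewidth() * fig.dpi / 72
print(round(width_px, 2), 'pixels')
```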

plt.figure( figsize=(16,6), dpi=100 )
plt.plot( spx.values, color=dt_hex, 
          linewidth=2, linestyle='-' )
plt.show()

2.5 Setting limits

Here we add an Axes (ax) to the figure (fig) and do everything through ax, for example:

ax.plot() draws the line
ax.set_xlim() and ax.set_ylim() set the x-axis and y-axis limits

fig = plt.figure(figsize=(16, 6), dpi=100)
ax = fig.add_subplot(1, 1, 1)
x = spx.index
y = spx.values
ax.plot(x, y, color=dt_hex, linewidth=2, linestyle='-')

ax.set_ylim(y.min()*0.8, y.max()*1.2)
plt.show()

x.sort_values()

DatetimeIndex(['2007-01-02', '2007-01-03', '2007-01-05', '2007-01-06',
               '2007-01-08', '2007-01-10', '2007-01-11', '2007-01-16',
               '2007-01-17', '2007-01-18',
               ...
               '2010-12-17', '2010-12-20', '2010-12-21', '2010-12-22',
               '2010-12-23', '2010-12-27', '2010-12-28', '2010-12-29',
               '2010-12-30', '2010-12-31'],
              dtype='datetime64[ns]', name='Date', length=1008, freq=None)

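2.6 Setting ticks and labels

In the plot above, both the number of x ticks and their labels use the defaults. To show dates as year-month-day, use these two methods:

first, ax.set_xticks() sets the numeric tick positions
then, ax.set_xticklabels() writes a label at each of those positions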
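
Here is a self-contained sketch of the two steps, using a made-up daily DatetimeIndex. The tick positions are kept in an integer `range` and reused for the labels. This avoids indexing the dates with the float values that `ax.get_xticks()` returns:

```python
import pandas as pd
from matplotlib.figure import Figure

dates = pd.date_range('2007-01-01', periods=120, freq='D')  # made-up daily index
y = range(len(dates))

ax = Figure().add_subplot(1, 1, 1)
ax.plot(y)                                   # x positions are 0 .. len(y) - 1

ticks = list(range(0, len(dates), 40))       # integer positions, safe for indexing
labels = [dates[i].strftime('%Y-%m-%d') for i in ticks]
ax.set_xticks(ticks)                         # step 1: numeric tick positions
ax.set_xticklabels(labels, rotation=45)      # step 2: a label at each position
print(labels)
```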

fig = plt.figure(figsize=(16, 6), dpi=100)
ax = fig.add_subplot(1, 1, 1)
x = spx.index
y = spx.values
ax.plot(y, color=dt_hex, linewidth=2, linestyle='-')
# Plot y only, so the x positions are the integers 0 .. len(y)-1
ax.set_xlim(-1, len(x)+1)
ax.set_ylim(y.min()*0.8, y.max()*1.2)

x_ticks = range(0, len(x), 40)  # integer positions, usable as indices into x
ax.set_xticks(x_ticks)
ax.set_xticklabels([x[i].strftime('%Y-%m-%d') for i in x_ticks], 
                    rotation=45)
plt.show()

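2.7 Adding a legend

Adding a legend is straightforward. Pass one more argument, label, to ax.plot(), then call

ax.legend()

Here loc=0 (equivalent to loc='best') lets Matplotlib choose the best location for the legend, and frameon=True draws a frame around it.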
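
The minimal sketch below checks what `ax.legend()` picks up from the `label` argument:

```python
from matplotlib.figure import Figure

ax = Figure().add_subplot(1, 1, 1)
ax.plot([1, 2, 3], label='S&P500')
leg = ax.legend(loc='best', frameon=True)   # loc=0 and loc='best' mean the same thing

print([t.get_text() for t in leg.get_texts()])
print(leg.get_frame_on())
```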

fig = plt.figure(figsize=(16, 6), dpi=100)
ax = fig.add_subplot(1, 1, 1)
x = spx.index
y = spx.values
ax.plot(y, color=dt_hex, linewidth=2, linestyle='-', label='S&P500')
ax.legend(loc=0, frameon=True)

ax.set_xlim(-1, len(x)+1)
ax.set_ylim(y.min()*0.8, y.max()*1.2)

x_ticks = range(0, len(x), 40)  # integer positions, usable as indices into x
ax.set_xticks(x_ticks)
ax.set_xticklabels([x[i].strftime('%Y-%m-%d') for i in x_ticks], 
                    rotation=45)
plt.show()

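2.8 Adding a second plot

Add the VIX "fear index".

VIX is the ticker symbol of the Chicago Board Options Exchange (CBOE) Volatility Index. It is commonly used to measure the implied volatility of S&P 500 index options and is often called the "fear index". It gauges the market's expectation of volatility over the next 30 days.

# First read VIX.csv with pd.read_csv
data = pd.read_csv('VIX.csv', index_col=0,
                   parse_dates=True,
                   dayfirst=True)
vix = data.loc['2007-01-01':'2010-12-31', 'Adj Close']
pd.concat([vix.head(3), vix.tail(3)])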

Date
2007-01-03    12.040000
2007-01-04    11.510000
2007-01-05    12.140000
2010-12-29    17.280001
2010-12-30    17.520000
2010-12-31    17.750000
Name: Adj Close, dtype: float64

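Adding a second plot is simple: call plt.plot() or ax.plot() twice.

In general, plt.plot() and ax.plot() are interchangeable. The corresponding setter methods, however, are named differently. With plt:

plt.xlim
plt.ylim
plt.xticks

whereas with ax:

ax.set_xlim
ax.set_ylim
ax.set_xticks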
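
The minimal sketch below shows the difference. The pyplot functions act on the *current* Axes, while the `ax.set_...` methods act on the Axes you call them on. The `matplotlib.use("Agg")` line is only needed for running outside Jupyter:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only needed outside a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
plt.xlim(0, 10)              # pyplot style: changes the current Axes (here, ax)
after_plt = ax.get_xlim()
ax.set_xlim(-1, 5)           # object-oriented style: changes ax explicitly
after_ax = plt.xlim()        # with no arguments, plt.xlim() returns the current limits
print(after_plt, after_ax)
plt.close(fig)
```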

fig = plt.figure(figsize=(16, 6), dpi=100)
x = spx.index
y1 = spx.values
y2 = vix.values
plt.plot(y1, color=dt_hex, linewidth=2, linestyle='-', label='S&P500')
plt.plot(y2, color=r_hex, linewidth=2, linestyle='-', label='VIX')
plt.legend(loc=0, frameon=True)

plt.xlim(-1, len(x)+1)
plt.ylim(min(y1.min(), y2.min())*0.8, max(y1.max(), y2.max())*1.2)  # works even if the two series differ in length

x_tick = range(0, len(x), 40)
x_label = [x[i].strftime('%Y-%m-%d') for i in x_tick]
plt.xticks(x_tick, x_label, rotation=45)
plt.show()

VIX线几乎完全贴近横轴。

2.9 Two Axes and two subplots

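The S&P 500 is measured in the thousands while VIX is in double digits, so when the two share one y-axis VIX looks like a flat horizontal line. There are two ways to improve this:

use two Axes (two y-axes)
use two subplots

Two Axes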

fig = plt.figure(figsize=(16, 6), dpi=100)
ax1 = fig.add_subplot(1,1,1)

x = spx.index
y1 = spx.values
y2 = vix.values

ax1.plot(y1, color=dt_hex, linewidth=2, linestyle='-', label='S&P500')
ax1.set_xlim(-1, len(x)+1)
ax1.set_ylim(y1.min()*0.8, y1.max()*1.2)

x_tick = range(0, len(x), 40)
x_label = [x[i].strftime('%Y-%m-%d') for i in x_tick]
ax1.set_xticks(x_tick)
ax1.set_xticklabels(x_label, rotation=45)
ax1.legend(loc='upper left', frameon=True)

# Add a second axes
ax2 = ax1.twinx()
ax2.plot(y2, color=r_hex, linewidth=2, linestyle='-', label='VIX')
ax2.legend(loc='upper right', frameon=True)

plt.show()

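With ax1 and ax2 we plot on two Axes. The key line is line 19 of the code above:

ax2 = ax1.twinx()

which creates a second Axes that shares the x-axis of ax1 and carries its own y-axis on the right-hand side.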
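Each ax.legend() call only collects lines from its own Axes, which is why the code above needs two legends. As a sketch on synthetic data (the series below are made up), the handles from both Axes can also be passed explicitly to build one combined legend:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
spx_like = 1200 + np.cumsum(rng.normal(0, 15, 300))        # synthetic, thousands scale
vix_like = 20 + np.abs(np.cumsum(rng.normal(0, 1, 300)))   # synthetic, tens scale

fig, ax1 = plt.subplots(figsize=(8, 3))
l1, = ax1.plot(spx_like, label='S&P500')
ax2 = ax1.twinx()                  # shares the x-axis, own y-axis on the right
l2, = ax2.plot(vix_like, color='tab:red', label='VIX')

# One legend for both lines: collect the handles from both Axes explicitly
ax1.legend(handles=[l1, l2], loc='upper left')
```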

Two subplots

fig = plt.figure(figsize=(16, 12), dpi=100)

# subplot 1
plt.subplot(2, 1, 1)
x = spx.index
y1 = spx.values

plt.plot(y1, color=dt_hex, linewidth=2, linestyle='-', label='S&P500')
plt.xlim(-1, len(x)+1)
plt.ylim(y1.min()*0.8, y1.max()*1.2)

x_tick = range(0, len(x), 40)
x_label = [x[i].strftime('%Y-%m-%d') for i in x_tick]
plt.xticks(x_tick, x_label, rotation=45)
plt.legend(loc='upper left', frameon=True)

# subplot2
plt.subplot(2, 1, 2)
y2 = vix.values

plt.plot(y2, color=r_hex, linewidth=2, linestyle='-', label='VIX')
plt.xlim(-1, len(x)+1)
plt.ylim(y2.min()*0.8, y2.max()*1.2)

x_tick = range(0, len(x), 40)
x_label = [x[i].strftime('%Y-%m-%d') for i in x_tick]
plt.xticks(x_tick, x_label, rotation=45)
plt.legend(loc='upper left', frameon=True)

plt.show()
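The same two-panel layout can be sketched with the object-oriented API. plt.subplots(..., sharex=True) links the x-limits, so they only need to be set once. The data here is synthetic:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
y1 = 1200 + np.cumsum(rng.normal(0, 15, 300))        # synthetic S&P-like series
y2 = 20 + np.abs(np.cumsum(rng.normal(0, 1, 300)))   # synthetic VIX-like series

# sharex=True links the x-limits of both panels
fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(8, 6), sharex=True)
ax_top.plot(y1, label='S&P500')
ax_bottom.plot(y2, color='tab:red', label='VIX')
for ax in (ax_top, ax_bottom):
    ax.legend(loc='upper left')
ax_bottom.set_xlim(-1, len(y1) + 1)   # set once, applies to both panels
fig.tight_layout()
```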

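Both approaches work, but in this example putting the S&P 500 and VIX on one chart (two Axes) shows their relationship better. For instance, during the financial crisis from September 2008 to March 2009 the S&P 500 plunged while the VIX soared.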

2.10 Adding annotations

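During the financial crisis, five major events hit the market:

2007-10-11: peak of the bull market
2008-03-12: Bear Stearns fails
2008-09-15: Lehman Brothers bankruptcy
2009-01-20: sell-off in Royal Bank of Scotland (RBS) shares
2009-04-02: G20 summit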

fig = plt.figure(figsize=(16, 6), dpi=100)

from datetime import datetime
crisis_data = [(datetime(2007, 10, 11), 'Peak of bull market'),
              (datetime(2008, 3, 12), 'Bear Stearns Fails'),
              (datetime(2008, 9, 15), 'Lehman Bankruptcy'),
              (datetime(2009, 1, 20), 'RBS Sell-off'),
              (datetime(2009, 4, 2), 'G20 Summit')]

ax1 = fig.add_subplot(1,1,1)

x = spx.index
y1 = spx.values
y2 = vix.values

ax1.plot(y1, color=dt_hex, linewidth=2, linestyle='-', label='S&P500')
ax1.set_xlim(-1, len(x)+1)
ax1.set_ylim(y1.min()*0.8, y1.max()*1.2)

x_tick = range(0, len(x), 40)
x_label = [x[i].strftime('%Y-%m-%d') for i in x_tick]
ax1.set_xticks(x_tick)
ax1.set_xticklabels(x_label, rotation=45)
ax1.legend(loc='upper left', frameon=True)

for date, label in crisis_data:
    date = date.strftime('%Y-%m-%d')
    xi = x.get_loc(date)
    yi = spx.asof(date)
    ax1.scatter(xi, yi, 80, color=r_hex)
    ax1.annotate(label, xy=(xi, yi+60), 
                 xytext=(xi, yi+300),
                arrowprops=dict(facecolor='black', headwidth=4, width=1, headlength=6),
                horizontalalignment='left', verticalalignment='top')

# Add a second axes
ax2 = ax1.twinx()
ax2.plot(y2, color=r_hex, linewidth=2, linestyle='-', label='VIX')
ax2.legend(loc='upper right', frameon=True)

plt.show()
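The annotation loop converts each event date into an x position with x.get_loc(date) (because y1 is plotted against integer positions) and into a y value with spx.asof(date). A small sketch with made-up closing prices shows how the two behave, including on a non-trading day:

```python
import pandas as pd

# Hypothetical closes (made-up numbers) on trading days; 2008-09-13/14 is a weekend
idx = pd.to_datetime(['2008-09-11', '2008-09-12', '2008-09-15', '2008-09-16'])
spx_demo = pd.Series([1250.0, 1252.0, 1193.0, 1214.0], index=idx)

date = '2008-09-15'
xi = spx_demo.index.get_loc(date)          # integer position = x coordinate in ax1.plot(y1)
yi = spx_demo.asof(date)                   # value on that date
yi_weekend = spx_demo.asof('2008-09-14')   # no row on Sunday: asof falls back to Friday

# get_loc has no such fallback: a date missing from the index raises KeyError
try:
    spx_demo.index.get_loc('2008-09-14')
    weekend_lookup_failed = False
except KeyError:
    weekend_lookup_failed = True

print(xi, yi, yi_weekend, weekend_lookup_failed)
```

So every date in crisis_data must be a trading day present in spx.index, otherwise x.get_loc(date) stops the loop with a KeyError.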

2.11 Setting transparency

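Drawn together, the S&P 500 and VIX lines look cluttered and the event annotations are hard to read. The S&P 500 is the main line and VIX the secondary one, so we make the secondary line more transparent by lowering its alpha (alpha=0.3 below).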

fig = plt.figure(figsize=(16, 6), dpi=100)

from datetime import datetime
crisis_data = [(datetime(2007, 10, 11), 'Peak of bull market'),
              (datetime(2008, 3, 12), 'Bear Steans Fails'),
              (datetime(2008, 9, 15), 'Lehman Bankruptcy'),
              (datetime(2009, 1, 20), 'RBS Sell-off'),
              (datetime(2009, 4, 2), 'G20 Summit')]

ax1 = fig.add_subplot(1,1,1)

x = spx.index
y1 = spx.values
y2 = vix.values

ax1.plot(y1, color=dt_hex, linewidth=2, linestyle='-', label='S&P500')
ax1.set_xlim(-1, len(x)+1)
ax1.set_ylim(y1.min()*0.8, y1.max()*1.2)

x_tick = range(0, len(x), 40)
x_label = [x[i].strftime('%Y-%m-%d') for i in ax.get_xticks()]
ax1.set_xticks(x_tick)
ax1.set_xticklabels(x_label, rotation=45)
ax1.legend(loc='upper left', frameon=True)

for date, label in crisis_data:
    date = date.strftime('%Y-%m-%d')
    xi = x.get_loc(date)
    yi = spx.asof(date)
    ax1.scatter(xi, yi, 80, color=r_hex)
    ax1.annotate(label, xy=(xi, yi+60), 
                 xytext=(xi, yi+300),
                arrowprops=dict(facecolor='black', headwidth=4, width=1, headlength=6),
                horizontalalignment='left', verticalalignment='top')

# Add a second axes
ax2 = ax1.twinx()
ax2.plot(y2, color=r_hex, linewidth=2, linestyle='-', label='VIX', alpha=0.3)
ax2.legend(loc='upper right', frameon=True)

plt.show()
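With twinx() the second Axes is drawn on top of the first, so the translucent VIX line still sits above the S&P 500 line. One common adjustment, shown here as a sketch on toy data, is to raise the zorder of ax1 and hide its background patch so ax2 stays visible behind it:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(50)
fig, ax1 = plt.subplots()
ax1.plot(t, 1000 + 5 * t, color='#2b4750', linewidth=2, label='S&P500')
ax2 = ax1.twinx()
(vix_line,) = ax2.plot(t, 40 - 0.5 * t, color='#dc2624', linewidth=2,
                       alpha=0.3, label='VIX')   # alpha=0.3: 30% opaque

# ax2 is drawn after ax1 by default; raising ax1's zorder puts the main
# line in front, and hiding ax1's background patch keeps ax2 visible behind it
ax1.set_zorder(ax2.get_zorder() + 1)
ax1.patch.set_visible(False)
```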

3 Drawing effective charts

3.1 Overview

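When designing charts we constantly face the question of which chart type to use. The relationships a chart can show fall into four broad categories (see the figure below):

distribution
relationship
comparison
composition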

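Before choosing a chart, first be clear about which relationship in the data you want to show. The full taxonomy of charts is large, so from here on we only cover the types used most often in quantitative finance:

a histogram to show the distribution of stock prices and returns
a scatter plot to show the relationship between two stocks
a line chart to compare moving averages of an exchange rate over different windows
a pie chart to show the composition of a stock portfolio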

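The code below fetches data from the API. Stocks are requested by their ticker (stock code), while currencies use the format this API requires: EUR/USD is EURUSD=X rather than the market-standard EURUSD, USD/CNY is CNY=X rather than USDCNY, and USD/JPY is JPY=X rather than USDJPY.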
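The naming rule above can be captured in a small helper. This is a hypothetical sketch, not part of the yahoofinancials library, and it assumes the rule is: USD-based pairs (USDxxx) keep only the quote currency, every other pair keeps its full name.

```python
def to_yahoo_fx(pair: str) -> str:
    """Map a market-style pair (BASEQUOTE) to the Yahoo-style code described above.

    Hypothetical helper, assumed rule: USD-based pairs (USDxxx) keep only the
    quote currency; every other pair keeps its full name.
    """
    pair = pair.upper()
    base, quote = pair[:3], pair[3:]
    if base == 'USD':
        return quote + '=X'   # USDJPY -> JPY=X, USDCNY -> CNY=X
    return pair + '=X'        # EURUSD -> EURUSD=X


for p in ['EURUSD', 'USDJPY', 'USDCNY']:
    print(p, '->', to_yahoo_fx(p))
```

Note that the one-line conversion inside data_converter uses a different test (it checks whether the quote currency is USD), so there a cross pair such as EURCNY becomes CNY=X, i.e. USD/CNY.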

from yahoofinancials import YahooFinancials

start_date = '2018-04-29'
end_date = '2019-04-29'
stock_code=['NVDA', 'AMZN', 'BABA', 'FB', 'AAPL']
currency_code = ['EURUSD=X', 'JPY=X', 'CNY=X']

stock = YahooFinancials(stock_code)
currency = YahooFinancials(currency_code)
stock_daily = stock.get_historical_price_data(start_date, end_date, 'daily')
currency_daily = currency.get_historical_price_data(start_date, end_date, 'daily')

The API returns stock_daily and currency_daily as dictionaries:

currency_daily

{'EURUSD=X': {'eventsData': {},
  'firstTradeDate': {'formatted_date': '2003-12-01', 'date': 1070236800},
  'currency': 'USD',
  'instrumentType': 'CURRENCY',
  'timeZone': {'gmtOffset': 0},
  'prices': [{'date': 1525042800,
    'high': 1.2138574123382568,
    'low': 1.2066364288330078,
    'open': 1.2128562927246094,
    'close': 1.2122827768325806,
    'volume': 0,
    'adjclose': 1.2122827768325806,
    'formatted_date': '2018-04-29'},
   {'date': 1525129200,
    'high': 1.2084592580795288,
    'low': 1.1983511447906494,
    'open': 1.208313226699829,
    'close': 1.2081234455108643,
    'volume': 0,
    'adjclose': 1.2081234455108643,
    'formatted_date': '2018-04-30'},
{...},
 'JPY=X': {'eventsData': {},
  'firstTradeDate': {'formatted_date': '1996-10-30', 'date': 846633600},
  'currency': 'JPY',
  'instrumentType': 'CURRENCY',
  'timeZone': {'gmtOffset': 0},
  'prices': [{'date': 1525042800,
    'high': 109.43949890136719,
    'low': 109.0199966430664,
    'open': 109.09500122070312,
    'close': 109.0979995727539,
    'volume': 0,
    'adjclose': 109.0979995727539,
    'formatted_date': '2018-04-29'},
   {...},

Use pandas to convert the raw data above into a DataFrame:

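def data_converter(price_data, code, asset):
    # Convert the raw API dictionary into a DataFrame.
    # For stocks (any asset other than 'FX'), code is the ticker itself.
    # For FX, code is a market-style pair such as EURUSD or USDJPY and is
    # mapped to the API code: EURUSD -> EURUSD=X, USDJPY -> JPY=X.
    # Any pair whose quote currency is not USD maps to a USD-based code,
    # e.g. EURCNY -> CNY=X (which is USD/CNY).
    if asset == 'FX':
        code = str(code[3:] if code[3:] != 'USD' else code) + '=X'

    # Labels for the open, close, low and high prices
    columns = ['open', 'close', 'low', 'high']
    # List of per-day dictionaries
    price_dict = price_data[code]['prices']
    # Extract the dates with a list comprehension
    index = [p['formatted_date'] for p in price_dict]
    # Extract the OHLC prices with a list comprehension
    price = [[p[c] for c in columns] for p in price_dict]
    data = pd.DataFrame(price,
                        index=pd.Index(index, name='date'),
                        columns=pd.Index(columns, name='OHLC'))
    return data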
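Since each entry of 'prices' is already a dict, pandas can also build the frame straight from the list. A sketch of this more compact variant, assuming the same dict layout; the payload is a hypothetical two-day excerpt modelled on the EURUSD=X output above (prices rounded):

```python
import pandas as pd

# Hypothetical raw payload with the same layout as currency_daily above
raw = {'EURUSD=X': {'prices': [
    {'formatted_date': '2018-04-29', 'open': 1.2129, 'close': 1.2123,
     'low': 1.2066, 'high': 1.2139, 'volume': 0},
    {'formatted_date': '2018-04-30', 'open': 1.2083, 'close': 1.2081,
     'low': 1.1984, 'high': 1.2085, 'volume': 0},
]}}

columns = ['open', 'close', 'low', 'high']
df = pd.DataFrame(raw['EURUSD=X']['prices'])     # one row per daily dict
df = df.set_index('formatted_date')[columns]     # keep only the OHLC columns
df.index.name = 'date'
df.columns.name = 'OHLC'
print(df)
```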

# 'USDCNY' maps to the API code CNY=X (USD/CNY), which is what the output below shows
USDCNY = data_converter(currency_daily, 'USDCNY', 'FX')
pd.concat([USDCNY.head(3), USDCNY.tail(3)])

OHLC	open	close	low	high
date
2018-04-29	6.3348	6.3370	6.3233	6.3364
2018-04-30	6.3318	6.3321	6.3233	6.3324
2018-05-01	6.3233	6.3323	6.3233	6.3640
2019-04-24	6.7119	6.7209	6.7119	6.7479
2019-04-25	6.7331	6.7421	6.7194	6.7421
2019-04-28	6.7198	6.7288	6.7197	6.7348

NVDA = data_converter(stock_daily, 'NVDA', 'EQ')
pd.concat([NVDA.head(3), NVDA.tail(3)])

OHLC	open	close	low	high
date
2018-04-30	226.990005	224.899994	224.119995	229.000000
2018-05-01	224.570007	227.139999	222.199997	227.250000
2018-05-02	227.000000	226.309998	225.250000	228.800003
2019-04-24	191.089996	191.169998	188.639999	192.809998
2019-04-25	189.550003	186.910004	183.699997	190.449997
2019-04-26	180.710007	178.089996	173.300003	180.889999

3.2 Histogram

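A histogram (also called a frequency distribution chart) is a statistical chart that uses a series of vertical bars of different heights to show how data are distributed. The horizontal axis shows the binned values and the vertical axis shows how many observations fall into each bin. In Matplotlib the syntax is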

plt.hist()
ax.hist()

p_NVDA = NVDA['close']

fig = plt.figure(figsize=(8, 4))

plt.hist(p_NVDA, bins=30, color=dt_hex)
plt.xlabel('Nvidia Price')
plt.ylabel('Number of Days Observed')
plt.title('Frequency Distribution of Nvidia Prices, Apr-2018 to Apr-2019')

plt.show()

The parameters of hist() in this example are

p_NVDA: a Series (a list or ndarray also works)
bins: the number of bins
color: the dark teal defined earlier
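The bins argument splits the data range into equal-width intervals and counts the observations in each. A sketch on synthetic returns shows that np.histogram produces the same counts that hist() draws:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
returns = rng.normal(0.0, 0.02, size=250)   # synthetic daily log-returns

# np.histogram does the binning that hist() draws
counts, edges = np.histogram(returns, bins=30)

fig, ax = plt.subplots()
n, bins, patches = ax.hist(returns, bins=30, color='#2b4750')
print(len(counts), len(edges), counts.sum())   # 30 bins need 31 edges
```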

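When studying stock price series we are usually more interested in returns, because they have nicer statistical properties. Next, let us look at the distribution of Nvidia's (NVDA) daily log-returns.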

date = p_NVDA.index
price = p_NVDA.values
r_NVDA = pd.Series(np.log(price[1:]/price[:-1]), index=date[1:])

fig = plt.figure(figsize=(8, 4))

plt.hist(r_NVDA, bins=30, color=dt_hex)
plt.xlabel('Nvidia Daily Log-Return')
plt.ylabel('Number of Days Observed')
plt.title('Frequency Distribution of Nvidia Daily Log-Return, Apr-2018 to Apr-2019')

plt.show()

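The log-return is computed as

r(t) = ln(P(t)/P(t-1))

which gives r_NVDA. Each daily return needs two consecutive prices, so computing r_NVDA from p_NVDA loses the first (earliest) day; that is why date[1:] is used as the index of r_NVDA.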
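A sketch with four made-up closing prices checks the slicing formula above against the pandas idiom np.log(p).diff(), and shows one of the "nice properties": log-returns add up over time to the log of the total price change.

```python
import numpy as np
import pandas as pd

# Hypothetical closes on four trading days
p = pd.Series([100.0, 110.0, 99.0, 99.0],
              index=pd.to_datetime(['2019-04-22', '2019-04-23',
                                    '2019-04-24', '2019-04-25']))

# Same formula as the text: r(t) = ln(P(t) / P(t-1)), indexed by date[1:]
price = p.values
r_manual = pd.Series(np.log(price[1:] / price[:-1]), index=p.index[1:])

# Equivalent pandas idiom: difference of log prices, first (NaN) row dropped
r_pandas = np.log(p).diff().dropna()

# Log-returns are additive: their sum equals ln(P_last / P_first)
total = r_manual.sum()
```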

3.3 Scatter plot

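A scatter plot turns pairs of values from two data sets into points, so that we can examine how the points are distributed and judge whether the two variables are related. In Matplotlib the syntax is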

plt.scatter()
ax.scatter()

AMZN = data_converter(stock_daily, 'AMZN', 'EQ')
p_AMZN = AMZN['close']
date = p_AMZN.index
price = p_AMZN.values
r_AMZN = pd.Series(np.log(price[1:]/price[:-1]), index=date[1:])

BABA = data_converter(stock_daily, 'BABA', 'EQ')
p_BABA = BABA['close']
date = p_BABA.index
price = p_BABA.values
r_BABA = pd.Series(np.log(price[1:]/price[:-1]), index=date[1:])

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14, 6))

axes[0].scatter(p_AMZN, p_BABA, color=dt_hex)
axes[0].set_xlabel('Amazon Price')
axes[0].set_ylabel('Alibaba Price')
axes[0].set_title('Daily Prices from Apr-2018 to Apr-2019')

axes[1].scatter(r_AMZN, r_BABA, color=r_hex)
axes[1].set_xlabel('Amazon Log-Return')
axes[1].set_ylabel('Alibaba Log-Return')
axes[1].set_title('Daily Returns from Apr-2018 to Apr-2019')

plt.show()

The parameters of scatter() in this example are

p_AMZN (r_AMZN): a Series (a list or ndarray also works)
p_BABA (r_BABA): a Series (a list or ndarray also works)
color: the dark teal and red defined earlier
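A scatter plot is often paired with a single number that summarizes the point cloud, such as the Pearson correlation. A sketch on synthetic returns, where stock B is constructed to co-move with stock A:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
r_a = rng.normal(0, 0.02, 250)                     # synthetic returns, stock A
r_b = 0.8 * r_a + 0.2 * rng.normal(0, 0.02, 250)   # stock B, built to co-move with A

fig, ax = plt.subplots()
points = ax.scatter(r_a, r_b, s=10, color='#dc2624')
corr = np.corrcoef(r_a, r_b)[0, 1]   # Pearson correlation of the two series
ax.set_title(f'corr = {corr:.2f}')
```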

3.4 Line chart

A line chart shows continuous data changing over time, which makes it well suited to showing trends at equal time intervals. In Matplotlib the syntax is

plt.plot()
ax.plot()

# First get the EURUSD closing prices
curr = 'EURUSD'
EURUSD = data_converter(currency_daily, curr, 'FX')
rate = EURUSD['close']

Use the rolling() method in pandas to compute the moving averages (MA), then draw three lines: the close, MA20 and MA60.

fig = plt.figure(figsize=(16, 6))
ax = fig.add_subplot(1, 1, 1)

ax.set_title(curr + ' - Moving Average')
ax.set_xticks(range(0, len(rate.index), 10))
ax.set_xticklabels([rate.index[int(i)] for i in ax.get_xticks()], rotation=45)

ax.plot(rate, color=dt_hex, linewidth=2, label='Close')

MA_20 = rate.rolling(20).mean()
MA_60 = rate.rolling(60).mean()

ax.plot(MA_20, color=r_hex, linewidth=2, label='MA20')
ax.plot(MA_60, color=g_hex, linewidth=2, label='MA60')

ax.legend(loc=0)

plt.show()

The parameters of plot() in this example are

rate, MA_20, MA_60: Series (a list or ndarray also works)
color: the dark teal, red and green defined earlier
linewidth: 2 (in points)
label: the text shown in the legend
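rolling(n).mean() returns NaN until a full window of n observations is available, which is why MA20 and MA60 start later than the close. A toy example makes this visible, along with min_periods, which allows shorter windows at the start:

```python
import pandas as pd

rate = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])   # toy closing prices

ma3 = rate.rolling(3).mean()                          # needs 3 observations per window
ma3_partial = rate.rolling(3, min_periods=1).mean()   # averages whatever is available

print(ma3.tolist())
print(ma3_partial.tolist())
```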

3.5 Pie chart

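A pie chart is a circular statistical chart divided into wedges, used to show the relative sizes of quantities, frequencies or percentages. The area of each wedge is proportional to the quantity it represents. In Matplotlib the syntax is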

plt.pie()
ax.pie()

Problem: how do we draw a pie chart of a stock portfolio as of 26 April 2019, assuming it holds 100 shares of Nvidia, 20 of Amazon, 50 of Alibaba, 30 of Facebook and 40 of Apple?

# First compute the market value (MV) of each of the five stocks on 2019-04-26
stock_list = ['NVDA', 'AMZN', 'BABA', 'FB', 'AAPL']
date = '2019-04-26'

MV = [data_converter(stock_daily, code, 'EQ')['close'][date] for code in stock_list]
MV = np.array(MV) * np.array([100, 20, 50, 30, 40])

MV

array([17808.99963379, 39012.60009766,  9354.49981689,  5744.70016479,
        2043.00003052])

# Set five colors and the percentage format %.0f%% (no decimal places), then draw the pie chart.
fig = plt.figure(figsize=(16, 6))
ax = fig.add_subplot(1, 1, 1)

ax.pie(MV, labels=stock_list, colors=[dt_hex, r_hex, g_hex, tn_hex, g25_hex],
       autopct='%.0f%%')
plt.show()

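The parameters of pie() in this example are

MV: market values of the portfolio holdings, ndarray
labels: wedge labels, list
colors: a list of the colors defined earlier
autopct: format string for the percentage labels, str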
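To see what autopct='%.0f%%' produces, the same arithmetic can be reproduced by hand: each wedge's share is MV / MV.sum(), formatted with the % operator. The values below are the MV array printed above.

```python
import numpy as np

# Market values printed above, in the order NVDA, AMZN, BABA, FB, AAPL
MV = np.array([17808.99963379, 39012.60009766, 9354.49981689,
               5744.70016479, 2043.00003052])

pct = 100 * MV / MV.sum()             # share of each wedge in percent
labels = ['%.0f%%' % p for p in pct]  # the autopct='%.0f%%' format applied to each share
print(labels)
```

The rounded labels (24%, 53%, 13%, 8%, 3%) add up to 101%, a normal side effect of rounding each share independently.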

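3.6 Empathy

Think of the pie as a clock face. Most people read it clockwise, so placing one edge of the largest wedge (see the figure below) at 12 o'clock gives it the most emphasis, with the other wedges following clockwise from largest to smallest.

Before drawing the pie we therefore need two extra steps:

sort the market values of the five stocks in descending order
set the pie() parameters so that the largest wedge starts at 12 o'clock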

idx = MV.argsort()[::-1]
MV = MV[idx]
stock_list = [ stock_list[i] for i in idx ]
print( MV )
print( stock_list )

[39012.60009766 17808.99963379  9354.49981689  5744.70016479
  2043.00003052]
['AMZN', 'NVDA', 'BABA', 'FB', 'AAPL']
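argsort() always sorts in ascending order, so reversing its result with [::-1] is what yields the descending order used above. A standalone check on the same MV values:

```python
import numpy as np

MV = np.array([17808.99963379, 39012.60009766, 9354.49981689,
               5744.70016479, 2043.00003052])
stock_list = ['NVDA', 'AMZN', 'BABA', 'FB', 'AAPL']

idx = MV.argsort()[::-1]          # ascending positions, reversed -> descending
ranked = [stock_list[i] for i in idx]
print(idx, ranked)
```

np.argsort(-MV) gives the same descending order without the reversal step.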

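Setting the parameters

startangle=90 places the starting edge of the first wedge (AMZN, the dark teal one) at 90 degrees, i.e. 12 o'clock
counterclock=False lays the wedges out clockwise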

fig = plt.figure(figsize=(16, 6))
ax = fig.add_subplot(1, 1, 1)

ax.pie(MV, labels = stock_list, colors=[dt_hex, r_hex, g_hex,tn_hex,g25_hex],
      autopct='%.0f%%', startangle=90, counterclock=False)
plt.show()
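The effect of the two parameters can be read off the Wedge patches that pie() returns: each wedge stores its start and end angles in degrees as theta1 and theta2. A sketch with three hypothetical fractions:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

shares = [0.5, 0.3, 0.2]   # hypothetical fractions that already sum to 1
fig, ax = plt.subplots()
wedges = ax.pie(shares, startangle=90, counterclock=False)[0]

# The first wedge ends at 90 degrees (12 o'clock) and sweeps clockwise,
# i.e. towards smaller angles; the next wedge continues from where it stopped
for w in wedges:
    print(round(w.theta1, 6), round(w.theta2, 6))
```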

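When a pie chart has more than about five wedges, wedges of similar area are hard to tell apart at a glance; look at BABA and AAPL in the chart above, and without the numbers it is hard to say which is larger. Most people respond to shapes before numbers, so here a bar chart is the better substitute: the size of each market-value component is obvious at a glance.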

Use ax.bar() to draw the bar chart:

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(1, 1, 1)

pct_MV = MV / np.sum(MV)
index = np.arange(len(pct_MV))

ax.bar(index, pct_MV, facecolor=r_hex, edgecolor=dt_hex)
ax.set_xticks(index)
ax.set_xticklabels(stock_list)
ax.set_ylim(0, np.max(pct_MV) * 1.2)

for x, y in zip(index, pct_MV):
    ax.text(x+0.04, y+0.01, '{0:.0%}'.format(y), ha='center', va='center')
    
plt.show()

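The parameters of bar() are

index: positions on the horizontal axis, ndarray
pct_MV: each stock's share of the portfolio's market value, ndarray
facecolor: bar fill color, red
edgecolor: bar edge color, dark teal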
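Instead of placing each percentage with ax.text() and hand-tuned offsets, Matplotlib 3.4 and later provide Axes.bar_label, which positions labels at the bar ends. A sketch assuming such a version, with illustrative shares:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

pct_MV = np.array([0.53, 0.24, 0.13, 0.08, 0.03])   # illustrative shares
names = ['AMZN', 'NVDA', 'BABA', 'FB', 'AAPL']

fig, ax = plt.subplots()
bars = ax.bar(names, pct_MV, facecolor='#dc2624', edgecolor='#2b4750')
# bar_label (Matplotlib >= 3.4) places one label per bar, just above its top
labels = ax.bar_label(bars, labels=[f'{v:.0%}' for v in pct_MV], padding=2)
ax.set_ylim(0, pct_MV.max() * 1.2)
```

Passing string categories directly to bar() also removes the need for set_xticks/set_xticklabels.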

When there are many bars, or the labels are long, use a horizontal bar chart, drawn with ax.barh().

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(1, 1, 1)

pct_MV = MV[::-1] / np.sum(MV)
index = np.arange(len(pct_MV))

ax.barh(index, pct_MV, facecolor=r_hex, edgecolor=dt_hex)
ax.set_yticks(index)
ax.set_yticklabels(stock_list[::-1])
ax.set_xlim(0, np.max(pct_MV) * 1.2)

for x, y in zip( pct_MV, index ):
    ax.text(x+0.04, y, '{0:.0%}'.format(x), ha='center', va='center')
    
plt.show()
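The code above reverses both MV and stock_list so that the largest holding ends up at the top. An alternative sketch keeps the descending order and flips the axis with ax.invert_yaxis() instead:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

pct_MV = np.array([0.53, 0.24, 0.13, 0.08, 0.03])   # illustrative, largest first
names = ['AMZN', 'NVDA', 'BABA', 'FB', 'AAPL']

fig, ax = plt.subplots()
ypos = np.arange(len(pct_MV))
ax.barh(ypos, pct_MV, facecolor='#dc2624', edgecolor='#2b4750')
ax.set_yticks(ypos)
ax.set_yticklabels(names)
ax.invert_yaxis()     # position 0 (the largest share) is now drawn at the top
```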

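Finally, the same pie chart and horizontal bar chart are redrawn under three of Matplotlib's built-in style sheets: ggplot, seaborn colorblind and tableau-colorblind10. Note that plt.style.use changes the style for all figures created afterwards.

plt.style.use('ggplot')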

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14, 6))

axes[0].pie(MV, labels = stock_list, autopct='%.0f%%',
            startangle=90, counterclock=False)

pct_MV = MV[::-1] / np.sum(MV)
index = np.arange(len(pct_MV))

axes[1].barh(index, pct_MV)
axes[1].set_yticks(index)
axes[1].set_yticklabels(stock_list[::-1])
axes[1].set_xlim(0, np.max(pct_MV) * 1.2)

for x, y in zip( pct_MV, index ):
    axes[1].text(x+0.04, y, '{0:.0%}'.format(x), ha='right', va='center')

plt.tight_layout()
plt.show()

# Matplotlib >= 3.6 renamed this style; older versions use 'seaborn-colorblind'
plt.style.use('seaborn-v0_8-colorblind')

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14, 6))

axes[0].pie(MV, labels = stock_list, autopct='%.0f%%',
            startangle=90, counterclock=False)

pct_MV = MV[::-1] / np.sum(MV)
index = np.arange(len(pct_MV))

axes[1].barh(index, pct_MV)
axes[1].set_yticks(index)
axes[1].set_yticklabels(stock_list[::-1])
axes[1].set_xlim(0, np.max(pct_MV) * 1.2)

for x, y in zip( pct_MV, index ):
    axes[1].text(x+0.04, y, '{0:.0%}'.format(x), ha='right', va='center')

plt.tight_layout()
plt.show()

plt.style.use('tableau-colorblind10')

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(14, 6))

axes[0].pie(MV, labels = stock_list, autopct='%.0f%%',
            startangle=90, counterclock=False)

pct_MV = MV[::-1] / np.sum(MV)
index = np.arange(len(pct_MV))

axes[1].barh(index, pct_MV)
axes[1].set_yticks(index)
axes[1].set_yticklabels(stock_list[::-1])
axes[1].set_xlim(0, np.max(pct_MV) * 1.2)

for x, y in zip( pct_MV, index ):
    axes[1].text(x+0.04, y, '{0:.0%}'.format(x), ha='right', va='center')

plt.tight_layout()
plt.show()
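Because plt.style.use changes settings globally, a style can instead be applied to a single block of code with the plt.style.context context manager, which restores the previous rcParams on exit. A sketch, using axes.facecolor as the setting to watch:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

plt.style.use('default')                      # start from Matplotlib's defaults
before = plt.rcParams['axes.facecolor']

with plt.style.context('ggplot'):             # style applies only inside the block
    inside = plt.rcParams['axes.facecolor']
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3])

after = plt.rcParams['axes.facecolor']
print(before, inside, after)

# plt.style.available lists the style names installed with this Matplotlib
print([s for s in plt.style.available if 'colorblind' in s])
```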
