[Python Data Science Quick Start Series | 09] Matplotlib Relationship Charts: A Summary of Applications




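  • Quick navigation for the Python Data Science Quick Start series:
  • 1. Overview
  • 2. Common relationship chart applications
    • 2.1 Scatter plots
      • 2.1.1 Correlation between a single feature and the label
      • 2.1.2 Correlation between two features and the label
      • 2.1.3 Scatter matrix of feature pairs and the label
    • 2.2 Scatter plot with a curve
    • 2.3 Line plots
  • 3. Summary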


1. Overview




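2. Common relationship chart applications

2.1 Scatter plots

A scatter plot, also called an X-Y plot, draws every observation as a point in a Cartesian coordinate system to show how strongly the variables influence one another. The position of each point is determined by the values of the variables.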






The docstring of Matplotlib's `scatter` (abridged) describes its parameters:

A scatter plot of *y* vs. *x* with varying marker size and/or color.

x, y : float or array-like, shape (n, )
    The data positions.

s : float or array-like, shape (n, ), optional
    The marker size in points**2.
    Default is ``rcParams['lines.markersize'] ** 2``.

c : array-like or list of colors or color, optional
    The marker colors. Possible values:

    - A scalar or sequence of n numbers to be mapped to colors using
      *cmap* and *norm*.
    - A 2D array in which the rows are RGB or RGBA.
    - A sequence of colors of length n.
    - A single color format string.

    Note that *c* should not be a single numeric RGB or RGBA sequence
    because that is indistinguishable from an array of values to be
    colormapped. If you want to specify the same RGB or RGBA value for
    all points, use a 2D array with a single row.  Otherwise, value-
    matching will have precedence in case of a size matching with *x*
    and *y*.

    If you wish to specify a single color for all points
    prefer the *color* keyword argument.

    Defaults to `None`. In that case the marker color is determined
    by the value of *color*, *facecolor* or *facecolors*. In case
    those are not specified or `None`, the marker color is determined
    by the next color of the ``Axes``' current "shape and fill" color
    cycle. This cycle defaults to :rc:`axes.prop_cycle`.

marker : `~.markers.MarkerStyle`, default: :rc:`scatter.marker`
    The marker style. *marker* can be either an instance of the class
    or the text shorthand for a particular marker.
    See :mod:`matplotlib.markers` for more information about marker
    styles.

cmap : str or `~matplotlib.colors.Colormap`, default: :rc:`image.cmap`
    A `.Colormap` instance or registered colormap name. *cmap* is only
    used if *c* is an array of floats.

norm : `~matplotlib.colors.Normalize`, default: None
    If *c* is an array of floats, *norm* is used to scale the color
    data, *c*, in the range 0 to 1, in order to map into the colormap *cmap*.
    If *None*, use the default `.colors.Normalize`.

vmin, vmax : float, default: None
    *vmin* and *vmax* are used in conjunction with the default norm to
    map the color array *c* to the colormap *cmap*. If None, the
    respective min and max of the color array is used.
    It is an error to use *vmin*/*vmax* when *norm* is given.

alpha : float, default: None
    The alpha blending value, between 0 (transparent) and 1 (opaque).

linewidths : float or array-like, default: :rc:`lines.linewidth`
    The linewidth of the marker edges. Note: The default *edgecolors*
    is 'face'. You may want to change this as well.

edgecolors : {'face', 'none', *None*} or color or sequence of color, default: :rc:`scatter.edgecolors`
    The edge color of the marker. Possible values:

    - 'face': The edge color will always be the same as the face color.
    - 'none': No patch boundary will be drawn.
    - A color or sequence of colors.

    For non-filled markers, *edgecolors* is ignored. Instead, the color
    is determined like with 'face', i.e. from *c*, *colors*, or
    *facecolors*.

plotnonfinite : bool, default: False
    Whether to plot points with nonfinite *c* (i.e. ``inf``, ``-inf``
    or ``nan``). If ``True`` the points are drawn with the *bad*
    colormap color (see `.Colormap.set_bad`).


Other Parameters
data : indexable object, optional
    If given, the following parameters also accept a string ``s``, which is
    interpreted as ``data[s]`` (unless this raises an exception):

    *x*, *y*, *s*, *linewidths*, *edgecolors*, *c*, *facecolor*, *facecolors*, *color*
**kwargs : `~matplotlib.collections.Collection` properties

See Also
plot : To plot scatter plots when markers are identical in size and
    color.

Notes

* The `.plot` function will be faster for scatterplots where markers
  don't vary in size or color.

* Any or all of *x*, *y*, *s*, and *c* may be masked arrays, in which
  case all masks will be combined and only unmasked points will be
  plotted.

* Fundamentally, scatter works with 1D arrays; *x*, *y*, *s*, and *c*
  may be input as N-D arrays, but within scatter they will be
  flattened. The exception is *c*, which will be flattened only if its
  size matches the size of *x* and *y*.
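
As a minimal sketch of how several of these parameters interact, the snippet below draws random points and sets `s`, `c`, `cmap`, `vmin`/`vmax` and `alpha`, then passes a 2D array with a single row to give a second group one RGBA color, as the note on *c* recommends. The random data and all variable names are assumptions made for illustration; they are not part of the iris example that follows.

```python
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)          # seeded so the figure is reproducible
n = 50
x = rng.random(n)
y = rng.random(n)
values = rng.random(n)                  # numbers that cmap/norm map to colors
sizes = 20 + 200 * values               # marker areas in points**2

fig, ax = plt.subplots()

# c is a sequence of n numbers, so it is colormapped through cmap;
# vmin/vmax pin the two ends of the default linear normalization.
colored = ax.scatter(x, y, s=sizes, c=values, cmap="viridis",
                     vmin=0.0, vmax=1.0, alpha=0.6, edgecolors="none")
fig.colorbar(colored, ax=ax, label="values")

# One RGBA color for every point: a 2D array with a single row cannot be
# mistaken for four values that should be colormapped.
single_rgba = np.array([[0.54, 0.17, 0.89, 1.0]])
uniform = ax.scatter(x, 1.0 + y, s=20, c=single_rgba, marker=".")

# plt.show()  # uncomment when running interactively
```

Only the first call produces colormapped data, which is why the colorbar is attached to `colored` and not to `uniform`.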

2.1.1 Correlation between a single feature and the label


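import numpy as np

data = []
column_name = []
with open(file='iris.txt', mode='r') as f:
    # The first line is the header row: keep it as column names, not as data
    line = f.readline()
    if line:
        column_name = np.array(line.strip().split(','))
    # Read the remaining rows until the end of the file
    while True:
        line = f.readline()
        if line:
            data.append(line.strip().split(','))
        else:
            break

data = np.array(data, dtype=float)

# Slice out the first 4 columns as the feature data
X_data = data[:, :4]  # or X_data = data[:, :-1]

# Slice out the last column as the label data
y_data = data[:, -1]

print(data.shape, X_data.shape, y_data.shape)
# Output: (150, 5) (150, 4) (150,)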
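
from matplotlib import pyplot as plt

plt.rcParams["font.sans-serif"] = ["SimHei"]  # font with CJK glyphs for the Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign "-" correctly with this font

fig, ax = plt.subplots()
# Sepal length against iris class
ax.scatter(X_data[:, 0], y_data, c='blueviolet', marker='.', s=20)
ax.set_xlabel('花萼长度')  # "sepal length"
ax.set_ylabel('鸢尾花分类')  # "iris class"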

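A scatter plot gives a visual impression only. As a rough numeric companion, the sketch below computes the Pearson correlation coefficient between each feature column and the label with `np.corrcoef`. The helper name `feature_label_correlation` and the small synthetic arrays are assumptions made for illustration; with the iris data loaded, one would pass `X_data` and `y_data` instead. Treating the class codes 0, 1, 2 as numbers is itself a simplification.

```python
import numpy as np

def feature_label_correlation(X, y):
    """Return the Pearson correlation of every column of X with y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.corrcoef returns a 2x2 matrix for two 1D inputs; [0, 1] is r(x, y).
    return np.array([np.corrcoef(X[:, k], y)[0, 1] for k in range(X.shape[1])])

# Synthetic stand-in for (X_data, y_data): column 0 rises with the label,
# column 1 falls with it.
y_demo = np.array([0, 0, 1, 1, 2, 2])
X_demo = np.column_stack([
    [1.0, 1.2, 2.1, 2.0, 3.1, 2.9],
    [3.0, 2.9, 2.2, 1.9, 1.0, 1.1],
])

r = feature_label_correlation(X_demo, y_demo)
print(np.round(r, 2))
```

A coefficient near +1 or -1 suggests the feature separates the classes roughly in order, while a value near 0 says little either way, because the relationship may still be non-linear.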

2.1.2 Correlation between two features and the label



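from matplotlib import pyplot as plt

plt.rcParams["font.sans-serif"] = ["SimHei"]  # font with CJK glyphs for the Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign "-" correctly with this font

fig, ax = plt.subplots()
# Sepal length against sepal width, colored by iris class
ax.scatter(x=X_data[:, 0], y=X_data[:, 1], c=y_data, marker='.', s=20)
ax.set_xlabel('花萼长度')  # "sepal length"
ax.set_ylabel('花萼宽度')  # "sepal width"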

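A single `scatter` call colored with `c=y_data` has no legend by default. One option is `PathCollection.legend_elements()`, which derives legend handles from the colormapped values. The sketch below uses synthetic two-feature data with three class codes as a stand-in for the iris arrays; the data, the names and the `brg` colormap choice are assumptions made for illustration.

```python
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 20)                 # three classes, 20 points each
# Shift each class by its code so the groups separate visibly.
features = rng.normal(size=(60, 2)) * 0.3 + labels[:, None]

fig, ax = plt.subplots()
points = ax.scatter(features[:, 0], features[:, 1], c=labels,
                    cmap="brg", marker=".", s=20)

# legend_elements() returns one handle and one text label per unique value of c.
handles, texts = points.legend_elements()
ax.legend(handles, ["class 0", "class 1", "class 2"], title="label")

# plt.show()  # uncomment when running interactively
```

The alternative is one `scatter` call per class, each with its own `label`, which gives direct control over every color.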


from matplotlib import pyplot as plt

plt.rcParams["font.sans-serif"] = ["SimHei"]  # font with CJK glyphs for the Chinese labels
plt.rcParams["axes.unicode_minus"] = False  # render the minus sign "-" correctly with this font

fig, ax = plt.subplots()
# Sepal length against sepal width, one scatter call per class so each class gets a legend entry
ax.scatter(x=X_data[:,0][y_data==0], y=X_data[:,1][y_data==0], c="red", marker='.', s=20, label="Setosa")
ax.scatter(x=X_data[:,0][y_data==1], y=X_data[:,1][y_data==1], c="green", marker='.', s=20, label="Versicolor")
ax.scatter(x=X_data[:,0][y_data==2], y=X_data[:,1][y_data==2], c="blueviolet", marker='.', s=20, label="Virginica")
ax.legend()  # show the class labels passed to each scatter call




2.1.3 Scatter matrix of feature pairs and the label

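The iris data set has 4 features, so there are $C_{4}^{2} = 6$ unordered pairs of features, or 12 ordered pairs once (a, b) and (b, a) are counted separately. A scatter matrix shows all of them in one figure, which makes it possible to analyze how each pair of variables relates to the label.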
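
A 4×4 grid has 12 off-diagonal cells, one per ordered pair of distinct features, while the number of unordered pairs is C(4, 2) = 6. The sketch below enumerates both with `itertools`. The English feature names are hypothetical placeholders; the article reads the real names from the header row of `iris.txt`.

```python
from itertools import combinations, permutations

# Hypothetical feature names; the article takes them from the file header.
features = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

unordered_pairs = list(combinations(features, 2))   # C(4, 2): order ignored
ordered_pairs = list(permutations(features, 2))     # 4 * 3: (a, b) differs from (b, a)

print(len(unordered_pairs), len(ordered_pairs))     # prints: 6 12
```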

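from matplotlib import pyplot as plt
import seaborn as sns

class_names = ["Setosa", "Versicolor", "Virginica"]

fig, ax = plt.subplots(4, 4, figsize=(12, 12))

for i in range(4):
    for j in range(4):
        if i != j:
            # Off-diagonal cell: feature j on the x-axis, feature i on the y-axis
            ax[i][j].scatter(X_data[:, j], X_data[:, i], c=y_data, cmap='brg', s=10)
        else:
            # Diagonal cell: kernel density estimate (KDE) of feature i per class
            for k, name in enumerate(class_names):
                sns.kdeplot(x=X_data[y_data == k][:, i], ax=ax[i][j], label=name)

        # Show the x-axis variable name above the top row
        if i == 0:
            ax[i][j].set_title(column_name[j])
        # Show the y-axis variable name beside the first column
        if j == 0:
            ax[i][j].set_ylabel(column_name[i])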



2.2 Scatter plot with a curve


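import numpy as np
from matplotlib import pyplot as plt

# Parameters of the Gaussian noise
mu = 0
sigma = 0.1

# 100 evenly spaced points over one period of the sine curve
x_data = np.linspace(0, 2 * np.pi, 100)
y_data = np.sin(x_data)

# Scale the noise by an extra factor before adding it to the curve
error = 1.2
noise = np.random.normal(loc=mu, scale=sigma, size=len(y_data)) * error

y_data_1 = y_data + noise

# Noisy observations as points, underlying sine curve as a line
plt.scatter(x_data, y_data_1, c="blueviolet", marker='.', s=20)
plt.plot(x_data, y_data, c='green')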

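The noise in this example has standard deviation `sigma * error` = 0.12, because multiplying a normal sample by a constant multiplies its standard deviation by the same constant. The sketch below checks this numerically. It uses a seeded `np.random.default_rng` generator and a larger sample than the plot; both are choices made here for illustration and are not part of the original code.

```python
import numpy as np

mu, sigma, error = 0.0, 0.1, 1.2

rng = np.random.default_rng(42)                 # seeded for repeatable numbers
x = np.linspace(0, 2 * np.pi, 10_000)
clean = np.sin(x)                               # the underlying curve
noise = rng.normal(loc=mu, scale=sigma, size=x.size) * error
noisy = clean + noise                           # what the scatter points show

# The spread of the residuals should be close to sigma * error = 0.12.
residual_std = np.std(noisy - clean)
print(round(residual_std, 3))
```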

2.3 Line plots


import numpy as np
from matplotlib import pyplot as plt

# 100 evenly spaced x values between 0 and 50
x_data = np.linspace(0, 50, 100)
# An upward-opening quadratic and its downward-opening counterpart
y_data = x_data**2 + x_data + 15
y_data_2 = (-1) * x_data**2 - x_data + 15

plt.plot(x_data, y_data, c='green')
plt.plot(x_data, y_data_2, c='red')

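The two curves are reflections of each other about the horizontal line y = 15, since their sum is 30 at every x. The sketch below labels both curves and draws that line using the object-oriented API. The variable names, the labels and the dashed reference line are additions made for illustration.

```python
import numpy as np
from matplotlib import pyplot as plt

x = np.linspace(0, 50, 100)
upward = x**2 + x + 15
downward = -x**2 - x + 15

fig, ax = plt.subplots()
ax.plot(x, upward, c="green", label="x^2 + x + 15")
ax.plot(x, downward, c="red", label="-x^2 - x + 15")
ax.axhline(15, c="gray", ls="--", label="y = 15")   # axis of reflection
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()

# plt.show()  # uncomment when running interactively
```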

3. Summary



