Matplotlib(3、直方图) - plt.hist()参数解释&应用实例

matplotlib画直方图 - plt.hist()

一、plt.hist()参数详解

简介:
plt.hist():直方图,一种特殊的柱状图。
将统计值的范围分段,即将整个值的范围分成一系列间隔,然后计算每个间隔中有多少值。
直方图也可以被归一化以显示“相对”频率。 然后,它显示了属于几个类别中的每个类别的占比,其高度总和等于1。

import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.pyplot import MultipleLocator
from matplotlib import ticker
%matplotlib inline


plt.hist(x, bins=None, range=None, density=None, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, normed=None, *, data=None, **kwargs)

常用参数解释:
x: 作直方图所要用的数据,必须是一维数组;多维数组可以先进行扁平化再作图;必选参数;
bins: 直方图的柱数,即要分的组数,默认为10;
range:元组(tuple)或None;剔除较大和较小的离群值,给出全局范围;如果为None,则默认为(x.min(), x.max());即x轴的范围;
density:布尔值。如果为true,则返回的元组的第一个参数n将为频率而非默认的频数;
weights:与x形状相同的权重数组;将x中的每个元素乘以对应权重值再计数;如果normed或density取值为True,则会对权重进行归一化处理。这个参数可用于绘制已合并的数据的直方图;
cumulative:布尔值;如果为True,则计算累计频数;如果normed或density取值为True,则计算累计频率;
bottom:数组,标量值或None;每个柱子底部相对于y=0的位置。如果是标量值,则每个柱子相对于y=0向上/向下的偏移量相同。如果是数组,则根据数组元素取值移动对应的柱子;即直方图上下便宜距离;
histtype:{‘bar’, ‘barstacked’, ‘step’, ‘stepfilled’};'bar’是传统的条形直方图;'barstacked’是堆叠的条形直方图;'step’是未填充的条形直方图,只有外边框;‘stepfilled’是有填充的直方图;当histtype取值为’step’或’stepfilled’,rwidth设置失效,即不能指定柱子之间的间隔,默认连接在一起;
align:{‘left’, ‘mid’, ‘right’};‘left’:柱子的中心位于bins的左边缘;‘mid’:柱子位于bins左右边缘之间;‘right’:柱子的中心位于bins的右边缘;
orientation:{‘horizontal’, ‘vertical’}:如果取值为horizontal,则条形图将以y轴为基线,水平排列;简单理解为类似bar()转换成barh(),旋转90°;
rwidth:标量值或None。柱子的宽度占bins宽的比例;
log:布尔值。如果取值为True,则坐标轴的刻度为对数刻度;如果log为True且x是一维数组,则计数为0的取值将被剔除,仅返回非空的(frequency, bins, patches);
color:具体颜色,数组(元素为颜色)或None。
label:字符串(序列)或None;有多个数据集时,用label参数做标注区分;
stacked:布尔值。如果取值为True,则输出的图为多个数据集堆叠累计的结果;如果取值为False且histtype=‘bar’或’step’,则多个数据集的柱子并排排列;
normed: 是否将得到的直方图向量归一化,即显示占比,默认为0,不归一化;不推荐使用,建议改用density参数;
edgecolor: 直方图边框颜色;
alpha: 透明度;

返回值(用参数接收返回值,便于设置数据标签):
n:直方图向量,即每个分组下的统计值,是否归一化由参数normed设定。当normed取默认值时,n即为直方图各组内元素的数量(各组频数);
bins: 返回各个bin的区间范围;
patches:返回每个bin里面包含的数据,是一个list。
其他参数与plt.bar()类似。

二、plt.hist()简单应用

import matplotlib.pyplot as plt
%matplotlib inline


# 最简单,只传递x,组数,宽度,范围
plt.hist(data13['carrier_no'], bins=11, rwidth=0.8, range=(1,12), align='left')
plt.show()

Matplotlib(3、直方图) - plt.hist()参数解释&应用实例_第1张图片

三、plt.bar()综合应用

import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.pyplot import MultipleLocator
from matplotlib import ticker
%matplotlib inline


plt.figure(figsize=(8,5), dpi=80)
# 拿参数接收hist返回值,主要用于记录分组返回的值,标记数据标签
n, bins, patches = plt.hist(data13['carrier_no'], bins=11, rwidth=0.8, range=(1,12), align='left', label='xx直方图')
for i in range(len(n)):
    plt.text(bins[i], n[i]*1.02, int(n[i]), fontsize=12, horizontalalignment="center") #打标签,在合适的位置标注每个直方图上面样本数
plt.ylim(0,16000)
plt.title('直方图')
plt.legend()
# plt.savefig('直方图'+'.png')
plt.show()

Matplotlib(3、直方图) - plt.hist()参数解释&应用实例_第2张图片

官方参数解释

Parameters
----------
x : (n,) array or sequence of (n,) arrays
    Input values, this takes either a single array or a sequence of
    arrays which are not required to be of the same length.

bins : int or sequence or str, optional
    If an integer is given, ``bins + 1`` bin edges are calculated and
    returned, consistent with `numpy.histogram`.

    If `bins` is a sequence, gives bin edges, including left edge of
    first bin and right edge of last bin.  In this case, `bins` is
    returned unmodified.

    All but the last (righthand-most) bin is half-open.  In other
    words, if `bins` is::

        [1, 2, 3, 4]

    then the first bin is ``[1, 2)`` (including 1, but excluding 2) and
    the second ``[2, 3)``.  The last bin, however, is ``[3, 4]``, which
    *includes* 4.

    Unequally spaced bins are supported if *bins* is a sequence.

    With Numpy 1.11 or newer, you can alternatively provide a string
    describing a binning strategy, such as 'auto', 'sturges', 'fd',
    'doane', 'scott', 'rice' or 'sqrt', see
    `numpy.histogram`.

    The default is taken from :rc:`hist.bins`.

range : tuple or None, optional
    The lower and upper range of the bins. Lower and upper outliers
    are ignored. If not provided, *range* is ``(x.min(), x.max())``.
    Range has no effect if *bins* is a sequence.

    If *bins* is a sequence or *range* is specified, autoscaling
    is based on the specified bin range instead of the
    range of x.

    Default is ``None``

density : bool, optional
    If ``True``, the first element of the return tuple will
    be the counts normalized to form a probability density, i.e.,
    the area (or integral) under the histogram will sum to 1.
    This is achieved by dividing the count by the number of
    observations times the bin width and not dividing by the total
    number of observations. If *stacked* is also ``True``, the sum of
    the histograms is normalized to 1.

    Default is ``None`` for both *normed* and *density*. If either is
    set, then that value will be used. If neither are set, then the
    args will be treated as ``False``.

    If both *density* and *normed* are set an error is raised.

weights : (n, ) array_like or None, optional
    An array of weights, of the same shape as *x*.  Each value in *x*
    only contributes its associated weight towards the bin count
    (instead of 1).  If *normed* or *density* is ``True``,
    the weights are normalized, so that the integral of the density
    over the range remains 1.

    Default is ``None``.

    This parameter can be used to draw a histogram of data that has
    already been binned, e.g. using `np.histogram` (by treating each
    bin as a single point with a weight equal to its count) ::

        counts, bins = np.histogram(data)
        plt.hist(bins[:-1], bins, weights=counts)

    (or you may alternatively use `~.bar()`).

cumulative : bool, optional
    If ``True``, then a histogram is computed where each bin gives the
    counts in that bin plus all bins for smaller values. The last bin
    gives the total number of datapoints. If *normed* or *density*
    is also ``True`` then the histogram is normalized such that the
    last bin equals 1. If *cumulative* evaluates to less than 0
    (e.g., -1), the direction of accumulation is reversed.
    In this case, if *normed* and/or *density* is also ``True``, then
    the histogram is normalized such that the first bin equals 1.

    Default is ``False``

bottom : array_like, scalar, or None
    Location of the bottom baseline of each bin.  If a scalar,
    the base line for each bin is shifted by the same amount.
    If an array, each bin is shifted independently and the length
    of bottom must match the number of bins.  If None, defaults to 0.

    Default is ``None``

histtype : {'bar', 'barstacked', 'step',  'stepfilled'}, optional
    The type of histogram to draw.

    - 'bar' is a traditional bar-type histogram.  If multiple data
      are given the bars are arranged side by side.

    - 'barstacked' is a bar-type histogram where multiple
      data are stacked on top of each other.

    - 'step' generates a lineplot that is by default
      unfilled.

    - 'stepfilled' generates a lineplot that is by default
      filled.

    Default is 'bar'

align : {'left', 'mid', 'right'}, optional
    Controls how the histogram is plotted.

        - 'left': bars are centered on the left bin edges.

        - 'mid': bars are centered between the bin edges.

        - 'right': bars are centered on the right bin edges.

    Default is 'mid'

orientation : {'horizontal', 'vertical'}, optional
    If 'horizontal', `~matplotlib.pyplot.barh` will be used for
    bar-type histograms and the *bottom* kwarg will be the left edges.

rwidth : scalar or None, optional
    The relative width of the bars as a fraction of the bin width.  If
    ``None``, automatically compute the width.

    Ignored if *histtype* is 'step' or 'stepfilled'.

    Default is ``None``

log : bool, optional
    If ``True``, the histogram axis will be set to a log scale. If
    *log* is ``True`` and *x* is a 1D array, empty bins will be
    filtered out and only the non-empty ``(n, bins, patches)``
    will be returned.

    Default is ``False``

color : color or array_like of colors or None, optional
    Color spec or sequence of color specs, one per dataset.  Default
    (``None``) uses the standard line color sequence.

    Default is ``None``

label : str or None, optional
    String, or sequence of strings to match multiple datasets.  Bar
    charts yield multiple patches per dataset, but only the first gets
    the label, so that the legend command will work as expected.

    default is ``None``

stacked : bool, optional
    If ``True``, multiple data are stacked on top of each other If
    ``False`` multiple data are arranged side by side if histtype is
    'bar' or on top of each other if histtype is 'step'

    Default is ``False``

normed : bool, optional
    Deprecated; use the density keyword argument instead.

Returns
-------
n : array or list of arrays
    The values of the histogram bins. See *density* and *weights* for a
    description of the possible semantics.  If input *x* is an array,
    then this is an array of length *nbins*. If input is a sequence of
    arrays ``[data1, data2,..]``, then this is a list of arrays with
    the values of the histograms for each of the arrays in the same
    order.  The dtype of the array *n* (or of its element arrays) will
    always be float even if no weighting or normalization is used.

bins : array
    The edges of the bins. Length nbins + 1 (nbins left edges and right
    edge of last bin).  Always a single array even when multiple data
    sets are passed in.

patches : list or list of lists
    Silent list of individual patches used to create the histogram
    or list of such list if multiple input datasets.

Other Parameters
----------------
**kwargs : `~matplotlib.patches.Patch` properties

See also
--------
hist2d : 2D histograms

Notes
-----


.. note::
    In addition to the above described arguments, this function can take a
    **data** keyword argument. If such a **data** argument is given, the
    following arguments are replaced by **data[]**:

    * All arguments with the following names: 'weights', 'x'.

    Objects passed as **data** must support item access (``data[]``) and
    membership test (`` in data``).

你可能感兴趣的:(Python可视化,python,可视化,matplotlib)