简介:
plt.hist():直方图,一种特殊的柱状图。
将统计值的范围分段,即将整个值的范围分成一系列间隔,然后计算每个间隔中有多少值。
直方图也可以被归一化以显示“相对”频率。 然后,它显示了属于几个类别中的每个类别的占比,其高度总和等于1。
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.pyplot import MultipleLocator
from matplotlib import ticker
%matplotlib inline
plt.hist(x, bins=None, range=None, density=None, weights=None, cumulative=False, bottom=None, histtype='bar', align='mid', orientation='vertical', rwidth=None, log=False, color=None, label=None, stacked=False, normed=None, *, data=None, **kwargs)
常用参数解释:
x: 作直方图所要用的数据,必须是一维数组;多维数组可以先进行扁平化再作图;必选参数;
bins: 直方图的柱数,即要分的组数,默认为10;
range:元组(tuple)或None;剔除较大和较小的离群值,给出全局范围;如果为None,则默认为(x.min(), x.max());即x轴的范围;
density:布尔值。如果为true,则返回的元组的第一个参数n将为频率而非默认的频数;
weights:与x形状相同的权重数组;将x中的每个元素乘以对应权重值再计数;如果normed或density取值为True,则会对权重进行归一化处理。这个参数可用于绘制已合并的数据的直方图;
cumulative:布尔值;如果为True,则计算累计频数;如果normed或density取值为True,则计算累计频率;
bottom:数组,标量值或None;每个柱子底部相对于y=0的位置。如果是标量值,则每个柱子相对于y=0向上/向下的偏移量相同。如果是数组,则根据数组元素取值移动对应的柱子;即直方图上下便宜距离;
histtype:{‘bar’, ‘barstacked’, ‘step’, ‘stepfilled’};'bar’是传统的条形直方图;'barstacked’是堆叠的条形直方图;'step’是未填充的条形直方图,只有外边框;‘stepfilled’是有填充的直方图;当histtype取值为’step’或’stepfilled’,rwidth设置失效,即不能指定柱子之间的间隔,默认连接在一起;
align:{‘left’, ‘mid’, ‘right’};‘left’:柱子的中心位于bins的左边缘;‘mid’:柱子位于bins左右边缘之间;‘right’:柱子的中心位于bins的右边缘;
orientation:{‘horizontal’, ‘vertical’}:如果取值为horizontal,则条形图将以y轴为基线,水平排列;简单理解为类似bar()转换成barh(),旋转90°;
rwidth:标量值或None。柱子的宽度占bins宽的比例;
log:布尔值。如果取值为True,则坐标轴的刻度为对数刻度;如果log为True且x是一维数组,则计数为0的取值将被剔除,仅返回非空的(frequency, bins, patches);
color:具体颜色,数组(元素为颜色)或None。
label:字符串(序列)或None;有多个数据集时,用label参数做标注区分;
stacked:布尔值。如果取值为True,则输出的图为多个数据集堆叠累计的结果;如果取值为False且histtype=‘bar’或’step’,则多个数据集的柱子并排排列;
normed: 是否将得到的直方图向量归一化,即显示占比,默认为0,不归一化;不推荐使用,建议改用density参数;
edgecolor: 直方图边框颜色;
alpha: 透明度;
返回值(用参数接收返回值,便于设置数据标签):
n:直方图向量,即每个分组下的统计值,是否归一化由参数normed设定。当normed取默认值时,n即为直方图各组内元素的数量(各组频数);
bins: 返回各个bin的区间范围;
patches:返回每个bin里面包含的数据,是一个list。
其他参数与plt.bar()类似。
import matplotlib.pyplot as plt
%matplotlib inline
# 最简单,只传递x,组数,宽度,范围
plt.hist(data13['carrier_no'], bins=11, rwidth=0.8, range=(1,12), align='left')
plt.show()
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.pyplot import MultipleLocator
from matplotlib import ticker
%matplotlib inline
plt.figure(figsize=(8,5), dpi=80)
# 拿参数接收hist返回值,主要用于记录分组返回的值,标记数据标签
n, bins, patches = plt.hist(data13['carrier_no'], bins=11, rwidth=0.8, range=(1,12), align='left', label='xx直方图')
for i in range(len(n)):
plt.text(bins[i], n[i]*1.02, int(n[i]), fontsize=12, horizontalalignment="center") #打标签,在合适的位置标注每个直方图上面样本数
plt.ylim(0,16000)
plt.title('直方图')
plt.legend()
# plt.savefig('直方图'+'.png')
plt.show()
Parameters
----------
x : (n,) array or sequence of (n,) arrays
Input values, this takes either a single array or a sequence of
arrays which are not required to be of the same length.
bins : int or sequence or str, optional
If an integer is given, ``bins + 1`` bin edges are calculated and
returned, consistent with `numpy.histogram`.
If `bins` is a sequence, gives bin edges, including left edge of
first bin and right edge of last bin. In this case, `bins` is
returned unmodified.
All but the last (righthand-most) bin is half-open. In other
words, if `bins` is::
[1, 2, 3, 4]
then the first bin is ``[1, 2)`` (including 1, but excluding 2) and
the second ``[2, 3)``. The last bin, however, is ``[3, 4]``, which
*includes* 4.
Unequally spaced bins are supported if *bins* is a sequence.
With Numpy 1.11 or newer, you can alternatively provide a string
describing a binning strategy, such as 'auto', 'sturges', 'fd',
'doane', 'scott', 'rice' or 'sqrt', see
`numpy.histogram`.
The default is taken from :rc:`hist.bins`.
range : tuple or None, optional
The lower and upper range of the bins. Lower and upper outliers
are ignored. If not provided, *range* is ``(x.min(), x.max())``.
Range has no effect if *bins* is a sequence.
If *bins* is a sequence or *range* is specified, autoscaling
is based on the specified bin range instead of the
range of x.
Default is ``None``
density : bool, optional
If ``True``, the first element of the return tuple will
be the counts normalized to form a probability density, i.e.,
the area (or integral) under the histogram will sum to 1.
This is achieved by dividing the count by the number of
observations times the bin width and not dividing by the total
number of observations. If *stacked* is also ``True``, the sum of
the histograms is normalized to 1.
Default is ``None`` for both *normed* and *density*. If either is
set, then that value will be used. If neither are set, then the
args will be treated as ``False``.
If both *density* and *normed* are set an error is raised.
weights : (n, ) array_like or None, optional
An array of weights, of the same shape as *x*. Each value in *x*
only contributes its associated weight towards the bin count
(instead of 1). If *normed* or *density* is ``True``,
the weights are normalized, so that the integral of the density
over the range remains 1.
Default is ``None``.
This parameter can be used to draw a histogram of data that has
already been binned, e.g. using `np.histogram` (by treating each
bin as a single point with a weight equal to its count) ::
counts, bins = np.histogram(data)
plt.hist(bins[:-1], bins, weights=counts)
(or you may alternatively use `~.bar()`).
cumulative : bool, optional
If ``True``, then a histogram is computed where each bin gives the
counts in that bin plus all bins for smaller values. The last bin
gives the total number of datapoints. If *normed* or *density*
is also ``True`` then the histogram is normalized such that the
last bin equals 1. If *cumulative* evaluates to less than 0
(e.g., -1), the direction of accumulation is reversed.
In this case, if *normed* and/or *density* is also ``True``, then
the histogram is normalized such that the first bin equals 1.
Default is ``False``
bottom : array_like, scalar, or None
Location of the bottom baseline of each bin. If a scalar,
the base line for each bin is shifted by the same amount.
If an array, each bin is shifted independently and the length
of bottom must match the number of bins. If None, defaults to 0.
Default is ``None``
histtype : {'bar', 'barstacked', 'step', 'stepfilled'}, optional
The type of histogram to draw.
- 'bar' is a traditional bar-type histogram. If multiple data
are given the bars are arranged side by side.
- 'barstacked' is a bar-type histogram where multiple
data are stacked on top of each other.
- 'step' generates a lineplot that is by default
unfilled.
- 'stepfilled' generates a lineplot that is by default
filled.
Default is 'bar'
align : {'left', 'mid', 'right'}, optional
Controls how the histogram is plotted.
- 'left': bars are centered on the left bin edges.
- 'mid': bars are centered between the bin edges.
- 'right': bars are centered on the right bin edges.
Default is 'mid'
orientation : {'horizontal', 'vertical'}, optional
If 'horizontal', `~matplotlib.pyplot.barh` will be used for
bar-type histograms and the *bottom* kwarg will be the left edges.
rwidth : scalar or None, optional
The relative width of the bars as a fraction of the bin width. If
``None``, automatically compute the width.
Ignored if *histtype* is 'step' or 'stepfilled'.
Default is ``None``
log : bool, optional
If ``True``, the histogram axis will be set to a log scale. If
*log* is ``True`` and *x* is a 1D array, empty bins will be
filtered out and only the non-empty ``(n, bins, patches)``
will be returned.
Default is ``False``
color : color or array_like of colors or None, optional
Color spec or sequence of color specs, one per dataset. Default
(``None``) uses the standard line color sequence.
Default is ``None``
label : str or None, optional
String, or sequence of strings to match multiple datasets. Bar
charts yield multiple patches per dataset, but only the first gets
the label, so that the legend command will work as expected.
default is ``None``
stacked : bool, optional
If ``True``, multiple data are stacked on top of each other If
``False`` multiple data are arranged side by side if histtype is
'bar' or on top of each other if histtype is 'step'
Default is ``False``
normed : bool, optional
Deprecated; use the density keyword argument instead.
Returns
-------
n : array or list of arrays
The values of the histogram bins. See *density* and *weights* for a
description of the possible semantics. If input *x* is an array,
then this is an array of length *nbins*. If input is a sequence of
arrays ``[data1, data2,..]``, then this is a list of arrays with
the values of the histograms for each of the arrays in the same
order. The dtype of the array *n* (or of its element arrays) will
always be float even if no weighting or normalization is used.
bins : array
The edges of the bins. Length nbins + 1 (nbins left edges and right
edge of last bin). Always a single array even when multiple data
sets are passed in.
patches : list or list of lists
Silent list of individual patches used to create the histogram
or list of such list if multiple input datasets.
Other Parameters
----------------
**kwargs : `~matplotlib.patches.Patch` properties
See also
--------
hist2d : 2D histograms
Notes
-----
.. note::
In addition to the above described arguments, this function can take a
**data** keyword argument. If such a **data** argument is given, the
following arguments are replaced by **data[]**:
* All arguments with the following names: 'weights', 'x'.
Objects passed as **data** must support item access (``data[]``) and
membership test (`` in data``).