Data Analysis with pandas (Part 4): Commonly Used Functions

I. Summary statistics

1. pandas.DataFrame.describe

DataFrame.describe(percentiles=None, include=None, exclude=None)

Purpose:
Generates concise summary statistics, excluding NaN values.

Parameters:
percentiles : array-like, optional
The percentiles to include in the output. Should all be in the interval [0, 1]. By default percentiles is [.25, .5, .75], returning the 25th, 50th, and 75th percentiles.
include, exclude : list-like, ‘all’, or None (default)
Specify the form of the returned result. Either:
None to both (default). The result will include only numeric-typed columns or, if none are, only categorical columns.
A list of dtypes or strings to be included/excluded. To select all numeric types use numpy.number. To select categorical objects use type object. See also the select_dtypes documentation. e.g. df.describe(include=['O'])
If include is the string ‘all’, the output column-set will match the input one.
Returns:
summary: NDFrame of summary statistics
See also DataFrame.select_dtypes
Notes

The output DataFrame index depends on the requested dtypes:

For numeric dtypes, it will include: count, mean, std, min, max, and the lower, 50th, and upper percentiles (the 25th, 50th, and 75th by default).

For object dtypes (e.g. timestamps or strings), the index will include count, unique, top (the most common value), and freq (the frequency of the most common value). Timestamps also include the first and last items.

For mixed dtypes, the index will be the union of the corresponding output types. Non-applicable entries will be filled with NaN. Note that mixed-dtype outputs can only be returned from mixed-dtype inputs and appropriate use of the include/exclude arguments.

If multiple values have the highest count, then the count and most common pair will be arbitrarily chosen from among those with the highest count.

The include, exclude arguments are ignored for Series.
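
The parameters above can be tried out on a small table. The following is a minimal sketch, not taken from the original post: the DataFrame and its column names (price, qty, city) are made up for illustration, and the city column is forced to object dtype so that include=['O'] behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two numeric columns (one with a NaN) and one object column.
df = pd.DataFrame({
    "price": [10.0, 12.5, np.nan, 9.0],
    "qty": [1, 3, 2, 5],
    "city": pd.Series(["Oslo", "Rome", "Oslo", "Lima"], dtype=object),
})

# Default call: only numeric columns are summarized, and NaN is excluded from count.
numeric_summary = df.describe()

# Custom percentiles, given as fractions in the interval [0, 1].
custom_summary = df.describe(percentiles=[0.1, 0.9])

# Only object-typed columns: the index becomes count, unique, top, freq.
object_summary = df.describe(include=["O"])

# include='all' keeps every input column; non-applicable entries are NaN.
full_summary = df.describe(include="all")

print(numeric_summary)
print(object_summary)
```

Note that count for price is 3 rather than 4, because the NaN value is left out of the statistics.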

III. Plotting

1. pandas.DataFrame.hist

Draws histograms of a DataFrame's columns using matplotlib. One subplot is drawn per column.
DataFrame.hist(data, column=None, by=None, grid=True, xlabelsize=None, xrot=None, ylabelsize=None, yrot=None, ax=None, sharex=False, sharey=False, figsize=None, layout=None, bins=10, **kwds)

Parameters:
data : DataFrame
column : string or sequence. If passed, histograms are drawn only for the specified columns.
by : object, optional
If passed, then used to form histograms for separate groups
grid : boolean, default True. Whether to show axis grid lines.
xlabelsize : int, default None
If specified changes the x-axis label size
xrot : float, default None
rotation of x axis labels
ylabelsize : int, default None
If specified changes the y-axis label size
yrot : float, default None
rotation of y axis labels
ax : matplotlib axes object, default None
sharex : boolean, default True if ax is None else False
In case subplots=True, share x axis and set some x axis labels to invisible; defaults to True if ax is None otherwise False if an ax is passed in; Be aware, that passing in both an ax and sharex=True will alter all x axis labels for all subplots in a figure!
sharey : boolean, default False
In case subplots=True, share y axis and set some y axis labels to invisible
figsize : tuple
The size of the figure to create in inches by default
layout : tuple, optional. A tuple (rows, columns) for the layout of the histograms.
bins : integer, default 10. The number of bins (bars) in each histogram.
kwds : other plotting keyword arguments
To be passed to hist function
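
As a rough illustration of the parameters above, here is a minimal sketch. The data and the column names (height, weight) are invented for the example, and the Agg backend is selected only as an assumption so that the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless

import numpy as np
import pandas as pd

# Hypothetical sample data: two numeric columns of random values.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "height": rng.normal(170, 10, 200),
    "weight": rng.normal(65, 12, 200),
})

# One subplot per column, arranged in 1 row and 2 columns, 20 bins each.
axes = df.hist(bins=20, grid=True, figsize=(8, 4), layout=(1, 2))

# Restrict the plot to a single column and use 5 bins.
height_axes = df.hist(column="height", bins=5)

# In an interactive session, matplotlib.pyplot.show() would display the figures.
```

The call returns an array of matplotlib axes, so individual subplots can still be adjusted afterwards (titles, limits and so on).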
