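Py seaborn: bar charts, box plots (confidence-interval plots), scatter/line plots, kernel density/contour plots, and box/violin/LV plots in the seaborn data-visualization library: introduction and usage, the ultimate guide (worth bookmarking)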
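Introduction: Data visualization takes objective data as its subject and looks at the world from the data's point of view. Its goal is to describe reality and reveal the unknown: to find the thread in a vast sea of complex data, simplify it, and turn it into visible value that decision makers can understand in a short time, so they can decide more efficiently. It relies mainly on graphical means to convey and communicate information clearly and effectively. That does not mean a visualization has to be dry and dull in order to be functional, or extremely complex in order to look dazzling. To communicate ideas effectively, aesthetic form and function must go hand in hand: by conveying the key aspects and features intuitively, a visualization can give deep insight into datasets that are sparse yet complex.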
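Contents
I. How to choose a chart type?
II. The 11 standalone plotting functions in seaborn: introduction and usage of bar charts, box plots (confidence-interval plots), scatter/line plots, kernel density/contour plots, and box/violin/LV plots
1. countplot: bar chart (counts of each category)
2. catplot: bar charts, box plots (confidence intervals), scatter plots, violin plots, and more
(1) CatPlotByG
(2) CatPlotByHG
3. barplot: bar plot visualization
(1) BarPlot
(2) BarPlotByV
(3) BarPlotBy2V
4. pointplot: point estimates and confidence intervals (error bars)
5. stripplot: scatter plot visualization
6. relplot: scatter/line plot visualization
7. regplot: scatter plot with linear regression fit and confidence interval
(1) default
(2) Separate plot: fit the regression model on log(x) and truncate the model prediction
8. kdeplot: kernel density contour plot visualization
9. boxplot: box plot visualization
10. violinplot: violin plot visualization
11. boxenplot: LV (letter-value) plot visualization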
Related articles
Py seaborn: introduction, installation and usage of the seaborn library, a detailed guide
1. countplot: bar chart (counts of each category)
seaborn.countplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, dodge=True, ax=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.countplot.html?highlight=countplot#seaborn.countplot
Show the counts of observations in each categorical bin using bars. A count plot can be thought of as a histogram across a categorical, instead of quantitative, variable. The basic API and options are identical to those for barplot(), so you can compare counts across nested variables. Input data can be passed in a variety of formats, including: Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters. A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted. A “wide-form” DataFrame, such that each numeric column will be plotted. An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements.
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
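Here is a minimal countplot sketch. It assumes seaborn, pandas and matplotlib are installed. The DataFrame and its `day` column are made up for illustration. Variables are passed by keyword, as the `*` in the signature requires. The non-interactive Agg backend lets the script run without a display. Because countplot only tallies rows per category, the bar heights should match pandas `value_counts()`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
order = ["Thu", "Fri", "Sat", "Sun"]
# Hypothetical long-form data: a single categorical column with 200 rows
df = pd.DataFrame({"day": rng.choice(order, size=200)})

fig, ax = plt.subplots()
sns.countplot(data=df, x="day", order=order, ax=ax)  # one bar per category

# Each bar's height is the number of rows that fall in that category
heights = sorted(p.get_height() for p in ax.patches)
expected = sorted(df["day"].value_counts().tolist())
print(heights, expected)
plt.close(fig)
```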
2. catplot: bar charts, box plots (confidence intervals), scatter plots, violin plots, and more
seaborn.catplot(*, x=None, y=None, hue=None, data=None, row=None, col=None, col_wrap=None, estimator=<function mean>, ci=95, n_boot=1000, units=None, seed=None, order=None, hue_order=None, row_order=None, col_order=None, kind='strip', height=5, aspect=1, orient=None, color=None, palette=None, legend=True, legend_out=True, sharex=True, sharey=True, margin_titles=False, facet_kws=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.catplot.html?highlight=catplot#seaborn.catplot
Figure-level interface for drawing categorical plots onto a FacetGrid. This function provides access to several axes-level functions that show the relationship between a numerical and one or more categorical variables using one of several visual representations. The kind parameter selects the underlying axes-level function to use:
Categorical scatterplots: stripplot() (with kind="strip"; the default), swarmplot() (with kind="swarm")
Categorical distribution plots: boxplot() (with kind="box"), violinplot() (with kind="violin"), boxenplot() (with kind="boxen")
Categorical estimate plots: pointplot() (with kind="point"), barplot() (with kind="bar"), countplot() (with kind="count")
Extra keyword arguments are passed to the underlying function, so you should refer to the documentation for each to see kind-specific options.
Note that unlike when using the axes-level functions directly, data must be passed in a long-form DataFrame with variables specified by passing strings to x, y, hue, etc.
As in the case with the underlying plot functions, if variables have a categorical data type, the levels of the categorical variables and their order will be inferred from the objects. Otherwise you may have to alter the dataframe sorting or use the function parameters (orient, order, hue_order, etc.) to set up the plot correctly.
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
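This sketch shows the figure-level pattern on hypothetical random data; the `group`, `sex` and `value` columns are invented. It uses a long-form DataFrame with variables named by string. `kind` chooses the underlying axes-level function, and `col` splits the data into facets. Unlike the axes-level functions, catplot returns a FacetGrid rather than an Axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
n = 120
# Hypothetical long-form data: two categorical columns and one numeric column
df = pd.DataFrame({
    "group": rng.choice(["A", "B", "C"], size=n),
    "sex": rng.choice(["F", "M"], size=n),
    "value": rng.normal(size=n),
})

# kind selects the axes-level function: "strip" (default), "swarm", "box",
# "violin", "boxen", "point", "bar" or "count"; col= draws one facet per level
g = sns.catplot(data=df, x="group", y="value", col="sex", kind="box", height=3)
n_facets = g.axes.size  # one Axes per level of "sex"
print(type(g).__name__, n_facets)
plt.close("all")
```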
3. barplot: bar plot visualization
seaborn.barplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, estimator=<function mean>, ci=95, n_boot=1000, units=None, seed=None, orient=None, color=None, palette=None, saturation=0.75, errcolor='.26', errwidth=None, capsize=None, dodge=True, ax=None, **kwargs)
Note: only the second variable (the value axis) must be numeric.
A bar plot shows an estimate of the central tendency of a numeric variable (the mean by default) as the height of each rectangle, and uses error bars to indicate the uncertainty around that estimate. A longer error bar means a less precise estimate, which usually reflects more spread in the data, a smaller sample, or both.
Official documentation: http://seaborn.pydata.org/generated/seaborn.barplot.html?highlight=barplot#seaborn.barplot
Show point estimates and confidence intervals as rectangular bars. A bar plot represents an estimate of central tendency for a numeric variable with the height of each rectangle and provides some indication of the uncertainty around that estimate using error bars. Bar plots include 0 in the quantitative axis range, and they are a good choice when 0 is a meaningful value for the quantitative variable, and you want to make comparisons against it. For datasets where 0 is not a meaningful value, a point plot will allow you to focus on differences between levels of one or more categorical variables. It is also important to keep in mind that a bar plot shows only the mean (or other estimator) value, but in many cases it may be more informative to show the distribution of values at each level of the categorical variables. In that case, other approaches such as a box or violin plot may be more appropriate.
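Input data can be passed in a variety of formats, including: Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters. A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted. A “wide-form” DataFrame, such that each numeric column will be plotted. An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements. This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.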
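This barplot sketch uses made-up data with three groups whose true means are 1, 2 and 3. It relies on the defaults for `estimator` (the mean) and the 95% bootstrap interval. It deliberately does not pass `ci`, since later seaborn releases replace that parameter with `errorbar`. The heights read back from the bars should equal the per-group means computed with pandas.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(2)
# Hypothetical data: three groups with true means 1, 2 and 3
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 30),
    "value": np.concatenate([rng.normal(m, 1.0, 30) for m in (1.0, 2.0, 3.0)]),
})

fig, ax = plt.subplots()
# x is categorical; y (the second variable) must be numeric.
# Bar height = estimator (mean by default); error bar = bootstrap CI by default.
sns.barplot(data=df, x="group", y="value", capsize=0.1, ax=ax)

heights = sorted(p.get_height() for p in ax.patches)
means = sorted(df.groupby("group")["value"].mean())
print(np.round(heights, 3), np.round(means, 3))
plt.close(fig)
```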
4. pointplot: point estimates and confidence intervals (error bars)
seaborn.pointplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, estimator=<function mean>, ci=95, n_boot=1000, units=None, seed=None, markers='o', linestyles='-', dodge=False, join=True, scale=1, orient=None, color=None, palette=None, errwidth=None, capsize=None, ax=None, **kwargs)
Note: only the second variable (the value axis) must be numeric.
Confidence-interval estimation: each point in the plot is the mean of its group, the vertical line through it is the error bar, and by default the mean points are joined by lines.
Official documentation: http://seaborn.pydata.org/generated/seaborn.pointplot.html?highlight=pointplot#seaborn.pointplot
Show point estimates and confidence intervals using scatter plot glyphs. A point plot represents an estimate of central tendency for a numeric variable by the position of scatter plot points and provides some indication of the uncertainty around that estimate using error bars. Point plots can be more useful than bar plots for focusing comparisons between different levels of one or more categorical variables. They are particularly adept at showing interactions: how the relationship between levels of one categorical variable changes across levels of a second categorical variable. The lines that join each point from the same hue level allow interactions to be judged by differences in slope, which is easier for the eyes than comparing the heights of several groups of points or bars. It is important to keep in mind that a point plot shows only the mean (or other estimator) value, but in many cases it may be more informative to show the distribution of values at each level of the categorical variables. In that case, other approaches such as a box or violin plot may be more appropriate.
Input data can be passed in a variety of formats, including: Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters. A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted. A “wide-form” DataFrame, such that each numeric column will be plotted. An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements. This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
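This sketch illustrates the interaction point made above with a hypothetical 2x2 design, in which raising the dose helps drug Y more than drug X. With `hue="drug"`, each drug's points are joined, so the interaction appears as two lines with different slopes. The slopes are also recomputed from the group means with pandas. The data and column names are invented.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(3)
# Hypothetical 2x2 design: the effect of dose depends on the drug (an interaction)
dose = np.tile(np.repeat(["low", "high"], 25), 2)
drug = np.repeat(["X", "Y"], 50)
effect = {("X", "low"): 1.0, ("X", "high"): 2.0, ("Y", "low"): 1.0, ("Y", "high"): 4.0}
mu = np.array([effect[(d, s)] for d, s in zip(drug, dose)])
df = pd.DataFrame({"dose": dose, "drug": drug, "response": mu + rng.normal(0, 0.5, 100)})

fig, ax = plt.subplots()
# Points = group means, vertical lines = CIs; points of the same hue level are
# joined, so a difference in slope between the two lines signals an interaction
sns.pointplot(data=df, x="dose", y="response", hue="drug", order=["low", "high"],
              dodge=True, capsize=0.1, ax=ax)

means = df.groupby(["drug", "dose"])["response"].mean().unstack()
slopes = means["high"] - means["low"]  # change from low to high dose, per drug
print(slopes.round(2).to_dict())
plt.close(fig)
```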
5. stripplot: scatter plot visualization
seaborn.stripplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, jitter=True, dodge=False, orient=None, color=None, palette=None, size=5, edgecolor='gray', linewidth=0, ax=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.stripplot.html?highlight=stripplot#seaborn.stripplot
Draw a scatterplot where one variable is categorical. A strip plot can be drawn on its own, but it is also a good complement to a box or violin plot in cases where you want to show all observations along with some representation of the underlying distribution. Input data can be passed in a variety of formats, including: Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters. A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted. A “wide-form” DataFrame, such that each numeric column will be plotted. An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements.
This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
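This sketch shows the "complement to a box plot" use described above, on invented data. A box plot summarizes each group, and a jittered strip plot on the same Axes shows every observation. `fliersize=0` hides the box plot's own outlier markers so that no point is drawn twice. Counting the scatter offsets at the end only checks that every row was plotted.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(4)
# Hypothetical data: three groups of 40 observations with shifted means
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 40),
    "value": rng.normal(0, 1, 120) + np.repeat([0.0, 1.0, 2.0], 40),
})

fig, ax = plt.subplots()
# The box plot summarizes each distribution; the strip plot on top shows every point.
# jitter spreads points along the categorical axis so they overlap less.
sns.boxplot(data=df, x="group", y="value", color="white", fliersize=0, ax=ax)
sns.stripplot(data=df, x="group", y="value", jitter=0.2, size=3, ax=ax)

# The strip plot's points live in scatter collections; count them all
n_points = sum(len(c.get_offsets()) for c in ax.collections)
print(n_points)
plt.close(fig)
```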
6. relplot: scatter/line plot visualization
seaborn.relplot(*, x=None, y=None, hue=None, size=None, style=None, data=None, row=None, col=None, col_wrap=None, row_order=None, col_order=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=None, dashes=None, style_order=None, legend='auto', kind='scatter', height=5, aspect=1, facet_kws=None, units=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.relplot.html?highlight=relplot#seaborn.relplot
Figure-level interface for drawing relational plots onto a FacetGrid. This function provides access to several different axes-level functions that show the relationship between two variables with semantic mappings of subsets. The kind parameter selects the underlying axes-level function to use: scatterplot() (with kind="scatter"; the default) lineplot() (with kind="line") Extra keyword arguments are passed to the underlying function, so you should refer to the documentation for each to see kind-specific options. The relationship between x and y can be shown for different subsets of the data using the hue, size, and style parameters. These parameters control what visual semantics are used to identify the different subsets. It is possible to show up to three dimensions independently by using all three semantic types, but this style of plot can be hard to interpret and is often ineffective. Using redundant semantics (i.e. both hue and style for the same variable) can be helpful for making graphics more accessible. See the tutorial for more information.
The default treatment of the hue (and to a lesser extent, size) semantic, if present, depends on whether the variable is inferred to represent “numeric” or “categorical” data. In particular, numeric variables are represented with a sequential colormap by default, and the legend entries show regular “ticks” with values that may or may not exist in the data. This behavior can be controlled through various parameters, as described and illustrated below. After plotting, the FacetGrid with the plot is returned and can be used directly to tweak supporting plot details or add other layers.
7. regplot: scatter plot with linear regression fit and confidence interval
seaborn.regplot(*, x=None, y=None, data=None, x_estimator=None, x_bins=None, x_ci='ci', scatter=True, fit_reg=True, ci=95, n_boot=1000, units=None, seed=None, order=1, logistic=False, lowess=False, robust=False, logx=False, x_partial=None, y_partial=None, truncate=True, dropna=True, x_jitter=None, y_jitter=None, label=None, color=None, marker='o', scatter_kws=None, line_kws=None, ax=None)
Official documentation: http://seaborn.pydata.org/generated/seaborn.regplot.html?highlight=regplot#seaborn.regplot
Plot data and a linear regression model fit. There are a number of mutually exclusive options for estimating the regression model. See the tutorial for more information.
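The contents list two regplot cases: (1) the default fit and (2) a fit on log(x) with a truncated prediction. This sketch draws them side by side on synthetic data generated from y = 2*log(x) plus noise; the data are an assumption for illustration. With `logx=True` the model is fit as y ~ log(x) but drawn in the original x space. `truncate=True` stops the fitted line at the observed x range.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(6)
x = rng.uniform(1, 50, 100)                     # x must be positive for logx=True
y = 2.0 * np.log(x) + rng.normal(0, 0.3, 100)   # hypothetical data: y = 2*log(x) + noise
df = pd.DataFrame({"x": x, "y": y})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
# (1) default: scatter plot + straight-line fit + bootstrap confidence band
sns.regplot(data=df, x="x", y="y", ax=ax1)
# (2) fit y ~ log(x), drawn in the original x space; truncate=True limits the
#     line to the observed x range; ci=None skips the bootstrap band
sns.regplot(data=df, x="x", y="y", logx=True, truncate=True, ci=None, ax=ax2)

line_x, line_y = ax2.lines[0].get_data()  # the fitted curve of panel (2)
print(round(line_x.min(), 2), round(line_x.max(), 2))
plt.close(fig)
```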
Separately: x_estimator=np.mean,  # if x is discrete, plot the mean of y at each distinct x value instead of every observation
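This short sketch applies `x_estimator` to invented data in which x takes only five distinct values. Instead of plotting all 100 points, regplot draws one point per distinct x at the mean of the corresponding y values. The last lines recompute those means with NumPy as a check. `ci=None` is set only to skip the bootstrap.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(7)
x = np.repeat([1, 2, 3, 4, 5], 20)           # hypothetical discrete x, 5 distinct values
y = 1.5 * x + rng.normal(0, 1.0, x.size)

fig, ax = plt.subplots()
# One scatter point per distinct x, placed at the mean of y for that x;
# ci=None also disables the per-point intervals and the regression band
sns.regplot(x=x, y=y, x_estimator=np.mean, ci=None, ax=ax)

points = np.asarray(ax.collections[0].get_offsets())   # the 5 estimated points
group_means = [y[x == v].mean() for v in range(1, 6)]
print(np.round(points[:, 1], 2))
plt.close(fig)
```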
8. kdeplot: kernel density contour plot visualization
seaborn.kdeplot(x=None, *, y=None, shade=None, vertical=False, kernel=None, bw=None, gridsize=200, cut=3, clip=None, legend=True, cumulative=False, shade_lowest=None, cbar=False, cbar_ax=None, cbar_kws=None, ax=None, weights=None, hue=None, palette=None, hue_order=None, hue_norm=None, multiple='layer', common_norm=True, common_grid=False, levels=10, thresh=0.05, bw_method='scott', bw_adjust=1, log_scale=None, color=None, fill=None, data=None, data2=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.kdeplot.html?highlight=kdeplot#seaborn.kdeplot
Plot univariate or bivariate distributions using kernel density estimation. A kernel density estimate (KDE) plot is a method for visualizing the distribution of observations in a dataset, analogous to a histogram. KDE represents the data using a continuous probability density curve in one or more dimensions. The approach is explained further in the user guide.
9. boxplot: box plot visualization
seaborn.boxplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True, fliersize=5, linewidth=None, whis=1.5, ax=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.boxplot.html?highlight=boxplot#seaborn.boxplot
Draw a box plot to show distributions with respect to categories. A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that facilitates comparisons between variables or across levels of a categorical variable. The box shows the quartiles of the dataset while the whiskers extend to show the rest of the distribution, except for points that are determined to be “outliers” using a method that is a function of the inter-quartile range.
Input data can be passed in a variety of formats, including: Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters. A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted. A “wide-form” DataFrame, such that each numeric column will be plotted. An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements. This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
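The outlier rule described above flags points beyond whis times the inter-quartile range from the box. This sketch checks that rule against a drawn box plot. The data are invented, with a heavy-tailed t(3) sample so that some outliers appear. The fences are recomputed with pandas, and the outlier markers matplotlib drew are counted. Identifying those markers by their missing connecting line is an assumption about matplotlib's default flier styling.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(9)
# Hypothetical data: a normal sample and a heavy-tailed t(3) sample
df = pd.DataFrame({
    "group": np.repeat(["A", "B"], 100),
    "value": np.concatenate([rng.normal(0, 1, 100), rng.standard_t(3, 100)]),
})

fig, ax = plt.subplots()
# Box = quartiles (Q1, median, Q3); whiskers reach the most extreme points within
# whis * IQR of the box; anything beyond is drawn as an individual outlier marker
sns.boxplot(data=df, x="group", y="value", whis=1.5, ax=ax)

# Recompute the outlier fences by hand for each group
q = df.groupby("group")["value"].quantile([0.25, 0.75]).unstack()
iqr = q[0.75] - q[0.25]
low = df["group"].map(q[0.25] - 1.5 * iqr)
high = df["group"].map(q[0.75] + 1.5 * iqr)
n_outliers = int(((df["value"] < low) | (df["value"] > high)).sum())

# Outlier markers are Line2D objects drawn with no connecting line
n_flier_points = sum(len(line.get_xdata()) for line in ax.lines
                     if line.get_linestyle() == "None")
print(n_outliers, n_flier_points)
plt.close(fig)
```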
10. violinplot: violin plot visualization
seaborn.violinplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, bw='scott', cut=2, scale='area', scale_hue=True, gridsize=100, width=0.8, inner='box', split=False, dodge=True, orient=None, linewidth=None, color=None, palette=None, saturation=0.75, ax=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.violinplot.html?highlight=violinplot#seaborn.violinplot
Draw a combination of boxplot and kernel density estimate. A violin plot plays a similar role as a box and whisker plot. It shows the distribution of quantitative data across several levels of one (or more) categorical variables such that those distributions can be compared. Unlike a box plot, in which all of the plot components correspond to actual datapoints, the violin plot features a kernel density estimation of the underlying distribution. This can be an effective and attractive way to show multiple distributions of data at once, but keep in mind that the estimation procedure is influenced by the sample size, and violins for relatively small samples might look misleadingly smooth. Input data can be passed in a variety of formats, including: Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters. A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted. A “wide-form” DataFrame, such that each numeric column will be plotted. An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements. This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.
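split=True,  # whether to split each violin in half by hue; this requires the third (hue) variable to have exactly two levels, so try it on such data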
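This sketch demonstrates `split=True` on made-up data in which the hue variable `sex` has exactly two levels. Each violin is cut in half, with one KDE per hue level, which makes the two distributions easy to compare side by side. `inner="quart"` draws the quartiles inside each half. Column names and values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(10)
n = 200
# Hypothetical data; "sex" has exactly two levels, which split=True expects
df = pd.DataFrame({
    "day": rng.choice(["Sat", "Sun"], size=n),
    "sex": rng.choice(["Female", "Male"], size=n),
    "bill": rng.gamma(shape=4.0, scale=5.0, size=n),
})

fig, ax = plt.subplots()
# Each violin is a KDE of "bill"; split=True puts one hue level on each side,
# and inner="quart" marks the quartiles inside each half
sns.violinplot(data=df, x="day", y="bill", hue="sex", split=True, inner="quart", ax=ax)
n_halves = len(ax.collections)  # filled KDE halves (2 days x 2 hue levels)
print(n_halves)
plt.close(fig)
```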
11. boxenplot: LV (letter-value) plot visualization
seaborn.boxenplot(*, x=None, y=None, hue=None, data=None, order=None, hue_order=None, orient=None, color=None, palette=None, saturation=0.75, width=0.8, dodge=True, k_depth='tukey', linewidth=None, scale='exponential', outlier_prop=0.007, trust_alpha=0.05, showfliers=True, ax=None, **kwargs)
Official documentation: http://seaborn.pydata.org/generated/seaborn.boxenplot.html?highlight=boxenplot#seaborn.boxenplot
Draw an enhanced box plot for larger datasets. This style of plot was originally named a “letter value” plot because it shows a large number of quantiles that are defined as “letter values”. It is similar to a box plot in plotting a nonparametric representation of a distribution in which all features correspond to actual observations. By plotting more quantiles, it provides more information about the shape of the distribution, particularly in the tails. For a more extensive explanation, you can read the paper that introduced the plot: https://vita.had.co.nz/papers/letter-value-plot.html
Input data can be passed in a variety of formats, including: Vectors of data represented as lists, numpy arrays, or pandas Series objects passed directly to the x, y, and/or hue parameters. A “long-form” DataFrame, in which case the x, y, and hue variables will determine how the data are plotted. A “wide-form” DataFrame, such that each numeric column will be plotted. An array or list of vectors.
In most cases, it is possible to use numpy or Python objects, but pandas objects are preferable because the associated names will be used to annotate the axes. Additionally, you can use Categorical types for the grouping variables to control the order of plot elements. This function always treats one of the variables as categorical and draws data at ordinal positions (0, 1, … n) on the relevant axis, even when the data has a numeric or date type.