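Scatter-based charts show how the data are distributed within each category;
Error bars or connecting lines can be added to them;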
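Jitter scatter chart (jitter plot)
The Y value of every data point in a category is left unchanged;
The X value of each point is generated at random within a fixed range around the center line of its X-axis category label;
The resulting points are then drawn as a scatter plot;
Main plotting parameter of a jitter scatter chart:
the jitter range of the data points;
Because the X values are generated at random, points can easily overlap, which makes the distribution harder to read;
The geom_jitter() function in plotnine draws a jitter scatter chart;
Key parameter: position=position_jitter(width=None), where width sets the horizontal (left/right) jitter range; geom_jitter() also accepts width directly;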
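The jitter rule above (Y unchanged, X drawn at random around the category center line) can be sketched in plain Python. This is a minimal illustration, not plotnine's implementation: the helper name `jitter_positions`, the integer center lines 0, 1, 2, ... and the uniform noise are assumptions made for the sketch.

```python
import random

def jitter_positions(categories, width=0.3, seed=0):
    """Return one x coordinate per point: category center plus uniform noise.

    `width` is taken here as the maximum left/right offset from the center
    line (an assumption of this sketch; check your plotting library's docs).
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    # Map each category label to an integer center line, in first-seen order.
    centers = {label: i for i, label in enumerate(dict.fromkeys(categories))}
    return [centers[c] + rng.uniform(-width, width) for c in categories]

classes = ["n", "n", "s", "s", "k"]
values = [1.2, 1.2, 3.4, 3.5, 2.0]  # y values are plotted unchanged
xs = jitter_positions(classes, width=0.3)
for c, x, y in zip(classes, xs, values):
    print(f"{c}: x={x:.3f}, y={y}")
```

The two identical values in class "n" receive different x positions, which is what makes them distinguishable, although nothing prevents two random offsets from landing close together.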
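Beeswarm chart (swarm plot)
The data points of each category spread out to both sides of the center line of the X-axis category label;
at the same time they fan out upward evenly and symmetrically, which looks tidy and makes the distribution easy to read;
Drawn with the swarmplot() function in Seaborn;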
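To make the idea of spreading points sideways without overlap concrete, here is a deliberately simplified placement routine. It is not Seaborn's algorithm: the function name `simple_swarm`, the half-diameter search step, and the assumption that x and y share the same units are all choices made for this sketch.

```python
import math

def simple_swarm(values, diameter=0.1):
    """Assign an x offset to each value so that no two markers overlap."""
    placed = []       # (x, y) of markers already positioned
    x_by_index = {}
    # Place points from the lowest value upward.
    for idx in sorted(range(len(values)), key=lambda i: values[i]):
        y = values[idx]
        step = 0
        while True:
            # Try the center line first, then right/left, moving outward.
            if step == 0:
                candidates = [0.0]
            else:
                candidates = [step * diameter / 2, -step * diameter / 2]
            free = [x for x in candidates
                    if all(math.hypot(x - px, y - py) >= diameter
                           for px, py in placed)]
            if free:
                x_by_index[idx] = free[0]
                placed.append((free[0], y))
                break
            step += 1
    return [x_by_index[i] for i in range(len(values))]

values = [1.0, 1.0, 1.0, 1.05, 2.0]
offsets = simple_swarm(values, diameter=0.1)
for v, x in zip(values, offsets):
    print(f"y={v}: x offset={x:+.2f}")
```

Isolated points stay on the center line, while crowded values are pushed outward on alternating sides, which produces the roughly symmetric shape described above.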
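Dot plot
The data points of each category spread out evenly and symmetrically to both sides of the center line of the X-axis category label;
The result looks tidy and makes the distribution easy to read;
The geom_dotplot() function in the plotnine package draws a dot plot;
Main parameters:
bin width (binwidth);
binning axis (binaxis): along the X axis or the Y axis;
stacking direction of the dots (stackdir): "up" (default), "down" or "center";
dot size (dotsize);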
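The roles of binwidth and stackdir can be illustrated with a small layout routine for the binaxis="y" case. This sketch uses fixed-width bins, which is an assumption and may differ from the binning method geom_dotplot() applies by default; the function name `dotplot_layout` and the choice of one dot width as the x unit are also hypothetical.

```python
from collections import defaultdict

def dotplot_layout(values, binwidth=0.15, stackdir="center"):
    """Return (x_offset, y) per dot: bin along y, then stack along x.

    x offsets are expressed in units of one dot width.
    """
    lo = min(values)
    bins = defaultdict(list)
    for v in values:
        bins[int((v - lo) // binwidth)].append(v)

    layout = []
    for b in sorted(bins):
        n = len(bins[b])
        y = lo + (b + 0.5) * binwidth   # every dot in a bin sits at the bin center
        for k in range(n):
            if stackdir == "center":
                x = k - (n - 1) / 2     # symmetric about the center line
            elif stackdir == "up":
                x = float(k)            # stack to one side only
            else:                       # "down"
                x = -float(k)
            layout.append((x, y))
    return layout

values = [1.0, 1.05, 1.1, 1.5]
layout = dotplot_layout(values, binwidth=0.15, stackdir="center")
for x, y in layout:
    print(f"x offset={x:+.1f}, y={y:.3f}")
```

Unlike jittering, the dots in one bin share a single y position, so the chart trades exact values for a regular, histogram-like shape.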
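Jitter scatter chart + scatter chart with error bars
First draw a scatter plot directly from the data of each category;
Then add the mean and an error bar (standard deviation) for each category;
mean ± standard deviation
A dot plot can be used as the background instead, so the data distribution stays visible;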
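Connected scatter chart with error bars
The points are joined by a line; the X variable is a continuous time variable;
Joining the data points with a line shows how the values change and the overall trend;
The pandas groupby() and aggregate() functions compute the mean and standard deviation of each category;
The plotnine geom_point() and geom_errorbar() functions draw the mean points and their error bars;
geom_line() connects the points; it draws straight segments between them, so a smooth curve requires interpolating the data first;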
First draw a jitter scatter chart or a dot plot with geom_jitter() or geom_dotplot();
Then add the error bars and mean points;
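The per-category mean and standard deviation behind the mean points and error bars can be sketched with the standard library alone. The helper name `mean_sd_by_group` and the sample rows are invented for illustration; with pandas, groupby() followed by aggregate() with mean and std should give the same numbers, since pandas' std also uses the sample (n - 1) formula by default.

```python
from collections import defaultdict
from statistics import mean, stdev

def mean_sd_by_group(rows):
    """rows: iterable of (group, value). Returns {group: (mean, ymin, ymax)}."""
    groups = defaultdict(list)
    for g, v in rows:
        groups[g].append(v)
    summary = {}
    for g, vals in groups.items():
        m = mean(vals)
        sd = stdev(vals)                  # sample SD; needs at least 2 values
        summary[g] = (m, m - sd, m + sd)  # error bar spans mean ± 1 SD
    return summary

# Hypothetical data: the group key could be a category or a time point.
rows = [(1, 2.0), (1, 4.0), (2, 3.0), (2, 5.0), (2, 7.0)]
summary = mean_sd_by_group(rows)
for g, (m, lo, hi) in sorted(summary.items()):
    print(f"group {g}: mean={m:.2f}, error bar=[{lo:.2f}, {hi:.2f}]")
```

The three numbers per group correspond to the y, ymin and ymax values that a mean point and its error bar need.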
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from plotnine import *
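df=pd.read_csv(r'd:\python\out\DistributionD.csv')  # raw string keeps the backslashes literal
# Fix the display order of the categories
df['class']=pd.Categorical(df['class'], categories=["n", "s", "k", "mm"])
# Jitter scatter chart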
jitter_plot=(ggplot(df,aes(x='class',y="value",fill="class"))
+geom_jitter(width=0.3,size=3,stroke=0.1,show_legend=False)
+scale_fill_hue(s = 0.90, l = 0.65, h=0.0417,color_space='husl')
+theme_matplotlib()
+theme(#legend_position='none',
aspect_ratio =1.05,
dpi=100,
figure_size=(4,4)))
print(jitter_plot)
jitter_plot.save("jitter_plot.pdf")
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from plotnine import *
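df=pd.read_csv(r'd:\python\out\DistributionD.csv')  # raw string keeps the backslashes literal
# Fix the display order of the categories
df['class']=pd.Categorical(df['class'], categories=["n", "s", "k", "mm"])
# Beeswarm chart
sns.set_palette("husl") # set the color palette used for the plot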
fig = plt.figure(figsize=(4,4), dpi=100)
sns.swarmplot(x="class", y="value",hue="class", data=df,edgecolor='k',linewidth=0.2)
plt.legend().set_visible(False)
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from plotnine import *
df=pd.read_csv(r'd:\python\out\DistributionD.csv')  # raw string keeps the backslashes literal
# Fix the display order of the categories
df['class']=pd.Categorical(df['class'], categories=["n", "s", "k", "mm"])
# Dot plot
dot_plot=(ggplot(df,aes(x='class',y="value",fill="class"))
+geom_dotplot(binaxis = "y",stackdir ='center',
binwidth=0.15,show_legend=False)
+scale_fill_hue(s = 0.90, l = 0.65, h=0.0417,color_space='husl')
+theme_matplotlib()
+theme(#legend_position='none',
aspect_ratio =1.05,
dpi=100,
figure_size=(4,4)))
print(dot_plot)
dot_plot.save("dot_plot.pdf")
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from plotnine import *
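df=pd.read_csv(r'd:\python\out\DistributionD.csv')  # raw string keeps the backslashes literal
# Fix the display order of the categories
df['class']=pd.Categorical(df['class'], categories=["n", "s", "k", "mm"])
# Jitter scatter chart with mean points and error bars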
jitter_plot=(ggplot(df,aes(x='class',y="value",fill="class"))
+geom_jitter(width=0.3,size=3,stroke=0.1,show_legend=False)
+stat_summary(fun_data="mean_sdl", fun_args = { 'mult':1},geom="pointrange", color = "black",size = 1,show_legend=False)
#+stat_summary(fun_data="mean_sdl", fun_args = {'mult':1},geom="point", fill="w",color = "black",size = 5,stroke=1,show_legend=False)
+geom_point(stat="summary", fun_data="mean_sdl",fun_args = { 'mult':1},fill="w",color = "black",size = 5,stroke=1,show_legend=False)
+scale_fill_hue(s = 0.90, l = 0.65, h=0.0417,color_space='husl')
+theme_matplotlib()
+theme(#legend_position='none',
aspect_ratio =1.05,
dpi=100,
figure_size=(4,4)))
print(jitter_plot)
jitter_plot.save("jitter_plot2.pdf")
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from plotnine import *
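df=pd.read_csv(r'd:\python\out\DistributionD.csv')  # raw string keeps the backslashes literal
# Fix the display order of the categories
df['class']=pd.Categorical(df['class'], categories=["n", "s", "k", "mm"])
# Dot plot combined with mean points and error bars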
dot_plot2=(ggplot(df,aes(x='class',y="value",fill="class"))
+geom_dotplot(binaxis = "y",stackdir ='center',
binwidth=0.15,show_legend=False)
+stat_summary(fun_data="mean_sdl", fun_args = { 'mult':1},geom="pointrange", color = "black",size = 1,show_legend=False)
+stat_summary(fun_data="mean_sdl", fun_args = { 'mult':1},geom="point", fill="w",color = "black",size = 5,stroke=1,show_legend=False)
+scale_fill_hue(s = 0.90, l = 0.65, h=0.0417,color_space='husl')
+theme_matplotlib()
+theme(#legend_position='none',
aspect_ratio =1.05,
dpi=100,
figure_size=(4,4)))
print(dot_plot2)
dot_plot2.save("dot_plot2.pdf")