Python Data Distribution Charts: The Scatter-Based Distribution Chart Series

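Scatter-based distribution chart series

Scatter distribution charts

  • Show how the data are distributed by plotting the individual observations as a scatter plot;

  • Optionally add error bars or connecting lines;

Common forms of scatter distribution charts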

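  1. Jitter scatter plot (jitter scatter chart)

  • The Y value of every data point in each category is left unchanged;

  • The X value of each point is generated at random within a fixed range around the centre line of its category label on the X-axis;

  • The points are then drawn as a scatter plot;

  • Main considerations when drawing a jitter scatter plot:

    • The jitter range of the data points;

    • Because the X values are generated at random, points can easily overlap, which makes the distribution harder to read;

    • In plotnine, the geom_jitter() function draws a jitter scatter plot;

    • The key parameter is position=position_jitter(width=None), where width sets the horizontal jitter range to the left and right of the centre line;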
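
The jitter step described above can be sketched in plain Python. This is only an illustration of the idea, not plotnine's implementation: the helper name jitter_x, the sample data, and the choice to number the categories from 0 are all assumptions made for the example.

```python
import random

def jitter_x(categories, order, width=0.3, seed=0):
    """Return a jittered x position for every observation.

    Each category sits on an integer centre line (its index in `order`);
    the x value is drawn uniformly from [centre - width, centre + width].
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    centre = {name: i for i, name in enumerate(order)}
    return [centre[c] + rng.uniform(-width, width) for c in categories]

# Hypothetical observations: the y values are never modified
classes = ["n", "n", "s", "k", "mm", "mm"]
values = [1.2, 0.8, 2.5, 3.1, 0.4, 0.9]

xs = jitter_x(classes, order=["n", "s", "k", "mm"], width=0.3)
points = list(zip(xs, values))  # (x, y) pairs ready to be drawn as a scatter plot
```

A larger width separates overlapping points more, but if it approaches 0.5 the clouds of neighbouring categories start to touch.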

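  2. Beeswarm plot (bee swarm chart)

  • The points of each category spread outwards to both sides of the category centre line on the X-axis;

  • At the same time they stack upwards evenly and symmetrically, which looks tidy and makes the distribution easy to read;

  • Drawn with the swarmplot() function in Seaborn;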
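
To make the layout idea concrete, here is a deliberately simple placement routine. It is a sketch under stated assumptions and not the algorithm Seaborn uses: the function name beeswarm_offsets, the quarter-diameter search step, and the sample values are all hypothetical. Points are placed from the lowest value upwards, and each one takes the offset closest to the centre line that does not collide with a point already placed.

```python
import math

def beeswarm_offsets(values, diameter):
    """Give each value an x offset so that circles of `diameter` do not overlap."""
    step = diameter / 4   # resolution of the candidate offsets (an arbitrary choice)
    placed = []           # (x, y) of the points positioned so far
    offsets = [0.0] * len(values)
    # Place points from the lowest value upwards
    for i in sorted(range(len(values)), key=lambda j: values[j]):
        y = values[i]
        k = 0
        while True:
            # Candidates alternate around the centre line: 0, +s, -s, +2s, -2s, ...
            candidates = [0.0] if k == 0 else [k * step, -k * step]
            free = [x for x in candidates
                    if all(math.hypot(x - px, y - py) >= diameter
                           for px, py in placed)]
            if free:
                offsets[i] = free[0]
                placed.append((free[0], y))
                break
            k += 1
    return offsets

# Hypothetical values for one category; three of them are identical
values = [1.0, 1.0, 1.0, 1.05, 2.0]
offsets = beeswarm_offsets(values, diameter=0.2)
```

Unlike jittering, the offsets are deterministic, and dense regions become visibly wider because every point needs its own space.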

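  3. Dot plot

  • The points of each category spread out evenly and symmetrically to both sides of the category centre line on the X-axis;

  • The result looks tidy and makes the distribution easy to read;

  • In plotnine, the geom_dotplot() function draws a dot plot;

  • Main parameters:

    • Bin width (binwidth);

    • The axis along which the data are binned (binaxis), either the X-axis or the Y-axis;

    • The stacking direction of the dots (stackdir): "up" (default), "down" or "center";

    • Dot size (dotsize);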
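
The roles of binwidth and stackdir can be illustrated with a small stand-alone function. This is a simplified sketch with fixed-width bins, not the binning plotnine performs internally; the function name dotplot_positions and the sample values are assumptions. Values are binned along the Y-axis, every dot in a bin is drawn at the bin centre, and the dots are then stacked along X, with offsets measured in dot diameters.

```python
from collections import defaultdict

def dotplot_positions(values, binwidth, stackdir="center"):
    """Return (x_offset, y) for each dot; x offsets are in units of one dot diameter."""
    bins = defaultdict(list)
    for v in values:
        bins[int(v // binwidth)].append(v)  # index of the fixed-width bin

    positions = []
    for b, members in sorted(bins.items()):
        y = (b + 0.5) * binwidth            # all dots in a bin share the bin centre
        n = len(members)
        for k in range(n):
            if stackdir == "up":
                x = float(k)                # stack to the right of the centre line
            elif stackdir == "down":
                x = float(-k)               # stack to the left of the centre line
            else:
                x = k - (n - 1) / 2         # "center": symmetric about the centre line
            positions.append((x, y))
    return positions

# Hypothetical values for one category
positions = dotplot_positions([0.1, 0.12, 0.4], binwidth=0.15)
```

A smaller binwidth keeps the dots closer to their true values but produces shorter stacks; a larger one gives a smoother but coarser picture.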

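  4. Jitter scatter plot combined with a scatter plot with error bars

  • First draw the raw data of each category as a scatter plot;

  • Then add the mean of each category together with an error bar (standard deviation);

  • That is, mean ± standard deviation;

  • Using a dot plot as the background layer also shows how the data are distributed;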
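
The summary layer reduces each category to three numbers: the mean and the two ends of the error bar. The sketch below computes them with the standard library only. The helper name mean_sd_interval, the mult argument, the use of the sample standard deviation, and the data are assumptions for illustration.

```python
from statistics import mean, stdev

def mean_sd_interval(values, mult=1):
    """Return the mean and the interval mean ± mult * SD for one category."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    return {"y": m, "ymin": m - mult * s, "ymax": m + mult * s}

# Hypothetical observations grouped by category
groups = {
    "n": [1.0, 1.2, 0.8],
    "s": [2.0, 2.5, 3.0],
}

summary = {name: mean_sd_interval(vals) for name, vals in groups.items()}
```

Each entry of summary supplies the position of one mean point (y) and the extent of its error bar (ymin to ymax), which is drawn on top of the raw points.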

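  5. Scatter plot with error bars and connecting lines

  • The points are joined by a line; the X variable is a continuous time variable;

  • Connecting the data points shows how the values change and the overall trend;

  • The groupby() and aggregate() functions in pandas group the data and compute the mean and standard deviation of each group;

  • The geom_point() and geom_errorbar() functions in plotnine draw the mean points and the corresponding error bars;

  • The geom_line() function draws the line connecting the points (straight segments between consecutive points rather than a smoothed curve);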

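How to draw the scatter distribution chart series

  • First draw a jitter scatter plot or a dot plot with geom_jitter() or geom_dotplot();

  • Then add the error bars and the mean points;

Jitter scatter plot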

import pandas as pd
from plotnine import *

# Raw string, so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r'd:\python\out\DistributionD.csv')
# Fix the order in which the categories appear on the X-axis
df['class'] = pd.Categorical(df['class'], categories=["n", "s", "k", "mm"], ordered=True)

# Jitter scatter plot
jitter_plot = (ggplot(df, aes(x='class', y="value", fill="class"))
    + geom_jitter(width=0.3, size=3, stroke=0.1, show_legend=False)
    + scale_fill_hue(s=0.90, l=0.65, h=0.0417, color_space='husl')
    + theme_matplotlib()
    + theme(  # legend_position='none',
        aspect_ratio=1.05,
        dpi=100,
        figure_size=(4, 4)))

print(jitter_plot)
jitter_plot.save("jitter_plot.pdf")

Beeswarm plot

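import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Raw string, so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r'd:\python\out\DistributionD.csv')
# Fix the order in which the categories appear on the X-axis
df['class'] = pd.Categorical(df['class'], categories=["n", "s", "k", "mm"], ordered=True)

# Beeswarm plot
sns.set_palette("husl")  # set the colour palette used for plotting
fig = plt.figure(figsize=(4, 4), dpi=100)
sns.swarmplot(x="class", y="value", hue="class", data=df, edgecolor='k', linewidth=0.2)
plt.legend().set_visible(False)  # hue duplicates the X-axis, so hide the legend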

Dot plot

import pandas as pd
from plotnine import *

# Raw string, so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r'd:\python\out\DistributionD.csv')
# Fix the order in which the categories appear on the X-axis
df['class'] = pd.Categorical(df['class'], categories=["n", "s", "k", "mm"], ordered=True)

# Dot plot: bin along the Y-axis and stack the dots symmetrically
dot_plot = (ggplot(df, aes(x='class', y="value", fill="class"))
    + geom_dotplot(binaxis="y", stackdir='center',
                   binwidth=0.15, show_legend=False)
    + scale_fill_hue(s=0.90, l=0.65, h=0.0417, color_space='husl')
    + theme_matplotlib()
    + theme(  # legend_position='none',
        aspect_ratio=1.05,
        dpi=100,
        figure_size=(4, 4)))

print(dot_plot)
dot_plot.save("dot_plot.pdf")

Jitter scatter plot with means and error bars

import pandas as pd
from plotnine import *

# Raw string, so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r'd:\python\out\DistributionD.csv')
# Fix the order in which the categories appear on the X-axis
df['class'] = pd.Categorical(df['class'], categories=["n", "s", "k", "mm"], ordered=True)

# Jittered raw data, plus the mean and a one-standard-deviation error bar per category
jitter_plot = (ggplot(df, aes(x='class', y="value", fill="class"))
    + geom_jitter(width=0.3, size=3, stroke=0.1, show_legend=False)
    # Error bar: mean ± 1 standard deviation (mult=1)
    + stat_summary(fun_data="mean_sdl", fun_args={'mult': 1}, geom="pointrange",
                   color="black", size=1, show_legend=False)
    # Alternative way to draw the mean point:
    # + stat_summary(fun_data="mean_sdl", fun_args={'mult': 1}, geom="point",
    #                fill="w", color="black", size=5, stroke=1, show_legend=False)
    # White-filled mean point drawn on top of the error bar
    + geom_point(stat="summary", fun_data="mean_sdl", fun_args={'mult': 1},
                 fill="w", color="black", size=5, stroke=1, show_legend=False)
    + scale_fill_hue(s=0.90, l=0.65, h=0.0417, color_space='husl')
    + theme_matplotlib()
    + theme(  # legend_position='none',
        aspect_ratio=1.05,
        dpi=100,
        figure_size=(4, 4)))

print(jitter_plot)
jitter_plot.save("jitter_plot2.pdf")

Dot plot combined with means and error bars

import pandas as pd
from plotnine import *

# Raw string, so the backslashes in the Windows path are not treated as escapes
df = pd.read_csv(r'd:\python\out\DistributionD.csv')
# Fix the order in which the categories appear on the X-axis
df['class'] = pd.Categorical(df['class'], categories=["n", "s", "k", "mm"], ordered=True)

# Dot plot as the background, plus the mean and a one-standard-deviation error bar
dot_plot2 = (ggplot(df, aes(x='class', y="value", fill="class"))
    + geom_dotplot(binaxis="y", stackdir='center',
                   binwidth=0.15, show_legend=False)
    # Error bar: mean ± 1 standard deviation (mult=1)
    + stat_summary(fun_data="mean_sdl", fun_args={'mult': 1}, geom="pointrange",
                   color="black", size=1, show_legend=False)
    # White-filled mean point drawn on top of the error bar
    + stat_summary(fun_data="mean_sdl", fun_args={'mult': 1}, geom="point",
                   fill="w", color="black", size=5, stroke=1, show_legend=False)
    + scale_fill_hue(s=0.90, l=0.65, h=0.0417, color_space='husl')
    + theme_matplotlib()
    + theme(  # legend_position='none',
        aspect_ratio=1.05,
        dpi=100,
        figure_size=(4, 4)))

print(dot_plot2)
# Save the plot defined above (the original saved the undefined name dot_plot)
dot_plot2.save("dot_plot2.pdf")
