Purpose
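I have a dataset and wanted to see how strongly two of its variables are correlated. After computing the correlation coefficient r, I found that the correlation was weaker than I had expected. I then drew a scatter plot and saw that a few outliers were affecting r. How can I present data like this more completely?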
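library(ggplot2)
# Put the two columns to compare into one data frame with short column names
df <- data.frame(x = set12$RatioMC.x, y = set12$RatioMC.y)
ggplot(df, aes(x = x, y = y)) +
  geom_point(alpha = 1, colour = "#FFA54F", size = 1) +
  labs(x = "Set 1", y = "Set 2", title = "Pearson's correlation test of Set1 and Set2") +
  theme(axis.title = element_text(size = 7), plot.title = element_text(size = 7)) +
  # Pearson's r (the default method of cor()), rounded to 2 decimals
  annotate("text", x = 1, y = 5, label = paste("r =", round(cor(df$x, df$y), 2)),
           colour = "black", size = 4)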
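To see how much a single outlier can move Pearson's r, here is a small sketch on simulated data. The vectors are made up for illustration and are not taken from set12: 20 points lying close to the line y = 2x, then the same points plus one outlier far below the trend.

```r
# Simulated data for illustration only (not from set12):
# 20 points that lie close to the line y = 2x
x <- 1:20
y <- 2 * x + rep(c(-1, 1), times = 10)
clean <- data.frame(x = x, y = y)

# The same points plus a single outlier far below the trend
with_outlier <- rbind(clean, data.frame(x = 21, y = 0))

# cor() computes Pearson's r by default
r_clean <- cor(clean$x, clean$y)
r_outlier <- cor(with_outlier$x, with_outlier$y)

round(c(clean = r_clean, with_outlier = r_outlier), 3)
```

On this toy data r drops from about 0.996 to about 0.727 because of one point out of 21. This is why the scatter plot is worth looking at alongside the coefficient itself.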
Possible approaches
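- Adjust the data
  - Log-transform the ratios (log2 makes them symmetric around 0: ratios of 2 and 1/2 become +1 and -1)
    df <- data.frame(x = log2(set12$RatioMC.x), y = log2(set12$RatioMC.y))  # log2() needs positive values
  - Remove outliers. Note that xlim()/ylim() only keep out-of-range points from being drawn; the data frame itself is unchanged, so r computed from it with cor() does not change. If you want r to change, remove the outliers from the data itself.
- Adjust the plot instead, treating the problem as overplotting (points drawn on top of one another hide each other). There are many options:
  - hollow points, pixel-sized points, or transparency (alpha)
  - turn it into a 2D kernel density estimation problem
  - bin the points and show the count (or another statistic) in each bin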
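# Use short column names so the aes() and cor() calls below read clearly
names(df) <- c("x", "y")
ggplot(df, aes(x = x, y = y)) +
  geom_point(alpha = 1, colour = "#FFA54F", size = 1) +
  # Overplotting alternatives for the points themselves:
  # geom_point(shape = 1) +     # hollow points
  # geom_point(shape = ".") +   # pixel-sized points
  # geom_point(alpha = 0.1) +   # transparency
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +  # y = x reference line
  # 2D kernel density estimate, drawn in different ways:
  # geom_density_2d() +         # contour lines
  # stat_density_2d(geom = "point", aes(size = after_stat(density)), n = 20, contour = FALSE) +
  # stat_density_2d(geom = "raster", aes(fill = after_stat(density)), contour = FALSE) +
  # stat_density_2d(geom = "tile", aes(fill = after_stat(density)), contour = FALSE) +
  # stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +
  # scale_fill_continuous(low = "blue4", high = "red2") +  # only for layers that map fill
  # Binning:
  # geom_bin2d(bins = 50) +
  # geom_hex() +                # needs the hexbin package
  stat_density_2d(aes(alpha = after_stat(density)), geom = "tile", contour = FALSE) +
  theme_bw() +
  # Points outside [-2, 2] are dropped from the plot (with a warning); df itself is unchanged
  xlim(-2, 2) +
  ylim(-2, 2) +
  labs(x = "Set 1", y = "Set 2", title = "Pearson's correlation test of Set1 and Set2") +
  theme(axis.title = element_text(size = 7), plot.title = element_text(size = 7)) +
  # r is computed from the full df, so xlim()/ylim() do not change it
  annotate("text", x = -1.5, y = 2, label = paste("r =", round(cor(df$x, df$y), 2)),
           colour = "black", size = 4)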
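The data-side fixes in the list above (log transform, then removing outliers before computing r) can be combined into one step. The sketch below is a hypothetical helper, clean_ratios(), not part of the original analysis. It assumes the ratios are meant to be positive, and it uses Tukey's fences (1.5 × IQR on each axis, after the log2 transform) as the outlier rule; that rule is my assumption, so swap in whatever criterion suits the data.

```r
# Hypothetical helper (not from the original post): log2-transform two ratio
# vectors and drop outlying pairs, so that cor() runs on the cleaned data.
# Assumption: outliers are flagged with Tukey's fences (k * IQR) on each axis.
clean_ratios <- function(x, y, k = 1.5) {
  # log2() needs positive, finite input; drop pairs that fail on either axis
  ok <- is.finite(x) & is.finite(y) & x > 0 & y > 0
  lx <- log2(x[ok])
  ly <- log2(y[ok])

  # TRUE for values inside [Q1 - k * IQR, Q3 + k * IQR]
  inside_fences <- function(v) {
    q <- quantile(v, c(0.25, 0.75), names = FALSE)
    spread <- q[2] - q[1]
    v >= q[1] - k * spread & v <= q[2] + k * spread
  }

  keep <- inside_fences(lx) & inside_fences(ly)
  data.frame(x = lx[keep], y = ly[keep])
}

# Toy ratios for illustration only: nine pairs on the line y = x (on the log2
# scale), one extreme pair, and one pair with a zero ratio that log2() cannot take
x <- c(2^seq(-2, 2, by = 0.5), 1024, 0)
y <- c(2^seq(-2, 2, by = 0.5), 1 / 1024, 1)

cleaned <- clean_ratios(x, y)
nrow(cleaned)               # the 9 pairs on the line survive
cor(cleaned$x, cleaned$y)   # 1 on this toy data (up to rounding)
```

With the real data, something like df <- clean_ratios(set12$RatioMC.x, set12$RatioMC.y) would take the place of the log2 line, and the r shown by annotate() would then reflect the cleaned data rather than only the visible window.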
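To see what the stat_density_2d() layers in the plot above are drawing, the 2D kernel density estimate can also be computed directly. As far as I know, ggplot2's stat_density_2d() is built on MASS::kde2d(), which this sketch calls. The point cloud is simulated for illustration and is not set12; the 50 × 50 grid over [-2, 2] × [-2, 2] is chosen to match the xlim()/ylim() window used above.

```r
library(ggplot2)

# Simulated point cloud for illustration only: 500 correlated points
set.seed(1)
x <- rnorm(500)
y <- 0.8 * x + rnorm(500, sd = 0.6)

# 2D kernel density estimate on a 50 x 50 grid over [-2, 2] x [-2, 2];
# MASS ships with standard R installations
dens <- MASS::kde2d(x, y, n = 50, lims = c(-2, 2, -2, 2))

# Reshape to long format, one row per grid cell. expand.grid() varies x
# fastest, matching the column-major order of as.vector(dens$z)
grid <- expand.grid(x = dens$x, y = dens$y)
grid$density <- as.vector(dens$z)

# Draw the estimate as a raster, similar to the geom = "raster" variant above
p <- ggplot(grid, aes(x = x, y = y, fill = density)) +
  geom_raster() +
  theme_bw()
```

Computing the grid yourself is mainly useful when you want to inspect or reuse the density values; for plotting, the stat_density_2d() layers do the same work inside ggplot2.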
I also came across how to draw a kind of plot I had liked earlier (details to be added later):
library(ggplot2)
# Inspired by the image-density plots of Ken Knoblauch
cars <- ggplot(mtcars, aes(mpg, factor(cyl)))
cars + geom_point()
# 2D bins 3 mpg wide and 1 cyl level tall, filled by raw count or by density
cars + stat_bin2d(aes(fill = after_stat(count)), binwidth = c(3, 1))
cars + stat_bin2d(aes(fill = after_stat(density)), binwidth = c(3, 1))
# Density of mpg within each cyl group, drawn as one raster strip per group
cars + stat_density(aes(fill = after_stat(density)), geom = "raster", position = "identity")
cars + stat_density(aes(fill = after_stat(count)), geom = "raster", position = "identity")
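In the stat_bin2d() lines above, count and density are two computed variables: count is the number of points in each bin, and density is that count divided by the total number of points, so both fills show the same pattern on different scales. The sketch below reproduces the idea with base R on the same mtcars data. The bin edges are chosen by hand as an assumption and will not line up exactly with the edges ggplot2 picks.

```r
# Bin mpg into width-3 intervals (edges picked by hand) and cyl by its levels,
# roughly mirroring stat_bin2d(binwidth = c(3, 1)) above
mpg_bin <- cut(mtcars$mpg, breaks = seq(9, 36, by = 3))
counts <- table(mpg_bin = mpg_bin, cyl = mtcars$cyl)

# after_stat(count): number of cars in each (mpg bin, cyl) cell
counts

# after_stat(density): each cell's count divided by the total number of points
bin_density <- counts / sum(counts)
round(bin_density, 3)
```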