I really had misunderstood confidence intervals

StatQuest study notes, +1. Ahhh, I really had misunderstood confidence intervals until now. Video: https://www.bilibili.com/video/BV1vb411p7xE

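1. Bootstrapping

To understand confidence intervals better, first learn how to compute one (bootstrapping is one of several ways to compute a confidence interval).

A bootstrapping example:

1. Sample from the weights of 12 mice, drawing 12 values each time, with replacement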

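weight = c(15.4,25.3,25.6,34.7,28.8,18.9,30.0,36.7,25.8,27.7,38.7,32.5)
# Draw 12 values with replacement (spell out TRUE; T is an ordinary variable that can be reassigned)
n1 = sample(weight,12,replace = TRUE);n1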
##  [1] 18.9 25.3 28.8 25.8 15.4 18.9 25.8 18.9 25.3 27.7 36.7 28.8

2. Compute the mean of the 12 sampled values

mean(n1)
## [1] 24.69167

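3. Repeat the first two steps 10,000 times to get 10,000 means

(Just to get the idea, 100 repetitions are enough here.)

# Repeat steps 1 and 2 100 times and collect the 100 means
ms = sapply(1:100, function(x){mean(sample(weight,12,replace = TRUE))});head(ms)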
## [1] 31.60833 29.40833 26.21667 25.99167 29.39167 28.61667

Plot the means from the repeated resampling:

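library(ggplot2)
# Invisible points only set up the axes (x from 10 to 50, y from -20 to 20)
dat = data.frame(a = 10:50,b = -20:20)
# annotate() draws the arrow once; geom_segment(aes(...)) would redraw it for every row of dat
p <- ggplot(dat,aes(x=a,y=b)) + geom_point(alpha = 0)+theme_bw()+
  annotate("segment", x = 10, y = 0, xend = 50, yend = 0,
           arrow = arrow(length = unit(0.5, "cm")))
# geom_vline() accepts a vector of intercepts, so no loop is needed
# (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
a1 = p + geom_vline(xintercept = ms,color = "red",linewidth = 0.3,alpha = 0.5)
a1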
[Figure: the 100 bootstrap means of the first group, drawn as red vertical lines]

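95% confidence interval: the interval that covers 95% of the bootstrapped means

99% confidence interval: the interval that covers 99% of the bootstrapped means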
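The two definitions above can be turned into numbers by cutting equal tails off the sorted bootstrap means. This is a minimal sketch, not something shown in the video: it assumes the percentile method, 10,000 replicates, and a fixed seed of my own choosing; the name boot_means is just for illustration.

```r
# Mouse weights from the notes above
weight <- c(15.4, 25.3, 25.6, 34.7, 28.8, 18.9, 30.0, 36.7, 25.8, 27.7, 38.7, 32.5)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# 10,000 bootstrap means: resample 12 values with replacement, then average
boot_means <- replicate(10000, mean(sample(weight, length(weight), replace = TRUE)))

# Percentile intervals: drop 2.5% (or 0.5%) of the means from each tail
ci95 <- quantile(boot_means, probs = c(0.025, 0.975))
ci99 <- quantile(boot_means, probs = c(0.005, 0.995))

ci95
ci99
```

The 99% interval has to cover more of the means, so it is at least as wide as the 95% one.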

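2. What confidence intervals are for

The 12 mice can be seen as one sample drawn from all the mice on Earth. The sample mean is an estimate of the population mean, so the population mean (true mean) most likely falls inside the confidence interval computed above; for a 95% interval, the procedure captures the true mean about 95% of the time. A value outside the 95% interval is therefore significantly different from the true mean (p < 0.05).

If we have two groups of data

Here come another 12 mice.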
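To make that last point concrete, here is a rough sketch that checks whether a candidate value for the true mean lies outside the 95% bootstrap interval. The helper outside_ci and the candidate values 20 and 28 are my own hypothetical choices, and the interval uses the percentile method with a fixed seed.

```r
weight <- c(15.4, 25.3, 25.6, 34.7, 28.8, 18.9, 30.0, 36.7, 25.8, 27.7, 38.7, 32.5)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
boot_means <- replicate(10000, mean(sample(weight, length(weight), replace = TRUE)))
ci95 <- quantile(boot_means, probs = c(0.025, 0.975))

# Hypothetical helper: TRUE when a candidate value lies outside the interval
outside_ci <- function(value, ci) {
  value < ci[[1]] || value > ci[[2]]
}

outside_ci(20, ci95)  # is a true mean of 20 outside the interval?
outside_ci(28, ci95)  # is a true mean of 28 outside the interval?
```

A candidate that falls outside would be called significantly different at the 0.05 level; one that falls inside would not.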

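weight2 = c(32.5,23.4,36.7,35.7,38.7,32.5,32.4,37.0,26.7,30.0,34.4,49.8)

# 100 bootstrap means for the second group (this overwrites the earlier ms)
ms = sapply(1:100, function(x){mean(sample(weight2,12,replace = TRUE))})

# Same empty canvas as before; annotate() draws the arrow a single time
dat = data.frame(a = 10:50,b = -20:20)
p <- ggplot(dat,aes(x=a,y=b)) + geom_point(alpha = 0)+theme_bw()+
  annotate("segment", x = 10, y = 0, xend = 50, yend = 0,
           arrow = arrow(length = unit(0.5, "cm")))

# Show the first mean only; a constant intercept goes outside aes()
# (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
p +geom_vline(xintercept = ms[1],color = "red",linewidth = 0.3,alpha = 0.5)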
[Figure: the first bootstrap mean of the second group, drawn as a single vertical line]
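# All 100 means of the second group in blue; geom_vline() takes the whole vector
# (linewidth needs ggplot2 >= 3.4.0; use size on older versions)
a2 = p + geom_vline(xintercept = ms,color = "blue",linewidth = 0.3,alpha = 0.5)
a2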
[Figure: the 100 bootstrap means of the second group, drawn as blue vertical lines]
library(patchwork)
a1/a2
[Figure: the two plots stacked, first group (red) on top and second group (blue) below]

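-When the 95% confidence intervals of the two groups of mice do not overlap, the weights of the two groups differ significantly (the confidence intervals alone are enough to draw that conclusion)

-When the 95% confidence intervals of the two groups overlap, the difference may still be significant, and a t-test is needed to decide.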
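The two rules above can be checked on the two groups of mice. This is only a sketch under my own assumptions: boot_ci is a hypothetical helper using the percentile method with 10,000 replicates and a fixed seed, and the t-test is R's default Welch two-sample test, which may not be the exact variant the video has in mind.

```r
weight  <- c(15.4, 25.3, 25.6, 34.7, 28.8, 18.9, 30.0, 36.7, 25.8, 27.7, 38.7, 32.5)
weight2 <- c(32.5, 23.4, 36.7, 35.7, 38.7, 32.5, 32.4, 37.0, 26.7, 30.0, 34.4, 49.8)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical helper: percentile bootstrap interval for the mean of x
boot_ci <- function(x, reps = 10000, level = 0.95) {
  means <- replicate(reps, mean(sample(x, length(x), replace = TRUE)))
  half_alpha <- (1 - level) / 2
  quantile(means, probs = c(half_alpha, 1 - half_alpha))
}

ci1 <- boot_ci(weight)
ci2 <- boot_ci(weight2)

# Two intervals overlap when each one starts before the other one ends
overlap <- ci1[[1]] <= ci2[[2]] && ci2[[1]] <= ci1[[2]]
overlap

# If the intervals overlap, fall back on a t-test (Welch by default in R)
tt <- t.test(weight, weight2)
tt$p.value
```

If overlap is FALSE, the intervals alone settle the question; if it is TRUE, the p-value from the t-test is what decides.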

Ah!
