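This post simulates a data frame with 4 grouping factors (F1 to F4) and 4 response variables (y1 to y4) over 24 rows, and shows several ways to compute grouped summary statistics: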
1.1 Code to simulate the data
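# rnorm() draws random values, so your numbers will differ from the output below;
# call set.seed() first if you need reproducible results
dat = data.frame(F1 = 1:24, F2 = rep(1:2, 12), F3 = rep(1:3, 8), F4 = rep(1:4, 6),
                 y1 = rnorm(24), y2 = rnorm(24), y3 = rnorm(24), y4 = rnorm(24))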
dat
Output:
> dat
F1 F2 F3 F4 y1 y2 y3 y4
1 1 1 1 1 -1.56638762 1.77659389 0.62182746 0.07154109
2 2 2 2 2 1.09314649 0.71375709 -2.00699087 -0.21156736
3 3 1 3 3 -0.05927923 0.37890941 -2.44829351 -0.21725814
4 4 2 1 4 -0.20873143 0.11067616 0.68841731 1.76528949
5 5 1 2 1 -1.27492917 -0.95776287 -1.68332656 -0.01702117
6 6 2 3 2 -1.05095342 -0.38322499 0.14197083 -0.78430424
7 7 1 1 3 1.37034964 0.05623515 -1.40426807 0.65247027
8 8 2 2 4 0.36660747 -0.12935219 0.35927791 0.78090801
9 9 1 3 1 0.23858612 1.40575764 0.10948955 -0.97792913
10 10 2 1 2 -1.20208996 0.83394104 0.81612552 -1.16199479
11 11 1 2 3 0.67429860 1.64004800 0.21721424 0.10194002
12 12 2 3 4 -0.28761315 -0.16285338 0.88606656 0.89780823
13 13 1 1 1 0.58100320 -0.50242117 0.69975049 0.23075716
14 14 2 2 2 -0.09756759 0.32500760 1.34954777 0.49576819
15 15 1 3 3 -0.79733970 -0.45139957 0.96597139 -2.47475726
16 16 2 1 4 -1.53313299 -1.36002014 0.06478981 0.27118850
17 17 1 2 1 -1.76762191 -1.17475175 -1.16165180 0.08503871
18 18 2 3 2 -0.32539248 -1.12102656 1.35283538 0.46963266
19 19 1 1 3 -0.29976865 1.19147376 0.38726070 0.12839759
20 20 2 2 4 -0.53285724 -0.37190046 -1.02641877 -1.71363552
21 21 1 3 1 -0.74750973 -0.69994486 1.29616246 -0.22394345
22 22 2 1 2 -0.82581172 -0.83660765 0.43636897 0.29364722
23 23 1 2 3 0.74471471 0.38635141 -0.85874012 -1.17886383
24 24 2 3 4 1.28956868 -1.41161366 0.36144567 -0.31512618
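Because F3 and F4 are recycled with periods 3 and 4 over 24 rows, each level of F4 occurs 6 times and every F3 x F4 combination occurs exactly twice, which is why the grouped counts further down are 6 and 2. A quick cross-tabulation confirms the layout; the set.seed() call is an assumption added only to make this sketch reproducible, so its random values differ from those printed above.

```r
set.seed(1)  # assumption: any seed works; only the factor layout matters here
dat <- data.frame(F1 = 1:24, F2 = rep(1:2, 12), F3 = rep(1:3, 8), F4 = rep(1:4, 6),
                  y1 = rnorm(24), y2 = rnorm(24), y3 = rnorm(24), y4 = rnorm(24))

table(dat$F4)                 # 6 rows per level of F4
tab <- table(dat$F3, dat$F4)
tab                           # every F3 x F4 cell contains 2 rows
```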
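Suppose the statistics to compute are the number of observations, the mean, the standard deviation, and the coefficient of variation (CV, as a percentage). Missing values are excluded from all of them.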
func <- function(x) c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE),
                      cv = sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE) * 100)  # n counts non-missing values only
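A quick sanity check on a small vector whose answer is easy to verify by hand. The helper name summ is hypothetical; it computes the same four statistics as func and counts only non-missing values, because length() would also count the NA.

```r
# Hypothetical helper: n, mean, sd and CV (%) with NAs excluded
summ <- function(x) {
  c(n    = sum(!is.na(x)),
    mean = mean(x, na.rm = TRUE),
    sd   = sd(x, na.rm = TRUE),
    cv   = sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE) * 100)
}

x <- c(1, 2, 3, NA)
length(x)  # 4: length() also counts the NA
summ(x)    # n = 3, mean = 2, sd = 1, cv = 50
```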
Code (mean of y1 within each level of F4)
aggregate(y1 ~ F4, data=dat,mean)
Output
> aggregate(y1 ~ F4, data=dat,mean)
F4 y1
1 1 -0.7561432
2 2 -0.4014448
3 3 0.2721626
4 4 -0.1510264
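As a cross-check, base R's tapply() computes the same per-group means, returned as a named vector instead of a data frame. This sketch regenerates dat with a fixed seed (an assumption, so its numbers differ from the output above) and confirms that both approaches agree.

```r
set.seed(1)
dat <- data.frame(F1 = 1:24, F2 = rep(1:2, 12), F3 = rep(1:3, 8), F4 = rep(1:4, 6),
                  y1 = rnorm(24), y2 = rnorm(24), y3 = rnorm(24), y4 = rnorm(24))

by_agg <- aggregate(y1 ~ F4, data = dat, mean)  # data frame with columns F4, y1
by_tap <- tapply(dat$y1, dat$F4, mean)          # 1-d array named "1".."4"

all.equal(by_agg$y1, as.vector(by_tap))         # TRUE: same groups, same order
```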
Code (all four statistics via func)
aggregate(y1 ~ F4, data=dat,func)
Output
> aggregate(y1 ~ F4, data=dat,func)
F4 y1.n y1.mean y1.sd y1.cv
1 1 6.0000000 -0.7561432 0.9722392 -128.5787188
2 2 6.0000000 -0.4014448 0.8455661 -210.6307286
3 3 6.0000000 0.2721626 0.7964707 292.6452101
4 4 6.0000000 -0.1510264 0.9403466 -622.6370275
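When FUN returns several values, aggregate() stores them in a single matrix column named y1; the headers y1.n, y1.mean and so on are only how that matrix is printed, which is also why n appears as 6.0000000. If ordinary columns are needed (for example to use res$y1.mean or to write a CSV), a common idiom is do.call(data.frame, res). The sketch below assumes a reproducible dat and repeats func so it runs on its own.

```r
set.seed(1)
dat <- data.frame(F1 = 1:24, F2 = rep(1:2, 12), F3 = rep(1:3, 8), F4 = rep(1:4, 6),
                  y1 = rnorm(24), y2 = rnorm(24), y3 = rnorm(24), y4 = rnorm(24))
func <- function(x) c(n = sum(!is.na(x)), mean = mean(x, na.rm = TRUE),
                      sd = sd(x, na.rm = TRUE),
                      cv = sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE) * 100)

res <- aggregate(y1 ~ F4, data = dat, func)
ncol(res)    # 2: F4 plus one matrix column y1
dim(res$y1)  # 4 x 4 matrix with columns n, mean, sd, cv

flat <- do.call(data.frame, res)  # expand the matrix column into ordinary columns
names(flat)  # "F4" "y1.n" "y1.mean" "y1.sd" "y1.cv"
```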
Code (grouping by two factors, F4 and F3)
aggregate(y1 ~ F4 + F3, data=dat,mean)
aggregate(y1 ~ F4 + F3, data=dat,func)
Output
> aggregate(y1 ~ F4 + F3, data=dat,mean)
F4 F3 y1
1 1 1 -0.49269221
2 2 1 -1.01395084
3 3 1 0.53529049
4 4 1 -0.87093221
5 1 2 -1.52127554
6 2 2 0.49778945
7 3 2 0.70950665
8 4 2 -0.08312489
9 1 3 -0.25446181
10 2 3 -0.68817295
11 3 3 -0.42830947
12 4 3 0.50097777
> aggregate(y1 ~ F4 + F3, data=dat,func)
F4 F3 y1.n y1.mean y1.sd y1.cv
1 1 1 2.00000000 -0.49269221 1.51843461 -308.19131500
2 2 1 2.00000000 -1.01395084 0.26606889 -26.24080813
3 3 1 2.00000000 0.53529049 1.18095197 220.61889335
4 4 1 2.00000000 -0.87093221 0.93649332 -107.52769414
5 1 2 2.00000000 -1.52127554 0.34838637 -22.90093824
6 2 2 2.00000000 0.49778945 0.84196201 169.14018691
7 3 2 2.00000000 0.70950665 0.04979171 7.01779324
8 4 2 2.00000000 -0.08312489 0.63601759 -765.13499750
9 1 3 2.00000000 -0.25446181 0.69727506 -274.01953526
10 2 3 2.00000000 -0.68817295 0.51304906 -74.55234353
11 3 3 2.00000000 -0.42830947 0.52188756 -121.84824327
12 4 3 2.00000000 0.50097777 1.11523596 222.61186859
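The same two-factor table can be produced with the non-formula interface of aggregate(), which takes the values as x and a named list of grouping variables as by. This is convenient when the grouping columns are held in variables rather than typed into a formula. As before, dat is regenerated with an assumed seed.

```r
set.seed(1)
dat <- data.frame(F1 = 1:24, F2 = rep(1:2, 12), F3 = rep(1:3, 8), F4 = rep(1:4, 6),
                  y1 = rnorm(24), y2 = rnorm(24), y3 = rnorm(24), y4 = rnorm(24))

res_formula <- aggregate(y1 ~ F4 + F3, data = dat, mean)
res_by      <- aggregate(dat$y1, by = list(F4 = dat$F4, F3 = dat$F3), FUN = mean)

names(res_by)                        # "F4" "F3" "x": the value column is called x here
all.equal(res_formula$y1, res_by$x)  # TRUE: same 12 groups in the same order
```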
Note: to summarize several response variables at once, combine them with cbind() on the left-hand side of the formula.
aggregate(cbind(y1,y2)~F4, data=dat, mean)
aggregate(cbind(y1,y2)~F4, data=dat, func)
Output
> aggregate(cbind(y1,y2)~F4, data=dat, mean)
F4 y1 y2
1 1 -0.7561432 -0.02542152
2 2 -0.4014448 -0.07802558
3 3 0.2721626 0.53360303
4 4 -0.1510264 -0.55417728
> aggregate(cbind(y1,y2)~F4, data=dat, func)
F4 y1.n y1.mean y1.sd y1.cv y2.n y2.mean y2.sd y2.cv
1 1 6.0000000 -0.7561432 0.9722392 -128.5787188 6.000000e+00 -2.542152e-02 1.278144e+00 -5.027804e+03
2 2 6.0000000 -0.4014448 0.8455661 -210.6307286 6.000000e+00 -7.802558e-02 8.218860e-01 -1.053355e+03
3 3 6.0000000 0.2721626 0.7964707 292.6452101 6.000000e+00 5.336030e-01 7.616742e-01 1.427417e+02
4 4 6.0000000 -0.1510264 0.9403466 -622.6370275 6.000000e+00 -5.541773e-01 6.623361e-01 -1.195170e+02
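One caveat with cbind(): the formula method's default na.action = na.omit drops a whole row if any response on the left-hand side is missing, so an NA in y2 also removes that row's y1 value. The sketch below plants one NA (an assumption for illustration) and compares the default with na.action = na.pass, in which case FUN itself has to handle the NAs. The counting helper count_non_na is a hypothetical name.

```r
set.seed(1)
dat <- data.frame(F1 = 1:24, F2 = rep(1:2, 12), F3 = rep(1:3, 8), F4 = rep(1:4, 6),
                  y1 = rnorm(24), y2 = rnorm(24), y3 = rnorm(24), y4 = rnorm(24))
dat$y2[1] <- NA  # row 1 belongs to group F4 = 1

count_non_na <- function(x) sum(!is.na(x))  # hypothetical helper

res_omit <- aggregate(cbind(y1, y2) ~ F4, data = dat, count_non_na)
res_omit[1, ]  # F4 = 1: y1 = 5, y2 = 5 (row 1 dropped for both responses)

res_pass <- aggregate(cbind(y1, y2) ~ F4, data = dat, count_non_na, na.action = na.pass)
res_pass[1, ]  # F4 = 1: y1 = 6, y2 = 5 (only the missing y2 is skipped)
```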
aggregate(cbind(y1,y2,y3)~F4+F3, data=dat, mean)
aggregate(cbind(y1,y2,y3)~F4+F3, data=dat, func)
Output
> aggregate(cbind(y1,y2,y3)~F4+F3, data=dat, mean)
F4 F3 y1 y2 y3
1 1 1 -0.49269221 0.637086358 0.6607890
2 2 1 -1.01395084 -0.001333303 0.6262472
3 3 1 0.53529049 0.623854458 -0.5085037
4 4 1 -0.87093221 -0.624671988 0.3766036
5 1 2 -1.52127554 -1.066257310 -1.4224892
6 2 2 0.49778945 0.519382343 -0.3287215
7 3 2 0.70950665 1.013199705 -0.3207629
8 4 2 -0.08312489 -0.250626325 -0.3335704
9 1 3 -0.25446181 0.352906393 0.7028260
10 2 3 -0.68817295 -0.752125774 0.7474031
11 3 3 -0.42830947 -0.036245083 -0.7411611
12 4 3 0.50097777 -0.787233521 0.6237561
> aggregate(cbind(y1,y2,y3)~F4+F3, data=dat, func)
F4 F3 y1.n y1.mean y1.sd y1.cv y2.n y2.mean y2.sd y2.cv y3.n
1 1 1 2.00000000 -0.49269221 1.51843461 -308.19131500 2.000000e+00 6.370864e-01 1.611507e+00 2.529495e+02 2.0000000
2 2 1 2.00000000 -1.01395084 0.26606889 -26.24080813 2.000000e+00 -1.333303e-03 1.181256e+00 -8.859624e+04 2.0000000
3 3 1 2.00000000 0.53529049 1.18095197 220.61889335 2.000000e+00 6.238545e-01 8.027349e-01 1.286734e+02 2.0000000
4 4 1 2.00000000 -0.87093221 0.93649332 -107.52769414 2.000000e+00 -6.246720e-01 1.039939e+00 -1.664777e+02 2.0000000
5 1 2 2.00000000 -1.52127554 0.34838637 -22.90093824 2.000000e+00 -1.066257e+00 1.534343e-01 -1.438999e+01 2.0000000
6 2 2 2.00000000 0.49778945 0.84196201 169.14018691 2.000000e+00 5.193823e-01 2.748874e-01 5.292583e+01 2.0000000
7 3 2 2.00000000 0.70950665 0.04979171 7.01779324 2.000000e+00 1.013200e+00 8.864974e-01 8.749483e+01 2.0000000
8 4 2 2.00000000 -0.08312489 0.63601759 -765.13499750 2.000000e+00 -2.506263e-01 1.715075e-01 -6.843157e+01 2.0000000
9 1 3 2.00000000 -0.25446181 0.69727506 -274.01953526 2.000000e+00 3.529064e-01 1.488957e+00 4.219126e+02 2.0000000
10 2 3 2.00000000 -0.68817295 0.51304906 -74.55234353 2.000000e+00 -7.521258e-01 5.217045e-01 -6.936400e+01 2.0000000
11 3 3 2.00000000 -0.42830947 0.52188756 -121.84824327 2.000000e+00 -3.624508e-02 5.871171e-01 -1.619853e+03 2.0000000
12 4 3 2.00000000 0.50097777 1.11523596 222.61186859 2.000000e+00 -7.872335e-01 8.830069e-01 -1.121658e+02 2.0000000
y3.mean y3.sd y3.cv
1 0.6607890 0.0550999 8.3385012
2 0.6262472 0.2685284 42.8789796
3 -0.5085037 1.2668021 -249.1234928
4 0.3766036 0.4409712 117.0916257
5 -1.4224892 0.3688798 -25.9319910
6 -0.3287215 2.3734312 -722.0187593
7 -0.3207629 0.7608146 -237.1890640
8 -0.3335704 0.9798355 -293.7417173
9 0.7028260 0.8391045 119.3900697
10 0.7474031 0.8562105 114.5580645
11 -0.7411611 2.4142499 -325.7388963
12 0.6237561 0.3709630 59.4724411
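The very large CV values, such as -8.859624e+04 for y2 in the F4 = 2, F3 = 1 group, are not errors. The CV divides the standard deviation by the mean, so it explodes whenever a group mean is close to zero and takes the sign of the mean. Using the numbers printed in that row:

$$\mathrm{CV} = \frac{s}{\bar{y}} \times 100 = \frac{1.181256}{-0.001333303} \times 100 \approx -8.86 \times 10^{4}$$

For data centred near zero, like these standard-normal draws, the CV carries little information; it is mainly meaningful for strictly positive measurements.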
Plot y2 against F1 as a line chart with ggplot2:
library(ggplot2)
ggplot(dat,aes(x=F1,y=y2))+geom_line()
Add a point layer on top of the line:
ggplot(dat,aes(x=F1,y=y2))+geom_line() + geom_point()
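Use melt() from the reshape2 package to convert the data to long format (one row per observation and response variable), then plot all four responses in one chart, coloured by variable: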
dd = reshape2::melt(dat,1:4,value.name="y")
head(dd)
ggplot(dd,aes(x=F1,y=y,colour=variable))+geom_line() + geom_point()
Output
> dd = reshape2::melt(dat,1:4,value.name="y")
> head(dd)
F1 F2 F3 F4 variable y
1 1 1 1 1 y1 -1.56638762
2 2 2 2 2 y1 1.09314649
3 3 1 3 3 y1 -0.05927923
4 4 2 1 4 y1 -0.20873143
5 5 1 2 1 y1 -1.27492917
6 6 2 3 2 y1 -1.05095342
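Without reshape2, a base-R sketch of the same wide-to-long conversion uses stack() on the four response columns and repeats the factor columns once per response. This is an alternative technique, not how melt() works internally; the column names variable and y simply mirror the melt() call above, and dat is regenerated with an assumed seed.

```r
set.seed(1)
dat <- data.frame(F1 = 1:24, F2 = rep(1:2, 12), F3 = rep(1:3, 8), F4 = rep(1:4, 6),
                  y1 = rnorm(24), y2 = rnorm(24), y3 = rnorm(24), y4 = rnorm(24))

s <- stack(dat[, c("y1", "y2", "y3", "y4")])  # columns: values, ind (factor y1..y4)
long <- data.frame(dat[rep(seq_len(nrow(dat)), times = 4), c("F1", "F2", "F3", "F4")],
                   variable = s$ind, y = s$values)
rownames(long) <- NULL  # drop the "1.1", "1.2", ... names created by rep()

head(long)
nrow(long)  # 96 = 24 rows x 4 responses
```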
Done!