Repeated Measures ANOVA in Python (Kinda)

I love doing data analyses with pandas, numpy, scipy, etc., but I often need to run repeated measures ANOVAs, which are not implemented in any major Python libraries. Python Psychologist shows how to do repeated measures ANOVAs yourself in Python, but I find using a widely distributed implementation comforting…

In this post I show how to run a repeated measures ANOVA using the rpy2 library, which lets us move data between Python and R and execute R commands from Python. I use rpy2 to load R's car package and run the ANOVA.

I will show how to run a one-way repeated measures ANOVA and a two-way repeated measures ANOVA.

Below I use Python's random module to generate some fake data. I seed the random number generator with 1 so that this analysis can be replicated.

I will generate 3 conditions, which represent 3 levels of a single variable.

The data are generated from a Gaussian distribution. The second condition has a higher mean than the other two conditions.
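A minimal sketch of this step, assuming the same parameters as the condition A data in the two-way example later in the post (means of 600, 650, and 600, a standard deviation of 30, and 30 observations per condition). The helper name `make_conditions` is hypothetical and only here for illustration.

```python
import random
from statistics import mean


def make_conditions(seed=1, n=30):
    """Return three lists of n Gaussian samples (hypothetical helper)."""
    random.seed(seed)  # fixed seed so the draws can be replicated
    cond_1 = [random.gauss(600, 30) for _ in range(n)]  # mean=600, sd=30
    cond_2 = [random.gauss(650, 30) for _ in range(n)]  # mean=650, sd=30
    cond_3 = [random.gauss(600, 30) for _ in range(n)]  # mean=600, sd=30
    return cond_1, cond_2, cond_3


cond_1, cond_2, cond_3 = make_conditions()
# Sample means; the second condition should sit well above the other two
print([round(mean(c), 1) for c in (cond_1, cond_2, cond_3)])
```

Re-running `make_conditions()` with the same seed returns the same numbers, which is what makes the analysis replicable.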

[Figure 1]

Next, I load the rpy2 extension for IPython. I am doing these analyses with IPython in a Jupyter notebook (highly recommended).

%load_ext rpy2.ipython

Here’s how to run the ANOVA. Note that this is a one-way ANOVA with 3 levels of the factor.
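As a rough sketch of the R side of this step, the function below assembles the R commands for a one-way design as plain text. It assumes the one-way call mirrors the two-way commands shown later in the post, with a single within-subject factor named `Factor` and columns `cond_1`, `cond_2`, `cond_3` (the names that appear in the output). The helper `build_oneway_r_script` is hypothetical and only builds a string; it does not talk to R.

```python
def build_oneway_r_script(columns, factor_name="Factor"):
    """Build R code for a one-way repeated measures ANOVA with car's Anova().

    columns: names of the R vectors, one per level of the within-subject factor.
    """
    # One label per column, e.g. 'Cond1', 'Cond2', 'Cond3'
    levels = ", ".join("'Cond{}'".format(i + 1) for i in range(len(columns)))
    lines = [
        "library(car)",
        "{} <- c({})".format(factor_name, levels),
        "idata <- data.frame({})".format(factor_name),
        # Column order here must match the row order of idata
        "Bind <- cbind({})".format(", ".join(columns)),
        "model <- lm(Bind ~ 1)",
        'analysis <- Anova(model, idata=idata, idesign=~{}, type="III")'.format(factor_name),
        "anova_sum <- summary(analysis)",
    ]
    return "\n".join(lines)


script = build_oneway_r_script(["cond_1", "cond_2", "cond_3"])
print(script)
```

In a notebook, each line of this script could be run with the `%R` magic after `%Rpush cond_1 cond_2 cond_3`, in the same way as the two-way example later in the post.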

Type III Repeated Measures MANOVA Tests:

------------------------------------------

Term: (Intercept) 

 Response transformation matrix:
       (Intercept)
cond_1           1
cond_2           1
cond_3           1

Sum of squares and products for the hypothesis:
            (Intercept)
(Intercept)   102473990

Sum of squares and products for error:
            (Intercept)
(Intercept)     78712.7

Multivariate Tests: (Intercept)
                 Df test stat approx F num Df den Df     Pr(>F)    
Pillai            1    0.9992 37754.33      1     29 < 2.22e-16 ***
Wilks             1    0.0008 37754.33      1     29 < 2.22e-16 ***
Hotelling-Lawley  1 1301.8736 37754.33      1     29 < 2.22e-16 ***
Roy               1 1301.8736 37754.33      1     29 < 2.22e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

------------------------------------------

Term: Factor 

 Response transformation matrix:
       Factor1 Factor2
cond_1       1       0
cond_2       0       1
cond_3      -1      -1

Sum of squares and products for the hypothesis:
          Factor1   Factor2
Factor1  3679.584  19750.87
Factor2 19750.870 106016.58

Sum of squares and products for error:
         Factor1  Factor2
Factor1 40463.19 27139.59
Factor2 27139.59 51733.12

Multivariate Tests: Factor
                 Df test stat approx F num Df den Df    Pr(>F)    
Pillai            1 0.7152596 35.16759      2     28 2.303e-08 ***
Wilks             1 0.2847404 35.16759      2     28 2.303e-08 ***
Hotelling-Lawley  1 2.5119704 35.16759      2     28 2.303e-08 ***
Roy               1 2.5119704 35.16759      2     28 2.303e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Univariate Type III Repeated-Measures ANOVA Assuming Sphericity

                  SS num Df Error SS den Df         F    Pr(>F)    
(Intercept) 34157997      1    26238     29 37754.334 < 2.2e-16 ***
Factor         59964      2    43371     58    40.094 1.163e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


Mauchly Tests for Sphericity

       Test statistic p-value
Factor        0.96168 0.57866


Greenhouse-Geisser and Huynh-Feldt Corrections
 for Departure from Sphericity

        GG eps Pr(>F[GG])    
Factor 0.96309  2.595e-11 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

        HF eps   Pr(>F[HF])
Factor 1.03025 1.163294e-11

The ANOVA table isn’t pretty, but it works. As you can see, the effect of Factor was wildly significant, F(2, 58) = 40.09, p < .001.
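To see where the univariate F comes from, the sums of squares can be computed by hand. The sketch below is a plain-Python version of the standard one-way repeated measures partition (total = factor + subjects + error). The function name `rm_anova_oneway` and the toy numbers are illustrative and not from the original analysis. With 3 conditions and 30 subjects, the same formulas give 2 and 58 degrees of freedom, as in the table above.

```python
from statistics import mean


def rm_anova_oneway(conditions):
    """One-way repeated measures ANOVA, assuming sphericity.

    conditions: list of k lists, each holding n scores in the same subject order.
    """
    k = len(conditions)
    n = len(conditions[0])
    all_scores = [x for cond in conditions for x in cond]
    grand = mean(all_scores)

    cond_means = [mean(cond) for cond in conditions]
    subj_means = [mean([cond[i] for cond in conditions]) for i in range(n)]

    ss_factor = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subject = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    # What is left after removing condition and subject effects
    ss_error = ss_total - ss_factor - ss_subject

    df_factor = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_factor / df_factor) / (ss_error / df_error)
    return {"ss_factor": ss_factor, "ss_error": ss_error,
            "df_factor": df_factor, "df_error": df_error, "F": f_stat}


# Toy data: 3 subjects measured in 3 conditions
toy = [[1, 2, 3], [2, 3, 7], [3, 7, 8]]
print(rm_anova_oneway(toy))
```

This only reproduces the "Assuming Sphericity" row; the Mauchly test and the Greenhouse-Geisser and Huynh-Feldt corrections are not covered by this sketch.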

Next, I generate data for a two-way (2×3) repeated measures ANOVA. Condition A is the same data as above. Condition B has a different pattern (level 2 is lower than levels 1 and 3), which should produce an interaction.

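import random
import numpy as np
import matplotlib.pyplot as plt

random.seed(1)  # same seed as before, so condition A matches the one-way data

# Condition A: level 2 is higher than levels 1 and 3
cond_1a = [random.gauss(600, 30) for _ in range(30)]  # mean=600, sd=30
cond_2a = [random.gauss(650, 30) for _ in range(30)]  # mean=650, sd=30
cond_3a = [random.gauss(600, 30) for _ in range(30)]  # mean=600, sd=30

# Condition B: level 2 is lower than levels 1 and 3
cond_1b = [random.gauss(600, 30) for _ in range(30)]  # mean=600, sd=30
cond_2b = [random.gauss(550, 30) for _ in range(30)]  # mean=550, sd=30
cond_3b = [random.gauss(650, 30) for _ in range(30)]  # mean=650, sd=30

# Grouped bar plot of the condition means; bars are centered on their x
# positions in current matplotlib, so each pair is offset by half a width
width = 0.25
x = np.arange(1, 4)
plt.bar(x - width / 2, [np.mean(cond_1a), np.mean(cond_2a), np.mean(cond_3a)], width)
plt.bar(x + width / 2, [np.mean(cond_1b), np.mean(cond_2b), np.mean(cond_3b)], width)
plt.legend(['A', 'B'], loc=4)
plt.xticks([1, 2, 3]);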

[Figure 2: bar plot of the mean of each condition for A and B]

# push the Python lists into the R session
%Rpush cond_1a cond_1b cond_2a cond_2b cond_3a cond_3b

# within-subject design: one row per column of Bind
%R Factor1 <- c('A','A','A','B','B','B')
%R Factor2 <- c('Cond1','Cond2','Cond3','Cond1','Cond2','Cond3')
%R idata <- data.frame(Factor1, Factor2)

# make sure the vectors appear in the same order as the rows of idata
%R Bind <- cbind(cond_1a, cond_2a, cond_3a, cond_1b, cond_2b, cond_3b)
%R model <- lm(Bind~1)

%R library(car)
%R analysis <- Anova(model, idata=idata, idesign=~Factor1*Factor2, type="III")
%R anova_sum = summary(analysis)
%Rpull anova_sum

print(anova_sum)
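The column order passed to `cbind` has to line up with the rows of `idata`. One way to keep the two in sync is to generate both from the same loop. A minimal sketch, assuming the `cond_<level><condition>` naming used above (the helper name `design_rows` is hypothetical):

```python
from itertools import product


def design_rows(factor1_levels, factor2_levels):
    """Return (column_name, factor1, factor2) for every cell of the design,
    with the first factor varying slowest."""
    rows = []
    for f1, (j, f2) in product(factor1_levels, enumerate(factor2_levels, start=1)):
        column = "cond_{}{}".format(j, f1.lower())  # e.g. cond_2a
        rows.append((column, f1, f2))
    return rows


rows = design_rows(["A", "B"], ["Cond1", "Cond2", "Cond3"])
columns = [r[0] for r in rows]   # order to use in cbind(...)
factor1 = [r[1] for r in rows]   # values for Factor1
factor2 = [r[2] for r in rows]   # values for Factor2
print(columns)
print(factor1)
print(factor2)
```

Because the three lists come from the same rows, a reordering of the factor levels changes the column order and the design labels together.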
 

Original post: https://www.pybloggers.com/2016/02/repeated-measures-anova-in-python-kinda/
