Fisher’s exact test in R: independence test for a small sample

Example

Data

For our example, we want to determine whether there is a statistically significant association between smoking and being a professional athlete. Smoking can only be “yes” or “no” and being a professional athlete can only be “yes” or “no”. The two variables of interest are qualitative variables and we collected data on 14 persons.


Observed frequencies

Our data are summarized in the contingency table below reporting the number of people in each subgroup:

Expected frequencies

Remember that the Fisher’s exact test is used when there is at least one cell in the contingency table of the expected frequencies below 5. To retrieve the expected frequencies, use the chisq.test() function together with $expected:

The contingency table above confirms that we should use the Fisher’s exact test instead of the Chi-square test because there is at least one cell below 5.

Tip: although it is a good practice to check the expected frequencies beforedeciding between the Chi-square and the Fisher test, it is not a big issue if you forget. As you can see above, when doing the Chi-square test in R (with chisq.test()), a warning such as “Chi-squared approximation may be incorrect” will appear. This warning means that the smallest expected frequencies is lower than 5. Therefore, do not worry if you forgot to check the expected frequencies before applying the appropriate test to your data, R will warn you that you should use the Fisher’s exact test instead of the Chi-square test if that is the case.

Fisher’s exact test in R

To perform the Fisher’s exact test in R, use the fisher.test() function as you would do for the Chi-square test:2


The most important in the output is the p-value. You can also retrieve the p-value with:

test$p.value## [1] 0.02097902

Conclusion and interpretation

From the output and from test$p.value we see that the p-value is less than the significance level of 5%. Like any other statistical test, if the p-value is less than the significance level, we can reject the null hypothesis.

⇒ In our context, rejecting the null hypothesis for the Fisher’s exact test of independence means that there is a significant relationship between the two categorical variables (smoking habits and being an athlete or not). Therefore, knowing the value of one variable helps to predict the value of the other variable.

你可能感兴趣的:(Fisher’s exact test in R: independence test for a small sample)