Uses
Among the most frequently used t-tests are:
- A one-sample location test of whether the mean of a population has a value specified in a null hypothesis.
- A two-sample location test of the null hypothesis that the means of two populations are equal. All such tests are usually called Student's t-tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t-test. These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping.
- A test of the null hypothesis that the difference between two responses measured on the same statistical unit has a mean value of zero. For example, suppose we measure the size of each cancer patient's tumor before and after a treatment. If the treatment is effective, we expect the tumor size for many of the patients to be smaller following the treatment. This is often referred to as the "paired" or "repeated measures" t-test: see paired difference test.
- A test of whether the slope of a regression line differs significantly from 0.
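The paired test described in the list above reduces to a one-sample test on the per-unit differences. The following is a minimal sketch using only Python's standard library; the function name `paired_t` and the tumor sizes are hypothetical, invented purely for illustration. It returns the t statistic and its degrees of freedom, not a p-value.

```python
import math
import statistics


def paired_t(before, after):
    """Hypothetical helper: paired t statistic for H0: mean(before - after) == 0.

    This is a one-sample t-test applied to the per-unit differences,
    so the degrees of freedom are (number of pairs) - 1.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample SD, n - 1 denominator
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1


# Made-up tumor sizes for five patients, before and after treatment.
before = [3.1, 2.8, 4.0, 3.5, 2.9]
after = [2.7, 2.9, 3.4, 3.0, 2.5]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

A positive t here indicates that sizes tended to be larger before treatment; the p-value would then be read from a t distribution with `df` degrees of freedom.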
One-sample t-test
In testing the null hypothesis that the population mean is equal to a specified value μ0, one uses the statistic
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$
where $\bar{x}$ is the sample mean, s is the sample standard deviation and n is the sample size. The degrees of freedom used in this test are n − 1. Although the parent population does not need to be normally distributed, the distribution of the population of sample means, $\bar{x}$, is assumed to be normal. By the central limit theorem, if the observations are independent and the parent population has a finite variance (second moment), then the sample means will be approximately normal. (The quality of the approximation depends on how close the parent population is to a normal distribution and on the sample size, n.)
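The one-sample statistic can be computed directly from its definition. This is a sketch assuming Python's standard `statistics` module; `one_sample_t` and the sample values are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Hypothetical helper: t = (x_bar - mu0) / (s / sqrt(n)), df = n - 1."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # square root of the unbiased variance estimate
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Made-up data: sample mean 7, sample variance 4.5, tested against mu0 = 5.
t, df = one_sample_t([4, 6, 7, 9, 9], mu0=5)
print(f"t = {t:.4f} with {df} degrees of freedom")
```

Here the standard error is $s/\sqrt{n} = \sqrt{4.5/5}$, so t equals $2/\sqrt{0.9}$.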
Slope of a regression line
Suppose one is fitting the model
$$ Y = \alpha + \beta x + \varepsilon $$
where x is known, α and β are unknown, ε is a normally distributed random variable with mean 0 and unknown variance σ², and Y is the outcome of interest. We want to test the null hypothesis that the slope β is equal to some specified value β0 (often taken to be 0, in which case the null hypothesis is that x and Y are independent).
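Let $\hat\alpha$ and $\hat\beta$ be the least-squares estimators of α and β, and let $SE_{\hat\beta}$ be the standard error of $\hat\beta$. Then
$$ t_\text{score} = \frac{\hat\beta - \beta_0}{SE_{\hat\beta}} $$
has a t-distribution with n − 2 degrees of freedom if the null hypothesis is true. The standard error of the slope coefficient,
$$ SE_{\hat\beta} = \frac{\sqrt{\frac{1}{n-2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}}{\sqrt{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}}, $$
can be written in terms of the residuals. Let
$$ \hat\varepsilon_i = y_i - \hat{y}_i = y_i - \left(\hat\alpha + \hat\beta x_i\right) $$
be the residuals (estimated errors), and let
$$ \text{SSR} = \sum_{i=1}^{n} \hat\varepsilon_i^{\,2} $$
be the sum of squares of residuals. Then
$$ t_\text{score} = \frac{\left(\hat\beta - \beta_0\right)\sqrt{n-2}}{\sqrt{\text{SSR} \Big/ \sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}}. $$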
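The residual form of the slope test can be sketched in a few lines. This assumes ordinary least squares with an intercept, as in the model above; the helper `slope_t` and the data points are hypothetical.

```python
import math


def slope_t(x, y, beta0=0.0):
    """Hypothetical helper: t statistic for H0: beta == beta0 in Y = alpha + beta*x + eps.

    Returns (t, n - 2), using least-squares estimates and the residual sum of squares.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired x, y values with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    # SSR: sum of squared residuals y_i - (alpha_hat + beta_hat * x_i)
    ssr = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))
    se_beta = math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)
    return (beta_hat - beta0) / se_beta, n - 2


# Made-up data: the fitted slope is 0.6 and SSR is 2.4.
t, df = slope_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"t = {t:.4f}, df = {df}")
```

Passing a non-zero `beta0` tests against a slope other than 0, matching the general null hypothesis β = β0.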
Independent two-sample t-test
Equal sample sizes, equal variance
Given two groups (1, 2), this test is only applicable when:
- the two sample sizes (that is, the number, n, of participants of each group) are equal;
- it can be assumed that the two distributions have the same variance.
Violations of these assumptions are discussed below.
The t statistic to test whether the means are different can be calculated as follows:
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{2/n}} $$
where
$$ s_p = \sqrt{\frac{s_{X_1}^2 + s_{X_2}^2}{2}}. $$
Here $s_p$ is the pooled standard deviation for $n = n_1 = n_2$, and $s_{X_1}^2$ and $s_{X_2}^2$ are the unbiased estimators of the variances of the two samples. The denominator of t is the standard error of the difference between two means.
For significance testing, the number of degrees of freedom for this test is 2n − 2, where n is the number of participants in each group.
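For equal group sizes the formulas above translate directly into code. This is an illustrative sketch using Python's `statistics` module; `equal_n_t` and the two samples are hypothetical.

```python
import math
import statistics


def equal_n_t(x1, x2):
    """Hypothetical helper: Student's t for two independent samples of equal size n.

    s_p = sqrt((s1^2 + s2^2) / 2), t = (mean1 - mean2) / (s_p * sqrt(2 / n)), df = 2n - 2.
    """
    n = len(x1)
    if len(x2) != n:
        raise ValueError("this form of the test requires n1 == n2")
    # statistics.variance uses the unbiased (n - 1) denominator
    sp = math.sqrt((statistics.variance(x1) + statistics.variance(x2)) / 2)
    t = (statistics.mean(x1) - statistics.mean(x2)) / (sp * math.sqrt(2 / n))
    return t, 2 * n - 2


# Made-up samples: means 7 and 5, variances 4.5 and 1.5, so s_p = sqrt(3).
t, df = equal_n_t([4, 6, 7, 9, 9], [3, 5, 5, 6, 6])
print(f"t = {t:.4f}, df = {df}")
```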
Equal or unequal sample sizes, equal variance
This test is used only when it can be assumed that the two distributions have the same variance. (When this assumption is violated, see below.) The previous formulae are a special case of these, valid when both samples have equal sizes: $n = n_1 = n_2$. The t statistic to test whether the means are different can be calculated as follows:
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
where
$$ s_p = \sqrt{\frac{(n_1 - 1)\,s_{X_1}^2 + (n_2 - 1)\,s_{X_2}^2}{n_1 + n_2 - 2}} $$
is an estimator of the pooled standard deviation of the two samples: it is defined in this way so that its square is an unbiased estimator of the common variance whether or not the population means are the same. In these formulae, $n_i - 1$ is the number of degrees of freedom for each group, and the total sample size minus two (that is, $n_1 + n_2 - 2$) is the total number of degrees of freedom, which is used in significance testing.
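The pooled-variance statistic for possibly unequal sample sizes can be sketched as follows. The helper name `pooled_t` and the data are hypothetical; only Python's standard library is assumed.

```python
import math
import statistics


def pooled_t(x1, x2):
    """Hypothetical helper: equal-variance two-sample t with n1, n2 possibly different.

    s_p^2 = ((n1 - 1) s1^2 + (n2 - 1) s2^2) / (n1 + n2 - 2), df = n1 + n2 - 2.
    """
    n1, n2 = len(x1), len(x2)
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * statistics.variance(x1) + (n2 - 1) * statistics.variance(x2)) / df
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


# Made-up samples of sizes 5 and 4 (means 7 and 4.75, pooled variance 3.25).
t, df = pooled_t([4, 6, 7, 9, 9], [3, 5, 5, 6])
print(f"t = {t:.4f}, df = {df}")

# With equal sizes the pooled formula reduces to the equal-n form (s_p^2 = 3 here).
t_eq, df_eq = pooled_t([4, 6, 7, 9, 9], [3, 5, 5, 6, 6])
```

The second call illustrates the special-case relationship: when $n_1 = n_2$, the weights $n_i - 1$ are equal and $s_p^2$ becomes the plain average of the two sample variances.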
Equal or unequal sample sizes, unequal variances
Main article: Welch's t-test
This test, also known as Welch's t-test, is used only when the two population variances are not assumed to be equal (the two sample sizes may or may not be equal) and hence must be estimated separately. The t statistic to test whether the population means are different is calculated as:
$$ t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar\Delta}} $$
where
$$ s_{\bar\Delta} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}. $$
Here $s_i^2$ is the unbiased estimator of the variance of sample i, with $n_i$ the number of participants in group i (i = 1 or 2). Note that in this case $s_{\bar\Delta}^2$ is not a pooled variance. For use in significance testing, the distribution of the test statistic is approximated as an ordinary Student's t distribution with the degrees of freedom calculated using
$$ \text{d.f.} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}. $$
This is known as the Welch–Satterthwaite equation. The true distribution of the test statistic actually depends (slightly) on the two unknown population variances (see Behrens–Fisher problem).
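Welch's statistic and the Welch–Satterthwaite degrees of freedom can be computed together. This is a sketch using Python's standard library; the helper `welch_t` and the samples are hypothetical. Note that the resulting degrees of freedom are generally not an integer.

```python
import math
import statistics


def welch_t(x1, x2):
    """Hypothetical helper: Welch's t and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x1), len(x2)
    v1 = statistics.variance(x1) / n1  # s1^2 / n1
    v2 = statistics.variance(x2) / n2  # s2^2 / n2
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up samples of sizes 5 and 4.
t, df = welch_t([4, 6, 7, 9, 9], [3, 5, 5, 6])
print(f"t = {t:.4f}, df = {df:.3f}")
```

The computed degrees of freedom always lie between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$, the value the pooled test would use.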
F-test
The formula for the one-way ANOVA F-test statistic is
$$ F = \frac{\text{explained variance}}{\text{unexplained variance}} $$
or
$$ F = \frac{\text{between-group variability}}{\text{within-group variability}}. $$
The "explained variance", or "between-group variability", is
$$ \sum_{i=1}^{K} n_i \left(\bar{Y}_{i\cdot} - \bar{Y}\right)^2 \Big/ (K - 1) $$
where $\bar{Y}_{i\cdot}$ denotes the sample mean in the i-th group, $n_i$ is the number of observations in the i-th group, $\bar{Y}$ denotes the overall mean of the data, and K denotes the number of groups.
The "unexplained variance", or "within-group variability", is
$$ \sum_{i=1}^{K} \sum_{j=1}^{n_i} \left(Y_{ij} - \bar{Y}_{i\cdot}\right)^2 \Big/ (N - K) $$
where $Y_{ij}$ is the j-th observation in the i-th of the K groups and N is the overall sample size. This F-statistic follows the F-distribution with K − 1 and N − K degrees of freedom under the null hypothesis. The statistic will be large if the between-group variability is large relative to the within-group variability, which is unlikely to happen if the population means of the groups all have the same value.
Note that when there are only two groups for the one-way ANOVA F-test, $F = t^2$, where t is the Student's t statistic.
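The between-group and within-group formulas can be sketched directly, and the two-group case gives a quick check of the identity $F = t^2$ against the pooled-variance t statistic. The helper `one_way_f` and the data are hypothetical; only Python's standard library is assumed.

```python
import math
import statistics


def one_way_f(*groups):
    """Hypothetical helper: one-way ANOVA F and its (K - 1, N - K) degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variability: sum of n_i * (group mean - grand mean)^2 over (K - 1)
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means)) / (k - 1)
    # Within-group variability: squared deviations from each group's own mean over (N - K)
    within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (n_total - k)
    return between / within, (k - 1, n_total - k)


# Made-up two-group data: compare F with the square of the pooled-variance t.
a, b = [4, 6, 7, 9, 9], [3, 5, 5, 6]
f, dfs = one_way_f(a, b)
n1, n2 = len(a), len(b)
sp2 = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
print(f"F = {f:.4f}, t^2 = {t ** 2:.4f}, df = {dfs}")
```

With two groups, K − 1 = 1 and N − K equals the pooled test's $n_1 + n_2 - 2$, which is why the two tests agree.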
Software implementations
Many spreadsheet programs and statistics packages, such as QtiPlot, LibreOffice Calc, Microsoft Excel, SAS, SPSS, Stata, DAP, gretl, R, Python, PSPP, Matlab and Minitab, include implementations of Student's t-test.
Functions implementing the test, by language or program:
- Microsoft Excel (pre 2010): TTEST(array1, array2, tails, type); see [1]
- Microsoft Excel (2010 and later): T.TEST(array1, array2, tails, type); see [2]
- LibreOffice: TTEST(Data1; Data2; Mode; Type); see [3]
- Google Sheets: TTEST(range1, range2, tails, type); see [4]
- Python: scipy.stats.ttest_ind(a, b, axis=0, equal_var=True); see [5]
- Matlab: ttest(data1, data2); see [6]
- Mathematica: TTest[{data1,data2}]; see [7]
- R: t.test(data1, data2, var.equal=TRUE); see [8]
- SAS: PROC TTEST; see [9]
- Java: tTest(sample1, sample2); see [10]
- Julia: EqualVarianceTTest(sample1, sample2); see [11]
- Stata: ttest data1 == data2; see [12]