4 Student’s t-test

4.1 One-sample t-test

One of the most basic statistical analyses involves testing whether the mean of a group of observations differs from a known value. For example, a researcher may want to know whether the yield from a wheat experiment was similar to the state-wide average wheat yield for that year. In the example below, yield data (in kg/ha) from 10 different plots are entered manually and stored in an object named “yield”. The state-wide dryland wheat average for the same year was 1810 kg/ha. The differences between the experiment yields and the state average (“yld.diff”) are calculated, along with the column means of both. The t.test() function is then used to carry out the one-sample Student’s t-test.

yield <- c(2280, 2690, 2080, 2820, 1340, 2080, 2480, 2420, 2150, 1880)
yld.diff <- yield - 1810
yld.dat<-data.frame(yield, yld.diff)
yld.dat
##    yield yld.diff
## 1   2280      470
## 2   2690      880
## 3   2080      270
## 4   2820     1010
## 5   1340     -470
## 6   2080      270
## 7   2480      670
## 8   2420      610
## 9   2150      340
## 10  1880       70
colMeans(yld.dat)
##    yield yld.diff 
##     2222      412

The mean wheat yield in the experiment was 2222 kg/ha, 412 kg/ha greater than the state-wide average. There are several possible methods to conduct the one-sample t-test. The first is to subtract the state-wide mean from each observation from the experiment, then test whether the mean difference in yield is different from zero. This is done simply by running the t.test() function on the “yld.diff” object calculated above.

t.test(yld.diff)
## 
##  One Sample t-test
## 
## data:  yld.diff
## t = 3.065, df = 9, p-value = 0.01346
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  107.9233 716.0767
## sample estimates:
## mean of x 
##       412
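The statistic in this output can be reproduced by hand, which may help clarify what t.test() is doing. The sketch below assumes the usual one-sample formula, t = (mean - mu) / (s / sqrt(n)) with n - 1 degrees of freedom; the object names (mu0, n, se, t.stat, p.two) are ours and are used only for illustration.

```r
# Yield data (kg/ha) and state-wide average from the example above
yield <- c(2280, 2690, 2080, 2820, 1340, 2080, 2480, 2420, 2150, 1880)
mu0   <- 1810

n      <- length(yield)                      # number of plots
se     <- sd(yield) / sqrt(n)                # standard error of the mean
t.stat <- (mean(yield) - mu0) / se           # t statistic
p.two  <- 2 * pt(-abs(t.stat), df = n - 1)   # two-sided p-value

# Values to compare with the t.test() output
round(c(t = t.stat, df = n - 1, p = p.two), 4)
```

Subtracting mu0 inside the formula is the same as running the test on yld.diff, which is why both approaches in this section give one and the same statistic.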

The t.test() output specifies that the alternative hypothesis is that the true mean is not equal to zero, and gives a P-value of 0.013; there is fairly strong evidence that the mean wheat yield in the experiment is different from the state-wide average. The t.test() function also allows the user to test against a value other than zero. This eliminates the need to first subtract the state-wide average from each observation; instead, the state-wide average is specified within the t.test() function using the mu argument. (mu is the common statistical symbol for the population mean.)

t.test(yield, mu=1810)
## 
##  One Sample t-test
## 
## data:  yield
## t = 3.065, df = 9, p-value = 0.01346
## alternative hypothesis: true mean is not equal to 1810
## 95 percent confidence interval:
##  1917.923 2526.077
## sample estimates:
## mean of x 
##      2222

The t statistic, degrees of freedom, and P-value are identical to those from testing whether the difference between experiment yields and the state-wide average yield is different from zero; only the confidence interval is shifted by 1810, because it now describes the mean yield rather than the mean difference. In both of the above cases, a two-sided alternative hypothesis is tested; that is, the alternative hypothesis is that the experimental average is different from the state-wide average yield. In some cases, though, a one-sided alternative hypothesis makes more practical sense. Perhaps it is assumed a priori that the treatments in this experiment will improve wheat yields. It would then be more logical to test whether the wheat yield in the experiment was greater than (not simply different from) the state-wide average. This can be done using the alternative argument in the t.test() function, which can be set to ‘greater’ or ‘less’ in place of the default, ‘two.sided’.

t.test(yield, alternative="greater", mu=1810)
## 
##  One Sample t-test
## 
## data:  yield
## t = 3.065, df = 9, p-value = 0.006731
## alternative hypothesis: true mean is greater than 1810
## 95 percent confidence interval:
##  1975.595      Inf
## sample estimates:
## mean of x 
##      2222

As expected, because the sample mean lies in the hypothesized direction, the one-sided alternative produces a P-value exactly one-half that of the two-sided alternative (0.0067 versus 0.0135). There is strong evidence that the experimental mean wheat yield is greater than the state-wide average wheat yield.
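The halving can be checked directly, because t.test() returns a list whose p.value element holds the P-value. The sketch below also shows what would happen if the opposite direction had been chosen; the object names are illustrative only.

```r
yield <- c(2280, 2690, 2080, 2820, 1340, 2080, 2480, 2420, 2150, 1880)

p.two     <- t.test(yield, mu = 1810)$p.value
p.greater <- t.test(yield, mu = 1810, alternative = "greater")$p.value
p.less    <- t.test(yield, mu = 1810, alternative = "less")$p.value

# The sample mean is above 1810, so "greater" gives half the two-sided p-value
all.equal(p.greater, p.two / 2)

# The two one-sided p-values are complementary
all.equal(p.greater + p.less, 1)

c(two.sided = p.two, greater = p.greater, less = p.less)
```

Had ‘less’ been specified for these data, the P-value would be close to one, which is one reason the direction should be chosen before looking at the results.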

4.2 Two-sample t-test

The two-sample t-test is more common than the one-sample test in designed agricultural experiments. In a two-sample t-test, data are collected on two groups (often experimental treatments). Examples where the two-sample t-test might be used include comparing isogenic crop varieties to test for yield drag, or comparing two herbicide formulations for absorption or efficacy. Another example from the published literature can be found in Kniss et al. (2011), where glyphosate-resistant sugarbeet was compared with conventional sugarbeet varieties. For this study, 11 fields in Wyoming were split, with one side planted to glyphosate-resistant varieties, and the other side planted to conventional varieties. The two sides of the field were then managed as the grower thought best for the rest of the year.

sbeet.dat <- read.csv("http://rstats4ag.org/data/sugarbeet.csv")
sbeet.dat$Yield <- round(sbeet.dat$Yield * 2.24, 1) # Convert data to SI units
head(sbeet.dat)
##   Field Type Yield
## 1     1    C  72.8
## 2     1    R  74.4
## 3     2    C  64.1
## 4     2    R  65.4
## 5     3    C  56.7
## 6     3    R  68.8

In the “sugarbeet.csv” file, each field is represented by two data rows: one row for the glyphosate-resistant side (‘R’), and a second row for the conventional side (‘C’). Therefore, prior to running the t.test() function, the data are divided into two separate data frames using the subset() function. The subset() function takes the sbeet.dat data frame and keeps only the observations (rows) where the column “Type” equals a given value: “R” for the first, and “C” for the second. This results in one data frame that contains only the glyphosate-resistant observations (“RR”) and one data frame that contains only the conventional observations (“CON”).

RR <- subset(sbeet.dat, Type == "R")
CON <- subset(sbeet.dat, Type == "C")
t.test(RR$Yield, CON$Yield)
## 
##  Welch Two Sample t-test
## 
## data:  RR$Yield and CON$Yield
## t = 1.3745, df = 19.079, p-value = 0.1852
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.064981 19.628617
## sample estimates:
## mean of x mean of y 
##  58.60909  50.82727

The t.test() function then compares the “Yield” column from the RR data frame to the “Yield” column in the CON data frame. By default, t.test() does not assume equal variances in the two groups and runs the Welch version of the test, which is why the degrees of freedom (19.079) are not a whole number. The output provides the mean yield for each group (58.6 tons/ha for glyphosate-resistant, and 50.8 tons/ha for conventional). In this case it appears that, due to the variability in the data, there is not strong evidence for a difference between the two systems (P=0.185). Since there are only two groups in the “Type” column, we can simplify the code above by not subsetting the data ahead of time. We can simply use the t.test() function and specify that Type contains the relevant treatment information by using the formula response ~ groups, specifically in this case: Yield ~ Type.

t.test(Yield ~ Type, data=sbeet.dat)
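As a small sketch of the formula interface, the code below rebuilds only the six rows printed by head(sbeet.dat) above (three fields), so its numbers will not match the full 11-field analysis. The data frame name sb6 and the other object names are ours. It illustrates two points: the formula interface takes the groups in level order (C before R), so the sign of the t statistic is reversed relative to a call that lists the R yields first, and var.equal=TRUE requests the pooled-variance test in place of the default Welch test.

```r
# First six rows of the sugarbeet data, as printed by head() above
sb6 <- data.frame(
  Field = c(1, 1, 2, 2, 3, 3),
  Type  = c("C", "R", "C", "R", "C", "R"),
  Yield = c(72.8, 74.4, 64.1, 65.4, 56.7, 68.8)
)

# Formula interface: groups are taken in level order, so this is C minus R
by.formula <- t.test(Yield ~ Type, data = sb6)

# Two-vector interface with the R yields listed first: R minus C
by.vectors <- t.test(sb6$Yield[sb6$Type == "R"], sb6$Yield[sb6$Type == "C"])

by.formula$statistic   # same magnitude as the next line, opposite sign
by.vectors$statistic
c(by.formula$p.value, by.vectors$p.value)   # p-values agree

# Pooled-variance (classical Student) test instead of the Welch default
pooled <- t.test(Yield ~ Type, data = sb6, var.equal = TRUE)
pooled$parameter       # whole-number degrees of freedom (n1 + n2 - 2)
```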

4.3 Paired t-test

A special case of the two-sample t-test is the paired t-test. In this sugarbeet example, the two-sample t-test assumes the 11 observations on the glyphosate-resistant fields are independent from the 11 observations on the conventional fields. In reality, this was not the case. The way the study was carried out, conventional sugarbeet varieties were planted in the same (or immediately adjacent) field as the glyphosate-resistant varieties. Therefore, the glyphosate-resistant observation was not actually independent from the observation on the conventional side of the field. For each field, weather, irrigation, fertility, etc. were all the same for both sides of the field. We can therefore assume that most of the variability between the conventional and glyphosate-resistant portions of a given field is due to the varieties, or the way each side of the field was managed (herbicides, tillage, etc.), and not due to unrelated external factors. Because the purpose of the study was to compare the glyphosate-resistant and conventional systems, it is more desirable to consider the two sides of the same field as “paired” samples, to which different treatments were applied. This can be achieved by adding the argument paired=TRUE to the t.test() function. Observations are paired by their order within each type, so the data must list the fields in the same order for both types.

t.test(Yield ~ Type, data=sbeet.dat, paired=TRUE)
## 
##  Paired t-test
## 
## data:  Yield by Type
## t = -3.3302, df = 10, p-value = 0.007615
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -12.988388  -2.575248
## sample estimates:
## mean of the differences 
##               -7.781818

In this case, we reach a very different conclusion: within a field pair, there is very strong evidence that the mean difference is different from zero (P=0.008); that is, within each field, yield in the glyphosate-resistant system differed from conventional sugarbeet yield. The mean difference is negative because the formula interface subtracts in the order of the levels of Type (C minus R); on average, the glyphosate-resistant system resulted in 7.8 tons/ha greater yield than the conventional system. Notice that the paired t-test provides the same result as if we subtract the conventional yields from the glyphosate-resistant yields for each pair, then conduct a one-sample t-test on the differences. Because the subtraction is now R minus C, the signs of the t statistic, mean difference, and confidence limits are reversed, but the P-value is unchanged.

diff.Yield <- RR$Yield - CON$Yield
t.test(diff.Yield)
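The equivalence, and the role of subtraction order, can be sketched with the six rows printed by head(sbeet.dat) earlier (three fields only, so the numbers differ from the full analysis; the object names rr, con, and so on are ours). The two-vector form of t.test() is used here because, as far as we are aware, R 4.4.0 and later no longer accept the paired argument together with a formula.

```r
# Yields (tons/ha) for fields 1-3, taken from the head() output above
rr  <- c(74.4, 65.4, 68.8)   # glyphosate-resistant side
con <- c(72.8, 64.1, 56.7)   # conventional side

paired.test <- t.test(rr, con, paired = TRUE)   # R minus C within each field
diff.test   <- t.test(rr - con)                 # one-sample test on the differences
reversed    <- t.test(con, rr, paired = TRUE)   # C minus R: only the sign changes

# t statistics: the first two agree, the third has the opposite sign
c(paired   = unname(paired.test$statistic),
  diffs    = unname(diff.test$statistic),
  reversed = unname(reversed$statistic))

# p-values: all three agree
c(paired.test$p.value, diff.test$p.value, reversed$p.value)
```

With only three pairs this is a check on the mechanics rather than an analysis; the conclusions in this section rest on the full data set.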

In this example, there was no a priori expectation that the glyphosate-resistant system would yield more or less than the conventional system, and therefore the two-sided alternative hypothesis is appropriate.