# Seed Germination

Most weeds multiply and spread by seeds and understanding how seeds germinate, and how ambient factors affect seed dormancy, germination, vigor and early growth is instrumental for our continuous struggle to deplete soils of weed seeds. On agricultural land in Denmark we find about 30,000 viable weed seeds/m2. It has dropped to half of what we found 45 years ago. This drop is due to many factors, but probably the most important one is the continuous use of herbicides.

The weed's ability to set seed and the seeds' ability to germinate is therefore important for us to try understand and counteract seed soil bank enrichment with agronomic means. Therefore, seed testing experiments to assess germination ability and vigor should support future action of depleting weed pressure.

The International Seed Testing Association ISTA produces internationally agreed rules for seed sampling and testing and disseminates knowledge in seed science and technology. This facilitates seed trading nationally and internationally and is focusing on agricultural species. The principles of studying agricultural species and weed species are the same, but for weed species we are not interested in certifying their germinability, but to discover how to counteract their germination ability and vigor. In crop science and breeding the aim is the opposite to select for germinability and vigor.

Nonlinear regression is commonly used to evaluate cumulative germination in cause of time, temperature or other external gradients. This approach, however, is problematic as it ignores that successive observations on the germination curve are highly correlated. The total number of seeds that have germinated at a particular time is highly dependent on the number of seeds that already have germinated. Moreover, variation in the proportions of germinated seeds will vary with time, being largest at intermediate monitoring intervals and smallest at the initial and final intervals, where germination activity is low. This means that the fundamental assumptions implicitly underlying nonlinear regression (independence among proportions and variance homogeneity) are not met.

In experiments, where new batches of seed are used for each inspection time, data can be analyzed by well-established models for binomial data such as log-logistic regression or other generalized linear models.

In experiments where the same batch of seeds is followed over time, the same seeds are observed repeatedly for some pre-specified duration of an experiment. We observe the waiting time until the event of interest occurs and the resulting data are often referred to as event-time or time-to-event data.

In this chapter we will concentrate on time-to-event data Ritz et al. 2013. The Figure shows the basic arrangement of this kind of experiments and how we arrange the data in R

Figure 1. Example of how an experiment set up with four petridishes and how the data are arranged for being prepared for time-to-event analysis

For various reasons germination need not occur at all during the experiment. We refer to this as right-censoring. Consequently, a plausible statistical model has to incorporate that the event of interest, seed germination, may occur after termination of the experiment, assigning a positive probability to the situation that germination is not observed. This is why the final inspection time has an Inf value that means infinity concludes the experiment.

Also, the germination is obviously not observed exactly at the very point in time, when the event took place. For instance, seeds in petridishes may only be inspected on a daily or weekly basis and in case a seed has germinated the only information we get: because germination took place between two inspection intervals. This type of time-to-event data is often referred to as grouped data or interval-censored. As a consequence an appropriate model should be specified by means of the probability of germination within the monitoring intervals. This probability will depend heavily on the monitoring intervals. Ignoring this grouped-data structure may lead to biased estimates. The finer the time grid the smaller the error caused by ignoring discretization.

## Example 1: Chickweed

library(drc)

##   start end count
## 1     0  12     0
## 2    12  22     0
## 3    22  30     0

tail(chickweed,3)

##    start   end count
## 33 266.5 276.5     1
## 34 276.5 281.5     0
## 35 281.5   Inf   160

chickweed.m1 <- drm(count~start+end, data = chickweed, fct = LL.3(), type = "event")


Notice the model count~start+end operates with two independent variables the start and Â´endÂ´. It means that the intervals between inspection can vary; there is not need to have the same inspection intervals for the whole period of the experiment.

summary(chickweed.m1)

##
## Model fitted: Log-logistic (ED50 as parameter) with lower limit at 0 (3 parms)
##
## Parameter estimates:
##
##                Estimate Std. Error   t-value p-value
## b:(Intercept) -20.76747    2.94423  -7.05362       0
## d:(Intercept)   0.20011    0.02830   7.07108       0
## e:(Intercept) 196.05308    2.50570  78.24272       0


The output from the drm()is the same as for other non-linear regressions, and the three parameter log-logistic model, LL.3(), is also familiar in dose-response work. What is new here is that the model needs start+end,...,type = "event". The regression is illustrated in Figure 2.

plot(chickweed.m1, xlab = "Time (hours)", ylab = "Proportion germinated",
xlim=c(0, 340), ylim=c(0, 0.25), log="", lwd=2, cex=1.2)


Figure 2. Germination curves of ALS resistant Stellaria media from Danish fields in 2001. The code is from the example data set chickweed in the drc package Ritz et al. 2013. Note that we now use the actual time on the x-axis and not the logarithm as we did in the dose-response curve chapter.

On the basis of the regression parameters we can see that the maximum germination rate is not that high and the $T_{50}$ is 196 hours, meaning that it takes 196 days to make 0.20/2= 0.1 or 10 % of the seeds germinate. The interesting biological parameter are the upper limit (maximum) of germination 0.20 (0.03) and the time to 50% germination, $T_{50}$, 196 (2.5).

As was the case in the dose-response chapter we can also estimate how long it takes to make 10 percent or 90 percent of the seed germinate.

## Calculating t10, $T_{50}$, t90 for the distribution of viable seeds
ED(chickweed.m1, c(10, 50, 90),interval="delta")

##
## Estimated effective doses
## (Delta method-based confidence interval(s))
##
##      Estimate Std. Error    Lower  Upper
## 1:10 176.3700     3.4930 169.5239 183.22
## 1:50 196.0531     2.5057 191.1420 200.96
## 1:90 217.9328     4.2730 209.5578 226.31


If a cumulative germination curve had been analysed by a nonlinear regressions, the parameters would be close to each other, but their standard errors would be overly precise. If we are interested only in the curve fitting and not the uncertainties of parameters, either ways of fitting may suffice. But usually we want to get the correct standard error of parameters to compare germination ability and/or vigor among species or biotypes, and the time-to-event regression is the way to go to get a better appreciation of the uncertainties in germination ability and may prevent us to jump to wrong conclusions.

## Example 2: Bromus tectorum

Another example uses germination data for three different biotypes of Bromus tectorum collected in Wyoming. Seeds were placed in petridishes on moist filter paper in a germination chamber. The petridishes were checked for germinated seeds once daily from the initiation of the experiment for 20 days. Each day the number of newly germinated seeds was recorded.

library(drc)

##   Source starttime endtime REP Daily.germ Cum.Germ
## 1     S1         0       1   1          0        0
## 2     S1         0       1   2          0        0
## 3     S1         0       1   3          0        0


There are 6 columns in the data set, the Source is the B. tectorum biotype, starttime and endtime denote the beginning and end of the interval represented by each recording time. This is similar to the chickweed data set. The Daily.germ column represents the number of germination events that occurred since the last observation. The Cum.germ column is a cumulative total that is often used (incorrectly) for analyzing germination over time.

The final observation in this experiment occurred 20 days after it began. Therefore, we have an interval in the data set from time period 20 to Inf. Many of the petridishes had not germinated all seeds at the final observation. We don't know for certain that these seeds never would have germinated. All we can say is that they had not yet germinated at the final observation of day 20. In the data set, we incorporate this uncertainty by recording the number of non germinated seeds in the interval 20 to Inf.

Below we have the the results of the fitting and the illustration of the fit.

event.drc<-drm(Daily.germ ~ starttime+endtime, Source, data=germ.dat, fct=LL.3(names=c("Slope","Max","T50")), type="event")
summary(event.drc)

##
## Model fitted: Log-logistic (ED50 as parameter) with lower limit at 0 (3 parms)
##
## Parameter estimates:
##
##            Estimate Std. Error    t-value p-value
## Slope:S1  -9.033346   0.672866 -13.425170       0
## Slope:S2 -21.201833   1.479204 -14.333270       0
## Slope:S3  -6.881535   0.487104 -14.127454       0
## Max:S1     0.723644   0.031957  22.644366       0
## Max:S2     0.861366   0.024315  35.425698       0
## Max:S3     0.823871   0.027519  29.938221       0
## T50:S1    11.159668   0.183749  60.733165       0
## T50:S2    10.207110   0.067093 152.134036       0
## T50:S3     9.481473   0.189810  49.952556       0

plot(event.drc, log="", ylim=c(0,1), xlim=c(0,20),   ylab="Proportion of \ngerminated seed", xlab="Time (d)",
legendPos=c(20,.40))


Figure 3. Regression fits for the time-to-event 3 parameter log-logistic regression. Please note that the curves for the time to event do stop at 20 days, because we do not know about future germination beyond 20 days.

In these codes, we have set the names argument within the drc fitting function LL.3(). By default, the three parameters are named b, d, and e. For users who deal with these models on a regular basis, it is certainly acceptable to leave the default parameter names and interpret them accordingly. However, renaming the parameters to Slope, Max, and $T_{50}$ provides a useful reminder of how to interpret the parameters in the model output. This is especially helpful when presenting the results to someone unfamiliar with R syntax.

In Germination studies all the parameters have some biological significance. The upper limit being the maximal germination after 20 day; $T_{50}$ is the time taken to reach 50% germination relative to the upper limit. The slope could give an indication of the vigor of the germinating seeds. The $T_{5}$ or $T_{95}$ are the time taken to make 5 and 95 percent of the seeds germinated, respectively in relation to the upper limits .

#The ED5,ED50,and ED95 for the the germination
ED(event.drc,c(5,50,95),interval="delta")

##
## Estimated effective doses
## (Delta method-based confidence interval(s))
##
##        Estimate Std. Error     Lower   Upper
## S1:5   8.055492   0.223188  7.618051  8.4929
## S1:50 11.159668   0.183749 10.799526 11.5198
## S1:95 15.460037   0.477398 14.524354 16.3957
## S2:5   8.883609   0.102082  8.683532  9.0837
## S2:50 10.207110   0.067093 10.075610 10.3386
## S2:95 11.727789   0.139815 11.453757 12.0018
## S3:5   6.180903   0.216676  5.756226  6.6056
## S3:50  9.481473   0.189810  9.109453  9.8535
## S3:95 14.544531   0.545603 13.475168 15.6139


Usually the $T_{50}$ is the most precise estimate and as seen below this does also apply here when we look at the 95% Confidence Interval.

#Coefficient of variation for ED50
T50<-ED(event.drc,c(50),display=FALSE)$EDdisplay[,2]*100/ED(event.drc,c(50),display=FALSE)$EDdisplay[,1]
T50

##     S1:50     S2:50     S3:50
## 1.6465468 0.6573151 2.0018996

#Coefficient of variation for ED5
T5<-ED(event.drc,c(5),display=FALSE)$EDdisplay[,2]*100/ED(event.drc,c(5),display=FALSE)$EDdisplay[,1]
T5

##     S1:5     S2:5     S3:5
## 2.770631 1.149105 3.505573

#Coefficient of variation for ED95
T95<-ED(event.drc,c(95),display=FALSE)$EDdisplay[,2]*100/ED(event.drc,c(95),display=FALSE)$EDdisplay[,1]
T95

##    S1:95    S2:95    S3:95
## 3.087947 1.192167 3.751262


Direct comparison of model parameters can be done by using the compParm() function in drc. Because we have used the names argument above, we need to specify those names in the compParm() syntax. If we had not used the names argument, we would compare slopes by specifying the b parameter, maximum asymptotes d parameter, and $T_{50}$ the e parameter.

compParm(event.drc, "Slope")

##
## Comparison of parameter 'Slope'
##
##         Estimate Std. Error    t-value p-value
## S1/S2   0.426064   0.043483 -13.198989  0.0000
## S1/S3   1.312694   0.134887   2.318197  0.0204
## S2/S3   3.080975   0.306211   6.795878  0.0000


Slopes are different between S2 and the rest as also seen in Figure 3. It means that the vigor of the germination process for S3 is greater that for the
other two biotypes.

Looking at the upper limits:

compParm(event.drc, "Max")

##
## Comparison of parameter 'Max'
##
##        Estimate Std. Error   t-value p-value
## S1/S2  0.840112   0.044032 -3.631179  0.0003
## S1/S3  0.878346   0.048635 -2.501398  0.0124
## S2/S3  1.045510   0.045723  0.995356  0.3196


The maximum germination for biotypes S2 and S3 is similar, The biotype S1 has a lower maximum germination than do the other two biotypes. The time required for 50% germination can also be compared using the compParm() function.

compParm(event.drc, "T50")

##
## Comparison of parameter 'T50'
##
##       Estimate Std. Error  t-value p-value
## S1/S2 1.093323   0.019384 4.814550   0e+00
## S1/S3 1.176997   0.030508 5.801602   0e+00
## S2/S3 1.076532   0.022683 3.373972   7e-04


And they are in all instances different form each other.
The $T_{50}$ which can also be associated with speed of germination are different form each other.

Looking at Figure 3 it is noted that the logistic regression does not seem to be that good to capture the upper limits for S2 and S3 and perhaps the symmetric sigmoid curves would could be replaced by an asymmetric one. By using the Weibull 1 model, W1.3(), which is asymmetric(i.e. $T_{50}$ is not a parameter in the model). We may be better to model the upper limit. However, the Weibull 1 is rather sensitive to too many zeros so we must get rid of those in order to get a proper fit.This is done by omitting observations with the functions subset(germ.dat, Daily.germ!=0). This function gives some warnings that we can ignore for the time being.

#Fitting LL.3() with teh same conditions as for teh Weilbull 1 model
event.drc.1<-drm(Daily.germ ~ starttime+endtime, Source, fct=LL.3(names=c("Slope","Max","T50")), type="event",data=subset(germ.dat, Daily.germ!=0))

## Warning in dose/parmMat[, 4]: longer object length is not a multiple of
## shorter object length

## Warning in resp - predVec: longer object length is not a multiple of
## shorter object length

## Warning in matrix(c(predVec, resVec), length(dose), 2): data length [152]
## is not a sub-multiple or multiple of the number of rows [31]

## Warning in log(dose) - log(parmMat[, 4]): longer object length is not a
## multiple of shorter object length

## Warning in dose/parmMat[, 4]: longer object length is not a multiple of
## shorter object length

#Fitting LL.3()
event.drc.W1 <- drm(Daily.germ ~ starttime+endtime, Source, data=subset(germ.dat, Daily.germ!=0), fct=W1.3(), type="event")

## Warning in log(dose) - log(parmMat[, 4]): longer object length is not a
## multiple of shorter object length

## Warning in resp - predVec: longer object length is not a multiple of
## shorter object length

## Warning in matrix(c(predVec, resVec), length(dose), 2): data length [152]
## is not a sub-multiple or multiple of the number of rows [31]

## Warning in log(dose) - log(parmMat[, 4]): longer object length is not a
## multiple of shorter object length

## Warning in dose/parmMat[, 4]: longer object length is not a multiple of
## shorter object length

#To extract the T50
ED(event.drc.W1,50,interval="delta")

##
## Estimated effective doses
## (Delta method-based confidence interval(s))
##
##        Estimate Std. Error     Lower   Upper
## S1:50 11.043969   0.183815 10.683698 11.4042
## S2:50 10.151655   0.061806 10.030518 10.2728
## S3:50  9.352818   0.211253  8.938769  9.7669


The $T_{50}$ did no change much but in Figure 4, right hand side the subtle changes in the curve by going from a log-logistic (LL.3()) to an asymmetric (W1.3()) curve is not dramatic, but from the the two fits, there are some differences in the way the upper limit is being described. The Weibull 1 seems to do a better job than the log-logistic in Figure 3

par(mfrow=c(1,2), mar=c(4.2,6.2,0.5,0.5))
plot(event.drc.W1, log="", ylim=c(0,1), xlim=c(0,20),   ylab="Proportion of \ngerminated seed", xlab="Time (d)",
legendPos=c(20,.40))

plot(event.drc.W1, log="", ylim=c(0,1), xlim=c(0,20),   ylab="Proportion of \ngerminated seed", xlab="Time (d)",type="none", legendPos=c(20,.40))
plot(event.drc.1, log="", ylim=c(0,1), xlim=c(0,20),   ylab="Proportion of \ngerminated seed", xlab="Time (d)",type="none",add=TRUE, col=c(2), legendPos=c(20,.40))


Figure 4. An assymetric Weilbull 1 model(W1.3())(right) and comparison with the symmetric log-logistic model (LL.3()) (left).

Even though the changes are not dramatic

AIC(event.drc.1)

## [1] 2600.177

AIC(event.drc.W1)

## [1] 2526.731


the AIC does drop somewhat when going form a log-logistic to an Weibull 1, which indicates that overall the asymmetric W1.3() fits best to the data.