# Seed Germination

Most weeds multiply and spread by seeds and understanding how seeds germinate, and how ambient factors affect seed dormancy, germination, vigor and early growth is instrumental for our continuous struggle to deplete soils of weed seeds. On agricultural land in Denmark we find about 30,000 viable weed seeds/m^{2.} It has dropped to half of what we found 45 years ago. This drop is due to many factors, but probably the most important one is the continuous use of herbicides.

The weed's ability to set seed and the seeds' ability to germinate is therefore important for us to try understand and counteract seed soil bank enrichment with agronomic means. Therefore, seed testing experiments to assess germination ability and vigor should support future action of depleting weed pressure.

The International Seed Testing Association ISTA produces internationally agreed rules for seed sampling and testing and disseminates knowledge in seed science and technology. This facilitates seed trading nationally and internationally and is focusing on agricultural species. The principles of studying agricultural species and weed species are the same, but for weed species we are not interested in certifying their germinability, but to discover how to counteract their germination ability and vigor. In crop science and breeding the aim is the opposite to select for germinability and vigor.

Nonlinear regression is commonly used to evaluate cumulative germination in cause of time, temperature or other external gradients. This approach, however, is problematic as it ignores that successive observations on the germination curve are highly correlated. The total number of seeds that have germinated at a particular time is highly dependent on the number of seeds that already have germinated. Moreover, variation in the proportions of germinated seeds will vary with time, being largest at intermediate monitoring intervals and smallest at the initial and final intervals, where germination activity is low. This means that the fundamental assumptions implicitly underlying nonlinear regression (independence among proportions and variance homogeneity) are not met.

In experiments, where new batches of seed are used for each inspection time, data can be analyzed by well-established models for binomial data such as log-logistic regression or other generalized linear models.

In experiments where the same batch of seeds is followed over time, the same seeds are observed repeatedly for some pre-specified duration of an experiment. We observe the waiting time until the event of interest occurs and the resulting data are often referred to as event-time or time-to-event data.

In this chapter we will concentrate on time-to-event data Ritz et al. 2013. The Figure shows the basic arrangement of this kind of experiments and how we arrange the data in R

** Figure 1. Example of how an experiment set up with four petridishes and how the data are arranged for being prepared for time-to-event analysis**

For various reasons germination need not occur at all during the experiment. We refer to this as right-censoring. Consequently, a plausible statistical model has to incorporate that the event of interest, seed germination, may occur after termination of the experiment, assigning a positive probability to the situation that germination is not observed. This is why the final inspection time has an `Inf`

value that means infinity concludes the experiment.

Also, the germination is obviously not observed exactly at the very point in time, when the event took place. For instance, seeds in petridishes may only be inspected on a daily or weekly basis and in case a seed has germinated the only information we get: because germination took place between two inspection intervals. This type of time-to-event data is often referred to as grouped data or interval-censored. As a consequence an appropriate model should be specified by means of the probability of germination within the monitoring intervals. This probability will depend heavily on the monitoring intervals. Ignoring this grouped-data structure may lead to biased estimates. The finer the time grid the smaller the error caused by ignoring discretization.

## Example 1: Chickweed

```
library(drc)
head(chickweed,3)
```

```
## start end count
## 1 0 12 0
## 2 12 22 0
## 3 22 30 0
```

```
tail(chickweed,3)
```

```
## start end count
## 33 266.5 276.5 1
## 34 276.5 281.5 0
## 35 281.5 Inf 160
```

```
chickweed.m1 <- drm(count~start+end, data = chickweed, fct = LL.3(), type = "event")
```

Notice the model `count~start+end`

operates with two independent variables the `start`

and Â´endÂ´. It means that the intervals between inspection can vary; there is not need to have the same inspection intervals for the whole period of the experiment.

```
summary(chickweed.m1)
```

```
##
## Model fitted: Log-logistic (ED50 as parameter) with lower limit at 0 (3 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) -20.76747 2.94423 -7.05362 0
## d:(Intercept) 0.20011 0.02830 7.07108 0
## e:(Intercept) 196.05308 2.50570 78.24272 0
```

The output from the `drm()`

is the same as for other non-linear regressions, and the three parameter log-logistic model, `LL.3()`

, is also familiar in dose-response work. What is new here is that the model needs `start+end,...,type = "event"`

. The regression is illustrated in **Figure 2**.

```
plot(chickweed.m1, xlab = "Time (hours)", ylab = "Proportion germinated",
xlim=c(0, 340), ylim=c(0, 0.25), log="", lwd=2, cex=1.2)
```

**Figure 2. Germination curves of ALS resistant Stellaria media from Danish fields in 2001. The code is from the example data set chickweed in the drc package Ritz et al. 2013. Note that we now use the actual time on the x-axis and not the logarithm as we did in the dose-response curve chapter.**

On the basis of the regression parameters we can see that the maximum germination rate is not that high and the is 196 hours, meaning that it takes 196 days to make 0.20/2= 0.1 or 10 % of the seeds germinate. The interesting biological parameter are the upper limit (maximum) of germination 0.20 (0.03) and the time to 50% germination, , 196 (2.5).

As was the case in the dose-response chapter we can also estimate how long it takes to make 10 percent or 90 percent of the seed germinate.

```
## Calculating t10, $T_{50}$, t90 for the distribution of viable seeds
ED(chickweed.m1, c(10, 50, 90),interval="delta")
```

```
##
## Estimated effective doses
## (Delta method-based confidence interval(s))
##
## Estimate Std. Error Lower Upper
## 1:10 176.3700 3.4930 169.5239 183.22
## 1:50 196.0531 2.5057 191.1420 200.96
## 1:90 217.9328 4.2730 209.5578 226.31
```

If a cumulative germination curve had been analysed by a nonlinear regressions, the parameters would be close to each other, but their standard errors would be overly precise. If we are interested only in the curve fitting and not the uncertainties of parameters, either ways of fitting may suffice. But usually we want to get the correct standard error of parameters to compare germination ability and/or vigor among species or biotypes, and the time-to-event regression is the way to go to get a better appreciation of the uncertainties in germination ability and may prevent us to jump to wrong conclusions.

## Example 2: *Bromus tectorum*

Another example uses germination data for three different biotypes of *Bromus tectorum* collected in Wyoming. Seeds were placed in petridishes on moist filter paper in a germination chamber. The petridishes were checked for germinated seeds once daily from the initiation of the experiment for 20 days. Each day the number of newly germinated seeds was recorded.

```
library(drc)
germ.dat<-read.csv("http://rstats4ag.org/data/broteGerm.csv")
head(germ.dat,3)
```

```
## Source starttime endtime REP Daily.germ Cum.Germ
## 1 S1 0 1 1 0 0
## 2 S1 0 1 2 0 0
## 3 S1 0 1 3 0 0
```

There are 6 columns in the data set, the `Source`

is the *B. tectorum* biotype, `starttime`

and `endtime`

denote the beginning and end of the interval represented by each recording time. This is similar to the chickweed data set. The `Daily.germ`

column represents the number of germination events that occurred since the last observation. The `Cum.germ`

column is a cumulative total that is often used (incorrectly) for analyzing germination over time.

The final observation in this experiment occurred 20 days after it began. Therefore, we have an interval in the data set from time period 20 to `Inf`

. Many of the petridishes had not germinated all seeds at the final observation. We don't know for certain that these seeds *never* would have germinated. All we can say is that they had not yet germinated at the final observation of day 20. In the data set, we incorporate this uncertainty by recording the number of non germinated seeds in the interval 20 to `Inf`

.

Below we have the the results of the fitting and the illustration of the fit.

```
event.drc<-drm(Daily.germ ~ starttime+endtime, Source, data=germ.dat, fct=LL.3(names=c("Slope","Max","T50")), type="event")
summary(event.drc)
```

```
##
## Model fitted: Log-logistic (ED50 as parameter) with lower limit at 0 (3 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## Slope:S1 -9.033346 0.672866 -13.425170 0
## Slope:S2 -21.201833 1.479204 -14.333270 0
## Slope:S3 -6.881535 0.487104 -14.127454 0
## Max:S1 0.723644 0.031957 22.644366 0
## Max:S2 0.861366 0.024315 35.425698 0
## Max:S3 0.823871 0.027519 29.938221 0
## T50:S1 11.159668 0.183749 60.733165 0
## T50:S2 10.207110 0.067093 152.134036 0
## T50:S3 9.481473 0.189810 49.952556 0
```

```
plot(event.drc, log="", ylim=c(0,1), xlim=c(0,20), ylab="Proportion of \ngerminated seed", xlab="Time (d)",
legendPos=c(20,.40))
```

** Figure 3. Regression fits for the time-to-event 3 parameter log-logistic regression. Please note that the curves for the time to event do stop at 20 days, because we do not know about future germination beyond 20 days.**

In these codes, we have set the `names`

argument within the drc fitting function `LL.3()`

. By default, the three parameters are named `b, d`

, and `e`

. For users who deal with these models on a regular basis, it is certainly acceptable to leave the default parameter names and interpret them accordingly. However, renaming the parameters to `Slope`

, `Max`

, and provides a useful reminder of how to interpret the parameters in the model output. This is especially helpful when presenting the results to someone unfamiliar with `R`

syntax.

In Germination studies all the parameters have some biological significance. The upper limit being the maximal germination after 20 day; is the time taken to reach 50% germination relative to the upper limit. The slope could give an indication of the vigor of the germinating seeds. The or are the time taken to make 5 and 95 percent of the seeds germinated, respectively in relation to the upper limits .

```
#The ED5,ED50,and ED95 for the the germination
ED(event.drc,c(5,50,95),interval="delta")
```

```
##
## Estimated effective doses
## (Delta method-based confidence interval(s))
##
## Estimate Std. Error Lower Upper
## S1:5 8.055492 0.223188 7.618051 8.4929
## S1:50 11.159668 0.183749 10.799526 11.5198
## S1:95 15.460037 0.477398 14.524354 16.3957
## S2:5 8.883609 0.102082 8.683532 9.0837
## S2:50 10.207110 0.067093 10.075610 10.3386
## S2:95 11.727789 0.139815 11.453757 12.0018
## S3:5 6.180903 0.216676 5.756226 6.6056
## S3:50 9.481473 0.189810 9.109453 9.8535
## S3:95 14.544531 0.545603 13.475168 15.6139
```

Usually the is the most precise estimate and as seen below this does also apply here when we look at the 95% Confidence Interval.

```
#Coefficient of variation for ED50
T50<-ED(event.drc,c(50),display=FALSE)$EDdisplay[,2]*100/ED(event.drc,c(50),display=FALSE)$EDdisplay[,1]
T50
```

```
## S1:50 S2:50 S3:50
## 1.6465468 0.6573151 2.0018996
```

```
#Coefficient of variation for ED5
T5<-ED(event.drc,c(5),display=FALSE)$EDdisplay[,2]*100/ED(event.drc,c(5),display=FALSE)$EDdisplay[,1]
T5
```

```
## S1:5 S2:5 S3:5
## 2.770631 1.149105 3.505573
```

```
#Coefficient of variation for ED95
T95<-ED(event.drc,c(95),display=FALSE)$EDdisplay[,2]*100/ED(event.drc,c(95),display=FALSE)$EDdisplay[,1]
T95
```

```
## S1:95 S2:95 S3:95
## 3.087947 1.192167 3.751262
```

Direct comparison of model parameters can be done by using the `compParm()`

function in `drc`

. Because we have used the `names`

argument above, we need to specify those names in the `compParm()`

syntax. If we had not used the `names`

argument, we would compare slopes by specifying the *b* parameter, maximum asymptotes *d* parameter, and the *e* parameter.

```
compParm(event.drc, "Slope")
```

```
##
## Comparison of parameter 'Slope'
##
## Estimate Std. Error t-value p-value
## S1/S2 0.426064 0.043483 -13.198989 0.0000
## S1/S3 1.312694 0.134887 2.318197 0.0204
## S2/S3 3.080975 0.306211 6.795878 0.0000
```

Slopes are different between S2 and the rest as also seen in **Figure 3**. It means that the vigor of the germination process for S3 is greater that for the

other two biotypes.

Looking at the upper limits:

```
compParm(event.drc, "Max")
```

```
##
## Comparison of parameter 'Max'
##
## Estimate Std. Error t-value p-value
## S1/S2 0.840112 0.044032 -3.631179 0.0003
## S1/S3 0.878346 0.048635 -2.501398 0.0124
## S2/S3 1.045510 0.045723 0.995356 0.3196
```

The maximum germination for biotypes S2 and S3 is similar, The biotype S1 has a lower maximum germination than do the other two biotypes. The time required for 50% germination can also be compared using the `compParm()`

function.

```
compParm(event.drc, "T50")
```

```
##
## Comparison of parameter 'T50'
##
## Estimate Std. Error t-value p-value
## S1/S2 1.093323 0.019384 4.814550 0e+00
## S1/S3 1.176997 0.030508 5.801602 0e+00
## S2/S3 1.076532 0.022683 3.373972 7e-04
```

And they are in all instances different form each other.

The which can also be associated with speed of germination are different form each other.

Looking at ** Figure 3** it is noted that the logistic regression does not seem to be that good to capture the upper limits for S2 and S3 and perhaps the symmetric sigmoid curves would could be replaced by an asymmetric one. By using the Weibull 1 model, `W1.3()`

, which is asymmetric(i.e. is not a parameter in the model). We may be better to model the upper limit. However, the Weibull 1 is rather sensitive to too many zeros so we must get rid of those in order to get a proper fit.This is done by omitting observations with the functions `subset(germ.dat, Daily.germ!=0)`

. This function gives some warnings that we can ignore for the time being.

```
#Fitting LL.3() with teh same conditions as for teh Weilbull 1 model
event.drc.1<-drm(Daily.germ ~ starttime+endtime, Source, fct=LL.3(names=c("Slope","Max","T50")), type="event",data=subset(germ.dat, Daily.germ!=0))
```

```
## Warning in dose/parmMat[, 4]: longer object length is not a multiple of
## shorter object length
```

```
## Warning in resp - predVec: longer object length is not a multiple of
## shorter object length
```

```
## Warning in matrix(c(predVec, resVec), length(dose), 2): data length [152]
## is not a sub-multiple or multiple of the number of rows [31]
```

```
## Warning in log(dose) - log(parmMat[, 4]): longer object length is not a
## multiple of shorter object length
```

```
## Warning in dose/parmMat[, 4]: longer object length is not a multiple of
## shorter object length
```

```
#Fitting LL.3()
event.drc.W1 <- drm(Daily.germ ~ starttime+endtime, Source, data=subset(germ.dat, Daily.germ!=0), fct=W1.3(), type="event")
```

```
## Warning in log(dose) - log(parmMat[, 4]): longer object length is not a
## multiple of shorter object length
```

```
## Warning in resp - predVec: longer object length is not a multiple of
## shorter object length
```

```
## Warning in matrix(c(predVec, resVec), length(dose), 2): data length [152]
## is not a sub-multiple or multiple of the number of rows [31]
```

```
## Warning in log(dose) - log(parmMat[, 4]): longer object length is not a
## multiple of shorter object length
```

```
## Warning in dose/parmMat[, 4]: longer object length is not a multiple of
## shorter object length
```

```
#To extract the T50
ED(event.drc.W1,50,interval="delta")
```

```
##
## Estimated effective doses
## (Delta method-based confidence interval(s))
##
## Estimate Std. Error Lower Upper
## S1:50 11.043969 0.183815 10.683698 11.4042
## S2:50 10.151655 0.061806 10.030518 10.2728
## S3:50 9.352818 0.211253 8.938769 9.7669
```

The did no change much but in **Figure 4, right hand side** the subtle changes in the curve by going from a log-logistic `(LL.3())`

to an asymmetric `(W1.3())`

curve is not dramatic, but from the the two fits, there are some differences in the way the upper limit is being described. The Weibull 1 seems to do a better job than the log-logistic in **Figure 3**

```
par(mfrow=c(1,2), mar=c(4.2,6.2,0.5,0.5))
plot(event.drc.W1, log="", ylim=c(0,1), xlim=c(0,20), ylab="Proportion of \ngerminated seed", xlab="Time (d)",
legendPos=c(20,.40))
plot(event.drc.W1, log="", ylim=c(0,1), xlim=c(0,20), ylab="Proportion of \ngerminated seed", xlab="Time (d)",type="none", legendPos=c(20,.40))
plot(event.drc.1, log="", ylim=c(0,1), xlim=c(0,20), ylab="Proportion of \ngerminated seed", xlab="Time (d)",type="none",add=TRUE, col=c(2), legendPos=c(20,.40))
```

**Figure 4. An assymetric Weilbull 1 model( W1.3())(right) and comparison with the symmetric log-logistic model (LL.3()) (left).**

Even though the changes are not dramatic

```
AIC(event.drc.1)
```

```
## [1] 2600.177
```

```
AIC(event.drc.W1)
```

```
## [1] 2526.731
```

the AIC does drop somewhat when going form a log-logistic to an Weibull 1, which indicates that overall the asymmetric `W1.3()`

fits best to the data.