Link to home

Nonlinear regression in R

Nonlinear regression is an extended linear regression technique in which a nonlinear mathematical model is used to describe the relationship between the response variable and the predictor variables (Bates and Watts 1988). A nonlinear regression model is a model that contains at least one of the parameters in a nonlinear form. An example of a nonlinear model is Y=αXβ+e.

The general form of a nonlinear regression model is Y=η(x,β)+e, where β is a p-component vector of unknown parameters and e is a Normal (0,σ2) error term; that is, like simple linear regression, the residual errors are assumed to have a normal distribution with mean 0 and variance σ2.

Some nonlinear models have linear forms, termed intrinsically linear, which use a transformation of the original data. However, if a nonlinear model is assumed to be appropriate for the data, the nonlinear model should be used for the analysis instead of the linearized form since the normality assumption for the error term will not be appropriate for the transformed data.

The process of fitting nonlinear regression in R is similar to that for fitting linear models except that there is no explicit formula for estimation, so iterative procedures are needed that may also require the user to supply initial estimates of parameters. For more information on nonlinear regression readers can refer to Ratkowsky (1989) and Bates and Watts (1988).

Performing nonlinear regression in R

The command used for nonlinear regression in R is as follows. Please note, this R-code is a summary of how the function works and and will not execute if entered in an R command line!

nls(
  formula,
  data = parent.frame(),
  start, control = nls.control(),
  algorithm = "default",
  trace = FALSE,
  subset,
  weights,
  na.action,
  model = FALSE
)

For full documentation of the nls function in R, type ?nls or help(nls).

Logistic growth of plant disease

## Input the data that include the variables time,
# plant ID, and severity
time <- c(seq(0,10),seq(0,10),seq(0,10)) plant <- c(rep(1,11),rep(2,11),rep(3,11)) ## Severity represents the number of ## lesions on the leaf surface, standardized ## as a proportion of the maximum severity <- c( 42,51,59,64,76,93,106,125,149,171,199, 40,49,58,72,84,103,122,138,162,187,209, 41,49,57,71,89,112,146,174,218,250,288)/288 data1 <- data.frame( cbind( time, plant, severity ) ) ## Plot severity versus time ## to see the relationship between
## the two variables for each plant
plot( data1$time, data1$severity, xlab="Time", ylab="Severity", type="n" ) text( data1$time, data1$severity, data1$plant ) title(main="Graph of severity vs time")

 

Output

Click on the image for larger version.

From the graph it appears that a logistic curve might fit the data. The logistic curve is . In order to fit the logistic curve, initial values of the parameters are needed. For the logistic curve, R has functions called getInitial and SSlogis to get the initial value for the parameters. Readers can also get some ideas about what the initial parameter is by looking at the scatter plot and from considering what specific parameters do to the shape of the curve. For example, for the logistic curve the parameter represents the asymptotic limit of severity. The inflection point occurs at (, ).  So from the scatter plot, the initial estimate for might be 300 and the estimate for is about seven, so we might set the initial estimate of β as 15 and λ as two. It is important to have a good starting point for the estimation of the parameter. For more discussion about the parameters of nonlinear models refer to Ratkowsky (1989).

Here is an example of obtaining the initial value for the parameters using getInitial and SSlogis. SSlogis uses a different parameterization for the logistic model than we have used here, so as a first step we use the parameterization from SSlogis and then convert to our parameterization. (Use help(SSlogis) for more information.)

getInitial(
        severity ~ SSlogis(time, alpha, xmid, scale),
        data = data1
)

 

Output

alpha   xmid   scale 
2.212 12.507  4.572 

 

Now fit a logistic curve to the data, converting to our parameterization.

## Using the initial parameters above,
## fit the data with a logistic curve.
para0.st <- c(
        alpha=2.212,
        beta=12.507/4.572, # beta in our model is xmid/scale
        gama=1/4.572 # gamma (or r) is 1/scale
)

fit0 <- nls(
        severity~alpha/(1+exp(beta-gamma*time)),
        data1,
        start=para0.st,
        trace=T
)

 

Output

0.1621433 :  2.2120000 2.7355643 0.2187227 
0.1621427 :  2.2124095 2.7352979 0.2187056

Now check the fit of the data with the logistic curve graphed on a scatter plot.

## Plot to see how the model fits the data; plot the
## logistic curve on a scatter plot
plot(
        data1$time,
        data1$severity,
        type="n"
)

text(
        data1$time,
        data1$severity,
        data1$plant
)

title(main="Graph of severity vs time")

curve(
        2.21/(1+exp(2.74-0.22*x)),
        from=time[1],
        to=time[11],
        add=TRUE
)

 

Output

Click on the image for larger version.

From this plot of the logistic curve, it looks like the logistic curve fits the data nicely. There are many nonlinear models that can be used to analyze disease progress data. Some familiar models are discussed in the next section. So, after a scatter plot of the data over time has been constructed and a particular nonlinear model is determined to fit the data, you can use a similar procedure as in this example of fitting the logistic model to a dataset.

 

Next, modeling disease progress with growth curves