Advanced Discussion and Illustration: Beta-binomial analysis

Roumagnac, P. Pruvost, O. Chiroleu, F., and Hughes, G. 2004. Spatial and temporal analyses of bacterial blight of onion caused by Xanthomonas axonopodis pv. allii. Phytopathology 94:138-146.

Four different methods for describing spatial patterns have been discussed and illustrated thus far in this document, and now we introduce the use of probability distributions to describe spatial pattern. This is a well-established method in plant pathology. Beta-binomial analysis is useful for describing aggregated patterns of plant disease (Hughes and Madden 1993).

Bacterial blight of onion (BBO) is a serious threat to onion production world-wide. The disease appears as lenticular water soaked spots on the leaves which in later stages turn into dry chlorotic lesions. In severe cases these lesions coalesce resulting in leaf dieback and reduced bulb size. A number of different Allium spp., including garlic, welsh onion, shallot, and leek, are hosts of this bacterium. Recent studies have confirmed that the pathogen is seed borne (Roumagnac et al. 2004).

Roumagnac et al. (2004) examined the temporal and spatial dynamics of BBO to hypothesize whether BBO epidemics can arise from seed borne inoculum. The experimental plots were at the CIRAD experimental station in Reunion Island, France, 1999 to 2001. A total of five plots (P1-P5) were sown with onion seed naturally contaminated with X. axonopodis pv. alli. The incidence of X. axonopodis pv. alli in seed prior to sowing was 0.04%.

Plots were sown at different sites each year to avoid inoculum from previous crop debris. In 2000 and 2001, 70 m separated the two plots and, to reduce inoculum movement from plot to plot, each plot was surrounded by seven rows of maize windbreak followed by 15 rows of onion trap plants established from an uninfested seed lot. Each plot included 8200 plants and was watered using overhead irrigation.

BBO incidence in each plot was determined at 15 day intervals after leaf emergence and 30 day intervals in onion trap outer rings. Symptom development on each plant was recorded by visual assessment. Spatial patterns were analyzed using beta-binomial distributions, binary power law analysis, and spatial autocorrelation analysis.

As described in "An introduction to the R programming environment" (Garrett et al. 2007), the major difference between beta-binomial and binomial distributions is that the probability that a plant is scored as 'diseased' is variable in the former while it is constant in the latter. Therefore, the beta-binomial distribution has a higher variance and better fits the data derived from aggregated patterns of disease incidence (Hughes and Madden 1993). The beta-binomial is fit by estimating two parameters, the mean disease incidence and the aggregation parameter that characterizes variability in the probability of infection.

With the use of R and the data from Roumagnec et al. (2004), let's look at how the beta-binomial parameter estimates change over time for this example.

Fitting a beta-binomial model for disease incidence using R

This example will show how maximum likelihood and optimization may be used to obtain the estimates of the beta-binomial distribution. Our approach will closely follow the description of Hughes and Madden (1993) who illustrated the use of the beta-binomial distribution for handling overdispersed disease incidence data obtained from quadrats. The example will illustrate the use of optimization following the approach of Altham (2002). Although the Altham example was written for S-Plus it is still useful as a basis for our R program. Because the goal is to illustrate optimization of a likelihood function for plant pathology, some of the more technical details of using the beta-binomial distribution as specified in the analyses of Hughes and Madden (1993) will not be covered and the reader is recommended to consult that reference.

The beta-binomial distribution

Here, n corresponds to the number observed (i.e., plants assessed within a quadrat), x corresponds to the number of positive results (i.e., diseased plants assessed within a quadrat), and α and β are the beta-binomial parameters. The two parameters are both positive constants and can be varied to produce a wide range of distributional forms (which is why they are useful for describing overdispersed data).

Next, how to estimate α and β? R includes useful functions for numerical optimization, an approach that can help us determine parameter estimates from distributions, models, and functions. For our example, a general purpose optimization function (optim) from R will be used. To use the optim function, two key pieces are needed: (1) the log-likelihood function for the beta-binomial distribution and (2) initial starting estimates for the two parameters of the model, α and β. For further information on the optim function in R, type: ?optim or help(optim). Furthermore, for an excellent introduction to maximum likelihood and optimization in R, we recommend consulting Bolker (In Press).

In the following example, the data set, named tasmania_test_1.txt, were kindly provided by S.J. Pethybridge (Pethybridge et al. 2005), based on an assessment of the incidence of foliar symptoms due to ray blight disease of pyrethrum in 62 quadrats. The data are in three columns: column one lists the quadrat number, column two the number of samples assessed per quadrat (here = 6), and the third column gives the number of diseased samples in the quadrat (i.e., 0, 1, 2, 3, 4, 5, or 6). Save the file to your computer to use in this example. Refer to "An introduction to the R programming environment" for use of text data files in R (Garrett et al. 2007).

Importing the data into R

# The name of the dataset is (as shown here): tasmania_test_1.txt # Be sure to modify the file location to the location where # you saved your tasmania_test_1.txt file.

##### First, a binomial model will be fit to ##### the data using the glm function # The assumption here is a constant probability, such that # counti ~ independent Binomial(group_sizei, p), 1 = i = N # Reference, Altham 2002

attach(tasmania) # This way, column headings may be # called out for analysis names(tasmania) # What are those column headings?

#### Examine for potential overdispersion #### using non-statistical methods

table(tasmania[,3]) #### notice how there are more #### quadrats with 6 disease plants

#### Beta-Binomial fitting # In order to obtain the estimates for alpha and beta, the # log-likelihood function needs to be defined. Consult Bolker #(In Press) for further information regarding the determination # of the likelihood. For the purpose of this exercise, the # following function for the log-likelihood is used:

lbetabin = function (data, inits) { n=data[,1] # This corresponds to the group_size r=data[,2] # This corresponds to the data on incidence alpha=inits[1] # This corresponds to the initial # starting parameter for alpha beta=inits[2] # This corresponds to the initial # starting parameter for beta

# Because optim minimizes a function, the # negative log-likelihood is used # Also, the factorial is not necessary, as it does not depend # on the parameters of interest

# ?optim for further information # The method used was BFGS = Broyden-Fletcher- # Goldfarb-Shanno algorithm # Consult Bolker (In Press) for discussion of # different methods

# optim.tas$par provided the estimated alpha and beta, # which will be used to obtain the new p # optim.tas$value is the estimated log-likelihood value, # which matches results from Pethybridge using the BBD # program of Hughes and Madden

The value, var.pi, represents an estimate of the variation in p and can be thought of as an index of aggregation. When this value is larger than 0, the variance observed in the beta-binomial distribution is larger than for that of the binomial, indicating aggregation.