# Case Study #4: Variograms

Plants have a property that is very convenient for studies of disease spread: they are stationary. We can analyze the spatial pattern of diseased plants within a plant population to gain insight into how the disease has spread, how the disease will continue to spread, and what methods could be used to effectively control disease spread.

There are many ways to approach the analysis of the spatial pattern of disease. One approach is to define what it means for plants in a plot to be "close" to one another, and then determine whether plants that are close to each other are likely to have similar characteristics. This approach was taken in the previous section where the disease severity of an individual plant was estimated as the mean of the plants defined as neighbors.

Real and McElhany (1996) discussed several methods for spatial analysis of plant populations, and applied these methods to a data set describing a population of Silene latifolia and the incidence of the anther-smut fungus, Microbotryum violaceum. Here we review the methods they discuss to define proximity of plants. Then we describe a variogram analysis, one method for studying how similar are the responses of plants at different distances apart.

Often data for analysis of spatial pattern are in the form of a map of locations where infection has occurred. In natural systems where the arrangement of plants is not regular, the positions of non-infected plants may be needed to complete the map. To describe the plant population pattern, a grid can be imposed on the plot so that plant density can be evaluated in each square. Then the similarity in plant density for grid cells at different distances apart can be assessed. This analysis assumes a definition of distance between grid cells. The most common approach is to define connections based solely on the Euclidean distance between two cells; the distance is often called the "lag distance", notated by h. Another option is to take direction into consideration. In this case, start with one cell and consider it paired with cells that are in a certain direction and separated by distance h. The direction could be the "rooks move" (in the vertical and horizontal directions), the "bishops move" (along the diagonal), or defined by some other degree increment that partitions the full 360 degree range. Once the distance and directions have been chosen, a scatterplot of the response variable in appropriate plots can be used to explore the relationship. For example, along one axis might be the plant density for reference spatial points Z, and along the other axis might be plant densities for cells separated from each point Z by a selected lag distance, either omnidirectionally or along a selected direction.

#### Evaluating a map of infection points

Here's an example in which we generate a map of infections using R and then analyze their pattern using Geary's c, a measure of similarity described below. The first step is to generate a map using the `disperse2` function that was introduced in Esker et al. (2007). Then grid cells are superimposed over the map and the number of infections in each grid cell is evaluated. (Note that, while the locations of infections are generated using pseudorandom numbers, only the first generation of locations is in a random pattern. The following generations of infections tend to be near the original source of infection so the pattern becomes more aggregated as more generations pass.)

`# nstart is the number of initial infections# nnew is the number of new infections produced by #      each pre-existing infection each generation# ngen is the number of generations# exppar is the rate parameter for the exponential#      distributiondisperse2 <- function(nstart, nnew, ngen, exppar){  coord <- matrix(runif(n=2*nstart, min=0, max=1000),             nrow=nstart, ncol=2)  ntot <- nstart  if(ngen > 0){    for(i in 1:ngen){      for(j in 1:ntot){        tempx <- coord[j,1]        tempy <- coord[j,2]        newdir <- runif(n=nnew, min=-pi, max=pi)        newdist <- rexp(n=nnew, rate=exppar)        newx <- tempx + newdist * sin(newdir)        newy <- tempy + newdist * cos(newdir)        newcoord <- cbind(newx,newy)        coord <- rbind(coord,newcoord)      }    }    ntot <- length(coord[,1])  }  coord}`
`# Create the list of coordinates for infection pointstmat <- disperse2(nstart=20, nnew=20,                  ngen=2, exppar=0.02)`
`# Plot the map of infection pointsplot(tmat, xlim=c(0,1000), ylim=c(0,1000),col='orange')`
`# Draw lines indicating the sampling quadrats,#    or grid cellsfor (i in seq(from=0,to=1000,by=100)){  lines(c(i,i),c(0,1000),col='mediumblue')  lines(c(0,1000),c(i,i),col='mediumblue')}`
`# Collect the counts from each of the quadrats,#    or grid cellsquad <- function(inmat){  quadcount <- matrix(-1,10,10)  for (x in 1:10){  for (y in 1:10){      lowx <- 100*(x-1)      highx <- 100*x      lowy <- 100*(y-1)      highy <- 100* y      testx <- inmat[,1] > lowx & inmat[,1] < highx      testy <- inmat[,2] > lowy & inmat[,2] < highy        # ... this essentially rotates the matrix of values        # to make comparison to the graph more direct      quadcount[(11-y),x] <- sum(testx*testy)    }  }  quadcount}`
`quadout <- quad(tmat)quadout`

## Output

Note that the data set is generated randomly, so the pictures and data shown below will be somewhat different from those produced by submission of the same commands again.

`     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [1,]  9   17    5    1    1    1    4    0    0     0 [2,]  3    0   29    4    2    4    7   23    2     0 [3,]  4    1    2    2   32   37   11   26    7     0 [4,] 27   28    5    2   10   29   24    8    1     1 [5,]  0    5    4    1    4   10   12    4    2     0 [6,]  1    0    0    0    0   39   20    7   16     0 [7,]  0    0    1    1    2    8    2    9   40     1 [8,]  0    1    2   24    8   26    3    1    5     0 [9,]  2   22    9   21    9   20    2    0    0     0[10,]  3   11   28   21    4    6    0    0    0     0`

#### Geary's c

Geary's c is one measure for evaluating similarity or dissimilarity for quadrats at specific distances from each other. Here we develop a function to calculate Geary's c, but R offers other functions to calculate it, as well. Geary's c generally varies from 0 (strong positive autocorrelation) to 2 (strong negative autocorrelation), with 1 being the expected value when there is no spatial autocorrelation (Cliff and Ord 1981). The value at a distance of h is: .

W(h) are the total number of pairs in the data set for distance h

wij(h) is an indicator taking on the value 1 if the two observations i and j are at distance h, and the value 0 if they are not

zi is the observation (in our case, count) for the ith location (with coordinates xi and yi.)

At one rook's move, the distance from center to center of neighboring quadrats is 100.
At one bishop's move, the distance from center to center of neighboring quadrats is
`sqrt(100^2 + 100^2)`
`141.4214`.

At two rook's moves, the distance is 200.
At two bishop's moves, the distance is
`sqrt(200^2 + 200^2)`
`282.8427`.

The 100x100 matrix of distances between the 100 quadrats is given by the following. First, calculate matrices of x and y coordinates for the center of each quadrat. (When the matrices are converted to vectors, R does this by pasting together the columns (y values) end to end).

`# First, calculate matrices of x and y coordinates#    for the center of each quadrat. # (When the matrices are converted to vectors,#     R does this by pasting together the#     columns (y values) end to end).xcoord <- matrix(-1,10,10)for (i in 1:10){xcoord[,i] <- i*100 - 50}xcoord <- as.vector(xcoord)ycoord <- matrix(-1,10,10)for (i in 1:10){ycoord[i,] <- i*100 - 50}ycoord <- as.vector(ycoord)`
`# Calculate the 100x100 distance matrix that includes#    the distance between each pair of quadrats`
`distmat <- matrix(-1,100,100)for (i in 1:100){ for (j in 1:100){    distmat[i,j] <- sqrt((xcoord[i]-xcoord[j])^2 +                      (ycoord[i]-ycoord[j])^2)  }}`
`# Find the unique distances that we have to work#    with for this quadrat arrangementulist <- unique(as.vector(distmat))ulist`

## Output

`     0.0000  100.0000  200.0000  300.0000  400.0000  500.0000  600.0000   700.0000  800.0000  900.0000  141.4214  223.6068  316.2278  412.3106  509.9020  608.2763  707.1068  806.2258  905.5385  282.8427  360.5551  447.2136  538.5165  632.4555  728.0110  824.6211  921.9544  424.2641  583.0952  670.8204  761.5773  854.4004  948.6833  565.6854  640.3124  721.1103  894.4272  984.8858  781.0250  860.2325  943.3981 1029.5630  848.5281 1000.0000 1081.6654  989.9495 1063.0146 1140.1754 1131.3708 1204.1595 1272.7922`

Next find the number of unique distances.

`# Find the number of unique distanceslength(ulist)`

## Output

` 51`

Now create a graph that shows the number of observations for each distance.

`# Calculate the number of observations#    corresponding to each distance, W(h)wdvec <- 0*(1:length(ulist))for (i in 1:length(ulist)){  wdvec[i] <- sum(as.vector(distmat) == ulist[i])}`
`# Note that the number of pairs per distance#    varies greatlyplot(ulist, wdvec, xlab='Distance',     ylab='Number of observations')`
`# Calculate the denominator,#    [(1/(n-1)) ∑i=1 n (zi - mean(z))]meanz <- mean(quadout)quadvec <- as.vector(quadout)sum1 <- 0for (i in 1:100){sum1 <- sum1 + (quadvec[i] - meanz)^2}denom <- (1/99) * sum1`
`# Calculate the second part of the numerator for each #   distance, [∑i=1(i≠j) n ∑j=1(i≠j) n­ wij(h) (zi - zj)2]numer <- 0*(1:length(ulist))for(k in 1:length(ulist)){  tdist <- ulist[k]  dcheck <- distmat == tdist  for(x in 1:100){    for(y in 1:100){      if(dcheck[x,y] & x != y){        numer[k] <- numer[k] + (quadvec[x] - quadvec[y])^2      }    }  }}`
`# Finally calculate Geary's c(h) for each distance h    Gearyout <- 0*length(ulist)for (i in 1:length(ulist)){  Gearyout[i] <- (1/(2*wdvec[i])) * numer[i] / denom}`
`# Plot Geary's c vs. distanceplot(ulist, Gearyout, xlab='Distance',     ylab='Geary c', ylim=c(0,3))lines(c(0,1272),c(0,0))lines(c(0,1272),c(1,1))lines(c(0,1272),c(2,2))`

These commands can be condensed somewhat into the following function, and output is illustrated following the code to create the new function. Note that the `Gearyc` function assumes the existence of the `quad` function.

`# Here's a slightly condensed function that#    performs all these steps# It assumes existence of the quad functionGearyc <- function(tmat){  quadout <- quad(tmat)  xcoord <- matrix(-1,10,10)  for (i in 1:10){xcoord[,i] <- i*100 - 50}  xcoord <- as.vector(xcoord)  ycoord <- matrix(-1,10,10)  for (i in 1:10){ycoord[i,] <- i*100 - 50}  ycoord <- as.vector(ycoord)  distmat <- matrix(-1,100,100)  for (i in 1:100){    for (j in 1:100){      distmat[i,j] <- sqrt((xcoord[i]-xcoord[j])^2 +                  (ycoord[i]-ycoord[j])^2)    }  }  ulist <- unique(as.vector(distmat))  wdvec <- 0*(1:length(ulist))  for (i  in 1:length(ulist)){    wdvec[i] <- sum(as.vector(distmat) == ulist[i])}  meanz <- mean(quadout)  quadvec <- as.vector(quadout)  sum1 <- 0  for (i in 1:100){sum1 <- sum1 + (quadvec[i] - meanz)^2}  denom <- (1/99) * sum1  numer <- 0*(1:length(ulist))  for(k in 1:length(ulist)){    tdist <- ulist[k]    dcheck <- distmat == tdist    for(x in 1:100){      for(y in 1:100){        if(dcheck[x,y] & x != y){          numer[k] <- numer[k] + (quadvec[x] - quadvec[y])^2        }      }    }  }  Gearyout <- 0*length(ulist)  for (i in 1:length(ulist)){    Gearyout[i] <- (1/(2*wdvec[i])) * numer[i] / denom  }`
`  plot(ulist, Gearyout, xlab='Distance',       ylab='Geary c', ylim=c(0,3))  lines(c(0,1272),c(0,0))  lines(c(0,1272),c(1,1))  lines(c(0,1272),c(2,2))}`
`# For a new map, evaluate Geary's ctmat <- disperse2(nstart=20, nnew=20, ngen=2, exppar=0.02)Gearyc(tmat)`

## Output

#### Variograms

A variogram can be used to summarize the relationship at each distance between points (individuals or grid cells). The variogram value , or estimated semivariance, for lag distance h is defined as: , where N(h) is the number of pairs of points separated by h, z(xi) is the data value for the point xi, and z(xi+h)is the data value at cells separated from xi by the lag distance h in the chosen direction.

Variograms can be used to evaluate whether plant characteristics such as disease incidence are clustered or random. Clustered disease incidence would be reflected in positive values in the variogram at the distances corresponding to the spatial scale of clustering. If incidence is clustered, "neighbors" are more likely to share the same disease status than are plants separated by larger distances.

In the analysis of their Silene latifolia data set, Real and McElhany (1996) found that plant density was clustered, or had significant autocorrelation, at lag distances up to 4 m. They also found that there was significant autocorrelation of plants with similar disease status at lag distances up to 1 m using an omnidirectional variogram.

#### Suggested Exercises

##### Suggested Exercise #1

Try creating other maps with different degrees of clustering by modifying the example above and calculate the number of infections in each quadrat, or grid cell. Then use the matrices of grid cell counts and re-run the exercise above to compare the results to the first map analyzed.

##### Suggested Exercise #2

Try using a variogram to study the Verticillium wilt case study data. Would information from a variogram be more useful than LIP if the goal is to develop a strategy to control Verticillium wilt?

Next, a discussion on using Beta-binomial analysis