34  Correlated Data

Chapter 25 discussed three aspects of a well-designed study: replication, randomization, and reduction of noise. Of particular interest for us in this unit is the concept of blocking for creating groups which are as similar as possible to reduce external variability.

34.1 Blocking

Blocking is a way of minimizing the variability contributed by an inherent characteristic that results in dependent observations. In some cases, the blocks are the unit of observation which is sampled from a larger population, and multiple observations are taken on each unit. In other cases, the blocks are formed by grouping the units of observations according to an inherent characteristic; in these cases that shared characteristic can be thought of having a value that was sampled from a larger population.

In both cases, the observed blocks can be thought of as a random sample; within each block, we have multiple observations, and the observations from the same block are more similar than observations from different blocks.

A block is formed by subjects which have some underlying commonality. An extreme case occurs when you repeatedly measure the response on the same unit under different treatment conditions. This is exactly what happened in the Frozen Yogurt Case Study. Each participant was measured three times, once for each of the three vendors. In order to really comprehend the effect of this, consider an alternative study design:

Take each of the nine study participants and randomly assign each to one of the three yogurt vendors (3 participants to each yogurt vendor). Each participant rates only the yogurt assigned.

This alternative study design is completely valid and feasible, but there seems an intuitive advantage to having each participant rate not just one yogurt but instead rate all the yogurts being tested. This is indeed the case; by making the participants taste each of the three vendor yogurts, the three groups are as similar as possible (the same group exposed to each of the three treatments). This reduces the noise across the three levels of the factor being studied. This in turn increases the power of the study (given the same number of participants), but it also has an effect on the analysis we conduct.

Note

Examine the language in the above paragraph; we intentionally use “groups,” “treatments,” and “levels of the factor” interchangeably. Some disciplines prefer one of these terms to the others.

Suppose one of the participants in the study does not really enjoy frozen yogurt (more of an ice cream fan); then, their rating on the taste will tend to be lower than that given by other participants for each of the vendors. Similarly, a huge frozen yogurt fan will tend to have taste ratings (for each vendor) which are higher than other participants. If you are told the taste rating for the Name Brand vendor given by the anti-frozen-yogurt participant, you have a pretty good guess of what their rating will be for the other vendors. There is a correlation between that participant’s responses across the factor levels; that is, those three responses are not independent of one another. This relationship is what provides power in the study, but it is also what prevents us from using the methods previously discussed in this text to analyze the data.

In the Frozen Yogurt Case Study, the order in which the yogurt from each vendor was given to a participant was randomized. That is, participants were not randomly assigned to vendors (randomization at the unit-level); instead, the order of the vendors was assigned within each participant (randomization within the unit). When randomization occurs within the block, and the blocks are balanced with respect to the treatment groups, the study is referred to as a randomized complete block design.

Definition 34.1 (Randomized Complete Block Design) A randomized complete block design is an example of a controlled experiment utilizing blocking. Each treatment is randomized to observations within blocks such that within each block every treatment is present and the same number of observations are assigned to each treatment.

In the Frozen Yogurt Case Study, the blocks are formed by taking repeated measurements (multiple ratings) on each participant. Example 25.1 provides an example of a randomized complete block design where the blocks are not formed by taking repeated measurements on the same units; instead, the blocks were composed of separate units (golf greens) which had a similar characteristic (slope of green) which created the block.

While a randomized complete block design is a controlled experiment that uses blocking, we can also have observational studies that make use of blocks. For example, pre-post tests are an example of a block design in which each participant is tested prior to and following some intervention. The participants act as the blocks; however, the “treatment” groups (pre and post) are not randomized; instead, they occur in a specific order (pre measurement taken, treatment applied, post measurement taken). This would be an example of an observational study with blocks.

Example 34.1 (Jump Height) Biomedical Engineering students conducted a study investigating how vertical jump height changes with age. They hypothesized that jump height would increase with age. The students enrolled 19 participants into their study. Each participant was asked to perform a vertical jump, the height of which was recorded using the Qualisys Motion Capture system. The participant’s age and height were also recorded.

Students noted that the 19 participants represented 6 families (2 families of 3 individuals, 2 families of 4 individuals, and 1 family of 5 individuals). The engineering students believed that athletic ability (and therefore jump height) may have both a genetic and environmental component common among families.

The primary goal of the study was to quantify the change in the average vertical jump height as the age of subjects increase while accounting for similarities among members of the same family.

Example 34.1 is another example of an observational study using blocks. Notice that participants are not assigned to their age; so, no random allocation occurs. However, there are still blocks that pertain to an inherent characteristic among units. The jump heights from members of the same family are correlated.

In each of the examples given, we can think of the blocks as being a sample from some larger population. In the Frozen Yogurt Case Study, the participants form the blocks; we can imagine these college students are a sample from among all college students. In Example 25.1, the blocks are formed by the slope of the green; we can imagine these slopes form a sample from among all potential slopes a green could have. And, in Example 34.1, the blocks are created by families; we can imagine the families being a sample from among all families in the world, for example. Blocks are distinguished from secondary factors of interest because comparing the blocks themselves is not relevant to the question; instead, the blocks are a nuisance that result in correlated responses, and that correlation must be accounted for.

Example 34.2 (Greenhouse Study) A greenhouse study was conducted to determine the effects of two types of environmental conditions (amount of light and daily temperature) on the growth of a certain species of plant. Specifically, two amounts of light were considered: 10 hours and 12 hours. And, two temperatures were considered: 70 degrees and 80 degrees Fahrenheit. Researchers ultimately wanted to decide which of the four possible combinations of these conditions resulted in the maximum growth of the plant. A total of twelve plants were randomized to be grown under one of the four combinations of light and temperature (with three plants being randomized to each of the four combinations). After 10 weeks of exposure, the measured dry weight of the plant material for each of the twelve plants was recorded.

For Example 34.2, there are two grouping variables: amount of light and temperature. However, neither variable represents a block. In this example, researchers are particularly interested in comparing the specific levels of each variable. That is, both variables represent factors of interest. Instead of being a “nuisance,” both the amount of light and the daily temperature are important variables in examining the response. While the method for addressing a second factor and addressing a block are similar in practice, conceptually, the a secondary factor is very different from a block.

Identifying a Block Variable

It can be helpful to ask two questions when determining if a categorical variable is a factor or a block:

  1. Are we interested in comparing the levels of the variable to one another?
  2. If we were to repeat the study, would the exact same levels of the variable be present in the study?

If we answer “yes” to the first question, the variable is almost certainly a factor. This first question is helpful, and is easy to grasp, but it is not fool-proof. It is possible that a variable is a factor, but that factor is not of primary interest in the study (but may be present to adjust for confounding, for example).

If we answer “no” to the second question, the variable is almost certainly a block. Since blocks typically represent a sample from a larger population, if we were to repeat the study, we generally would observed different blocks forming (different individuals, different families, etc.).