25  Study Design

Chapter 4 discussed the impact that the design of a study has on interpreting the results. Recall that the goal of any statistical analysis is to use the sample to say something about the underlying population. Observational studies are subject to confounding. In order to use the available data in order to make causal statements that apply within the population, we need to address this confounding. There are two ways of doing this:

  1. Conduct a controlled experiment. While we do not limit our discussion to controlled experiments in this unit, our discussion will emphasize the elements of a well designed study.
  2. Use observational data and account for confounders. This can be addressed through regression modeling as suggested in Chapter 21.

As discussed in Chapter 4, controlled experiments (Definition 4.5) balance the groups being compared relative to any potential confounders. As a result, such studies permit causal conclusions to be drawn. While controlled experiments are a useful tool, there are many aspects to consider when designing a study.

25.1 Aspects of a Well-Designed Study

Generally speaking, there are three components to a well-designed study: replication, randomization, and reduction of extraneous noise.

Warning

A study is not poor just because it lacks one of these elements. That is, a study can provide meaningful insights even if it did not make use of each of these elements; every study is unique and should be designed to address the research objective. These elements are simply helpful in creating study designs.

As we have stated repeatedly, variability is inherit in any process. We know there is variability in the population; not every subject will respond exactly the same to each treatment. Therefore, our questions do not seek to answer statements about individuals but about general trends in the population. For example, as discussed in Chapter 24, we may be interested in comparing the mean response across various groups. In order to establish these general trends, we must allow that subject-to-subject variability be present within the study itself. This is accomplished through replication, obtaining data on multiple subjects from each group. Each subject’s response would be expected to be similar, with variability within the group due to the inherent variability in the data generating process.

Definition 25.1 (Replication) Replication results from taking measurements on different units (or subjects), for which you expect the results to be similar. That is, any variability across the units is due to natural variability within the population.

Warning

The term “replication” is also used in the context of discussing whether the results of a study are replicable. While our use of the term is about replicating a measurement process within a study, this does not downplay the importance of replicating an entire study.

When we talk about gathering “more data,” we typically mean obtaining a larger number of replicates. Ideally, replicates will be obtained through random selection from the underlying population to ensure they are representative. The subjects are then randomly allocated to a particular level of the factor under study (randomly allocated to a group) when performing a controlled experiment. This random allocation breaks the link between the factor and any potential confounders, allowing for causal interpretations. However, random allocation preserves any link between the factor and the response, if a link exists. These are the two aspects of randomization.

Definition 25.2 (Randomization) Randomization can refer to random selection or random allocation. Random selection refers to the use of a random mechanism (e.g., a simple random sample, Definition 4.2, or a stratified random sample, Definition 4.3) to select units from the population. Random selection minimizes bias.

Random allocation refers to the use of a random mechanism when assigning units to a specific treatment group in a controlled experiment (Definition 4.5). Random allocation eliminates confounding and permits causal interpretations.

Note

While those new to study design can typically describe random selection and random allocation, they often confuse their purpose. Random selection is to ensure the sample is representative. Random allocation balances the groups with respect to confounders.

It is tempting to manually adjust the treatment groups to achieve what the researcher views as balance between the groups. This temptation should be avoided as balancing one feature of the subjects may lead to an imbalance in other features. Remember, random allocation leads to balance. Of course, random allocation does not guarantee any particular sample is perfectly balanced; however, any differences are due to chance alone. The likelihood of such differences can then be studied and quantified (think about the construction of a model for the null distribution and the definition of a p-value). As the sample size increases, these differences due to chance are minimized.

Even with random allocation providing balance between the groups, there will still be variability within each group. The more variability present, the more difficult it is to detect a signal — to discern a difference in the mean response across groups. The study will have more power to detect the signal if the groups are similar. This leads to the third component of a well-designed study — the reduction of noise.

Definition 25.3 (Power) In statistics, power refers to the probability that a study will discern a signal when one really exists in the data generating process. More technically, it is the probability a study will provide evidence against the null hypothesis when the null hypothesis is false.

Statistical power is like asking the question “what is the probability a guilty defendant will be found guilty by the jury?”

Definition 25.4 (Reduction of Noise) Reducing extraneous sources of variability can be accomplished by fixing extraneous variables or blocking (Definition 25.5). These actions reduce the number of differences between the units under study.

Tension between Lab Settings and Reality

Scientists and engineers are trained to control unwanted sources of variability (or sources of error in the data generating process). This creates a tension between what is observed in the study (under “lab” settings) and what is observed in practice (in “real-world” settings). This tension always exists, and the proper balance depends on the goals of the researchers.

Fixing the value of extraneous variables can reduce variability in a study. For example, in the Organic Foods Case Study, the study was only conducted among college students in a psychology class. This choice indirectly impacts the value of an extraneous variable. Depending on the university, for example, this choice could imply the age of participants is most likely between 18 and 22. That reduces the “noise” in how participants respond due to generational differences, which is an extraneous (not directly being considered in the study) variable. However, note that this decision also potentially limits the scope of the study. It may no longer be appropriate to apply these results to the population at large; that is, elderly individuals may respond differently than what we observe in the study.

An additional tool for reducing noise is blocking, in which observations which are dependent on one another because of a shared characteristic are grouped together.

Definition 25.5 (Blocking) Blocking is a way of minimizing the variability contributed by an inherent characteristic that results in dependent observations. In some cases, the blocks are the unit of observation which is sampled from a larger population, and multiple observations are taken on each unit. In other cases, the blocks are formed by grouping the units of observations according to an inherent characteristic; in these cases that shared characteristic can be thought of having a value that was sampled from a larger population.

In both cases, the observed blocks can be thought of as a random sample; within each block, we have multiple observations, and the observations from the same block are more similar than observations from different blocks.

Example 25.1 (Overseeding Golf Greens) Golf is a major pastime, especially in southern states. Each winter, the putting greens need to be overseeded with grasses that will thrive in cooler weather. This overseeding can affect how the ball rolls along the green. Dudeck and Peeacock (1981) reports on an experiment that involved comparing the ball roll for greens seeded with one of five varieties of rye grass. Ball roll was measured by the mean distance (in meters) that five balls traveled on the green. In order to induce a constant initial velocity, each ball was rolled down an inclined plane.

Because the distance a ball rolls is influenced by the slope of the green, 20 greens were placed into four groups in such a way that the five greens in the same group had a similar slope. Then, within each of these four groups, each of the five greens was randomly assigned to be overseeded with one of the five types of Rye grass. The average ball roll was recorded for each of the 20 greens.

The data from Example 25.1 are shown in Table 25.1.

Table 25.1: Data from the Overseeding Golf Greens example.
Rye Grass Variety Slope of Green Grouping Mean Distance Traveled (m)
A 1 2.764
B 1 2.568
C 1 2.506
D 1 2.612
E 1 2.238
A 2 3.043
B 2 2.977
C 2 2.533
D 2 2.675
E 2 2.616
A 3 2.600
B 3 2.183
C 3 2.334
D 3 2.164
E 3 2.127
A 4 3.049
B 4 3.028
C 4 2.895
D 4 2.724
E 4 2.697

It would have been easy to simply assign 4 greens to each of the Rye grass varieties; the random allocation would have balanced any confounders across the five varieties. However, an additional layer was added to the design in order to control some of that additional variability. In particular, greens with similar slopes were grouped together; then, the random allocation to Rye grass varieties happened within the grouped greens. The blocks in this study are the “slope groups.” Each block represents greens with a different slope. Certainly, there are more than 4 potential slopes that a green might have; yet, we observed 4 such groups in our study. We can think of these 4 observed slopes as a sample of all slopes that might exist on a putting green.

Within each block, we have five units of observations; these five units were randomized to the five treatment groups (the five Rye grass varieties). Notice the random allocation strategy ensures that each variety appears exactly once within each slope grouping. This study design will allow us to compare the impact of the Rye grass variety while minimizing the extraneous variability due to the slope of the green, which is a nuisance characteristic. To see how we capitalize on blocking in the analysis, we refer you to Unit IV of the text.

Note

Blocking is often a way of gaining additional power when limited resources require your study to have a small sample size.

An extreme case of blocking occurs when you repeatedly measure the response on the same subject under different treatment conditions. For example, a pre-test/post-test study is an example of a study which incorporates blocking. In this case, the blocks are the individual subjects, the unit of observation. The response is then observed on the subject both prior to the intervention (the “test”) and following the intervention. The rationale here is to use every subject as his or her own “control.” This reduces extraneous noise because the two treatment groups (the pre-test group and the post-test group) are identical.

Big Idea

A block is a secondary grouping variable present during the data collection that records a nuisance characteristic. While it reduces extraneous noise in the sample, the block must be accounted for appropriately during the analysis of the data (as described in Unit IV).

25.2 Critiquing the Organic Food Case Study Design

In the previous section, we described various aspects of a well-designed study. We now consider how the Organic Food Case Study incorporated these aspects.

We notice that random allocation was utilized. Each of the 123 participants was randomly assigned to one of three treatment groups (type of food to which the participant was exposed). The random allocation allows us to make causal conclusions from the data as confounders will be balanced across treatment groups. For example, subjects who adhere to a strict diet for religious purposes would naturally tend toward organic foods and higher moral expectations. However, for each subject like this who was assigned to the organic foods group, there is someone like this (on average) who was assigned to the comfort foods group.

We also note that there is replication. Instead of assigning only one subject to each of the three treatment groups, we have several subjects within each group. This allows us to evaluate the degree to which the results vary within a particular treatment group.

As discussed above, the researchers chose only to collect data among college students taking a psychology class. This most likely limits the age of the participants significantly. While this does serve to reduce noise due to generational differences, it also limits the scope of the study. This is a potential limitation of the study.

The study does not make use of blocking. There are a couple of potential reasons for this; first, with such a large sample size, the researchers may not have thought it necessary. Second, it could be that there was a restriction on time. For example, researchers may have considered having students be exposed to each of the three types of food and answering different scenarios after each. However, this would take a longer amount of time to collect data. Third, it could be that researchers were not concerned about any identifiable characteristics that would generate additional variability. Regardless, the study is not worse off because it did not use blocking; it is still a very reliable design.

While it is clear that random allocation was utilized in the design, random selection was not. Students participating in the study are those from a particular lecture hall. As a result, these students were not randomly sampled from all college students (or even from the university student body). As a result, we must really consider whether the conclusions drawn from this study would apply to all college students within the United States. Having additional information on their demographics may help determine this, but in general, this is not something that can be definitively answered. It is an assumption we are either willing to make or not. More, notice that the original question was not focused on college students; however, the sample consists only of college students. This can impact the broader generalization of our results. It is quite possible that we observe an effect in college students that is not present in the larger population. We should always be careful to ensure that the sample we are using adequately represents the population of interest. When the sample does not represent the population, the study may not be useless, but we should critically consider what population the sample represents.

25.3 Collecting Observational Data

An inability to conduct a controlled experiment does not mean we neglect study design. Random sampling is still helpful in ensuring that the data is representative of the population. And, even if random sampling is not feasible, we should still aim to minimize bias and have a sample that is representative of our population. Similarly, ensuring there are a sufficient number of replications to capture the variability within the data is an important aspect of conducting an observational study. When collecting observational data, one of the most important steps is constructing a list of potential confounders and then collecting data on these variables as well. This will allow us to account for these confounders in our analysis (see Chapter 21); we cannot model what we do not collect. Finally, observational studies may still permit the blocking of subjects and accounting for this additional variability in our analysis (Unit IV).