15  Nature of Collecting Multivariable Data

For the Seismic Activity Case Study, we are primarily interested in characterizing the relationship between bracketed duration and the magnitude of the earthquake. As we discussed in the previous chapter, this general goal might be refined into one of many specific questions:

Notice that these last two questions actually require knowledge of more than just the bracketed duration and the magnitude of each seismic event. To address the second question, we would also need the distance from the center of the earthquake; to address the third question, we would also need the soil conditions at the location where the measurement is taken. Often, research questions require knowledge of more than just a single variable; such questions are multivariable.

Definition 15.1 (Multivariable) This term refers to questions of interest that involve more than a single variable. Often, these questions involve many variables. A multivariable model typically refers to a model with two or more predictors.

Consider going to the doctor because you are feeling ill. The doctor does not have you simply enter your most prominent symptom (fever, for example) into a computer and then prescribe a medication based solely on that single symptom. Instead, a good physician will review all symptoms you are experiencing, as well as your medical history, other medications, allergies, etc. The physician operates in a multivariable world in which there are many contributing factors to a response. Therefore, when you arrive for this hypothetical visit, they record several variables that may be of interest.

Studies that collect several variables can be observational studies or controlled experiments. In an observational study, we want to ensure the sample of subjects is representative of the target population; then, for each subject, we simply record several variables. In a controlled experiment, we randomly assign each subject to a particular “treatment” group; afterwards, we measure the response in addition to other variables. Notice that in a controlled experiment, only one variable (the treatment) is randomly assigned; the remaining variables are simply observed. While this could of course be extended to randomly assigning more than one variable, it is the most common implementation.
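To make the controlled-experiment case concrete, the following is a minimal sketch in Python, not a prescribed procedure. The subject records, the variable names (`age`, `weight_kg`), the group labels, and the helper `assign_treatments` are all hypothetical and chosen only for illustration. The sketch randomly assigns each subject to exactly one treatment group while leaving the remaining variables as observed.

```python
import random


def assign_treatments(subject_ids, treatments, seed=None):
    """Randomly assign each subject to exactly one treatment group.

    Hypothetical helper for illustration: subjects are shuffled and then
    dealt out in turn, so group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return {
        sid: treatments[i % len(treatments)]
        for i, sid in enumerate(shuffled)
    }


# Hypothetical subjects; "age" and "weight_kg" are observed, not assigned.
subjects = [
    {"id": 1, "age": 34, "weight_kg": 70.5},
    {"id": 2, "age": 51, "weight_kg": 82.0},
    {"id": 3, "age": 27, "weight_kg": 64.3},
    {"id": 4, "age": 45, "weight_kg": 90.1},
    {"id": 5, "age": 62, "weight_kg": 77.8},
    {"id": 6, "age": 39, "weight_kg": 68.9},
]

# Only the treatment group is assigned at random.
assignment = assign_treatments(
    [s["id"] for s in subjects],
    ["control", "treatment"],
    seed=42,
)

# Each record now holds several variables for one subject: the observed
# variables plus the randomly assigned group. The response would be
# measured afterwards and added to the same record.
records = [{**s, "group": assignment[s["id"]]} for s in subjects]

for record in records:
    print(record)
```

Each row of `records` corresponds to one subject and carries several variables, which is the multivariable structure described above. In an observational study the `group` column would not be assigned; every variable would simply be recorded.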

Note

When a study is primarily interested in characterizing the relationship between two or more quantitative variables, the data is typically from an observational study.

What we want to emphasize here is that how we collect the data has not really changed from what we have discussed in previous units. The primary difference is that we are very aware that we are collecting several measurements on each subject. The critical element is that our sample be representative of the target population if we want to apply any findings to that population.