33 Framing the Question

“Do college students rate (based on taste) yogurts from different vendors differently?” As in the previous two units, the primary question in the Frozen Yogurt Case Study is about the relationship between two variables: the response (taste of the product on scale of 1 to 10; see Definition 3.2) and the factor of interest (vendor of yogurt; see Definition 24.1).

The question, as stated above, is ill-posed. Instead of asking about individual tastes, we want to know if, on average, there is a difference in the taste of yogurt between vendors. That is, we want to test the following hypotheses:

\(H_0:\) the average taste rating is the same for each of the three vendors.
\(H_1:\) the average taste rating differs for at least one of the three vendors.

Mathematically, we write

\(H_0: \mu_1 = \mu_2 = \mu_3\)
\(H_1:\) At least one \(\mu_j\) differs from another

where \(\mu_1, \mu_2, \mu_3\) represent the average taste ratings for the East Side, Name Brand, and South Side yogurts, respectively. Our question is now centered on the population and captured through parameters, making it a well-posed question. In fact, this seems to be a question we have already addressed. In a way, it is.

In this unit, we will be tackling the same types of questions we did in Chapter 24 — we are comparing the mean of a quantitative response across the levels of a categorical predictor. The difference is that the observations we have observed are not independent of one another, and we must account for this lack of independence in the analysis.

33.1 General Setting

This unit is concerned with comparing the mean response of a numeric variable across \(k\) groups. Let \(\mu_1, \mu_2, \dotsc, \mu_k\) represent the mean response for each of the \(k\) groups. Then, we are primarily interested in the following hypotheses:

\(H_0: \mu_1 = \mu_2 = \dotsb = \mu_k\)
\(H_1:\) At least one \(\mu_j\) differs from another.

When there are only two groups (\(k = 2\)), then this can be written as

\(H_0: \mu_1 = \mu_2\)
\(H_1: \mu_1 \neq \mu_2.\)

Here we are writing things in the mathematical notation, but let’s not forget that every hypothesis has a context. Throughout this unit, we are looking for some signal in the location of the response across the groups. Our working assumption then states that the groups are all similar, on average.