Problem Statement

Chicken farming is a multi-billion dollar industry, and any methods that increase the growth rate of young chicks can reduce consumer costs while increasing company profits, possibly by millions of dollars. An experiment was conducted to measure and compare the effectiveness of various feed supplements on the growth rate of chickens. Newly hatched chicks were randomly allocated into six groups, and each group was given a different feed supplement. Do these data provide strong evidence that the average weights of chickens that were fed different crops are different? (Tweaked a bit from Diez, Barr, and Cetinkaya-Rundel 2015 [Chapter 5])

Competing Hypotheses

In words

  • Null hypothesis: The long-run mean weights are the same under all six crops.

  • Alternative hypothesis: At least one of the long-run mean weights is different.

Another way with words

  • Null hypothesis: There is no association between feed type and weight in the population of interest.

  • Alternative hypothesis: There is an association between the variables in the population.

In symbols (with annotations)

  • \(H_0: \mu_{casein} = \mu_{horsebean} = \mu_{linseed} = \mu_{meatmeal} = \mu_{soybean} = \mu_{sunflower}\), where \(\mu\) represents the long-run mean weight.
  • \(H_A\): At least one of these parameter means is different from the others

Set \(\alpha\)

It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

Exploring the sample data

library(dplyr)
library(knitr)
library(ggplot2)
library(oilabs)
library(openintro)
data(chickwts)
wgt_summ <- chickwts %>% group_by(feed) %>%
  summarize(sample_size = n(),
    mean = mean(weight),
    sd = sd(weight),
    minimum = min(weight),
    lower_quartile = quantile(weight, 0.25),
    median = median(weight),
    upper_quartile = quantile(weight, 0.75),
    max = max(weight))
kable(wgt_summ)
feed sample_size mean sd minimum lower_quartile median upper_quartile max
casein 12 323.58 64.434 216 277.25 342.0 370.75 404
horsebean 10 160.20 38.626 108 137.00 151.5 176.25 227
linseed 12 218.75 52.236 141 178.00 221.0 257.75 309
meatmeal 11 276.91 64.901 153 249.50 263.0 320.00 380
soybean 14 246.43 54.129 158 206.75 248.0 270.00 329
sunflower 12 328.92 48.836 226 312.75 328.0 340.25 423

The boxplot below also shows the mean for each group highlighted by the red dots.

qplot(x = feed, y = weight, data = chickwts, geom = "boxplot") +
      stat_summary(fun.y = "mean", geom = "point", color = "red")

Guess about statistical significance

We are looking to see if a difference exists in the mean weight of the six levels of the explanatory variable. Based solely on the boxplot, we have reason to believe that a difference exists, but the overlap of the boxplots is a bit concerning.

Check conditions

Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

  1. Independent observations: The observations are independent both within and across groups.

    The random assignment assures that the observations are independent across groups, but without knowing anything more about the experimental design we aren’t sure if random sampling was done. We will assume so.

  2. Approximately normal: The distribution of the response for each group should be normal or the sample sizes should be at least 30.

qplot(x = weight, data = chickwts, geom = "histogram", facets = ~ feed, 
      binwidth = 60, color = I("white"))

qplot(sample = weight, data = chickwts, facets = ~ feed)

We have some reason to doubt the normality assumption here since both the histograms and the qqplots show some deviation from a normal model fitting the data well for each group. The sample sizes are also small.

  1. Constant variance: The variance in the groups is about equal from one group to the next.

    The standard deviances above are differing more than we would like across the groups. They are all on the same order of magnitude, but it might be a stretch to assume that variance is constant.

Test statistic

The test statistic is a random variable based on the sample data. Here, we want to look at the ratio of variability between the groups over the variability within the groups. This measure we will call \(F\) and it represents a measure of the difference in means. A big observed \(F\) ratio corresponds to the variability between groups over-powering the variability within groups. A small observed \(F\) ratio means that the within group variability is much larger than the between group variability.

\(F\) is the defined as the ratio

\[ F = \frac{MSG}{MSE}. \]

Here, \(MSG\) is the within group variability. As a formula,

\[ MSG = \dfrac{\sum_{i = 1}^k n_i (\bar{X}_i - \bar{X})}{k - 1} \] where \(\bar{X}_i\) is the mean for each group \(i\), and \(\bar{X}\) is the overall mean.

Notice that this is very similar to the variance for looking at the group means compared to the overall mean. In other words, this is the between group variability.

Also, note that \(MSE\) can be thought of as a pooled variance estimate, which can be thought as a measure of the within group variability:

\[MSE = \dfrac{\sum_{i, j} (X_{ij} - \bar{X}_j)^2}{n_{total} - k} \]

where \(n_{total} = n_1 + n_2 + \cdots + n_k\) with \(n_i\) being the sample size of group \(i\).

Observed test statistic

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies. We can use the inference function in the oilabs package to perform this analysis for us. Note that to obtain the F value given here, you divide the observed \(MSG\) value of 46226 by the observed \(MSE\) value of 3009.

inference(x = chickwts$feed, 
          y = chickwts$weight, 
          est = "mean",
          type = "ht", 
          alternative = "greater", 
          method = "theoretical",
          eda_plot = FALSE, 
          inf_plot = FALSE)
## Response variable: numerical, Explanatory variable: categorical
## ANOVA
## 
## Summary statistics:
## n_casein = 12, mean_casein = 323.58, sd_casein = 64.434
## n_horsebean = 10, mean_horsebean = 160.2, sd_horsebean = 38.626
## n_linseed = 12, mean_linseed = 218.75, sd_linseed = 52.236
## n_meatmeal = 11, mean_meatmeal = 276.91, sd_meatmeal = 64.901
## n_soybean = 14, mean_soybean = 246.43, sd_soybean = 54.129
## n_sunflower = 12, mean_sunflower = 328.92, sd_sunflower = 48.836
## 
## H_0: All means are equal.
## H_A: At least one mean is different.
## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value        Pr(>F)
## x          5 231129   46226    15.4 0.00000000059
## Residuals 65 195556    3009                      
## 
## Pairwise tests: t tests with pooled SD 
##           casein horsebean linseed meatmeal soybean
## horsebean 0.0000        NA      NA       NA      NA
## linseed   0.0000    0.0152      NA       NA      NA
## meatmeal  0.0456    0.0000  0.0135       NA      NA
## soybean   0.0007    0.0003  0.2041   0.1726      NA
## sunflower 0.8125    0.0000  0.0000   0.0264  0.0003

We see here that the \(f_{obs}\) value is around 15.4 with \(df_G = k - 1 = 6 - 1 = 5\) and \(df_E = n_{total} - k = 71 - 6 = 65\).

Compute \(p\)-value

The \(p\)-value—the probability of observing an \(F(df_G = 5, df_E = 65)\) value of 15.4 or more in our null distribution—is around 0.0000000006. This can also be calculated in R directly:

1 - pf(15.4, df1 = 5, df2 = 65)
## [1] 0.00000000057104

Note that we could also do this test directly without invoking the inference function using the aov and anova functions. aov stands for analysis of variance and its form is similar to what is done using the lm function with linear regression. It fits an analysis of variance model to the data in the same way that lm fits a linear regression model to the data. anova displays the resulting ANOVA table for the model fit.

wgt_anova <- aov(formula = weight ~ feed, data = chickwts)
anova(wgt_anova)
## Analysis of Variance Table
## 
## Response: weight
##           Df Sum Sq Mean Sq F value        Pr(>F)    
## feed       5 231129   46226    15.4 0.00000000059 ***
## Residuals 65 195556    3009                          
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

State conclusion

We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that a statistically significant difference existed in the means was backed by this statistical analysis. We have evidence to suggest that weight of chickens is affected by feed given.

Final note

With the conditions near being (or possibly) violated, one should use randomization to compare our \(p\)-value there with the value here to see if the assumptions may have been violated. One could also assess whether the sampling distribution of the \(F\) statistic matches well with a Fisher’s \(F\) distribution using randomization as well. If the conditions are reasonable, the next step would be to calculate pairwise analyses to better understand the reasons for the rejection of the null hypotheses in the ANOVA.


Diez, David, Christopher Barr, and Mine Cetinkaya-Rundel. 2015. OpenIntro Statistics, Third Edition.