Ethnicity | Asian | Black | Hispanic | White | Other | Total |
---|---|---|---|---|---|---|
Reed count | 46 | 31 | 42 | 258 | 35 | 412 |
Oregon proportion | .043 | .02 | .125 | .77 | .042 | 1 |
If the students at Reed were drawn from a population with these proportions, what count would we expect in each group?
\[\textrm{exp. count}_i = n \times p_i\]
Ethnicity | Asian | Black | Hispanic | White | Other | Total |
---|---|---|---|---|---|---|
Obs. data | 46 | 31 | 42 | 258 | 35 | 412 |
Exp. counts | 17.716 | 8.24 | 51.5 | 317.24 | 17.304 | 412 |
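The expected counts in the table can be reproduced directly from the formula. A minimal sketch, assuming the illustrative names `groups`, `obs`, and `expected` (they are not part of the original slides):

```r
# Category labels, in the same order as the table columns
groups <- c("asian", "black", "hispanic", "white", "other")

# Observed Reed counts and Oregon proportions from the table
obs <- c(46, 31, 42, 258, 35)
p   <- c(.043, .02, .125, .77, .042)
names(obs) <- names(p) <- groups

n <- sum(obs)        # total sample size
expected <- n * p    # expected count in each group under the Oregon proportions
round(expected, 3)
```

The expected counts need not be whole numbers; they are long-run averages, not counts anyone would actually observe.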
n <- 412
p <- c(.043, .02, .125, .77, .042)
samp <- sample(c("asian", "black", "hispanic", "white", "other"),
               size = n, replace = TRUE, prob = p)
table(samp)
## samp
##    asian    black hispanic    other    white
##       16       10       45       17      324
obs <- c(46, 31, 42, 258, 35)
samp <- sample(c("asian", "black", "hispanic", "white", "other"),
               size = n, replace = TRUE, prob = p)
table(samp)
## samp
##    asian    black hispanic    other    white
##       12        4       42       18      336
samp <- sample(c("asian", "black", "hispanic", "white", "other"),
               size = n, replace = TRUE, prob = p)
table(samp)
## samp
##    asian    black hispanic    other    white
##       22        7       57       18      308
We could run tests or build CIs on \(p_{reed} - p_{oregon}\) for each group, however:
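For each of \(k\) categories, take the difference between the observed and expected counts, square it, and divide by the expected count.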
Then add them all up.
\[\chi^2 = \sum_{i = 1}^k \frac{(obs_i - exp_i)^2}{exp_i}\]
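The sum can be computed in a few lines of R. This is a sketch using the observed counts and Oregon proportions from the tables; the names `expected`, `z_sq`, and `chisq_obs` are assumptions made for illustration.

```r
# Observed Reed counts and Oregon proportions, in table order
obs <- c(asian = 46, black = 31, hispanic = 42, white = 258, other = 35)
p   <- c(.043, .02, .125, .77, .042)

expected  <- sum(obs) * p                  # n * p_i for each category
z_sq      <- (obs - expected)^2 / expected # one squared, scaled difference per category
chisq_obs <- sum(z_sq)                     # add them all up

round(z_sq, 2)
chisq_obs
```

Working from unrounded expected counts avoids the small discrepancies that creep in when each term is rounded by hand before summing.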
Ethnicity | Asian | Black | Hispanic | White | Other | Total |
---|---|---|---|---|---|---|
Obs. data | 46 | 31 | 42 | 258 | 35 | 412 |
Exp. counts | 17.716 | 8.24 | 51.5 | 317.24 | 17.304 | 412 |
\[
\begin{aligned}
Z_{asian}^2 &= (46 - 17.716)^2/17.716 = 45.16 \\
Z_{black}^2 &= (31 - 8.24)^2/8.24 = 62.87 \\
Z_{hispanic}^2 &= (42 - 51.5)^2/51.5 = 1.75 \\
Z_{white}^2 &= (258 - 317.24)^2/317.24 = 11.06 \\
Z_{other}^2 &= (35 - 17.304)^2/17.304 = 18.10
\end{aligned}
\]
\[ Z_{asian}^2 + Z_{black}^2 + Z_{hispanic}^2 + Z_{white}^2 + Z_{other}^2 \approx 138.93 = \chi^2_{obs} \]
n <- 412
p <- c(.043, .02, .125, .042, .77)  # alphabetical order, matching table(samp)
chisqs <- rep(0, 1000)
set.seed(405)
for (i in 1:1000) {
  # draw one sample of size n under the null and store its chi-squared statistic
  samp <- sample(c("asian", "black", "hispanic", "other", "white"),
                 size = n, replace = TRUE, prob = p)
  obs <- c(table(samp))
  chisqs[i] <- chisq.test(obs, correct = FALSE, p = p)$statistic
}
What is the probability of observing data as extreme as ours or more extreme (\(\chi^2 = 138.93\)) under the null hypothesis that Reedies share the same ethnicity proportions as Oregon?
About zero.
If…
then our statistic can be well-approximated by the \(\chi^2\) distribution with \(k - 1\) degrees of freedom.
1 - pchisq(138.93, df = 4)
## [1] 0
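Base R's `chisq.test()` carries out the same goodness-of-fit calculation in one call. The sketch below reuses the observed counts and Oregon proportions from the tables; the name `fit` is an assumption for illustration. It calls `pchisq()` with `lower.tail = FALSE` because subtracting from 1 loses a tail probability this small to floating-point rounding.

```r
# Observed Reed counts and Oregon proportions, in table order
obs <- c(asian = 46, black = 31, hispanic = 42, white = 258, other = 35)
p   <- c(.043, .02, .125, .77, .042)

# Goodness-of-fit test against the Oregon proportions
fit <- chisq.test(obs, p = p)
fit$statistic   # chi-squared statistic
fit$parameter   # degrees of freedom, k - 1
fit$expected    # expected counts n * p_i

# Upper-tail probability computed directly, without subtracting from 1
pchisq(unname(fit$statistic), df = length(obs) - 1, lower.tail = FALSE)
```

The p-value is tiny but not literally zero, which matches the "about zero" answer.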