Problem Statement

A 2010 survey asked 827 randomly sampled registered voters in California “Do you support? Or do you oppose? Drilling for oil and natural gas off the Coast of California? Or do you not know enough to say?” Conduct a hypothesis test to determine if the data provide strong evidence that the proportion of college graduates who do not have an opinion on this issue is different than that of non-college graduates. (Tweaked a bit from Diez, Barr, and Cetinkaya-Rundel 2015 [Chapter 6])

Competing Hypotheses

In words

  • Null hypothesis: There is no association between having an opinion on drilling and having a college degree for all registered California voters in 2010.

  • Alternative hypothesis: There is an association between having an opinion on drilling and having a college degree for all registered California voters in 2010.

Another way in words

  • Null hypothesis: The probability that a Californian voter in 2010 having no opinion on drilling and is a college graduate is the same as that of a non-college graduate.

  • Alternative hypothesis: These parameter probabilities are different.

In symbols (with annotations)

  • \(H_0: \pi_{college} = \pi_{no\_college}\) or \(H_0: \pi_{college} - \pi_{no\_college} = 0\), where \(\pi\) represents the probability of not having an opinion on drilling.
  • \(H_A: \pi_{college} - \pi_{no\_college} \ne 0\)

Set \(\alpha\)

It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

Exploring the sample data

library(dplyr)
library(knitr)
library(ggplot2)
library(readr)
download.file("ismayc.github.io/teaching/sample_problems/offshore.csv", 
  destfile = "offshore.csv",
  method = "curl")
offshore <- read_csv("offshore.csv")
table(offshore$college_grad, offshore$response)
##      
##       no opinion opinion
##   no         131     258
##   yes        104     334
(off_summ <- offshore %>% group_by(college_grad) %>% 
  summarize(prop_no_opinion = mean(response == "no opinion"),
    sample_size = n())
)
## # A tibble: 2 × 3
##   college_grad prop_no_opinion sample_size
##          <chr>           <dbl>       <int>
## 1           no         0.33676         389
## 2          yes         0.23744         438
offshore %>% ggplot(aes(x = college_grad, fill = response)) +
  geom_bar(position = "fill") +
  coord_flip()

Guess about statistical significance

We are looking to see if a difference exists in the heights of the bars corresponding to no opinion for the plot. Based solely on the plot, we have little reason to believe that a difference exists since the bars seem to be about the same height, BUT…it’s important to use statistics to see if that difference is actually statistically significant!


Non-traditional methods

Collecting summary info

Next we will assign some key values to variable names in R:

phat_nograd <- off_summ$prop_no_opinion[1]
phat_grad <- off_summ$prop_no_opinion[2]
obs_diff <- phat_grad - phat_nograd
n_nograd <- off_summ$sample_size[1]
n_grad <- off_summ$sample_size[2]

Randomization for Hypothesis Test

In order to look to see if the observed sample proportion of no opinion for college graduates of 0.33676 is statistically different than that for graduates of 0.23744, we need to account for the sample sizes. Note that this is the same as looking to see if \(\hat{p}_{grad} - \hat{p}_{nograd}\) is statistically different than 0. We also need to determine a process that replicates how the original group sizes of 389 and 438 were selected.

We can use the idea of randomization testing (also known as permutation testing) to simulate the population from which the sample came (with two groups of different sizes) and then generate samples using shuffling from that simulated population to account for sampling variability.

library(mosaic)
set.seed(2016)
many_shuffles <- do(10000) * 
  (offshore %>%
     mutate(response = shuffle(response)) %>% 
     group_by(college_grad) %>%
     summarize(prop_no_opinion = mean(response == "no opinion"))
   )
null_distn <- many_shuffles %>% 
  group_by(.index) %>%
  summarize(diffprop = diff(prop_no_opinion))
null_distn %>% ggplot(aes(x = diffprop)) +
  geom_histogram(bins = 25, color = "white")

We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are greater than or equal to -0.09932 or less than or equal to 0.09932 for our \(p\)-value.

null_distn %>% ggplot(aes(x = diffprop)) +
  geom_histogram(bins = 20, color = "white") +
  geom_vline(color = "red", xintercept = obs_diff) +
  geom_vline(color = "red", xintercept = -obs_diff)

Calculate \(p\)-value

pvalue <- null_distn %>%
  filter( (diffprop <= obs_diff) | (diffprop >= -obs_diff) ) %>%
  nrow() / nrow(null_distn)
pvalue
## [1] 0.0025

So our \(p\)-value is 0.0025 and we reject the null hypothesis at the 5% level. You can also see this from the histogram above that we are far into the tails of the null distribution.

Bootstrapping for Confidence Interval

We can also create a confidence interval for the unknown population parameter \(\pi_{college} - \pi_{no\_college}\) using our sample data with bootstrapping. Here we will bootstrap each of the groups with replacement instead of shuffling. This is done using the groups argument in the resample function to fix the size of each group to be the same as the original group sizes of 389 for non-college graduates and 438 for college graduates.

boot_props <- do(10000) *
  offshore %>% 
  resample(replace = TRUE, groups = college_grad) %>% 
  group_by(college_grad) %>% 
  summarize(prop_no_opinion = mean(response == "no opinion"))

Next, we calculate the difference in sample proportions for each of the 10,000 replications:

boot_distn <- boot_props %>% 
  group_by(.index) %>% 
  summarize(diffprop = diff(prop_no_opinion))
boot_distn %>% ggplot(aes(x = diffprop)) +
  geom_histogram(bins = 30, color = "white")

(ci_boot <- boot_distn %>% summarize(lower = quantile(diffprop, probs = 0.025),
    upper = quantile(diffprop, probs = 0.975)))
## # A tibble: 1 × 2
##      lower    upper
##      <dbl>    <dbl>
## 1 -0.15958 -0.03792

We see that 0 is not contained in this confidence interval as a plausible value of \(\pi_{college} - \pi_{no\_college}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter, we have evidence that the proportion of college graduates in California with no opinion on drilling is different than that of non-college graduates.

Interpretation: We are 95% confident the true proportion of non-college graduates with no opinion on offshore drilling in California is between 0.16 dollars smaller to 0.04 dollars smaller than for college graduates.

Note: You could also use the null distribution based on randomization with a shift to have its center at \(\hat{p}_{college} - \hat{p}_{no\_college} = \$-0.1\) instead of at 0 and calculate its percentiles. The confidence interval produced via this method should be comparable to the one done using bootstrapping above.


Traditional methods

Check conditions

Remember that in order to use the short-cut (formula-based, theoretical) approach, we need to check that some conditions are met.

  1. Independent observations: Each case that was selected must be independent of all the other cases selected.

    This condition is met since cases were selected at random to observe.

  2. Sample size: The number of pooled successes and pooled failures must be at least 10 for each group.

    We need to first figure out the pooled success rate: \[\hat{p}_{obs} = \dfrac{131 + 104}{827} = 0.28.\] We now determine expected (pooled) success and failure counts:

    \(0.28 \cdot (131 + 258) = 108.92\), \(0.72 \cdot (131 + 258) = 280.08\)

    \(0.28 \cdot (104 + 334) = 122.64\), \(0.72 \cdot (104 + 334) = 315.36\)

  3. Independent selection of samples: The cases are not paired in any meaningful way.

    We have no reason to suspect that a college graduate selected would have any relationship to a non-college graduate selected.

Test statistic

The test statistic is a random variable based on the sample data. Here, we are interested in seeing if our observed difference in sample proportions corresponding to no opinion on drilling (\(\hat{p}_{college, obs} - \hat{p}_{no\_college, obs}\) = 0.03265) is statistically different than 0. Assuming that conditions are met and the null hypothesis is true, we can use the standard normal distribution to standardize the difference in sample proportions (\(\hat{P}_{college} - \hat{P}_{no\_college}\)) using the standard error of \(\hat{P}_{college} - \hat{P}_{no\_college}\) and the pooled estimate:

\[ Z =\dfrac{ (\hat{P}_1 - \hat{P}_2) - 0}{\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n_1} + \dfrac{\hat{P}(1 - \hat{P})}{n_2} }} \sim N(0, 1) \] where \(\hat{P} = \dfrac{\text{total number of successes} }{ \text{total number of cases}}.\)

Observed test statistic

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies. We can use the prop.test function to perform this analysis for us.

stats::prop.test(x = table(offshore$college_grad, offshore$response),
       n = nrow(offshore),
       alternative = "two.sided",
       correct = FALSE)
## 
##  2-sample test for equality of proportions without continuity
##  correction
## 
## data:  table(offshore$college_grad, offshore$response)
## X-squared = 9.99, df = 1, p-value = 0.0016
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.037725 0.160911
## sample estimates:
##  prop 1  prop 2 
## 0.33676 0.23744

prop.test does a \(\chi^2\) test here but this matches up exactly with what we would expect from the test statistic above since \(Z^2 = \chi^2\) so \(\sqrt{9.99} = 3.16 = z_{obs}\): The \(p\)-values are the same because we are focusing on a two-tailed test. The observed difference in sample proportions is 3.16 standard deviations larger than 0.

The \(p\)-value—the probability of observing a \(Z\) value of 3.16 or more extreme in our null distribution—is 0.0016. This can also be calculated in R directly:

2 * pnorm(3.16, lower.tail = FALSE)
## [1] 0.0015777

The 95% confidence interval is also stated above in the prop.test results. (Note that the levels of college_grad have been swapped here resulting in flip to the positive side in the confidence interval.)

State conclusion

We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that a statistically significant difference did not exist in the proportions of no opinion on offshore drilling between college educated and non-college educated Californians was not validated. We do have evidence to suggest that there is a dependency between college graduation and position on offshore drilling for Californians.


Comparing results

Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions were not met since the number of pairs was small, but the sample data was not highly skewed. Using any of the methods whether they are traditional (formula-based) or non-traditional (computational-based) lead to similar results.

Diez, David, Christopher Barr, and Mine Cetinkaya-Rundel. 2015. OpenIntro Statistics, Third Edition.