Problem Statement

A 2010 survey asked 827 randomly sampled registered voters in California “Do you support? Or do you oppose? Drilling for oil and natural gas off the Coast of California? Or do you not know enough to say?” Conduct a hypothesis test to determine if the data provide strong evidence that the proportion of college graduates who do not have an opinion on this issue is different than that of non-college graduates. (Tweaked a bit from Diez et al. 2015 [Chapter 6])

Competing Hypotheses

In words

Null hypothesis: There is no association between having an opinion on drilling and having a college degree for all registered California voters in 2010.
Alternative hypothesis: There is an association between having an opinion on drilling and having a college degree for all registered California voters in 2010.

Another way in words

Null hypothesis: The probability that a registered California voter in 2010 has no opinion on drilling is the same for college graduates as for non-college graduates.
Alternative hypothesis: These two probabilities are different.

In symbols (with annotations)

\(H_0: \pi_{college} = \pi_{no\_college}\) or \(H_0: \pi_{college} - \pi_{no\_college} = 0\), where \(\pi\) represents the probability of not having an opinion on drilling.
\(H_A: \pi_{college} - \pi_{no\_college} \ne 0\)

Set \(\alpha\)

It’s important to set the significance level before running the test on the data. Let’s set the significance level at 5% here.

Exploring the sample data

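library(tidyverse)  # read_csv(), mutate(), fct_rev(), ggplot()
library(janitor)    # tabyl()
library(infer)      # specify(), hypothesize(), generate(), calculate()

offshore <- read_csv("offshore.csv") |>
  mutate(response = fct_rev(response))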

offshore |>
  tabyl(college_grad, response)

##  college_grad opinion no opinion
##            no     258        131
##           yes     334        104

off_summ <- offshore |>
  group_by(college_grad) |>
  summarize(prop_no_opinion = mean(response == "no opinion"),
    sample_size = n())
off_summ

## # A tibble: 2 × 3
##   college_grad prop_no_opinion sample_size
##   <chr>                  <dbl>       <int>
## 1 no                     0.337         389
## 2 yes                    0.237         438

offshore |> ggplot(aes(x = college_grad, fill = response)) +
  geom_bar(position = "fill") +
  coord_flip()

Guess about statistical significance

We are looking to see if the segments of the bars corresponding to no opinion differ in length between the two groups in the plot. Based solely on the plot, we have little reason to believe that a difference exists since the segments seem to be about the same length, BUT…it’s important to use statistics to see if that difference is actually statistically significant!

Non-traditional methods

We’ll use the infer package to carry out the simulation-based workflow.

Observed statistic

d_hat <- offshore |>
  specify(response ~ college_grad, success = "no opinion") |>
  calculate(stat = "diff in props", order = c("yes", "no"))
d_hat

## Response: response (factor)
## Explanatory: college_grad (factor)
## # A tibble: 1 × 1
##      stat
##     <dbl>
## 1 -0.0993
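
As a quick sanity check, the same statistic can be reproduced from the counts in the two-way table. The vector names below are illustrative only; the counts come straight from the tabyl() output.

```r
# Counts taken from the two-way table above
no_opinion <- c(yes = 104, no = 131)  # "no opinion" responses by college_grad
group_size <- c(yes = 438, no = 389)  # respondents by college_grad

# Sample proportion with no opinion in each group
p_hat <- no_opinion / group_size

# order = c("yes", "no") means college graduates minus non-college graduates
d_hat_manual <- unname(p_hat["yes"] - p_hat["no"])
round(d_hat_manual, 4)  # -0.0993
```

The negative sign reflects the chosen order: college graduates were less likely than non-college graduates to report no opinion.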

Randomization for Hypothesis Test

To determine whether the observed difference in proportions with no opinion (college minus non-college) of -0.09932 is statistically different from 0, we need to account for the sample sizes. We can use the idea of randomization testing (also known as permutation testing): under the null of no association, the response labels are exchangeable across college_grad, so infer shuffles them with hypothesize(null = "independence") and recomputes the difference 10,000 times.

set.seed(2018)
null_distn_two_props <- offshore |>
  specify(response ~ college_grad, success = "no opinion") |>
  hypothesize(null = "independence") |>
  generate(reps = 10000, type = "permute") |>
  calculate(stat = "diff in props", order = c("yes", "no"))

We can next use this distribution to find our \(p\)-value. Recall this is a two-tailed test, so we will be shading values at least as extreme as our observed difference in both directions.

null_distn_two_props |>
  visualize() +
  shade_p_value(obs_stat = d_hat, direction = "both")

Calculate \(p\)-value

pvalue <- null_distn_two_props |>
  get_p_value(obs_stat = d_hat, direction = "both")
pvalue

## # A tibble: 1 × 1
##   p_value
##     <dbl>
## 1  0.0024

So our \(p\)-value is 0.0024, which is below the 5% significance level, and we reject the null hypothesis. You can also see from the histogram above that the observed difference is far into the tails of the null distribution.
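
The shuffling idea behind the randomization test can also be sketched by hand in base R. This is only an illustration: it rebuilds the 827 responses from the table counts instead of reading offshore.csv, uses fewer replicates, and the names diff_in_props, perm_diffs, and p_value_manual are made up here. The two-sided \(p\)-value below counts shuffled differences at least as large in absolute value as the observed one, which may differ slightly from how infer handles the two tails.

```r
# Rebuild the individual responses from the two-way table counts
college_grad <- rep(c("no", "yes"), times = c(389, 438))
response <- c(rep(c("opinion", "no opinion"), times = c(258, 131)),
              rep(c("opinion", "no opinion"), times = c(334, 104)))

# Difference in proportions with no opinion: college minus non-college
diff_in_props <- function(resp, grp) {
  mean(resp[grp == "yes"] == "no opinion") -
    mean(resp[grp == "no"] == "no opinion")
}

obs_diff <- diff_in_props(response, college_grad)

set.seed(2018)
reps <- 2000  # fewer than the 10,000 used above, to keep the sketch quick

# Each replicate shuffles the responses, breaking any link with college_grad
perm_diffs <- replicate(reps, diff_in_props(sample(response), college_grad))

# Two-sided p-value: share of shuffled differences at least as extreme
p_value_manual <- mean(abs(perm_diffs) >= abs(obs_diff))
p_value_manual
```

Because the seed handling and number of replicates differ from the infer pipeline, the result should be close to, but not identical to, the \(p\)-value reported above.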

Bootstrapping for Confidence Interval

We can also create a confidence interval for the unknown population parameter \(\pi_{college} - \pi_{no\_college}\) using our sample data with bootstrapping. Here we omit hypothesize() and use generate(type = "bootstrap") so that the rows of the full sample are resampled with replacement at the original sample size of 827; the two group sizes can therefore vary a little from one bootstrap replicate to the next.

boot_distn_two_props <- offshore |>
  specify(response ~ college_grad, success = "no opinion") |>
  generate(reps = 10000, type = "bootstrap") |>
  calculate(stat = "diff in props", order = c("yes", "no"))

ci <- boot_distn_two_props |>
  get_confidence_interval(level = 0.95)
ci

## # A tibble: 1 × 2
##   lower_ci upper_ci
##      <dbl>    <dbl>
## 1   -0.160  -0.0379

boot_distn_two_props |>
  visualize() +
  shade_confidence_interval(endpoints = ci)

We see that 0 is not contained in this confidence interval as a plausible value of \(\pi_{college} - \pi_{no\_college}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter, we have evidence that the proportion of college graduates in California with no opinion on drilling is different than that of non-college graduates.

Interpretation: We are 95% confident the true difference in the proportion with no opinion on offshore drilling (college graduates minus non-college graduates) in California is between -0.16003 and -0.03791.
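
The bootstrap percentile interval can be sketched by hand in base R as well. The sketch assumes that whole rows of the sample are resampled with replacement and that the interval is the middle 95% of the bootstrap differences; it rebuilds the data from the table counts, uses fewer replicates, and the names boot_diffs and ci_manual are illustrative.

```r
# Rebuild the individual rows from the two-way table counts
college_grad <- rep(c("no", "yes"), times = c(389, 438))
no_opinion <- c(rep(c(FALSE, TRUE), times = c(258, 131)),
                rep(c(FALSE, TRUE), times = c(334, 104)))
n <- length(college_grad)

set.seed(2018)
reps <- 2000  # fewer than the 10,000 used above, to keep the sketch quick

boot_diffs <- replicate(reps, {
  idx <- sample(n, size = n, replace = TRUE)  # resample whole rows
  grp <- college_grad[idx]
  out <- no_opinion[idx]
  # College minus non-college proportion with no opinion in this resample
  mean(out[grp == "yes"]) - mean(out[grp == "no"])
})

# Percentile interval: middle 95% of the bootstrap differences
ci_manual <- quantile(boot_diffs, probs = c(0.025, 0.975))
ci_manual
```

The endpoints should land near the interval reported above, with small differences due to the random resampling.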

Traditional methods

Check conditions

Remember that in order to use the short-cut (formula-based, theoretical) approach, we need to check that some conditions are met.

Independent observations: Each case that was selected must be independent of all the other cases selected.

This condition is met since the voters were selected at random.

Sample size: The number of pooled successes and pooled failures must be at least 10 for each group.

We need to first figure out the pooled success rate: \[\hat{p}_{obs} = \dfrac{131 + 104}{827} \approx 0.28.\] We now determine expected (pooled) success and failure counts:

\(0.28 \cdot (131 + 258) = 108.92\), \(0.72 \cdot (131 + 258) = 280.08\)

\(0.28 \cdot (104 + 334) = 122.64\), \(0.72 \cdot (104 + 334) = 315.36\)

All four expected counts are well above 10, so this condition is met.

Independent selection of samples: The cases are not paired in any meaningful way.

We have no reason to suspect that a college graduate selected would have any relationship to a non-college graduate selected.
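
The sample size check can also be carried out in a few lines of base R. This sketch uses the unrounded pooled proportion, so the expected counts differ slightly from the hand calculation that rounds to 0.28; the object names are illustrative.

```r
# "No opinion" counts and group sizes from the two-way table
successes <- c(no = 131, yes = 104)
n_group <- c(no = 389, yes = 438)

# Pooled proportion with no opinion across both groups
p_pooled <- sum(successes) / sum(n_group)

# Expected successes and failures in each group under the null
expected <- rbind(success = p_pooled * n_group,
                  failure = (1 - p_pooled) * n_group)
round(expected, 1)

# TRUE when every expected count is at least 10
all(expected >= 10)
```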

Test statistic

The test statistic is a random variable based on the sample data. Here, we are interested in seeing if our observed difference in sample proportions corresponding to no opinion on drilling (\(\hat{p}_{college, obs} - \hat{p}_{no\_college, obs}\)) is statistically different than 0. Assuming that conditions are met and the null hypothesis is true, we can use the standard normal distribution to standardize the difference in sample proportions using the standard error and the pooled estimate:

\[ Z =\dfrac{ (\hat{P}_1 - \hat{P}_2) - 0}{\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n_1} + \dfrac{\hat{P}(1 - \hat{P})}{n_2} }} \sim N(0, 1) \] where \(\hat{P} = \dfrac{\text{total number of successes} }{ \text{total number of cases}}.\)

Observed test statistic

The observed standardized statistic can be computed directly with infer via calculate(stat = "z"):

z_hat <- offshore |>
  specify(response ~ college_grad, success = "no opinion") |>
  hypothesize(null = "independence") |>
  calculate(stat = "z", order = c("yes", "no"))
z_hat

## Response: response (factor)
## Explanatory: college_grad (factor)
## Null Hypothesi...
## # A tibble: 1 × 1
##    stat
##   <dbl>
## 1 -3.16
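
To connect this value back to the formula for \(Z\), here is a sketch that plugs the table counts into it directly and then converts the result to a two-sided \(p\)-value with the standard normal distribution. The object names are illustrative.

```r
# "No opinion" counts and group sizes, ordered college then non-college
x <- c(yes = 104, no = 131)
n <- c(yes = 438, no = 389)

p_hat <- x / n             # sample proportions in each group
p_pool <- sum(x) / sum(n)  # pooled proportion under the null

# Standard error using the pooled estimate in both groups
se_pooled <- sqrt(p_pool * (1 - p_pool) * (1 / n[["yes"]] + 1 / n[["no"]]))

# Standardized difference: college minus non-college
z_manual <- (p_hat[["yes"]] - p_hat[["no"]]) / se_pooled

# Two-sided p-value from the standard normal distribution
p_manual <- 2 * pnorm(-abs(z_manual))

round(z_manual, 2)
signif(p_manual, 3)
```

The rounded statistic should agree with the -3.16 computed by infer.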

We can also do this test directly with infer’s prop_test() wrapper, which reports the \(p\)-value and a theory-based confidence interval:

offshore |>
  prop_test(response ~ college_grad,
    success = "no opinion",
    order = c("yes", "no"),
    alternative = "two.sided",
    correct = FALSE,
    z = TRUE)

## # A tibble: 1 × 5
##   statistic p_value alternative lower_ci upper_ci
##       <dbl>   <dbl> <chr>          <dbl>    <dbl>
## 1     -3.16 0.00157 two.sided     -0.161  -0.0377
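
The theory-based interval can be checked by hand too. The sketch below assumes the usual construction with the unpooled standard error (each group's own sample proportion), and cross-checks it against base R's prop.test() with the continuity correction turned off. The object names are illustrative.

```r
# "No opinion" counts and group sizes, ordered college then non-college
x <- c(yes = 104, no = 131)
n <- c(yes = 438, no = 389)

p_hat <- x / n
d_hat_manual <- p_hat[["yes"]] - p_hat[["no"]]

# Unpooled standard error: each group keeps its own sample proportion
se_unpooled <- sqrt(sum(p_hat * (1 - p_hat) / n))

# 95% interval: estimate plus or minus the normal critical value times the SE
ci_theory <- d_hat_manual + c(-1, 1) * qnorm(0.975) * se_unpooled
round(ci_theory, 4)

# Cross-check with base R (no continuity correction)
base_test <- prop.test(x = x, n = n, correct = FALSE)
base_test$conf.int
```

Note that the interval uses the unpooled standard error, while the test statistic uses the pooled one, because only the test assumes the two proportions are equal.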

State conclusion

Since the \(p\)-value is below our 5% significance level, we have sufficient evidence to reject the null hypothesis. Our initial guess that a statistically significant difference did not exist in the proportions of no opinion on offshore drilling between college educated and non-college educated Californians was not validated. We do have evidence to suggest that there is an association between college graduation and having an opinion on offshore drilling for Californians.

Comparing results

Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval, since these distributions look very similar to normal distributions. Using any of the methods, whether traditional (formula-based) or non-traditional (computational-based), leads to similar results here.

Diez, David, Christopher Barr, and Mine Cetinkaya-Rundel. 2015. OpenIntro Statistics, Third Edition.
