The National Survey of Family Growth conducted by the Centers for Disease Control gathers information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and men’s and women’s health. One of the variables collected on this survey is the age at first marriage. 5,534 randomly sampled US women between 2006 and 2010 completed the survey. The women sampled here had been married at least once. Do we have evidence that the mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years? (Tweaked a bit from Diez, Barr, and Cetinkaya-Rundel 2015 [Chapter 4])
Null hypothesis: The mean age of first marriage for all US women from 2006 to 2010 is equal to 23 years.
Alternative hypothesis: The mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years.
It’s important to set the significance level before conducting the test using the data. Let’s set the significance level at 5% here.
library(dplyr)
library(knitr)
library(readr)
download.file("https://ismayc.github.io/teaching/sample_problems/ageAtMar.csv",
              destfile = "ageAtMar.csv",
              method = "curl")
ageAtMar <- read_csv("ageAtMar.csv")
age_summ <- ageAtMar %>%
summarize(sample_size = n(),
mean = mean(age),
sd = sd(age),
minimum = min(age),
lower_quartile = quantile(age, 0.25),
median = median(age),
upper_quartile = quantile(age, 0.75),
max = max(age))
kable(age_summ)
| sample_size | mean  | sd     | minimum | lower_quartile | median | upper_quartile | max |
|-------------|-------|--------|---------|----------------|--------|----------------|-----|
| 5534        | 23.44 | 4.7214 | 10      | 20             | 23     | 26             | 43  |
The histogram below also shows the distribution of age.
library(ggplot2)
ageAtMar %>% ggplot(aes(x = age)) +
  geom_histogram(binwidth = 3, color = "white")
We are looking to see if the observed sample mean of 23.44019 is statistically greater than \(\mu_0 = 23\). The two values seem quite close, but we have a large sample size here. Let’s guess that the large sample size will lead us to reject the null hypothesis even though the difference is practically small.
To check whether the observed sample mean of 23.44019 is statistically greater than \(\mu_0 = 23\), we need to account for the sample size. We also need to determine a process that replicates how the original sample of size 5534 was selected.
We can use the idea of bootstrapping to simulate the population from which the sample came and then generate samples from that simulated population to account for sampling variability. Recall how bootstrapping would apply in this context:

- sample with replacement from our original sample of 5534 women,
- calculate the mean age for each resample,
- repeat this process 10,000 times, collecting the resampled means into a `null_distn` object, and
- shift the center of this distribution over to the null value of 23 (since it would otherwise be centered at the observed mean of 23.44019).

library(mosaic)
set.seed(2016)
mu0 <- 23
shift <- mu0 - age_summ$mean  # amount needed to recenter the ages at the null value
null_distn <- do(10000) *
  resample(ageAtMar, replace = TRUE) %>%
  mutate(age = age + shift) %>%
  summarize(mean_age = mean(age))
null_distn %>% ggplot(aes(x = mean_age)) +
  geom_histogram(bins = 30, color = "white")
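If the mosaic package is not available, the same shifted bootstrap can be sketched in base R. (This is a sketch, not part of the original analysis; the object names `shifted_age` and `null_means` are just illustrative.)

```r
# Base-R sketch of the same shifted bootstrap (an alternative to mosaic's do()):
# recenter the ages at the null value, then resample with replacement.
set.seed(2016)
shifted_age <- ageAtMar$age + (23 - mean(ageAtMar$age))
null_means <- replicate(10000,
  mean(sample(shifted_age, size = length(shifted_age), replace = TRUE)))
hist(null_means, breaks = 30, main = "Null distribution of the mean age")
```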
We can next use this distribution to compute our \(p\)-value. Recall this is a right-tailed test, so we will be looking for values that are greater than or equal to 23.44019 for our \(p\)-value.
obs_mean <- age_summ$mean
null_distn %>% ggplot(aes(x = mean_age)) +
  geom_histogram(bins = 30, color = "white") +
  geom_vline(color = "red", xintercept = obs_mean)
pvalue <- null_distn %>%
  filter(mean_age >= obs_mean) %>%
  nrow() / nrow(null_distn)
pvalue
## [1] 0
So our estimated \(p\)-value is 0 (none of the 10,000 simulated means was as large as the observed mean), and we reject the null hypothesis at the 5% level. You can also see from the histogram above that the observed mean falls far into the tail of the null distribution.
We can also create a confidence interval for the unknown population parameter \(\mu\) by bootstrapping from our sample data. Note that we don’t need to shift this distribution, since we want the center of our confidence interval to be our point estimate \(\bar{x}_{obs} = 23.44019\).
boot_distn <- do(10000) *
  resample(ageAtMar, replace = TRUE) %>%
  summarize(mean_age = mean(age))
boot_distn %>% ggplot(aes(x = mean_age)) +
  geom_histogram(bins = 30, color = "white")
(ci_boot <- boot_distn %>%
   summarize(lower = quantile(mean_age, probs = 0.025),
             upper = quantile(mean_age, probs = 0.975)))
## lower upper
## 1 23.318 23.564
We see that 23 is not contained in this confidence interval as a plausible value of \(\mu\) (the unknown population mean); the entire interval lies above 23. This matches our hypothesis test result of rejecting the null hypothesis in favor of the alternative (\(\mu > 23\)).
Interpretation: We are 95% confident the true mean age of first marriage for all US women from 2006 to 2010 is between 23.31821 and 23.56361 years.
Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.
Independent observations: The observations are collected independently.
The cases are selected independently through random sampling so this condition is met.
Approximately normal: The distribution of the response variable should be normal or the sample size should be at least 30.
The histogram for the sample above does show some skew, but the sample size here is quite large (\(n = 5534\)), so this condition is also met.
The test statistic is a random variable based on the sample data. Here, we want a way to estimate the population mean \(\mu\). A good guess is the sample mean \(\bar{X}\). Recall that this sample mean is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely it is that we would have observed a sample mean of \(\bar{x}_{obs} = 23.44019\) or larger assuming that the population mean is 23 (i.e., assuming the null hypothesis is true). If the conditions are met and \(H_0\) is true, we can “standardize” the original test statistic \(\bar{X}\) into a \(T\) statistic that follows a \(t\) distribution with degrees of freedom \(df = n - 1\):
\[ T =\dfrac{ \bar{X} - \mu_0}{ S / \sqrt{n} } \sim t (df = n - 1) \]
where \(S\) represents the standard deviation of the sample and \(n\) is the sample size.
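As a quick check, we can plug the summary statistics computed earlier into this formula (a sketch, assuming the `age_summ` object from above):

```r
# Observed test statistic computed "by hand" from the summary statistics
n <- age_summ$sample_size                        # 5534
t_obs <- (age_summ$mean - 23) / (age_summ$sd / sqrt(n))
t_obs                                            # approximately 6.94
```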
While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies. We can use the `t.test` function to perform this analysis for us.
t.test(x = ageAtMar$age,
       alternative = "greater",
       mu = 23)
##
## One Sample t-test
##
## data: ageAtMar$age
## t = 6.94, df = 5533, p-value = 0.0000000000023
## alternative hypothesis: true mean is greater than 23
## 95 percent confidence interval:
## 23.336 Inf
## sample estimates:
## mean of x
## 23.44
We see here that the \(t_{obs}\) value is around 6.94. Recall that for large sample sizes the \(t\) distribution is essentially the standard normal distribution, which is why the normal approximation below gives nearly the same \(p\)-value.
The \(p\)-value, the probability of observing a \(t_{obs}\) value of 6.936 or more in our null distribution of a \(t\) with 5533 degrees of freedom, is essentially 0. This can also be calculated in R directly:
pt(6.936, df = nrow(ageAtMar) - 1, lower.tail = FALSE)
## [1] 0.0000000000022474
We can also use the \(N(0, 1)\) distribution here:
pnorm(6.936, lower.tail = FALSE)
## [1] 0.0000000000020168
We therefore have sufficient evidence to reject the null hypothesis. Our initial guess that the observed sample mean was statistically greater than the hypothesized mean is supported here. Based on this sample, we have evidence that the mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years.
The confidence interval reported above by `t.test` is known as a one-sided confidence interval and gives the lowest value one could expect \(\mu\) to be with 95% confidence. We usually want a range of values, so we can use `alternative = "two.sided"` to get values similar to those from the bootstrapping process:
t.test(x = ageAtMar$age,
       alternative = "two.sided",
       mu = 23)$conf
## [1] 23.316 23.565
## attr(,"conf.level")
## [1] 0.95
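This two-sided interval corresponds to the familiar formula \(\bar{x} \pm t^{*} \cdot s / \sqrt{n}\). As a sketch using the summary statistics from earlier:

```r
# Formula-based 95% confidence interval from the summary statistics
n <- age_summ$sample_size
se <- age_summ$sd / sqrt(n)          # standard error of the sample mean
t_star <- qt(0.975, df = n - 1)      # two-sided 95% critical value
c(lower = age_summ$mean - t_star * se,
  upper = age_summ$mean + t_star * se)
```

This agrees (up to rounding) with the interval reported by `t.test` above.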
Observing the bootstrap distributions that were created, it makes quite a bit of sense that the results are so similar for the traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval, since these distributions look very similar to normal distributions. With the conditions also met (the large sample size was the driver here), we can expect any of the methods, whether traditional (formula-based) or non-traditional (computational-based), to lead to similar results.
Diez, David, Christopher Barr, and Mine Cetinkaya-Rundel. 2015. OpenIntro Statistics, Third Edition.