The National Survey of Family Growth conducted by the Centers for Disease Control gathers information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and men’s and women’s health. One of the variables collected on this survey is the age at first marriage. 5,534 randomly sampled US women between 2006 and 2010 completed the survey. The women sampled here had been married at least once. Do we have evidence that the mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years? (Tweaked a bit from Diez, Barr, and Cetinkaya-Rundel 2015 [Chapter 4])
Null hypothesis: The mean age of first marriage for all US women from 2006 to 2010 is equal to 23 years.
Alternative hypothesis: The mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years.
It’s important to set the significance level before conducting the test using the data. Let’s set the significance level at 5% here.
library(dplyr)
library(knitr)
library(readr)
download.file("https://ismayc.github.io/teaching/sample_problems/ageAtMar.csv",
              destfile = "ageAtMar.csv",
              method = "curl")
ageAtMar <- read_csv("ageAtMar.csv")
age_summ <- ageAtMar %>%
summarize(sample_size = n(),
mean = mean(age),
sd = sd(age),
minimum = min(age),
lower_quartile = quantile(age, 0.25),
median = median(age),
upper_quartile = quantile(age, 0.75),
max = max(age))
kable(age_summ)
| sample_size | mean  | sd     | minimum | lower_quartile | median | upper_quartile | max |
|-------------|-------|--------|---------|----------------|--------|----------------|-----|
| 5534        | 23.44 | 4.7214 | 10      | 20             | 23     | 26             | 43  |
The histogram below also shows the distribution of age.
library(ggplot2)
ageAtMar %>% ggplot(aes(x = age)) +
  geom_histogram(binwidth = 3, color = "white")
We are looking to see if the observed sample mean of 23.44019 is statistically greater than \(\mu_0 = 23\). The two values seem quite close, but we have a large sample size here. Let’s guess that the large sample size will lead us to reject the null hypothesis even though the difference is practically small.
To check whether the observed sample mean of 23.44019 is statistically greater than \(\mu_0 = 23\), we need to account for the sample size. We also need to determine a process that replicates how the original sample of size 5534 was selected.
We can use the idea of bootstrapping to simulate the population from which the sample came and then generate samples from that simulated population to account for sampling variability. Recall how bootstrapping would apply in this context:

- sample with replacement from our original sample of 5534 women,
- calculate the mean age for each resample,
- repeat this process 10,000 times, collecting the resampled means into a `null_distn` object, and
- shift the center of this distribution over to the null value of 23 (since it would otherwise be centered at the observed mean of 23.44019).

library(mosaic)
set.seed(2016)
mu0 <- 23
shift <- mu0 - age_summ$mean  # amount needed to recenter the ages at the null value
null_distn <- do(10000) *
  resample(ageAtMar, replace = TRUE) %>%
  mutate(age = age + shift) %>%
  summarize(mean_age = mean(age))
null_distn %>% ggplot(aes(x = mean_age)) +
  geom_histogram(bins = 30, color = "white")
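If the mosaic package is not available, the same shifted bootstrap can be sketched in base R. (This is a sketch, not part of the original analysis; the object names `shifted_age` and `null_means` are just illustrative.)

```r
# Base-R sketch of the same shifted bootstrap (an alternative to mosaic's do()):
# recenter the ages at the null value, then resample with replacement.
set.seed(2016)
shifted_age <- ageAtMar$age + (23 - mean(ageAtMar$age))
null_means <- replicate(10000,
  mean(sample(shifted_age, size = length(shifted_age), replace = TRUE)))
hist(null_means, breaks = 30, main = "Null distribution of the mean age")
```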
We can next use this distribution to compute our \(p\)-value. Recall this is a right-tailed test, so we will be looking for values that are greater than or equal to 23.44019 for our \(p\)-value.
obs_mean <- age_summ$mean
null_distn %>% ggplot(aes(x = mean_age)) +
  geom_histogram(bins = 30, color = "white") +
  geom_vline(color = "red", xintercept = obs_mean)
pvalue <- null_distn %>%
  filter(mean_age >= obs_mean) %>%
  nrow() / nrow(null_distn)
pvalue
## [1] 0
So our estimated \(p\)-value is 0 (none of the 10,000 simulated means was as large as the observed mean), and we reject the null hypothesis at the 5% level. You can also see from the histogram above that the observed mean falls far into the tail of the null distribution.
We can also create a confidence interval for the unknown population parameter \(\mu\) by bootstrapping from our sample data. Note that we don’t need to shift this distribution, since we want the center of our confidence interval to be our point estimate \(\bar{x}_{obs} = 23.44019\).
boot_distn <- do(10000) *
  resample(ageAtMar, replace = TRUE) %>%
  summarize(mean_age = mean(age))
boot_distn %>% ggplot(aes(x = mean_age)) +
  geom_histogram(bins = 30, color = "white")
(ci_boot <- boot_distn %>%
   summarize(lower = quantile(mean_age, probs = 0.025),
             upper = quantile(mean_age, probs = 0.975)))
## lower upper
## 1 23.318 23.564
We see that 23 is not contained in this confidence interval as a plausible value of \(\mu\) (the unknown population mean); the entire interval lies above 23. This matches our hypothesis test result of rejecting the null hypothesis in favor of the alternative (\(\mu > 23\)).
Interpretation: We are 95% confident the true mean age of first marriage for all US women from 2006 to 2010 is between 23.31821 and 23.56361 years.
Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.
Independent observations: The observations are collected independently.
The cases are selected independently through random sampling so this condition is met.
Approximately normal: The distribution of the response variable should be normal or the sample size should be at least 30.
The histogram for the sample above does show some skew, but the sample size here is quite large (\(n = 5534\)), so this condition is also met.
The test statistic is a random variable based on the sample data. Here, we want a way to estimate the population mean \(\mu\). A good guess is the sample mean \(\bar{X}\). Recall that this sample mean is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely it is that we would have observed a sample mean of \(\bar{x}_{obs} = 23.44019\) or larger assuming that the population mean is 23 (i.e., assuming the null hypothesis is true). If the conditions are met and \(H_0\) is true, we can “standardize” the original test statistic \(\bar{X}\) into a \(T\) statistic that follows a \(t\) distribution with degrees of freedom \(df = n - 1\):
\[ T =\dfrac{ \bar{X} - \mu_0}{ S / \sqrt{n} } \sim t (df = n - 1) \]
where \(S\) represents the standard deviation of the sample and \(n\) is the sample size.
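As a quick check, we can plug the summary statistics computed earlier into this formula (a sketch, assuming the `age_summ` object from above):

```r
# Observed test statistic computed "by hand" from the summary statistics
n <- age_summ$sample_size                        # 5534
t_obs <- (age_summ$mean - 23) / (age_summ$sd / sqrt(n))
t_obs                                            # approximately 6.94
```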
While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies. We can use the `t.test` function to perform this analysis for us.
t.test(x = ageAtMar$age,
       alternative = "greater",
       mu = 23)
##
## One Sample t-test
##
## data: ageAtMar$age
## t = 6.94, df = 5533, p-value = 0.0000000000023
## alternative hypothesis: true mean is greater than 23
## 95 percent confidence interval:
## 23.336 Inf
## sample estimates:
## mean of x
## 23.44
We see here that the \(t_{obs}\) value is around 6.94. Recall that for large sample sizes the \(t\) distribution is essentially the standard normal distribution, which is why the normal approximation below gives nearly the same \(p\)-value.
The \(p\)-value, the probability of observing a \(t_{obs}\) value of 6.936 or more in our null distribution of a \(t\) with 5533 degrees of freedom, is essentially 0. This can also be calculated in R directly:
pt(6.936, df = nrow(ageAtMar) - 1, lower.tail = FALSE)
## [1] 0.0000000000022474
We can also use the \(N(0, 1)\) distribution here:
pnorm(6.936, lower.tail = FALSE)
## [1] 0.0000000000020168
We therefore have sufficient evidence to reject the null hypothesis. Our initial guess that the observed sample mean was statistically greater than the hypothesized mean is supported here. Based on this sample, we have evidence that the mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years.
The confidence interval reported above by `t.test` is known as a one-sided confidence interval and gives the lowest value one could expect \(\mu\) to be with 95% confidence. We usually want a range of values, so we can use `alternative = "two.sided"` to get values similar to those from the bootstrapping process:
t.test(x = ageAtMar$age,
       alternative = "two.sided",
       mu = 23)$conf
## [1] 23.316 23.565
## attr(,"conf.level")
## [1] 0.95
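This two-sided interval corresponds to the familiar formula \(\bar{x} \pm t^{*} \cdot s / \sqrt{n}\). As a sketch using the summary statistics from earlier:

```r
# Formula-based 95% confidence interval from the summary statistics
n <- age_summ$sample_size
se <- age_summ$sd / sqrt(n)          # standard error of the sample mean
t_star <- qt(0.975, df = n - 1)      # two-sided 95% critical value
c(lower = age_summ$mean - t_star * se,
  upper = age_summ$mean + t_star * se)
```

This agrees (up to rounding) with the interval reported by `t.test` above.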
Observing the bootstrap distributions that were created, it makes quite a bit of sense that the results are so similar for the traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval, since these distributions look very similar to normal distributions. With the conditions also met (the large sample size was the driver here), we can expect any of the methods, whether traditional (formula-based) or non-traditional (computational-based), to lead to similar results.
Diez, David, Christopher Barr, and Mine Cetinkaya-Rundel. 2015. OpenIntro Statistics, Third Edition.