Mileage is often thought to be a good predictor of the sale price of a used car. Does the same conjecture hold for so-called “luxury cars”: Porsches, Jaguars, and BMWs? More precisely, do the slopes and intercepts differ when comparing mileage and price for these three brands of cars? To answer this question, data were randomly selected from an Internet car sale site. (Tweaked a bit from Cannon et al. 2013 [Chapter 1 and Chapter 4])

There are many hypothesis tests to run here. It’s important to first think about the model that we will fit to address these questions. We want to predict `Price` (in thousands of dollars) based on `Mileage` (in thousands of miles). A simple linear regression equation for this would be \(\hat{Price} = b_0 + b_1 * Mileage\).

This case is more complicated, though. We also need to include `CarType` in our model. Since `CarType` has three levels (`BMW`, `Porsche`, and `Jaguar`), we encode it as two dummy variables with `BMW` as the baseline (since it occurs first alphabetically in the list of three car types). This model would help us determine whether there is a statistically significant difference in the intercepts when predicting `Price` based on `Mileage` for the three car types, assuming that the slope is the same for all three lines:

\[\hat{Price} = b_0 + b_1 * Mileage + b_2 * Porsche + b_3 * Jaguar.\]
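To make the dummy-variable encoding concrete, here is a minimal sketch of the design matrix R builds for this equal-slopes model. The three rows in `toy` are made-up values for illustration only (they are not taken from the car data), and the level spelling `Porsche` is an assumption about how the car type is recorded.

```r
# Hypothetical toy rows, one per car type (NOT taken from the real data)
toy <- data.frame(
  Mileage = c(10, 20, 30),
  CarType = factor(c("BMW", "Porsche", "Jaguar"))
)

# Design matrix for the equal-slopes model: Price ~ Mileage + CarType.
# Factor levels are sorted alphabetically, so BMW becomes the baseline
# and gets no column of its own.
X <- model.matrix(~ Mileage + CarType, data = toy)
print(X)
```

The BMW row has a 0 in both dummy columns, which is what makes it the baseline. Note that R lists the Jaguar column before the Porsche column, so the order of its coefficients differs from the labeling of \(b_2\) and \(b_3\) in the equation.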

This is not exactly what the problem asks for, though. We also want to see whether the slopes of the three fitted lines differ across the three car types. To do so, we need to incorporate *interaction* terms between `Mileage` and the dummy variables `Porsche` and `Jaguar`. `BMW` again serves as the baseline: there is no separate `BMW:Mileage` term in the model, but the BMW line is recovered by setting `Jaguar` and `Porsche` equal to 0:

\[\hat{Price} = b_0 + b_1 * Mileage + b_2 * Porsche + b_3 * Jaguar + b_4 * Mileage * Jaguar + b_5 * Mileage * Porsche.\]
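As a worked expansion of the equation above, substituting 0 or 1 for the two dummy variables gives one fitted line per car type (using the same coefficient labels):

\[\begin{aligned}
\text{BMW:} \quad \hat{Price} &= b_0 + b_1 * Mileage \\
\text{Porsche:} \quad \hat{Price} &= (b_0 + b_2) + (b_1 + b_5) * Mileage \\
\text{Jaguar:} \quad \hat{Price} &= (b_0 + b_3) + (b_1 + b_4) * Mileage
\end{aligned}\]

Read this way, \(b_2\) and \(b_3\) measure how the Porsche and Jaguar intercepts differ from the BMW intercept, while \(b_5\) and \(b_4\) measure how their slopes differ from the BMW slope.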

Null hypothesis: All of the coefficients on the predictors (including the interaction terms) in the least squares regression modeling price as a function of mileage and car type are zero.

Alternative hypothesis: At least one of the coefficients on the predictors (including the interaction terms) in the least squares regression modeling price as a function of mileage and car type is nonzero.

- \(H_0: \beta_i = 0\) for all \(i = 1, \ldots, 5\), where \(\beta_i\) represents the population coefficient corresponding to \(b_i\) in the least squares regression modeling price as a function of mileage and car type.
- \(H_A:\) At least one \(\beta_i \ne 0\)
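One way to test these hypotheses in R is an overall F-test that compares an intercept-only model with the full interaction model. The sketch below is an illustration under stated assumptions, not necessarily the analysis used here: the helper name `overall_f_test` is hypothetical, and it assumes a data frame with `Price`, `Mileage`, and `CarType` columns.

```r
# Hypothetical helper: overall F-test for the interaction model.
# Assumes `data` has columns Price, Mileage, and CarType.
overall_f_test <- function(data) {
  # Model under H0: every coefficient except the intercept is zero
  null_model <- lm(Price ~ 1, data = data)

  # Full model: Mileage * CarType expands to both main effects
  # plus the Mileage:CarType interaction terms
  full_model <- lm(Price ~ Mileage * CarType, data = data)

  # Nested-model comparison; the second row of the result holds
  # the F statistic and its p-value
  anova(null_model, full_model)
}
```

Once the car data are loaded as a data frame, calling `overall_f_test()` on it should report 5 numerator degrees of freedom, one for each coefficient set to zero under the null hypothesis, and the p-value can then be compared with the chosen significance level.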

It’s important to set the significance level before running any tests on the data. Let’s set the significance level at 5% here.

```
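# Load packages for data wrangling, reporting, and plotting
library(dplyr)
library(knitr)
library(ggplot2)
# Stat2Data provides the ThreeCars data set
library(Stat2Data)
data(ThreeCars)

# Keep only the variables of interest and store CarType as character
ThreeCars <- ThreeCars %>%
  select(CarType, Price, Mileage) %>%
  mutate(CarType = as.character(CarType))

# Display options for printed output
options(digits = 5, scipen = 20, width = 90)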
```

The scatterplot below shows the relationship between mileage, price, and car type.

```
qplot(x = Mileage, y = Price, color = CarType, data = ThreeCars, geom = "point")
```