Pre-Bootcamp HW

Use the computational tools/statistical analysis workflow you are familiar with to perform the following tasks. By “statistical analysis workflow” I mean something like the following (there are plenty of other ways as well):

Problem Description: The nyc-weather-13.csv file available from http://bit.ly/nyc-weather-13 contains hourly meteorological data from 2013 for each of the three New York City airports (EWR, JFK, and LGA).

Tasks to complete:

  1. Produce a plot exploring the relationship between month and temp.
  2. Calculate the minimum temperature recorded for each month across all three airports.
    • This should result in a dataset with 12 rows and two columns. (One row for each month.)
  3. Produce a plot showing how minimum temperature varies across the 12 months.
  4. Calculate the minimum temperature recorded for each month FOR EACH OF the three airports.
    • This should result in a dataset with 36 rows and three columns. (One row for each month for each airport.)
  5. Explore the multivariate relationship between month, airport, and minimum temperature via a statistical graphic.

Last but not least: Produce a report that presents the calculations in table form along with the graphics. Also include a brief discussion of what stands out from the plots and calculations.
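
The grouped minimums in tasks 2 and 4 can be sketched in base R as follows. This is only one possible approach, not the official solution. The column names origin, month, and temp are assumptions about the CSV (they match the weather data in the nycflights13 package), and the small data frame is a made-up stand-in so the example runs without downloading anything.

```r
# Toy stand-in for the weather data. The column names origin, month and temp
# are assumptions about the CSV; the values are invented for illustration.
weather <- data.frame(
  origin = rep(c("EWR", "JFK", "LGA"), each = 4),
  month  = rep(c(1, 1, 2, 2), times = 3),
  temp   = c(30.2, 25.0, 28.4, 33.1,
             31.5, 26.1, 27.0, 34.0,
             32.0, 24.8, 29.3, 35.2)
)

# Task 2: minimum temperature per month across all airports.
min_by_month <- aggregate(temp ~ month, data = weather, FUN = min)

# Task 4: minimum temperature per month for each airport.
min_by_month_airport <- aggregate(temp ~ month + origin, data = weather, FUN = min)

print(min_by_month)
print(min_by_month_airport)
```

With the real file, replacing the toy data frame with the result of read.csv() on the downloaded CSV should, if the column names match, give the 12-row and 36-row results described in the task bullets. The formula interface of aggregate() drops rows with missing values by default, which matters if some hourly temperatures are missing.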

Pre-Bootcamp HW R Solutions


Day 1 HW

Produce an appropriate 5NG (Five Named Graphs) plot for each question below, using the R package and data set given in [ ], e.g., [nycflights13 → weather]. Look through the help documentation or Google to improve and customize your plots.

  1. Does age predict recline_rude?
    [fivethirtyeight → na.omit(flying)]

  2. Distribution of age by sex
    [okcupiddata → profiles]

  3. Does budget predict rating?
    [ggplot2movies → movies]

  4. Distribution of log base 10 scale of budget_2013
    [fivethirtyeight → bechdel]
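
As a starting point for item 4, here is a minimal sketch of a histogram on a log base 10 scale using ggplot2. The data frame toy_bechdel is hypothetical and its values are invented so that the example runs without the fivethirtyeight package; only the column name budget_2013 comes from the assignment. Choosing a histogram and five bins is also an assumption, not a requirement.

```r
library(ggplot2)

# Hypothetical toy data standing in for fivethirtyeight::bechdel;
# only the column name budget_2013 is taken from the assignment.
toy_bechdel <- data.frame(
  budget_2013 = c(5e5, 2e6, 8e6, 1.5e7, 4e7, 9e7, 2e8)
)

# Histogram of budget_2013 with a log10-scaled x axis.
p <- ggplot(toy_bechdel, aes(x = budget_2013)) +
  geom_histogram(bins = 5, color = "white") +
  scale_x_log10() +
  labs(x = "budget_2013 (log10 scale)", y = "Number of films")

print(p)
```

Swapping toy_bechdel for the bechdel data frame (after library(fivethirtyeight)) should give the plot for the real data. An alternative is to map log10(budget_2013) to x directly, which changes the axis labels to exponents rather than dollar amounts.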