Putting the R in Reed and in Lewis and ClaRk

# Putting the <a href=""><code>R</code></a> in <a href=""><code>R</code></a>eed and <br> in Lewis and Cla<a href=""><code>R</code></a>k
### Chester Ismay <br><br> GitHub: <a href="https://github.com/ismayc"><code>ismayc</code></a> <br> Twitter: <a href="https://twitter.com/old_man_chester"><code>@old_man_chester</code></a>
### 2017-05-25 & 2017-05-26 <br><br> Bootcamp website at <a href="http://bit.ly/rbootcamp17" class="uri">http://bit.ly/rbootcamp17</a> <br> Slides available at <a href="http://bit.ly/rbootcamp17-slides" class="uri">http://bit.ly/rbootcamp17-slides</a>

---

# Table of Contents

- [Data Viz](#viz)
- [Data Wrangling](#wrangle)
- [Data Importing](#import)
- [Data Tidying](#tidy)
- [Data Modeling](#model)
- [Data Communicating](#rmarkdown)

---

# Pre-bootcamp HW

## Talk about your analysis on the <br> `weather` data <br> in groups of 2-3

---

# Let's get it all out there

## When you think of R, what comes to mind?

---

## Why are you here wanting to learn R?

- ~~Because you enjoy partaking in [Schadenfreude](https://en.wikipedia.org/wiki/Schadenfreude)~~

---

## Why should you learn R?

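- R is free, it keeps improving, and together with its predecessor S (begun at Bell Labs in 1976) it has been around for 40+ years
- Students can easily show a potential employer what they have learned
- R's support community (Stack Overflow, mailing lists, blogs, #rstats on Twitter) is much better than its reputation suggests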

---

## But as a professor, why should you learn R?

- R and R Markdown will save you TONS of time. Long-term thinking is key.
    - You can easily tweak your code if you need to do another analysis
    - Remembering, two years from now, which drop-down menu held a given option in some other program will be hard
    - Remembering to copy and paste your updated plots/analyses into your word processor is a pain and error-prone

---

## But as a professor, why should you learn R?
    
<center>
<img src="figure/tweet.jpg" style="width: 700px;"/>
</center>

---

# DEMO in RStudio <br> with R Markdown

---

## Analogy

R            |  RStudio |  DataCamp
:-------------------------:|:-------------------------:|:-------------------------:
<img src="figure/engine.jpg" alt="Engine" style="width: 350px;"/>  |  <img src="figure/dashboard.jpg" alt="Dashboard" style="width: 350px;"/>  |  <img src="figure/instructor.jpg" alt="Instructor" style="width: 350px;"/>

---

## What this bootcamp is

![](https://ourcodingclub.github.io/img/tidyverse.png)

- Designed for all levels of knowledge and background

---

## What this bootcamp is not

- A thorough development of machine learning <br> techniques applied to big data

- A discussion on the role of _p_-values in science

- A tutorial on how to work with base R subsetting and base R graphics

---

## What I hope you'll learn

- That R is not as scary as it used to be.

- That learning how to Google is an incredibly valuable skill.

- That having students turn in assignments as R scripts and R Markdown files lets you check their work and their analyses efficiently.

---

## What I hope you'll learn

- That the R packages you will use in this workshop are the same ones that are used by scientists and graduate programs all over the world

- That teaching students how to use open-source tools is what is best for them long term

- That students with no programming background can do great things in only a few weeks
    - [Sociology/Criminal Justice majors at Pacific U.](https://ismayc.github.io/soc301_s2017/group-projects/index.html)
    - [A wide range of students at Middlebury College](https://rudeboybert.github.io/MATH116/PS/final_project/final_project_outline.html)

---

## Pieces of advice

- Scaffold & support learning as foreign-language courses do

- To be able to use R (and really any other language), students need more than a few assignments and more than a weekly lab

- Make use of RStudio Projects (students have a really hard time navigating directory structures)

---

## Pieces of advice

### My opinion

- Have students write their code in R script files and document their errors
    - This is the same workflow that DataCamp uses

- Show students [R Markdown after a few weeks](https://ismayc.github.io/soc301_s2017/schedule/) of working with the script file
    - I've found it hard for students to learn R and R Markdown at the same time from the start
    - Better to have them use R Markdown in groups initially

---

# R Data Types

---

## The bare minimum needed for understanding today

Vector/variable
  - Type of vector (`int`, `num`/`dbl`, `chr`, `lgl`, `date`)

Data frame
  - Vectors of (potentially) different types
  - All vectors have the same length (one entry per row)
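To see these types in practice, here is a minimal sketch in base R (chosen so it runs without extra packages; the vector names are made up for illustration). `class()` reports each vector's type, and `data.frame()` refuses to combine vectors whose lengths don't line up. Note that `str()` labels doubles as `num`, while tibbles print them as `dbl`.

```r
# One vector per type; every element of a vector shares that type
years  <- c(1980, 1990)                           # double ("num" / "dbl")
counts <- c(1L, 2L)                               # integer ("int")
labels <- c("low", "high")                        # character ("chr")
flags  <- c(TRUE, FALSE)                          # logical ("lgl")
dates  <- as.Date(c("2017-05-25", "2017-05-26"))  # Date ("date")

# class() reports the type of each vector
types <- sapply(list(years, counts, labels, flags, dates), class)
print(types)

# A data frame bundles these equal-length vectors as columns
df <- data.frame(years, counts, labels, flags, dates)
str(df)

# Vectors of lengths 2 and 3 cannot be the columns of one data frame
mismatch_ok <- tryCatch({
  data.frame(a = 1:2, b = 1:3)
  TRUE
}, error = function(e) FALSE)
print(mismatch_ok)
```

The tidyverse's `tibble()` is stricter still: it recycles only length-1 vectors.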

---

## The bare minimum needed for understanding today

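```r
library(tibble)     # tibble(): modern data frames
library(lubridate)  # ymd(): parse year-month-day dates
ex1 <- tibble(
  vec1 = c(1980, 1990, 2000, 2010),        # double
  vec2 = c(1L, 2L, 3L, 4L),                # integer (note the L suffix)
  vec3 = c("low", "low", "high", "high"),  # character
  vec4 = c(TRUE, FALSE, FALSE, FALSE),     # logical
  # ymd() handles a mix of "-" and "/" separators
  vec5 = ymd(c("2017-05-23", "1776/07/04", "1983-05/31", "1908/04-01"))
)
ex1
```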

```
# A tibble: 4 x 5
   vec1  vec2  vec3  vec4       vec5
  <dbl> <int> <chr> <lgl>     <date>
1  1980     1   low  TRUE 2017-05-23
2  1990     2   low FALSE 1776-07-04
3  2000     3  high FALSE 1983-05-31
4  2010     4  high FALSE 1908-04-01
```
  
---

class: center, middle  
  
# Welcome to the [tidyverse](https://blog.rstudio.org/2016/09/15/tidyverse-1-0-0/)!
  
The `tidyverse` is a collection of R packages that share common philosophies and are designed to work together. <br><br> 
  
<a href="http://tidyverse.tidyverse.org/logo.png"><img src="figure/tidyverse.png" style="width: 200px;"/></a>

---

## Beginning steps

Frequently the first thing to do when given a dataset is to
- identify the observational unit,
- specify the variables,
- give the types of variables you are presented with, and
- check that the data is <u>tidy</u> (more on this later).

This will help with 
- choosing the appropriate plot, 
- summarizing the data, and 
- understanding which inferences/models can be applied.
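As a sketch of these first steps, the code below inspects `airquality`, a data frame that ships with base R (used here as a stand-in; it is not the bootcamp's `weather` data). Each row is one day of air-quality readings in New York from May to September 1973, so the observational unit is a day. `str()` lists the variables and their types; `dplyr::glimpse()` gives a similar summary.

```r
# One row = one day of measurements (the observational unit)
dim(airquality)   # number of rows (days) and columns (variables)

# The variables and their types
str(airquality)

# Types as a named character vector
types <- sapply(airquality, class)
print(types)
```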

---

# Data Visualization

Inspired by [Hans Rosling](https://www.youtube.com/watch?v=jbkSRLYSojo)

---

- What are the variables here?
- What is the observational unit?
- How are the variables mapped to aesthetics?

---

## Grammar of Graphics

.right-column[
<a href="http://www.powells.com/book/the-grammar-of-graphics-9780387245447"><img src="figure/graphics.jpg" style="width: 300px;"></a>
]

---

## Grammar of Graphics in R

.right-column[
<a href="http://www.powells.com/book/ggplot2-elegant-graphics-for-data-analysis-9783319242750/68-428"><img src="figure/ggplot2.jpg" style="width: 300px;"></a>
]

---

## Grammar of Graphics elsewhere

- [Make plots online with plotly](https://plot.ly/)

- [Leland Wilkinson](https://research.tableau.com/user/leland-wilkinson), author of _The Grammar of Graphics_, works at [Tableau](https://www.tableau.com/)

- The Grammar of Graphics provides a theoretical framework for building and deciphering statistical graphics

---

# What is a statistical graphic?
--

## A `mapping` of <br> `data` variables
--

## to <br> `aes()`thetic attributes

--
## of <br> `geom_`etric objects.
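Here is a minimal ggplot2 sketch of that definition, using the built-in `mtcars` data as an illustrative stand-in: `data` supplies the variables, `aes()` maps them to aesthetic attributes, and a `geom_` function picks the geometric object.

```r
library(ggplot2)

# data:    the built-in `mtcars` data frame (one row per car model)
# mapping: aes() ties variables to aesthetics:
#          wt -> x position, mpg -> y position, cyl -> color
# geom:    geom_point() draws each row as a point
p <- ggplot(data = mtcars,
            mapping = aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point()

p
```

Swapping `geom_point()` for another `geom_` changes the geometric object while the data and the mapping stay the same.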

---

# Back to basics

---

### Consider the following data in tidy format:

```r
library(tibble)
simple_ex <- tibble(
  A = c(1980, 1990, 2000, 2010),
  B = c(1, 2, 3, 4),
  C = c(3, 2, 1, 2),
  D = c("low", "low", "high", "high")
)
simple_ex
```

```
# A tibble: 4 x 4
      A     B     C     D
  <dbl> <dbl> <dbl> <chr>
1  1980     1     3   low
2  1990     2     2   low
3  2000     3     1  high
4  2010     4     2  high
```

---

### Consider the following data in tidy format:

- Sketch the graphics below on paper, where the `x`-axis is variable `A` and the `y`-axis is variable `B`

1. <small>A scatter plot</small>
1. <small>A scatter plot where the `color` of the points corresponds to `D`</small>
1. <small>A scatter plot where the `size` of the points corresponds to `C`</small>
1. <small>A line graph</small>
1. <small>A line graph where the `color` of the line corresponds to `D`, with added points that are all forestgreen and of size 4.</small>
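Once your sketches are done, one possible way to check them in ggplot2 is sketched below (the object names `p1` to `p5` are just for illustration). Plot 5 shows the key distinction: `color = D` inside `aes()` *maps* a variable to color, while `color = "forestgreen"` outside `aes()` *sets* a fixed value.

```r
library(ggplot2)
library(tibble)

simple_ex <- tibble(
  A = c(1980, 1990, 2000, 2010),
  B = c(1, 2, 3, 4),
  C = c(3, 2, 1, 2),
  D = c("low", "low", "high", "high")
)

# 1. Scatter plot of B against A
p1 <- ggplot(simple_ex, aes(x = A, y = B)) + geom_point()
# 2. Point color mapped to D
p2 <- ggplot(simple_ex, aes(x = A, y = B, color = D)) + geom_point()
# 3. Point size mapped to C
p3 <- ggplot(simple_ex, aes(x = A, y = B, size = C)) + geom_point()
# 4. Line graph
p4 <- ggplot(simple_ex, aes(x = A, y = B)) + geom_line()
# 5. Line color mapped to D; points set (not mapped) to forestgreen, size 4
p5 <- ggplot(simple_ex, aes(x = A, y = B)) +
  geom_line(aes(color = D)) +
  geom_point(color = "forestgreen", size = 4)
```

Mapping `color = D` in plot 5 also splits the line into one line per level of `D`.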

---