Learning Goals
Have students engage in the data/science research pipeline in as faithful a manner as possible while maintaining a level suitable for novices
Contribute to the R community by developing a vignette for others to learn from on a
fivethirtyeight
R package data set usingggplot2
anddplyr
Design a data journalism piece focused on effective data storytelling ideas from class
Format
- Your final project will center around writing a data journalism article that acts as both a vignette for the
fivethirtyeight
R package and is in the style of quality data journalism. You should focus on merging these two types when you write your group project. You’ll see that the current vignettes include some advanced material beyond what we cover in class. This is not an expectation of your Group Project, but rather a way for you to see the power of R and how it ties nicely with data journalism.- Examples of such types of journalism can be found on:
- FiveThirtyEight
- The New York Times’ The Upshot
- Examples of vignettes can be found at
- Examples of such types of journalism can be found on:
- You must flex your statistical and data science muscles you’ve built so far this semester. In particular:
- Data visualization using
ggplot2
- Data manipulating/wrangling using
dplyr
- Data visualization using
- You will need to include AT LEAST two publication quality graphics produced using
ggplot2
. - Your project also needs to include AT LEAST one instance of using
dplyr
to create a summary table to be displayed directly or for use in creating a graphic usingggplot2
. - When you are writing the vignette, you should be thinking about explaining ideas to someone that has little to no statistical background. The best articles/vignettes are ones that captivate a broad audience.
- You should carefully convey what the point of your article is throughout your vignette. You should think about the following as you work as a team.
- Why is this important for others to know?
- What might someone be confused about as they read this?
Administrative Notes
Submission Format: Each group must create a single RStudio project that is shared amongst the group members and with Chester. In the same folder as the RStudio project, you will include an R Markdown file containing all R code needed and the text of your vignette as well as a knitted HTML document that shows your analysis and writing. You’ll be putting all of your documents into a Google Drive folder on Chester’s account and your specific link will be shared with you after you have decided on groups. All your work will be centrally located here; do not email or message me on Slack with any files. You can request assistance by specifying on what line your error message occurs and in what file you are working. Since everything will be synced, I will be able to go look into your work as needed.
- In the shared Google Drive folder will be a file called
final_project.Rmd
. This is where you will put your final analysis.
- Use the
scratch.R
file for all of your preliminary work. I want you to keep track of your progress in this file so try not to delete incorrect code, but rather document the error and then work to fix it.
- You’ll also see a
.Rproj
file in this folder. This is the “single RStudio project” I referred to earlier. You should always open this file when you are starting/resuming your analysis in RStudio. It makes sure the RStudio is working in the correct directory and stores a lot of settings related to your project.
- In the shared Google Drive folder will be a file called
Project Proposal
- Submit by the beginning of class on Tuesday, March 7
- This should be typed in a Google Doc and shared with isma5720@pacificu.edu.
- The sooner you get this to me the sooner I can give you the go-ahead with which data set you will creating a vignette for.
- Identify as soon as possible who the three members of your group will be. Send me a message on Slack when you have identified who you’d like to work with.
- You should identify what are the top two data sets from the
fivethirtyeight
package you’d be interested in analyzing. - Identify at least two questions you’d like to answer for each data set
- This should include what variables you will use, what types they are, and which of the 5NG would be appropriate for assisting in answering these questions.
- I will review your submissions and assign you your data set by the end of the day on Tuesday, March 7.
Group Project Draft
- Due at the beginning of class on Tuesday, March 14
- This should be a nearly complete version of your project. All typos should be fixed and it should include all of the requirements of the project. Failure to have this completed at this time will likely result in a failing grade for the Group Project. This means that your work should be nearly entirely done so that I can review it and provide some comments on style that you can fix relatively quickly.
Electronic-Only Group Project Submission
- Due Thursday, March 23 at 10 AM
- In the shared folder on Google Drive:
- A
group_project.Rmd
file that completely reproduces your analysis. I should have to press Knit only once to recreate the entire HTML page. - A
group_project.html
file that shows that you were successful in knitting your R Markdown file into a nicely formatted vignette. - All necessary data files (saved as CSV files) if you needed to join with other data set(s).
- Individually: A Google Forms survey to ensure all members of the group were active members of completing the group project. You will not receive a passing grade for the Group Project on an individual basis if you were not a good and supportive team member. I will include a link to this on Slack on or before Wednesday, March 22.
- Your project won’t be considered submitted until I give you Slack confirmation that everything looks good.
Honor Code: This is the equivalent of an academic term paper; all honor code rules about plagarism and citations apply.