Before turning in or sharing any project, it is recommended that you run this:
? function
Which one of these R chunk options dictates whether the code that produces the result should be printed before the corresponding R output:
eval
include
echo
na.rm
Which of the following is one way to correctly extract and list the police_force_size data from the police_resid2 data frame:
police_resid2 + extract "police_force_size"
data = police_resid2, list = police_force size
police_resid2$police_force_size
police_resid2$c(police_force_size)
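For reference, a minimal sketch of column extraction with the $ operator. The small data frame below is a made-up stand-in for police_resid2 (the real data set is not reproduced here), and the city column is hypothetical.

```r
# Hypothetical stand-in for the police_resid2 data frame (values are made up).
police_resid2 <- data.frame(
  city = c("A", "B", "C"),
  police_force_size = c(120, 450, 87)
)

# The $ operator pulls a single column out of a data frame as a vector.
force_sizes <- police_resid2$police_force_size
print(force_sizes)
```

The result is an ordinary numeric vector, so it can be indexed further with square brackets.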
What would be the most appropriate plot for a situation that was attempting to plot the number of text messages sent daily by gender?
What R chunk would produce a faceted histogram that plots age by orientation, with a binwidth of 10, and with each orientation a different color?
A.
ggplot(data = profiles, mapping = aes(x = age)) +
geom_histogram(binwidth = 10, aes(fill = orientation)) +
facet_wrap(~ orientation)
B.
ggplot(data = okcupiddata, mapping = aes(x = age)) +
geom_histogram(binwidth = 10, fill = orientation) +
facet_wrap(~ orientation)
C.
ggplot(data = profiles, mapping = aes(x = age)) +
geom_histogram(bins = 10, aes(fill = "orientation")) +
facet_wrap(~ orientation)
D.
ggplot(data = okcupiddata, mapping = aes(x = age)) +
geom_bar(binwidth = 10, fill = orientation) +
facet_wrap(~ orientation)
You are asked to produce an appropriate plot for looking at the relationship between age and height. What kind of plot do you make?
What makes a data set tidy?
A boxplot is used for…
What is faceting?
Which of the following does an aesthetic (aes()) NOT affect?
What is an appropriate graph for plotting two categorical variables?
What is the most appropriate code for determining the relationship between sex and age?
A.
ggplot(data = profiles, mapping = aes(x = sex, y = age)) +
geom_boxplot()
B.
ggplot(data = profiles, mapping = aes(x = sex, fill = name)) +
geom_bar()
C.
ggplot(data = profiles, mapping = aes(x = age, fill = sex)) +
geom_bar()
D.
ggplot(data = profiles, mapping = aes(x = age, y = sex)) +
geom_boxplot()
Why should the points in a scatterplot be jittered or made transparent?
What graph should be used when dealing with one categorical variable?
Which R chunk produces the most appropriate faceted barplot of carrier based on name?
A.
ggplot(data = flights_namedports, mapping = aes(x = carrier, fill = name)) +
geom_bar() +
facet_grid(name ~ .)
B.
ggplot(data = flights_namedports, mapping = aes(x = name, fill = carrier)) +
geom_bar() +
facet_grid(. ~ name)
C.
ggplot(data = flights_namedports, mapping = aes(x = carrier, fill = name)) +
geom_bar() +
facet_grid(carrier ~ .)
D.
ggplot(data = flights_namedports, mapping = aes(x = name, fill = carrier)) +
geom_bar() +
facet_grid(. ~ carrier)
What type of plot would be used for a categorical predictor and a numerical response?
Why do we use R?
Which of these code chunks is the most correct?
A.
ggplot(data = profiles, mapping = aes(x = drugs))
geom_bar(color = coral, fill = "green", na.rm = TRUE)
B.
ggplot(data=profiles, mapping= aes(x=drugs)) +
geom_bar(color= "coral", fill= "green", na.rm = FALSE)
C.
ggplot(data = profiles, mapping = aes(x = drugs)) +
geom_bar(color = "coral", fill = "green", na.rm = TRUE)
D.
ggplot(mapping = (aes) x = drugs, data = profiles,) +
geom_bar(color = "coral", fill = "green", na.rm = TRUE)
When writing the code for a scatterplot, do we use the fill argument or the color argument to change the color of the points?
color
fill
Tidy data is…
What is one difference between a histogram and a bar plot?
The Code Editor/View Window is?
If you would like to list many entries in a vector object, you can do so by?
?c in the R console
View function
Rules for naming objects in R include?
geom refers to?
Box plots are best for?
One categorical variable is best displayed in which type of plot?
What are the values given in a 5-number Summary?
Which chunk produces the following graph with warning output?
## Warning: Removed 9430 rows containing non-finite values (stat_bin).
A.
ggplot(data = flights, mapping = aes(x = arr_delay)) +
geom_histogram(bins = 1000, color = "black", fill = "gold")
B.
ggplot(data = flights, mapping = aes(x = arr_delay)) +
geom_bar(binwidth = 15, color = "black", fill = "gold")
C.
ggplot(data = flights, mapping = aes(x = arr_delay, fill = gold)) +
geom_histogram(bins = 10, color = "black")
D.
ggplot(data = flights, mapping=aes(x = arr_delay)) +
geom_histogram(binwidth = 100, color = black, fill = gold)
What are the correct definitions of the echo, eval, and include chunk options?
echo specifies whether the code AND its output should be in the resulting knitted document, eval specifies whether the code should be assessed or just displayed without its output, include dictates whether the code that produces the result should be printed before the corresponding R output.
echo specifies whether the code should be assessed or just displayed without its output, eval dictates whether the code that produces the result should be printed before the corresponding R output, include specifies whether the code AND its output should be in the resulting knitted document.
echo dictates whether the code that produces the result should be printed before the corresponding R output, eval specifies whether the code AND its output should be in the resulting knitted document, include specifies whether the code should be assessed or just displayed without its output.
echo dictates whether the code that produces the result should be printed before the corresponding R output, eval specifies whether the code should be assessed or just displayed without its output, include specifies whether the code AND its output should be in the resulting knitted document.
Given the flights data set, what plot would be most appropriate for comparing the variables carrier and arr_delay?
What is NOT a feature of Tidy Data?
What is not a reason we use R?
What is not the name of a type of graph?
Which one is not a panel seen on RStudio?
What is not a key element of tidy data?
What is not a component of Hadley Wickham’s workflow graphic?
Which one of the following is a valid R code for a faceted plot?
facet_grid()
Facets()
facet_wrap()
What plot would correspond to a data set that compares one continuous numerical variable to one categorical variable?
When is it usually better to use a line graph than a scatterplot?
What does it mean to “jitter” the points on a plot?
What reason did Naomi Robbins, author of “Creating More Effective Graphs”, give for why we shouldn’t use pie charts?
What are the main features of a tidy data set?
If you want to plot two variables, where the explanatory is categorical and the response is numeric, which types of graphs would be appropriate to use?
If you want to plot two categorical variables which type of graph would be the most appropriate to use?
What is the difference between a faceted histogram and a faceted barplot?
What is one thing the factor function does?
Which of these would produce a side by side barplot of status faceted by sex with fill color determined by orientation?
A.
ggplot(data = profiles, aes(status, fill = orientation)) +
geom_bar() +
facet_wrap(~sex) +
(position = "dodge")
B.
ggplot(data = profiles, aes(status, fill = orientation)) +
geom_bar(position = "dodge") +
facet_wrap(~sex)
C.
ggplot(data = profiles, aes(status, fill = orientation)) +
geom_bar() +
facet_wrap(~sex)
D.
ggplot(data = profiles, aes(status, fill = orientation)) +
geom_bar(position = dodge) +
facet_wrap(~sex)
Which of these will give us rows 9 through 12 of the variable distance in the flights data set?
flights$distance[9:12]
flights$distance[-9:12]
flights$distance[c(9, 12)]
flights$distance [-c(8, 13)]
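As a rough illustration of how these index forms differ, the sketch below uses a made-up vector named distance as a stand-in for flights$distance, so it runs without the flights data.

```r
# Hypothetical stand-in for flights$distance (values are made up).
distance <- c(10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130)

# The colon operator builds the full sequence 9, 10, 11, 12.
rows_9_to_12 <- distance[9:12]

# c(9, 12) selects only the two listed positions.
rows_9_and_12 <- distance[c(9, 12)]

# A negative index drops the listed positions and keeps everything else.
all_but_8_and_13 <- distance[-c(8, 13)]

print(rows_9_to_12)
```

Note that -9:12 is read by R as (-9):12, a sequence that mixes negative and positive indices, which R rejects with an error.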
Which of these plots is not used when the explanatory variable is categorical?
If you were to compare two variables such as percentage of voters by region, what plot would you use to display this distribution of data?
What is the purpose of jittering when you have a large mass of points in a Scatterplot?
What 2 functions are most crucial when producing a faceted barplot?
factor and geom_bar functions
geom_bar and facet_grid functions
bins and facet_grid functions
Which of the following would be the result of running the following R code in the Console?
> library(okcupiddata)
okcupiddata package if the package is installed.
okcupiddata dataset.
okcupiddata dataset.
Why is the labs function useful in keeping your data distributions organized?
The labs function is useful because it allows you to label the x and y axis.
The labs function is useful because it allows you to change the default labels on the plot and add a title.
The labs function is useful because others are able to receive clarification of the data distribution, with the help of labels.
How are tidy datasets formed?
If a dataset called class_examscore included 10 students' names and the grades that they received on three exams, how would you extract the first five student names in the dataset?
class_examscore$name[1:5]
class_examscore$name(c(1, 5))
class_examscore$name1:5
A faceted barplot is most likely used when you are graphing…
To produce a histogram with 4 bins, fill color blue and border color green, what would the ending of the R chunk look like?
geom_histogram(bins = 4, color = "green", fill = "blue")
geom_histogram(bins = 4, color = green, fill = blue)
geom_histogram(bins = 4, color = "blue", fill = "green")
geom_histogram(bins = 4, color = blue, fill = green)
When is the median sometimes a better and more reliable measure than the mean?
Which code below corresponds to a histogram with 10 bins, a black outline, and a blue fill (assuming the appropriate call to ggplot with a + was made just before this)?
ggplot(binwidth = 10, color = "white", fill = "forestgreen")
geom_histogram(binwidth = 10, color = "black", fill = "blue")
geom_histogram(bins = 10, color = "black", fill = "blue")
geom_boxplot(binwidth = 10, color = "black", fill = "blue")
What is known as the five-number summary?
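For reference, a small sketch of how a five-number summary can be computed in base R. The sample vector x is made up for illustration, and quantile() is used here with its default interpolation method.

```r
# Made-up sample, used only for illustration.
x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15)

# Quantiles at 0%, 25%, 50%, 75%, and 100% correspond to the
# minimum, first quartile, median, third quartile, and maximum.
five_num <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1))
print(five_num)
```

Calling summary(x) reports the same five values along with the mean.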
What is the biggest problem with pie charts compared to barplots?
“Error: could not find function 'ggplot'” appears in the R Console after running:
ggplot(data = alaska_flights, aes(x = dep_delay, y = arr_delay)) +
geom_point(alpha = 0.2)
What needs to be done to fix this error?
Use ggplot2 rather than ggplot in your code.
ggplot is a function in ggplot2, so you need to make sure ggplot2 is loaded via library(ggplot2) in a chunk above this chunk.
The g in ggplot needs to be capitalized: (Ggplot)
What does the following code do when it is run in R (assuming that we have downloaded the police_resid2 data and loaded the ggplot2 package)?
ggplot(data = police_resid2, mapping = aes(x = white)) +
geom_histogram(bins = 5, color = "white", fill = "red")
white variable using the police_resid2 data
What does NOT make a dataset tidy?
Below are a few of the symbols and panes that are the most common within RStudio. Find the incorrect symbol or pane description.
<-. You can read this as putting your contents of the left-hand-side into an object name whenever it appears in the right-hand-side.
When would it be most appropriate to use a boxplot?
What can be done when missing values from a data set need to be excluded from analysis?
na.rm = TRUE
narm = FALSE
na = TRUE
na.rm = FALSE
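As a quick illustration of what the na.rm argument does, the sketch below uses base R's mean() on a made-up vector named delays that contains missing values.

```r
# Made-up vector with two missing values.
delays <- c(5, NA, 12, 3, NA)

# With the default na.rm = FALSE, any NA makes the result NA.
mean_with_na <- mean(delays)

# With na.rm = TRUE, missing values are dropped before averaging.
mean_without_na <- mean(delays, na.rm = TRUE)

print(mean_with_na)
print(mean_without_na)
```

The argument name includes the dot: spellings such as narm or na are not recognized as this option.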
Carefully analyze the code below:
if(!require("ggplot2"))
install.packages("ggplot2", repos = "http://cran.rstudio.org")
library(ggplot2)
ggplot(data = weather, mapping = aes(x = temp))
If you wanted to produce a histogram, would the code above plot a histogram? Why or why not?
A. Yes. The code above specifically allows RStudio to plot the dataset.
B. No. The code just provided a backdrop. We need to be able to plot, thus we will follow up the code with a “+”. As seen in the code below.
ggplot(data = weather, mapping = aes(x = temp)) +
geom_histogram()
C. Yes. The code above has provided the backdrop and added the layer needed to plot.
D. No. The code just provided a backdrop. We need to be able to plot, thus we will begin the following line with a “+”. As seen in the code below.
ggplot(data = weather, mapping = aes(x = temp))
+ geom_histogram()
Which plot is appropriate to plot the relationship between sex and education in the profiles data set from okcupiddata?
Which code will correctly produce the appropriate plot (though it won’t be a really nice thing to look at…) to plot sex and education in the profiles data set from okcupiddata?
A.
ggplot(data = profiles, mapping = aes(x = education)) +
geom_histogram(aes(fill = education), na.rm = TRUE)
B.
ggplot(data = profiles, mapping = aes(x = education)) +
geom_bar(aes(fill = "education"), na.rm = TRUE) +
facet_wrap(~sex)
C.
ggplot(data = profiles, mapping = aes(x = education))+
geom_bar(aes(fill = education), na.rm = TRUE) +
facet_wrap(~ethnicity)
D.
ggplot(data = profiles, mapping = aes(x = education)) +
geom_bar(aes(fill = education), na.rm = TRUE) +
facet_wrap(~sex)
What is the correct code to produce this graph?
A.
ggplot(data = profiles, mapping = aes(x = smokes)) +
geom_bar(aes(fill = smokes), na.rm = TRUE) +
facet_wrap(~status)
B.
ggplot(data = profiles, mapping = aes(y = smokes)) +
geom_boxplot(aes(fill = smokes), na.rm = TRUE) +
facet_wrap(~status)
C.
ggplot(data = profiles, mapping = aes(x = smokes)) +
geom_bar(aes(fill= "smokes"), na.rm = TRUE) +
facet_wrap(~status)
D.
ggplot(data = profiles, mapping = aes(x = status)) +
geom_bar((aes(fill = smokes, na.rm = TRUE))) +
facet_wrap(~smokes)
How does the placement of a ~ affect the display of the plot? For example, if it is facet_grid(. ~ size) compared to facet_grid(size ~ .)?
~ does not affect the plot display in this case.
size are displayed on the top of the graph with (. ~ size)
size are displayed on the right-hand-side of the graph (size ~)
When a plot is skewed to the right it…?
When using two categorical variables, what is the most appropriate plot to use?
What precisely does the “box” of a boxplot represent?
What is needed to make the following chunk of code correct?
ggplot(data = profiles, mapping = aes(x = drugs)) +
geom_bar(color = "white", (fill = drugs), na.rm = TRUE)
(fill = "drugs")
, (na.rm = TRUE))
aes(fill = drugs)
What is the c function used for?
How would we extract entries 2-34 from a variable of 72 entries?
[c(2-34)]
[c(2,34)]
[-c(2,34)]
[2:34]
Which of the following lines of code is needed to produce a faceted histogram of age based on and colored by sex using the data set profiles from okcupiddata?
mapping = aes(x = age, fill = sex)
mapping = aes(x = sex, fill = age)
mapping = aes(x = age, y = sex)
mapping = aes(x = sex, y = age)
What type of graph is used for plotting two categorical variables?
What function do I use to label/rename parts of a graph?
aes
ggplot
labs
wrap
What are the five named graphics (5NG)?
Which code is correct for producing a histogram with the rain variable with specified bins and color with no warning messages?
A.
ggplot(data = weather, mapping = aes(x = rain)) +
geom_histogram(binwidth = 10, fill = "limegreen", color = "black", na.rm = TRUE)
B.
ggplot(data = weather, mapping = aes(x = rain, y = humid)) +
geom_histogram(binwidth = 10, fill = "limegreen", color = "black", na.rm = TRUE)
C.
ggplot(data = weather, mapping = aes(x = rain)) +
geom_boxplot(binwidth = 10, fill = "limegreen", color = "black", na.rm = TRUE)
D.
ggplot(data = weather, mapping = aes(x = rain)) +
geom_histogram(fill = "limegreen", color = "black", na.rm = TRUE)
Which of the following is most appropriate for plotting the temperature in Oregon for the past month?
What does skewed right mean when looking at a bar plot?
Which of the following is the best option for comparing the distributions of two variables that are not categorical?
How do you plot the relationship between age and height?
A.
ggplot(data = profiles, mapping = sex, height) +
geom_histogram()
B.
ggplot(data = profiles, aes(x = age, y = height)) +
geom_bar()
C.
ggplot(data = profiles, aes(x = age, y = height)) +
facet_histogram(na.rm = TRUE)
D.
ggplot(data = profiles, aes(x= age, y= height)) +
geom_point(na.rm = TRUE)
Which of the following would be most appropriate when plotting one numerical and one categorical variable?
Which chunk would create a faceted histogram of sex and age, with border color pink, fill color based on the values of smokes, and 20 bins?
A.
ggplot(data = profiles, mapping = (x = age, y = sex)) +
geom_histogram(bins = 20, color = pink, fill = smokes)
B.
ggplot(data = profiles, mapping = aes(x = age)) +
geom_histogram(bins = 20, color = "pink", aes(fill = smokes)) +
facet_grid(~sex)
C.
ggplot(data = profiles, mapping = aes(x=age, y=sex)) +
geom_histogram(bins = 20, color = pink, fill = "smokes")
D.
ggplot(data=profiles, mapping=aes(x=age))+
geom_histogram(bins=20, color = "pink", aes(fill = smokes)) +
facet_plot(sex)
Why do we wrap things in aes?
aes
What is the biggest difference between a bar plot and a histogram?
What two types of variables do box plots compare?
What does the code look like for a histogram?
ggplot(data = , mapping = aes(x = )) + geom_histogram()
ggplot(data= , mapping = (x = )) + geom_histogram()
ggplot(data= , mapping = (aes=)) + geom_histogram ()
ggplot(data= , mapping = aes(x = )) + create_histogram()
What types of variables does a scatterplot graph?
How do you set the transparency of points?
alpha =
trans =
add =
Why should you use a lineplot?
x and one corresponding value of y for each x value.