3 R and RStudio Basics
3.1 What is R?
In Chapter 2, I discussed many of the reasons why you should begin doing your analyses (especially data analyses) using R. If you skipped over that chapter in the hopes of just diving into learning about R, I suggest you go back and read it over carefully. As you begin building fluency in working with R, it is especially important to review that introductory chapter from time to time.
3.1.1 R beginnings
R was developed by a group of statisticians who wanted an open-source alternative to the costly proprietary options that were (and still are) popular. Because it was created by statisticians (instead of computer scientists), R has some quirky aspects to it that take some time to get used to. We’ll see that many packages have been developed to help with this, and these days, you don’t need an advanced degree in statistics to work with R.
Getting back to the development of R… R was created by Ross Ihaka and Robert Gentleman in New Zealand at the University of Auckland. It is a spin-off of the S programming language and was named partly after the first names of its developers (Ross and Robert). The initial ideas for creating R came in 1992, and the first version of R was released in 1994. You can find much more about the background of R, its features, and its connections to the S language on Wikipedia.
3.1.2 R packages
I first learned to use R as a graduate student at Northern Arizona University from Dr. Philip Turk in 2007. At the time, I never thought that the number of R users would explode the way it has since 2011. I never would have thought that students taking an introductory statistics course would be encouraged to learn to use R.
In 2007, R was still largely an esoteric and tricky language used by statisticians to do analyses. Getting used to the syntax for producing plots and working with data was especially tricky for those with little to no programming experience. So what has changed since 2007 about learning R?
I believe one of the biggest developments has been the creation of packages to make R easier to work with for newbies. Packages are add-ons created by users of R to increase the functionality of the base R installation. Packages created by Hadley Wickham and others recently have greatly expanded the capabilities of R, while also working to make beginning with R simpler. As of April 2017, more than 10,400 packages were available on common R repositories.1
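As a rough sketch of how packages are used in practice: a package is installed once and then loaded in each session where it is needed. The package name ggplot2 below is only an illustration (this chapter does not prescribe it), and the install line is commented out so that the example runs without an internet connection.

```r
# Installing downloads a package from a repository such as CRAN.
# This only needs to be done once per computer (or once per server account).
# install.packages("ggplot2")   # illustrative package name; needs internet

# Loading makes an installed package's functions available in this session.
# The stats package ships with R itself, so it stands in for an add-on here.
library(stats)

# requireNamespace() reports whether a package is installed
# without attaching it to the session.
has_stats <- requireNamespace("stats", quietly = TRUE)
has_stats
```

If you are on an RStudio Server, your administrator may already have installed the packages you need, in which case only the library() step applies.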
Two other great developments are the graphical user interface called RStudio and a package developed by RStudio, Inc. called rmarkdown. We will discuss rmarkdown (also referred to as R Markdown) in Chapter 4, and will now focus on discussing RStudio.
3.2 What is RStudio?
RStudio is a powerful, free, open-source integrated development environment for R. Development on RStudio began in 2010, and the first beta was released in February 2011. It is available in two editions: RStudio Desktop and RStudio Server. This book will focus mostly on RStudio Server, but both versions are nearly identical to work with.
Instructions for downloading and installing R and RStudio on Windows and Mac machines are linked below. If you are using RStudio Server, your professor or members of your organization’s IT department have done these steps for you. With RStudio Server, you use a web browser to log in to an account on the cloud. There are many advantages to using RStudio Server for beginning users, including sharing of R projects to help with feedback and error resolution. Installing software can also cause its own headaches, which RStudio Server eliminates.
Note for advanced users: You can also install your very own RStudio Server for around $5 per month on Digital Ocean. Instructions to do so can be found from Dean Attali here and on the Digital Ocean site here.
After you complete a few months of work with the RStudio Server, it is recommended that you download RStudio Desktop to your computer. The instructions to do so are below.
3.2.1 Installing R and RStudio Desktop
It is worth noting that you can’t just install RStudio Desktop without first installing R, as RStudio needs to have R installed to run. Step-by-step guides to installing R and RStudio Desktop with screenshots are available
Unless you plan to create PDF documents (which requires a multi-gigabyte download of LaTeX), you can skip some of the later steps of the installation. It is recommended that you select HTML as the Default Output Format for R Markdown. You’ll see more about this in Chapter 4.
3.3 Working in RStudio Server
3.3.1 Logging in and initial screen
The RStudio Server provides a web-based interface to run analyses in R. This means that you will only need an internet connection and a web browser to run your analyses. Your professor or administrator will provide you with a link to the web location of your RStudio Server. After entering the link, you’ll see a page that looks something like:
After logging in with your username and password, you should see a layout similar to what follows.
For reference, a screenshot of RStudio Desktop looks similar:
This makes switching between the two RStudio set-ups painless. A discussion of each of the different RStudio panes and their corresponding tabs is in Chapter 4. You’ll find that a lot of what follows also applies to RStudio Desktop (except for the Shared Projects feature). Whether you are on the cloud or working locally, it is always recommended to create an RStudio project.
3.3.2 Basic Workflow with RStudio
When starting a new R project, it is good practice to create a new RStudio project to go along with it. RStudio project files have the extension .Rproj and store metadata and information about the R environment you are working in. More information about RStudio projects is available from RStudio, Inc.
If you are sharing homework or lab assignments with your instructor using RStudio Server, for example, it might make sense to create an RStudio project, share it with your instructor, and then create new folders for each lab. We will follow this example below.
The video below shows you how to create a new RStudio project called initial and add your first R Markdown file. Note that you also may see a description about what version of R is running on your initial login, as shown in the screencast below in the Console pane.
We have our first_rmarkdown.Rmd file set up.
3.4 RStudio Layout
Initially, you may be a little overwhelmed by all the different panes and tabs that are available in RStudio, but you will soon learn to appreciate this layout. We will begin with the top left pane and proceed clockwise. The layout of these panes can be customized, but it is recommended that beginning users keep the standard layout.
3.4.1 Code Editor / View Window
The pane in which you will likely spend a majority of your time is the pane on the top left. When you first log in, this pane isn’t there, but it appeared when we created the first_rmarkdown.Rmd file. This pane serves as a place to view the contents of files and objects in R. In the screencast below, you can see that we can change the text in this file and then save the file.
Note that, when you edit the file, the tab with the file name changes from black to red, and an asterisk appears after the file name, indicating that the latest changes have not been saved. You should get in the habit of saving files frequently. You can have multiple tabs open and view different files in this pane. In Chapter 4, you’ll also see that you can View datasets in this pane.
It may not be clear what these additions are really doing just yet. That’s ok! When finished, you’ll be pressing the Knit HTML button near the top of the pane to put all of your text, code, and its output together. We aren’t quite there just yet though!
3.4.2 Environment / History
By default, the top right pane includes an Environment tab and a History tab. To get a sense of what these tabs provide, we’ll need to also use the bottom left pane and the Console tab. I’ll show you how to create multiple objects in R using the Console. Initially, you will see that the Environment tab tells us that the “Environment is empty.” This means there are no objects (e.g., data) yet. If you click on the History tab, you should also see a blank screen with a few icons. We won’t go over all of these buttons here, but I encourage you to hover over them and click on them to get a sense for what they do. As I enter code into the Console, watch to see how the Environment and History tabs change.
You can think of the Console as a place to play around. It is your R sandbox. You can test your code to make sure it is working and then copy that text into your Rmd file once you are satisfied with it. We’ll see more examples of this in the chapters to come.
Note that, by default, when you enter the name of an object into the Console, as I did with sum_1_2, it displays the result. You’ve also been shown what is called the assignment operator, denoted by <-. You can read this as putting the contents of the right-hand side into an object named whatever appears on the left-hand side. In the example, num1 is the name of an object that stores the value 7.
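For readers without access to the screencast, the Console session might look roughly like the sketch below. Only the value of num1 (7) comes from the text; the value of num2 is an assumption chosen for illustration.

```r
num1 <- 7                    # value given in the text
num2 <- 3                    # assumed value, chosen only for illustration
sum_1_2 <- sum(num1, num2)   # store the total in a new object

sum_1_2                      # entering an object's name prints what it stores

ls()                         # lists current objects, much like the Environment tab
```

After running these lines, the Environment tab should list num1, num2, and sum_1_2, and the History tab should show each command you entered.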
A powerful feature of the R language has been introduced in the sum_1_2 <- sum(num1, num2) line: sum is a function. Functions are denoted by their name and then an opening parenthesis, followed by one or more arguments separated by commas, and then a closing parenthesis. You’ll see many examples of these going forward. Of course, we’ll be using R as more than the basic calculator shown here, but this should give you an idea of what the Environment and History tabs store.
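A few more function calls, offered only as a sketch of the name-parenthesis-arguments pattern described above. All of the functions are part of base R; the numbers and the object name total are arbitrary.

```r
sum(1, 2, 3)                 # three arguments separated by commas

round(3.14159, digits = 2)   # an argument can also be supplied by name

total <- sum(4, 5)           # the result of a function can be stored...
sqrt(total)                  # ...and then passed to another function
```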
You’ll frequently use the Console as a way to check your work or experiment with how to solve a problem using R. Before RStudio, most users of R just had a window like the Console provides where they entered their commands and then looked at the results in different windows. We will see that the Console and the Code Editor/View Window will allow us to store all of our code in a file and then “run” that code through the Console to check that it works. The example video below shows how this is done.
Rules for naming objects
It is good practice to get in the habit of naming variables corresponding to what they actually represent. If you are dividing two different sums of numbers, you might want to choose a name like ratio_of_sums to refer to that object. R has a few restrictions regarding what can be included in the name of R objects:
- Object names cannot begin with a number.
- Object names cannot contain symbols used for mathematics or to denote other operations native to R. These symbols include
Another important property of R is that it is case-sensitive. You’ll see what this means in the video below, which also includes examples of invalid object names. Note that we will continue to work inside the R chunk as we did in the last video. You’ll see that these code chunks provide a nice way to keep track of our important analyses.
You may have noticed that R will do some checks and alert you to potential errors by placing a red X to the left of lines of code with common syntax errors. These checks won’t catch all errors, but they can be helpful. Note also that names differing only in capitalization, such as nAme, refer to different values.
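The naming rules and case sensitivity can be sketched as follows. The specific names and values are illustrative assumptions, not the ones used in the video. The invalid names are commented out because R would stop with an error if they were run.

```r
# A descriptive, valid name
ratio_of_sums <- sum(1, 2) / sum(3, 4)

# Invalid names (commented out so the rest of the example runs):
# 2nd_ratio <- 0.5        # begins with a number
# ratio-of-sums <- 0.5    # contains the subtraction symbol

# R is case-sensitive, so these are three separate objects
name <- "all lowercase"
Name <- "capitalized"
nAme <- "mixed case"

identical(name, Name)     # compares the contents of two of the objects
```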
Another thing to notice is that you can store numbers into an object with a given name like we did earlier with num1. If we’d like to store a character string such as “Chester”, we can provide the name of the object on the left-hand side of <- and the string in quotation marks on the right-hand side of <-. There are more complex object types that you will see in Chapter 5, but it is always important to think about the difference between a number and a character in R.
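A small sketch of the difference between a number and a character string. The object names first_name and seven_text are hypothetical, chosen only for this example.

```r
num1 <- 7                 # a number
first_name <- "Chester"   # a character string; note the quotation marks

class(num1)               # reports the type of the number
class(first_name)         # reports the type of the string

seven_text <- "7"         # quotation marks make this text, not a number
is.numeric(seven_text)    # so R does not treat it as numeric
```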
It’s also a good idea not to give objects the same names as functions that are built into R. You may want to call the addition of two numbers sum, and R will allow this, but it is HIGHLY recommended that you create more descriptive names and not name objects the same as common R functions. Something like sum_densities is better and less likely to be the name of a function.
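To see why reusing a function name is allowed but confusing, consider this sketch (the numbers are arbitrary). R still finds the function when the name is followed by parentheses, but the same name now means two different things.

```r
sum <- 10                         # allowed: creates an object named sum

sum(1, 2)                         # still calls the built-in function
sum                               # but the bare name now shows the object

rm(sum)                           # remove the object to clear up the confusion

sum_densities <- sum(0.2, 0.3)    # a descriptive name avoids the problem
sum_densities
```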
The help function (?)
Of course, you won’t know what all of the names of built-in functions are until you practice, but it is something to think about. If you are ever wondering if a function is built-in or is in a package you have included, you can use the ? function in the Console to check. Some examples are shown below as a screencast.
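In case the screencast is unavailable, here is a minimal sketch of checking on a name. The help calls are shown as comments because they open a help page and are meant to be typed interactively. The exists() and find() checks are additions for illustration and are not necessarily what the screencast shows.

```r
# In the Console, either of these opens the help page for sum():
#   ?sum
#   help("sum")

exists("sum")     # is there already something called sum?
find("sum")       # where does it live? returns "package:base"
find("median")    # functions from other attached packages are found too
```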