3 R and RStudio Basics
3.1 What is R?
In Chapter 2, I discussed many of the reasons why you should be doing your analyses (especially data analyses) using R. If you skipped over that chapter in the hopes of hopping straight into learning R, I ask that you go back and read it carefully. As you begin working with R, it is especially important to review that introductory chapter from time to time.
3.1.1 R beginnings
R was created by a group of statisticians who wanted an open-source alternative to the costly proprietary options. Being created by statisticians (instead of computer scientists) means that R has some quirky aspects to it that take a little bit of time to get used to. We’ll see that many packages have been developed to help with this and that you don’t need to have advanced degrees in Statistics to be able to work with R now.
Getting back to the development of R: R was created by Ross Ihaka and Robert Gentleman at the University of Auckland in New Zealand. It is a spin-off of the S programming language and is named partly after the first initial of its developers’ first names (Ross and Robert). The first ideas for creating R came in 1992 and the first version of R was released in 1994. You can find much more about the background of R and its features, as well as its connections to the S language, on its Wikipedia page.
3.1.2 R packages
I first learned to use R in 2007 from Dr. Philip Turk while a graduate student at Northern Arizona University. At the time, I never thought that the number of R users would explode as it has since 2011, or that students taking an introductory statistics course would be encouraged to learn R.
In 2007, R was still a largely esoteric and tricky language used by statisticians to do analyses. Getting used to the syntax for producing plots and working with data was especially tricky for those with little to no programming experience. So what has changed since 2007 about learning R?
I believe one of the biggest developments has been the creation of packages that make R easier to work with for newbies. Packages are created by users of R to increase the functionality of the base R installation. Recent packages created by Hadley Wickham and others have greatly expanded the capabilities of R, while also making it simpler to get started. According to the Wikipedia page referenced earlier, as of January 2016 there were around 7800 additional R packages available on common R repositories.
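As a minimal sketch of how a package is typically added to a base R installation and then used: install.packages() downloads a package once, and library() loads it in each session. The package name dplyr (one of Hadley Wickham’s packages) is only an illustrative choice, and the install line is commented out because it needs an internet connection.

```r
# Illustrative package name; any package from a repository works the same way
pkg <- "dplyr"

# Download and install the package (needed once per R installation)
# install.packages(pkg)

# Check whether the package is already installed before trying to load it
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  # Attach the package so its functions are available in this session
  library(pkg, character.only = TRUE)
} else {
  message("Package '", pkg, "' is not installed yet; run install.packages(\"", pkg, "\") first")
}
```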
Another great development is the graphical user interface called RStudio, along with the rmarkdown package developed by those who work for RStudio, Inc. We will discuss rmarkdown (also referred to as R Markdown) in Chapter 4, and will now focus on RStudio.
3.2 What is RStudio?
RStudio is a powerful, free, open-source integrated development environment for R. It began development in 2010 and its first beta release came in February 2011. It is available in two editions: RStudio Desktop and RStudio Server. This book will focus mostly on using the RStudio Server, but both versions are nearly identical to work with.
You can find instructions linked below for downloading R and RStudio on Windows and Mac machines. If you are using RStudio Server, your professor and members of your organization’s IT department have done these steps for you. On the RStudio Server you log on using a web browser to an account sitting on the cloud. There are many advantages to using the RStudio Server for the beginning user including sharing of R projects to help with feedback and error resolution. Installation of software can also cause its own headaches and this is eliminated by using the RStudio Server.
Note for advanced users: You can also install your very own RStudio Server for around $5 per month on Digital Ocean. Instructions to do so can be found from Dean Attali here and on the Digital Ocean site here.
After you complete a few months of work with the RStudio Server, it is recommended that you download RStudio Desktop to your computer. The instructions to do so are below.
3.2.1 Installing R and RStudio Desktop
It is worth noting that you can’t just install RStudio Desktop without installing R as RStudio needs to have R installed in order to run. A step-by-step guide to installing R and RStudio Desktop with screenshots can be found
Unless you plan to create PDF documents (which requires a multiple gigabyte download of LaTeX) you can skip some of the later steps of the installation. It is recommended that you select HTML as the Default Output Format for R Markdown. You’ll see more about this in Chapter 4.
3.3 Working in RStudio Server
3.3.1 Logging in and initial screen
The RStudio Server provides a web-based way to run analyses in R. This means that you will only need an internet connection and a web browser to run your analyses. Your professor or administrator will provide you with a link to the web location of your RStudio Server. After entering the link, you’ll see a page that looks something like:
After logging in with your username and password, you should see a layout similar to what follows.
For your own reference, a screenshot of RStudio Desktop looks like:
As you can see, they are almost identical. This makes moving between the two RStudio set-ups painless. Each of the three RStudio panes (soon to be four) and their corresponding tabs will be discussed in Chapter 4. You’ll find that a lot of what follows also applies to RStudio Desktop (except for the Shared Projects feature), and it is always recommended to create an RStudio project regardless of whether you are on the cloud or working locally.
3.3.2 Basic Workflow with RStudio
A good habit to get into whenever you start a new project with R code is to create a new RStudio project to go along with it. RStudio project files have the extension .Rproj and store metadata about the documents you’ve saved and the R environment you are working in. More information about RStudio projects is available from RStudio, Inc. here.
If you are sharing homework or lab assignments with your instructor using RStudio Server, for example, it might make sense to create an RStudio project, share it with your instructor, and then create new folders for each lab. We will follow this example below.
The video below shows you how to create a new RStudio project called initial and also your first R Markdown file. Note that on your initial login you may also see a description of which version of R is running in the Console pane, as shown in the screencast below.
We have our first_rmarkdown.Rmd file set up.
3.4 RStudio Layout
You may be initially a little overwhelmed by all the different panes and tabs that are available in RStudio. You’ll soon learn to love this layout but, as with all things, it will take a little bit to get used to it. We will begin with the top left pane, proceed to the top right pane, then to the bottom left pane, and lastly to the bottom right pane. These panes can be customized but it is recommended for beginning users to keep things in this standard layout.
3.4.1 Code Editor / View Window
The pane where you will likely spend the majority of your time is the top left one. When you first log in this pane isn’t there, but it was created when we made first_rmarkdown.Rmd. This pane serves as a place to view the contents of files and objects in R. In the screencast below, you can see that we can change the text in this file and then save the file.
Note that the tab that gives the filename will change its color from black to red and an asterisk will appear after the filename. This is to remind you that the file is not saved. You’ll need to get into the habit of saving files frequently. You can have multiple tabs open and view different files in this pane. You’ll also see in Chapter 4 that you can View datasets in this pane.
It may not be clear what these additions are really doing just yet. You’ll be pressing the Knit HTML button near the top of the pane to put all of your text, code, and its output together. We aren’t quite there just yet though!
3.4.2 Environment / History
The next pane includes by default an Environment tab and a History tab. To get a sense of what these tabs provide, we’ll also need to use the Console tab in the bottom left pane. I’ll show you how to create multiple objects in R using the Console. Initially you will see that the Environment tab tells us that the “Environment is empty.” If you click on the History tab, you should also see a blank screen with a few icons. We won’t go over all of these buttons here, but you are encouraged to hover over them and click on them to get a sense for what they do. As I enter code into the Console, watch how the Environment and History tabs change.
You can think of the Console as a place to play around. It is your R sandbox. You can test your code to make sure it is working and then copy that text into your Rmd file into a chunk after you are satisfied with it. We’ll see more examples of this in the chapters to come.
Note that, by default, when you enter the name of an object into the Console, like I did with sum_1_2, it displays the result. You’ve also been shown what is called the assignment operator, denoted by <-. You can read this as putting the contents of the right-hand side into an object named whatever appears on the left-hand side. For example, num1 is the name of an object that stores the value 7.
A powerful feature of the R language has been introduced in the sum_1_2 <- sum(num1, num2) line: sum is a function. Functions are denoted by their name, then an opening parenthesis, their arguments separated by commas, and then a closing parenthesis. You’ll see many examples of these going forward. Of course, we’ll be using R as more than the basic calculator shown here, but this should give you a good idea of what the Environment and History tabs store.
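The Console session described above might look roughly like the following sketch. The value 7 for num1 comes from the text; the value given to num2 is an assumption made purely for illustration.

```r
# Assign the value 7 to an object named num1
num1 <- 7

# Assumed value for illustration; the text does not state what num2 holds
num2 <- 3

# Call the function sum with two arguments and store the result
sum_1_2 <- sum(num1, num2)

# Entering an object's name on its own prints its contents
sum_1_2

# List the objects that the Environment tab would show
ls()
```

After these lines are run, the Environment tab lists num1, num2, and sum_1_2, and the History tab lists each command that was entered.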
You’ll frequently use the Console as a way to check your work or thoughts on how to solve a problem using R. Before RStudio came around, most users of R just had a window like the Console provides where they entered their commands and then looked at the results in different windows. We will see that the Console and the Code Editor/View Window will allow us to store all of our code in a file and then “run” that code through the Console to check that it works. The example video below shows how this may be done.
Rules for naming objects
It is good practice to get in the habit of naming variables after what they actually represent. If you are dividing two different sums of numbers, you might want to choose a name like ratio_of_sums to refer to that object. R has a couple of restrictions on what can be included in the name of R objects:
- Object names cannot begin with a number.
- Object names cannot contain symbols used for mathematics or to denote other operations native to R. These symbols include
Another important property of R is that it is case-sensitive. You’ll see what this means in the video below, which also includes examples of invalid names of objects. Note that we will continue to work inside the R chunk as we did in the last video. You’ll see that these code chunks provide a nice way to keep track of our important analyses.
You may have noticed that R will do some checks and alert you to potential errors by placing a red X to the left of the line of code with an error. It won’t catch all errors, but this can be helpful. Note also that the three differently capitalized names used in the video, including nAme, refer to three different values.
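The naming rules and case sensitivity can be checked with a short sketch like the one below. The helper is_valid_name and the example names other than ratio_of_sums and nAme are hypothetical, chosen only for illustration. The helper relies on base R’s make.names(), which rewrites a string into a syntactically valid name, so a string that comes back unchanged was already valid.

```r
# Hypothetical helper: TRUE when a string is already a valid object name
is_valid_name <- function(x) {
  make.names(x) == x
}

is_valid_name("ratio_of_sums")  # TRUE
is_valid_name("2nd_ratio")      # FALSE: begins with a number
is_valid_name("ratio-of-sums")  # FALSE: contains the subtraction symbol

# R is case-sensitive, so these are three separate objects
name <- "all lowercase"
Name <- "first letter capitalized"
nAme <- "second letter capitalized"

identical(name, Name)  # FALSE: different objects holding different values
```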
Another thing to notice is that you can store numbers in an object with a given name, like we did earlier with num1. If we’d like to store a character string such as “Chester”, we can provide the name of the object on the left-hand side of <- and the string in quotation marks on the right-hand side of <-. There are more complex object types that you will see in Chapter 5, but it is always important to think about the difference between a number and a character in R.
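Here is a small sketch of the difference between storing a number and storing a character string. The object names first_name and seven_as_text are assumptions for illustration; class() reports what kind of value an object holds.

```r
# A number is stored without quotation marks
num1 <- 7

# A character string is stored inside quotation marks
first_name <- "Chester"

class(num1)        # "numeric"
class(first_name)  # "character"

# Quotation marks turn digits into text, which is no longer a number
seven_as_text <- "7"
class(seven_as_text)  # "character"

# Arithmetic works on the number
num1 + 1  # 8
```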
It’s also a good idea not to give objects the names of functions that are built into R. You may want to call the addition of two numbers sum, and R will allow this, but it is HIGHLY recommended that you create more descriptive names and not name objects the same as common R functions. Something like sum_densities is better and less likely to be the name of a function.
Of course, you won’t know the names of all the built-in functions until you practice, but it is something to think about. If you are ever wondering whether a function is built in or is in a package you have included, you can use the help function (?) in the Console to check. Some examples are shown below as a screencast.
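As a rough sketch of how one might check whether a name already belongs to a function before using it for an object: the density values below are made up for illustration, and exists() and find() are base R tools used here as a complement to the ? help function mentioned in the text.

```r
# Made-up values for illustration
sum_densities <- sum(2.5, 4.0)

# TRUE when the name already refers to a function that R can find
exists("sum", mode = "function")            # TRUE
exists("sum_densities", mode = "function")  # FALSE

# Show where a function lives; sum comes from the base package
find("sum")  # "package:base"

# Open the help page for a function (only useful in an interactive session)
if (interactive()) {
  ?sum
  help("sum")
}
```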