Advanced Economic Methods
Exercise 1: Intro to R
Welcome
Welcome to the course! Over the next 10 lessons, you will learn how
to run regressions, store the results, and generate plots using
R. This document first outlines the file
layout we will use, so that the code can be copy/pasted into your
own files and run smoothly. It then covers installing R and RStudio,
introduces the layout of RStudio, and walks you through the first steps
of setting up and running code.
File organization
While there are many ways to organize your files, please
use the structure I provide so that all code will run without changes
if you copy and paste it directly into your own R scripts.
I use a parent folder called R, with
subfolders named cmd (short for command; store your code files here),
dat (store your data files here), and results (store
your plots and regression results here). The file structure should look
like this:
R/
├── cmd/ # code files
├── dat/ # data files
└── results/ # plots and regression outputs
We will use the data files provided by the author of the textbook
in Excel format, since most publicly
available data sets come in Excel or CSV format. Extract these files
from the zip file of data provided by the author and copy them to the
dat folder.
Installing and using R and RStudio
To start, you must download and install R and
RStudio and download the data supplied by the textbook
Applied Econometrics. All downloads are available at the
following links:
- First, install the newest version of R: Windows version, Mac version
- Then, install RStudio (free version): Windows and Mac
- Now, download the data files here and extract them to dat. Be sure that the files are not in any subfolders in dat.
RStudio layout
Please refer to the image below to identify the panels in
RStudio:
- Top left: This is where your script files are. You will write all your code, run the commands, and save your work here.
- Top right: You will usually use the Environment tab here, which shows all the objects you create (data files, regression results, values, etc.).
- Bottom left: You will usually use the Console tab, which shows progress, reports errors, and displays the results of the commands you run.
- Bottom right: Can be used for navigation, but we will mostly use the Plots tab to see our visualizations before saving them to file.
Starting your script
We will do (almost) all coding within script files. Open a new script
file by going to File → New File → R Script. Save the
new script file in the cmd folder. The remainder of the
tutorial gives you all the code to enter into your new
script.
A few useful tips when coding:
- Any text after a hashtag (#) will not be run as operational code. This is referred to as ‘commenting-out’ text.
- Add spaces between the commands along with commented-out descriptions of what you are doing.
- Write a header at the top of your script with basic details (project name, author, date, etc.).
- Add a command directly below your header that clears the environment (information in the top right panel). This ensures that everything will run correctly if you are running multiple scripts while R is open.
Below is an example script header. Copy/paste it into the top of your script file and change the details to fit your project.
###########################################################################
# Project: Introduction to R
# Author: [Your name]
# Date: [Today's date]
###########################################################################
Now, add the following line to clear the environment:
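A minimal version of that line, assuming the usual base R idiom is what is intended here:

```r
# Remove every object from the global environment (the top right panel)
rm(list = ls())
```

Note that this only removes objects. It does not unload packages or reset options, so restarting R is still the cleaner option when you want a completely fresh session.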
Installing and loading libraries
R is open-source software, and much of its functionality
comes from user-written packages. You will have to install any package you
have never used before. Since this is your first time using
R, we will install several packages that you will use
throughout the course.
Run the following code to install readxl (for reading
Excel files), along with other essential packages for importing,
organizing, and plotting data. Since you only need to run it once, I
would suggest running it in your console rather than keeping it in the
script file.
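A sketch of the install command, assuming the three packages described in this section are the ones needed and that you have an internet connection to download them from CRAN:

```r
# Run once in the console. Installed packages stay in your library afterwards.
install.packages(c("readxl", "tidyverse", "ggplot2"))

# Only needed if you work with the Stata (.dta) files instead of Excel:
# install.packages("haven")
```

Installing the tidyverse can take a few minutes because it pulls in many dependent packages.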
What each package does:
- readxl: Imports .xls and .xlsx Excel files into R.
- tidyverse: A collection of packages for data science, including:
  - dplyr for data manipulation,
  - readr for reading .csv and .tsv files,
  - tidyr for reshaping data,
  - tibble for modern data frames,
  - stringr for string operations,
  - forcats for working with factors.
- ggplot2: A powerful system for creating publication-quality graphics (it is also loaded as part of the tidyverse).
Note: if you choose to use the Stata files provided by the textbook,
you will also have to install and load the haven
package.
The packages are now permanently saved to your library. However, you will have to load them every time you start a new session. Place the code below after the line that clears the environment:
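A minimal sketch of the loading step, assuming readxl and the tidyverse have already been installed:

```r
# Load the packages used throughout the course
library(readxl)     # read_excel() for .xls and .xlsx files
library(tidyverse)  # attaches dplyr, readr, tidyr, tibble, stringr, forcats, ggplot2

# Only needed if you work with the Stata files:
# library(haven)
```

Because library(tidyverse) already attaches ggplot2, a separate library(ggplot2) call is not required, although adding one does no harm.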
Setting the working directory
We will now set the working directory. We do this so R
knows where to find our data and where to save results. We use the
setwd() command, and you must enter your own file path. This
is the location of the R folder you created in the
File organization section above. Make sure that the path is in quotation marks
and uses forward slashes / only. The example below shows a
generic file path that you can modify to fit the location of your
R folder:
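The path below is purely hypothetical. Replace it with the actual location of the R folder on your computer:

```r
# Hypothetical path: change it to wherever your R folder lives
setwd("C:/Users/yourname/Documents/R")

# Optional checks
getwd()       # prints the current working directory
list.files()  # should list cmd, dat, and results if the folder is set up as described
```

On a Mac the path would typically start with /Users/ rather than a drive letter. Once the working directory is set, relative paths such as dat/ and results/ resolve inside your R folder.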
Loading data and creating a plot
Now, we will load a sample data set and generate a basic plot to
show you some basic functions. When we load a new data set, we
create an object for it in the environment. For
simplicity, we will just call this object d, and we will
load the arch data set from the dat
folder:
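A sketch of the import, assuming the file is named arch.xlsx (the exact file name and extension are an assumption; check what is actually in your dat folder) and that the working directory is your R folder:

```r
# Requires the readxl package to be loaded.
# "dat/arch.xlsx" is an assumed file name; adjust it to match your file.
d <- read_excel("dat/arch.xlsx")
```

If the first column of the spreadsheet has no header, read_excel() assigns it a placeholder name and prints a message like the one below.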
## New names:
## • `` -> `...1`
You can now look at the data structure using the str()
command to get an overview of what the data set contains:
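Assuming the data were stored in the object d as described, the call is simply:

```r
# Show each column's name, type, and first few values
str(d)
```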
## tibble [2,610 × 7] (S3: tbl_df/tbl/data.frame)
## $ ...1 : POSIXct[1:2610], format: "1990-01-01" "1990-01-02" ...
## $ GULF : num [1:2610] 0 0 0 0 0 0 0 0 0 0 ...
## $ POSTWAR : num [1:2610] 0 0 0 0 0 0 0 0 0 0 ...
## $ R_FTSE : num [1:2610] 0.002442 0.000266 -0.007566 0.0073 -0.010836 ...
## $ R_STOCK1: num [1:2610] 0 0.00805 0.00398 0 0.00397 ...
## $ R_STOCK2: num [1:2610] 0 0.01733 0.04317 -0.01897 0.00475 ...
## $ R_STOCK3: num [1:2610] 0 0.00174 0.01216 -0.00694 0.00694 ...
We can see that when the data were loaded, the date column was given
an unclear name (...1). To make the data set easier to
understand, we’ll rename this column to Date. We do this
using the rename() function, and the %>%
pipe operator, which passes the data (d) into the next function. The
pipe is useful for chaining together multiple operations, which will
become especially helpful as we perform more complex data
transformations later in the course.
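A minimal sketch of the renaming step, assuming d is the tibble loaded earlier and the tidyverse is loaded. The backticks around the old name are a precaution, since a name made of dots and a digit is easy to mistype:

```r
# Pass d into rename() and overwrite d with the result.
# rename() uses the form new_name = old_name.
d <- d %>%
  rename(Date = `...1`)
```

Without the assignment back to d, the renamed data would only be printed to the console and the original object would stay unchanged.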
Now, we will look at the data again to make sure the change was
successful. This time we will use the head() command to
view the first few rows of the data set instead of the overall
structure.
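Assuming d now holds the renamed data, the call is:

```r
# Print the first six rows of the data set
head(d)
```

To see a different number of rows, pass a second argument, for example head(d, 10).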
## # A tibble: 6 × 7
## Date GULF POSTWAR R_FTSE R_STOCK1 R_STOCK2 R_STOCK3
## <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1990-01-01 00:00:00 0 0 0.00244 0 0 0
## 2 1990-01-02 00:00:00 0 0 0.000266 0.00805 0.0173 0.00174
## 3 1990-01-03 00:00:00 0 0 -0.00757 0.00398 0.0432 0.0122
## 4 1990-01-04 00:00:00 0 0 0.00730 0 -0.0190 -0.00694
## 5 1990-01-05 00:00:00 0 0 -0.0108 0.00397 0.00475 0.00694
## 6 1990-01-06 00:00:00 0 0 -0.00652 -0.00196 0 0
If you prefer, you can also use the command View(d)
(note the capital V) to open the entire data set in a new
window. This might be cumbersome if it is a very large data set, but it
is nice to keep this window open for a small data set so you can look at
the changes you make throughout the script.
Our data look good, so now we will make a basic plot to
visualize one of the variables. We will plot the daily returns of
Stock 1 over time using the ggplot() function from the ggplot2 package,
which produces publication-quality graphics and is relatively easy
to use. The comments in the code below explain what each part
of the function does in the plot.
ggplot(d, aes(x = Date, y = R_STOCK1)) +  # Specifies the data object d and the x and y variables
  geom_line(color = "steelblue") +        # Draws the line and sets its color
  labs(
    title = "Daily Returns of Stock 1",   # The title of the plot
    x = "Date",                           # The label for the x-axis
    y = "Return"                          # The label for the y-axis
  ) +
  theme_minimal()                         # Keeps the theme basic (minimal)
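Once the plot looks right in the Plots tab, it can be written to the results folder. The sketch below uses ggsave() from ggplot2. The file name and dimensions are hypothetical choices, and the relative path assumes the working directory is your R folder:

```r
# Save the most recently displayed plot to the results folder.
# "stock1_returns.png" and the size (in inches) are hypothetical choices.
ggsave("results/stock1_returns.png", width = 8, height = 5)
```

ggsave() picks the file type from the extension, so changing .png to .pdf would produce a PDF instead.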