An Introduction to R

Topics

R Basics

Lesson 1: Getting Started

Objectives

Understand:
- How objects are created and used
Be able to:
- View and clear the global environment
- Use R for simple arithmetic calculations

Objects

Everything in R is an object that can be referred to by name. We create objects by assigning values to them:

# This is a comment ignored by R
Instructor <- 'James Henderson'
x <- 10
y <- 32
z <- c(x,y) # This is how we form vectors

9 -> w # This works, but is bad style.
TheAnswer = 42

The values can be referred to by the object name:

TheAnswer

## [1] 42

Objects are stored by value and not reference:

z <- c(x,y)
c(x,y,z)

## [1] 10 32 10 32

y=TheAnswer
c(x,y,z)

## [1] 10 42 10 32
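
As a further illustration of value semantics, the short sketch below copies a vector and then changes the original. The names ‘a’ and ‘b’ are chosen here purely for illustration and are not used elsewhere in this workshop.

```r
# Copy a vector, then change the original.
a <- c(1, 2, 3)
b <- a        # b receives a copy of the values currently in a
a[1] <- 100   # modify a only
a             # first element is now 100
b             # still 1 2 3; the change to a did not propagate
```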

Arithmetic

R can do arithmetic with objects that contain numbers.

x + y

## [1] 52

z / x

## [1] 1.0 3.2

z^2

## [1]  100 1024

z + 2*c(y,x) - 10

## [1] 84 42

Be careful when mixing vectors of different lengths: R recycles the values of the shorter vector, and it warns only when the longer length is not a multiple of the shorter one:

x <- 4:6
y <- c(0,1)
x*y

## Warning in x * y: longer object length is not a multiple of shorter object
## length

## [1] 0 5 0

x <- 1:4
y*x

## [1] 0 2 0 4
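
One way to see what recycling does is to build the recycled vector explicitly with ‘rep_len()’ and compare. This is a small sketch using made-up vectors ‘long’ and ‘short’; the final line is one possible defensive check, not a required step.

```r
long  <- 1:6
short <- c(0, 1)

# What R effectively uses in place of 'short' when the lengths differ
recycled <- rep_len(short, length(long))
recycled

# The two products agree element by element
identical(long * short, long * recycled)

# A defensive check you might place before arithmetic in a script:
# stop unless the longer length is a multiple of the shorter one
stopifnot(length(long) %% length(short) == 0)
```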

There are a number of common mathematical functions already in R:

mean(x) # average

## [1] 2.5

sum(x)  # summation

## [1] 10

sd(x)   # Standard deviation

## [1] 1.290994

var(x)  # Variance

## [1] 1.666667

exp(x)  # Exponential

## [1]  2.718282  7.389056 20.085537 54.598150

sqrt(x) # Square root

## [1] 1.000000 1.414214 1.732051 2.000000

log(x)  # Natural log

## [1] 0.0000000 0.6931472 1.0986123 1.3862944

Global Environment

The values are stored in a workspace called the global environment. You can view objects in the global environment using the function ‘ls()’ and remove objects using ‘rm()’:

ls()

## [1] "Instructor" "TheAnswer"  "w"          "x"          "y"         
## [6] "z"

rm(w)
ls()

## [1] "Instructor" "TheAnswer"  "x"          "y"          "z"

We can remove multiple objects in a few ways:

remove(Instructor,TheAnswer) # remove and rm are synonyms
ls()

## [1] "x" "y" "z"

rm(list=c('x','y')) # Object names are passed to list as strings
ls()

## [1] "z"

To clear the entire workspace use ‘rm(list=ls())’:

ls()

## [1] "z"

rm(list=ls())
ls()

## character(0)

More on objects

Functions are also objects:

ViewGlobalEnv <- ls
ViewGlobalEnv()

## [1] "ViewGlobalEnv"

Elements of vectors can be given names:

z = c('x'=10,'y'=42)
names(z)

## [1] "x" "y"

names(z) <- c('Xval','Yval'); names(z)

## [1] "Xval" "Yval"

unname(z)

## [1] 10 42

Exercises

Determine whether object names are case sensitive in R.
Ask four people on which day of the month they were born:

Store each value as an object with the person’s name
Concatenate these objects into a vector ‘birthdays’.
Remove the original objects from the global environment.
Type “quit(‘yes’)” at the console and then re-open R Studio. What do you notice?
Remove ‘birthdays’ from the global environment, type “quit(‘no’)”, and re-open R Studio. What do you notice?

Lesson 2: Scripts

R is primarily a scripting language and should rarely be used directly from the console. R can and often should be used interactively, but nearly everything you type should be in a script.

Objectives

Understand:
- The role, importance and utility of scripts
- Standard elements of good coding style
Be able to:
- View and change the working directory
- Create, save, and edit scripts
- Use scripts interactively
- Set and find shortcut keys
- Run an entire script

R Scripts: what and why

Scripts are simply text files containing R commands. I say nearly everything you type should be in a script because:

- scripts provide a sharable record of the data manipulation and analysis steps needed to reproduce an analysis
- scripts allow you to incrementally build and modify analyses leading to better results
- used right, scripts will reduce tedious re-typing or looking through menus.

Best Practices

Here are some best practices for working with scripts:

Use comments! Comments explain what a particular chunk of code does helping you and others to understand your script.
Start each script with a standard header that includes a brief description of its purpose.
Give scripts a descriptive name.
Use multiple scripts for key pieces of complex multipart analyses. Use folders or R projects to keep related scripts together.
Use descriptive variable names rather than x, y, z. Additional typing now will save headaches later and is minimized by text completion in R Studio.
Scripts should be self-contained and always start with a clean workspace.

An example script

cat(readChar('./ExampleScript1.R',nchars=file.size('./ExampleScript1.R')),'\n')

## ## An example script for the Intro to R Workshop ##
## ## Author: James Henderson (jbheder@umich.edu)
## ## Created: June 9, 2017
## ## Modified: June 12, Add a final print command.
## 
## ## prepare your workspace ##
## rm(list=ls())
## 
## ## load any packages you need
## # library(dplyr) 
## 
## ## create some objects ##
## message <- 'Hello World!'
## 
## ## do something ##
## print(message)
## 
## ## save some results

Working Directory

When you ask R to read or write a file without a specified path it defaults to looking in the current working directory. Use ‘getwd()’ and ‘setwd()’ to view and change the current working directory:

getwd()

## [1] "/Users/jbhender/Workshops/Intro_to_R"

startDirectory = getwd()
setwd('/Users/jbhender/Workshops/')
getwd()

## [1] "/Users/jbhender/Workshops"

setwd(startDirectory)
getwd()

## [1] "/Users/jbhender/Workshops/Intro_to_R"

To list the contents in a directory use ‘dir()’:

dir()

## [1] "attitude.csv"            "ExampleScript1.R"       
## [3] "Intro_2_R.html"          "Intro_2_R.Rmd"          
## [5] "message.RData"           "mtcars_displacement.pdf"

dir('./')

## [1] "attitude.csv"            "ExampleScript1.R"       
## [3] "Intro_2_R.html"          "Intro_2_R.Rmd"          
## [5] "message.RData"           "mtcars_displacement.pdf"

When working with scripts it is best to make the working directory the highest level folder for a project and use relative paths to point to subfolders.

dir('./')

## [1] "attitude.csv"            "ExampleScript1.R"       
## [3] "Intro_2_R.html"          "Intro_2_R.Rmd"          
## [5] "message.RData"           "mtcars_displacement.pdf"
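
Relative paths can be assembled with ‘file.path()’, which joins folder and file names with a forward slash that R accepts on every platform. The subfolder names ‘data’ and ‘raw’ below are hypothetical and only illustrate the idea.

```r
# Build a path relative to the working directory (folder names are made up)
dataFile <- file.path('.', 'data', 'raw', 'attitude.csv')
dataFile

# file.exists() reports whether the path points at something on disk;
# check it before trying to read the file
if(file.exists(dataFile)){
  attitude_data <- read.csv(dataFile, stringsAsFactors = FALSE)
} else {
  message('No file found at ', dataFile)
}
```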

Running a script

When building an analysis you will often work with scripts interactively, calling each line in turn. At times you will want to run an entire script, which can be done using the ‘source()’ command:

source('./ExampleScript1.R')

## [1] "Hello World!"

Exercises

Change the working directory in R Studio to a folder ‘RWorkshop’ on your desktop.
Create a new script and save it as “Day1_RBasics.R” in the RWorkshop folder.
Add a header with your name, the date, and a short description of the script.
Type ‘print(“Hello world!”)’ in your script and then use ‘source’ to call it from the console.

Lesson 3: Read and Write Data

Objectives:

Understand:
- File types used for data storage
Be able to:
- Read data from and write data to common flat formats: tsv, csv
- Save and load RData objects

Getting data into R

Data can be read into R from common flat file formats such as comma- or tab-separated text files. The best starting place is ‘read.table()’ or ‘read.csv()’.

attitude_data <- read.csv('./attitude.csv',sep=',',
                            stringsAsFactors = FALSE)

To write to csv use ‘write.csv()’.

write.csv(attitude,file='./attitude.csv',
          row.names=FALSE)
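
A quick way to check a read/write pair is a round trip: write a data frame, read it back, and compare. The sketch below uses the ‘attitude’ data set that ships with R and a temporary file, so it does not touch your working directory; the name ‘attitude_copy’ is an assumption made for the example.

```r
csvPath <- tempfile(fileext = '.csv')

# Write the built-in attitude data without row names
write.csv(attitude, file = csvPath, row.names = FALSE)

# Read it back; header = TRUE is the default for read.csv
attitude_copy <- read.csv(csvPath, stringsAsFactors = FALSE)

# Same shape and same column names as the original
dim(attitude_copy)
identical(names(attitude_copy), names(attitude))
```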

Saving and loading R objects

To save data or other objects in the native .RData format use ‘save()’.

message='Hello world!'
save(message,attitude_data,file='./message.RData')

To read such data into R use ‘load()’.

rm(list=ls()) ## clearing workspace
foo <- load('./message.RData')
foo

## [1] "message"       "attitude_data"

ls()

## [1] "attitude_data" "foo"           "message"

message

## [1] "Hello world!"

Reading data from other formats

When possible, it is best to transfer data from other programs into R using the software associated with its native format to first export to a flat file.

Exercises:

Download the cars data and move it to your workshop folder.
Read it into R using ‘read.csv’.
Write a copy of it with ‘write.csv’, using the file name ‘mtcars_copy.csv’. Open and inspect your copy in a spreadsheet program.
Save cars as an ‘.RData’ file.
Clear your workspace and reload cars from the ‘.RData’ file.

Lesson 4: Classes

Objectives

Understand:
- Commonly used classes in R
- How classes impact the way R treats objects
Be able to:
- Create new objects of various classes
- Determine the class of an object
- Convert between common classes
- Access specific elements within objects

Exercises

Use your favorite search engine to find today’s high and low temperatures for three cities of your choice.
Create vectors for: the city names, the low temperatures, the high temperatures.
Use the city names as names for the low and high vectors.
Create a data frame for this information.
Subset the data frame to return cities with more than a 15 degree difference between the high and low temperature.
Store the low and high temperatures in a matrix. Set the row and column names to be descriptive.
Subset the matrix as before.

R is Classy

Named objects in R are associated with one or more classes that tell us how to understand the information they contain. To see the class(es) associated with an object use ‘class()’. Below are some common single-value classes (aka types):

str <- 'This is a string'
class(str)

## [1] "character"

number <- 4.5
class(number)

## [1] "numeric"

int <- 42
class(int)

## [1] "numeric"

int <- as.integer(42)
class(int)

## [1] "integer"

When we don’t specify the class of an object, R is programmed to supply a default type. There are special functions for declaring and converting between classes:

str <- '42'
str

## [1] "42"

class(str)

## [1] "character"

num <- as.numeric(str)
str # str itself is unchanged by the conversion

## [1] "42"

class(str)

## [1] "character"

int <- as.integer(num)
class(int)

## [1] "integer"

is.integer(num)

## [1] FALSE

is.integer(int)

## [1] TRUE

class(is.integer(int))

## [1] "logical"
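
Conversions can lose information or fail, so it is worth checking the result. The following sketch shows three small cases; the input values are arbitrary and the name ‘bad’ is made up.

```r
# as.integer() drops the fractional part rather than rounding
as.integer(3.9)

# Text that does not look like a number becomes NA (R also issues a warning,
# silenced here so the example runs cleanly)
bad <- suppressWarnings(as.numeric('forty-two'))
is.na(bad)

# A literal with an L suffix is stored as an integer from the start
class(42L)
```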

Multiple values of a single type

Multiple values of a single type can be stored in vectors, matrices, or arrays.

Vectors

Vectors are one dimensional and have a specific ‘length’:

PetNames <- c('Nahla','Oliver')
length(PetNames)

## [1] 2

PetNames <- c(PetNames,'Trixie')
length(PetNames)

## [1] 3

If you try to combine multiple types, R will attempt to convert to a single type:

BirthDays <- c(10,27,29)
c(PetNames,BirthDays)

## [1] "Nahla"  "Oliver" "Trixie" "10"     "27"     "29"
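
The conversion follows a fixed order: logical values become numbers, and numbers become character strings, whichever is needed to hold everything. A small sketch with arbitrary values (‘mixed1’ and ‘mixed2’ are illustrative names):

```r
mixed1 <- c(TRUE, FALSE, 2)   # logicals are converted to 1 and 0
mixed1
class(mixed1)

mixed2 <- c(1.5, 'a', TRUE)   # everything is converted to character
mixed2
class(mixed2)
```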

You can add a names attribute to vectors:

names(BirthDays) <- PetNames
names(BirthDays)

## [1] "Nahla"  "Oliver" "Trixie"

BirthDays <- c(Nahla=10,Oliver=27,Trixie=29)

Use ‘[]’ to access specific elements by name or position:

BirthDays[3]

## Trixie 
##     29

BirthDays[c(1,2)]

##  Nahla Oliver 
##     10     27

BirthDays[-1] ## Negative indexing

## Oliver Trixie 
##     27     29

BirthDays['Oliver']

## Oliver 
##     27
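
Elements can also be selected with a logical vector of the same length, which is how conditions are usually applied. The sketch below re-creates ‘BirthDays’ so that it runs on its own; the name ‘late’ is made up.

```r
BirthDays <- c(Nahla = 10, Oliver = 27, Trixie = 29)

late <- BirthDays > 20     # one TRUE/FALSE per element
late
BirthDays[late]            # keeps only elements where late is TRUE
names(BirthDays)[late]     # the matching names
```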

Matrices

Matrices are two-dimensional vectors organized into rows and columns. They always contain values of a single type.

Matrices are stored using ‘column-major ordering’ meaning that by default they are filled and operated on by column.

X <- matrix(1:10,nrow=5,ncol=2)
Y <- matrix(1:10,nrow=5,ncol=2,byrow = TRUE)
X

##      [,1] [,2]
## [1,]    1    6
## [2,]    2    7
## [3,]    3    8
## [4,]    4    9
## [5,]    5   10

Y

##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
## [3,]    5    6
## [4,]    7    8
## [5,]    9   10

class(X)

## [1] "matrix"

R can do matrix multiplication and many other linear algebra computations.

X %*% t(Y)

##      [,1] [,2] [,3] [,4] [,5]
## [1,]   13   27   41   55   69
## [2,]   16   34   52   70   88
## [3,]   19   41   63   85  107
## [4,]   22   48   74  100  126
## [5,]   25   55   85  115  145

3*X

##      [,1] [,2]
## [1,]    3   18
## [2,]    6   21
## [3,]    9   24
## [4,]   12   27
## [5,]   15   30

c(1,2)*Y

##      [,1] [,2]
## [1,]    1    4
## [2,]    6    4
## [3,]    5   12
## [4,]   14    8
## [5,]    9   20

Matrices have both dimension and length.

dim(X)

## [1] 5 2

length(X)

## [1] 10

as.vector(X)

##  [1]  1  2  3  4  5  6  7  8  9 10

c(nrow(X),ncol(X))

## [1] 5 2

colnames(X) <- paste('Col',1:2,sep='')
rownames(X) <- letters[1:5]
X["a",]

## Col1 Col2 
##    1    6

X[1:3,'Col2']

## a b c 
## 6 7 8
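
Selecting a single row or column returns a plain vector rather than a matrix. If you need the result to stay a matrix, add ‘drop = FALSE’. A brief sketch, rebuilding ‘X’ with its row and column names so that it is self-contained:

```r
X <- matrix(1:10, nrow = 5, ncol = 2,
            dimnames = list(letters[1:5], c('Col1', 'Col2')))

dim(X['a', ])                 # NULL: the result is a named vector
dim(X['a', , drop = FALSE])   # 1 2: still a matrix with one row
```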

Arrays

See ‘help(array)’.

Multiple types

Lists

In R a list is a generic container for storing values of multiple types.

myList <- list(Name='An example list',
               Matrix=diag(5),
               n=5
               )
myList

## $Name
## [1] "An example list"
## 
## $Matrix
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    0    0    0    0
## [2,]    0    1    0    0    0
## [3,]    0    0    1    0    0
## [4,]    0    0    0    1    0
## [5,]    0    0    0    0    1
## 
## $n
## [1] 5

class(myList)

## [1] "list"

length(myList)

## [1] 3

names(myList)

## [1] "Name"   "Matrix" "n"

You can access a specific element in a list by position or name:

myList[['Name']]

## [1] "An example list"

myList$Matrix

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    0    0    0    0
## [2,]    0    1    0    0    0
## [3,]    0    0    1    0    0
## [4,]    0    0    0    1    0
## [5,]    0    0    0    0    1

Note the use of double brackets (‘[['n']]’) and compare to the single bracket case below.

class(myList['n'])

## [1] "list"

class(myList[['n']])

## [1] "numeric"

Data Frames

Data frames are perhaps the most common way to represent a data set in R. A data frame is like a matrix with observations or units in rows and variables in columns. It doesn’t require the columns to all be of the same type.

df <- data.frame(ID=1:10,
                 Group=
                   sample(0:1,10,replace=TRUE),
                 Var1=rnorm(10),
                 Var2=seq(0,1,length.out=10),
                 Var3=factor(
                   rep(c('a','b'),each=5)
                   )
                )
names(df)

## [1] "ID"    "Group" "Var1"  "Var2"  "Var3"

dim(df)

## [1] 10  5

length(df)

## [1] 5

nrow(df)

## [1] 10

We can access the values of a data frame both like a list:

df$ID

##  [1]  1  2  3  4  5  6  7  8  9 10

df[['Var3']]

##  [1] a a a a a b b b b b
## Levels: a b

or like a matrix

df[1:5,]

##   ID Group       Var1      Var2 Var3
## 1  1     1 -1.8598392 0.0000000    a
## 2  2     1 -1.2936927 0.1111111    a
## 3  3     0 -0.4659862 0.2222222    a
## 4  4     1 -1.6222289 0.3333333    a
## 5  5     1  0.2157492 0.4444444    a

df[,'Var2']

##  [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
##  [8] 0.7777778 0.8888889 1.0000000

Logicals & Indexing

Logicals

R has three reserved words of class ‘logical’:

class(TRUE)

## [1] "logical"

class(FALSE)

## [1] "logical"

class(NA)

## [1] "logical"

if(TRUE & T){
  print('Synonyms')
}

## [1] "Synonyms"

if(FALSE | F){
  print('Synonyms')
}

While ‘T’ and ‘F’ are equivalent to ‘TRUE’ and ‘FALSE’ it is best to always use the full words. You should also avoid using ‘T’ or ‘F’ as objects or arguments in functions.
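
The reason to avoid the abbreviations is that ‘T’ and ‘F’ are ordinary variables that can be overwritten, whereas ‘TRUE’ and ‘FALSE’ are reserved words. A sketch, wrapped in ‘local()’ so that the reassignment does not leak into your workspace; the name ‘overwritten’ is made up.

```r
overwritten <- local({
  T <- 0        # allowed: T is just a variable name
  isTRUE(T)     # FALSE, because T now holds 0 inside this block
})
overwritten
isTRUE(T)       # outside the block T is untouched
```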

Boolean comparisons

Logicals are created by Boolean comparisons:

{2*3} == 6     # test equality with ==

## [1] TRUE

{2+2} != 5     # use != for 'not equal'

## [1] TRUE

sqrt(69) > 8   # comparison operators: >, >=, <, <=

## [1] TRUE

sqrt(64) >= 8

## [1] TRUE

!{2==3}        # Use ! ('not') to negate or 'flip' a logical

## [1] TRUE

Comparison operators are vectorized:

1:10 > 5

##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

You can combine comparisons using ‘and (&)’ or ‘or (|)’:

{2+2}==4 | {2+2}==5 # An or statement asks if either statement is true

## [1] TRUE

{2+2}==4 & {2+2}==5 # And requires both to be true

## [1] FALSE

if statements

if(TRUE){
  print('do something if true')
}

## [1] "do something if true"

if({2+2}==5){
  print('the statement is true')
} else{
  print('the statement is false')
}

## [1] "the statement is false"

result <- c(4,5)
report = ifelse({2+2}==result,'true','false')
report

## [1] "true"  "false"

Using which

The ‘which()’ function returns the positions (indices) of the elements of a logical vector that are TRUE:

which({1:5}^2 > 10)

## [1] 4 5
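
Because ‘which()’ works with positions, it skips missing values, whereas indexing with the logical vector directly carries them along. A sketch with an arbitrary vector ‘vals’:

```r
vals <- c(5, NA, 12, 20)

vals > 10                 # FALSE NA TRUE TRUE
which(vals > 10)          # positions 3 and 4
vals[vals > 10]           # NA 12 20: the NA comparison yields an NA element
vals[which(vals > 10)]    # 12 20
```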

A combination of which and logicals can be used to subset data frames:

head(mtcars)

##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

mtcars[which(mtcars$mpg>30),]

##                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Fiat 128       32.4   4 78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4 75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla 33.9   4 71.1  65 4.22 1.835 19.90  1  1    4    1
## Lotus Europa   30.4   4 95.1 113 3.77 1.513 16.90  1  1    5    2

You can use ‘with()’ to refer to variables/columns by name:

ind <-with(mtcars, which(mpg > 20 & cyl >=6))
ind

## [1] 1 2 4

mtcars[ind,c('mpg','cyl')]

##                 mpg cyl
## Mazda RX4      21.0   6
## Mazda RX4 Wag  21.0   6
## Hornet 4 Drive 21.4   6

mtcars[which(mtcars[,'am']!=0),]

##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Ford Pantera L 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
## Ferrari Dino   19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
## Maserati Bora  15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
## Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

The ‘with()’ construction will not work with matrices.

rm(ind) # removing ind 
carsMat <- as.matrix(mtcars)
ind <-with(carsMat, which(mpg > 20 & cyl >=6))

## Error in eval(substitute(expr), data, enclos = parent.frame()): numeric 'envir' arg not of length one

carsMat[ind,c('mpg','cyl')]

## Error in eval(expr, envir, enclos): object 'ind' not found

Instead use explicit indexing by name or position.

X <- matrix(rnorm(100),25,4)
ind <- which({X[,1]>0 | X[,2]>0} & {X[,3]<0 | X[,4]<0})
1*{X[ind,] > 0} # convert logicals to numeric

##       [,1] [,2] [,3] [,4]
##  [1,]    0    1    0    1
##  [2,]    1    0    0    0
##  [3,]    1    0    0    0
##  [4,]    1    1    1    0
##  [5,]    1    0    0    0
##  [6,]    1    1    0    0
##  [7,]    1    1    0    0
##  [8,]    0    1    0    1
##  [9,]    1    1    1    0
## [10,]    0    1    0    0
## [11,]    1    1    0    1
## [12,]    1    0    1    0
## [13,]    1    0    1    0
## [14,]    0    1    1    0
## [15,]    1    1    1    0
## [16,]    0    1    0    0
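
For the ‘carsMat’ example, the same selection that failed under ‘with()’ can be written with explicit column names. This sketch rebuilds the matrix so that it runs on its own.

```r
carsMat <- as.matrix(mtcars)

# Index columns by name inside the condition
ind <- which(carsMat[, 'mpg'] > 20 & carsMat[, 'cyl'] >= 6)
carsMat[ind, c('mpg', 'cyl')]
```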

Other classes

There are many other classes of objects in R and many packages define special classes. Here are a few other common classes:

class(mean)

## [1] "function"

class(.GlobalEnv)

## [1] "environment"

class(Y~X1+X2)

## [1] "formula"

Lesson 5: Functions

Objectives

Understand:
- How R knows to interpret something as a function
- How arguments passed to functions are interpreted
Be able to:
- Use ‘help()’ to access R documentation for a function
- Write and call a function.

Functions

As we saw earlier, R identifies functions by the ‘func()’ construction. Functions are simply collections of commands that do something. Functions take arguments which can be used to specify which objects to operate on and what values of parameters are used. You can use ‘help(func)’ to see what a function is used for and what arguments it expects, for example:

help(round)

Functions will often have multiple arguments. Some arguments have default values, others do not. All arguments without default values must be passed to a function. Arguments can be passed by name or position. For instance,

x <- runif(n=5,min=0,max=1)
y <- runif(5,0,1)
z <- runif(5)
round(cbind(x,y,z),1)

##        x   y   z
## [1,] 0.7 0.2 1.0
## [2,] 0.2 0.6 0.1
## [3,] 0.2 0.8 0.2
## [4,] 0.2 0.8 0.5
## [5,] 0.0 0.7 0.2

All three calls generate 5 numbers from a Uniform(0,1) distribution.

Arguments passed by name need not be in order:

w <- runif(min=0,max=1,n=5)
u <- runif(min=0,max=1,5) # This also works but is bad style. 
round(rbind(u=u,w=w),1)

##   [,1] [,2] [,3] [,4] [,5]
## u  0.9  0.3  0.2  0.2  0.8
## w  0.2  0.9  0.3  0.6  0.2

Writing Functions

You can create your own functions in R. Use functions for tasks that you repeat often in order to make your scripts more easily readable and modifiable.

# function to compute z-scores
zScore1 <- function(x){
  xbar <- mean(x)
  s <- sd(x)
  z <- (x-xbar)/s
  return(z)  
}

The return statement is not strictly necessary, but can make complex functions more readable. It is good practice to avoid creating intermediate objects to store values only used once.

# function to compute z-scores
zScore2 <- function(x){
  {x-mean(x)}/sd(x)
}

x <- rnorm(10,3,1) ## generate some normally distributed values
round(cbind(x,'Z1'=zScore1(x),'Z2'=zScore2(x)),1)

##         x   Z1   Z2
##  [1,] 3.6  1.2  1.2
##  [2,] 3.5  1.1  1.1
##  [3,] 2.3 -1.0 -1.0
##  [4,] 3.5  1.2  1.2
##  [5,] 3.1  0.5  0.5
##  [6,] 1.9 -1.6 -1.6
##  [7,] 2.3 -0.8 -0.8
##  [8,] 2.5 -0.6 -0.6
##  [9,] 2.9  0.1  0.1
## [10,] 2.8  0.0  0.0

We can set default values for parameters using the construction ‘parameter = xx’ in the function definition.

# function to compute z-scores
zScore3 <- function(x,na.rm=TRUE){
  {x-mean(x,na.rm=na.rm)}/sd(x,na.rm=na.rm)
}

x <- c(NA,x,NA)
round(cbind(x,'Z1'=zScore1(x),'Z2'=zScore2(x),'Z3'=zScore3(x)),1)

##         x Z1 Z2   Z3
##  [1,]  NA NA NA   NA
##  [2,] 3.6 NA NA  1.2
##  [3,] 3.5 NA NA  1.1
##  [4,] 2.3 NA NA -1.0
##  [5,] 3.5 NA NA  1.2
##  [6,] 3.1 NA NA  0.5
##  [7,] 1.9 NA NA -1.6
##  [8,] 2.3 NA NA -0.8
##  [9,] 2.5 NA NA -0.6
## [10,] 2.9 NA NA  0.1
## [11,] 2.8 NA NA  0.0
## [12,]  NA NA NA   NA
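
A default can be overridden at call time by naming the argument. The sketch below repeats the definition (with ‘TRUE’ written in full) so that it runs on its own, and uses a small made-up vector ‘v’.

```r
zScore3 <- function(x, na.rm = TRUE){
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

v <- c(1, 2, 3, NA)
zScore3(v)                  # uses the default: -1 0 1 NA
zScore3(v, na.rm = FALSE)   # overrides it: every value is NA
```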

Exercises

Access and skim the help pages for ‘median()’, ‘mad()’, and ‘IQR()’.
Write a function ‘zScoreRobust’ that accepts a numeric vector and returns robust z-scores.
Make the function you wrote robust to vectors containing “NA” values.
Generate some data from N(4,2) to test your functions.

Packages

Objectives:

Understand:
- Basics of the R package system
- What it means for a function to be ‘masked’
Be able to:
- Install packages
- Make a package available to R
- Call functions from packages without loading
- Remove packages

The R package system

Much of the utility of R is derived from an extensive collection of user and domain-expert contributed packages. Packages are simply a standardized way for people to share documented code and data. There are thousands of packages!

Packages are primarily distributed through three sources:
- CRAN
- Bioconductor
- GitHub

Installing packages

The primary way to install a package is using ‘install.packages(“pkg”)’.

#install.packages('lme4') # the package name should be passed as a character string

You can find the default location for your R packages using the “.libPaths()” function. If you don’t have write permission to this folder, you can set this directory to a personal library instead.

.libPaths() ## The default library location

## [1] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library"

.libPaths('/Users/jbhender/Rlib') #Create the directory first!
.libPaths()

## [1] "/Users/jbhender/Rlib"                                          
## [2] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library"

To install a package to a personal library use the ‘lib’ option.

## install.packages("haven",lib='/Users/jbhender/Rlib')

If your computer has the necessary tools, packages can also be installed from a downloaded package file by pointing directly to the source tarball (‘.tar.gz’) or to a binary build (‘.zip’ on Windows, ‘.tgz’ on macOS).
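
Before installing, you can check whether a package is already available with ‘requireNamespace()’, which returns TRUE or FALSE without attaching the package to the search path. The helper name ‘isInstalled’ is hypothetical, and the install line is left commented out, as elsewhere in this lesson.

```r
# TRUE if the package can be loaded, FALSE otherwise
isInstalled <- function(pkg){
  requireNamespace(pkg, quietly = TRUE)
}

isInstalled('stats')    # ships with R, so this is TRUE

if(!isInstalled('lme4')){
  message('lme4 is not installed yet')
  # install.packages('lme4')
}
```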

Using packages in R

Installing a package does not make it available to R! There are two ways to use things from a package:
- calling ‘library(“pkg”)’ to add it to the search path
- using the “pkg::function” construction.

These methods are illustrated below using the data set ‘InstEval’ distributed with the ‘lme4’ package.

#head(InstEval)
## Using the pkg::function construction
head(lme4::InstEval)

##   s    d studage lectage service dept y
## 1 1 1002       2       2       0    2 5
## 2 1 1050       2       1       1    6 2
## 3 1 1582       2       2       0    2 5
## 4 1 2050       2       2       1    3 3
## 5 2  115       2       1       0    5 2
## 6 2  756       2       1       0    5 4

The ‘library(“pkg”)’ command adds a package to the search path.

search()

## [1] ".GlobalEnv"        "package:stats"     "package:graphics" 
## [4] "package:grDevices" "package:utils"     "package:datasets" 
## [7] "package:methods"   "Autoloads"         "package:base"

library(lme4)

## Loading required package: Matrix

search()

##  [1] ".GlobalEnv"        "package:lme4"      "package:Matrix"   
##  [4] "package:stats"     "package:graphics"  "package:grDevices"
##  [7] "package:utils"     "package:datasets"  "package:methods"  
## [10] "Autoloads"         "package:base"

head(InstEval)

##   s    d studage lectage service dept y
## 1 1 1002       2       2       0    2 5
## 2 1 1050       2       1       1    6 2
## 3 1 1582       2       2       0    2 5
## 4 1 2050       2       2       1    3 3
## 5 2  115       2       1       0    5 2
## 6 2  756       2       1       0    5 4

To remove a package from the search path use ‘detach(“package:pkg”,unload=TRUE)’.

detach(package:lme4,unload=TRUE)
search()

##  [1] ".GlobalEnv"        "package:Matrix"    "package:stats"    
##  [4] "package:graphics"  "package:grDevices" "package:utils"    
##  [7] "package:datasets"  "package:methods"   "Autoloads"        
## [10] "package:base"
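
The objectives for this lesson mention masking. When two places on the search path (or your own workspace) define the same name, the one found first is used and the other is said to be masked; the ‘pkg::function’ form still reaches the masked version. A sketch in which a made-up function named ‘sd’ masks the one in ‘stats’, wrapped in ‘local()’ so that it does not affect your workspace:

```r
maskDemo <- local({
  sd <- function(x) 'my own sd'       # masks stats::sd inside this block
  list(plain     = sd(1:4),           # finds the local definition first
       qualified = stats::sd(1:4))    # explicitly asks for the stats version
})
maskDemo$plain
maskDemo$qualified
```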

Vignettes

As part of their documentation, many packages come with a “vignette” which serves as a short tour of a package’s purpose and functionality.

Exercises

Detach the ‘datasets’ package using ‘search()’ to check your success.
Reload ‘datasets’ using ‘library()’ and again check with ‘search()’.
Install the following packages:
- haven (For reading data from other sources)
- lme4 (For linear mixed models)
- ggplot2 (The “grammar of graphics”)
- dplyr (For data manipulation)
- tidyr (Utility functions for working with R objects)

Graphics

Objectives:

Understand:
- The role of plotting ‘devices’ and how R handles images
Be able to:
- Create standard statistical graphics using base R
- Save graphical output to a file

R has standard functions for creating many statistical graphics.

Scatterplots

plot(mtcars$hp~mtcars$disp)

There are many aesthetic options you can control; see ‘par()’ for a full list.

with(mtcars,
     plot(hp~disp,pch=15,main='Horsepower in mtcars',xlab='displacement',ylab='horsepower',las=1,col='grey')
     )

Use vectors to set values for specific points.

col <- rep('blue',nrow(mtcars))
col[which(mtcars$cyl==6)] <- 'grey'
col[which(mtcars$cyl==8)] <- 'red'

pch <- rep(16,nrow(mtcars))
pch[which(mtcars$am==1)] <- 17

with(mtcars,
     plot(hp~disp,pch=pch,col=col,main='Horsepower in mtcars',xlab='displacement',ylab='horsepower',las=1)
     )
legend('topleft',legend=c('Automatic','Manual'),pch=16:17,col='black',bty='n')
legend('bottomright',legend=paste(c(4,6,8),'cylinders'),col=c('blue','grey','red'),pch=15)

Other standard plots

hist(mtcars[,'hp'],col='lightblue',las=1,xlab='horsepower',main='Histogram of horsepower')

boxplot(mtcars[,'hp']~mtcars[,'cyl'],las=1,xlab='# of cylinders',ylab='horsepower',col=rgb(0,0,1,.5))

qqnorm(mtcars[,'hp'])
qqline(mtcars[,'hp'])

Writing plots to file

By default, plots are sent to the R Studio graphics window. However, you can write graphics directly to a file using ‘pdf()’, ‘jpeg()’, ‘png()’, etc.

pdf('./mtcars_displacement.pdf') #opens the file
  hist(mtcars$disp)
dev.off() ## closes the file

## quartz_off_screen 
##                 2

dir('./')

## [1] "attitude.csv"            "ExampleScript1.R"       
## [3] "Intro_2_R_files"         "Intro_2_R.html"         
## [5] "Intro_2_R.Rmd"           "message.RData"          
## [7] "mtcars_displacement.pdf"

I recommend using pdf as default as it is a vector based format.
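
The same open, draw, close pattern applies to any file device. The sketch below writes to a temporary file and sets the page size; ‘width’ and ‘height’ are arguments of ‘pdf()’ measured in inches, and the particular values and the name ‘plotPath’ are arbitrary.

```r
plotPath <- tempfile(fileext = '.pdf')

pdf(plotPath, width = 6, height = 4)   # open the file device
hist(mtcars$wt, col = 'lightblue', las = 1,
     xlab = 'weight', main = 'Histogram of weight')
invisible(dev.off())                   # close it so the file is written

file.exists(plotPath)                  # confirm the file was created
```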

Exercises:

Using the mtcars data:

create a histogram of the ‘wt’ variable
create a qqplot of the ‘mpg’ variable
create side-by-side boxplots of mpg grouped by ‘cyl’

Still using mtcars, create a scatter plot of mpg vs wt.

Use color and plotting symbol to also include information about cyl and gear in the plot
Create descriptive names for the axes and title
Add a legend explaining your plot symbols

Write your scatter plot to pdf.

Additional Topics

Control Statements

for loops

Here is the syntax for a basic for loop in R:

for(i in 1:10){
   cat(i,'\n')
}

## 1 
## 2 
## 3 
## 4 
## 5 
## 6 
## 7 
## 8 
## 9 
## 10

for(var in names(mtcars)){
  cat(sprintf('average %s = %4.3f',var,mean(mtcars[,var])),'\n')
}

## average mpg = 20.091 
## average cyl = 6.188 
## average disp = 230.722 
## average hp = 146.688 
## average drat = 3.597 
## average wt = 3.217 
## average qsec = 17.849 
## average vs = 0.438 
## average am = 0.406 
## average gear = 3.688 
## average carb = 2.812
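
Loops are often used to fill in a vector of results. A common pattern, sketched below with made-up names ‘vars’ and ‘ranges’, is to create the result vector at its final length first and loop over ‘seq_along()’, which also behaves sensibly when the input is empty.

```r
vars <- c('mpg', 'hp', 'wt')

# Create the result vector up front, one slot per variable
ranges <- numeric(length(vars))
names(ranges) <- vars

# Fill each slot with the spread (max minus min) of one column
for(i in seq_along(vars)){
  ranges[i] <- max(mtcars[, vars[i]]) - min(mtcars[, vars[i]])
}
ranges
```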

while

A while statement can be useful when you aren’t sure how many iterations are needed. Here is an example that takes a random walk and terminates once the value is at least 10 units from 0.

maxIter <- 1e3 # always limit the total iterations allowed
val=vector(mode='numeric',length=maxIter)
val[1]=rnorm(1) ## initialize
k=1
while(abs(val[k]) < 10 && k <= maxIter){
  val[k+1] = val[k] + rnorm(1)
  k = k + 1
}
val = val[1:{k-1}]
plot(val)

switch

Use a switch when you have two or more discrete options.

mySummary <- function(x){
  switch(class(x),
         factor=table(x),
         numeric=sprintf('mean=%4.2f,sd=%4.2f',mean(x),sd(x)),
          'Only defined for factor and numeric classes.')
}
for(var in names(iris)){
  cat(var,':\n',sep='')
  print(mySummary(iris[,var]))
}

## Sepal.Length:
## [1] "mean=5.84,sd=0.83"
## Sepal.Width:
## [1] "mean=3.06,sd=0.44"
## Petal.Length:
## [1] "mean=3.76,sd=1.77"
## Petal.Width:
## [1] "mean=1.20,sd=0.76"
## Species:
## x
##     setosa versicolor  virginica 
##         50         50         50
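
‘switch()’ is also handy for choosing among options named by a character argument. In the sketch below the function ‘centerOf’ and its ‘type’ argument are invented for illustration; the final unnamed argument is the fallback used when nothing matches.

```r
centerOf <- function(x, type = 'mean'){
  switch(type,
         mean   = mean(x),
         median = median(x),
         stop('type must be "mean" or "median"'))
}

vals <- c(1, 2, 10)
centerOf(vals)               # the mean of vals
centerOf(vals, 'median')     # the median of vals
```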

Exercises

The Fibonacci sequence starts 1, 1, 2, … and continues with each new value formed by adding the two previous values.

Write a function ‘Fib1’ which takes an argument ‘n’ and returns the \(n^{th}\) value of the Fibonacci sequence. Use a for loop in the function.
Write a function ‘Fib2’ which does the same thing using a while loop.
Use a switch to write a function that has a parameter ‘loop=c('for','while')’ for calling either ‘Fib1’ or ‘Fib2’.

Apply

Loops in R can be quite slow compared to other programming languages on account of the overhead of many of the conveniences that make it useful for routine data analysis. Often, explicit loops can be avoided by using an ‘apply’ function.

Here is an example:

X = matrix(rep(1:5,each=5),5,5)
apply(X,1,sum)

## [1] 15 15 15 15 15

apply(X,2,sum)

## [1]  5 10 15 20 25
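
For sums and means specifically, base R provides ‘rowSums()’, ‘colSums()’, ‘rowMeans()’ and ‘colMeans()’, which give the same answers as the corresponding ‘apply()’ calls. The sketch below also shows that extra arguments placed after the function are passed along to it.

```r
X <- matrix(rep(1:5, each = 5), 5, 5)

# Dedicated functions agree with apply()
all(rowSums(X) == apply(X, 1, sum))
all(colMeans(X) == apply(X, 2, mean))

# Arguments after the function name are handed to that function
X[1, 1] <- NA
apply(X, 2, sum, na.rm = TRUE)
```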

For lists use ‘lapply()’ or ‘sapply()’.

myList=list(x=1:5,y=-5:-1)
lapply(myList,sum)

## $x
## [1] 15
## 
## $y
## [1] -15

sapply(myList,sum)

##   x   y 
##  15 -15

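The values in a data.frame are represented internally as a list, so use ‘lapply()’ or ‘sapply()’ with data frames. In contrast, ‘apply()’ first converts the data frame to a matrix, which forces every column to a single type.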

head(iris)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

sapply(iris,class)

## Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
##    "numeric"    "numeric"    "numeric"    "numeric"     "factor"

apply(iris,2,class)

## Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species 
##  "character"  "character"  "character"  "character"  "character"

A very powerful construction for data manipulation is the use of the apply functions with an anonymous (unnamed) function.

sapply(mtcars,function(x){
  nVals = length(unique(x))
  return(nVals)
})

##  mpg  cyl disp   hp drat   wt qsec   vs   am gear carb 
##   25    3   27   22   22   29   30    2    2    3    6
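
The same pattern works for any per-column summary. As one more sketch, the made-up data frame ‘survey’ below has some missing values, and an unnamed function counts them column by column.

```r
survey <- data.frame(age    = c(34, NA, 51, 29),
                     income = c(NA, NA, 72, 40),
                     id     = 1:4)

# Count missing values in each column
nMissing <- sapply(survey, function(x){
  sum(is.na(x))
})
nMissing
```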

Exercises

Use an apply function to get the class of each variable in the ‘mtcars’ data set.
Use apply to find the row means and column means of the ‘attitude’ data.

An Introduction to R

James Henderson, PhD / CSCAR Data Science Consultant

June 12, 2017
