This workshop is structured as a number of short thematic lessons. Each lesson includes a brief introduction to a topic and is followed by some simple exercises. My goal is to spend 15-20 minutes on each lesson including time for you to try the exercise and for us to review if necessary.
Everything in R is an object that can be referred to by name. We create objects by assigning values to them:
# This is a comment ignored by R
Instructor <- 'James Henderson'
x <- 10
y <- 32
z <- c(x,y) # This is how we form vectors
9 -> w # This works, but is bad style.
TheAnswer = 42
The values can be referred to by the object name:
TheAnswer
## [1] 42
Objects are stored by value and not by reference, so changing ‘y’ after creating ‘z’ does not change ‘z’:
z <- c(x,y)
c(x,y,z)
## [1] 10 32 10 32
y=TheAnswer
c(x,y,z)
## [1] 10 42 10 32
R can do arithmetic with objects that contain numbers.
x + y
## [1] 52
z / x
## [1] 1.0 3.2
z^2
## [1] 100 1024
z + 2*c(y,x) - 10
## [1] 84 42
Be careful when combining vectors of different lengths: R recycles the shorter vector to match the longer one, and warns only when the longer length is not a multiple of the shorter:
x <- 4:6
y <- c(0,1)
x*y
## Warning in x * y: longer object length is not a multiple of shorter object
## length
## [1] 0 5 0
x <- 1:4
y*x
## [1] 0 2 0 4
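The recycling rule can be checked directly. In this small sketch (the vectors are made up for illustration), the longer length is a multiple of the shorter, so R silently behaves as if the shorter vector had been repeated with ‘rep_len()’:

```r
x <- 1:6
y <- c(0, 1)
# length(x) is a multiple of length(y), so y is recycled without a warning
x * y                                          # 0 2 0 4 0 6
identical(x * y, x * rep_len(y, length(x)))    # TRUE: same as repeating y explicitly
```

Because no warning is issued in this case, it can be worth checking lengths yourself when recycling would be a mistake.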
There are a number of common mathematical functions already in R:
mean(x) # average
## [1] 2.5
sum(x) # summation
## [1] 10
sd(x) # Standard deviation
## [1] 1.290994
var(x) # Variance
## [1] 1.666667
exp(x) # Exponential
## [1] 2.718282 7.389056 20.085537 54.598150
sqrt(x) # Square root
## [1] 1.000000 1.414214 1.732051 2.000000
log(x) # Natural log
## [1] 0.0000000 0.6931472 1.0986123 1.3862944
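As a quick check on what ‘var()’ and ‘sd()’ compute, the sketch below recomputes the sample variance by hand. Note the \(n-1\) denominator:

```r
x <- 1:4
n <- length(x)
# var() and sd() use the sample (n - 1) denominator
manual_var <- sum((x - mean(x))^2) / (n - 1)
manual_var                        # 1.666667
all.equal(manual_var, var(x))     # TRUE
all.equal(sqrt(var(x)), sd(x))    # TRUE: sd is the square root of var
```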
The values are stored in a workspace called the global environment. You can view objects in the global environment using the function ‘ls()’ and remove objects using ‘rm()’:
ls()
## [1] "Instructor" "TheAnswer" "w" "x" "y"
## [6] "z"
rm(w)
ls()
## [1] "Instructor" "TheAnswer" "x" "y" "z"
We can remove multiple objects in a few ways:
remove(Instructor,TheAnswer) # remove and rm are synonyms
ls()
## [1] "x" "y" "z"
rm(list=c('x','y')) # Object names are passed to list as strings
ls()
## [1] "z"
To clear the entire workspace use ‘rm(list=ls())’:
ls()
## [1] "z"
rm(list=ls())
ls()
## character(0)
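One caveat: ‘ls()’ omits objects whose names begin with a dot, so ‘rm(list=ls())’ leaves them behind. The sketch below uses a scratch environment created with ‘new.env()’, standing in for the workspace, so running it does not touch your own objects:

```r
e <- new.env()                   # scratch environment standing in for the workspace
assign('visible', 1, envir = e)
assign('.hidden', 2, envir = e)
ls(e)                            # "visible"
ls(e, all.names = TRUE)          # ".hidden" "visible"
rm(list = ls(e), envir = e)
ls(e, all.names = TRUE)          # ".hidden" survives
```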
Functions are also objects:
ViewGlobalEnv <- ls
ViewGlobalEnv()
## [1] "ViewGlobalEnv"
Elements of vectors can be given names:
z = c('x'=10,'y'=42)
names(z)
## [1] "x" "y"
names(z) <- c('Xval','Yval'); names(z)
## [1] "Xval" "Yval"
unname(z)
## [1] 10 42
Determine whether object names are case sensitive in R.
R is primarily a scripting language. It can, and often should, be used interactively, but nearly everything you type should be saved in a script rather than entered only at the console.
Scripts are simply text files containing R commands. I say nearly everything you type should be in a script because:
Here are some best practices for working with scripts:
cat(readChar('./ExampleScript1.R',nchars=file.size('./ExampleScript1.R')),'\n')
## ## An example script for the Intro to R Workshop ##
## ## Author: James Henderson (jbheder@umich.edu)
## ## Created: June 9, 2017
## ## Modified: June 12, Add a final print command.
##
## ## prepare your workspace ##
## rm(list=ls())
##
## ## load any packages you need
## # library(dplyr)
##
## ## create some objects ##
## message <- 'Hello World!'
##
## ## do something ##
## print(message)
##
## ## save some results
When you ask R to read or write a file without a specified path it defaults to looking in the current working directory. Use ‘getwd()’ and ‘setwd()’ to view and change the current working directory:
getwd()
## [1] "/Users/jbhender/Workshops/Intro_to_R"
startDirectory = getwd()
setwd('/Users/jbhender/Workshops/')
getwd()
## [1] "/Users/jbhender/Workshops"
setwd(startDirectory)
getwd()
## [1] "/Users/jbhender/Workshops/Intro_to_R"
To list the contents of a directory use ‘dir()’:
dir()
## [1] "attitude.csv" "ExampleScript1.R"
## [3] "Intro_2_R.html" "Intro_2_R.Rmd"
## [5] "message.RData" "mtcars_displacement.pdf"
dir('./')
## [1] "attitude.csv" "ExampleScript1.R"
## [3] "Intro_2_R.html" "Intro_2_R.Rmd"
## [5] "message.RData" "mtcars_displacement.pdf"
When working with scripts, it is best to make the working directory the top-level folder of a project and use relative paths to point to subfolders.
dir('./')
## [1] "attitude.csv" "ExampleScript1.R"
## [3] "Intro_2_R.html" "Intro_2_R.Rmd"
## [5] "message.RData" "mtcars_displacement.pdf"
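Here is a sketch of that layout. It assumes a throwaway project folder under ‘tempdir()’; the names ‘my_project’ and ‘data’ are hypothetical. ‘file.path()’ builds relative paths, and ‘setwd()’ returns the previous directory so it can be restored:

```r
# Hypothetical project root with a 'data' subfolder, created under tempdir()
project <- file.path(tempdir(), 'my_project')
dir.create(file.path(project, 'data'), recursive = TRUE, showWarnings = FALSE)

old <- setwd(project)   # setwd() invisibly returns the previous working directory
# Relative paths are resolved against the project root
write.csv(attitude, file = file.path('data', 'attitude.csv'), row.names = FALSE)
dir('data')             # "attitude.csv"
setwd(old)              # restore the original working directory
```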
When building an analysis you will often work with scripts interactively, calling each line in turn. At times you will want to run an entire script, which can be done using the ‘source()’ command:
source('./ExampleScript1.R')
## [1] "Hello World!"
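To experiment with ‘source()’ without an existing script file, you can write a short script to a temporary file first. The object name ‘greeting’ is made up for this sketch:

```r
# Write a two-line script to a temporary file, then run it with source()
script <- tempfile(fileext = '.R')
writeLines(c("greeting <- 'Hello World!'",
             "print(greeting)"), script)
source(script)               # prints [1] "Hello World!"
greeting                     # objects created by the script remain in the workspace
source(script, echo = TRUE)  # echo = TRUE also shows each command as it runs
```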
Objectives:
Data can be read into R from common flat file formats such as comma- or tab-separated text files. The best starting places are ‘read.table()’ and ‘read.csv()’:
attitude_data <- read.csv('./attitude.csv',sep=',',
stringsAsFactors = FALSE)
To write to csv use ‘write.csv()’.
write.csv(attitude,file='./attitude.csv',
row.names=FALSE)
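A round trip through a temporary file (so nothing in your working directory is overwritten) confirms that writing and re-reading preserves the values. Columns of whole numbers may come back as integer rather than double, so this sketch compares values rather than using ‘identical()’ on the whole data frame:

```r
tmp <- tempfile(fileext = '.csv')
write.csv(attitude, file = tmp, row.names = FALSE)
attitude_copy <- read.csv(tmp, stringsAsFactors = FALSE)
dim(attitude_copy)               # 30 7
all(attitude_copy == attitude)   # TRUE: the values survived the round trip
```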
To save data or other objects in R's native .RData format, use ‘save()’.
message='Hello world!'
save(message,attitude_data,file='./message.RData')
To read such data into R use ‘load()’.
rm(list=ls()) ## clearing workspace
foo <- load('./message.RData')
foo
## [1] "message" "attitude_data"
ls()
## [1] "attitude_data" "foo" "message"
message
## [1] "Hello world!"
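Because ‘load()’ restores objects under their saved names, it can silently overwrite objects in your workspace. One option is to load into a separate environment via the ‘envir’ argument. The names ‘greeting’ and ‘e’ below are illustrative:

```r
tmp <- tempfile(fileext = '.RData')
greeting <- 'Hello world!'
save(greeting, file = tmp)
rm(greeting)

# Load into a fresh environment instead of the workspace
e <- new.env()
loaded <- load(tmp, envir = e)
loaded        # "greeting": load() returns the names of the restored objects
e$greeting    # "Hello world!"
```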
When importing data from other programs, it is usually best to first use the software associated with the data's native format to export it to a flat file, and then read that file into R.
Write a copy of it with ‘write.csv()’ using the file name ‘mtcars_copy.csv’. Open and inspect your copy in a spreadsheet program.
Clear your workspace and reload cars from the ‘.RData’ file.
Named objects in R are associated with one or more classes that tell us how to understand the information they contain. To see the class(es) associated with an object use ‘class()’. Below are some common single-value classes (aka types):
str <- 'This is a string'
class(str)
## [1] "character"
number <- 4.5
class(number)
## [1] "numeric"
int <- 42
class(int)
## [1] "numeric"
int <- as.integer(42)
class(int)
## [1] "integer"
When we don’t specify the class of an object, R is programmed to supply a default type. There are special functions for declaring and converting between classes:
str <- '42'
str
## [1] "42"
class(str)
## [1] "character"
num <- as.numeric(str)
num
## [1] 42
class(num)
## [1] "numeric"
int <- as.integer(num)
class(int)
## [1] "integer"
is.integer(num)
## [1] FALSE
is.integer(int)
## [1] TRUE
class(is.integer(int))
## [1] "logical"
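A few more conversions are worth knowing. The ‘L’ suffix creates an integer directly, ‘as.integer()’ truncates toward zero, and failed conversions produce ‘NA’ with a warning:

```r
class(42L)               # "integer": the L suffix creates an integer directly
as.integer(3.9)          # 3: conversion truncates toward zero
as.integer(-3.9)         # -3
as.numeric('3.5')        # 3.5
as.numeric('three')      # NA, with a warning that NAs were introduced by coercion
as.logical(c(0, 1, 2))   # FALSE TRUE TRUE: any non-zero number is TRUE
```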
Multiple values of a single type can be stored in vectors, matrices, or arrays.
Vectors are one dimensional and have a specific ‘length’:
PetNames <- c('Nahla','Oliver')
length(PetNames)
## [1] 2
PetNames <- c(PetNames,'Trixie')
length(PetNames)
## [1] 3
If you try to combine multiple types, R will attempt to convert to a single type:
BirthDays <- c(10,27,29)
c(PetNames,BirthDays)
## [1] "Nahla" "Oliver" "Trixie" "10" "27" "29"
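The conversion follows a fixed order: logical, then integer, then double (‘numeric’), then character. The result takes the most general type present:

```r
class(c(TRUE, 2L))         # "integer": logical is promoted to integer
class(c(TRUE, 2L, 3.5))    # "numeric": integer is promoted to double
class(c(1, 'a'))           # "character": everything becomes text
c(TRUE, FALSE, 10)         # 1 0 10
```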
You can add a names attribute to vectors:
names(BirthDays) <- PetNames
names(BirthDays)
## [1] "Nahla" "Oliver" "Trixie"
BirthDays <- c(Nahla=10,Oliver=27,Trixie=29) # equivalently, name elements when creating the vector
Use ‘[]’ to access specific elements by name or position:
BirthDays[3]
## Trixie
## 29
BirthDays[c(1,2)]
## Nahla Oliver
## 10 27
BirthDays[-1] ## Negative indexing
## Oliver Trixie
## 27 29
BirthDays['Oliver']
## Oliver
## 27
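Brackets also accept a logical vector, and they can appear on the left of an assignment to change values by name. A short sketch with the same vector:

```r
BirthDays <- c(Nahla = 10, Oliver = 27, Trixie = 29)
BirthDays[BirthDays > 20]         # logical index: Oliver 27, Trixie 29
BirthDays['Nahla'] <- 11          # assign by name
BirthDays[c('Trixie', 'Nahla')]   # names can also reorder: Trixie 29, Nahla 11
```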
Matrices are two-dimensional vectors organized into rows and columns. They always contain values of a single type.
Matrices are stored using ‘column-major ordering’ meaning that by default they are filled and operated on by column.
X <- matrix(1:10,nrow=5,ncol=2)
Y <- matrix(1:10,nrow=5,ncol=2,byrow = TRUE)
X
## [,1] [,2]
## [1,] 1 6
## [2,] 2 7
## [3,] 3 8
## [4,] 4 9
## [5,] 5 10
Y
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
## [3,] 5 6
## [4,] 7 8
## [5,] 9 10
class(X)
## [1] "matrix"
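Column-major ordering is visible when you index a matrix with a single number, which counts down the columns. The sketch also shows that ‘byrow = TRUE’ gives the transpose of a column-filled matrix:

```r
X <- matrix(1:10, nrow = 5, ncol = 2)
Y <- matrix(1:10, nrow = 5, ncol = 2, byrow = TRUE)
X[7]       # 7: single-index access counts down the columns
X[2, 2]    # 7: the same element
identical(Y, t(matrix(1:10, nrow = 2)))   # TRUE: byrow = TRUE fills across rows
```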
R can do matrix multiplication and many other linear algebra computations.
X %*% t(Y)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 13 27 41 55 69
## [2,] 16 34 52 70 88
## [3,] 19 41 63 85 107
## [4,] 22 48 74 100 126
## [5,] 25 55 85 115 145
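As an illustration of other linear algebra routines, ‘solve()’ solves a linear system \(Ax = b\), and ‘crossprod(A)’ is a shortcut for ‘t(A) %*% A’. The 2 x 2 system below is made up for the example:

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # columns (2, 1) and (1, 3)
b <- c(1, 2)
x <- solve(A, b)     # solves A %*% x = b
x                    # 0.2 0.6
A %*% x              # recovers b
solve(A) %*% A       # solve(A) alone returns the inverse, so this is the identity
all.equal(crossprod(A), t(A) %*% A)   # TRUE
```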
3*X
## [,1] [,2]
## [1,] 3 18
## [2,] 6 21
## [3,] 9 24
## [4,] 12 27
## [5,] 15 30
c(1,2)*Y
## [,1] [,2]
## [1,] 1 4
## [2,] 6 4
## [3,] 5 12
## [4,] 14 8
## [5,] 9 20
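Note that ‘c(1,2)*Y’ recycles down the columns, so it does not multiply column 1 by 1 and column 2 by 2. To scale each column by its own value, one option is ‘sweep()’, or equivalently post-multiplying by a diagonal matrix:

```r
Y <- matrix(1:10, nrow = 5, ncol = 2, byrow = TRUE)
# sweep() applies the vector along dimension 2 (columns): column 1 times 1, column 2 times 2
sweep(Y, 2, c(1, 2), `*`)
# Equivalent: post-multiply by a diagonal matrix
Y %*% diag(c(1, 2))
```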
Matrices have both dimension and length.
dim(X)
## [1] 5 2
length(X)
## [1] 10
as.vector(X)
## [1] 1 2 3 4 5 6 7 8 9 10
c(nrow(X),ncol(X))
## [1] 5 2
colnames(X) <- paste('Col',1:2,sep='')
rownames(X) <- letters[1:5]
X["a",]
## Col1 Col2
## 1 6
X[1:3,'Col2']
## a b c
## 6 7 8
For arrays with more than two dimensions, see ‘help(array)’.
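A minimal three-dimensional example: arrays are indexed with one subscript per dimension and, like matrices, are filled in column-major order:

```r
A <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 "layers"
dim(A)        # 2 3 4
A[2, 3, 4]    # 24: the last element
A[, , 1]      # the first layer is an ordinary 2 x 3 matrix
```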
In R a list is a generic container for storing values of multiple types.
myList <- list(Name='An example list',
Matrix=diag(5),
n=5
)
myList
## $Name
## [1] "An example list"
##
## $Matrix
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 0 0 0 0
## [2,] 0 1 0 0 0
## [3,] 0 0 1 0 0
## [4,] 0 0 0 1 0
## [5,] 0 0 0 0 1
##
## $n
## [1] 5
class(myList)
## [1] "list"
length(myList)
## [1] 3
names(myList)
## [1] "Name" "Matrix" "n"
You can access a specific element in a list by position or name:
myList[['Name']]
## [1] "An example list"
myList$Matrix
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 0 0 0 0
## [2,] 0 1 0 0 0
## [3,] 0 0 1 0 0
## [4,] 0 0 0 1 0
## [5,] 0 0 0 0 1
Note the use of double brackets (‘[[’) and compare to the single-bracket case below. Single brackets return a sub-list, while double brackets return the element itself:
class(myList['n'])
## [1] "list"
class(myList[['n']])
## [1] "numeric"
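A few more list operations, using a smaller version of the list above. Single brackets can select several elements at once, ‘$’ can add new elements, and asking ‘[[’ for a name that does not exist returns ‘NULL’ rather than an error:

```r
myList <- list(Name = 'An example list', n = 5)
myList['n']              # a list of length 1 that contains n
myList[['n']]            # the value 5 itself
myList[c('Name', 'n')]   # single brackets can select several elements at once
myList$extra <- TRUE     # add a new element by name
length(myList)           # 3
myList[['missing']]      # NULL
```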
Data frames are perhaps the most common way to represent a data set in R. A data frame is like a matrix with observations or units in rows and variables in columns. Unlike a matrix, its columns need not all be of the same type.
df <- data.frame(ID=1:10,
Group=
sample(0:1,10,replace=TRUE),
Var1=rnorm(10),
Var2=seq(0,1,length.out=10),
Var3=factor(
rep(c('a','b'),each=5)
)
)
names(df)
## [1] "ID" "Group" "Var1" "Var2" "Var3"
dim(df)
## [1] 10 5
length(df)
## [1] 5
nrow(df)
## [1] 10
We can access the values of a data frame both like a list:
df$ID
## [1] 1 2 3 4 5 6 7 8 9 10
df[['Var3']]
## [1] a a a a a b b b b b
## Levels: a b
or like a matrix:
df[1:5,]
## ID Group Var1 Var2 Var3
## 1 1 1 -1.8598392 0.0000000 a
## 2 2 1 -1.2936927 0.1111111 a
## 3 3 0 -0.4659862 0.2222222 a
## 4 4 1 -1.6222289 0.3333333 a
## 5 5 1 0.2157492 0.4444444 a
df[,'Var2']
## [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
## [8] 0.7777778 0.8888889 1.0000000
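Both styles combine naturally: select rows with a condition and columns by name. Since ‘df’ above uses random values, this sketch builds a small, non-random data frame (‘df2’, a hypothetical name) so the results are predictable:

```r
df2 <- data.frame(ID = 1:4,
                  Var2 = c(0.1, 0.4, 0.7, 1.0),
                  Var3 = factor(c('a', 'a', 'b', 'b')))
df2[df2$Var3 == 'b', c('ID', 'Var2')]   # rows by condition, columns by name
df2$Var4 <- df2$Var2 * 2                # add a new column
str(df2)                                # compact summary of each column's class
```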
R has three reserved words of class ‘logical’:
class(TRUE)
## [1] "logical"
class(FALSE)
## [1] "logical"
class(NA)
## [1] "logical"
if(TRUE & T){
print('Synonyms')
}
## [1] "Synonyms"
if(FALSE | F){
print('Synonyms')
}
While ‘T’ and ‘F’ are equivalent to ‘TRUE’ and ‘FALSE’ it is best to always use the full words. You should also avoid using ‘T’ or ‘F’ as objects or arguments in functions.
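To see why, note that ‘T’ is an ordinary variable that can be reassigned, whereas ‘TRUE’ is a reserved word:

```r
T <- 0            # legal, because T is an ordinary variable
isTRUE(T)         # FALSE
rm(T)             # removing our copy uncovers base R's T again
isTRUE(T)         # TRUE
# By contrast, TRUE <- 0 is an error because TRUE is reserved
```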
Logicals are created by Boolean comparisons:
{2*3} == 6 # test equality with ==
## [1] TRUE
{2+2} != 5 # use != for 'not equal'
## [1] TRUE
sqrt(69) > 8 # comparison operators: >, >=, <, <=
## [1] TRUE
sqrt(64) >= 8
## [1] TRUE
!{2==3} # use ! (not) to negate or 'flip' a logical
## [1] TRUE
Comparison operators are vectorized:
1:10 > 5
## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
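Vectorized comparisons are often summarized with ‘any()’ and ‘all()’. Because ‘TRUE’ counts as 1 and ‘FALSE’ as 0, ‘sum()’ and ‘mean()’ give counts and proportions:

```r
x <- 1:10
any(x > 5)     # TRUE: at least one element is greater than 5
all(x > 0)     # TRUE: every element is positive
sum(x > 5)     # 5: number of elements greater than 5
mean(x > 5)    # 0.5: proportion of elements greater than 5
```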
You can combine comparisons using ‘and (&)’ or ‘or (|)’:
{2+2}==4 | {2+2}==5 # An or statement asks if either statement is true
## [1] TRUE
{2+2}==4 & {2+2}==5 # And requires both to be true
## [1] FALSE
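‘&’ and ‘|’ work element by element. Their doubled forms ‘&&’ and ‘||’ expect single values and stop evaluating as soon as the answer is known, which makes them the usual choice inside ‘if()’:

```r
a <- c(TRUE, FALSE)
b <- c(TRUE, TRUE)
a & b     # TRUE FALSE: compared element by element
a | b     # TRUE TRUE
# && and || take single values and short-circuit
FALSE && stop('never evaluated')   # FALSE
TRUE || stop('never evaluated')    # TRUE
```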
if(TRUE){
print('do something if true')
}
## [1] "do something if true"
if({2+2}==5){
print('the statement is true')
} else{
print('the statement is false')
}
## [1] "the statement is false"
result <- c(4,5)
report = ifelse({2+2}==result,'true','false')
report
## [1] "true" "false"
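‘if’ / ‘else if’ chains handle more than two cases but test a single value at a time, while ‘ifelse()’ is vectorized. The helper ‘sign_label’ below is hypothetical, written only to contrast the two:

```r
# Hypothetical helper: classify one number
sign_label <- function(v) {
  if (v < 0) {
    'negative'
  } else if (v == 0) {
    'zero'
  } else {
    'positive'
  }
}
vals <- c(-2, 0, 3)
sapply(vals, sign_label)                        # "negative" "zero" "positive"
ifelse(vals < 0, 'negative', 'non-negative')    # ifelse() handles the whole vector
```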
The ‘which()’ function returns the positions (indices) of the elements of a logical vector that are ‘TRUE’:
which({1:5}^2 > 10)
## [1] 4 5
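One reason to subset with ‘which()’ rather than the logical vector itself is missing data. An ‘NA’ in a logical index produces an ‘NA’ element, whereas ‘which()’ keeps only the ‘TRUE’ positions:

```r
v <- c(3, NA, 12, 15)
v > 10              # FALSE NA TRUE TRUE
v[v > 10]           # NA 12 15: the NA in the index yields an NA element
v[which(v > 10)]    # 12 15
which.max(v)        # 4: position of the largest non-missing value
```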
A combination of which and logicals can be used to subset data frames:
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
mtcars[which(mtcars$mpg>30),]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
You can use ‘with()’ to refer to variables/columns by name:
ind <-with(mtcars, which(mpg > 20 & cyl >=6))
ind
## [1] 1 2 4
mtcars[ind,c('mpg','cyl')]
## mpg cyl
## Mazda RX4 21.0 6
## Mazda RX4 Wag 21.0 6
## Hornet 4 Drive 21.4 6
mtcars[which(mtcars[,'am']!=0),]
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
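Base R also provides ‘subset()’, which looks up column names inside the data frame as ‘with()’ does and treats missing values in the condition as false. Its help page describes it as a convenience for interactive use and recommends standard indexing when programming:

```r
res <- subset(mtcars, mpg > 20 & cyl >= 6, select = c(mpg, cyl))
res   # rows: Mazda RX4, Mazda RX4 Wag, Hornet 4 Drive
```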
The ‘with()’ construction will not work with matrices.
rm(ind) # removing ind
carsMat <- as.matrix(mtcars)
ind <-with(carsMat, which(mpg > 20 & cyl >=6))
## Error in eval(substitute(expr), data, enclos = parent.frame()): numeric 'envir' arg not of length one
carsMat[ind,c('mpg','cyl')]
## Error in eval(expr, envir, enclos): object 'ind' not found
Instead use explicit indexing by name or position.
X <- matrix(rnorm(100),25,4)
ind <- which({X[,1]>0 | X[,2]>0} & {X[,3]<0 | X[,4]<0})
1*{X[ind,] > 0} # convert logicals to numeric
## [,1] [,2] [,3] [,4]
## [1,] 0 1 0 1
## [2,] 1 0 0 0
## [3,] 1 0 0 0
## [4,] 1 1 1 0
## [5,] 1 0 0 0
## [6,] 1 1 0 0
## [7,] 1 1 0 0
## [8,] 0 1 0 1
## [9,] 1 1 1 0
## [10,] 0 1 0 0
## [11,] 1 1 0 1
## [12,] 1 0 1 0
## [13,] 1 0 1 0
## [14,] 0 1 1 0
## [15,] 1 1 1 0
## [16,] 0 1 0 0
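Applied to the ‘carsMat’ example that failed above, explicit column indexing gives the same rows as the ‘with()’ version on the data frame. Alternatively, converting back with ‘as.data.frame()’ lets ‘with()’ find the columns again:

```r
carsMat <- as.matrix(mtcars)
# Explicit column indexing works for matrices
ind <- which(carsMat[, 'mpg'] > 20 & carsMat[, 'cyl'] >= 6)
carsMat[ind, c('mpg', 'cyl')]
# Or convert to a data frame so with() can find the columns
ind2 <- with(as.data.frame(carsMat), which(mpg > 20 & cyl >= 6))
```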
There are many other classes of objects in R, and many packages define special classes. Here are a few other common classes:
class(mean)
## [1] "function"
class(.GlobalEnv)
## [1] "environment"
class(Y~X1+X2)
## [1] "formula"
As we saw earlier, R calls functions using the ‘func()’ construction. Functions are collections of commands that do something. They take arguments, which specify which objects to operate on and what parameter values to use. You can use ‘help(func)’ to see what a function is used for and what arguments it expects, e.g.
help(round)
Functions will often have multiple arguments. Some arguments have default values, others do not. All arguments without default values must be passed to a function. Arguments can be passed by name or position. For instance,
x <- runif(n=5,min=0,max=1)
y <- runif(5,0,1)
z <- runif(5)
round(cbind(x,y,z),1)
## x y z
## [1,] 0.7 0.2 1.0
## [2,] 0.2 0.6 0.1
## [3,] 0.2 0.8 0.2
## [4,] 0.2 0.8 0.5
## [5,] 0.0 0.7 0.2
all three generate 5 numbers from a Uniform(0,1) distribution.
Arguments passed by name need not be in order:
w <- runif(min=0,max=1,n=5)
u <- runif(min=0,max=1,5) # This also works but is bad style.
round(rbind(u=u,w=w),1)
## [,1] [,2] [,3] [,4] [,5]
## u 0.9 0.3 0.2 0.2 0.8
## w 0.2 0.9 0.3 0.6 0.2
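Besides ‘help()’, you can inspect a function's arguments and their default values directly with ‘args()’ and ‘formals()’:

```r
args(runif)             # function (n, min = 0, max = 1)
names(formals(runif))   # "n" "min" "max"
formals(runif)$max      # 1: the default value of max
```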
You can create your own functions in R. Use functions for tasks that you repeat often in order to make your scripts more easily readable and modifiable.
# function to compute z-scores
zScore1 <- function(x){
xbar <- mean(x)
s <- sd(x)
z <- (x-xbar)/s
return(z)
}
The return statement is not strictly necessary, since a function returns the value of its last evaluated expression, but it can make complex functions more readable. It is also good practice to avoid creating intermediate objects to store values that are used only once.
# function to compute z-scores
zScore2 <- function(x){
{x-mean(x)}/sd(x)
}
x <- rnorm(10,3,1) ## generate some normally distributed values
round(cbind(x,'Z1'=zScore1(x),'Z2'=zScore2(x)),1)
## x Z1 Z2
## [1,] 3.6 1.2 1.2
## [2,] 3.5 1.1 1.1
## [3,] 2.3 -1.0 -1.0
## [4,] 3.5 1.2 1.2
## [5,] 3.1 0.5 0.5
## [6,] 1.9 -1.6 -1.6
## [7,] 2.3 -0.8 -0.8
## [8,] 2.5 -0.6 -0.6
## [9,] 2.9 0.1 0.1
## [10,] 2.8 0.0 0.0
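For comparison, base R's ‘scale()’ centers by the mean and divides by ‘sd()’ by default. It returns a one-column matrix, so ‘as.vector()’ is used here before comparing with ‘zScore2’:

```r
zScore2 <- function(x){
  {x - mean(x)} / sd(x)
}
x <- rnorm(10, 3, 1)
all.equal(as.vector(scale(x)), zScore2(x))   # TRUE
```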
We can set default values for parameters using the construction ‘parameter = value’ in the function definition.
# function to compute z-scores
zScore3 <- function(x,na.rm=TRUE){
{x-mean(x,na.rm=na.rm)}/sd(x,na.rm=na.rm)
}
x <- c(NA,x,NA)
round(cbind(x,'Z1'=zScore1(x),'Z2'=zScore2(x),'Z3'=zScore3(x)),1)
## x Z1 Z2 Z3
## [1,] NA NA NA NA
## [2,] 3.6 NA NA 1.2
## [3,] 3.5 NA NA 1.1
## [4,] 2.3 NA NA -1.0
## [5,] 3.5 NA NA 1.2
## [6,] 3.1 NA NA 0.5
## [7,] 1.9 NA NA -1.6
## [8,] 2.3 NA NA -0.8
## [9,] 2.5 NA NA -0.6
## [10,] 2.9 NA NA 0.1
## [11,] 2.8 NA NA 0.0
## [12,] NA NA NA NA
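A default is used only when the caller does not supply that argument. Passing it by name overrides the default. A small sketch with hand-picked values so the results are easy to check:

```r
zScore3 <- function(x, na.rm = TRUE){
  {x - mean(x, na.rm = na.rm)} / sd(x, na.rm = na.rm)
}
x <- c(NA, 1, 2, 3)
zScore3(x)                  # NA -1 0 1: the default na.rm = TRUE is used
zScore3(x, na.rm = FALSE)   # NA NA NA NA: the default is overridden
```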
Much of the utility of R is derived from an extensive collection of user and domain-expert contributed packages. Packages are simply a standardized way for people to share documented code and data. There are thousands of packages!
Packages are primarily distributed through three sources: CRAN, Bioconductor, and GitHub.
The primary way to install a package is using ‘install.packages(“pkg”)’.
#install.packages('lme4') # the package name should be passed as a character string
You can find the default location for your R packages using the “.libPaths()” function. If you don’t have write permission to this folder, you can set this directory to a personal library instead.
.libPaths() ## The default library location
## [1] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library"
.libPaths('/Users/jbhender/Rlib') #Create the directory first!
.libPaths()
## [1] "/Users/jbhender/Rlib"
## [2] "/Library/Frameworks/R.framework/Versions/3.3/Resources/library"
To install a package to a personal library use the ‘lib’ option.
## install.packages("haven",lib='/Users/jbhender/Rlib')
If your computer has the necessary build tools, packages can also be installed from source by downloading the package file and pointing ‘install.packages()’ directly to the source tarball (‘.tar.gz’) or Windows binary (‘.zip’) with ‘repos = NULL’.
Installing a package does not by itself make it available in your R session! There are two ways to use things from a package: calling ‘library(“pkg”)’ to add it to the search path, or using the ‘pkg::function’ construction.
These methods are illustrated below using the data set ‘InstEval’ distributed with the ‘lme4’ package.
#head(InstEval) # would fail: InstEval is not on the search path yet
## Using the pkg::function construction
head(lme4::InstEval)
## s d studage lectage service dept y
## 1 1 1002 2 2 0 2 5
## 2 1 1050 2 1 1 6 2
## 3 1 1582 2 2 0 2 5
## 4 1 2050 2 2 1 3 3
## 5 2 115 2 1 0 5 2
## 6 2 756 2 1 0 5 4
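The ‘pkg::function’ form works whether or not the package is attached. In scripts that should also run on machines without a package, ‘requireNamespace()’ returns ‘FALSE’ instead of failing, as in this sketch:

```r
stats::median(c(5, 1, 3))   # 3
# Only use lme4 if it is installed; requireNamespace() returns FALSE otherwise
if (requireNamespace('lme4', quietly = TRUE)) {
  print(head(lme4::InstEval))
} else {
  message('lme4 is not installed')
}
```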
The ‘library(“pkg”)’ command adds a package to the search path.
search()
## [1] ".GlobalEnv" "package:stats" "package:graphics"
## [4] "package:grDevices" "package:utils" "package:datasets"
## [7] "package:methods" "Autoloads" "package:base"
library(lme4)
## Loading required package: Matrix
search()
## [1] ".GlobalEnv" "package:lme4" "package:Matrix"
## [4] "package:stats" "package:graphics" "package:grDevices"
## [7] "package:utils" "package:datasets" "package:methods"
## [10] "Autoloads" "package:base"
head(InstEval)
## s d studage lectage service dept y
## 1 1 1002 2 2 0 2 5
## 2 1 1050 2 1 1 6 2
## 3 1 1582 2 2 0 2 5
## 4 1 2050 2 2 1 3 3
## 5 2 115 2 1 0 5 2
## 6 2 756 2 1 0 5 4
To remove a package from the search path use ‘detach(“package:pkg”,unload=TRUE)’.
detach(package:lme4,unload=TRUE)
search()
## [1] ".GlobalEnv" "package:Matrix" "package:stats"
## [4] "package:graphics" "package:grDevices" "package:utils"
## [7] "package:datasets" "package:methods" "Autoloads"
## [10] "package:base"
As part of their documentation, many packages come with a “vignette”, which serves as a short tour of the package's purpose and functionality.
R has standard functions for producing many common statistical graphics.
plot(mtcars$hp~mtcars$disp)
There are many aesthetic options you can control; see ‘par()’ for a full list.
with(mtcars,
plot(hp~disp,pch=15,main='Horsepower in mtcars',xlab='displacement',ylab='horsepower',las=1,col='grey')
)
Use vectors to set aesthetics, such as color and plotting symbol, for individual points.
col <- rep('blue',nrow(mtcars))
col[which(mtcars$cyl==6)] <- 'grey'
col[which(mtcars$cyl==8)] <- 'red'
pch <- rep(16,nrow(mtcars))
pch[which(mtcars$am==1)] <- 17
with(mtcars,
plot(hp~disp,pch=pch,col=col,main='Horsepower in mtcars',xlab='displacement',ylab='horsepower',las=1)
)
legend('topleft',legend=c('Automatic','Manual'),pch=16:17,col='black',bty='n')
legend('bottomright',legend=paste(c(4,6,8),'cylinders'),col=c('blue','grey','red'),pch=15)
hist(mtcars[,'hp'],col='lightblue',las=1,xlab='horsepower',main='Histogram of horsepower')
boxplot(mtcars[,'hp']~mtcars[,'cyl'],las=1,xlab='# of cylinders',ylab='horsepower',col=rgb(0,0,1,.5))
qqnorm(mtcars[,'hp'])
qqline(mtcars[,'hp'])
By default, plots are sent to the active graphics device, such as the Plots pane in RStudio. However, you can write graphics directly to a file using ‘pdf()’, ‘jpeg()’, ‘png()’, etc.
pdf('./mtcars_displacement.pdf') #opens the file
hist(mtcars$disp)
dev.off() ## closes the file
## quartz_off_screen
## 2
dir('./')
## [1] "attitude.csv" "ExampleScript1.R"
## [3] "Intro_2_R_files" "Intro_2_R.html"
## [5] "Intro_2_R.Rmd" "message.RData"
## [7] "mtcars_displacement.pdf"
I recommend using PDF as the default because it is a vector-based format.
Here is the syntax for a basic for loop in R:
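Several plots can share one file. This sketch writes to a temporary PDF, uses ‘par(mfrow = ...)’ to place two panels side by side, and restores the old settings afterwards. It also uses the formula-with-‘data’ interface of ‘boxplot()’ as a tidier alternative to ‘mtcars[,'hp']~mtcars[,'cyl']’:

```r
out <- tempfile(fileext = '.pdf')
pdf(out, width = 7, height = 4)   # open the file device
op <- par(mfrow = c(1, 2))        # two panels side by side; par() returns the old settings
hist(mtcars$hp, main = 'Horsepower', xlab = 'horsepower')
boxplot(hp ~ cyl, data = mtcars, xlab = '# of cylinders', ylab = 'horsepower')
par(op)                           # restore the previous settings
invisible(dev.off())              # close the file
file.exists(out)                  # TRUE
```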
for(i in 1:10){
cat(i,'\n')
}
## 1
## 2
## 3
## 4
## 5
## 6
## 7
## 8
## 9
## 10
for(var in names(mtcars)){
cat(sprintf('average %s = %4.3f',var,mean(mtcars[,var])),'\n')
}
## average mpg = 20.091
## average cyl = 6.188
## average disp = 230.722
## average hp = 146.688
## average drat = 3.597
## average wt = 3.217
## average qsec = 17.849
## average vs = 0.438
## average am = 0.406
## average gear = 3.688
## average carb = 2.812
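When a loop builds up results, it is usually better to preallocate the output and fill it in. Looping over ‘seq_along()’ is safer than ‘1:length()’, which counts backwards to 0 when the vector is empty:

```r
# Preallocate the result, then fill it inside the loop
means <- numeric(ncol(mtcars))
names(means) <- names(mtcars)
for (i in seq_along(means)) {
  means[i] <- mean(mtcars[[i]])
}
all.equal(means, colMeans(mtcars))   # TRUE

empty <- numeric(0)
1:length(empty)     # 1 0: a loop over this would run twice
seq_along(empty)    # integer(0): a loop over this runs zero times
```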
A while loop can be useful when you aren’t sure how many iterations are needed. Here is an example that simulates a random walk and stops once the value is 10 or more units from 0.
maxIter <- 1e3 # always limit the total iterations allowed
val <- vector(mode='numeric',length=maxIter)
val[1] <- rnorm(1) ## initialize
k <- 1
while(abs(val[k]) < 10 && k < maxIter){
val[k+1] <- val[k] + rnorm(1)
k <- k + 1
}
val <- val[1:k] # keep every value, including the one that ended the walk
plot(val)
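The same walk can be written with ‘repeat’ and ‘break’, which puts the stopping test wherever it reads most naturally. Calling ‘set.seed()’ first makes the random walk reproducible. The seed value 42 is arbitrary:

```r
set.seed(42)      # make the random walk reproducible
maxIter <- 1e3
val <- numeric(maxIter)
val[1] <- rnorm(1)
k <- 1
repeat {
  # stop once the walk is 10 or more units from 0, or iterations run out
  if (abs(val[k]) >= 10 || k >= maxIter) break
  val[k + 1] <- val[k] + rnorm(1)
  k <- k + 1
}
val <- val[1:k]
```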
Use a switch when you have two or more discrete options.
mySummary <- function(x){
switch(class(x),
factor=table(x),
numeric=sprintf('mean=%4.2f,sd=%4.2f',mean(x),sd(x)),
'Only defined for factor and numeric classes.')
}
for(var in names(iris)){
cat(var,':\n',sep='')
print(mySummary(iris[,var]))
}
## Sepal.Length:
## [1] "mean=5.84,sd=0.83"
## Sepal.Width:
## [1] "mean=3.06,sd=0.44"
## Petal.Length:
## [1] "mean=3.76,sd=1.77"
## Petal.Width:
## [1] "mean=1.20,sd=0.76"
## Species:
## x
## setosa versicolor virginica
## 50 50 50
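‘switch()’ has two further conveniences. An option left empty falls through to the next one that has a value, and a final unnamed value acts as the default. The helper ‘size_label’ is hypothetical, made up to show both:

```r
# Hypothetical helper: label a car by its number of cylinders
size_label <- function(cyl) {
  switch(as.character(cyl),
         '4' = ,                      # empty: falls through to the next value
         '6' = 'small or medium',
         '8' = 'large',
         'unknown')                   # unnamed final value is the default
}
sapply(c(4, 6, 8, 10), size_label)
```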
The Fibonacci sequence starts 1, 1, 2, … and continues with each new value formed by adding the two previous values.
Write a function ‘Fib1’ which takes an argument ‘n’ and returns the \(n^{th}\) value of the Fibonacci sequence. Use a for loop in the function.
Write a function ‘Fib2’ which does the same thing using a while loop.
Use a switch to write a function with a parameter ‘loop = c('for', 'while')’ that calls either ‘Fib1’ or ‘Fib2’.
Loops in R can be quite slow compared to other programming languages because of the overhead of the conveniences that make R useful for routine data analysis. Explicit loops can often be replaced by an ‘apply’ function or a vectorized operation.
Here is an example:
X = matrix(rep(1:5,each=5),5,5)
apply(X,1,sum)
## [1] 15 15 15 15 15
apply(X,2,sum)
## [1] 5 10 15 20 25
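For sums and means specifically, ‘rowSums()’, ‘colSums()’, ‘rowMeans()’, and ‘colMeans()’ are built in. Also note that when the applied function returns several values, ‘apply()’ stores each result as a column of the output:

```r
X <- matrix(rep(1:5, each = 5), 5, 5)
rowSums(X)          # 15 15 15 15 15, same values as apply(X, 1, sum)
colSums(X)          # 5 10 15 20 25, same values as apply(X, 2, sum)
r <- apply(X, 1, range)
dim(r)              # 2 5: one column per row of X
```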
For lists use ‘lapply()’ or ‘sapply()’.
myList=list(x=1:5,y=-5:-1)
lapply(myList,sum)
## $x
## [1] 15
##
## $y
## [1] -15
sapply(myList,sum)
## x y
## 15 -15
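‘sapply()’ simplifies its result when it can (to a matrix if each call returns several values). When you want to guarantee the type and length of each result, ‘vapply()’ takes a template through ‘FUN.VALUE’ and stops with an error if a result does not match:

```r
myList <- list(x = 1:5, y = -5:-1)
vapply(myList, sum, FUN.VALUE = integer(1))   # named integer: x 15, y -15
sapply(myList, range)                         # simplified to a 2 x 2 matrix
```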
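A data frame is stored internally as a list of columns, so use ‘lapply()’ or ‘sapply()’ with data frames. By contrast, ‘apply()’ first converts a data frame to a matrix, which here coerces every column to character: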
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
sapply(iris,class)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## "numeric" "numeric" "numeric" "numeric" "factor"
apply(iris,2,class)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## "character" "character" "character" "character" "character"
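A common use of this pattern is selecting columns by type before summarizing them. The logical vector from ‘sapply()’ can index the data frame's columns directly:

```r
num_cols <- sapply(iris, is.numeric)   # TRUE for the four measurement columns
num_cols
sapply(iris[num_cols], mean)           # means of the numeric columns only
```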
A very powerful construction for data manipulation is the use of an apply function with an anonymous function, defined inline:
sapply(mtcars,function(x){
nVals = length(unique(x))
return(nVals)
})
## mpg cyl disp hp drat wt qsec vs am gear carb
## 25 3 27 22 22 29 30 2 2 3 6
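An anonymous function can also return several named values. ‘sapply()’ then arranges the results as a matrix with one column per input column, as in this sketch:

```r
res <- sapply(mtcars[, c('mpg', 'hp')], function(x) {
  c(mean = mean(x), sd = sd(x))
})
res   # 2 x 2 matrix: rows mean and sd, columns mpg and hp
```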