Object Oriented Programming

When we attach values to names in an R environment we generally refer to the name and value collectively as an ‘object’. More formally, we can distinguish between base objects and object-oriented objects where the latter are those with a non-null class attribute. Compare the following:

# A base object with NULL class attribute
attr(1:5, 'class')
## NULL
# An object of class factor.
attr(factor(1:5), 'class')
## [1] "factor"
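
The same distinction can be checked with base R's is.object(). The following is a minimal sketch; the variable names are ours and serve only as illustration. Note that class() still reports an implicit class for a base object even though its class attribute is NULL.

```r
# Variable names here are illustrative only.
base_obj = 1:5
oo_obj = factor(1:5)

# is.object() is TRUE only when a class attribute has been set.
is.object(base_obj)       # FALSE
is.object(oo_obj)         # TRUE

# class() falls back to an implicit class for base objects.
class(base_obj)           # "integer"
attr(base_obj, 'class')   # NULL
```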

Object oriented programming is a programming paradigm built around the notions of classes, methods, and, of course, objects. There are a wide variety of object oriented languages and R has (at least) three object oriented (OO) systems you should be aware of:

  1. S3 - R’s original, informal OOP system;
  2. S4 - a more formal, less flexible counterpart to S3;
  3. RC - a reference class OOP system that more closely resembles the encapsulated object paradigm used in languages like C++ and Python.

We will focus on the S3 and S4 systems which predominate in R. You can read about RC and the related R6 system in Advanced R.

Before digging into R’s OO systems it will be helpful to define a few terms.

Reading

The S3 system in R

The S3 system in R is based on the idea of generic functions.
The core idea is that a generic function dispatches to a class-specific method, chosen according to the class of the object passed to it.

Some common S3 generic functions in R include: print, summary, plot, mean, head, tail, and str. If we look at the definitions for these functions, we see they are all quite simply defined in terms of a call to UseMethod().

print
## function (x, ...) 
## UseMethod("print")
## <bytecode: 0x7fef28107be8>
## <environment: namespace:base>
summary
## function (object, ...) 
## UseMethod("summary")
## <bytecode: 0x7fef265e78a8>
## <environment: namespace:base>
head
## function (x, ...) 
## UseMethod("head")
## <bytecode: 0x7fef265a0538>
## <environment: namespace:utils>
UseMethod
## function (generic, object)  .Primitive("UseMethod")

When UseMethod() is called R searches for an S3 method based on the name of the generic function and the class of its first argument. The specific function it looks for follows the naming pattern generic.class – this is why it is advisable not to use dots when naming functions, classes, or other objects unless explicitly defining an S3 method.
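
To see why dots in names can cause trouble, consider the sketch below. The function name summary.stats and the class "stats" are hypothetical, chosen only because together they match the generic.class pattern for the summary() generic. The base function t.test() is a well-known case of the same ambiguity: it looks like a t() method for a class test, but is not one.

```r
# A helper meant as a standalone function (hypothetical name), whose name
# happens to match the pattern generic.class for summary() and class "stats".
summary.stats = function(x, ...) {
  "summary.stats() was called"
}

# An object that happens to carry the (hypothetical) class "stats".
obj = structure(list(a = 1), class = "stats")

# UseMethod("summary") finds summary.stats and treats it as a method.
summary(obj)
```

Whether or not we intended it, summary.stats() now acts as the summary method for any object of class "stats".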

S3 Example

As an example, let’s construct a matrix object mat and examine a call to head(mat):

mat = matrix(1:45, nrow = 9, ncol = 5)
class(mat)
## [1] "matrix" "array"
head(mat)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1   10   19   28   37
## [2,]    2   11   20   29   38
## [3,]    3   12   21   30   39
## [4,]    4   13   22   31   40
## [5,]    5   14   23   32   41
## [6,]    6   15   24   33   42

The object mat has classes matrix and array. Therefore, UseMethod("head") searches for a function (or method) called head.matrix() to apply to mat:

head.matrix
## function (x, n = 6L, ...) 
## {
##     checkHT(n, d <- dim(x))
##     args <- rep(alist(x, , drop = FALSE), c(1L, length(d), 1L))
##     ii <- which(!is.na(n[seq_along(d)]))
##     args[1L + ii] <- lapply(ii, function(i) seq_len(if ((ni <- n[i]) < 
##         0L) max(d[i] + ni, 0L) else min(ni, d[i])))
##     do.call("[", args)
## }
## <bytecode: 0x7fef2475ed50>
## <environment: namespace:utils>

We can describe this sequence using a call tree:

\[ \texttt{head(mat)} \to \texttt{UseMethod("head")} \to \texttt{.Primitive("UseMethod")} \to \texttt{head.matrix(mat)}. \]

S3 methods always work by this pattern: generic \(\to\) dispatch \(\to\) method.

Finding S3 methods

You can see all the methods associated with a generic function using methods().

methods(head)
## [1] head.array*      head.data.frame* head.default*    head.ftable*    
## [5] head.function*   head.matrix     
## see '?methods' for accessing help and source code

The * following some methods denotes methods that are not exported from the namespace of the package in which they are defined. For instance, the head.data.frame method is defined in the (base) package utils, but is not exported.

getS3method('head', 'data.frame')
## function (x, n = 6L, ...) 
## {
##     checkHT(n, d <- dim(x))
##     args <- rep(alist(x, , drop = FALSE), c(1L, length(d), 1L))
##     ii <- which(!is.na(n[seq_along(d)]))
##     args[1L + ii] <- lapply(ii, function(i) seq_len(if ((ni <- n[i]) < 
##         0L) max(d[i] + ni, 0L) else min(ni, d[i])))
##     do.call("[", args)
## }
## <bytecode: 0x7fef266283a8>
## <environment: namespace:utils>

When an object has more than one class, R searches successively through the class attribute until a suitable method is found. The sloop package and its function s3_dispatch() are helpful for understanding how this works.

Here is an example:

class(mat) = c('green', class(mat))
class(mat)
## [1] "green"  "matrix" "array"
head(mat)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1   10   19   28   37
## [2,]    2   11   20   29   38
## [3,]    3   12   21   30   39
## [4,]    4   13   22   31   40
## [5,]    5   14   23   32   41
## [6,]    6   15   24   33   42
sloop::s3_dispatch(head(mat))
##    head.green
## => head.matrix
##  * head.array
##  * head.default

If a suitable method is not found, S3 generics revert to a default method when defined and throw an error if not. We can call some methods explicitly:

head.matrix(mat)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1   10   19   28   37
## [2,]    2   11   20   29   38
## [3,]    3   12   21   30   39
## [4,]    4   13   22   31   40
## [5,]    5   14   23   32   41
## [6,]    6   15   24   33   42

but others, such as head.default, are not exported, as discussed above.

You can view the source code for unexported S3 methods using getS3method('generic', 'class').

getS3method('head', 'default')
## function (x, n = 6L, ...) 
## {
##     checkHT(n, dx <- dim(x))
##     if (!is.null(dx)) 
##         head.array(x, n, ...)
##     else if (length(n) == 1L) {
##         n <- if (n < 0L) 
##             max(length(x) + n, 0L)
##         else min(n, length(x))
##         x[seq_len(n)]
##     }
##     else stop(gettextf("no method found for %s(., n=%s) and class %s", 
##         "head", deparse(n), sQuote(class(x))), domain = NA)
## }
## <bytecode: 0x7fef253a74c0>
## <environment: namespace:utils>

Once you know the package namespace within which a method is defined you can call it explicitly using, e.g. utils:::head.default().
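
As a short illustration, both of the calls below reach the unexported default method for head. The variable names are ours; the triple colon operator and getS3method() are standard R.

```r
# Call the unexported default method directly through its namespace.
first3_a = utils:::head.default(letters, 3)

# Equivalent: look the method up and call the function that is returned.
first3_b = getS3method('head', 'default')(letters, 3)

first3_a
identical(first3_a, first3_b)
```

Reaching into a namespace with ::: is handy for exploration, but it is generally better avoided in package code, since unexported functions can change without notice.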

Defining S3 methods

Defining a new S3 method is as simple as defining a function and naming it accordingly. Here we define a method head.green().

# Accept `...` so extra arguments, such as n, are passed along by NextMethod().
head.green = function(obj, ...) {
  
  # Green escape sequences, from e.g. crayon::green("green"). 
  g1 = '\033[32m'
  g2 = '\033[39m'
  
  # Make sure obj is an object
  if ( !is.object(obj) ) warning("Object 'obj' is not an object!")

  # Check if its green
  if ('green' %in% class(obj) ) {
    if ( length(class(obj)) > 1 ) {
      next_class = class(obj)[-grep('green', class(obj))][1]
      cat( sprintf('This is a %sgreen %s%s.\n', g1, next_class, g2) )
    
      # This calls the next available method, allowing us to offload work in
      # a method for the subclass to an existing method for the superclass.
      NextMethod("head")

    } else {
      cat(sprintf('This is a %sgeneric green object%s.\n', g1, g2))
    }
  } else {
    cat(sprintf('The object is not %sgreen%s!\n', g1, g2))
  }
  
}

Now we can test it under various conditions.

## We previously assigned
class(mat)
## [1] "green"  "matrix" "array"
head(mat)
## This is a green matrix.
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1   10   19   28   37
## [2,]    2   11   20   29   38
## [3,]    3   12   21   30   39
## [4,]    4   13   22   31   40
## [5,]    5   14   23   32   41
## [6,]    6   15   24   33   42
## Test head.green for generic class
class(mat) = 'green'
head(mat)
## This is a generic green object.
## Test on a non-green object
red_obj = 1:100
class(red_obj) = 'red'
head.green(red_obj)
## The object is not green!
head(red_obj)
## [1] 1 2 3 4 5 6

In our definition of head.green, notice the use of NextMethod() to dispatch a method previously defined on one of the parent classes.
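
The chaining behavior of NextMethod() is easier to see in a stripped-down setting. In the sketch below, the generic describe and the classes dog and animal are hypothetical names introduced only for illustration; each method adds its own piece and then hands off to the next class in the class vector.

```r
# A hypothetical generic with a default method.
describe = function(x, ...) UseMethod("describe")
describe.default = function(x, ...) "an object"

# Each method contributes a label, then defers to the next method in line.
describe.animal = function(x, ...) paste("an animal;", NextMethod())
describe.dog = function(x, ...) paste("a dog;", NextMethod())

# The class vector determines the order: dog, then animal, then default.
rex = structure(list(), class = c("dog", "animal"))
describe(rex)
## [1] "a dog; an animal; an object"
```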

Defining S3 Generics

We can similarly define our own S3 generic functions via UseMethod().

Note that the first argument to both UseMethod() and NextMethod() should be a character string giving the name of the generic.

# Generic Color finder
getColor = function(obj) {
  UseMethod("getColor")
}

# Default method 
getColor.default = function(obj) {
  # Are any classes colors?
  ind = class(obj) %in% colors()
  if ( any(ind) ) {
     # Yes. Return the color with the highest class precedence.
     class(obj)[which(ind)[1]]
  } else {
    # No. Return a random color.
    sample(colors(), 1)
  }
}

# Specific method aliasing green to "darkgreen"
getColor.green = function(obj) {
  "darkgreen"
}

As a quick, somewhat contrived, example of how we might use this, we could define a col_boxplot function to pick colors according to the class of the object passed.

# A box plot function that uses the class attribute to define colors.
col_boxplot = function(dat, ...) {
  if ( is.atomic(dat) ) {
    boxplot(dat, col = getColor(dat), ...)
  } else{
    col = sapply(dat, getColor)
    boxplot(dat, col = col, ...)
  }
}
# Define some iid data
x = rnorm(100, 1, 1); class(x) = 'green'
y = rnorm(100, 0, 2); class(y) = 'red'
z = rnorm(100, 0, 1)
col_boxplot(list(x = x, y = y, z = z), las = 1)

# z has no color class, so its box gets a random color that can change between calls.
col_boxplot(list(x = x, y = y, z = z), las = 1)

Caution

You should be aware that the class of the object returned by some generic functions (especially primitives) can depend on the input class.

class(x + y)
## [1] "green"
class(y + x)
## [1] "red"
class(mean(x))
## [1] "numeric"
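
When this matters, one option is to strip the class with unclass() before computing, so the result does not silently inherit a class from one of its inputs. A minimal sketch, using small illustrative vectors a and b rather than the x and y above:

```r
# Two classed numeric vectors (names and classes are illustrative).
a = structure(c(1, 2, 3), class = "green")
b = structure(c(4, 5, 6), class = "red")

# The result inherits its class from the first operand.
class(a + b)                      # "green"

# Removing the class attribute first yields a plain numeric result.
class(unclass(a) + unclass(b))    # "numeric"
```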

Defining an S3 class

There are four common “styles” of S3 object:

  1. vector style S3 objects, such as the factor and Date classes, are built on atomic vectors and use attributes to add structure,
  2. scalar style S3 objects use a list to describe aspects of a single thing, e.g. the lm and glm classes,
  3. record style S3 objects, such as the POSIXlt class, have a fixed set of elements, all of equal length, that represent aspects of each datum,
  4. data.frame style objects are similarly lists with components of equal length, but the elements have arbitrary rather than fixed classes.
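
We can get a feel for these styles by inspecting the underlying storage of a few familiar objects with typeof() and unclass(). The objects below are small illustrative examples of our own.

```r
# Vector style: a factor is an integer vector plus attributes.
f = factor(c("lo", "hi", "lo"))
typeof(f)              # "integer"
names(attributes(f))   # includes "levels" and "class"

# Vector style: a Date is a double counting days since 1970-01-01.
d = as.Date("1970-01-10")
typeof(d)              # "double"
unclass(d)             # 9

# Record style: a POSIXlt object is a list of date-time components.
lt = as.POSIXlt("2020-01-01 12:00:00", tz = "UTC")
typeof(lt)             # "list"

# data.frame style: a list of equal-length columns of arbitrary type.
df = data.frame(id = 1:2, label = c("a", "b"))
typeof(df)             # "list"
```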

The majority of S3 objects you will encounter are in the scalar style – they are simply lists with a class attribute governing method dispatch.

As an example, consider the class lm returned by the lm() function for linear regression modeling. We can find the definition of an lm object in the R documentation.
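
A quick way to confirm that an lm object is a scalar style S3 object is to fit a small model and look at its structure. The sketch below uses the built-in cars data set; the name fit is ours.

```r
# Fit a simple regression on a built-in data set.
fit = lm(dist ~ speed, data = cars)

# The object is a list with a class attribute.
class(fit)              # "lm"
typeof(fit)             # "list"
inherits(fit, "lm")     # TRUE

# The list components hold the pieces of the fitted model.
names(fit)
```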

The S4 System

The S3 system described above is very flexible, which makes it easy to work with, but this comes at the expense of the safety and uniformity of a more formal OO system.

The S4 system is a more formal OO system in R. One key difference is that S4 classes have formal definitions and classes, methods, and generics must all be explicitly defined as such. The functionality of the S4 object system comes from the (base) “methods” package.

Defining an S4 class

S4 classes are defined using the setClass function:

setClass("color_vector",
   slots = c(
     name = 'character',
     data = 'numeric',
     color = 'character'
   )
)

Create a new instance of an S4 class using new:

x = new("color_vector", name = "x", color = "darkgreen")
x
## An object of class "color_vector"
## Slot "name":
## [1] "x"
## 
## Slot "data":
## numeric(0)
## 
## Slot "color":
## [1] "darkgreen"

The function new is used above as a constructor for creating an object with the desired class. Most S4 classes defined in packages come with their own constructors, which you should use when available. We can create a default constructor by assigning the output of setClass to a name:

color_vector = 
 setClass("color_vector",
   slots = c(
     name = 'character',
     data = 'numeric',
     color = 'character'
   )
 )
y = color_vector(name = "y", data = rnorm(100, 0, 2), color = "red")
class(y)
## [1] "color_vector"
## attr(,"package")
## [1] ".GlobalEnv"

You could also create an explicit constructor by writing a function that calls new and manipulates the object in some way, say by providing defaults for slots.
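
Here is one way such an explicit constructor might look. This is a sketch under our own assumptions: the function name new_color_vector and its default values are hypothetical, and the guarded setClass() call simply repeats the class definition from above so that the example runs on its own.

```r
library(methods)

# Define the class only if it does not already exist in this session.
if ( !isClass("color_vector") ) {
  setClass("color_vector",
    slots = c(name = 'character', data = 'numeric', color = 'character')
  )
}

# Hypothetical constructor supplying defaults for every slot.
new_color_vector = function(data = numeric(0), name = "unnamed",
                            color = "black") {
  # Coerce so that integer input such as 1:3 fits the numeric slot.
  new("color_vector", name = name, data = as.numeric(data), color = color)
}

v = new_color_vector(1:3)
v@name    # "unnamed"
v@color   # "black"
```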

Accessing slots in an S4 object

You can access and set the slots of an S4 object using the @ operator, the slot() function, or an attr(obj, 'name') construction:

## Access slots using @
x@color
## [1] "darkgreen"
# Assign some data to the data slot
x@data = rnorm(10, 1, 1)

# Check the color
slot(x, 'color')
## [1] "darkgreen"
# Change the name of x
attr(x, 'name') = 'Green Values'
names(attributes(x))
## [1] "name"  "data"  "color" "class"

Only the latter two, slot(obj, 'name') and attr(obj, 'name'), work for S3 objects.

s3 = factor(1:10, levels = 1:10, labels = letters[1:10])
attr(s3, "levels")
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
slot(s3, "levels") 
##  [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j"
tryCatch({ s3@levels }, error = identity)
## <simpleError in doTryCatch(return(expr), name, parentenv, handler): trying to get slot "levels" from an object (class "factor") that is not an S4 object >
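
Since @ works only for S4 objects, it can be useful to check what kind of object you have before reaching for it. The sketch below uses isS4() and slotNames() from the methods package; the class point2d is a hypothetical example of our own.

```r
library(methods)

# An S3 object: isS4() is FALSE, so @ would fail here.
s3_obj = factor(letters[1:3])
isS4(s3_obj)      # FALSE

# A small S4 class (hypothetical) and an instance of it.
setClass("point2d", slots = c(x = 'numeric', y = 'numeric'))
p = new("point2d", x = 1, y = 2)
isS4(p)           # TRUE

# slotNames() lists the slots available through @.
slotNames(p)      # "x" "y"
p@x               # 1
```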

In addition, authors of S4 classes often provide accessor functions to get at the most common slots. Here is an example:

color = function(obj) {
 # Accessor function for the "color" slot in the color_vector class
 # Inputs: obj - an object of class color vector
 # Returns: the value of the color slot
 # If the object is not of class color vector, a random color is returned with
 # a warning.
  
 y = as.character(match.call())
 if ( !{"color_vector" %in% class(obj) } ) {
   msg = sprintf('Object %s is not of class color_vector.\n', y[2] )
   warning(msg)
   return( sample(colors(), 1) )
 }
 
 slot(obj, 'color')
 
}
color(x)
## [1] "darkgreen"
color(y)
## [1] "red"
color(LETTERS)
## Warning in color(LETTERS): Object LETTERS is not of class color_vector.
## [1] "deepskyblue3"

Validator

A validator is a function that ensures an object is a valid member of a given class. Here is an example validator for our color_vector class.

setValidity("color_vector", function(object) {
  if ( !{object@color %in% colors()} ) {
    sprintf('@color = %s is not a valid color. See colors().', object@color)
  } else {
    TRUE
  }
})
## Class "color_vector" [in ".GlobalEnv"]
## 
## Slots:
##                                     
## Name:       name      data     color
## Class: character   numeric character
tryCatch({color_vector(name = 'test', data = 1:3, color = 'A')}, 
         error = identity)
## <simpleError in validObject(.Object): invalid class "color_vector" object: @color = A is not a valid color. See colors().>
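
One point worth knowing is that the validator runs when an object is created with new(), but not when a slot is later modified with @. After such a change, validObject() can be called explicitly. The sketch below illustrates this with a separate, hypothetical class named_color so that it does not disturb the color_vector class; its validator also checks that the slot holds exactly one value.

```r
library(methods)

# A hypothetical class with a single character slot.
setClass("named_color", slots = c(color = 'character'))

# Validator: the slot must hold exactly one name found in colors().
setValidity("named_color", function(object) {
  ok = length(object@color) == 1 && object@color %in% colors()
  if ( ok ) TRUE else "@color must be a single name from colors()"
})

obj = new("named_color", color = "red")   # validated by new()

# Direct slot assignment checks only the slot's class, not validity.
obj@color = "not a color"

# An explicit call to validObject() catches the problem.
status = tryCatch({
  validObject(obj)
  "valid"
}, error = function(e) "invalid")
status   # "invalid"
```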

S4 Methods

We can control how an object of class color_vector gets displayed by defining a show method (the S4 equivalent of print).

## This is an S4 generic
show(x)
## An object of class "color_vector"
## Slot "name":
## [1] "Green Values"
## 
## Slot "data":
##  [1]  2.0793072  1.1020231  0.0194862  0.4737538  1.2446874  1.1115460
##  [7]  2.4734511  0.3636716 -1.2003157 -0.3708521
## 
## Slot "color":
## [1] "darkgreen"
## Change how color_vector objects are shown.
setMethod('show', 'color_vector',
  function(object) {
    msg = sprintf('name: %s, color: %s\n\n', object@name, object@color) 
    cat(msg)
    cat('Data:')
    str(object@data)
    cat('\n')
  }
)

Now, when we call show on an object of class color_vector R will use the custom method.

show(x)
## name: Green Values, color: darkgreen
## 
## Data: num [1:10] 2.0793 1.102 0.0195 0.4738 1.2447 ...
# Note: typing an object's name at the prompt auto-prints it, which calls show() for S4 objects.
x
## name: Green Values, color: darkgreen
## 
## Data: num [1:10] 2.0793 1.102 0.0195 0.4738 1.2447 ...

We could similarly define a method that allows the user to change the value in the color slot. This is a so-called “setter” function.

## Define a new accessor for setting the color
setGeneric("color<-", function(object, value) standardGeneric('color<-'))
## [1] "color<-"
setMethod('color<-', 'color_vector', 
  function(object, value) {
    object@color = value
    validObject(object)
    object
  }
)

color(x) = 'purple'
color(x)
## [1] "purple"
show(x)
## name: Green Values, color: purple
## 
## Data: num [1:10] 2.0793 1.102 0.0195 0.4738 1.2447 ...
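
The getter can be written as an S4 generic in the same way, giving a matched getter and setter pair. The sketch below is self-contained and uses hypothetical names (the class paint and the generics paint_color and paint_color<-) so that it does not overwrite the color() function defined earlier.

```r
library(methods)

# A hypothetical class with one slot.
setClass("paint", slots = c(color = 'character'))

# Getter generic and method.
setGeneric("paint_color", function(object) standardGeneric("paint_color"))
setMethod("paint_color", "paint", function(object) object@color)

# Setter generic and method; validObject() re-checks the modified object.
setGeneric("paint_color<-",
           function(object, value) standardGeneric("paint_color<-"))
setMethod("paint_color<-", "paint", function(object, value) {
  object@color = value
  validObject(object)
  object
})

p = new("paint", color = "red")
paint_color(p) = "blue"
paint_color(p)   # "blue"

# A value of the wrong type is rejected and p is left unchanged.
res = tryCatch({
  paint_color(p) = 1
  "accepted"
}, error = function(e) "rejected")
res              # "rejected"
```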

Resources