Object Oriented Programming

When we attach values to names in an R environment we generally refer to the name and value collectively as an ‘object’. We should, however, distinguish between base objects and object-oriented objects where the latter are those with a non-null class attribute. Compare the following:

## NULL
## [1] "factor"
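
The calls that produced this output are not shown. A minimal sketch that gives the same two results, assuming the comparison is between a plain integer vector and a factor, is:

```r
# A plain integer vector has no class attribute, so this returns NULL.
attr(1:10, "class")

# A factor is an integer vector with a class attribute set to "factor".
attr(factor(1:10), "class")
```

Note that class(1:10) still returns "integer"; that is an implicit class derived from the type, not a stored attribute.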

Object oriented programming is a programming paradigm built around the notions of classes, methods, and, of course, objects. There is a wide variety of object oriented languages, and R has (at least) three object oriented (OO) systems you should be aware of:

  1. S3 - R’s original, informal OOP system;
  2. S4 - a more formal, less flexible version of S3;
  3. RC - a reference class OOP system that more closely resembles the paradigm used in languages like C++.

We will focus on the S3 and S4 systems which predominate in R. You can read about the RC and the related R6 systems in Advanced R.

Before digging into R’s OO systems it will be helpful to define a few terms.

Reading

The S3 system in R

The S3 system in R is based on the idea of generic functions.
The core idea is that a generic function dispatches to a class-specific method, chosen according to the class of the object passed to it.
Some common S3 generic functions in R include print, summary, plot, mean, head, tail, and str. If we look at the definitions for these functions, we see they are all quite simply defined in terms of a call to UseMethod().

## function (x, ...) 
## UseMethod("print")
## <bytecode: 0x7fc65c287648>
## <environment: namespace:base>
## function (object, ...) 
## UseMethod("summary")
## <bytecode: 0x7fc65c45bc08>
## <environment: namespace:base>
## function (x, ...) 
## UseMethod("head")
## <bytecode: 0x7fc65a751138>
## <environment: namespace:utils>

When UseMethod() is called R searches for an S3 method based on the name of the generic function and the class of its first argument. The specific function it looks for follows the naming pattern generic.class – this is why it is advisable not to use dots when naming functions, classes, or other objects.

As an example, let’s construct a matrix object mat and examine a call to head(mat):

## [1] "matrix"
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1   10   19   28   37
## [2,]    2   11   20   29   38
## [3,]    3   12   21   30   39
## [4,]    4   13   22   31   40
## [5,]    5   14   23   32   41
## [6,]    6   15   24   33   42
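
The code for this step is not shown. One construction consistent with the values printed above (an assumption, since only the first six rows are visible) is a 9 x 5 matrix filled column-wise with 1 to 45:

```r
# Hypothetical reconstruction: 45 values in 9 rows gives 5 columns,
# with the second column starting at 10 as in the output above.
mat <- matrix(1:45, nrow = 9)

# Older versions of R report "matrix"; R >= 4.0.0 reports c("matrix", "array").
class(mat)

# head() is generic, so this call is dispatched on the class of mat.
head(mat)
```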

The object mat has class matrix so UseMethod("head") searches for a function (method) called head.matrix() to apply to mat:

## function (x, n = 6L, ...) 
## {
##     stopifnot(length(n) == 1L)
##     n <- if (n < 0L) 
##         max(nrow(x) + n, 0L)
##     else min(n, nrow(x))
##     x[seq_len(n), , drop = FALSE]
## }
## <bytecode: 0x7fc65b986aa0>
## <environment: namespace:utils>

You can see all the methods associated with a generic function using methods().

## [1] head.data.frame* head.default*    head.ftable*     head.function*  
## [5] head.matrix      head.table*     
## see '?methods' for accessing help and source code
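
A listing like the one above can be obtained as follows. The exact set of methods depends on the R version and on which packages are loaded, so the output may differ from what is shown here.

```r
# List the S3 methods currently known for the generic head().
head_methods <- methods("head")
head_methods
```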

The * following some methods denotes methods that are not exported from the namespace of the package in which they are defined. For instance, the head.data.frame method is defined in the (base) package utils, but is not exported.

## function (x, n = 6L, ...) 
## {
##     stopifnot(length(n) == 1L)
##     n <- if (n < 0L) 
##         max(nrow(x) + n, 0L)
##     else min(n, nrow(x))
##     x[seq_len(n), , drop = FALSE]
## }
## <bytecode: 0x7fc65c3a7848>
## <environment: namespace:utils>

When an object has more than one class, R searches successively until a suitable method is found. The sloop package and its function s3_dispatch() are helpful for understanding how this works.

Here is an example:

## [1] "green"  "matrix"
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1   10   19   28   37
## [2,]    2   11   20   29   38
## [3,]    3   12   21   30   39
## [4,]    4   13   22   31   40
## [5,]    5   14   23   32   41
## [6,]    6   15   24   33   42
##    head.green
## => head.matrix
##  * head.default
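
The code behind this example is not shown. A sketch along the same lines, assuming the matrix is the 9 x 5 one used earlier and that "green" is simply prepended to its class vector, might be:

```r
mat <- matrix(1:45, nrow = 9)

# Prepend a new class; the existing class(es) are kept as fallbacks.
class(mat) <- c("green", class(mat))
class(mat)

# No head.green method exists yet, so dispatch falls through to head.matrix.
head(mat)

# s3_dispatch() lists the candidate methods in the order R tries them.
# The sloop package is optional, so only call it when it is installed.
if (requireNamespace("sloop", quietly = TRUE)) {
  print(sloop::s3_dispatch(head(mat)))
}
```

On R >= 4.0.0 the class vector also contains "array", so the dispatch listing will show an additional candidate.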

If a suitable method is not found, S3 generics revert to a default method when defined and throw an error if not. We can call some methods explicitly:

##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1   10   19   28   37
## [2,]    2   11   20   29   38
## [3,]    3   12   21   30   39
## [4,]    4   13   22   31   40
## [5,]    5   14   23   32   41
## [6,]    6   15   24   33   42

but others, such as head.default, are not exported, as previously discussed. You can view the source code for unexported S3 methods using getS3method('generic', 'class').

## function (x, n = 6L, ...) 
## {
##     stopifnot(length(n) == 1L)
##     n <- if (n < 0L) 
##         max(length(x) + n, 0L)
##     else min(n, length(x))
##     x[seq_len(n)]
## }
## <bytecode: 0x7fc65c352e30>
## <environment: namespace:utils>
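
As a sketch of the two cases just described, an exported method can be called by name, while an unexported one has to be retrieved first. The matrix here is the hypothetical 9 x 5 one assumed earlier.

```r
mat <- matrix(1:45, nrow = 9)

# head.matrix is exported from utils, so it can be called directly.
utils::head.matrix(mat)

# head.default is not exported; getS3method() retrieves it anyway.
head_default <- getS3method("head", "default")
head_default

# The retrieved method is an ordinary function and can be called as such.
head_default(1:10)
```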

Defining a new S3 method is as simple as defining a function and naming it accordingly. Here we define a method head.green.
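
The original definition is not reproduced here. The following is one possible sketch of such a method; the messages and the three-way branching are assumptions chosen to resemble the output shown in this section, not the author's code.

```r
head.green <- function(x, ...) {
  if (!inherits(x, "green")) {
    # Reached only when head.green() is called directly on a non-green
    # object; fall back to ordinary dispatch.
    cat("The object is not green!\n")
    return(utils::head(x, ...))
  }
  if (length(class(x)) > 1L) {
    # Another class follows "green": announce it and defer to its method.
    cat("This is a green ", class(x)[2L], ".\n", sep = "")
    NextMethod()
  } else {
    cat("This is a generic green object.\n")
    invisible(x)
  }
}

# A green matrix: dispatch goes to head.green first, then on to head.matrix.
green_mat <- matrix(1:45, nrow = 9)
class(green_mat) <- c("green", class(green_mat))
head(green_mat)

# An object whose only class is "green".
head(structure(1:10, class = "green"))

# Calling the method directly on an object that is not green.
head.green(1:10)
```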

Now we can test it under various conditions.

## [1] "green"  "matrix"
## This is a green matrix.
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1   10   19   28   37
## [2,]    2   11   20   29   38
## [3,]    3   12   21   30   39
## [4,]    4   13   22   31   40
## [5,]    5   14   23   32   41
## [6,]    6   15   24   33   42
## This a generic green object.
## The object is not green!
## [1] 1 2 3 4 5 6

In our definition of head.green, notice the use of NextMethod() to dispatch a method previously defined on one of the parent classes.

We can similarly define our own S3 generic functions via UseMethod().

Note that the first argument to both UseMethod() and NextMethod() should be a character string giving the name of the generic.
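
A minimal sketch of a user-defined generic follows. The generic describe() and its methods are hypothetical names invented for this illustration.

```r
# The generic does nothing but hand off to UseMethod().
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method is found.
describe.default <- function(x, ...) {
  cat("An object of class ", class(x)[1L], "\n", sep = "")
  invisible(x)
}

# Class-specific method that adds a message, then defers to the next method.
describe.green <- function(x, ...) {
  cat("A green object; deferring to the next method.\n")
  NextMethod()
}

describe(1:3)
describe(structure(1:3, class = "green"))
```

Because "green" is the only class on the second object, NextMethod() falls through to describe.default.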

As a quick, somewhat contrived, example of how we might use this, we could define a col_boxplot function to pick colors according to the class of the object passed.
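
The original col_boxplot code is not shown. A sketch of what such a generic could look like, assuming classes named "green" and "red" and using graphics::boxplot() for the drawing, is:

```r
# Generic: choose a boxplot colour from the class of x.
col_boxplot <- function(x, ...) UseMethod("col_boxplot")

# Default: draw with boxplot()'s own defaults.
col_boxplot.default <- function(x, ...) {
  graphics::boxplot(unclass(x), ...)
}

# Class-specific methods fix the fill colour.
col_boxplot.green <- function(x, ...) {
  graphics::boxplot(unclass(x), col = "green", ...)
}
col_boxplot.red <- function(x, ...) {
  graphics::boxplot(unclass(x), col = "red", ...)
}

# Example objects (hypothetical data).
x_green <- structure(c(2, 4, 4, 5, 7, 9), class = "green")
x_red <- structure(c(1, 3, 3, 6, 8, 8), class = "red")
```

Calling col_boxplot(x_green) or col_boxplot(x_red) then draws the box in the matching colour, and any other numeric vector gets the default appearance.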

You should be aware that the class of the object returned by some generic functions (especially primitives) can depend on the input class.

## [1] "green"
## [1] "red"
## [1] "numeric"
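
The calls behind this output are not shown. One set of calls that illustrates the point, assuming two hypothetical classes "green" and "red" with no arithmetic methods defined, is:

```r
x <- structure(c(1, 2, 3), class = "green")
y <- structure(c(4, 5, 6), class = "red")

# With no Ops methods defined, `+` copies attributes from both operands,
# and those of the first operand take precedence.
class(x + y)
class(y + x)

# sum() returns a bare number, so the class attribute is lost.
class(sum(x))
```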

Defining an S3 class

There are four common “styles” of S3 object:

  1. vector style S3 objects include the “factor” and “Date” classes which are built on atomic vectors and use attributes to add additional structure,
  2. scalar style S3 objects use a list to describe aspects of a single thing, e.g. the “lm” and “glm” classes,
  3. a record style S3 object such as used for the “POSIXlt” class has a fixed set of elements all of equal lengths that represent various aspects of each datum,
  4. a data.frame style object similarly has

The majority of S3 objects you will encounter are in the scalar style: they are simply lists plus a class attribute. As an example, consider the class lm returned by the lm function for linear regression modeling. We can find the definition of an lm object in the R documentation.
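
As a quick check of the "list plus a class attribute" description, the sketch below fits a small model on the built-in mtcars data (the choice of model is arbitrary) and inspects the result:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# The class attribute is what makes this an S3 object ...
class(fit)

# ... but underneath it is an ordinary list.
typeof(fit)
names(fit)

# Removing the class leaves just the list.
is.list(unclass(fit))
```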

The S4 System

The S3 system described above is very flexible, which makes it easy to work with, but this comes at the expense of the safety and uniformity of a more formal OO system.

The S4 system is a more formal OO system in R. One key difference is that S4 classes have formal definitions and classes, methods, and generics must all be explicitly defined as such. The functionality of the S4 object system comes from the (base) “methods” package.

Defining an S4 class

S4 classes are defined using the setClass function:

Create a new instance of an S4 class using new:

## An object of class "color_vector"
## Slot "name":
## [1] "x"
## 
## Slot "data":
## numeric(0)
## 
## Slot "color":
## [1] "darkgreen"
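
The class definition itself is not shown. Based on the slots in the output above, a definition and instantiation might look like the following sketch; the slot types are inferred from the printed values.

```r
library(methods)

# Declare the class with three typed slots.
setClass(
  "color_vector",
  slots = c(name = "character", data = "numeric", color = "character")
)

# Slots that are not supplied get an empty prototype, e.g. numeric(0).
obj <- new("color_vector", name = "x", color = "darkgreen")
obj
```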

The function new is used above as a constructor for creating an object with the desired class. Most S4 classes defined in packages you download have their own constructors which you should use when defined. We can create a default constructor by assigning the output of setClass a name:

## [1] "color_vector"
## attr(,"package")
## [1] ".GlobalEnv"
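
A sketch of the generator approach follows. The class definition is repeated so that the block runs on its own, and the slot types are inferred as before.

```r
library(methods)

# setClass() invisibly returns a generator function; keep it under a name.
color_vector <- setClass(
  "color_vector",
  slots = c(name = "character", data = "numeric", color = "character")
)

# The generator is called like any other constructor.
obj <- color_vector(name = "x", color = "darkgreen")
class(obj)
```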

You could also create an explicit constructor by writing a function that calls new and manipulates the object in some way, say by providing defaults for slots.

Accessing slots in an S4 object

You can access and set the slots of an S4 object using the @ operator, the slot function, or, because slots are stored as attributes, an attr(obj, 'name') construction:

## [1] "darkgreen"
## [1] "darkgreen"
## [1] "name"  "data"  "color" "class"
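
The three access routes can be sketched as follows, again with an assumed class definition repeated so the block is self-contained:

```r
library(methods)

setClass(
  "color_vector",
  slots = c(name = "character", data = "numeric", color = "character")
)
obj <- new("color_vector", name = "x", color = "darkgreen")

# Reading a slot: three equivalent routes.
obj@color
slot(obj, "color")
attr(obj, "color")

# The slots appear among the object's attributes, alongside the class.
names(attributes(obj))

# Writing a slot works with the same constructions.
obj@color <- "red"
slot(obj, "data") <- c(1, 2, 3)
```

The @ and slot() forms check that the new value matches the declared slot type; attr<- does not, so it is best avoided for assignment.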

In addition, authors of S4 classes often provide accessor functions to get at the most common slots. Here is an example:

## [1] "darkgreen"
## [1] "red"
## Warning in color(LETTERS): Object LETTERS is not of class color_vector.
## [1] "grey8"
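
The accessor itself is not shown. A sketch written as an ordinary function is given below; the warning text imitates the one in the output above, and returning NA for objects of the wrong class is an assumption of this sketch rather than the original behaviour.

```r
library(methods)

setClass(
  "color_vector",
  slots = c(name = "character", data = "numeric", color = "character")
)

# Accessor for the color slot, with a class check.
color <- function(obj) {
  if (!is(obj, "color_vector")) {
    warning("Object ", deparse(substitute(obj)),
            " is not of class color_vector.")
    return(NA_character_)  # assumed fallback value
  }
  obj@color
}

obj <- new("color_vector", name = "x", color = "darkgreen")
color(obj)
```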

Validator

A validator is a function that ensures an object is a valid member of a given class. Here is an example validator for our “color_vector” class.

## Class "color_vector" [in ".GlobalEnv"]
## 
## Slots:
##                                     
## Name:       name      data     color
## Class: character   numeric character
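
The validator code is not shown. One way to write it is with setValidity(), as sketched below; the specific rule (a single color name known to grDevices::colors()) is an assumption made for illustration.

```r
library(methods)

setClass(
  "color_vector",
  slots = c(name = "character", data = "numeric", color = "character")
)

# A validity function returns TRUE, or a character vector of problems.
setValidity("color_vector", function(object) {
  if (length(object@color) != 1L) {
    return("slot 'color' must have length 1")
  }
  if (!(object@color %in% grDevices::colors())) {
    return("slot 'color' must be a named R color")
  }
  TRUE
})

# A valid object is created as usual.
ok <- new("color_vector", name = "x", color = "darkgreen")

# An invalid one makes new() signal an error; capture its message.
bad <- tryCatch(
  new("color_vector", name = "x", color = "not-a-color"),
  error = function(e) conditionMessage(e)
)
bad
```

The validator runs when new() is given slot values and whenever validObject() is called; assigning with @ does not trigger it.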

S4 Methods

We can control how an object of class color_vector gets displayed by defining a show method (the S4 equivalent of print).

## An object of class "color_vector"
## Slot "name":
## [1] "Green Values"
## 
## Slot "data":
##  [1] -0.4553221  1.3953472  2.2242374  0.9066720  0.3874196 -1.0676045
##  [7]  0.7065391 -0.1764389  0.7440158  0.6381587
## 
## Slot "color":
## [1] "darkgreen"

Now, when we call show on an object of class color_vector R will use the custom method.

## name: Green Values, color: darkgreen
## 
## Data: num [1:10] -0.455 1.395 2.224 0.907 0.387 ...
## name: Green Values, color: darkgreen
## 
## Data: num [1:10] -0.455 1.395 2.224 0.907 0.387 ...
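
The method definition is not shown. A sketch of a show method that produces output in the format above follows; the data are random, so the numbers will differ from those printed here.

```r
library(methods)

setClass(
  "color_vector",
  slots = c(name = "character", data = "numeric", color = "character")
)

obj <- new("color_vector", name = "Green Values",
           data = rnorm(10), color = "darkgreen")

# show() takes a single argument, which must be called `object`.
setMethod("show", "color_vector", function(object) {
  cat("name: ", object@name, ", color: ", object@color, "\n\n", sep = "")
  cat("Data:")
  utils::str(object@data)  # compact one-line summary of the data slot
})

# Printing at the top level and calling show() both use the new method.
obj
show(obj)
```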

We could similarly define a method that allows the user to change the value in the color slot. This is a so-called “setter” function.

## [1] "color<-"
## [1] "purple"
## name: Green Values, color: purple
## 
## Data: num [1:10] -0.455 1.395 2.224 0.907 0.387 ...
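
A sketch of such a setter is given below. The generic name color<- matches the output above; the argument name obj and the call to validObject() are choices made for this sketch.

```r
library(methods)

setClass(
  "color_vector",
  slots = c(name = "character", data = "numeric", color = "character")
)

# Replacement functions are named `name<-`; the last argument must be `value`.
setGeneric("color<-", function(obj, value) standardGeneric("color<-"))

setMethod("color<-", "color_vector", function(obj, value) {
  obj@color <- value
  validObject(obj)  # re-check the object after modifying it
  obj               # a setter must return the modified object
})

obj <- new("color_vector", name = "Green Values",
           data = rnorm(10), color = "darkgreen")

color(obj) <- "purple"
obj@color
```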

Resources