Reading

Concurrent and Asynchronous Computing

Asynchronous computing refers to having events that occur independently of the primary control flow in our program.

In a traditional, synchronous program, each statement or expression blocks while evaluating. In other words, it forces the program to wait until it finishes before continuing. An asynchronous program, in contrast, has some statements that do not block, allowing the program to continue until either (1) the value of the earlier statement is needed or (2) execution resources such as CPU cores are exhausted.

In parallel programming we explicitly split portions of our program into chunks of code that are executed at the same time. In concurrent programming we only specify which chunks of code can be executed independently of the others; how they are scheduled is decided separately. A concurrent program can therefore be executed sequentially or in parallel.

Traditionally, concurrent programming has focused on I/O-bound tasks, where one is querying external servers or databases and would otherwise have to wait for each query to finish and return before sending the next request. Concurrency helps in this situation because it allows the program to wait in multiple queues at once. The video at this link explains how concurrency helps to load webpages more quickly.

Concurrent Programming with Futures in R

The R package future provides utilities that allow us to write concurrent programs using an abstraction known as a future. Quoting the package author,

In programming, a future is an abstraction for a value that may be available at some point in the future.

Once the future has resolved, its value becomes available immediately. If we request the value of a future that has not yet resolved, the request blocks, causing our program to wait until the value becomes available.
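The blocking behavior can be seen by timing two requests for the same value. The following is a minimal sketch, assuming the future package is installed and that two background sessions can be started; the 2-second sleep is an arbitrary stand-in for real work.

```r
library(future)

# Resolve futures in background R sessions (assumed setup for this sketch).
plan(multisession, workers = 2)

# The future starts evaluating in the background right away.
f <- future({
  Sys.sleep(2)  # stand-in for a slow computation
  "done"
})

# The first request blocks until the future has resolved.
first <- system.time(v1 <- value(f))[["elapsed"]]

# The future is now resolved, so the second request returns without waiting.
second <- system.time(v2 <- value(f))[["elapsed"]]

# Shut down the background sessions.
plan(sequential)
```

Comparing first and second should show that only the first request had to wait.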

Implicit and Explicit Futures

An implicit future can be created using the future assignment operator, future::%<-%.

Here is a pedagogical example.

First, using sequential code …

##    user  system elapsed 
##   0.000   0.000   8.005
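The code behind this timing is not reproduced here. A sketch of sequential code that should give a similar elapsed time, assuming two independent tasks that take about 5 and 3 seconds, might look like the following. The helper slow_value() is hypothetical and exists only for illustration.

```r
# Hypothetical helper: pause for `secs` seconds, then return `x`.
slow_value <- function(x, secs) {
  Sys.sleep(secs)
  x
}

timing <- system.time({
  a <- slow_value(1, 5)  # blocks for about 5 seconds
  b <- slow_value(2, 3)  # starts only after `a` has been computed
  total <- a + b
})
timing
```

Because each assignment blocks, the elapsed time is roughly the sum of the two sleeps.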

Now using implicit futures …

##    user  system elapsed 
##   0.014   0.001   5.019
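A sketch of how implicit futures could produce a timing like this one, under the same assumptions as before: two independent tasks of about 5 and 3 seconds, a hypothetical slow_value() helper, and two background sessions (the worker count is an assumption).

```r
library(future)

# Resolve futures in two background R sessions.
plan(multisession, workers = 2)

# Hypothetical helper: pause for `secs` seconds, then return `x`.
slow_value <- function(x, secs) {
  Sys.sleep(secs)
  x
}

timing <- system.time({
  a %<-% slow_value(1, 5)  # returns immediately; evaluation runs in the background
  b %<-% slow_value(2, 3)  # also returns immediately
  total <- a + b           # blocks here until both values are available
})
timing

# Shut down the background sessions.
plan(sequential)
```

The two tasks overlap, so the elapsed time should be close to the longer of the two sleeps rather than their sum.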

We can also create explicit futures using the future() function and then call value() to query the result.

##    user  system elapsed 
##   0.013   0.000   5.013
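A sketch of the explicit form that should behave similarly, again assuming two tasks of about 5 and 3 seconds, a hypothetical slow_value() helper, and two background sessions:

```r
library(future)

# Resolve futures in two background R sessions.
plan(multisession, workers = 2)

# Hypothetical helper: pause for `secs` seconds, then return `x`.
slow_value <- function(x, secs) {
  Sys.sleep(secs)
  x
}

timing <- system.time({
  f_a <- future(slow_value(1, 5))  # future object; evaluation starts in the background
  f_b <- future(slow_value(2, 3))
  total <- value(f_a) + value(f_b)  # value() blocks until each future has resolved
})
timing

# Shut down the background sessions.
plan(sequential)
```

Unlike the implicit form, f_a and f_b here are future objects, so they can be stored in lists or passed to functions before their values are requested.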

For further examples, consider

Controlling how futures are resolved

In the code above, we called plan(multisession) to specify that we want futures to be resolved in independent background R sessions. The other options we will explore are sequential and multicore.

Refer to the example script here.

What we learn:

  • background sessions are initiated when we call plan(multisession) and persist until we change the plan,

  • from the RStudio GUI, plan(multicore) falls back to plan(sequential),

  • in batch mode or from a command line interface, plan(multicore) uses forked processes, like mclapply with mc.preschedule = FALSE,

  • messages are captured and printed to the console when requesting values.
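Some of these points can be checked directly. The sketch below is not the example script referred to above; it is a small illustration, assuming the future package is installed, that compares process IDs under two plans and shows when a message appears.

```r
library(future)

# Is forked processing available in this session?
# Typically FALSE on Windows and inside RStudio.
can_fork <- supportsMulticore()

# Background sessions are started when the plan is set.
plan(multisession, workers = 2)
n_workers <- nbrOfWorkers()

f <- future({
  message("hello from process ", Sys.getpid())
  Sys.getpid()
})

# The message is relayed to the console when the value is requested.
worker_pid <- value(f)

# Under the sequential plan, futures are evaluated in the main R process.
plan(sequential)
f_seq <- future(Sys.getpid())
same_process <- value(f_seq) == Sys.getpid()
```

Under multisession, worker_pid should differ from the main session's process ID; under sequential, the two should match.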

You can find additional examples here.

Using Futures

The concept of a future provides a useful tool for expressing the dependencies in the code we write. While futures can be used for explicit parallelism, they also offer additional flexibility by not “blocking” until necessary. This can be useful even when first writing scripts for data analysis, as it allows you to continue to explore data in the main session while waiting for long-running processes to conclude in the background.
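As a rough illustration of this workflow, the sketch below starts a slow computation in the background, keeps working in the main session, and uses resolved() to check on the future without blocking. The data and the 2-second sleep are made up for the example.

```r
library(future)

plan(multisession, workers = 2)

x <- 1:100  # made-up data for illustration

# Start a slow computation in the background.
slow_mean <- future({
  Sys.sleep(2)  # stand-in for a long-running model fit
  mean(x)
})

# The main session is free in the meantime.
quick_range <- range(x)

# resolved() checks on the future without blocking.
checks <- 0
while (!resolved(slow_mean)) {
  checks <- checks + 1
  Sys.sleep(0.25)
}

# The future has resolved, so this returns right away.
result <- value(slow_mean)

plan(sequential)
```

In an interactive session the polling loop would usually be replaced by whatever exploration you want to do while the future runs.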

For examples, refer to the course repo.