Process-based Parallelism with Multiprocessing¶

Stats 507, Fall 2021

James Henderson, PhD
November 16, 2021

Overview¶

  • Parallel & Asynchronous Computing
  • multiprocessing
  • Pipes and Queues
  • Pool
  • Background Tasks
  • asyncio
  • Random Numbers
  • Takeaways

MP Demo¶

  • These slides are intended to be presented/read alongside the multiprocess demo from the course repo.
  • The demo relies on a number of functions defined in cv_funcs.py.

Parallel Computing in Data Science¶

  • Many core data science methods are trivially parallel -- composed of a collection of independent tasks:
    • Monte Carlo approximations,
    • Bootstrap replication and other resampling methods,
    • Cross-validation,
    • Bagging estimators such as a random forest.

Built-in Parallelism¶

  • A number of functions (e.g. sklearn estimators) have built-in support for parallel computation:
    • LogisticRegressionCV() using n_jobs parameter,
    • RandomForestClassifier() using n_jobs parameter.
  • Prefer built-in methods when available.
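  • As an illustration (not part of the course demo), the cell below fits a random forest on simulated data using four worker processes via n_jobs.
In [ ]:
# hedged sketch: parallel fitting via the built-in n_jobs parameter
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1_000, random_state=42)
rf = RandomForestClassifier(n_estimators=500, n_jobs=4, random_state=42)
rf.fit(X, y)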

Asynchronous Computing¶

  • Asynchronous computing refers to having events that occur independently of the primary control flow in our program.
  • In a traditional, synchronous program each statement or expression blocks while evaluating -- it forces the program to wait until it completes.
  • An asynchronous program has some statements that do not block -- allowing the control flow to continue until either:
    • the value of the non-blocking statement is needed, or
    • execution resources such as CPU cores are exhausted.

Concurrent Programs for I/O bound tasks¶

  • Traditionally, concurrent programming has focused on I/O bound tasks.
  • When querying external servers or databases, a program would otherwise have to wait for each query to finish and return before sending the next request.
  • Concurrency helps in this situation because it allows the program to wait in multiple queues at once.

Parallel Computing¶

  • Modern computers, including laptops and desktops, have multiple processors or cores.
  • A parallel program takes advantage of this architecture to complete more than one task at a time -- reducing the "wall time" a CPU-bound program takes to run.
  • Concurrency, including parallelism, can be implemented using threads, processes, futures, or other abstractions.

Parallelism is not Magic¶

  • When thinking of parallelizing some portion of a program, remember that parallelism is not magic.
  • There is some computational overhead involved in splitting the task, initializing child processes, communicating data, and collating results.
  • For this reason, there is usually little to be gained in parallelizing already fast computations.
  • An overly parallelized program incurs more overhead than is needed to make full use of the available resources.

Vectorization > Parallelism¶

  • Writing vectorized code is often more efficient than writing parallel code.
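  • For example, the vectorized NumPy version below computes a sum of squares in a single call, while the pure Python loop pays interpreter overhead on every element.
In [ ]:
# sketch: vectorized vs. looped sum of squares over one million values
import numpy as np

x = np.random.default_rng(42).normal(size=10 ** 6)
total_loop = sum(xi ** 2 for xi in x)  # pure Python loop
total_vec = float(np.sum(x ** 2))      # single vectorized call, much faster
np.isclose(total_loop, total_vec)      # same answer either way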

multiprocessing¶

  • The built-in multiprocessing module provides process-based parallelism.
  • Other modules in the standard library that support parallelism and asynchronous computations include:
    • concurrent.futures,
    • threading,
    • asyncio.
In [ ]:
import multiprocessing as mp
import cv_funcs as cvf

Process¶

  • Create a child process using mp.Process().
  • The Process object's .start() method spawns a new Python process.
  • On Unix, child processes can be started by forking, which is efficient.
  • A forked child process has read-only access to the objects in the parent process's namespace.
  • On Windows, and on macOS in recent Python versions, only the "spawn" option, which creates an independent process, is supported.

Process¶

  • The target argument specifies the callable to be run when the process has been initialized.
  • The args and kwargs parameters are used to pass arguments to the callable passed to target.
  • A Process should be set up and started within a "main guard" (if __name__ == '__main__':).
  • In an interactive session, locally defined functions will not be recognized by child processes started with "spawn".
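  • A minimal sketch (count_sq is a stand-in task, not a function from cv_funcs.py); run it as a script so the target is recognized:
In [ ]:
def count_sq(n):
    # stand-in task: print the sum of squares below n
    print(sum(i ** 2 for i in range(n)))

if __name__ == '__main__':
    p = mp.Process(target=count_sq, args=(10 ** 6,))
    p.start()  # spawns (or forks) a new Python process running count_sq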

Process¶

  • To block until a child process has completed, call its .join() method.
  • Joining also lets the finished child process shut down.
  • Call the .close() method to release the resources of finished ("zombie") processes.
  • Use mp.active_children() to see active child processes.
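  • Continuing the sketch above:
In [ ]:
if __name__ == '__main__':
    p.join()    # block until count_sq completes
    p.close()   # release the finished process's resources
    print(mp.active_children())  # [] once all children are joined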

Pipes and Queues¶

When using multiple processes, one generally uses message passing for communication between processes and avoids having to use any synchronization primitives like locks.

For passing messages one can use Pipe() (for a connection between two processes) or a queue (which allows multiple producers and consumers).

-- Python multiprocessing docs

Queues¶

  • For trivially parallel tasks, use queues, which easily generalize to multiple processes.
  • A Queue() is implemented using a Pipe() but handles synchronization implicitly.
  • Create a Queue using mp.Queue() with (optionally) a maximum size.

Queues¶

  • Producer processes use a queue's .put() method to add items to the queue.
  • Consumer processes use a queue's .get() method to retrieve an item from the queue.
  • Both have optional block and timeout arguments.
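  • A single-process sketch of the put/get interface:
In [ ]:
q = mp.Queue(maxsize=2)
q.put('task 1')
q.put('task 2', block=True, timeout=5)  # wait up to 5s if the queue is full
q.get()                                 # 'task 1' (FIFO order)
q.get(block=True, timeout=5)            # 'task 2'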

Queue Pattern¶

  • We'll follow a pattern which uses two queues:
    • task_queue sends tasks from the parent process to child processes,
    • done_queue sends results from the child processes to the parent process.
  • In this case, task_queue has a single producer and (potentially) multiple consumers.
  • The done_queue has (potentially) multiple producers and a single consumer.

Queue Pattern¶

  • In the pattern we'll use two key functions:
    • worker() iterates over tasks in the queue until it receives the sentinel to stop ('STOP');
    • calculate() takes a tuple representing the task and calls its callable with the unpacked arguments.
  • We encapsulate the full pattern into mp_apply().
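  • A hedged sketch of this pattern; the actual worker(), calculate(), and mp_apply() are defined in cv_funcs.py and may differ in detail:
In [ ]:
def calculate(func, args):
    # a task is a (callable, arguments) tuple; run it
    return func(*args)

def worker(task_queue, done_queue):
    # consume tasks until the 'STOP' sentinel is received
    for func, args in iter(task_queue.get, 'STOP'):
        done_queue.put(calculate(func, args))

def mp_apply(tasks, n_processes=2):
    task_queue, done_queue = mp.Queue(), mp.Queue()
    for task in tasks:
        task_queue.put(task)
    for _ in range(n_processes):
        task_queue.put('STOP')  # one sentinel per worker
        mp.Process(target=worker, args=(task_queue, done_queue)).start()
    # results arrive in completion order, not submission order
    return [done_queue.get() for _ in tasks]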

Pool¶

  • A pool of worker processes can be setup with mp.Pool().
  • The Pool object has several methods for dispatching work to these child/worker processes.
  • The most straightforward is .map() which takes a function and an iterable.

Pool¶

  • A Pool object must be explicitly closed using .close(); once closed, its worker processes exit after finishing the tasks already assigned.
  • A tidy way to ensure this implicitly is to use a with statement.
  • The .join() method can be used after .close() to block until all tasks are completed.
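  • A sketch of the with-statement pattern (square is a stand-in task):
In [ ]:
def square(x):
    return x ** 2

if __name__ == '__main__':
    # the pool is cleaned up automatically when the with block exits
    with mp.Pool(processes=4) as pool:
        print(pool.map(square, range(10)))  # [0, 1, 4, ..., 81]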

Chunking¶

  • The .map() method accepts an argument chunksize to determine how tasks are assigned to workers.
  • Larger chunks result in less communication overhead.
  • For tasks with predictable and low-variance run time, best to chunk so each worker processes a single chunk.
  • For tasks with high-variance or long-tailed run times, better to use a smaller chunk size to keep all workers busy for as long as possible.
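  • For instance, reusing square() from the previous cell:
In [ ]:
if __name__ == '__main__':
    with mp.Pool(processes=4) as pool:
        # 1000 uniform tasks over 4 workers: chunksize=250 gives each
        # worker exactly one chunk, minimizing communication overhead
        out = pool.map(square, range(1000), chunksize=250)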

Star maps¶

  • A Pool object's .starmap() method can be used to parallelize function calls over more than one argument.
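  • A sketch (power is a stand-in for any two-argument callable):
In [ ]:
def power(base, exp):
    return base ** exp

if __name__ == '__main__':
    with mp.Pool(processes=2) as pool:
        print(pool.starmap(power, [(2, 3), (3, 2), (10, 0)]))  # [8, 9, 1]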

Background task(s)¶

  • During model and notebook development, long-running tasks can interrupt our flow by blocking the active process (kernel).
  • Running these tasks asynchronously ("in the background") using a non-blocking workflow can help us to be more productive.
  • We implement this concept using the "queue" pattern in the functions bg_task() and bg_get().
  • See also Pool.map_async().
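  • A hedged sketch using Pool.map_async(); the course's bg_task() and bg_get() wrap the queue pattern instead and may differ in detail:
In [ ]:
if __name__ == '__main__':
    pool = mp.Pool(processes=2)
    result = pool.map_async(square, range(10))  # returns immediately
    # ... keep working in the main process while the pool computes ...
    print(result.get(timeout=60))  # block only once the values are needed
    pool.close()
    pool.join()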

asyncio¶

  • The asyncio module is designed for writing concurrent I/O operations -- it is particularly useful for working with websites.
  • There are three key concepts:
    • Define asynchronous, non-blocking functions using async def.
    • Block on and retrieve results using await.
    • Every asynchronous function call must be awaited.
  • You can think of a coroutine as a value that will be available at some point in the future.
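  • A minimal sketch, with asyncio.sleep() standing in for real I/O such as a web request:
In [ ]:
import asyncio

async def fetch(i):
    await asyncio.sleep(1)  # stand-in for real I/O
    return i

async def main():
    # the three "requests" wait concurrently: ~1s total rather than ~3s
    return await asyncio.gather(fetch(1), fetch(2), fetch(3))

await main()  # top-level await works in a notebook; use asyncio.run(main()) in a script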

Background Tasks using asyncio¶

  • We implement the "background tasks" pattern using asyncio in async_task().
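  • A hedged sketch of the idea (the actual async_task() in cv_funcs.py may differ); this assumes a notebook with a running event loop and fetch() from the previous cell:
In [ ]:
task = asyncio.create_task(fetch(42))  # schedule fetch() in the background
# ... keep working in the notebook while the task runs ...
await task  # block and retrieve the result only when it is needed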

Random Numbers¶

  • Many statistical and machine learning applications rely on pseudo-random numbers for things like sampling from distributions and stochastic optimization, e.g. bootstrap, Monte Carlo.
  • Care needs to be taken to ensure random number streams behave as expected when using parallel computations.
  • There are issues of both reproducibility and (pseudo-)independence.
  • Read more about this at https://numpy.org/doc/stable/reference/random/parallel.html
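  • The approach recommended there: spawn independent child seeds from a single SeedSequence, one per process.
In [ ]:
from numpy.random import SeedSequence, default_rng

ss = SeedSequence(20211116)
child_seeds = ss.spawn(4)                     # one independent seed per worker
rngs = [default_rng(s) for s in child_seeds]  # reproducible and independent
[rng.normal() for rng in rngs]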

Takeaways¶

  • Many data science methods can be trivially parallelized.
  • The multiprocessing module provides process-based parallelism.
  • Reduce run-time by spreading computations across multiple Python processes.
  • Run long-running code in the background during development.
  • Use built-in methods for parallel computing when available.