Stats 507, Fall 2021
James Henderson, PhD
November 16, 2021
multiprocess
demo from the course repo. cv_funcs.py
. LogisticRegressionCV()
using n_jobs
parameter,RandomForestClassifier()
using n_jobs
parameter.multiprocessing
¶multiprocessing
module provides process-based parallelism.concurrent.futures
,threading
,asyncio
. import multiprocessing as mp
import cv_funcs as cvf
mp.Process()
. Process
object's .start()
method spawns a new Python process.target
argument is used to define a callable to be run when the
process has been initialized. args
and kwargs
parameters are used to pass arguments to the
callable passed to target
. Process
should be setup and started within a "main gate". .join()
method..close()
method to shutdown zombie processes. mp.active_children()
to see active child processes. When using multiple processes, one generally uses message passing for communication between processes and avoids having to use any synchronization primitives like locks.
For passing messages one can use Pipe() (for a connection between two processes) or a queue (which allows multiple producers and consumers).
--docs
Queue()
is implemented using a Pipe()
but handles synchronization
implicitly. Queue
using mp.Queue()
with (optionally) a maximum size. .put()
method to enter items into the
queue..get()
method to accept an item from
the queue. block
and timeout
arguments. task_queue
sends tasks from the parent process to child processes,done_queue
sends results from the child processes to the parent
process. task_queue
has a single producer and (potentially)
multiple consumers.done_queue
has (potentially) multiple producers and a single
consumer. worker()
iterates over tasks in the queue until it receives the
sentinel to stop ('STOP');calculate()
takes the tuple represent the task and calls the callable
with the unpacked arguments.mp_apply()
. mp.Pool()
. Pool
object has several methods for dispatching work to these
child/worker processes..map()
which takes a function and an iterable.Pool
object must be explicitly closed using .close()
which will wait
for assigned processes to close.with
statement. .join()
method can be used after .close()
to block until all
tasks are completed. .map()
method accepts an argument chunksize
to determine how
tasks are assigned to workers. Pool
object's .starmap()
method can be used to parallelize function
calls over more than one argument. bg_task()
and bg_get()
.Pool.map_async()
. async def
.await
,async_task()
.