Python3 Standard Library: multiprocessing manages processes like threads

Keywords: Python

1. multiprocessing manages processes like threads

The multiprocessing module provides an API based on the threading API, but it divides the work among multiple processes. In some cases, multiprocessing can be used as a drop-in replacement for threading to take advantage of multiple CPU cores and avoid the computational bottleneck caused by Python's global interpreter lock (GIL).
Because of this similarity between the multiprocessing and threading modules, the first few examples here are adapted from the threading examples. Features provided by multiprocessing but not available in threading are described later.

1.1 multiprocessing Basics

The easiest way to create a second process is to instantiate a Process object with a target function and call start() to get it working.

import multiprocessing

def worker():
    """worker function"""
    print('Worker')

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()

The word "Worker" is printed five times in the output, although the exact order is not guaranteed because it depends on execution timing: each process competes for access to the output stream.

In most cases it is more useful to pass arguments to tell a process what to do when it is created. Unlike with threading, in order to pass arguments to a multiprocessing Process, the arguments must be able to be serialized with pickle. The following example passes each worker process a number to be printed.

import multiprocessing

def worker(num):
    """thread worker function"""
    print('Worker:', num)

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()

Integer parameters are now included in messages printed by each worker process.

1.2 Importable Target Functions

One difference between the threading and multiprocessing examples is the extra protection for __main__. Because of the way new processes are started, the child process needs to be able to import the script containing the target function. Wrapping the main part of the application in a check for __name__ == '__main__' ensures that it is not run recursively in each child as the module is imported. An alternative approach is to import the target function from a separate script.

import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(
            target=worker,
        )
        jobs.append(p)
        p.start()

Calling the main program produces output similar to the first example.
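
To illustrate that alternative, the worker function can live in a separate module that the main script imports. This is a minimal sketch only; the module name worker_module is an assumption used for illustration, not part of the original example.

# worker_module.py (hypothetical separate script)
def worker():
    """worker function"""
    print('Worker')

# main script: import the target function instead of defining it inline
import multiprocessing

from worker_module import worker

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()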

1.3 Identify the current process

Passing arguments to identify or name a process is cumbersome and unnecessary. Each Process instance has a name, with a default value that can be changed when the process is created. Naming processes is useful for keeping track of them, especially in applications with multiple types of processes running simultaneously.

import multiprocessing
import time

def worker():
    name = multiprocessing.current_process().name
    print(name, 'Starting')
    time.sleep(2)
    print(name, 'Exiting')

def my_service():
    name = multiprocessing.current_process().name
    print(name, 'Starting')
    time.sleep(3)
    print(name, 'Exiting')

if __name__ == '__main__':
    service = multiprocessing.Process(
        name='my_service',
        target=my_service,
    )
    worker_1 = multiprocessing.Process(
        name='worker 1',
        target=worker,
    )
    worker_2 = multiprocessing.Process(  # default name
        target=worker,
    )

    worker_1.start()
    worker_2.start()
    service.start()

The output includes the name of the current process on each line. The lines with Process-3 in the name column correspond to the unnamed process worker_2.

1.4 Daemon

By default, the main program does not exit until all of its child processes have exited. In some cases it is useful to start a background process that runs continuously without blocking the main program from exiting, for example when there is no easy way to interrupt the worker, or when letting it die partway through its work does not lose or corrupt data (such as a task that generates "heartbeats" for a service monitoring tool). Daemon processes are useful for such services.

To mark a process as a daemon, set its daemon attribute to True. Processes are not daemons by default.

import multiprocessing
import time
import sys

def daemon():
    p = multiprocessing.current_process()
    print('Starting:', p.name, p.pid)
    sys.stdout.flush()
    time.sleep(2)
    print('Exiting :', p.name, p.pid)
    sys.stdout.flush()

def non_daemon():
    p = multiprocessing.current_process()
    print('Starting:', p.name, p.pid)
    sys.stdout.flush()
    print('Exiting :', p.name, p.pid)
    sys.stdout.flush()


if __name__ == '__main__':
    d = multiprocessing.Process(
        name='daemon',
        target=daemon,
    )
    d.daemon = True

    n = multiprocessing.Process(
        name='non-daemon',
        target=non_daemon,
    )
    n.daemon = False

    d.start()
    time.sleep(1)
    n.start()

There is no "Exiting" message from the daemon in the output because all non-daemon processes (including the main program) have exited before the daemon wakes up from its 2-second sleep.

The daemon process is terminated automatically before the main program exits, to avoid leaving "orphaned" processes running. To verify this, look for the process id value printed while the program runs and check for that process with a command such as ps.

1.5 Waiting for Processes

To wait for a process to complete its work and exit, you can use the join() method.

import multiprocessing
import time

def daemon():
    name = multiprocessing.current_process().name
    print('Starting:', name)
    time.sleep(2)
    print('Exiting :', name)

def non_daemon():
    name = multiprocessing.current_process().name
    print('Starting:', name)
    print('Exiting :', name)

if __name__ == '__main__':
    d = multiprocessing.Process(
        name='daemon',
        target=daemon,
    )
    d.daemon = True

    n = multiprocessing.Process(
        name='non-daemon',
        target=non_daemon,
    )
    n.daemon = False

    d.start()
    time.sleep(1)
    n.start()

    d.join()
    n.join()

Since the main process uses join() to wait for the daemon to exit, this time an "Exiting" message will be printed.

By default, join() blocks indefinitely. It is also possible to pass a timeout argument to join() (a float representing the number of seconds to wait for the process to become inactive). If the process does not complete within the timeout period, join() returns anyway.

import multiprocessing
import time

def daemon():
    name = multiprocessing.current_process().name
    print('Starting:', name)
    time.sleep(2)
    print('Exiting :', name)

def non_daemon():
    name = multiprocessing.current_process().name
    print('Starting:', name)
    print('Exiting :', name)

if __name__ == '__main__':
    d = multiprocessing.Process(
        name='daemon',
        target=daemon,
    )
    d.daemon = True

    n = multiprocessing.Process(
        name='non-daemon',
        target=non_daemon,
    )
    n.daemon = False

    d.start()
    n.start()

    d.join(1)
    print('d.is_alive()', d.is_alive())
    n.join()

Since the timeout passed in is less than the amount of time the daemon sleeps, the process is still "alive" after join() returns.

1.6 Terminating Processes

Although it is better to use the poison pill method of signaling to a process that it should exit, it can be useful to kill a process forcibly if it appears hung or deadlocked. Calling terminate() on a process object terminates the child process.

import multiprocessing
import time

def slow_worker():
    print('Starting worker')
    time.sleep(0.1)
    print('Finished worker')

if __name__ == '__main__':
    p = multiprocessing.Process(target=slow_worker)
    print('BEFORE:', p, p.is_alive())

    p.start()
    print('DURING:', p, p.is_alive())

    p.terminate()
    print('TERMINATED:', p, p.is_alive())

    p.join()  # join after terminate() so the process object's status is updated
    print('JOINED:', p, p.is_alive())

1.7 Process Exit Status

The status code produced when the process exits can be accessed via the exitcode attribute. The table below lists the ranges of values this attribute can take.

Exit Code    Meaning
== 0         no error occurred
> 0          the process had an error, and exited with that code
< 0          the process was killed with a signal of -1 * exitcode

import multiprocessing
import sys
import time

def exit_error():
    sys.exit(1)

def exit_ok():
    return

def return_value():
    return 1

def raises():
    raise RuntimeError('There was an error!')

def terminated():
    time.sleep(3)

if __name__ == '__main__':
    jobs = []
    funcs = [
        exit_error,
        exit_ok,
        return_value,
        raises,
        terminated,
    ]
    for f in funcs:
        print('Starting process for', f.__name__)
        j = multiprocessing.Process(target=f, name=f.__name__)
        jobs.append(j)
        j.start()

    jobs[-1].terminate()

    for j in jobs:
        j.join()
        print('{:>15}.exitcode = {}'.format(j.name, j.exitcode))

The process that generated the exception automatically gets exitcode 1.

1.8 Logging

When debugging concurrency issues, it can be useful to have access to the internal state of the objects provided by multiprocessing. Logging can be enabled with a convenient module-level function called log_to_stderr(). It sets up a logger object using logging and adds a handler so that log messages are sent to the standard error channel.

import multiprocessing
import logging
import sys


def worker():
    print('Doing some work')
    sys.stdout.flush()


if __name__ == '__main__':
    multiprocessing.log_to_stderr(logging.DEBUG)
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()

By default, the logging level is set to NOTSET, so no messages are produced. Pass a different level to initialize the logger to the level of detail desired.

To manipulate the logger directly (to change its level setting or add handlers), use get_logger().

import multiprocessing
import logging
import sys

def worker():
    print('Doing some work')
    sys.stdout.flush()

if __name__ == '__main__':
    multiprocessing.log_to_stderr()
    logger = multiprocessing.get_logger()
    logger.setLevel(logging.INFO)
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()

The logger can also be configured through the logging configuration file API, using the name "multiprocessing".
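
For example, here is a minimal sketch that configures the "multiprocessing" logger with logging.config.dictConfig instead of calling log_to_stderr(). The handler name and levels are illustrative choices, not part of the original example.

import logging
import logging.config
import multiprocessing
import sys

def worker():
    print('Doing some work')
    sys.stdout.flush()

if __name__ == '__main__':
    # Create the "multiprocessing" logger so the module routes its
    # internal messages through the standard logging machinery.
    multiprocessing.get_logger()
    # Configure that logger by name.
    logging.config.dictConfig({
        'version': 1,
        'disable_existing_loggers': False,
        'handlers': {
            'stderr': {
                'class': 'logging.StreamHandler',
                'level': 'DEBUG',
            },
        },
        'loggers': {
            'multiprocessing': {
                'handlers': ['stderr'],
                'level': 'INFO',
            },
        },
    })
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()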

1.9 Subclassing Process

Although the simplest way to start a job in a separate process is to use Process and pass a target function, it is also possible to use a custom subclass.

import multiprocessing

class Worker(multiprocessing.Process):

    def run(self):
        print('In {}'.format(self.name))
        return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = Worker()
        jobs.append(p)
        p.start()
    for j in jobs:
        j.join()

Derived classes should override run() to complete their work.

1.10 Delivering messages to processes

As with threads, a common usage pattern for multiple processes is to divide a job among several workers that run in parallel. Effective use of multiple processes usually requires some communication between them, so that work can be divided and results can be aggregated. A simple way to communicate between processes with multiprocessing is to use a Queue to pass messages back and forth. Any object that can be serialized with pickle can pass through a Queue.

import multiprocessing

class MyFancyClass:

    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print('Doing something fancy in {} for {}!'.format(
            proc_name, self.name))

def worker(q):
    obj = q.get()
    obj.do_something()

if __name__ == '__main__':
    queue = multiprocessing.Queue()

    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()

    queue.put(MyFancyClass('Fancy Dan'))

    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()

This small example simply sends a message to a worker process, and the main process waits for the worker process to complete.

A more complex example shows how to manage several worker processes that consume data from a JoinableQueue and pass results back to the parent process. The poison pill technique is used to stop the workers: after setting up the real tasks, the main program adds one "stop" value per worker to the job queue. When a worker encounters the special value, it breaks out of its processing loop. The main process uses the task queue's join() method to wait for all of the tasks to finish before it starts processing the results.

import multiprocessing
import time

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        proc_name = self.name
        while True:
            next_task = self.task_queue.get()
            if next_task is None:
                # Poison pill means shutdown
                print('{}: Exiting'.format(proc_name))
                self.task_queue.task_done()
                break
            print('{}: {}'.format(proc_name, next_task))
            answer = next_task()
            self.task_queue.task_done()
            self.result_queue.put(answer)

class Task:

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __call__(self):
        time.sleep(0.1)  # pretend to take time to do the work
        return '{self.a} * {self.b} = {product}'.format(
            self=self, product=self.a * self.b)

    def __str__(self):
        return '{self.a} * {self.b}'.format(self=self)

if __name__ == '__main__':
    # Establish communication queues
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()

    # Start consumers
    num_consumers = multiprocessing.cpu_count() * 2
    print('Creating {} consumers'.format(num_consumers))
    consumers = [
        Consumer(tasks, results)
        for i in range(num_consumers)
    ]
    for w in consumers:
        w.start()

    # Enqueue jobs
    num_jobs = 10
    for i in range(num_jobs):
        tasks.put(Task(i, i))

    # Add a poison pill for each consumer
    for i in range(num_consumers):
        tasks.put(None)

    # Wait for all of the tasks to finish
    tasks.join()

    # Start printing results
    while num_jobs:
        result = results.get()
        print('Result:', result)
        num_jobs -= 1

Although jobs are queued sequentially, their execution is parallel, so the order in which they are completed cannot be guaranteed.

1.11 Signaling Between Processes

The Event class provides a simple way to communicate state information between processes. An event can be toggled between set and unset states. Users of the event object can wait for it to change from unset to set, using an optional timeout value.

import multiprocessing
import time

def wait_for_event(e):
    """Wait for the event to be set before doing anything"""
    print('wait_for_event: starting')
    e.wait()
    print('wait_for_event: e.is_set()->', e.is_set())

def wait_for_event_timeout(e, t):
    """Wait t seconds and then timeout"""
    print('wait_for_event_timeout: starting')
    e.wait(t)
    print('wait_for_event_timeout: e.is_set()->', e.is_set())

if __name__ == '__main__':
    e = multiprocessing.Event()
    w1 = multiprocessing.Process(
        name='block',
        target=wait_for_event,
        args=(e,),
    )
    w1.start()

    w2 = multiprocessing.Process(
        name='nonblock',
        target=wait_for_event_timeout,
        args=(e, 2),
    )
    w2.start()

    print('main: waiting before calling Event.set()')
    time.sleep(3)
    e.set()
    print('main: event is set')

When the timeout expires, wait() returns without an error. The caller is responsible for checking the state of the event using is_set().

1.12 Control Resource Access

In situations where a single resource needs to be shared among multiple processes, a Lock can be used to avoid conflicting accesses.

import multiprocessing

def worker_with(lock, f):
    with lock:
        fs = open(f, "a+")
        fs.write('Lock acquired via with\n')
        fs.close()

def worker_no_with(lock, f):
    lock.acquire()
    try:
        fs = open(f, "a+")
        fs.write('Lock acquired directly\n')
        fs.close()
    finally:
        lock.release()

if __name__ == "__main__":
    f = "file.txt"

    lock = multiprocessing.Lock()
    w = multiprocessing.Process(target=worker_with, args=(lock, f))
    nw = multiprocessing.Process(target=worker_no_with, args=(lock, f))

    w.start()
    nw.start()

    w.join()
    nw.join()

In this example, if the two processes did not synchronize their access to the file with the lock, the messages written to it could be interleaved.

1.13 Synchronizing Operations

Condition objects can be used to synchronize parts of a workflow so that some parts run in parallel and others run in sequence, even if they are in different processes.

import multiprocessing
import time

def stage_1(cond):
    """perform first stage of work,
    then notify stage_2 to continue
    """
    name = multiprocessing.current_process().name
    print('Starting', name)
    with cond:
        print('{} done and ready for stage 2'.format(name))
        cond.notify_all()

def stage_2(cond):
    """wait for the condition telling us stage_1 is done"""
    name = multiprocessing.current_process().name
    print('Starting', name)
    with cond:
        cond.wait()
        print('{} running'.format(name))

if __name__ == '__main__':
    condition = multiprocessing.Condition()
    s1 = multiprocessing.Process(name='s1',
                                 target=stage_1,
                                 args=(condition,))
    s2_clients = [
        multiprocessing.Process(
            name='stage_2[{}]'.format(i),
            target=stage_2,
            args=(condition,),
        )
        for i in range(1, 3)
    ]

    for c in s2_clients:
        c.start()
        time.sleep(1)
    s1.start()

    s1.join()
    for c in s2_clients:
        c.join()

In this example, two processes run the second stage of a job in parallel, but only after the first stage is complete.

1.14 Controlling concurrent access to resources

Sometimes it is useful to allow more than one worker process to access a resource at the same time, while still limiting the total number. A Semaphore can be used to manage this.

import multiprocessing
import time

def worker(s, i):
    s.acquire()
    print(multiprocessing.current_process().name + " acquire")
    time.sleep(i)
    print(multiprocessing.current_process().name + " release")
    s.release()

if __name__ == "__main__":
    s = multiprocessing.Semaphore(2)
    for i in range(5):
        p = multiprocessing.Process(target=worker, args=(s, i * 2))
        p.start()

1.15 Managing Shared State

A Manager is responsible for coordinating shared state information among all of its users. In the following example, the worker processes all update a dictionary created by the manager.

import multiprocessing

def worker(d, key, value):
    d[key] = value

if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    d = mgr.dict()
    jobs = [
        multiprocessing.Process(
            target=worker,
            args=(d, i, i * 2),
        )
        for i in range(10)
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    print('Results:', d)

Because the dictionary was created by the manager, it is shared by all of the processes, and updates made in any one of them are seen by all of the others. In addition to dictionaries, the manager also supports lists.
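
As an illustration (not part of the original example), a list created by the manager behaves the same way. This minimal sketch reuses the same worker pattern:

import multiprocessing

def worker(shared_list, value):
    # Appends made through the proxy are visible to every process.
    shared_list.append(value)

if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    results = mgr.list()
    jobs = [
        multiprocessing.Process(target=worker, args=(results, i))
        for i in range(5)
    ]
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    print('Results:', results)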

1.16 Shared Namespace

In addition to dictionaries and lists, Manager can create a shared Namespace.

import multiprocessing

def producer(ns, event):
    ns.value = 'This is the value'
    event.set()

def consumer(ns, event):
    try:
        print('Before event: {}'.format(ns.value))
    except Exception as err:
        print('Before event, error:', str(err))
    event.wait()
    print('After event:', ns.value)

if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    namespace = mgr.Namespace()
    event = multiprocessing.Event()
    p = multiprocessing.Process(
        target=producer,
        args=(namespace, event),
    )
    c = multiprocessing.Process(
        target=consumer,
        args=(namespace, event),
    )

    c.start()
    p.start()

    c.join()
    p.join()

Any named value added to the Namespace is visible to all of the clients that receive the Namespace instance.

It is important to know that updates to the contents of mutable values in the namespace are not propagated automatically.

import multiprocessing

def producer(ns, event):
    # DOES NOT UPDATE GLOBAL VALUE!
    ns.my_list.append('This is the value')
    event.set()

def consumer(ns, event):
    print('Before event:', ns.my_list)
    event.wait()
    print('After event :', ns.my_list)

if __name__ == '__main__':
    mgr = multiprocessing.Manager()
    namespace = mgr.Namespace()
    namespace.my_list = []

    event = multiprocessing.Event()
    p = multiprocessing.Process(
        target=producer,
        args=(namespace, event),
    )
    c = multiprocessing.Process(
        target=consumer,
        args=(namespace, event),
    )

    c.start()
    p.start()

    c.join()
    p.join()

To update the list, it must be attached to the namespace object again.
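
A minimal sketch of one way to do that, changing only the producer from the previous example: copy the list out of the namespace, modify it, and assign it back so the manager propagates the change.

def producer(ns, event):
    # Work on a local copy, then re-assign the attribute so the updated
    # list is sent back through the manager and seen by other processes.
    my_list = ns.my_list
    my_list.append('This is the value')
    ns.my_list = my_list
    event.set()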

1.17 Process Pool

In simple cases where the work to be done can be broken up and distributed among workers independently, a Pool class can be used to manage a fixed number of worker processes. The return values from the jobs are collected and returned as a list. The pool arguments include the number of processes and a function to run when each task process is started (invoked once per child process).

import multiprocessing

def do_calculation(data):
    return data * 2

def start_process():
    print('Starting', multiprocessing.current_process().name)

if __name__ == '__main__':
    inputs = list(range(10))
    print('Input   :', inputs)

    builtin_outputs = list(map(do_calculation, inputs))
    print('Built-in:', builtin_outputs)

    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(
        processes=pool_size,
        initializer=start_process,
    )
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()  # no more tasks
    pool.join()  # wrap up current tasks

    print('Pool    :', pool_outputs)

The result of the map() method is functionally equivalent to that of the built-in map(), except that the individual tasks run in parallel. Since the pool processes its inputs in parallel, close() and join() can be used to synchronize the main process with the task processes and ensure proper cleanup.

By default, Pool creates a fixed number of worker processes and passes jobs to them until there are no more jobs. Setting the maxtasksperchild parameter tells the pool to restart a worker process after it finishes a few tasks, preventing long-running workers from consuming ever more system resources.

import multiprocessing

def do_calculation(data):
    return data * 2

def start_process():
    print('Starting', multiprocessing.current_process().name)

if __name__ == '__main__':
    inputs = list(range(10))
    print('Input   :', inputs)

    builtin_outputs = list(map(do_calculation, inputs))
    print('Built-in:', builtin_outputs)

    pool_size = multiprocessing.cpu_count() * 2
    pool = multiprocessing.Pool(
        processes=pool_size,
        initializer=start_process,
        maxtasksperchild=2,
    )
    pool_outputs = pool.map(do_calculation, inputs)
    pool.close()  # no more tasks
    pool.join()  # wrap up current tasks

    print('Pool    :', pool_outputs)

The pool restarts the workers when they reach their allotted task limit, even if there is no more work to do. In the output, eight workers are created even though there are only 10 tasks and each worker can complete two of them before being restarted.
