Multithreading and multiprocessing in Python, explained in detail


What are Python's support modules for concurrent programming?

  • threading: implements multithreading. It exploits the fact that the CPU and IO can work at the same time, so the CPU can switch to another task while an IO operation is in progress.
  • multiprocessing: implements multiprocessing, using multiple CPU cores to run computations truly in parallel.
  • asyncio: implements asynchronous IO, achieving asynchronous execution at function granularity in a single thread, again by overlapping CPU work with IO waits.
  • Lock: locks resources to prevent races and conflicting access.
  • Queue: passes data between threads and processes; the basis of the producer-consumer pattern.
  • Thread pool / process pool: simplify submitting tasks, waiting for completion, collecting results, and so on.
  • subprocess: starts external programs (such as an .exe) and interacts with their input and output.
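Most of these modules are demonstrated later in this article; subprocess is the one exception, so here is a minimal sketch (the child command is an arbitrary example):

import subprocess
import sys

# Start an external program, wait for it to finish, and capture its output.
# For portability the "external program" here is just another Python interpreter.
result = subprocess.run(
    [sys.executable, '-c', 'print("hello from a child process")'],
    capture_output=True, text=True,
)
print(result.stdout)      # hello from a child process
print(result.returncode)  # 0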
Why Python is slow

1. It is a dynamically typed language, interpreted while it executes.
2. Variable definitions carry no type restrictions, so data types must be checked at every step, which degrades performance.
3. There is no ahead-of-time step that translates source code into machine code. Machine code executes very quickly, but Python spends its time interpreting.
4. The GIL (Global Interpreter Lock) exists: in CPython only one thread can execute Python bytecode at a time, so multithreading cannot use multiple cores for CPU-bound work.
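A quick way to feel points 1-3 is to time the same arithmetic done in interpreted bytecode versus inside a C-implemented builtin; a rough sketch (absolute timings vary by machine):

import time

start = time.time()
total = 0
for i in range(10_000_000):
    total += i  # every iteration is interpreted bytecode with dynamic type checks
print('pure-Python loop:', time.time() - start)

start = time.time()
total = sum(range(10_000_000))  # the same summation inside a C-implemented builtin
print('builtin sum:', time.time() - start)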

CPU-bound computing & IO-bound computing

  • CPU-bound (CPU-intensive, compute-intensive): IO completes in a very short time while the CPU performs a large amount of computation; the telltale sign is very high CPU usage. Common scenarios: compression and decompression, regular-expression search.
  • IO-bound (I/O-intensive): the CPU spends most of its time waiting for IO (hard disk / memory / network) reads and writes; CPU usage stays low. Common scenarios: file processing, network requests, database reads and writes.

Multiprocessing vs. multithreading vs. coroutines: how to choose

  • multiprocessing. Advantage: can use multiple CPU cores for truly parallel computation. Shortcomings: heaviest resource usage; far fewer processes can be started than threads. Scenario: CPU-bound computing.
  • Multithreading (threading). Advantage: lighter than processes and uses fewer resources. Shortcomings: under the GIL, threads only interleave and cannot use multiple cores; the number of threads is limited and switching between them has overhead. Scenario: IO-bound computing with a modest number of simultaneous tasks.
  • Coroutines (asyncio). Advantages: minimal memory overhead; the largest number of concurrent tasks can be started. Shortcomings: library support is limited (aiohttp supports it, requests does not) and the code is more complex to write. Scenario: IO-bound computing with very many simultaneous tasks, where supporting libraries exist.

Overview of multiprocessing and multithreading in practice

1. The advantage of multiprocessing

# Check whether one hundred large numbers are prime (CPU-bound)

import math
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import ProcessPoolExecutor


PRIMES = [112272535095293] * 100

def timer(f):
    def inner():
        start_time = time.time()
        f()
        print(time.time() - start_time)
    return inner

def is_prime(n):
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    sqrt_n = int(math.floor(math.sqrt(n)))
    for i in range(3, sqrt_n + 1, 2):
        if n % i == 0:
            return False
    return True

@timer
def single_thread():
    for n in PRIMES:
        is_prime(n)

@timer
def multi_thread():
    with ThreadPoolExecutor() as pool:
        pool.map(is_prime, PRIMES)

@timer
def multi_process():
    with ProcessPoolExecutor() as pool:
        pool.map(is_prime, PRIMES)

if __name__ == '__main__':
    single_thread()
    multi_thread()
    multi_process()

# 54.2112915 single thread
# 55.0219151 multithreading
# 10.2812801 multiprocessing
# For CPU-bound computing, multithreading is actually a little slower than a single thread
# (the GIL lets only one thread execute bytecode at a time, and switching adds overhead),
# while multiprocessing is by far the fastest.
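One tuning knob worth knowing with ProcessPoolExecutor: Executor.map accepts a chunksize argument that batches items into each message sent to a worker process, which cuts inter-process overhead when there are many small tasks. A sketch reusing is_prime and PRIMES from the example above:

from concurrent.futures import ProcessPoolExecutor

if __name__ == '__main__':
    with ProcessPoolExecutor() as pool:
        # each worker receives tasks in batches of 10 instead of one at a time
        results = list(pool.map(is_prime, PRIMES, chunksize=10))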

2. Basic multithreading

# Speed comparison between a single thread and multiple threads

import time
import threading

# Task function
def run_task(task):
    print(f'Thread:{threading.current_thread().name}', 'Task in progress:', task)
    # Blocking for two seconds is used to simulate IO operations such as network requests
    time.sleep(2)


# Running five tasks in a single thread
def single_thread():
    start_time = time.time()
    for task in range(1, 6):
        run_task(task)
    print('<single_thread> cost:', time.time() - start_time)


# Multithreading runs five tasks
def multi_thread():
    start_time = time.time()
    threads = []
    # Five threads are created here. Too many threads adds switching overhead, so choose the thread count sensibly.
    for task in range(1, 6):
        threads.append(
            threading.Thread(target=run_task, args=(task,))
        )
    # Start thread
    for thread in threads:
        thread.start()
    # Thread waiting
    for thread in threads:
        thread.join()
    print('<multi_thread> cost', time.time() - start_time)


if __name__ == '__main__':
    single_thread()
    multi_thread()


-----------------
The result is as follows:

Thread:MainThread Task in progress: 1   # all five tasks run on the main thread
Thread:MainThread Task in progress: 2
Thread:MainThread Task in progress: 3
Thread:MainThread Task in progress: 4
Thread:MainThread Task in progress: 5
<single_thread> cost: 10.036925792694092
Thread:Thread-1 Task in progress: 1     # threads 1-5 were created
Thread:Thread-2 Task in progress: 2
Thread:Thread-3 Task in progress: 3
Thread:Thread-4 Task in progress: 4
Thread:Thread-5 Task in progress: 5
<multi_thread> cost 2.003706693649292
# The speedup is obvious. Of course, this example is only a simple demonstration; the right choice depends on the scenario.

3. Producer-consumer multithreading

import time
import threading
import queue

# Task function
def run_task(queue_obj):
    # While the task queue is not empty, fetch the next task
    while not queue_obj.empty():
        task = queue_obj.get()
        print(f'Thread:{threading.current_thread().name}', 'Task in progress:', task)
        # Blocking for two seconds is used to simulate IO operations such as network requests
        time.sleep(2)


def single_thread(queue_obj):
    start_time = time.time()
    run_task(queue_obj)
    print('<single_thread> cost:', time.time() - start_time)

# Multithreading: start five consumer threads that consume the produced data, i.e. the task data in queue_obj
def multi_thread(queue_obj):
    start_time = time.time()
    threads = []
    for task in range(1, 6):
        threads.append(
            threading.Thread(target=run_task, args=(queue_obj,))
        )
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    print('<multi_thread> cost', time.time() - start_time)


if __name__ == '__main__':
    # Act as the producer: fill two queues with task data
    queue_obj = queue.Queue()
    queue_obj2 = queue.Queue()
    for task in range(1, 6):
        queue_obj.put(task)
    single_thread(queue_obj)
    for task in range(1, 6):
        queue_obj2.put(task)
    multi_thread(queue_obj2)
# The results are the same as above
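One caveat about this pattern: checking empty() and then calling get() is not atomic, so with several consumers the queue can become empty between the two calls. A common alternative is a blocking get() plus a sentinel value that tells each consumer to stop. A minimal sketch (SENTINEL and consumer are illustrative names, not from the example above):

import queue
import threading

SENTINEL = object()  # unique stop marker

def consumer(q):
    while True:
        task = q.get()        # blocks until an item is available
        if task is SENTINEL:  # stop marker reached: exit cleanly
            break
        print(threading.current_thread().name, 'Task in progress:', task)

q = queue.Queue()
threads = [threading.Thread(target=consumer, args=(q,)) for _ in range(3)]
for t in threads:
    t.start()
for task in range(1, 6):
    q.put(task)
for _ in threads:
    q.put(SENTINEL)           # one sentinel per consumer thread
for t in threads:
    t.join()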

Exposing the thread-safety problem:

import time
import threading

# lock = threading.Lock() # Use locks to resolve conflicts
# Account class, attribute balance
class Account:
    def __init__(self, balance):
        self.balance = balance

# Withdraw: if balance >= amount, the withdrawal succeeds; otherwise report that the balance is insufficient
def draw(account, amount):
    # Lock conflict resolution
    # with lock:
        if account.balance >= amount:
            time.sleep(0.1) # Simulate thread blocking
            print(threading.current_thread().name, 'success')
            account.balance -= amount
            print(threading.current_thread().name, 'Balance:', account.balance)
        else:
            print(threading.current_thread().name, 'Sorry, your credit is running low')


if __name__ == '__main__':
    account = Account(1000)
    ta = threading.Thread(name='ta', target=draw, args=(account,800))
    tb = threading.Thread(name='tb', target=draw, args=(account,800))

    ta.start()
    tb.start()
# The result is as follows:
---------------

tb success
tb Balance: 200
ta success
ta Balance: -600
# The problem is exposed: both threads pass the balance check before either one deducts,
# so when the threads switch, the account attribute has not yet been updated and the data goes out of sync.
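Uncommenting the lock lines above serializes the check-and-deduct so only one withdrawal can succeed. For completeness, the fixed version looks like this:

import time
import threading

lock = threading.Lock()

class Account:
    def __init__(self, balance):
        self.balance = balance

def draw(account, amount):
    with lock:  # the balance check and the deduction are now atomic
        if account.balance >= amount:
            time.sleep(0.1)
            print(threading.current_thread().name, 'success')
            account.balance -= amount
            print(threading.current_thread().name, 'Balance:', account.balance)
        else:
            print(threading.current_thread().name, 'Sorry, your credit is running low')

if __name__ == '__main__':
    account = Account(1000)
    for name in ('ta', 'tb'):
        threading.Thread(name=name, target=draw, args=(account, 800)).start()
# Now exactly one thread succeeds and the balance stays at 200.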

The principle of a thread pool:

  • The system must allocate resources for each newly created thread and reclaim them when a thread terminates. If created threads are kept alive and reused, the overhead of creating and destroying threads is reduced; this is exactly what a thread pool does.

Advantages of a thread pool:

1. Better performance: reusing threads reduces the overhead of creating and terminating them.
2. Well suited to handling large bursts of requests, or tasks that need many threads but where each task finishes quickly.
3. Defensive value: it effectively prevents the system from being overloaded by too many threads.
Usage 1:

import time
import concurrent.futures

def run_task(task):
    time.sleep(2)
    return f'{task} finished'

# task list
tasks = [('task' + str(i)) for i in range(1, 6)]

start_time = time.time()
with concurrent.futures.ThreadPoolExecutor() as pool:  # Create pool
    result = pool.map(run_task, tasks)  
    # Pass the function and the task list; the return value 'result' is a generator yielding each task's return value
    results = list(zip(tasks, result))
    for i, j in results:
        print(i, j)
print(time.time() - start_time)

# The result is as follows:
task1 task1 finished
task2 task2 finished
task3 task3 finished
task4 task4 finished
task5 task5 finished
2.0206351280212402

Usage 2:

import time
import concurrent.futures

def run_task(task):
    time.sleep(2)
    return f'{task} finished'

# task list
tasks = [('task' + str(i)) for i in range(1, 6)]

start_time = time.time()
with concurrent.futures.ThreadPoolExecutor() as pool:
    results = {}
    for task in tasks:
        # Submit the function together with one task at a time
        # Returns a 'Future' object: <Future at 0x246f486b1c0 state=pending>
        # Its 'result' method blocks until the task finishes, then returns its value
        future = pool.submit(run_task, task)
        results[task] = future
    for i, j in results.items():
        print(i, j.result())  # result() is called only after everything is submitted, so the tasks run concurrently
print(time.time() - start_time)

# The results are the same as above
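A related pattern worth knowing: concurrent.futures.as_completed yields futures in the order they finish rather than the order they were submitted, which lets you handle fast tasks first. A sketch using the same run_task and task list:

import time
import concurrent.futures

def run_task(task):
    time.sleep(2)
    return f'{task} finished'

tasks = [('task' + str(i)) for i in range(1, 6)]

with concurrent.futures.ThreadPoolExecutor() as pool:
    future_to_task = {pool.submit(run_task, t): t for t in tasks}
    # as_completed yields each future as soon as its task finishes
    for future in concurrent.futures.as_completed(future_to_task):
        print(future_to_task[future], future.result())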

Using a thread pool to accelerate IO in Flask

import json
import time
import flask

app = flask.Flask(__name__)



# Simulate disk IO
def read_file():
    time.sleep(0.1)
    return "file result"

# Simulate database IO
def connect_db():
    time.sleep(0.2)
    return "db result"

# Simulate a remote API call (IO)
def create_api():
    time.sleep(0.3)
    return "api result"


@app.route('/')
def index():
    file_io = read_file()
    db_io = connect_db()
    api_io = create_api()
    return json.dumps({
        "file_io": file_io,
        "db_io": db_io,
        "api_io": api_io,
    })

if __name__ == '__main__':
    app.run()
# Timing the request with `time curl`:
# 0.623s  (the three IO waits run one after another: 0.1 + 0.2 + 0.3 s, plus overhead)

Refactoring with a thread pool:

import json
import time
from concurrent.futures import ThreadPoolExecutor
import flask

app = flask.Flask(__name__)
pool = ThreadPoolExecutor()  # Initialize thread pool object

def connect_db():
    time.sleep(0.5)
    return "db result"


def read_file():
    time.sleep(0.3)
    return "file result"


def create_api():
    time.sleep(0.2)
    return "api result"


@app.route('/')
def index():
    # Submit task
    file_io = pool.submit(read_file)
    db_io = pool.submit(connect_db)
    api_io = pool.submit(create_api)
    return json.dumps({
        "file_io": file_io.result(),  # Get result object
        "db_io": db_io.result(),
        "api_io": api_io.result(),
    })


if __name__ == '__main__':
    # (a process pool, if used, would have to be created here; see the note below)
    app.run()
# Timing the request with `time curl`:
# 0.318s
# The IO time is roughly halved: the three waits now overlap instead of running one after another.

# A process pool is used much like a thread pool, but because worker processes do not share the parent's environment, the pool must be created inside the `if __name__ == '__main__':` entry block.
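A minimal sketch of that process-pool variant (assuming the task function stays at module level so the workers can pickle it):

import json
import time
from concurrent.futures import ProcessPoolExecutor
import flask

app = flask.Flask(__name__)
pool = None  # created under the __main__ guard below

def connect_db():
    time.sleep(0.5)
    return "db result"

@app.route('/')
def index():
    db_io = pool.submit(connect_db)  # runs in a worker process
    return json.dumps({"db_io": db_io.result()})

if __name__ == '__main__':
    pool = ProcessPoolExecutor()  # must be created here, not at import time
    app.run()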

asyncio implements asynchronous IO

import time
import asyncio
import aiohttp

urls = [
    f"https://www.cnblogs.com/sitehome/p/{page}"
    for page in range(1, 50 + 1)
]

# Define the coroutine:
# `async def` marks a function whose execution is driven by the event loop.
# `await` yields control at an IO wait instead of blocking, so the loop keeps running other coroutines.
async def async_spider(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as resp:
            result = await resp.text()
            print(f'{url},{len(result)}')

# Construct event loop object
loop = asyncio.get_event_loop()

# Create task list
tasks = [
    loop.create_task(async_spider(url))
    for url in urls
]

start_time = time.time()
# Cycle until the tasks in tasks are completed
loop.run_until_complete(asyncio.wait(tasks))
print(time.time() - start_time)
# 3.1056487560272217
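On Python 3.7+, asyncio.run is the preferred entry point and replaces the explicit get_event_loop / run_until_complete pair; an equivalent sketch, reusing async_spider and urls from above:

import asyncio

async def main():
    # schedule all coroutines concurrently and wait for them all to finish
    await asyncio.gather(*(async_spider(url) for url in urls))

asyncio.run(main())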

Using a semaphore to limit the number of concurrent asyncio tasks

import time
import asyncio
import aiohttp

urls = [
    f"https://www.cnblogs.com/sitehome/p/{page}"
    for page in range(1, 50 + 1)
]
# Define semaphore
semaphore = asyncio.Semaphore(10)

async def async_spider(url):
    async with semaphore:  # Semaphore control
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as resp:
                result = await resp.text()
                await asyncio.sleep(3)
                # with the extra 3-second sleep, at most ten tasks run in each 3-second window
                print(f'{url},{len(result)}')

loop = asyncio.get_event_loop()

tasks = [
    loop.create_task(async_spider(url))
    for url in urls
]

start_time = time.time()
loop.run_until_complete(asyncio.wait(tasks))
print(time.time() - start_time)
# With 50 URLs and at most 10 in flight, expect roughly 5 batches × 3 s ≈ 15 s plus network time.

Posted by gillypogi on Sun, 28 Nov 2021 07:02:26 -0800