Python Road [Article 16]: Python Concurrent Programming

Processes and threads

process

Suppose there are two programs, A and B. Partway through its execution, program A needs to read a large amount of input (an I/O operation).
If the CPU simply waits for A to finish reading before it continues, CPU time is wasted.
Could program B run while A is reading, and then, once A has its data, could B pause so that A continues?
Certainly, but the key word here is "switch".
Switching means saving and restoring state, and programs A and B need different system resources (memory, hard disk, keyboard, and so on).
Something therefore has to record which resources each program needs and how to tell A and B apart. That abstraction is the process.

Process definition:

A process is one dynamic execution of a program on a data set.
A process generally consists of three parts: the program, the data set, and the process control block.
The program describes what the process has to accomplish and how to accomplish it.
The data set is the resources the program uses while it executes.
The process control block records the external characteristics of the process and describes how its execution changes over time. The system uses it to control and manage the process, and it is the only sign by which the system perceives that the process exists.

An example to illustrate the process:
Imagine a computer scientist with good cooking skills baking a birthday cake for his daughter. He has a birthday cake recipe, and the kitchen holds the ingredients he needs: flour, eggs, sugar, vanilla extract, and so on.
In this analogy, the recipe is the program (an algorithm expressed in a suitable form), the computer scientist is the processor (CPU), and the ingredients are the input data.
The process is the sum of the activities of reading the recipe, fetching the ingredients, and baking the cake.
Now suppose the computer scientist's son comes in crying because a bee has stung him on the head. The computer scientist notes where he is in the recipe (saving the state of the current process), takes out a first aid manual, and follows its instructions to treat the sting.
Here the processor switches from one process (baking the cake) to a higher-priority process (giving medical care), and each process has its own program (the recipe and the first aid manual).
Once the sting has been treated, the computer scientist goes back to the cake and continues from the step where he left off.

thread

Threads were introduced to reduce the cost of context switching, improve the concurrency of the system, and overcome the limitation that a process can only do one thing at a time.
They make concurrency within a process possible.

Suppose a text editor needs to accept keyboard input, display content on the screen, and save information to the hard disk.
With only one process, it can only do one of these things at a time (while saving, it cannot accept keyboard input).
With several processes, each is responsible for one task: process A receives keyboard input, process B displays content on the screen, and process C saves content to the hard disk.
The collaboration among A, B and C then requires inter-process communication, all three need to share the same thing (the text content), and the constant switching costs performance.
A mechanism that lets tasks A, B and C share resources would reduce both the amount of state saved and restored on each context switch and the amount of communication, and so reduce the performance loss.
That mechanism is the thread.

A thread is also called a lightweight process. It is the basic unit of CPU execution and the smallest unit of program execution, and it consists of a thread ID, a program counter, a register set, and a stack.
The introduction of threads reduces the overhead of running programs concurrently and improves the concurrent performance of the operating system.
Threads do not own system resources of their own.

Relationship and differences between threads and processes:

1. A program has at least one process, and a process has at least one thread.

2. Each process has its own independent memory while it runs, whereas the threads of a process share memory, which can greatly improve the efficiency of a program.

3. Threads differ from processes in how they execute. Each thread has an entry point, a sequential execution sequence, and an exit point. However, a thread cannot run on its own: it must live inside an application, and the application controls the execution of its threads.

4. A process is one run of a program with some independent function on a data set. It is the independent unit to which the system allocates resources and which it schedules.
A thread is an entity within a process and the basic unit of CPU scheduling and dispatch. It is a smaller unit than a process that can still run independently.
A thread owns essentially no system resources, only the few it needs to run (such as a program counter, a set of registers, and a stack).
It shares all the resources owned by the process with the other threads of that process.
One thread can create and terminate another thread, and multiple threads in the same process can execute concurrently.
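
The shared-memory point in item 2 can be seen in a few lines. This is only a minimal sketch; the names `shared` and `worker` are made up for illustration and do not come from the article.

```python
import threading

shared = []  # one list object, visible to every thread in this process


def worker(tag):
    # Every thread appends to the same list object: no copying and no
    # message passing is needed, because threads share the process memory.
    shared.append((tag, threading.current_thread().name))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

tags = sorted(tag for tag, _ in shared)
print(tags)
```

Separate processes would each get their own copy of `shared`, which is why they need explicit communication to exchange data.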

GIL of python

In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.)

The core idea is that no matter how many threads you start or how many CPU cores you have, CPython lets only one thread execute Python bytecode at any given moment.

2. python thread and threading module

1. Two ways of calling threads

The threading module is built on top of the _thread module (named thread in Python 2). The _thread module handles and controls threads in a low-level, primitive way, while the threading module wraps it to provide a more convenient API for working with threads.

Direct call:

import threading
import time
 
def sayhi(num): #Define the functions to be run by each thread
 
    print("running on number:%s" %num)
 
    time.sleep(3)
 
if __name__ == '__main__':
 
    t1 = threading.Thread(target=sayhi,args=(1,)) #Generate a thread instance
    t2 = threading.Thread(target=sayhi,args=(2,)) #Generate another thread instance
 
    t1.start() #Start threads
    t2.start() #Start another thread
 
    print(t1.name) #Get the thread name (getName() is deprecated since Python 3.10)
    print(t2.name)

Inheritance calls:

import threading
import time


class MyThread(threading.Thread):
    def __init__(self,num):
        threading.Thread.__init__(self)
        self.num = num

    def run(self):#Define the functions to be run by each thread

        print("running on number:%s" %self.num)

        time.sleep(3)

if __name__ == '__main__':

    t1 = MyThread(1)
    t2 = MyThread(2)
    t1.start()
    t2.start()
    
    print("ending......")

3. Instance methods of threading.Thread

Join & Daemon method

import threading
from time import ctime,sleep
import time

def ListenMusic(name):

        print ("Begin listening to %s. %s" %(name,ctime()))
        sleep(3)
        print("end listening %s"%ctime())

def RecordBlog(title):

        print ("Begin recording the %s! %s" %(title,ctime()))
        sleep(5)
        print('end recording %s'%ctime())


threads = []


t1 = threading.Thread(target=ListenMusic,args=('killer',))
t2 = threading.Thread(target=RecordBlog,args=('python thread',))

threads.append(t1)
threads.append(t2)

if __name__ == '__main__':

    # Daemon status must be set before start(); setting it on a thread
    # that is already running raises RuntimeError.
    t1.setDaemon(True)

    for t in threads:
        #t.setDaemon(True) #Note: Be sure to set it before start
        t.start()
        # t.join()
    # t1.join()

    #t2.join()  # Consider the results for each of these three join locations.
    print ("all over %s" %ctime())

join(): Blocks the calling thread (here, the main thread that created the child thread) until the child thread has finished running.

setDaemon(True): 

         Declares a thread as a daemon thread. It must be called before start(). This method is basically the opposite of join: a program does not exit while a non-daemon thread is still running, so a non-daemon thread that never finishes keeps the program alive indefinitely.

         When we run a program, a main thread executes. If the main thread creates a child thread, the two run separately. When the main thread finishes and wants to exit, it checks whether the child thread has finished. If it has not, the main thread waits for it before exiting.

         Sometimes, however, we want the child thread to exit together with the main thread as soon as the main thread finishes, whether or not the child has completed. In that case, use setDaemon. (Since Python 3.10, setDaemon() is deprecated in favor of the daemon attribute, for example t.daemon = True.)
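
The contrast between join() and a daemon thread can be sketched as follows. This is a minimal illustration under assumed names (`work`, `results`, the delays); it uses the `daemon` attribute rather than setDaemon().

```python
import threading
import time

results = []


def work(label, delay):
    time.sleep(delay)
    results.append(label)


# join(): the caller blocks until the worker has finished.
t = threading.Thread(target=work, args=("joined", 0.1))
t.start()
t.join()
joined_result = list(results)   # the worker is guaranteed to be done here

# daemon: set before start(); the interpreter does not wait for it at exit.
d = threading.Thread(target=work, args=("daemon", 5))
d.daemon = True
d.start()
daemon_alive = d.is_alive()     # still sleeping while the main thread moves on

# Changing the flag on a running thread is rejected.
try:
    d.daemon = False
    late_change_error = None
except RuntimeError as exc:
    late_change_error = exc

print(joined_result, daemon_alive, type(late_change_error).__name__)
```

When this script reaches its end, the daemon thread is still sleeping and is simply abandoned, so the program exits without waiting the full five seconds.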

Other methods:

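# run(): The method that represents the thread's activity; it is invoked automatically in the new thread after start() is called.
# start(): Start thread activity.
# is_alive(): Returns whether the thread is alive (the older spelling isAlive() was removed in Python 3.9).
# getName(): Returns the thread name (deprecated since Python 3.10 in favor of the name attribute).
# setName(): Sets the thread name (deprecated since Python 3.10 in favor of the name attribute).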

Some functions provided by the threading module:
# threading.current_thread(): Returns the current Thread object (older spelling: currentThread()).
# threading.enumerate(): Returns a list of the threads currently alive, that is, threads that have been started and have not yet finished. Threads that have not started or that have terminated are excluded.
# threading.active_count(): Returns the number of threads currently alive; the result equals len(threading.enumerate()) (older spelling: activeCount()).
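
A short sketch of these functions in use. The worker names and the `idle` function are assumptions for illustration; the threads are kept alive with a threading.Event (covered in the Event section of this article) so that the snapshot is deterministic.

```python
import threading


def idle(stop):
    stop.wait()  # stay alive until the main thread signals


stop = threading.Event()
workers = [threading.Thread(target=idle, args=(stop,), name="worker-%d" % i)
           for i in range(3)]
for w in workers:
    w.start()

# All three workers are started and not yet finished, so they are listed.
alive_names = sorted(t.name for t in threading.enumerate()
                     if t.name.startswith("worker-"))
count_matches = threading.active_count() == len(threading.enumerate())
print(threading.current_thread().name, alive_names, count_matches)

stop.set()
for w in workers:
    w.join()
alive_after = [w.is_alive() for w in workers]   # finished threads are not alive
```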

4. Synchronization Lock

import threading
import time

def addNum():
    global num #Get this global variable in each thread
    #num-=1

    temp=num
    #print('--get num:',num )
    time.sleep(0.1)
    num =temp-1 #Operate - 1 on this common variable

num = 1000  #Setting a shared variable
thread_list = []
for i in range(1000):
    t = threading.Thread(target=addNum)
    t.start()
    thread_list.append(t)

for t in thread_list: #Waiting for all threads to finish executing
    t.join()

print('final num:', num )

Observation: What results do you get with time.sleep(0.1), time.sleep(0.001), and time.sleep(0.0000001) respectively?

Many threads operate on the same shared resource at the same time, which corrupts it. What should we do? (Calling join right after each start would serialize the threads and defeat the purpose of threading.)

We can solve this problem with a synchronization lock.

R=threading.Lock()
 
####
def sub():
    global num
    R.acquire()
    temp=num-1
    time.sleep(0.1)
    num=temp
    R.release()
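
For reference, here is a complete, runnable variant of the locked version. It is a sketch rather than the author's exact code: it uses the `with lock:` form, which is equivalent to acquire()/release() but also releases the lock if the body raises, and it uses fewer threads and a shorter sleep (both arbitrary choices) so that it finishes quickly.

```python
import threading
import time

lock = threading.Lock()
num = 100  # shared variable


def sub():
    global num
    # Only one thread at a time can run the read-modify-write below.
    with lock:
        temp = num
        time.sleep(0.001)   # a pause that would cause lost updates without the lock
        num = temp - 1


threads = [threading.Thread(target=sub) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("final num:", num)
```

Only the critical section is serialized; any work a thread does outside the `with` block still runs concurrently with the other threads.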

5. Recursive Lock

When threads share multiple resources, deadlock occurs if two threads each hold part of the resources while waiting for the resources held by the other. Neither thread can proceed, so without outside intervention both wait forever.

An example of deadlock:

import threading,time

class myThread(threading.Thread):
    def doA(self):
        lockA.acquire()
        print(self.name,"gotlockA",time.ctime())
        time.sleep(3)
        lockB.acquire()
        print(self.name,"gotlockB",time.ctime())
        lockB.release()
        lockA.release()

    def doB(self):
        lockB.acquire()
        print(self.name,"gotlockB",time.ctime())
        time.sleep(2)
        lockA.acquire()
        print(self.name,"gotlockA",time.ctime())
        lockA.release()
        lockB.release()

    def run(self):
        self.doA()
        self.doB()
if __name__=="__main__":

    lockA=threading.Lock()
    lockB=threading.Lock()
    threads=[]
    for i in range(5):
        threads.append(myThread())
    for t in threads:
        t.start()
    for t in threads:
        t.join()#Wait for the thread to finish.

Solution: use a recursive lock. Replace the two locks

lockA=threading.Lock()
lockB=threading.Lock()

with a single recursive lock, and use it everywhere lockA and lockB were used:

lock=threading.RLock()

To support multiple acquisitions of the same resource within one thread, Python provides a reentrant lock: threading.RLock. Internally, RLock maintains a Lock and a counter that records the number of acquire calls, so the owning thread can acquire the resource several times. Other threads cannot obtain the resource until the owning thread has released every one of its acquires.
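
The difference between Lock and RLock can be checked directly with non-blocking acquires. The sketch below is illustrative; `try_lock` and the variable names are made up for this example.

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

# A plain Lock cannot be taken twice, even by the thread that already holds it.
plain.acquire()
second_plain = plain.acquire(blocking=False)       # a blocking call here would deadlock
plain.release()

# An RLock can be re-acquired by its owner; the internal counter goes 1 -> 2.
reentrant.acquire()
second_reentrant = reentrant.acquire(blocking=False)

seen = []


def try_lock(out):
    # Runs in another thread: succeeds only when the owner's counter is 0.
    got = reentrant.acquire(blocking=False)
    out.append(got)
    if got:
        reentrant.release()


t = threading.Thread(target=try_lock, args=(seen,))
t.start()
t.join()                 # the owner still holds the lock at this point

reentrant.release()
reentrant.release()      # counter back to 0: the lock is free again

t = threading.Thread(target=try_lock, args=(seen,))
t.start()
t.join()

print(second_plain, second_reentrant, seen)
```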

application

import time

import threading

class Account:
    def __init__(self, _id, balance):
        self.id = _id
        self.balance = balance
        self.lock = threading.RLock()

    def withdraw(self, amount):

        with self.lock:
            self.balance -= amount

    def deposit(self, amount):
        with self.lock:
            self.balance += amount


    def drawcash(self, amount):# withdraw() acquires the lock again while it is already held here

        with self.lock:
            interest=0.05
            count=amount+amount*interest

            self.withdraw(count)


def transfer(_from, to, amount):

    # No lock is taken here: withdraw() and deposit() each take the account's own lock.
    # A lock held only inside transfer() would not protect the balance from other threads calling those methods directly.
    _from.withdraw(amount)

    to.deposit(amount)



alex = Account('alex',1000)
yuan = Account('yuan',1000)

t1=threading.Thread(target = transfer, args = (alex,yuan, 100))
t1.start()

t2=threading.Thread(target = transfer, args = (yuan,alex, 200))
t2.start()

t1.join()
t2.join()

print('>>>',alex.balance)
print('>>>',yuan.balance)

6. Synchronization Conditions (Event)

import threading,time
class Boss(threading.Thread):
    def run(self):
        print("BOSS: Everyone has to work overtime until 23:00 tonight.")
        print(event.isSet())
        event.set()
        time.sleep(5)
        print("BOSS: <23:00>It's time to leave work.")
        print(event.isSet())
        event.set()
class Worker(threading.Thread):
    def run(self):
        event.wait()
        print("Worker: Hey... Hardship!")
        time.sleep(1)
        event.clear()
        event.wait()
        print("Worker: OhYeah!")
if __name__=="__main__":
    event=threading.Event()
    threads=[]
    for i in range(5):
        threads.append(Worker())
    threads.append(Boss())
    for t in threads:
        t.start()
    for t in threads:
        t.join()
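
The example above relies on four Event operations: wait(), set(), clear(), and is_set() (spelled isSet() above). A smaller sketch of their behavior follows; the names `ready`, `waiter`, and `log` are assumptions for illustration.

```python
import threading

ready = threading.Event()   # the internal flag starts out False
log = []


def waiter():
    ready.wait()            # blocks until another thread calls set()
    log.append("released")


t = threading.Thread(target=waiter)
t.start()

timed_out = ready.wait(timeout=0.05)   # returns False: flag still clear after the timeout
before = list(log)                     # the waiter has not run past wait() yet

ready.set()                 # flag becomes True; every waiting thread wakes up
t.join()

ready.clear()               # reset the flag so that wait() would block again
print(timed_out, before, log, ready.is_set())
```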

7. Semaphore

A semaphore controls the number of threads that run concurrently. BoundedSemaphore and Semaphore each manage a built-in counter that is decremented by every acquire() call and incremented by every release() call.

The counter can never go below 0. When it is 0, acquire() blocks the calling thread until another thread calls release(). (This is similar to the spaces in a parking lot.)

The only difference between BoundedSemaphore and Semaphore is that the former checks on release() whether the counter would exceed its initial value and, if so, raises an exception.

import threading,time
class myThread(threading.Thread):
    def run(self):
        if semaphore.acquire():
            print(self.name)
            time.sleep(3)
            semaphore.release()
if __name__=="__main__":
    semaphore=threading.Semaphore(5)
    thrs=[]
    for i in range(100):
        thrs.append(myThread())
    for t in thrs:
        t.start()
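
To check both claims (the concurrency limit and the BoundedSemaphore exception), the sketch below counts how many threads are inside the guarded region at once and then releases one time too many. The limit of 3 and the bookkeeping names (`running`, `peak`, `state_lock`) are assumptions for illustration.

```python
import threading
import time

LIMIT = 3
sem = threading.BoundedSemaphore(LIMIT)
state_lock = threading.Lock()
running = 0
peak = 0


def job():
    global running, peak
    with sem:                      # acquire() on entry, release() on exit
        with state_lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.05)
        with state_lock:
            running -= 1


threads = [threading.Thread(target=job) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# One release() too many: BoundedSemaphore raises ValueError,
# whereas a plain Semaphore would silently raise its counter.
try:
    sem.release()
    over_release = None
except ValueError as exc:
    over_release = exc

print("peak concurrency:", peak)
```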

8. A Handy Tool for Multithreading: Queue

1. Lists are not thread-safe data structures

import threading,time

li=[1,2,3,4,5]

def pri():
    while li:
        a=li[-1]
        print(a)
        time.sleep(1)
        try:
            li.remove(a)
        except Exception as e:
            print('----',a,e)

t1=threading.Thread(target=pri,args=())
t1.start()
t2=threading.Thread(target=pri,args=())
t2.start()

Think: how could the above be done with a queue?

The queue module is especially useful in threaded programming when information must be exchanged safely between multiple threads.

2. Methods of the Queue class

Create a Queue object
import queue
q = queue.Queue(maxsize = 10)
The queue.Queue class is a synchronized queue implementation. The queue length can be unbounded or bounded, and is set through the optional maxsize parameter of the constructor. If maxsize is less than 1, the queue length is unbounded.

Put a value in the queue
q.put(10)
Call the put() method of the queue object to insert an item at the tail of the queue. put() takes two main parameters: the first, item, is required and is the value to insert; the second, block, is optional and defaults to True.
If the queue is full and block is True, put() suspends the calling thread until a slot becomes free. If the queue is full and block is False, put() raises a Full exception.

Remove a value from the queue
q.get()
Call the get() method of the queue object to remove and return the item at the head of the queue. The optional parameter block defaults to True. If the queue is empty and block is True,
get() suspends the calling thread until an item is available. If the queue is empty and block is False, get() raises an Empty exception.

The Python queue module has three kinds of queues, each with its own constructor:
1. FIFO queue (first in, first out). class queue.Queue(maxsize)
2. LIFO queue, similar to a stack (last in, first out). class queue.LifoQueue(maxsize)
3. Priority queue, where the entry with the lowest priority value comes out first. class queue.PriorityQueue(maxsize)

Common methods in this module (q = queue.Queue()):
q.qsize() returns the approximate size of the queue
q.empty() returns True if the queue is empty, otherwise False
q.full() returns True if the queue is full, otherwise False
"Full" is relative to the maxsize given to the constructor
q.get([block[, timeout]]) removes and returns an item; timeout is the maximum time to wait
q.get_nowait() is equivalent to q.get(False)
q.put(item[, block[, timeout]]) writes an item to the queue; timeout is the maximum time to wait
q.put_nowait(item) is equivalent to q.put(item, False)
q.task_done() signals the queue that a previously fetched item has been fully processed
q.join() blocks until every item put into the queue has been fetched and marked with task_done()
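
The blocking and non-blocking behavior described above can be tried in a single thread. This is a minimal sketch using Python 3's `queue` module; the maxsize of 2 and the item values are arbitrary.

```python
import queue

q = queue.Queue(maxsize=2)
q.put("a")
q.put("b")

try:
    q.put("c", block=False)        # the queue is full, so this raises queue.Full
    put_error = None
except queue.Full as exc:
    put_error = exc

first = q.get()                    # "a": items come out in FIFO order
second = q.get_nowait()            # "b": same as q.get(False)

try:
    q.get(block=False)             # the queue is empty, so this raises queue.Empty
    get_error = None
except queue.Empty as exc:
    get_error = exc

# One task_done() per item fetched; afterwards join() has nothing to wait for.
q.task_done()
q.task_done()
q.join()

print(first, second, type(put_error).__name__, type(get_error).__name__)
```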

3. Other modes:

import queue

# Last in, first out

q=queue.LifoQueue()

q.put(34)
q.put(56)
q.put(12)

#priority
# q=queue.PriorityQueue()
# q.put([5,100])
# q.put([7,200])
# q.put([3,"zhurui"])
# q.put([4,{"name":"simon"}])

# Stop once the queue is empty; a bare q.get() in an endless loop would block forever.
while not q.empty():

  data=q.get()
  print(data)
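
To compare the three queue types side by side, the sketch below feeds the same items to each and records the order in which they come out. The helper `drain` and the sample items are made up for illustration.

```python
import queue


def drain(q, items):
    """Put every item into q, then get them all back and return the output order."""
    for item in items:
        q.put(item)
    out = []
    while not q.empty():
        out.append(q.get())
    return out


items = [(5, "e"), (7, "g"), (3, "c")]   # (priority, payload) pairs

fifo = drain(queue.Queue(), items)           # same order as inserted
lifo = drain(queue.LifoQueue(), items)       # reverse of insertion order
prio = drain(queue.PriorityQueue(), items)   # lowest priority value first

print(fifo)
print(lifo)
print(prio)
```

PriorityQueue orders entries by ordinary comparison, so with (priority, payload) pairs the payload is only compared when two priorities are equal.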

Producer-consumer model:

Why use the producer-consumer model

In the world of threads, producers are the threads that produce data and consumers are the threads that consume it. In multithreaded development, if the producer is fast and the consumer is slow, the producer must wait for the consumer to finish processing before it can produce more data. Likewise, if the consumer is faster than the producer, the consumer must wait for the producer. The producer-consumer model was introduced to solve this problem.

What is the producer-consumer model?

The producer-consumer model uses a container to remove the tight coupling between producer and consumer. Producers and consumers do not communicate with each other directly but through a blocking queue. A producer does not wait for a consumer to process the data it has produced; it simply puts the data on the blocking queue. A consumer does not ask a producer for data; it takes data directly from the blocking queue. The blocking queue acts as a buffer that balances the processing capacity of producers and consumers.

It is like a restaurant: the chef cooks the dishes and hands them to the front counter rather than to the customers directly, and customers collect their dishes from the counter without going to the chef. This, too, is a form of decoupling.

import time,random
import queue,threading

q = queue.Queue()

def Producer(name):
  count = 0
  while count <10:
    print("making........")
    time.sleep(random.randrange(3))
    q.put(count)
    print('Producer %s has produced %s baozi..' %(name, count))
    count +=1
    #q.task_done()
    #q.join()
    print("ok......")
def Consumer(name):
  count = 0
  while count <10:
    time.sleep(random.randrange(4))
    if not q.empty():
        data = q.get()
        #q.task_done()
        #q.join()
        print(data)
        print('\033[32;1mConsumer %s has eaten %s baozi...\033[0m' %(name, data))
    else:
        print("-----no baozi anymore----")
    count +=1

p1 = threading.Thread(target=Producer, args=('A',))
c1 = threading.Thread(target=Consumer, args=('B',))
# c2 = threading.Thread(target=Consumer, args=('C',))
# c3 = threading.Thread(target=Consumer, args=('D',))
p1.start()
c1.start()
# c2.start()
# c3.start()
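
The example above polls with q.empty() and stops after a fixed number of rounds. A common alternative, sketched here under assumptions not found in the article (a bounded queue of size 3 and a `SENTINEL` value that means "no more data"), lets the blocking get() and put() do the waiting and uses task_done()/join() to know when everything has been processed.

```python
import queue
import threading

q = queue.Queue(maxsize=3)     # small buffer: put() blocks while it is full
SENTINEL = None                # hypothetical "no more data" marker
consumed = []


def producer(n):
    for i in range(n):
        q.put(i)               # blocks while the buffer is full
    q.put(SENTINEL)            # tell the consumer to stop


def consumer():
    while True:
        item = q.get()         # blocks while the buffer is empty
        if item is SENTINEL:
            q.task_done()
            break
        consumed.append(item)
        q.task_done()          # mark this item as fully processed


p = threading.Thread(target=producer, args=(10,))
c = threading.Thread(target=consumer)
p.start()
c.start()

p.join()
q.join()                       # returns once every item has been marked done
c.join()

print(consumed)
```

With several consumers, one sentinel per consumer would be needed so that each of them sees a stop signal.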

9. Multiprocess Module multiprocessing

Because of the GIL, threads in CPython cannot run Python bytecode on several cores in parallel. To make full use of a multicore CPU, Python programs in most cases need to use multiple processes.

The multiprocessing package is Python's package for managing multiple processes. Much like threading.Thread, multiprocessing.Process creates a process that runs a function written in the Python program. A Process object is used in the same way as a Thread object and also has start(), run(), and join() methods. The multiprocessing package also provides Lock, Event, Semaphore, and Condition classes for synchronizing processes (as with threads, these objects can be passed to each process as arguments); they are used in the same way as the classes of the same name in the threading package. A large part of multiprocessing therefore shares its API with threading, only applied to multiple processes.
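
As a sketch of passing a synchronization object to each process as an argument, the example below increments a counter from several processes under a multiprocessing.Lock. Because processes do not share ordinary variables, the counter is placed in shared memory with multiprocessing.Value; that choice, and the names `add_many` and `run`, are assumptions for illustration.

```python
import multiprocessing


def add_many(counter, lock, times):
    for _ in range(times):
        with lock:                 # same usage as threading.Lock
            counter.value += 1


def run(n_procs=4, times=1000):
    # A C int in shared memory; lock=False because we pass our own lock.
    counter = multiprocessing.Value("i", 0, lock=False)
    lock = multiprocessing.Lock()
    procs = [multiprocessing.Process(target=add_many, args=(counter, lock, times))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run())
```

The `if __name__ == "__main__":` guard matters here: on platforms that start child processes by re-importing the main module, it keeps the children from launching processes of their own.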

1. Process Call

Call mode 1

from multiprocessing import Process
import time
def f(name):
    time.sleep(1)
    print('hello', name,time.ctime())

if __name__ == '__main__':
    p_list=[]
    for i in range(3):
        p = Process(target=f, args=('alvin',))
        p_list.append(p)
        p.start()
    for i in p_list:
        p.join()
    print('end')

Call mode 2

from multiprocessing import Process
import time

class MyProcess(Process):
    def __init__(self):
        super(MyProcess, self).__init__()
        #self.name = name

    def run(self):
        time.sleep(1)
        print ('hello', self.name,time.ctime())


if __name__ == '__main__':
    p_list=[]
    for i in range(3):
        p = MyProcess()
        p.start()
        p_list.append(p)

    for p in p_list:
        p.join()

    print('end')

Example 3:

from multiprocessing import Process
import os
import time
def info(title):
  
    print("title:",title)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    info('main process line')
    time.sleep(1)
    print("------------------")
    p = Process(target=f, args=('yuan',))
    p.start()
    p.join()

Posted by pngtest on Sat, 31 Aug 2019 06:41:54 -0700
