Python concurrent programming (threads, processes, coroutines)

Keywords: Python less Programming

I. Operating System

An operating system is a system program that coordinates, manages and controls a computer's hardware and software resources. It sits between the hardware and the applications.
Programs are software that run on the system and provide specific functions, such as browsers and music players. The kernel of the operating system is the management and control program responsible for all of the computer's physical resources, including the file system, memory management, device management and process management.

Processes and threads

Process:
    Suppose there are two programs, A and B. Halfway through its execution, program A needs to read a large amount of input data (an I/O operation).
    At that point the CPU can only wait idly for A to finish reading before it continues, which wastes CPU resources.
    Could program B run while program A is reading, and then be paused so that A can continue once its data has arrived?
    Certainly, but the key word here is: switch.
    Switching involves saving and restoring state, and programs A and B need different system resources (memory, hard disk, keyboard, etc.).
    Something therefore has to record which resources each program needs and how to tell program A and program B apart.
    This is why the abstract concept called a process exists.

Process definition:
    A process is one dynamic execution of a program on a data set.
    A process generally consists of three parts: the program, the data set and the process control block.

    The program is the code that describes what the process does.
    The data set is the set of resources the program needs while it executes.
    The process control block records the external characteristics of the process and describes how its execution changes over time. The system uses it to control and manage the process; it is the only token by which the system perceives that the process exists.

Thread:
   Threads were introduced to reduce the cost of context switching, to improve the concurrency of the system, and to overcome the limitation that a process can only do one thing at a time, making concurrency within a process possible.
   A thread is also called a lightweight process. It is a basic unit of CPU execution and the smallest unit in the execution of a program. It consists of a thread ID, a program counter, a register set and a stack.
   The introduction of threads reduces the overhead of running programs concurrently and improves the concurrent performance of the operating system.
   Threads do not own system resources of their own.
Relationship and differences between threads and processes:
  1. A program has at least one process, and a process has at least one thread.
  2. Each process has its own independent memory space while it executes, whereas the threads of a process share memory, which can greatly improve the efficiency of a program.
  3. Threads also differ from processes in how they execute. Each thread has an entry point, a sequential execution sequence and an exit point. However, a thread cannot exist on its own; it must live inside an application, which controls the execution of its threads.
  4. A process is one running activity of a program with some independent function on a data set. The process is the independent unit of resource allocation and scheduling in the system.
  5. A thread is an entity within a process and the basic unit of CPU scheduling and dispatch. It is smaller than a process and can be scheduled independently.
  6. A thread owns almost no system resources, only the few that are essential to run (a program counter, a set of registers and a stack), but it can share all the resources owned by its process with the other threads of that process.
  7. One thread can create and cancel another thread, and multiple threads in the same process can execute concurrently.
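
The memory-sharing difference can be seen in a small experiment. The sketch below is only an illustration: the names items, append_item, run_in_thread and run_in_process are made up for this example. A thread appends to a module-level list and the change is visible afterwards; a child process appends to its own copy, so the parent's list stays as it was.

```python
import threading
from multiprocessing import Process

items = []  # module-level list used to observe whether memory is shared


def append_item():
    # Runs either in a thread of this process or in a separate child process
    items.append("x")


def run_in_thread():
    t = threading.Thread(target=append_item)
    t.start()
    t.join()
    return len(items)  # the thread's append is visible here


def run_in_process():
    p = Process(target=append_item)
    p.start()
    p.join()
    return len(items)  # the child changed only its own copy of the list


if __name__ == "__main__":
    print("items after thread: ", run_in_thread())
    print("items after process:", run_in_process())
```

The second number equals the first: the child process did its append in a separate address space.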
However, the GIL (Global Interpreter Lock) of CPython limits how many threads can execute at once: no matter how many threads you start or how many CPUs you have, only one thread executes Python bytecode at any given moment. To get real parallelism across multiple CPUs, use multiple processes, optionally combined with coroutines.
For I/O-intensive tasks Python's multithreading is worthwhile; for compute-intensive tasks it brings little benefit.
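
As a rough illustration of the I/O-bound case, the sketch below uses time.sleep as a stand-in for a blocking read (an assumption made for the example; sleeping releases the GIL much like waiting on a socket or file). The helper names fake_io, run_sequential and run_threaded are hypothetical.

```python
import threading
import time


def fake_io(delay):
    # Stand-in for blocking I/O: the GIL is released while sleeping
    time.sleep(delay)


def run_sequential(n, delay):
    start = time.perf_counter()
    for _ in range(n):
        fake_io(delay)
    return time.perf_counter() - start


def run_threaded(n, delay):
    start = time.perf_counter()
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


sequential = run_sequential(4, 0.2)  # the four waits add up
threaded = run_threaded(4, 0.2)      # the four waits overlap
print("sequential: %.2fs, threaded: %.2fs" % (sequential, threaded))
```

Replacing fake_io with a pure-Python computation would be expected to remove most of the difference, because only one thread can execute bytecode at a time.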

Thread and threading module

The threading module is built on top of the thread module (named _thread in Python 3). The thread module handles and controls threads in a low-level, primitive way; the threading module wraps it to provide a more convenient API for working with threads.

Direct call:
import threading
import time

def sayhi(num): #Define the functions to be run by each thread

    print("running on number:%s" %num)

    time.sleep(3)

if __name__ == '__main__':

    t1 = threading.Thread(target=sayhi,args=(1,)) #Generate a thread instance
    t2 = threading.Thread(target=sayhi,args=(2,)) #Generate another thread instance

    t1.start() #Startup thread
    t2.start() #Start another thread

    print(t1.name) #Get the thread name (getName() is deprecated since Python 3.10)
    print(t2.name)
Inheritance calls:
import threading
import time

class MyThread(threading.Thread):
    def __init__(self,num):
        threading.Thread.__init__(self)
        self.num = num

    def run(self):#Define the functions to be run by each thread

        print("running on number:%s" %self.num)

        time.sleep(3)

if __name__ == '__main__':

    t1 = MyThread(1)
    t2 = MyThread(2)
    t1.start()
    t2.start()

    print("ending......")
Example using threading.Thread with join and setDaemon:
import threading
from time import ctime,sleep
import time

def ListenMusic(name):

        print ("Begin listening to %s. %s" %(name,ctime()))
        sleep(3)
        print("end listening %s"%ctime())

def RecordBlog(title):

        print ("Begin recording the %s! %s" %(title,ctime()))
        sleep(5)
        print('end recording %s'%ctime())


threads = []


t1 = threading.Thread(target=ListenMusic,args=('Seaman',))
t2 = threading.Thread(target=RecordBlog,args=('python thread',))

threads.append(t1)
threads.append(t2)

if __name__ == '__main__':

    for t in threads:
        #t.setDaemon(True) #Note: Be sure to set it before start
        t.start()
        # t.join()  # would block the main thread here until this child thread has finished
    # t1.join()
    # t1.setDaemon(True)  # would raise RuntimeError here: the thread has already been started

    #t2.join()  # Consider how the output changes for each of these three join positions
    print ("all over %s" %ctime())

setDaemon(True):

Declares a thread as a daemon thread; it must be called before start(). (Since Python 3.10, assigning the daemon attribute, t.daemon = True, is preferred over setDaemon().) A thread that is not a daemon keeps the program alive until it finishes, so a thread that never ends makes the program hang indefinitely. In effect this is the opposite of join().

When a program runs, it starts a main thread. If the main thread creates a child thread, the two then run independently. When the main thread finishes, the interpreter checks whether the child thread has finished; if not, it waits for the child to complete before the program exits. Sometimes, however, we want the program to exit as soon as the main thread finishes, whether or not the child thread is done. That is what setDaemon(True) is for.
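
A minimal sketch of a daemon thread follows, using the daemon constructor argument rather than setDaemon(). The function background_task is a made-up endless job; the point is that the interpreter does not wait for it when the main thread ends.

```python
import threading
import time


def background_task():
    # Hypothetical endless background job, e.g. a heartbeat
    while True:
        time.sleep(0.1)


# daemon=True has the same effect as calling setDaemon(True) before start()
worker = threading.Thread(target=background_task, daemon=True)
worker.start()

time.sleep(0.3)  # the main thread does its own work
print("daemon:", worker.daemon, "alive:", worker.is_alive())
print("main thread done")  # the program can exit here even though worker never finishes
```

With daemon=False the same script would never exit, because the interpreter would wait for background_task to return.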

Other methods:

# run(): the method that holds the thread's activity; it is invoked automatically in the new thread after start().
# start(): starts the thread's activity.
# is_alive(): returns whether the thread is alive (spelled isAlive() before Python 3.9).
# getName(): returns the thread name (same as reading the name attribute).
# setName(): sets the thread name (same as assigning the name attribute).

Some functions provided by the threading module:
# threading.current_thread(): returns the current Thread object (older alias: currentThread()).
# threading.enumerate(): returns a list of the threads that are currently alive, that is, started and not yet finished; threads that have not started or have already terminated are excluded.
# threading.active_count(): returns the number of threads currently alive, the same as len(threading.enumerate()) (older alias: activeCount()).
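
The methods and functions listed above can be tried out together. This is a small sketch with a made-up thread name ("worker-1") and a sleep standing in for real work.

```python
import threading
import time


def work():
    time.sleep(0.2)  # stand-in for real work


t = threading.Thread(target=work, name="worker-1")
print(t.is_alive())            # False: the thread has not been started yet
t.start()
print(t.name, t.is_alive())    # worker-1 True

# Snapshot of the live threads while worker-1 is still sleeping
names = [th.name for th in threading.enumerate()]
count = threading.active_count()
print(threading.current_thread().name, names, count)

t.join()
print(t.is_alive())            # False: the thread has finished
```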

Synchronization lock

import time
import threading

def addNum():
    global num #Get this global variable in each thread
    #num-=1

    temp=num
    #print('--get num:',num )
    time.sleep(0.1)  #Try 0.1, 0.001 or 0.0000001 here: the final result is wrong to varying degrees
    num =temp-1 #Operate - 1 on this common variable

num = 100  #Setting a shared variable
thread_list = []
for i in range(100):#Start 100 threads, each of which executes addNum
    t = threading.Thread(target=addNum)
    t.start()
    thread_list.append(t)

for t in thread_list: #Waiting for all threads to finish executing
    t.join()

print('final num:', num )
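   Each thread reads num, is switched out before it has written the result back (here during the sleep, in general whenever its time slice ends in the middle of the update), and then stores a value computed from stale data. Several threads therefore perform the same subtraction, updates are lost, and the result is wrong: the code is not thread-safe.
Calling join() right after each start() would give the right result but would run the threads one after another, which defeats the purpose of threads. Instead, use a synchronization lock: threads may still be switched, but no other thread can enter the locked section while one thread is updating the shared variable.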
R=threading.Lock()

####
def sub():
    global num
    R.acquire()#Lock up
    temp=num-1
    time.sleep(0.1)
    num=temp
    R.release()#Unlock
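
For completeness, here is one way the locked version might look as a full script. It is a sketch rather than the author's exact program: it uses the with statement instead of explicit acquire()/release() calls, and a shorter sleep so that it finishes quickly.

```python
import threading
import time

num = 100                # shared variable
lock = threading.Lock()


def sub():
    global num
    with lock:           # acquire() on entry, release() on exit, even if an exception occurs
        temp = num
        time.sleep(0.001)  # a thread switch here can no longer cause a lost update
        num = temp - 1


threads = [threading.Thread(target=sub) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("final num:", num)  # final num: 0
```

Only the read-modify-write section is serialized; any work done outside the with block would still run concurrently.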
Thread deadlocks and recursive locks
Thread deadlock:
   When threads share several resources, and two threads each hold part of the resources while waiting for the part held by the other, a deadlock occurs: each resource is considered in use, so without outside intervention both threads wait forever. Here is an example of a deadlock:
import threading,time

class myThread(threading.Thread):
    def doA(self):
        lockA.acquire()
        print(self.name,"gotlockA",time.ctime())
        time.sleep(3)
        lockB.acquire()
        print(self.name,"gotlockB",time.ctime())
        lockB.release()
        lockA.release()

    def doB(self):
        lockB.acquire()
        print(self.name,"gotlockB",time.ctime())
        time.sleep(2)
        lockA.acquire()
        print(self.name,"gotlockA",time.ctime())
        lockA.release()
        lockB.release()

    def run(self):
        self.doA()
        self.doB()
if __name__=="__main__":

    lockA=threading.Lock()
    lockB=threading.Lock()
    threads=[]
    for i in range(5):
        threads.append(myThread())
    for t in threads:
        t.start()
    for t in threads:
        t.join()
Thread deadlocks like this one can be avoided by using a recursive lock: replace the two separate locks with a single RLock shared under both names.

lockA=lockB=threading.RLock()  # instead of lockA=threading.Lock() and lockB=threading.Lock()
To support acquiring the same resource several times within one thread, Python provides a reentrant lock, threading.RLock. Internally an RLock keeps a lock and a counter; the counter records the number of acquire() calls, so the owning thread can acquire it repeatedly. Other threads cannot obtain the lock until the owner has called release() once for every acquire().

import threading,time

class myThread(threading.Thread):
    def doA(self):
        lockA.acquire()
        print(self.name,"gotlockA",time.ctime())
        time.sleep(3)
        lockB.acquire()  # same thread, same RLock: the counter goes to 2 without blocking
        print(self.name,"gotlockB",time.ctime())
        lockB.release()
        lockA.release()

    def doB(self):
        lockB.acquire()
        print(self.name,"gotlockB",time.ctime())
        time.sleep(2)
        lockA.acquire()
        print(self.name,"gotlockA",time.ctime())
        lockA.release()
        lockB.release()

    def run(self):
        self.doA()
        self.doB()
if __name__=="__main__":

    lockA=lockB=threading.RLock()  # one reentrant lock shared under both names
    threads=[]
    for i in range(5):
        threads.append(myThread())
    for t in threads:
        t.start()
    for t in threads:
        t.join()
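
Reentrancy matters most when one locked method calls another locked method on the same object. The sketch below uses a hypothetical Account class (not part of the original text) to show this: with a plain Lock the nested calls would block forever, while with an RLock they simply increase the counter.

```python
import threading


class Account:
    """Hypothetical account whose methods call each other under one RLock."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.RLock()

    def deposit(self, amount):
        with self.lock:
            self.balance += amount

    def withdraw(self, amount):
        with self.lock:
            self.balance -= amount

    def deposit_then_withdraw(self, amount_in, amount_out):
        with self.lock:                 # first acquisition by this thread
            self.deposit(amount_in)     # re-acquires the same RLock
            self.withdraw(amount_out)   # and again; a plain Lock would deadlock here


account = Account(100)
threads = [
    threading.Thread(target=account.deposit_then_withdraw, args=(10, 5))
    for _ in range(20)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(account.balance)  # 200
```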
Synchronization Conditions (Event)
event = threading.Event()

# a client thread can wait for the flag to be set
event.wait()
# a server thread can set or reset it
event.set()
event.clear()

If the flag is set, the wait method doesn't do anything.
If the flag is cleared, wait will block until it becomes set again.
Any number of threads may wait for the same event.

import threading,time
class Boss(threading.Thread):
    def run(self):
        print("BOSS: Everyone has to work overtime until 22:00 tonight.")
        print(event.is_set())#False
        event.set()
        time.sleep(5)
        print("BOSS: <22:00> It's time to leave work.")
        print(event.is_set())
        event.set()
class Worker(threading.Thread):
    def run(self):
        event.wait() #Once event is set, it is equivalent to pass
        print("Worker: What a bitter fate!")
        time.sleep(1)
        event.clear()
        event.wait()
        print("Worker: OhYeah!")
if __name__=="__main__":
    event=threading.Event()
    threads=[]
    for i in range(5):
        threads.append(Worker())
    threads.append(Boss())
    for t in threads:
        t.start()
    for t in threads:
        t.join()

Semaphore

A semaphore controls the number of threads that may run a section of code concurrently. Semaphore and BoundedSemaphore manage a built-in counter: every acquire() call decrements it by 1 and every release() call increments it by 1.

The counter can never go below 0. When it is 0, acquire() blocks the calling thread until another thread calls release(). (This is similar to the spaces in a parking lot.)

The only difference between BoundedSemaphore and Semaphore is that the former checks, when release() is called, whether the counter would exceed its initial value, and raises an exception (ValueError) if it would.

import threading,time
class myThread(threading.Thread):
    def run(self):
        if semaphore.acquire():
            print(self.name)
            time.sleep(5)
            semaphore.release()
if __name__=="__main__":
    semaphore=threading.Semaphore(5)#Thread concurrency is 5
    thrs=[]
    for i in range(100):
        thrs.append(myThread())
    for t in thrs:
        t.start()
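
The difference between the two semaphore classes can be checked directly. In this sketch an extra release() is called on purpose; the variable names are chosen for the example only.

```python
import threading

plain = threading.Semaphore(1)
plain.release()  # accepted silently: the counter is now 2, above its initial value

# Count how many times the plain semaphore can now be acquired without blocking
extra_permits = 0
while plain.acquire(blocking=False):
    extra_permits += 1

bounded = threading.BoundedSemaphore(1)
try:
    bounded.release()  # would push the counter above its initial value
    overflow_detected = False
except ValueError:
    overflow_detected = True

print(extra_permits, overflow_detected)  # 2 True
```

BoundedSemaphore is therefore the safer choice when an unbalanced release() would indicate a bug.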
Queue
Create a queue object
import queue
q = queue.Queue(maxsize = 10)
The queue.Queue class is a synchronized (thread-safe) implementation of a queue. The queue length can be unbounded or bounded; it is set by the optional maxsize parameter of the constructor. If maxsize is less than 1, the queue length is unbounded. (In Python 2 the module was named Queue.)

Put a value in the queue
q.put(10)
Call the put() method of the queue object to insert an item at the end of the queue. put() takes two main parameters: the first, item, is required and is the value to insert; the second, block, is optional and defaults to True. If the queue is full and block is True, put() suspends the calling thread until a slot becomes free. If the queue is full and block is False, put() raises a Full exception.

Remove a value from the queue
q.get()
Call the get() method of the queue object to remove and return the item at the head of the queue. The optional parameter block defaults to True. If the queue is empty and block is True, get() suspends the calling thread until an item is available. If the queue is empty and block is False, get() raises an Empty exception.

The Python queue module has three kinds of queues and constructors:
1. FIFO queue (first in, first out): class queue.Queue(maxsize)
2. LIFO queue, similar to a stack (last in, first out): class queue.LifoQueue(maxsize)
3. Priority queue, where the entry with the lowest priority value comes out first: class queue.PriorityQueue(maxsize)
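
The three orderings can be compared by putting the same items into each kind of queue. The helper drain below is a made-up convenience function; it is safe here only because a single thread is using the queues.

```python
import queue


def drain(q):
    # Remove and return every item currently in the queue (single-threaded use only)
    out = []
    while not q.empty():
        out.append(q.get())
    return out


fifo = queue.Queue()
lifo = queue.LifoQueue()
prio = queue.PriorityQueue()

for item in (3, 1, 2):
    fifo.put(item)
    lifo.put(item)
    prio.put(item)

fifo_order = drain(fifo)
lifo_order = drain(lifo)
prio_order = drain(prio)

print(fifo_order)  # [3, 1, 2]  insertion order
print(lifo_order)  # [2, 1, 3]  reverse insertion order
print(prio_order)  # [1, 2, 3]  lowest value first
```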

Common methods in this module (q = queue.Queue()):
q.qsize() returns the approximate size of the queue
q.empty() returns True if the queue is empty, otherwise False
q.full() returns True if the queue is full, otherwise False; "full" is relative to maxsize
q.get([block[, timeout]]) removes and returns an item; timeout is the maximum time to wait
q.get_nowait() is equivalent to q.get(False), i.e. non-blocking
q.put(item[, block[, timeout]]) writes an item to the queue; timeout is the maximum time to wait
q.put_nowait(item) is equivalent to q.put(item, False)
q.task_done() tells the queue that a previously fetched item has been fully processed; call it once for each item obtained with get()
q.join() blocks until every item put into the queue has been fetched and marked with task_done()
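
To show how task_done() and join() work together, here is a small sketch with three daemon worker threads that square numbers. The worker function and the results list are assumptions made for the example.

```python
import queue
import threading

q = queue.Queue()
results = []  # list.append is safe to call from several threads in CPython


def worker():
    while True:
        item = q.get()              # blocks until an item is available
        results.append(item * item)
        q.task_done()               # one call per item taken with get()


# Daemon workers: they are abandoned when the main thread exits
for _ in range(3):
    threading.Thread(target=worker, daemon=True).start()

for n in range(10):
    q.put(n)

q.join()  # returns once task_done() has been called for all ten items
print(sorted(results))
```

Note that q.join() waits for processing to be acknowledged, not merely for the queue to become empty: an item that has been fetched but not yet marked with task_done() still counts as unfinished.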

Producer-consumer model
In the world of threads, a producer is a thread that produces data and a consumer is a thread that consumes it. In multithreaded development, if the producer is fast and the consumer is slow, the producer has to wait for the consumer to finish before it can produce more data. Likewise, if the consumer is faster than the producer, the consumer has to wait for the producer. The producer-consumer model was introduced to solve this problem.

The producer-consumer model uses a container to remove the tight coupling between producer and consumer. They do not communicate with each other directly but through a blocking queue: the producer does not wait for the consumer after producing data but simply puts it on the queue, and the consumer does not ask the producer for data but takes it from the queue. The blocking queue acts as a buffer that balances the processing capacity of producers and consumers.

import time,random
import queue,threading

q = queue.Queue()

def Producer(name):
  count = 0
  while count <10:
    print("making........")
    time.sleep(random.randrange(3))
    q.put(count)
    print('Producer %s has produced %s baozi..' %(name, count))
    count +=1
    #q.task_done()
    #q.join()
    print("ok......")
def Consumer(name):
  count = 0
  while count <10:
    time.sleep(random.randrange(4))
    if not q.empty():
        data = q.get()
        #q.task_done()
        #q.join()
        print(data)
        print('\033[32;1mConsumer %s has eat %s baozi...\033[0m' %(name, data))
    else:
        print("-----no baozi anymore----")
    count +=1

p1 = threading.Thread(target=Producer, args=('A',))
c1 = threading.Thread(target=Consumer, args=('B',))
# c2 = threading.Thread(target=Consumer, args=('C',))
# c3 = threading.Thread(target=Consumer, args=('D',))
p1.start()
c1.start()
# c2.start()
# c3.start()
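
The example above polls with q.empty() and sleeps. A variant that relies on the blocking behaviour of the queue is sketched below. It is an illustration under two assumptions not found in the original: the queue is bounded (maxsize=3), so a fast producer is slowed down automatically, and a sentinel value (None) tells the consumer that no more data will arrive.

```python
import queue
import threading

SENTINEL = None             # marker meaning "the producer has finished"
q = queue.Queue(maxsize=3)  # small buffer so the producer cannot run far ahead
consumed = []


def producer(count):
    for i in range(count):
        q.put(i)            # blocks while the queue is full
    q.put(SENTINEL)


def consumer():
    while True:
        item = q.get()      # blocks while the queue is empty
        if item is SENTINEL:
            break
        consumed.append(item)


p = threading.Thread(target=producer, args=(10,))
c = threading.Thread(target=consumer)
p.start()
c.start()
p.join()
c.join()

print(consumed)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With several consumers, one sentinel per consumer would be needed so that each of them stops.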

Multiprocess Module multiprocessing

Because of the GIL, Python threads cannot run Python code in parallel on several cores. To make full use of a multi-core CPU in Python, you need multiple processes in most cases.

The multiprocessing package is Python's multi-process management package. Much like threading.Thread, a multiprocessing.Process object creates a process that can run a function written in the Python program. A Process object is used the same way as a Thread object and also has start(), run() and join() methods. The multiprocessing package also provides Lock, Event, Semaphore and Condition classes for synchronizing processes (as with threads, these objects can be passed to each process as arguments); they are used the same way as the classes of the same name in the threading package. A large part of multiprocessing therefore shares the threading API, just in a multi-process setting.

As with threads, a process can be created either by a direct call or by inheritance.

Direct call:
from multiprocessing import Process
import time
def f(name):
    time.sleep(1)
    print('hello', name,time.ctime())

if __name__ == '__main__':
    p_list=[]
    for i in range(3):
        p = Process(target=f, args=('alvin',))
        p_list.append(p)
        p.start()
    for i in p_list:
        p.join()
    print('end')
Inheritance call:
from multiprocessing import Process
import time

class MyProcess(Process):
    def __init__(self):
        super(MyProcess, self).__init__()
        #self.name = name

    def run(self):#Rewriting run Method
        time.sleep(1)
        print ('hello', self.name,time.ctime())


if __name__ == '__main__':
    p_list=[]
    for i in range(3):
        p = MyProcess()
        p.start()
        p_list.append(p)

    for p in p_list:
        p.join()

    print('end')
The Process class

Constructor:

Process([group [, target [, name [, args [, kwargs]]]]])

group: thread group; not implemented yet, and the library reference states that it must be None.
target: the callable to be executed.
name: the process name.
args/kwargs: the arguments to pass to the target.

Instance methods:

is_alive(): returns whether the process is running.

join([timeout]): blocks the calling process until the process whose join() was called terminates or the optional timeout expires.

start(): starts the process, making it ready for CPU scheduling.

run(): start() arranges for run() to be called in the new process. The default run() calls the target passed to the constructor, if there is one; subclasses can override it.

terminate(): stops the process immediately, whether or not its task is complete.

Properties:

daemon: same meaning as setDaemon for threads; it must be set before start().

name: the process name.

pid: the process ID.

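import time
from  multiprocessing import Process, current_process

def foo(i):
    time.sleep(1)
    p = current_process()  # the Process object of this child; a global p is not defined in the child when processes are spawned
    print (p.is_alive(),i,p.pid)
    time.sleep(1)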

if __name__ == '__main__':
    p_list=[]
    for i in range(10):
        p = Process(target=foo, args=(i,))
        #p.daemon=True
        p_list.append(p)

    for p in p_list:
        p.start()
    # for p in p_list:
    #     p.join()

    print('main process end')

Interprocess communication

1. Process Queue
from multiprocessing import Process, Queue
import queue

def f(q,n):
    #q.put([123, 456, 'hello'])
    q.put(n*n+1)
    print("son process",id(q))

if __name__ == '__main__':
    q = Queue()  # q=queue.Queue() thread queue
    print("main process",id(q))

    for i in range(3):
        p = Process(target=f, args=(q,i))
        p.start()

    print(q.get())
    print(q.get())
    print(q.get())
2. Pipe
from multiprocessing import Process, Pipe

def f(conn):   #conn=child_conn
    conn.send([12, {"name":"yuan"}, 'hello'])
    response=conn.recv()
    print("response",response)
    conn.close()
    print("q_ID2:",id(conn))  # use the parameter; the global child_conn does not exist in a spawned child

if __name__ == '__main__':

    parent_conn, child_conn = Pipe() #Generating bidirectional pipeline
    print("q_ID1:",id(child_conn))
    p = Process(target=f, args=(child_conn,))#The main process generates a Process instance and calls function f to pass child_conn
    p.start()
    print(parent_conn.recv())   # prints "[12, {'name': 'yuan'}, 'hello']"
    parent_conn.send("Hello!")
    p.join()
3. Managers
Queues and pipes only exchange data between processes; they do not share it, that is, one process cannot directly modify data held by another.

A manager object returned by Manager() controls a server process which holds Python objects and allows other processes to manipulate them using proxies.
A manager returned by Manager() will support types list, dict, Namespace, Lock, RLock, Semaphore, BoundedSemaphore, Condition, Event, Barrier, Queue, Value and Array.
from multiprocessing import Process, Manager
#managers data sharing
def f(d, l,n):
    d[n] = '1'
    d['2'] = 2
    d[0.25] = None
    l.append(n)
    print("son process:",id(d),id(l))
if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()#
        l = manager.list(range(5))
        print("main process:",id(d),id(l))
        p_list = []
        for i in range(10):
            p = Process(target=f, args=(d,l,i))
            p.start()
            p_list.append(p)
        for res in p_list:
            res.join()
        print(d)
        print(l)

Process synchronization

from multiprocessing import Process,Lock

def f(l,i):
    # with l:
    l.acquire()
    print('hello world %s' % i)
    l.release()
    #Locking process to prevent resource grabbing
if __name__ == '__main__':
    lock=Lock()

    for num in range(10):
        Process(target=f,args=(lock,num)).start()

Process pool

from  multiprocessing import Process,Pool
import time,os

def Foo(i):
    time.sleep(1)
    print(i)
    return 'hello %s' %i

def Bar(arg):
    print('hello')
    print(os.getpid())
    print(os.getppid())
    print('logger:',arg)
if __name__ == '__main__':

    pool = Pool(8) #Process pool object

    # Bar(1)
    # print("----------------")

    for i in range(100):
        #pool.apply(func=Foo, args=(i,))  # synchronous interface: blocks until the result is ready
        #pool.apply_async(func=Foo, args=(i,))  # asynchronous interface: returns immediately
        #A callback is a function run after the task function completes successfully; it is called in the main process with the task's return value
        pool.apply_async(func=Foo, args=(i,),callback=Bar)

    pool.close()
    pool.join() #close() must be called before join()
    print('end')

Coroutines

A coroutine is also known as a micro-thread or fiber.

Advantage 1: High execution efficiency. Switching between coroutines is not thread switching; it is controlled by the program itself, so there is no thread-switching overhead. Compared with multithreading, the more threads there would be, the more obvious the performance advantage of coroutines.

Advantage 2: No multithreaded locking mechanism is needed. Because there is only one thread, there are no conflicts from writing a variable at the same time; shared resources can be managed inside coroutines without locks, just by checking state, so execution is much more efficient than with multiple threads.

Because coroutines run in a single thread, how can they use a multi-core CPU? The simplest approach is multiple processes plus coroutines, which makes full use of the cores while keeping the efficiency of coroutines, and can achieve very high performance.

A simple implementation of yield:

import time
import queue
def consumer(name):
    print("--->ready to eat baozi...")
    while True:
        new_baozi = yield #generator
        print("[%s] is eating baozi %s" % (name,new_baozi))
        #time.sleep(1)

def producer():

    r = con.__next__()
    r = con2.__next__()
    n = 0
    while 1:
        time.sleep(1)
        print("\033[32;1m[producer]\033[0m is making baozi %s and %s" %(n,n+1) )
        #Switching conditions
        con.send(n)
        con2.send(n+1)
        n +=2

#Concurrent effect without thread
if __name__ == '__main__':
    con = consumer("c1")
    con2 = consumer("c2")
    p = producer()
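
The producer above runs forever. For a version that terminates and shows values flowing in both directions through yield, consider this small sketch of a running-average coroutine (the averager function is an illustrative example, not part of the original program). send() passes a value into the generator and returns the next value it yields.

```python
def averager():
    total = 0
    count = 0
    average = None
    while True:
        value = yield average  # hand back the current average, wait for the next value
        total += value
        count += 1
        average = total / count


avg = averager()
next(avg)  # prime the generator: run it up to the first yield

results = [avg.send(v) for v in (10, 20, 30)]
avg.close()  # stop the coroutine

print(results)  # [10.0, 15.0, 20.0]
```

All of this happens in one thread: control moves to the generator on send() and back to the caller at the next yield.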
Greenlet
greenlet is a coroutine module implemented in C. Compared with Python's built-in yield, it lets you switch between arbitrary functions at will, without first declaring them as generators.
from greenlet import greenlet


def test1():
    print(12)
    gr2.switch()
    print(34)
    gr2.switch()


def test2():
    print(56)
    gr1.switch()
    print(78)


gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()
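Gevent
from gevent import monkey
monkey.patch_all()  # patch blocking standard-library I/O before importing requests; without this the downloads run one after another

import gevent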

import requests,time


start=time.time()

def f(url):
    print('GET: %s' % url)
    resp =requests.get(url)
    data = resp.text
    print('%d bytes received from %s.' % (len(data), url))

gevent.joinall([

        gevent.spawn(f, 'https://www.python.org/'),
        gevent.spawn(f, 'https://www.yahoo.com/'),
        gevent.spawn(f, 'https://www.baidu.com/'),
        gevent.spawn(f, 'https://www.sina.com.cn/'),

])

# f('https://www.python.org/')
#
# f('https://www.yahoo.com/')
#
# f('https://baidu.com/')
#
# f('https://www.sina.com.cn/')

print("cost time:",time.time()-start)

Posted by iyia12co on Thu, 12 Sep 2019 02:04:50 -0700