Processes, Threads, and Coroutines: Those Things

I. Processes and Threads

1. Process

Every application on our computer runs as a process. Assume the computer is single-core, so the CPU can execute only one process at a time. When a program is blocked on I/O, it would be wasteful for the CPU to wait with it, so the CPU switches to another program. Before switching, the running state of the previous program must be saved so that it can be resumed later. Something is needed to record that state, and this is where the concept of a process comes in.

A process is one dynamic execution of a program over a data set. It consists of three parts: the program, the data set, and the process control block. The program describes what the process does and how it does it; the data set is the resources the program uses while executing; the process control block preserves the running state of the program.

2. Threads

A process can start many threads. Why have threads when we already have processes? Because within one program the threads share the same set of data. If each of them were a process with its own separate memory, that data would have to be copied into every process, which is unreasonable. That is why threads exist.

Threads, also known as lightweight processes, are the basic unit of CPU execution and the smallest unit of program execution. A process has at least one thread, the main thread; in Python, child threads are created from it with the threading module.

3. The relationship between processes and threads

(1) A thread belongs to exactly one process; a process can have multiple threads and always has at least one.

(2) Resources are allocated to the process, which is the main body of the program. All threads of the same process share all of its resources.

(3) The CPU is allocated to threads; that is, threads are what actually run on the CPU.

(4) The thread is the smallest unit of execution, and the process is the smallest unit of resource management.

4. Parallelism and concurrency
Parallel processing means performing two or more tasks at literally the same instant in a computer system. Parallel processing can work on different parts of the same program at the same time.

Concurrent processing means several programs make progress on a single CPU over the same period of time, but only one of them is running on the CPU at any given moment.

The focus of concurrency is the ability to handle multiple tasks, not necessarily at the same instant, while the focus of parallelism is the ability to handle multiple tasks at the same instant. Parallelism is a subset of concurrency.

All of this matters because Python has the GIL (global interpreter lock), which allows only one thread of a process to use the CPU at any given time.

II. threading module

The function of this module is to create new threads. There are two ways to create threads:

1. Direct creation

import threading
import time

def foo(n):
    print('>>>>>>>>>>>>>>>%s' % n)
    time.sleep(3)
    print('thread 1')

# args must be a tuple; t1 is the child thread object that gets created
t1 = threading.Thread(target=foo, args=(2,))
t1.start()  # start the child thread

print('ending')

The above code creates a child thread from the main thread.

The result is: >>>>>>>>>>>>>>>2 is printed, then ending, and about 3 seconds later thread 1.

2. Another way is to create thread objects by inheriting from the Thread class

import threading
import time

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        # run() is the code that executes in the new thread after start()
        print('ok')
        time.sleep(2)
        print('end')

t1 = MyThread()  # create the thread object
t1.start()       # start the thread, which calls run()
print('end again')

3. join() method

Calling join() on a child thread makes the calling (parent) thread wait until that child thread has finished running.

import threading
import time

def foo(n):
    print('>>>>>>>>>>>>>>>%s' % n)
    time.sleep(n)
    print('thread 1')

def bar(n):
    print('>>>>>>>>>>>>>>>>%s' % n)
    time.sleep(n)
    print('thread 2')

s = time.time()
t1 = threading.Thread(target=foo, args=(2,))
t1.start()  # start the first child thread

t2 = threading.Thread(target=bar, args=(5,))
t2.start()

t1.join()  # blocks only the main thread; t2 keeps running
t2.join()
print(time.time() - s)
print('ending')
'''
Output:
>>>>>>>>>>>>>>>2
>>>>>>>>>>>>>>>>5
thread 1
thread 2
5.001286268234253
ending
'''

4. setDaemon() method

This method declares a thread as a daemon thread. It must be called before start().

By default, when the main thread finishes it checks whether its child threads have completed, and if not, it waits for them before exiting. If instead you want the program to exit as soon as the main thread finishes, regardless of the child threads, call setDaemon(True).

import  threading
import time

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        print('ok')
        time.sleep(2)
        print('end')

t1 = MyThread()     # create the thread object
t1.setDaemon(True)  # must be set before start()
t1.start()          # start the thread
print('end again')
# Output: ok and end again are printed immediately,
# then the program terminates without ever printing end.

By default the main thread is a non-daemon thread, and child threads inherit the daemon setting of the thread that creates them, so by default they are non-daemon threads as well.
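The inheritance rule can be checked without waiting for the interpreter to exit. The following is a minimal sketch, not part of the original article: it uses the `daemon=True` keyword argument and the `daemon` attribute, which current Python offers as the replacement for setDaemon() (deprecated since Python 3.10). The names `parent_task` and `inherited` are made up for the illustration.

```python
import threading

inherited = []  # filled in by parent_task (illustrative name)


def parent_task():
    # A thread created here inherits the daemon flag of the creating thread.
    child = threading.Thread(target=lambda: None)
    inherited.append(child.daemon)


default_thread = threading.Thread(target=parent_task)
daemon_thread = threading.Thread(target=parent_task, daemon=True)

default_flag = default_thread.daemon  # same flag as the thread that created it
daemon_flag = daemon_thread.daemon    # True, because it was set explicitly

daemon_thread.start()
daemon_thread.join()
print(default_flag, daemon_flag, inherited)
```

The child created inside the daemon thread reports `daemon` as True even though nothing set it explicitly.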

5. Other methods

isAlive(): Returns whether the thread is alive (spelled is_alive() in current Python; isAlive() was removed in Python 3.9)

getName(): Returns the thread name

setName(): Sets the thread name

threading.currentThread(): Returns the current Thread object

threading.enumerate(): Returns a list of the threads that are currently alive

threading.activeCount(): Returns the number of threads that are currently alive
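As a rough illustration of these helpers, here is a small sketch using the snake_case spellings that current Python prefers (`is_alive()`, the `name` attribute, `current_thread()`, `active_count()`); the camelCase forms listed above are the older names. The `worker` function and the name "demo-thread" are assumptions made for the example.

```python
import threading
import time


def worker():
    # Hypothetical task: sleep briefly so the thread stays alive for a moment.
    time.sleep(0.5)


t = threading.Thread(target=worker, name="demo-thread")
t.start()

alive_while_running = t.is_alive()    # modern spelling of isAlive()
thread_name = t.name                  # attribute behind getName()/setName()
current = threading.current_thread()  # modern spelling of currentThread()
running = threading.enumerate()       # list of Thread objects that are alive
count = threading.active_count()      # modern spelling of activeCount()

t.join()
alive_after_join = t.is_alive()

print(alive_while_running, thread_name, current.name, count, alive_after_join)
```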

III. All kinds of locks

1. Synchronization Lock (User Lock, Mutex Lock)

Let's start with an example:

The requirement: we have a global variable with a value of 100, and we have 100 threads. Each thread subtracts one from the global variable, and at the end we print the variable.

import threading
import time

def sub():
    global num
    temp = num
    num = temp - 1
    time.sleep(2)

num = 100

l = []
for i in range(100):
    t = threading.Thread(target=sub, args=())
    t.start()
    l.append(t)
for i in l:
    i.join()

print(num)

Everything seems normal. Now let's change it. Between temp=num and num=temp-1 in the sub function, add time.sleep(0.1), and something goes wrong: after about two seconds the program prints 99. Change it to time.sleep(0.0001) and the result becomes unpredictable, although it is above 90. What is going on?

This comes down to the GIL in Python. Let's walk through it:

At the start, a global variable num=100 is defined, and then 100 child threads are started. Python's GIL lets only one thread use the CPU at a time, so the 100 threads compete for the lock, and whichever thread grabs it gets to run its code. In the original version, each thread that gets the CPU immediately subtracts one from the global variable, so nothing goes wrong.

After the change, each thread sleeps for 0.1 seconds between reading the variable and writing it back. While a thread sleeps, the CPU does not wait for it: the thread is blocked, just as it would be on I/O, so the other threads can grab the CPU and start executing. For a CPU, 0.1 seconds is a very long time, long enough for every other thread to get the CPU once before the first thread wakes up. All of them read num as 100, and when they wake up each one computes 100-1, so the final result is 99.

For the same reason, if the sleep is much shorter, say 0.001 seconds, then by the time, for example, the 91st thread first gets the CPU, the earliest threads may already have woken up and modified the global variable. The 91st thread then reads 99, the second and third threads wake up and modify the variable in turn, and so on, so the final result is unpredictable.

This is the thread safety problem. It can arise whenever multiple threads modify shared data. The solution is to use a lock.

We create a global lock, wrap the code that operates on the shared data with it, and thereby turn that piece of code into serial code.

import threading
import time

def sub():
    global num
    lock.acquire()  # acquire the lock
    temp = num
    time.sleep(0.001)
    num = temp - 1
    lock.release()  # release the lock
    time.sleep(2)
num=100


l=[]
lock=threading.Lock()
for i in range(100):
    t=threading.Thread(target=sub,args=())
    t.start()
    l.append(t)
for i in l:
    i.join()

print(num)

Once the lock has been acquired, it must be released before it can be acquired again. This kind of lock is called a user lock.
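The acquire()/release() pair can also be written with a `with` statement, which releases the lock even if the protected code raises an exception. This is a minimal sketch of the same countdown; the variable name `counter` is chosen for the example and the 2-second sleep is left out to keep it short.

```python
import threading

counter = 100
lock = threading.Lock()


def sub():
    global counter
    with lock:  # acquire() on entry, release() on exit, even on error
        temp = counter
        counter = temp - 1


threads = [threading.Thread(target=sub) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```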

2. Deadlock and Recursive Lock

Deadlock is a situation in which two or more processes or threads wait for each other during execution because each holds a resource the other needs. Without outside intervention they stay stuck forever. For example:

import threading, time

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        self.foo()
        self.bar()

    def foo(self):
        LockA.acquire()
        # Each thread has a default name; self.name returns it.
        print('i am %s GET LOCKA------%s' % (self.name, time.ctime()))

        LockB.acquire()
        print('i am %s GET LOCKB-----%s' % (self.name, time.ctime()))

        LockB.release()
        time.sleep(1)
        LockA.release()

    def bar(self):
        # Same as foo, but the two locks are taken in the opposite order.
        LockB.acquire()
        print('i am %s GET LOCKB------%s' % (self.name, time.ctime()))

        LockA.acquire()
        print('i am %s GET LOCKA-----%s' % (self.name, time.ctime()))

        LockA.release()
        LockB.release()

LockA = threading.Lock()
LockB = threading.Lock()

for i in range(10):
    t = MyThread()
    t.start()

# Output:
# i am Thread-1 GET LOCKA------Sun Jul 23 11:25:48 2017
# i am Thread-1 GET LOCKB-----Sun Jul 23 11:25:48 2017
# i am Thread-1 GET LOCKB------Sun Jul 23 11:25:49 2017
# i am Thread-2 GET LOCKA------Sun Jul 23 11:25:49 2017
# And then it gets stuck.

Deadlock example

In the example above, thread 2 is waiting for thread 1 to release lock B, and thread 1 is waiting for thread 2 to release lock A, so each blocks the other.

When we use mutexes, this problem arises easily as soon as more than one lock is involved.

To solve this problem, Python provides the reentrant lock, RLock (also called a recursive lock). An RLock maintains a lock and a counter internally. The counter records the number of acquisitions: each acquire() by the owning thread adds 1, and each release() subtracts 1. Only when the counter is 0 can other threads acquire the lock. Now replace Lock with RLock, and the program no longer gets stuck:

import threading, time

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)

    def run(self):
        self.foo()
        self.bar()

    def foo(self):
        rlock.acquire()
        # Each thread has a default name; self.name returns it.
        print('i am %s GET LOCKA------%s' % (self.name, time.ctime()))

        rlock.acquire()  # same thread acquires again: the counter goes to 2
        print('i am %s GET LOCKB-----%s' % (self.name, time.ctime()))

        rlock.release()
        time.sleep(1)
        rlock.release()

    def bar(self):
        rlock.acquire()
        print('i am %s GET LOCKB------%s' % (self.name, time.ctime()))

        rlock.acquire()
        print('i am %s GET LOCKA-----%s' % (self.name, time.ctime()))

        rlock.release()
        rlock.release()

# A single reentrant lock replaces both LockA and LockB.
rlock = threading.RLock()
for i in range(10):
    t = MyThread()
    t.start()

Examples of recursive locks

3. Semaphore

This is also a lock. You can specify how many threads may hold it at the same time, for example at most five (the mutex described above can be held by only one thread).
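A small sketch of the idea, assuming a limit of five as in the text. The bookkeeping variables `inside` and `max_inside` and the helper lock `state_lock` exist only to measure how many threads hold the semaphore at once; they are not part of the semaphore API.

```python
import threading
import time

semaphore = threading.Semaphore(5)  # at most five holders at the same time
state_lock = threading.Lock()       # protects the two counters below
inside = 0
max_inside = 0


def worker():
    global inside, max_inside
    with semaphore:
        with state_lock:
            inside += 1
            max_inside = max(max_inside, inside)
        time.sleep(0.05)  # pretend to do some work while holding the semaphore
        with state_lock:
            inside -= 1


threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(max_inside)
```

Although twenty threads are started, the recorded maximum never exceeds the value passed to Semaphore().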

4. Event objects

Threads run independently. An Event object is needed when threads have to communicate, or when one thread must decide its next step according to the state of another thread. An Event object can be regarded as a flag whose default value is false. If a thread waits on an Event object while its flag is false, the thread blocks until the flag becomes true. Once the flag is true, all threads waiting on that Event object are woken up.
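The flag behaviour described above can be sketched as follows. The names `ready`, `waiter`, and `log` are illustrative; the calls used (`wait()`, `set()`, `is_set()`) are the standard threading.Event methods.

```python
import threading

ready = threading.Event()  # the flag starts out false
log = []
log_lock = threading.Lock()


def waiter(n):
    ready.wait()  # blocks for as long as the flag is false
    with log_lock:
        log.append(n)


threads = [threading.Thread(target=waiter, args=(i,)) for i in range(3)]
for t in threads:
    t.start()

flag_before = ready.is_set()  # still false: all three waiters are blocked
ready.set()                   # flag becomes true, every waiter is woken up
for t in threads:
    t.join()

print(flag_before, ready.is_set(), sorted(log))
```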

The official documentation says that queues are especially useful for exchanging data safely between threads.

A queue can be understood as a data structure that stores data and supports reading and writing it. It is like a list with a lock added.

The get method has a default parameter block=True. If you change it to False, queue.Empty is raised when no value is available.

join() blocks the calling thread, and it only makes sense together with task_done(). It can be understood by analogy with the Event object: every put() adds 1 to the counter that join() watches, every task_done() subtracts 1, and join() returns only when the counter is 0.
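The interplay of put(), task_done(), join(), and a non-blocking get() might look like this. It is a sketch rather than the article's original example; using None as a stop signal for the consumer is an assumption of this sketch.

```python
import queue
import threading

q = queue.Queue()
results = []


def consumer():
    while True:
        item = q.get()    # block=True by default: waits for an item
        if item is None:  # None is this sketch's stop signal
            q.task_done()
            break
        results.append(item * 2)
        q.task_done()     # one task_done() for every item taken


t = threading.Thread(target=consumer)
t.start()

for i in range(5):
    q.put(i)
q.join()     # returns once every put() has been matched by a task_done()

q.put(None)  # tell the consumer to stop
t.join()

try:
    q.get(block=False)  # the queue is empty now, so this raises queue.Empty
    was_empty = False
except queue.Empty:
    was_empty = True

print(results, was_empty)
```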

The queues mentioned above are FIFO; there are also a LIFO queue and a priority queue.
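The three orderings can be compared side by side with the standard queue classes. The `drain` helper is introduced only for this illustration.

```python
import queue

fifo = queue.Queue()          # first in, first out
lifo = queue.LifoQueue()      # last in, first out
prio = queue.PriorityQueue()  # smallest item comes out first

for item in (3, 1, 2):
    fifo.put(item)
    lifo.put(item)
    prio.put(item)


def drain(q):
    # Illustrative helper: take items out until the queue is empty.
    items = []
    while not q.empty():
        items.append(q.get())
    return items


fifo_order = drain(fifo)
lifo_order = drain(lifo)
prio_order = drain(prio)
print(fifo_order, lifo_order, prio_order)
```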

Python has a global lock (the GIL) that prevents multiple threads from using multiple cores, but separate processes are not limited by this lock. To start multiple processes, import the multiprocessing module.

Although multiple processes can be started, be careful not to start too many, because switching between processes consumes a lot of system resources. If thousands of child processes are started, the system may collapse, and inter-process communication is also a problem. So avoid processes where you can, and use as few as possible where you cannot.
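Starting processes looks much like starting threads. The following is a minimal sketch with made-up worker functions `square` and `report`; the `if __name__ == '__main__':` guard matters on platforms where child processes are started by launching a fresh interpreter that re-imports the script.

```python
import multiprocessing
import os


def square(n):
    # Stand-in for real CPU-bound work.
    return n * n


def report(n):
    # Runs in the child process; os.getpid() shows it is a different process.
    print('pid %s: %s squared is %s' % (os.getpid(), n, square(n)))


if __name__ == '__main__':
    procs = [multiprocessing.Process(target=report, args=(i,)) for i in range(3)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()  # wait for each child, just like Thread.join()
```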

Each process has its own separate space in memory and, unlike threads, processes cannot share data directly, so a queue can only be handed from the parent process to a child process as an argument.
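A sketch of passing a process queue as an argument, assuming multiprocessing.Queue is the queue meant here. The `producer` function and the squared numbers are invented for the example.

```python
import multiprocessing


def producer(q, n):
    # Runs in the child process and writes into the queue it was handed.
    for i in range(n):
        q.put(i * i)


if __name__ == '__main__':
    q = multiprocessing.Queue()
    # The queue reaches the child only because it is passed in args.
    p = multiprocessing.Process(target=producer, args=(q, 3))
    p.start()
    received = [q.get() for _ in range(3)]  # read in the parent process
    p.join()
    print(received)
```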

The socket we learned about earlier is in fact a pipe: the client's sock and the server's conn are its two ends. Pipes between processes work the same way, and they also have two ends.
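With multiprocessing.Pipe() the two ends are returned as a pair of connection objects. This is a rough sketch; the function name `child` and the message text are assumptions.

```python
import multiprocessing


def child(conn):
    # The child holds one end of the pipe: receive a message, answer, close.
    msg = conn.recv()
    conn.send('child got: %s' % msg)
    conn.close()


if __name__ == '__main__':
    parent_conn, child_conn = multiprocessing.Pipe()  # the two ends
    p = multiprocessing.Process(target=child, args=(child_conn,))
    p.start()
    parent_conn.send('hello')
    print(parent_conn.recv())
    p.join()
```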

So far we have implemented communication between processes through process queues and pipes, but not yet data sharing.

Data sharing between processes is implemented through a Manager object. Every shared data type is created by calling a method on the manager, such as manager.dict() or manager.list().
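A minimal sketch of sharing a dict and a list through a Manager. The `worker` function and the values it stores are illustrative only.

```python
import multiprocessing


def worker(shared_dict, shared_list, i):
    # Changes made here are visible to the parent through the manager.
    shared_dict[i] = i * 10
    shared_list.append(i)


if __name__ == '__main__':
    with multiprocessing.Manager() as manager:
        d = manager.dict()  # shared dict, created via the manager
        l = manager.list()  # shared list, created via the manager
        procs = [
            multiprocessing.Process(target=worker, args=(d, l, i))
            for i in range(3)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        print(dict(d), sorted(l))
```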

The purpose of a process pool is to maintain a maximum number of processes. If that maximum is exceeded, the program blocks until a process becomes available.
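A short sketch with multiprocessing.Pool, assuming a pool size of four. The `cube` function is a stand-in for real work.

```python
import multiprocessing


def cube(n):
    # Stand-in for real work done by a pool worker.
    return n ** 3


if __name__ == '__main__':
    # At most four worker processes exist; the ten tasks wait their turn.
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(cube, range(10)))
```

Here pool.map() hands the ten inputs to the four workers and returns the results in input order.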

With coroutines in hand, the world is yours. Once you know coroutines, you can forget the processes and threads described above.

Think back to the keyword yield. Familiar, isn't it? It is the one used by generators. yield is a remarkable feature of Python.

Normally a function stops when it reaches return and hands back the value after return (None by default). yield is similar to return, but when a function reaches yield it does not terminate; it pauses until next() is called on it (a for loop also works by calling next()). You can also put a variable in front of yield: a value passed in with send() is then stored in that variable.
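The pause-and-resume behaviour, including send(), can be seen in a small running-average generator. This is a sketch; the function name `averager` is an assumption.

```python
def averager():
    total = 0
    count = 0
    average = None
    while True:
        # Pause here and hand `average` to the caller;
        # the value passed to send() ends up in `value`.
        value = yield average
        total += value
        count += 1
        average = total / count


gen = averager()
first = next(gen)  # run up to the first yield; nothing has been averaged yet
a1 = gen.send(10)  # resume with value=10, pause again at the next yield
a2 = gen.send(20)  # resume with value=20
print(first, a1, a2)
```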

With coroutines, no lock appears anywhere in the process, yet data safety is still guaranteed. More importantly, the order of execution can be controlled, which gives an elegant form of concurrency that leaves multithreading far behind.
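A producer and consumer written as yield-based coroutines show this ordering without any lock. This is a sketch with invented names (`consumer`, `producer`, `log`), not the article's original example.

```python
def consumer(log):
    reply = ''
    while True:
        item = yield reply  # wait here until the producer sends an item
        log.append('consumed %s' % item)
        reply = 'ok %s' % item


def producer(c, n, log):
    next(c)  # advance the consumer to its first yield
    for i in range(1, n + 1):
        log.append('produced %s' % i)
        reply = c.send(i)  # switch to the consumer, get its reply back
        log.append('reply %s' % reply)
    c.close()


log = []
producer(consumer(log), 2, log)
print(log)
```

The entries in `log` strictly alternate between producer and consumer, because control only moves at the yield and send() calls.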

Threads are called lightweight processes, and coroutines are called micro-threads. A coroutine has its own register context and stack, so it can retain the state of its last call.

This module encapsulates yield, which makes switching between tasks very convenient, but it cannot pass values between them.

gevent provides more complete coroutine support for Python. Its basic principle is:

When a greenlet encounters an IO operation, it automatically switches to another greenlet and switches back once the IO operation has completed. This ensures that there is always a greenlet running rather than waiting.
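gevent is a third-party package, so the sketch below swaps in the standard library's asyncio to illustrate the same principle: a task that would wait on IO hands control to another task and resumes later. This is not gevent's API, and `asyncio.sleep()` merely stands in for a real IO wait; the task names and delays are assumptions.

```python
import asyncio


async def task(name, delay, log):
    log.append('%s start' % name)
    # Stand-in for an IO wait: control switches to the other task here.
    await asyncio.sleep(delay)
    log.append('%s end' % name)


async def main():
    log = []
    # Both tasks run on one thread; whichever is not waiting gets to run.
    await asyncio.gather(task('a', 0.2, log), task('b', 0.1, log))
    return log


log = asyncio.run(main())
print(log)
```

Task b finishes before task a even though a started first, because a's wait did not hold up b.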

Take network IO as an example. It involves two system objects: the thread or process that calls the IO, and the system kernel. Reading data goes through two stages:

(1) Waiting for the data to be ready.

(2) Copying the data from kernel space to user space. Network data is transmitted by a physical device, which is hardware and can only be handled by the operating system in kernel mode, but the data that is read is used by the program, so this second step is needed.

Under Linux, sockets are blocking by default. Recall the sockets we used earlier: sock and conn are two separate sockets, and the server can attend to only one of them at a time, so while the server is waiting for a client to send a message, other clients cannot connect to the server.

In this mode both waiting for the data and copying the data block, so the whole process is blocked.

Once the server has established the connection, this command switches it to non-blocking IO mode.

In this mode, if data is available it is fetched, and if not, an error is raised, which can be handled by catching the exception. Waiting for data does not block, but copying the data still does.
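The command itself is missing from the text; the sketch below assumes it is `setblocking(False)`, the usual way to make a Python socket non-blocking. To stay self-contained it uses `socket.socketpair()` instead of a real client and server, and the variable names are illustrative.

```python
import socket

# A connected pair of sockets stands in for the server's conn and the client.
server_side, client_side = socket.socketpair()
server_side.setblocking(False)  # assumed to be the missing command

try:
    server_side.recv(1024)  # nothing has been sent yet
    got_error = False
except BlockingIOError:     # raised instead of waiting for data
    got_error = True

client_side.sendall(b'hello')

data = None
while data is None:
    try:
        data = server_side.recv(1024)
    except BlockingIOError:
        pass  # no data yet: free to do other work, then ask again

server_side.close()
client_side.close()
print(got_error, data)
```

The retry loop is the polling cost mentioned in the text: every failed recv() is a system call that returned nothing.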

The advantage is that the waiting time can be put to use. The disadvantages are also obvious: the many system calls are expensive, and if data arrives while the program is doing something else, it is not lost, but the program does not receive it in real time.

Now let select take over the waiting that accept used to do. The advantage of select is that it can monitor many objects at once: whichever object becomes active, select reacts and collects the active objects into a list.

The connection itself is still established by accept. With this, we can implement a concurrent TCP chat.

The sock becomes active only when a new connection is being established, and that is when it appears in the list. Once a connection has been established, the active object during sending and receiving is not sock but conn. So in practice you must check whether each object in the list is sock.

In this model both waiting for data and copying data block, so it is also called fully blocking. Its advantage over the blocking IO model is that it can handle multiple connections.
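The sock-versus-conn check can be sketched with select.select(). To keep the example self-contained, the client lives in the same script and the loop stops after one message, whereas a real server would loop forever. The loopback address, port 0 (let the OS choose), and the variable names are assumptions.

```python
import select
import socket

sock = socket.socket()
sock.bind(('127.0.0.1', 0))  # port 0: let the OS pick a free port
sock.listen(5)
watched = [sock]             # objects that select should monitor

# A client in the same script, only to make the sketch runnable on its own.
client = socket.create_connection(sock.getsockname())
client.sendall(b'hello')

received = []
while not received:
    readable, _, _ = select.select(watched, [], [], 1.0)
    for obj in readable:
        if obj is sock:      # the listening socket is active: new connection
            conn, addr = obj.accept()
            watched.append(conn)
        else:                # a conn is active: data has arrived (or EOF)
            data = obj.recv(1024)
            if data:
                received.append(data)
            else:
                watched.remove(obj)
                obj.close()

for s in watched:
    s.close()
client.close()
print(received)
```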

Windows supports only select, while Linux supports all three (select, poll, and epoll). epoll is the best. The only advantage of select is that it is available on many platforms; its obvious disadvantage is poor efficiency. poll is an intermediate step between select and epoll: compared with select, it has no limit on the number of objects it can monitor. epoll also has no maximum connection limit, and its monitoring mechanism is completely different. select works by polling (every object is checked each time, and checking continues even after a change has been found), whereas epoll works by callbacks: when an object changes, its callback function is invoked.

This mode is non-blocking throughout, and only IO that is non-blocking throughout can be called asynchronous. Although this mode looks good, in practice efficiency is very low when the request volume is large, and the burden on the operating system is heavy.

Once you have learned this module, you no longer need to use select, poll, or epoll directly, because it is a common interface to all of them. We only need to know how to use the interface; which one it wraps underneath is not our concern.

In this module, sockets are bound to functions with the register() method. The module is used in a very fixed pattern. An example of the server side is as follows:
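The original example did not survive, so what follows is a reconstruction under assumptions rather than the author's code. It assumes the module is the standard library's selectors, uses a loopback address with an OS-chosen port, and includes a client in the same script so that it terminates; the callback names `accept` and `read` and the upper-casing reply are invented for the illustration.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # picks epoll, poll or select as available
echoed = []


def accept(sock):
    conn, addr = sock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, read)  # bind conn to read()


def read(conn):
    data = conn.recv(1024)
    if data:
        conn.sendall(data.upper())
        echoed.append(data)
    else:  # empty read: the client closed the connection
        sel.unregister(conn)
        conn.close()


server = socket.socket()
server.bind(('127.0.0.1', 0))
server.listen(5)
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)  # bind server to accept()

client = socket.create_connection(server.getsockname())
client.sendall(b'hello')

while not echoed:  # a real server would loop forever
    for key, mask in sel.select(timeout=1.0):
        key.data(key.fileobj)  # call the function bound by register()

reply = client.recv(1024)
client.close()
for key, mask in sel.select(timeout=1.0):
    key.data(key.fileobj)  # read() sees the close and cleans up conn

sel.unregister(server)
server.close()
sel.close()
print(reply)
```

The fixed pattern is: register a socket together with a callback, then loop over sel.select() and call `key.data(key.fileobj)` for each ready object.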
