A hand-holding guide to Python processes and threads: giving your program more hands

Keywords: Python ElasticSearch crawler Data Analysis

How important are processes and threads? When you first start learning Python you may not notice, because the code you write simply runs from top to bottom, but that is only the most elementary way to run a program. To make full use of the machine and speed a program up, real projects often use multiple processes and multiple threads.

Take a crawler as an example. A program without multiprocessing or multithreading is like working with only one hand. Turn on multiprocessing and multithreading, and dozens of hands are working at once. Where you need 10 minutes to crawl the data, someone else may finish in less than one.

Processes and threads are also the last topic in the Python introduction, so my Python beginner tutorial series is almost complete.

Before starting on processes and threads in Python, there are a few basic concepts to explain.

1, Multitasking operating system

An operating system can perform multiple tasks. On Windows, for example, besides the tasks you can see running in the foreground, many tasks are running in the background. Open Task Manager (Ctrl+Shift+Esc, or Ctrl+Alt+Del and choose Task Manager) and have a look.

Computer specifications usually list the number of processor cores. My computer, for example, has 12 cores, which means it can truly execute at most 12 tasks, that is, run at most 12 processes, at the same instant.

But why can our computers run hundreds of tasks at the same time?

This is due to the task scheduling of the operating system. Most operating systems schedule with preemptive time slices: the system switches between tasks very quickly, giving each one a very short slice of time. An 8-core machine can, in theory, execute only 8 tasks at any one instant, but the system may switch between hundreds of tasks within one second. Task A runs for a moment, then task B, then task C, and so on. As a result, many tasks make progress within 1 second, and to the naked eye hundreds of tasks appear to run at the same time.

The term for this is "macroscopically parallel, microscopically serial". At any single instant, the computer can only execute as many tasks as it has cores: 8 cores can only execute 8 tasks.
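To see the upper bound on truly simultaneous tasks for your own machine, a minimal sketch using the standard library's os.cpu_count() is shown below. Note that it reports logical cores, which may be higher than the physical core count, and that it can return None when the count cannot be determined.

```python
import os

# Number of logical CPU cores; None if the platform cannot report it.
cores = os.cpu_count()

print("Logical CPU cores:", cores)
```

Whatever number this prints, the operating system will still happily schedule far more tasks than that by switching between them.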

1. What is a process?

We have been talking about tasks, and a process is one such task. It is the basic unit to which the operating system allocates resources. In Python, using processes is one way to achieve multitasking.

2. What is a thread?

The subtasks inside a process are called threads. A thread is the smallest unit of execution that the operating system schedules. A process can have many threads, and each thread can execute a different task.
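As a rough illustration of "threads are subtasks inside one process", the sketch below starts three threads and records the process ID each one sees. It borrows threading.Thread, which this article only introduces later, and join(), which the article does not cover; the names record and results are made up for this example.

```python
import os
import threading

results = []  # one list, shared by every thread in this process


def record(label):
    # Each thread stores its label together with the ID of the process it runs in.
    results.append((label, os.getpid()))


threads = [threading.Thread(target=record, args=("subtask-%d" % n,)) for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished

for label, pid in results:
    print(label, "ran in process", pid)
print("main thread runs in process", os.getpid())
```

Every line should show the same process ID, and all three threads wrote into the same list, because threads live inside one process and share its memory.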

Python supports both multiprocessing and multithreading. Next, we will start to learn about Python processes and threads.

2, Multiprocessing in Python (package)

If you do not use multiple processes, your Python code is executed line by line from beginning to end, which means only one process is running. That should be easy to understand.

To make better use of CPU resources, we can use multiple processes. The package commonly used for this in Python is multiprocessing. It offers many features, such as child processes, inter-process communication, data sharing, and different forms of execution. Let's learn some of the commonly used ones.

1. Process - the process class

Process is the process class in multiprocessing, and it is how multiple processes are created. Let's look at its usage first, and then at practical examples.

Process(target,name,args,kwargs)
  • target is the function the new process should execute. The system needs to be told where the new process starts working.
  • name is the name of the process. Setting it is optional. The default is Process-N, where N counts up from 1 (Process-1, Process-2, Process-3, ...).
  • args and kwargs are the parameters passed to target.
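The naming rule can be checked without starting anything. The sketch below only constructs Process objects and reads their name attribute; the function work and the name "downloader" are placeholders invented for this illustration.

```python
import multiprocessing


def work():
    pass  # placeholder target; it is never started in this sketch


named = multiprocessing.Process(target=work, name="downloader")
default = multiprocessing.Process(target=work)

print(named.name)    # the name we chose
print(default.name)  # an automatic name of the form Process-N
```

The exact N in the automatic name depends on how many Process objects were created earlier in the same program.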

Process has many methods. The most commonly used one is start(), which starts the process.

process_object.start()	# Start the process

As an example, take the code below. I want to compare the effect of calling the functions with and without multiple processes.

import time

# Two functions we would like to run at the same time
def music():
    for i in range(5):  # 5 times
        print("Listening to music...")
        time.sleep(0.2)  # Delay 0.2s to make the comparison more obvious

def movie():
    for i in range(5):
        print("Watch the video...")
        time.sleep(0.2)  # Delay 0.2s

music()
movie()
print("Main process execution completed")

When multiple processes are not started, the execution effect is as follows:

As you can see, this is perfectly ordinary behavior. The program runs from top to bottom, line by line. Until the five loop iterations in music() have finished, movie() does not start, and until both functions have finished, the print("Main process execution completed") on the last line does not run.

Let's look at adding multiple processes to the code of the above case:

import time
import multiprocessing

# Two functions to be executed simultaneously
def music():
    for i in range(5):  # 5 times
        print("Listening to music...")
        time.sleep(0.2)  # Delay 0.2s to make the comparison more obvious

def movie():
    for i in range(5):
        print("Watch the video...")
        time.sleep(0.2)  # Delay 0.2s

if __name__ == "__main__":  # Avoids the recursive re-import problem on Windows
    # Create child processes
    music_process = multiprocessing.Process(target=music)
    movie_process = multiprocessing.Process(target=movie)
    # Start the processes
    music_process.start()
    movie_process.start()
    print("Main process execution completed")

I added an if statement to the code that checks __name__. Why? On Windows, multiprocessing starts each child process by launching a fresh interpreter that imports the main module again. Without the check, that import would run the process-creating code again, which would try to start still more processes, recursively. If you don't believe it, remove the if statement and put all the code at module level: Python reports an error. This happens wherever child processes are started with the "spawn" method, which is the default on Windows and, since Python 3.8, on macOS as well. Linux has traditionally forked child processes by default, so the code runs there without the check, but keeping it everywhere is the safe habit.

I covered __name__ == "__main__" in the article on initializing modules and packages. If it is unclear, you can go back and have a look.

Operation results:

As you can see, after the processes are started there are three processes running at the same time. One is the main process, which executes from top to bottom and prints "Main process execution completed". The other two are child processes running music() and movie(). Judging from the output, they run at the same time, so the second function does not have to wait for the loop in the first to run all five times before it starts.

For the same code, your output order may differ from mine, because the order depends on how the system schedules the processes at that moment. That does not stop you from seeing that the result is multiprocessing at work.

Finally, as mentioned earlier, args and kwargs pass parameters to the target in Process. args passes positional arguments as a tuple, and kwargs passes keyword arguments as a dictionary. The example in the next section passes kwargs to movie().
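A minimal sketch of both ways of passing parameters follows. The functions describe and play, their parameters, and the values are invented for this illustration, and join() (waiting for a child process to finish) is not covered in this article.

```python
import multiprocessing


def describe(title, minutes, volume=5):
    # Build the message separately so it can be checked without starting a process.
    return "%s: %s min at volume %s" % (title, minutes, volume)


def play(title, minutes, volume=5):
    print(describe(title, minutes, volume))


if __name__ == "__main__":
    # args fills the positional parameters in order: title="music", minutes=3
    by_args = multiprocessing.Process(target=play, args=("music", 3))
    # kwargs fills parameters by name
    by_kwargs = multiprocessing.Process(
        target=play, kwargs={"title": "movie", "minutes": 90, "volume": 8}
    )
    by_args.start()
    by_kwargs.start()
    by_args.join()    # wait for both children before the main process exits
    by_kwargs.join()
```

One detail worth remembering: args must be a tuple, so a single argument needs a trailing comma, as in args=("music",).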

2. Get the ID of the current process

We mentioned earlier that several processes perform tasks at the same time while the code runs. How can we look up the ID (PID) of the current process to know which processes are running, which one is the main process, and which are child processes? There are three methods. Let's look at the methods first, and then use them together in an example.

(1) Get the ID of the current process:

Use the getpid() function in the os module. The usage is as follows:

os.getpid()

(2) Get the name of the current process

This again uses the multiprocessing package, which has a current_process() function. It returns the current process object, whose printed form includes the process name. The usage is as follows:

multiprocessing.current_process()

(3) Get the ID of the current process's parent process (the main process)

Which parent process does a child process belong to? For this, use getppid() in the os module. The usage is as follows:

os.getppid()
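The three calls can first be tried on their own in the main process, without creating any child processes. This is only a sketch: the variable name current is made up here, and .name and .pid are attributes of the object that current_process() returns.

```python
import multiprocessing
import os

current = multiprocessing.current_process()  # a process object, not a plain string

print("Process name:", current.name)  # "MainProcess" when run directly as a script
print("Process ID:", os.getpid())
print("Parent process ID:", os.getppid())  # here, the shell or IDE that launched Python
```

In the main process the parent ID belongs to whatever launched Python. Inside a child process started by multiprocessing, it would typically be the ID of the main process instead.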

Now that we have seen the methods, let's extend the earlier example to print the name and ID of the current process and the ID of its parent process.

import time
import multiprocessing
import os

# Two functions to be executed simultaneously
def music():
    print("music child process name:", multiprocessing.current_process())
    print("music child process ID:", os.getpid())
    print("music parent (main) process ID:", os.getppid())
    for i in range(5):  # 5 times
        print("Listening to music...")
        time.sleep(0.2)  # Delay 0.2s to make the comparison more obvious

def movie(a, b):
    print("movie child process name:", multiprocessing.current_process())
    print("movie child process ID:", os.getpid())
    print("movie parent (main) process ID:", os.getppid())
    for i in range(5):
        print("Watch the video...")
        time.sleep(0.2)  # Delay 0.2s

if __name__ == "__main__":  # Avoids the recursive re-import problem on Windows
    # Create child processes
    music_process = multiprocessing.Process(target=music)
    movie_process = multiprocessing.Process(target=movie, kwargs={"a": 30, "b": 40})
    # Start the processes
    music_process.start()
    movie_process.start()
    print("Main process ID:", os.getpid())

Operation results:

As you can see, once we call the functions for querying processes, the IDs and names can be printed.

3, Multithreading module

Multiple processes can run several tasks at the same time. As mentioned earlier, the smallest unit of execution inside a process is a thread, so multiple threads can also perform multiple tasks. A process that runs only one task has only one thread. For example, when we run code without multiprocessing or multithreading, we can speak of one main process or, equally, of one main thread.

1. The Thread class

The module commonly used for multithreading is threading. It has a class called Thread, which is similar to the Process class we used for multiprocessing. Let's look at the usage first:

Thread(target=None,name=None,args=(),kwargs=None)
  • Target: executable target
  • Name: the name of the thread. The default is Thread-N
  • args/kwargs: target parameters

Similarly, a thread has a method that starts it, just like a process:

start()

There is also a function that returns the current thread object, from which the thread name can be read:

threading.current_thread()
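A small sketch of reading the thread name is given below. The function report, the list seen, and the name "worker-1" are invented for this illustration, and join() is used only so that the list is filled before it is printed.

```python
import threading

seen = []


def report():
    # current_thread() returns the Thread object that is running this code.
    seen.append(threading.current_thread().name)


worker = threading.Thread(target=report, name="worker-1")
worker.start()
worker.join()  # wait for the thread to finish

print(seen)  # ['worker-1']
print(threading.current_thread().name)  # the name of the thread running this line
```

If name is left out, the thread gets an automatic name beginning with Thread-.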

With these points covered, let's try an example similar to the one above, this time using multithreading.

import threading, time

def music(name, loop):
    for i in range(loop):
        print("Listening to music %s, time %s" % (name, i))
        time.sleep(0.2)

def movie(name, loop):
    for i in range(loop):
        print("Watching movie %s, time %s" % (name, i))
        time.sleep(0.2)

if __name__ == "__main__":
    music_thread = threading.Thread(target=music, args=("The closest person", 3))
    movie_thread = threading.Thread(target=movie, args=("Tang Tan 2", 3))
    music_thread.start()
    movie_thread.start()
    print("Main thread execution completed")

Operation results:

Listening to music The closest person, time 0
Watching movie Tang Tan 2, time 0
Main thread execution completed
Listening to music The closest person, time 1
Watching movie Tang Tan 2, time 1
Watching movie Tang Tan 2, time 2
Listening to music The closest person, time 2

As you can see, multithreading is used much like multiprocessing. It can also run multiple tasks at once. This example also shows how parameters are passed.
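To put a number on "running multiple tasks at once", the sketch below times the same two waiting tasks run one after the other and then run in two threads. The helper names task, run_sequential, and run_threaded are made up for this illustration, join() is not covered in this article, and the exact timings vary from machine to machine.

```python
import threading
import time


def task(loops):
    for _ in range(loops):
        time.sleep(0.05)  # stands in for waiting on a download or a disk read


def run_sequential():
    start = time.perf_counter()
    task(3)
    task(3)
    return time.perf_counter() - start


def run_threaded():
    start = time.perf_counter()
    threads = [threading.Thread(target=task, args=(3,)) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for both threads so the measurement is complete
    return time.perf_counter() - start


sequential = run_sequential()
threaded = run_threaded()
print("one after the other: %.2fs" % sequential)
print("two threads:         %.2fs" % threaded)
```

The threaded run should take roughly half as long, because both threads spend their time waiting and can wait simultaneously.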

2. Inherit Thread class

Besides the approach above, we can also implement multithreading by inheriting from the Thread class.

For example: print "Cool" and "The hair is gone" through multithreading.

import threading, time

# Create a thread class by inheriting from threading.Thread
class MyThread(threading.Thread):
    def __init__(self, name):
        # Initialization
        super().__init__()  # Call the initialization method of the parent class Thread
        self.name = name  # name becomes an instance attribute

    def run(self):
        # What the thread does
        for i in range(5):
            print(self.name)
            time.sleep(0.2)

# Instantiate the child threads
t1 = MyThread("Cool")
t2 = MyThread("The hair is gone")
t1.start()
t2.start()

The class MyThread is our own. It inherits from the parent class threading.Thread. We write MyThread's initialization method, which runs each time an instance is created, and we override run(), which holds the work the thread does once start() is called. We covered super().__init__() in the earlier object-oriented article; if it is unclear, take another look at that article.

Operation results:

Cool
The hair is gone
Cool
The hair is gone
Cool
The hair is gone
Cool
The hair is gone
Cool
The hair is gone

The order is not fixed, so your output may differ from mine. When multithreaded code runs, whichever thread grabs the time slice first executes first, and this varies from run to run and from computer to computer.

So multithreading can also be implemented by inheriting from the Thread class.

Epilogue

With processes and threads done, basically all the topics of the Python introduction are finished, and what remains is a bonus chapter. If you have worked through everything from the Python basics to the later advanced programming section, you can move on in any direction of advanced Python. Keep going!

Posted by jonathandg on Mon, 29 Nov 2021 03:07:19 -0800