Contents
1 Overview of Python multiprocess programming
2 Requirements and solution
3 Complete code
1 Overview of Python multiprocess programming
- Multithreading in Python cannot take advantage of multiple cores (because of the GIL), so to make full use of a multi-core CPU, Python usually needs multiple processes. Python provides the multiprocessing module for this.
- The multiprocessing module starts child processes and runs our custom tasks (such as functions) in them. It supports spawning subprocesses, inter-process communication and data sharing, and different forms of synchronization, and it provides components such as Process, Queue, Pipe, and Lock.
- Unlike threads, processes share no state: data modified in one process is visible only to that process.
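To make these components concrete, here is a minimal sketch (the function and variable names are mine, not from the post): a pool of worker `Process` objects drains a shared task `Queue` and reports results back through a second queue, with a `None` sentinel per worker to signal shutdown.

```python
from multiprocessing import Process, Queue


def worker(task_queue, result_queue):
    # Square numbers until the None sentinel arrives.
    for n in iter(task_queue.get, None):
        result_queue.put(n * n)


if __name__ == '__main__':
    tasks, results = Queue(), Queue()
    for n in range(10):
        tasks.put(n)
    n_workers = 2
    for _ in range(n_workers):
        tasks.put(None)  # one stop sentinel per worker
    procs = [Process(target=worker, args=(tasks, results)) for _ in range(n_workers)]
    for p in procs:
        p.start()
    squares = sorted(results.get() for _ in range(10))  # drain before joining
    for p in procs:
        p.join()
    print(squares)
```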
2 Requirements and solution
Background:
A folder on the server receives data packets sent from the client software; while a packet is still being received, its folder name carries the suffix '-downloading'. Each packet is a folder, and inside it is a ZIP archive with the same name as the folder.
The algorithm must unzip each incoming packet and process the data in the archive; once processing is done, the folder and its contents are moved elsewhere for archiving.
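A short sketch of the completeness check this layout implies (the helper name and root path are illustrative): a packet is ready once its folder no longer ends in '-downloading' and the same-name ZIP exists inside it.

```python
import os


def ready_packets(raw_data_root='./zip'):
    # Yield folders that are fully downloaded and contain their same-name ZIP.
    for name in os.listdir(raw_data_root):
        if name.endswith('-downloading'):
            continue  # still being received, skip for now
        zip_path = os.path.join(raw_data_root, name, name + '.zip')
        if os.path.exists(zip_path):
            yield name
```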
Requirements:
The server needs to process multiple incoming packets concurrently.
Solution:
Given the weakness of Python multithreading (the GIL prevents CPU-bound parallelism), a multi-process implementation is adopted.
Problems to solve, and their solutions:
- How do we prevent the same packet from being processed by more than one process?
- Solution: use a queue to build a producer (main process) and consumer (child processes) pattern: the folder name of every newly arrived packet is put into a shared queue, and all worker processes take packet names from that queue.
- When the main process is killed, how do we ensure the child processes exit with it instead of becoming orphan processes?
- Solution: put the main and child processes in the same process group and send SIGKILL to the whole group (see the sketch after this list).
- References: see the list at the end of this post.
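Here is a distilled sketch of that scheme (the worker body is a stand-in for real work): on Linux, children started with `Process` inherit the parent's process group, so a SIGTERM handler in the parent can take the whole group down, main process included, with a single `os.killpg` call. Section 3 integrates this into the full program.

```python
import os
import signal
import time
from multiprocessing import Process


def term(sig_num, frame):
    # SIGKILL every process in our group: the workers and the main process alike.
    os.killpg(os.getpgid(os.getpid()), signal.SIGKILL)


def worker():
    while True:
        time.sleep(1)  # stand-in for real work


if __name__ == '__main__':
    signal.signal(signal.SIGTERM, term)
    print('main pid: %s' % os.getpid())
    for _ in range(3):
        p = Process(target=worker)
        p.daemon = True
        p.start()
    while True:
        time.sleep(1)  # `kill <main pid>` now takes the whole group down
```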
3 Complete code
```python
from multiprocessing import Process, Queue
import os
import time
import signal
import zipfile
import shutil


def unzip_file(sample_key_pair):
    try:
        zip_name = sample_key_pair + '.zip'
        sampleraw_zip = os.path.join('./zip/' + sample_key_pair, zip_name)
        with zipfile.ZipFile(sampleraw_zip) as z:
            z.extractall(path='./zip/' + sample_key_pair, members=None, pwd=None)
    except Exception as e:
        print('Fail to unzip file: {}'.format(e))


def mv_dir(sample_key_pair):
    shutil.move('./zip/' + sample_key_pair, './mv_zip')


def gan_huo_de_jin_cheng(zip_Queue):
    print('Subprocess pid is %s, parent pid is %s' % (os.getpid(), os.getppid()))
    while True:
        if not zip_Queue.empty():
            # 1. Take a packet name out of the queue.
            sample_key_pair = zip_Queue.get()
            print('Subprocess:', os.getpid(), 'got compressed package', sample_key_pair)
            # 2. Unzip it.
            unzip_file(sample_key_pair)
            # 3. Move the folder away for archiving.
            mv_dir(sample_key_pair)


# Once the main process is killed, all child processes are closed.
def term(sig_num, addtion):
    print('term: current pid is %s, group id is %s' % (os.getpid(), os.getpgid(os.getpid())))
    os.killpg(os.getpgid(os.getpid()), signal.SIGKILL)


if __name__ == '__main__':
    # The guard above is required for multiprocessing; without it, debugging reports an error.
    raw_data_root = './zip'
    signal.signal(signal.SIGTERM, term)
    print('Main process pid is %s' % os.getpid())
    # A Queue with no size limit is unbounded; the queue lives in memory.
    zip_Queue = Queue()
    # zip_list mirrors the queue's content, because a Queue cannot be traversed or searched.
    zip_list = []

    # Create the child processes.
    for i in range(3):
        t = Process(target=gan_huo_de_jin_cheng, args=(zip_Queue,))
        t.daemon = True
        t.start()

    # Scan for newly arrived compressed packets and push them into the queue.
    while True:
        time.sleep(1)  # poll the folder once per second
        sample_raw_list = os.listdir(raw_data_root)
        if len(sample_raw_list) != 0:
            # 1. Look for intact packets whose names are not already in the queue.
            for sample_raw_dir in sample_raw_list:
                sample_key_pair = sample_raw_dir.split('/')[-1]
                if sample_key_pair.endswith('-downloading'):
                    # The packet has not finished downloading; skip it.
                    continue
                # If the packet folder is not in the queue yet, enqueue it.
                if sample_key_pair not in zip_list:
                    zip_Queue.put(sample_key_pair)
                    zip_list.append(sample_key_pair)
                    if len(zip_list) > 100:
                        # Drop the oldest entry so zip_list does not grow without bound.
                        zip_list.pop(0)
```
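One design note on the worker loop above: checking `empty()` before `get()` can race when several workers poll the same queue, and the loop spins even when nothing has arrived. A blocking `get` with a timeout (or the `get_nowait` variant mentioned in the references below) avoids both. A sketch of the worker rewritten that way, reusing the helpers from the full code above:

```python
import os
import queue


def gan_huo_de_jin_cheng(zip_Queue):
    print('Subprocess pid is %s' % os.getpid())
    while True:
        try:
            # Block for up to one second instead of polling empty().
            sample_key_pair = zip_Queue.get(timeout=1)
        except queue.Empty:
            continue  # nothing arrived yet; keep waiting
        unzip_file(sample_key_pair)  # helpers defined in the full code above
        mv_dir(sample_key_pair)
```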
Main references:
- Python concurrent programming: multiprocessing (practice) - anne199534 - Cnblogs
- Python 3 concurrent programming: multiprocess queues (recommended) - yupidi - Cnblogs
- Python concurrent programming: multiprocess queues - minger_lcm - Cnblogs
- Python multiprocess programming: how to make child processes exit when the main process exits - CSDN blog
- When the main process is killed, how to ensure the child processes exit at the same time without becoming orphan processes - shy team windbreaker - Cnblogs
- Solution to the PyTorch "RuntimeError: An attempt has been made to start a new process before the..." error - Grey letter network (software development blog aggregation)
- put_nowait and get_nowait - night rain in the Jianghu - Cnblogs
- Python Queue - yangyidba - Cnblogs
- Python queue usage - Great devil's blog - CSDN blog