Concurrent Programming: Multi-Process Programming (Python Version)

Keywords: Python, concurrent programming, multiprocessing

Contents

1 Overview of Python Multi-Process Programming

2 Requirements and Solution

Background:

Requirements:

Solution:

Problems to be solved and their solutions:

3 Complete Code

1 Overview of Python Multi-Process Programming

  • Because of the Global Interpreter Lock (GIL), multithreading in Python cannot take advantage of multiple cores. To make full use of a multi-core CPU, Python code usually has to turn to multiple processes, and the standard library provides the multiprocessing module for this.
  • The multiprocessing module starts subprocesses and runs our customized tasks (such as functions) inside them. It has many capabilities: spawning subprocesses, communicating and sharing data between them, and performing different forms of synchronization, and it provides components such as Process, Queue, Pipe and Lock (a minimal sketch follows this list).
  • Unlike threads, processes share no state: data modified by a process stays local to that process.
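
As a minimal illustration of these components (the worker function square and the queue names are invented for this sketch), the following starts a subprocess with Process and exchanges data with it through Queue objects:

from multiprocessing import Process, Queue


def square(task_queue, result_queue):
    # Worker: read numbers until the None sentinel arrives, push back their squares
    while True:
        n = task_queue.get()
        if n is None:
            break
        result_queue.put(n * n)


if __name__ == '__main__':
    task_queue, result_queue = Queue(), Queue()
    p = Process(target=square, args=(task_queue, result_queue))
    p.start()
    for n in [1, 2, 3]:
        task_queue.put(n)
    task_queue.put(None)  # sentinel: tell the worker to stop
    p.join()
    while not result_queue.empty():
        print(result_queue.get())  # 1, 4, 9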

2 Requirements and Solution

Background:

A folder on the server receives data packets sent from the client software (while a packet is still being received, its folder name carries the suffix '-downloading'). Each packet takes the form of a folder that contains a ZIP archive with the same name as the folder.

The algorithm has to decompress each incoming packet and then process the data inside the archive. Once the data has been processed, the folder and its contents are moved elsewhere for archiving.
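
For concreteness, the incoming folder might look like this (all names are illustrative):

./zip/
    sample_001/                  <- fully received packet
        sample_001.zip           <- ZIP archive named after its folder
    sample_002-downloading/      <- still being received; must be skipped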

Requirements:

The server needs to handle multiple packets concurrently.

Solution:

Given the weakness of Python multithreading for CPU-bound work (the GIL again, see the sketch below), a multi-process implementation is adopted.
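
To make that weakness concrete, here is a small CPU-bound benchmark sketch (count_down and the timing comments are illustrative; exact numbers depend on the machine). Four threads run roughly serially because of the GIL, while four processes can occupy four cores:

import time
from threading import Thread
from multiprocessing import Pool


def count_down(n):
    # Pure CPU-bound work: never releases the GIL for I/O
    while n > 0:
        n -= 1


if __name__ == '__main__':
    N = 10_000_000

    start = time.time()
    threads = [Thread(target=count_down, args=(N,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print('4 threads:   %.2fs' % (time.time() - start))  # roughly serial due to the GIL

    start = time.time()
    with Pool(4) as pool:
        pool.map(count_down, [N] * 4)
    print('4 processes: %.2fs' % (time.time() - start))  # near-linear speedup on 4+ cores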

Problems to be solved and their solutions:

  1. How do we prevent the same packet from being parsed by multiple processes?
    1. Scheme: use a queue to build a producer (main process) and consumer (subprocess) pattern. The folder names of all newly arrived packets are put into one shared queue, and every worker process takes packet names from that queue.
  2. When the main process is killed, how do we make sure the child processes exit with it instead of becoming orphan processes?
    1. Scheme: put all processes in one 'process group' and send SIGKILL to the whole group (sketched after this list).
    2. Reference resources:

      1. https://blog.csdn.net/lucia555/article/details/105957928/
      2. https://www.cnblogs.com/domestique/p/8241219.html
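
The process-group scheme from point 2 can be sketched as follows (a minimal, self-contained example assuming the Unix fork start method, not the full server). Children forked from the main process inherit its process group, so a SIGTERM handler in the main process can SIGKILL the entire group and take every child down at once:

import os
import signal
import time
from multiprocessing import Process


def worker():
    while True:
        time.sleep(1)  # stand-in for real work


def term(sig_num, frame):
    # SIGKILL the whole process group: the children and the main process all die
    os.killpg(os.getpgid(os.getpid()), signal.SIGKILL)


if __name__ == '__main__':
    signal.signal(signal.SIGTERM, term)
    for _ in range(3):
        Process(target=worker, daemon=True).start()
    print('kill -TERM %s now takes down the whole group' % os.getpid())
    while True:
        time.sleep(1)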

3 Complete Code

from multiprocessing import Process, Queue
import os
import time
import signal
import zipfile
import shutil


def unzip_file(sample_key_pair):
    # Extract <name>.zip from inside ./zip/<name>/ into that same folder
    try:
        zip_name = sample_key_pair + '.zip'
        sampleraw_zip = os.path.join('./zip/' + sample_key_pair, zip_name)
        with zipfile.ZipFile(sampleraw_zip) as z:
            z.extractall(path='./zip/' + sample_key_pair, members=None, pwd=None)
    except Exception as e:
        print('Failed to unzip file: {}'.format(e))


def mv_dir(sample_key_pair):
    # Archive: move the processed packet folder into ./mv_zip
    shutil.move('./zip/' + sample_key_pair, './mv_zip')




def gan_huo_de_jin_cheng(zip_Queue):  # pinyin for "worker process"; name kept from the original
    print('Subprocess pid is %s, parent pid is %s' % (os.getpid(), os.getppid()))
    while True:
        # 1 take a packet name from the queue; get() blocks until one is
        #   available, so the worker does not busy-wait on an empty queue
        sample_key_pair = zip_Queue.get()
        print('Subprocess:', os.getpid(), 'got compressed package', sample_key_pair)

        # 2 decompress it
        unzip_file(sample_key_pair)

        # 3 move the folder to the archive location
        mv_dir(sample_key_pair)



# Once the main process receives SIGTERM, kill the whole process group so that
# all child processes are closed along with it.
def term(sig_num, frame):
    print('term: current pid is %s, group id is %s' % (os.getpid(), os.getpgid(0)))
    os.killpg(os.getpgid(os.getpid()), signal.SIGKILL)
 



if __name__ == '__main__':  # this guard is required for multiprocessing; without it, starting child processes can raise an error (e.g. under the spawn start method used on some platforms and debuggers)

    raw_data_root = './zip'

    signal.signal(signal.SIGTERM, term)
    print('Main process pid is %s' % os.getpid())
    
    zip_Queue = Queue()  # no maxsize given, so the queue can grow without bound; it lives in memory
    zip_list = []  # mirror of zip_Queue's contents: items inside a Queue cannot be traversed or searched, so membership is checked against this list instead

    # Create the worker subprocesses
    for i in range(3):
        t = Process(target=gan_huo_de_jin_cheng, args=(zip_Queue, ))
        t.daemon = True  # daemon children are terminated when the main process exits normally
        t.start()
    

    # Scan for newly arrived packets and push their names into the queue
    while True:

        time.sleep(1)  # poll the folder once per second

        sample_raw_list = os.listdir(raw_data_root)
        if len(sample_raw_list) != 0:

            # 1. Look for fully downloaded ('intact') packets whose names are not yet queued
            for sample_raw_dir in sample_raw_list:

                sample_key_pair = sample_raw_dir.split('/')[-1]
                if sample_key_pair.endswith("-downloading"):  # packet still downloading: skip it
                    continue

                # If this packet folder has not been queued yet, queue it
                if sample_key_pair not in zip_list:
                    zip_Queue.put(sample_key_pair)
                    zip_list.append(sample_key_pair)

                    if len(zip_list) > 100:
                        zip_list.pop(0)  # drop the oldest entry so zip_list does not grow without bound
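
To stop the whole service, send SIGTERM to the main process, for example kill -15 <main pid> from a shell (the pid is printed at startup); the term handler then SIGKILLs the process group, so no worker is left behind as an orphan. Note that t.daemon = True alone only covers a normal exit of the main process; if the main process is killed, daemon children would survive without the process-group handler, which is why both mechanisms are used together. A minimal sketch of triggering the shutdown from another Python process (the pid 12345 is illustrative):

import os
import signal

os.kill(12345, signal.SIGTERM)  # 12345: the main process pid printed at startup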


        

Main references:

  1. python concurrent programming: multiprocessing (practice) - anne199534 - cnblogs
  2. Python 3 concurrent programming: the multiprocessing Queue (recommended) - yupidi - cnblogs
  3. python concurrent programming: multiprocessing queues - minger_lcm - cnblogs
  4. Python multi-process programming: making child processes exit when the main process exits - CSDN blog
  5. When the main process is killed, how to ensure the child processes exit with it instead of becoming orphans - cnblogs
  6. Solution to PyTorch's "RuntimeError: An attempt has been made to start a new process before the..." - software development blog aggregator
  7. put_nowait and get_nowait - cnblogs
  8. Python Queue - yangyidba - cnblogs
  9. python queue usage - CSDN blog