Python Coroutines
Basics
Coroutines are one way to achieve concurrent programming.
Example
- A simple crawler
import time

def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    time.sleep(sleep_time)
    print('OK {}'.format(url))

def main(urls):
    for url in urls:
        crawl_page(url)

%time main(['url_1', 'url_2', 'url_3', 'url_4'])

########## output ##########

crawling url_1
OK url_1
crawling url_2
OK url_2
crawling url_3
OK url_3
crawling url_4
OK url_4
Wall time: 10 s
This is a simple mock crawler. When main() executes, it calls the crawl_page() function for each URL in turn; each call simulates network I/O, waits several seconds for the result, and only then moves on to the next URL.
- The same crawler written with coroutines
import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    for url in urls:
        await crawl_page(url)

%time asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))

########## output ##########

crawling url_1
OK url_1
crawling url_2
OK url_2
crawling url_3
OK url_3
crawling url_4
OK url_4
Wall time: 10 s
First, import asyncio, which contains most of the magic tools we need to work with coroutines.
The async modifier declares an asynchronous function; here both crawl_page and main become asynchronous functions. Calling an asynchronous function gives you a coroutine object.
For example, print(crawl_page('')) outputs <coroutine object crawl_page at XXXXXX>, which reminds you that this is a Python coroutine object rather than an actual execution of the function.
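As a minimal sketch of this (the memory address in the output will differ on your machine):

import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))

# Calling the async function does NOT run its body; it only builds a coroutine object.
print(crawl_page(''))   # <coroutine object crawl_page at 0x...>, nothing is crawled
# Python will also warn: "RuntimeWarning: coroutine 'crawl_page' was never awaited"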
Executing coroutines
First, a coroutine can be run with await. Awaiting behaves just like a normal Python function call: the program blocks here, enters the awaited coroutine function, and continues only after it finishes, which is the literal meaning of await.
In the code, await asyncio.sleep(sleep_time) pauses here for a few seconds, while await crawl_page(url) executes the crawl_page() function.
Second, a coroutine can be wrapped with asyncio.create_task() to create a task (more on tasks below).
Finally, we need asyncio.run to trigger execution. The asyncio.run function is a feature of Python 3.7 and later.
The result is still 10 seconds, because await is a synchronous call: crawl_page(url) must finish before the loop triggers the next call.
An important coroutine concept: tasks
import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    for task in tasks:
        await task

%time asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))

########## output ##########

crawling url_1
crawling url_2
crawling url_3
crawling url_4
OK url_1
OK url_2
OK url_3
OK url_4
Wall time: 3.99 s
Once you have a coroutine object, you can pass it to asyncio.create_task to create a task.
After a task is created, it is scheduled to run very soon, so the code does not block here waiting for it.
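A minimal sketch of this scheduling behavior: even if main() never awaits the tasks individually and merely sleeps, the event loop still runs them to completion while main() is suspended.

import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    # Creating the tasks schedules them; the event loop runs them
    # while main() is suspended in the sleep below.
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    await asyncio.sleep(10)

asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))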
Another way to run tasks
import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    sleep_time = int(url.split('_')[-1])
    await asyncio.sleep(sleep_time)
    print('OK {}'.format(url))

async def main(urls):
    tasks = [asyncio.create_task(crawl_page(url)) for url in urls]
    await asyncio.gather(*tasks)

%time asyncio.run(main(['url_1', 'url_2', 'url_3', 'url_4']))

########## output ##########

crawling url_1
crawling url_2
crawling url_3
crawling url_4
OK url_1
OK url_2
OK url_3
OK url_4
Wall time: 4.01 s
*tasks unpacks the list into positional arguments; conversely, **dict unpacks a dictionary into keyword arguments.
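A quick illustration of both unpacking forms, using a toy add function (not part of the crawler):

def add(a, b, c):
    return a + b + c

nums = [1, 2, 3]
print(add(*nums))     # 6: the list is unpacked into positional arguments

params = {'a': 1, 'b': 2, 'c': 3}
print(add(**params))  # 6: the dict is unpacked into keyword arguments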
Note: asyncio.create_task and asyncio.run are both provided by Python 3.7 and above.
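For older interpreters, a rough pre-3.7 equivalent (a sketch only; it drives the event loop manually instead of using asyncio.run) relies on asyncio.ensure_future and loop.run_until_complete:

import asyncio

async def crawl_page(url):
    print('crawling {}'.format(url))
    await asyncio.sleep(int(url.split('_')[-1]))
    print('OK {}'.format(url))

# Pre-3.7 style: obtain the loop and run it to completion ourselves.
loop = asyncio.get_event_loop()
tasks = [asyncio.ensure_future(crawl_page(url))
         for url in ['url_1', 'url_2', 'url_3', 'url_4']]
loop.run_until_complete(asyncio.gather(*tasks))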
Demystifying the coroutine runtime
import asyncio

async def worker_1():
    print('worker_1 start')
    await asyncio.sleep(1)
    print('worker_1 done')

async def worker_2():
    print('worker_2 start')
    await asyncio.sleep(2)
    print('worker_2 done')

async def main():
    print('before await')
    await worker_1()
    print('awaited worker_1')
    await worker_2()
    print('awaited worker_2')

%time asyncio.run(main())

########## output ##########

before await
worker_1 start
worker_1 done
awaited worker_1
worker_2 start
worker_2 done
awaited worker_2
Wall time: 3 s
import asyncio

async def worker_1():
    print('worker_1 start')
    await asyncio.sleep(1)
    print('worker_1 done')

async def worker_2():
    print('worker_2 start')
    await asyncio.sleep(2)
    print('worker_2 done')

async def main():
    task1 = asyncio.create_task(worker_1())
    task2 = asyncio.create_task(worker_2())
    print('before await')
    await task1
    print('awaited worker_1')
    await task2
    print('awaited worker_2')

%time asyncio.run(main())

########## output ##########

before await
worker_1 start
worker_2 start
worker_1 done
awaited worker_1
worker_2 done
awaited worker_2
Wall time: 2.01 s
Step-by-step analysis of the second version:
1. asyncio.run(main()) enters the main() function and starts the event loop;
2. task1 and task2 are created and enter the event loop, waiting to run; execution reaches print and outputs 'before await';
3. await task1 is executed; the user code yields control from the current main task, and the event loop starts scheduling worker_1;
4. worker_1 starts running, prints 'worker_1 start', then reaches await asyncio.sleep(1) and yields control from the current task; the event loop starts scheduling worker_2;
5. worker_2 starts running, prints 'worker_2 start', then reaches await asyncio.sleep(2) and yields control from the current task;
6. Everything above takes somewhere between 1 ms and 10 ms, possibly even less; afterwards the event loop suspends scheduling for a while;
7. After one second, worker_1's sleep completes and the event loop passes control back to task1, which prints 'worker_1 done', finishes, and exits the event loop;
8. await task1 completes, so the event loop passes control back to the main task, which prints 'awaited worker_1' and then continues to wait at await task2;
9. After two seconds, worker_2's sleep completes and the event loop passes control back to task2, which prints 'worker_2 done', finishes, and exits the event loop;
10. The main task prints 'awaited worker_2'; the whole coroutine program ends, and the event loop shuts down.
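To watch these handoffs directly, here is a minimal sketch that stamps each step with elapsed time (the exact values are assumptions and will vary slightly from run to run):

import asyncio
import time

t0 = time.perf_counter()

def elapsed():
    return '{:.2f}s'.format(time.perf_counter() - t0)

async def worker(name, delay):
    print('{} start at {}'.format(name, elapsed()))
    await asyncio.sleep(delay)  # yields control back to the event loop
    print('{} done at {}'.format(name, elapsed()))

async def main():
    task1 = asyncio.create_task(worker('worker_1', 1))
    task2 = asyncio.create_task(worker('worker_2', 2))
    print('before await at {}'.format(elapsed()))
    await task1
    print('awaited worker_1 at {}'.format(elapsed()))
    await task2
    print('awaited worker_2 at {}'.format(elapsed()))

asyncio.run(main())
# Expected shape: both workers start near 0.00s, worker_1 finishes ~1.00s,
# worker_2 finishes ~2.00s, so the total wall time is about 2 seconds.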
Example
- Suppose you want to limit the running time of some coroutine tasks, cancelling them once they time out.
import asyncio

async def worker_1():
    await asyncio.sleep(1)
    return 1

async def worker_2():
    await asyncio.sleep(2)
    return 2 / 0

async def worker_3():
    await asyncio.sleep(3)
    return 3

async def main():
    task_1 = asyncio.create_task(worker_1())
    task_2 = asyncio.create_task(worker_2())
    task_3 = asyncio.create_task(worker_3())

    await asyncio.sleep(2)
    task_3.cancel()

    res = await asyncio.gather(task_1, task_2, task_3, return_exceptions=True)
    print(res)

%time asyncio.run(main())

########## output ##########

[1, ZeroDivisionError('division by zero'), CancelledError()]
Wall time: 2 s
worker_1 runs normally; worker_2 raises an error while running; worker_3 takes too long and is cancelled. All of this information is reflected in the final returned result res.
Note the return_exceptions=True. If you do not set this parameter, the first error is thrown all the way through to the awaiting layer, so you would need a try/except block there to catch it; the gather call then stops collecting results, even though the remaining tasks are not cancelled and keep running.
To avoid this, just set return_exceptions to True, and every result and exception is returned together in the list.
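For comparison, a minimal sketch of the try/except approach without return_exceptions (the worker names mirror the example above):

import asyncio

async def worker_1():
    await asyncio.sleep(1)
    return 1

async def worker_2():
    await asyncio.sleep(2)
    return 2 / 0  # raises ZeroDivisionError

async def main():
    try:
        res = await asyncio.gather(worker_1(), worker_2())
        print(res)
    except ZeroDivisionError as e:
        # Without return_exceptions=True, the error propagates out of gather.
        print('caught:', e)

asyncio.run(main())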
Summary
- There are two main differences between coroutines and multithreading: first, coroutines run in a single thread; second, the user decides where to give up control and switch to the next task.
- Coroutine code is more concise and clearer to write.
- When writing coroutine programs, keep a clear mental model of the event loop: know when the program needs to pause and wait for I/O, and when it needs to run on through.