Are asynchronous crawlers too troublesome to write? Try Trio!

Keywords: Python Programming git pip

Personal Blog Visit http://www.x0100.top 

Trio (the name means a group of three, as in a musical trio) makes asynchronous programming easier. It is an alternative to asyncio with its own event loop, not a wrapper around it.

It aims to simplify what the complex asyncio module offers: it is simpler to use than asyncio and Twisted, but equally powerful. The project is still young and experimental, but the overall design is solid. The author encourages you to try it out and to raise an issue on GitHub if you run into problems. There is also an online chat room for easier communication: https://gitter.im/python-trio/general.

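Preparation

  • Make sure your Python version is 3.5 or above (recent trio releases require a newer Python, so check the trio documentation for the current minimum).

  • Install trio: python3 -m pip install --upgrade trio

  • Check that import trio runs without errors before continuing.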

Background: async functions

Using trio means writing asynchronous functions throughout.

# A regular (synchronous) function
def regular_double(x):
    return 2 * x

# An asynchronous function
async def async_double(x):
    return 2 * x

The only visible difference between the asynchronous function and the regular one is the async keyword in front of def.

"Async" is short for "asynchronous". To tell the two kinds apart, we call regular functions synchronous. From the user's perspective, asynchronous functions differ from synchronous functions in the following ways:

  1. To call an asynchronous function, you must use the await keyword. So instead of writing regular_double(3), you write await async_double(3).

  2. You cannot use await inside a synchronous function. Doing so is a syntax error:

def print_double(x):
    print(await async_double(x))   # <-- SyntaxError here
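One way to confirm that Python rejects this at compile time, before any code runs, is to hand the source to the built-in compile() function. This is a minimal sketch using only the standard library; the names source and rejected are chosen just for this illustration.

```python
# The broken function from above, kept as a string so that this
# example file itself still parses.
source = (
    "def print_double(x):\n"
    "    print(await async_double(x))\n"
)

try:
    # compile() only parses and compiles; it never executes the code.
    compile(source, "<example>", "exec")
    rejected = False
except SyntaxError as exc:
    rejected = True
    print("SyntaxError:", exc.msg)

print(rejected)
```

Nothing in the string is ever executed, so async_double does not even need to exist for the error to appear.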

However, in asynchronous functions, await is allowed:

async def print_double(x):
    print(await async_double(x))   # <-- OK!

To sum up: from the user's point of view, the one advantage an asynchronous function has over a regular function is a superpower: it can call other asynchronous functions.

Asynchronous functions can call other asynchronous functions, but then how does the very first asynchronous function get called?

Let's keep reading.

How to call the first asynchronous function

import trio

async def async_double(x):
    return 2 * x

trio.run(async_double, 3)  # returns 6

trio.run calls the first asynchronous function and returns its result.
Next, let's look at some other features of trio.

Waiting in async functions

import trio

async def double_sleep(x):
    await trio.sleep(2 * x)

trio.run(double_sleep, 3)  # does nothing for 6 seconds then returns

This uses trio.sleep, the asynchronous wait function. It does much the same job as time.sleep() in synchronous code, but because it must be called with await, we know from the rules above that it is an asynchronous function.

The example is not useful in itself, since a synchronous function could do the same thing. Its purpose is to show that an asynchronous function can call other asynchronous functions through await.

Typical structure for asynchronous function calls

trio.run -> [async function] -> ... -> [async function] -> trio.whatever

Don't forget to write await

What happens if you forget to write await? Let's look at the example below.

import time
import trio

async def broken_double_sleep(x):
    print("*yawn* Going to sleep")
    start_time = time.perf_counter()

    # Bad: forgot to write await
    trio.sleep(2 * x)

    sleep_time = time.perf_counter() - start_time
    print("Woke up after {:.2f} seconds, feeling well rested!".format(sleep_time))

trio.run(broken_double_sleep, 3)

Running it produces:

*yawn* Going to sleep
Woke up after 0.00 seconds, feeling well rested!
__main__:4: RuntimeWarning: coroutine 'sleep' was never awaited

Python reports a problem. It is a RuntimeWarning (a warning, not an exception), and it says that the coroutine 'sleep' was never awaited.

If we print trio.sleep(3), we see a coroutine object. As we learned earlier, trio.sleep is an asynchronous function, and calling one without await only creates this object; the function body does not run.

To fix the example, change trio.sleep(2 * x) to await trio.sleep(2 * x).

Remember: if you see RuntimeWarning: coroutine '...' was never awaited, it means there is a place where you forgot to write await.
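To see for yourself what a call without await produces, you can inspect the returned object. The sketch below uses the async_double function from earlier rather than trio.sleep, and only the standard library; the variable names are illustrative.

```python
import inspect

async def async_double(x):
    return 2 * x

# No await: this only builds a coroutine object, the body has not run.
coro = async_double(3)
is_coroutine = inspect.iscoroutine(coro)
print(type(coro).__name__, is_coroutine)

# Close it explicitly so Python does not emit the
# "coroutine ... was never awaited" warning for this demo.
coro.close()
```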

Run multiple asynchronous functions

Trio would not be worth much if all it could do were pointless examples like await trio.sleep, so let's use some other trio features to run several asynchronous functions at once.

# tasks-intro.py

import trio

async def child1():
    print("  child1: started! sleeping now...")
    await trio.sleep(1)
    print("  child1: exiting!")

async def child2():
    print("  child2: started! sleeping now...")
    await trio.sleep(1)
    print("  child2: exiting!")

async def parent():
    print("parent: started!")
    async with trio.open_nursery() as nursery:
        print("parent: spawning child1...")
        nursery.start_soon(child1)

        print("parent: spawning child2...")
        nursery.start_soon(child2)

        print("parent: waiting for children to finish...")
        # -- we exit the nursery block here --
    print("parent: all done!")

trio.run(parent)

Let's analyze it step by step. First, it defines two asynchronous functions, child1 and child2, in the same way as the functions we saw above.

async def child1():
    print("child1: started! sleeping now...")
    await trio.sleep(1)
    print("child1: exiting!")

async def child2():
    print("child2: started! sleeping now...")
    await trio.sleep(1)
    print("child2: exiting!")

Next, we define parent, an asynchronous function that runs both child1 and child2:

async def parent():
    print("parent: started!")
    async with trio.open_nursery() as nursery:
        print("parent: spawning child1...")
        nursery.start_soon(child1)

        print("parent: spawning child2...")
        nursery.start_soon(child2)

        print("parent: waiting for children to finish...")
        # Here __aexit__ is called, which waits for child1 and child2 to finish running
    print("parent: all done!")

parent creates a "nursery" with the mysterious async with statement, then adds child1 and child2 to it through the nursery's start_soon method.

async with is actually simple. When reading a file we write with open(...) to get a file handle, and a with statement involves two magic methods: it calls __enter__() at the start of the block and __exit__() at the end. We call open() a context manager.

async with someobj works almost the same way, except that the magic methods it calls are asynchronous: __aenter__ and __aexit__. We call someobj an asynchronous context manager.

Back to the code above. First, async with opens an asynchronous block. Inside it, nursery.start_soon(child1) and nursery.start_soon(child2) schedule child1 and child2 to run and return immediately, leaving both asynchronous functions running in the background.

The parent then waits at the end of the async with block until child1 and child2 have finished, and only then prints the final "parent: all done!".

Let's see the results:

parent: started!
parent: spawning child1...
parent: spawning child2...
parent: waiting for children to finish...
  child2: started! sleeping now...
  child1: started! sleeping now...
    [... 1 second passes ...]
  child1: exiting!
  child2: exiting!
parent: all done!

This matches the analysis above. If you're familiar with threads, this will look like multithreading, but it is not: all of this code runs in a single thread. To mark the difference, we call child1 and child2 two tasks rather than threads. Unlike threads, tasks can only switch at certain designated places, which we call "checkpoints". We'll dig deeper into this later.

Tracing in trio

We know that the tasks above are switched within a single thread, but not how the switching happens. Understanding that helps us learn the module properly.
Fortunately, trio provides a set of tools for inspecting and debugging programs. We can write a Tracer class that implements the trio.abc.Instrument interface. The code is as follows:

class Tracer(trio.abc.Instrument):
    def before_run(self):
        print("!!! run started")

    def _print_with_task(self, msg, task):
        # repr(task) is perhaps more useful than task.name in general,
        # but in context of a tutorial the extra noise is unhelpful.
        print("{}: {}".format(msg, task.name))

    def task_spawned(self, task):
        self._print_with_task("### new task spawned", task)

    def task_scheduled(self, task):
        self._print_with_task("### task scheduled", task)

    def before_task_step(self, task):
        self._print_with_task(">>> about to run one step of task", task)

    def after_task_step(self, task):
        self._print_with_task("<<< task step finished", task)

    def task_exited(self, task):
        self._print_with_task("### task exited", task)

    def before_io_wait(self, timeout):
        if timeout:
            print("### waiting for I/O for up to {} seconds".format(timeout))
        else:
            print("### doing a quick check for I/O")
        self._sleep_time = trio.current_time()

    def after_io_wait(self, timeout):
        duration = trio.current_time() - self._sleep_time
        print("### finished I/O check (took {} seconds)".format(duration))

    def after_run(self):
        print("!!! run finished")

Then we run the previous example, but this time we're passing in a Tracer object.

trio.run(parent, instruments=[Tracer()])

This prints a lot of output, which we analyze piece by piece below.

!!! run started
### new task spawned: <init>
### task scheduled: <init>
### doing a quick check for I/O
### finished I/O check (took 1.787799919839017e-05 seconds)
>>> about to run one step of task: <init>
### new task spawned: __main__.parent
### task scheduled: __main__.parent
### new task spawned: <TrioToken.run_sync_soon task>
### task scheduled: <TrioToken.run_sync_soon task>
<<< task step finished: <init>
### doing a quick check for I/O
### finished I/O check (took 1.704399983282201e-05 seconds)

We don't need to worry about most of this. Look at ### new task spawned: __main__.parent, which shows that a task was created for __main__.parent.

Once the initial housekeeping is complete, trio starts running the parent function, and you can see parent create the two child tasks. It then reaches the end of the async with block and pauses.

>>> about to run one step of task: __main__.parent
parent: started!
parent: spawning child1...
### new task spawned: __main__.child1
### task scheduled: __main__.child1
parent: spawning child2...
### new task spawned: __main__.child2
### task scheduled: __main__.child2
parent: waiting for children to finish...
<<< task step finished: __main__.parent

Control then returns to trio.run(), which logs some more internal bookkeeping:

>>> about to run one step of task: <call soon task>
<<< task step finished: <call soon task>
### doing a quick check for I/O
### finished I/O check (took 5.476875230669975e-06 seconds)

Then the two child tasks get a chance to run:

>>> about to run one step of task: __main__.child2
  child2: started! sleeping now...
<<< task step finished: __main__.child2

>>> about to run one step of task: __main__.child1
  child1: started! sleeping now...
<<< task step finished: __main__.child1

Each task runs until it calls trio.sleep(), and then suddenly we are back in trio.run(), deciding what to run next. What's going on? The secret is that trio.run() and trio.sleep() work together. trio.sleep() has access to some special magic that lets it suspend the entire call stack: it sends a note to trio.run() asking to be woken up again after one second, and then suspends the task. Once the task is suspended, Python returns control to trio.run(), which decides what to do next.

Note: asyncio.sleep() cannot be used inside trio; use trio.sleep() instead.

Next, trio.run() calls an operating system primitive to put the entire process to sleep:

### waiting for I/O for up to 0.9997810370005027 seconds

After sleeping for one second:

### finished I/O check (took 1.0006483688484877 seconds)
### task scheduled: __main__.child1
### task scheduled: __main__.child2

Remember that parent is waiting for both child tasks to finish. Watch what happens to parent when child1 exits:

>>> about to run one step of task: __main__.child1
  child1: exiting!
### task scheduled: __main__.parent
### task exited: __main__.child1
<<< task step finished: __main__.child1

>>> about to run one step of task: __main__.child2
  child2: exiting!
### task exited: __main__.child2
<<< task step finished: __main__.child2

After another I/O check, the parent task wakes up and finishes:

### doing a quick check for I/O
### finished I/O check (took 9.045004844665527e-06 seconds)

>>> about to run one step of task: __main__.parent
parent: all done!
### task scheduled: <init>
### task exited: __main__.parent
<<< task step finished: __main__.parent

Finally, trio does some internal cleanup:

### doing a quick check for I/O
### finished I/O check (took 5.996786057949066e-06 seconds)
>>> about to run one step of task: <init>
### task scheduled: <call soon task>
### task scheduled: <init>
<<< task step finished: <init>
### doing a quick check for I/O
### finished I/O check (took 6.258022040128708e-06 seconds)
>>> about to run one step of task: <call soon task>
### task exited: <call soon task>
<<< task step finished: <call soon task>
>>> about to run one step of task: <init>
### task exited: <init>
<<< task step finished: <init>
!!! run finished

OK, that covers how trio runs tasks under the hood. Keeping this mechanism in mind makes the rest of trio easier to understand.
More on using trio will follow in later posts.

 


Posted by MattMan on Mon, 09 Mar 2020 19:12:11 -0700