Is FastAPI mindlessly hyped over Flask? On the flawed performance comparisons on the Internet, and my own test results that confused me

Keywords: Python, API, performance testing, Flask, WSGI

I've heard it more than once: there is this FastAPI framework that crushes Flask and rivals Golang. But I had never tested it myself. Today I had some free time, so I ran a test to see for myself. I don't know what went wrong, but the results surprised me.

Before testing

To save myself some work, I naturally went looking for existing test code online as a reference. The top Baidu results for FastAPI vs. Flask performance tests all used code along the lines of the hello-world examples shown below.

A little confused

After a quick look, I didn't understand why uvicorn was used to start FastAPI, while Flask was started only with its built-in development server. Why not use a proper WSGI server for Flask?

That seems problematic to me. Never mind that they are not frameworks at the same level (FastAPI is built on Starlette, which is the framework that should really be compared with Flask); even if you do compare them, they should at least be started in comparable ways, right?
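
For reference (I did not include it in the test), Starlette's own hello-world is nearly as small as Flask's. This sketch follows the Starlette docs and would be served with uvicorn just like the FastAPI example further down:

    from starlette.applications import Starlette
    from starlette.responses import JSONResponse
    from starlette.routing import Route


    async def homepage(request):
        # same hello-world payload as the Flask and FastAPI examples
        return JSONResponse({'message': 'hello world'})


    app = Starlette(routes=[Route('/', homepage)])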

uvicorn is a third-party ASGI server. Shouldn't Flask then be started with a third-party WSGI server? Using Flask's own built-in development server hardly seems fair.

I originally wanted to start Flask with gunicorn for the comparison, but it turned out that gunicorn does not run on Windows, so I switched to waitress, a comparable WSGI server.
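
For reference, on Linux the gunicorn version of the comparison would presumably be just a command line like this (assuming the Flask app object is named app in a file app.py; untested here):

    gunicorn --workers 1 --bind 0.0.0.0:5000 app:app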

Starting the test

The articles online all tested with ab (Apache Bench); I used another testing tool instead, siege. The test method is benchmark mode (-b, no delay between requests) with unlimited connections, running for 10 seconds (-t10s). The command is as follows:

./siege.exe -b -t10s http://127.0.0.1:5000/

The test code is the same as what I found before: the hello-world examples from each project's official website, slightly modified so that the startup code is written into the file and no command line is needed to start the server.

  • Flask

    from flask import Flask
    from waitress import serve

    app = Flask(__name__)


    @app.route('/')
    def index():
        # Flask serializes a returned dict to JSON automatically
        return {'message': 'hello world'}


    if __name__ == '__main__':
        app.run(host='0.0.0.0')  # Flask's built-in development server, port 5000
        # serve(app, host='0.0.0.0', port=5000)  # the waitress variant tested below
  • FastAPI

    from fastapi import FastAPI
    import uvicorn

    app = FastAPI()


    @app.get("/")
    async def read_root():
        return {"Hello": "World"}


    if __name__ == "__main__":
        # uvicorn is a third-party ASGI server; called this way it runs a single worker
        uvicorn.run(app, host="0.0.0.0", port=5000)

Test results

Since the articles online are there for comparison, I also tested Flask's built-in startup mode.

In addition, I tested FastAPI both with and without async on the handler. (Adding async here is useless in practice; the documentation is clear that a handler should be declared async only when it needs to await asynchronous calls internally.)
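
To illustrate the distinction the documentation draws, here is a minimal sketch (the /sync and /async paths are made up for the example):

    from fastapi import FastAPI
    import asyncio

    app = FastAPI()


    @app.get("/sync")
    def read_sync():
        # a plain def handler is run in FastAPI's threadpool,
        # so ordinary blocking code is fine here
        return {"Hello": "World"}


    @app.get("/async")
    async def read_async():
        # async def only pays off when something is actually awaited inside
        await asyncio.sleep(0)  # stand-in for a real async call (a DB query, an HTTP request, ...)
        return {"Hello": "World"}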

The results are as follows:

  • flask

    Transactions:                   4579 hits
    Availability:                 100.00 %
    Elapsed time:                   9.15 secs
    Data transferred:               0.11 MB
    Response time:                  0.03 secs
    Transaction rate:             500.66 trans/sec
    Throughput:                     0.01 MB/sec
    Concurrency:                   14.93
    Successful transactions:        4579
    Failed transactions:               0
    Longest transaction:            0.10
    Shortest transaction:           0.02
  • flask + waitress

    Transactions:                  12598 hits
    Availability:                 100.00 %
    Elapsed time:                  10.02 secs
    Data transferred:               0.31 MB
    Response time:                  0.01 secs
    Transaction rate:            1257.03 trans/sec
    Throughput:                     0.03 MB/sec
    Concurrency:                   14.89
    Successful transactions:       12598
    Failed transactions:               0
    Longest transaction:            0.03
    Shortest transaction:           0.00
  • fastapi + uvicorn

    Transactions:                   5278 hits
    Availability:                 100.00 %
    Elapsed time:                   9.05 secs
    Data transferred:               0.09 MB
    Response time:                  0.03 secs
    Transaction rate:             583.20 trans/sec
    Throughput:                     0.01 MB/sec
    Concurrency:                   14.93
    Successful transactions:        5278
    Failed transactions:               0
    Longest transaction:            0.11
    Shortest transaction:           0.01
  • fastapi + uvicorn + async

    Transactions:                   5876 hits
    Availability:                 100.00 %
    Elapsed time:                   9.31 secs
    Data transferred:               0.10 MB
    Response time:                  0.02 secs
    Transaction rate:             631.22 trans/sec
    Throughput:                     0.01 MB/sec
    Concurrency:                   14.84
    Successful transactions:        5876
    Failed transactions:               0
    Longest transaction:            0.12
    Shortest transaction:           0.00

From the Transaction rate, that is, the request processing rate, you can see:

  • Flask started directly is slightly slower than FastAPI (500:583 / 631)
  • Declaring the FastAPI handler async makes little difference (583:631)
  • Flask started with the waitress WSGI server is about 2.5 times faster than without it (1257:500), and also roughly 2 times faster than FastAPI

This is completely different from the results others reported, and it also differs from what I expected. What do you think went wrong?

That Flask's built-in server is a bit slower than FastAPI was expected, but being this much faster under the waitress WSGI server is certainly not normal.

So I went and checked the source code of both servers, and found that waitress defaults to 4 threads while uvicorn defaults to a single worker process...
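
Spelled out explicitly, the defaults behind the two earlier startup lines amount to the following (a sketch based on the versions I installed; they may differ in yours):

    # waitress: serve() defaults to 4 worker threads
    serve(app, host='0.0.0.0', port=5000, threads=4)
    # uvicorn: run() defaults to a single worker process
    # uvicorn.run(app, host="0.0.0.0", port=5000, workers=1)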

So I had to change waitress to a single thread, to match uvicorn, and retest:

serve(app, host='0.0.0.0', port=5000, threads=1)

The results are as follows:

Transactions:                   7492 hits
Availability:                 100.00 %
Elapsed time:                   9.07 secs
Data transferred:               0.19 MB
Response time:                  0.02 secs
Transaction rate:             825.84 trans/sec
Throughput:                     0.02 MB/sec
Concurrency:                   14.89
Successful transactions:        7492
Failed transactions:               0
Longest transaction:            0.07
Shortest transaction:           0.01

Then I changed uvicorn to 4 workers (worker processes, not threads) and retested:

uvicorn.run("test-fastapi:app", host="0.0.0.0", port=5000, workers=4)

# You need to create a new 'pyproject.toml' file in the same directory. The content is:
[tool.poetry.scripts]
start = "test-fastapi:start"

The results are as follows:

Transactions:                   7782 hits
Availability:                 100.00 %
Elapsed time:                   9.24 secs
Data transferred:               0.13 MB
Response time:                  0.02 secs
Transaction rate:             842.39 trans/sec
Throughput:                     0.01 MB/sec
Concurrency:                   14.92
Successful transactions:        7782
Failed transactions:               0
Longest transaction:            0.15
Shortest transaction:           0.00

It can be seen that:

  • With the waitress WSGI server at a single thread, Flask is still about 65% faster than its built-in server (825:500), and clearly faster than FastAPI as well (825:583 / 631)
  • uvicorn with 4 workers improves only modestly (842:583 / 631), ending up roughly on par with single-threaded waitress

This result was also quite unexpected. Now I'm a little unsure: is there something wrong with my test procedure?

It doesn't make sense in theory. With 4 workers, uvicorn is only about 1.4 times faster than with one (842:583), and waitress with 4 threads is only about 1.5 times faster than with one (1257:825), which suggests the extra threads and workers are not being fully utilized. And uvicorn's single-worker processing capacity should, in theory, be the stronger one; I don't know why its result is so poor.

Maybe it's the testing tool. After all, the others used ab and specified the concurrency level, while the way I ran siege does not limit concurrency.
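
For comparison, the ab invocations in those articles look roughly like this (the request count and concurrency level here are made-up values, not theirs):

    ab -n 10000 -c 100 http://127.0.0.1:5000/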

Moreover, the uvicorn documentation also mentions that gunicorn can be used to manage the worker processes, which may improve performance further. I didn't test it because of my machine (gunicorn doesn't run on Windows).
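
For the record, the pattern the uvicorn documentation describes looks like this (Linux only, which is exactly why I couldn't run it; 4 workers chosen to match the test above):

    gunicorn test-fastapi:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:5000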

Final words

The purpose of this test was to back up the objection raised at the start: in a comparison test, Flask should be started with a third-party WSGI server.

As for the final results, I'm no longer sure what to make of them; I can only guarantee that the test data and code are absolutely genuine. Friends who read this article had best test it themselves and tell me why these results come out this way.

One more thing: there are too many people mindlessly hyping FastAPI on the Internet. I don't deny its advantages, such as async support, WebSockets, automatically generated documentation, its emphasis on declared variable types, and so on. But there's no need to trample on Flask for it.

Writing this up and running the tests took several hours straight. I really did have too much time on my hands.

Posted by shamuraq on Sun, 05 Dec 2021 09:52:55 -0800