select_related and prefetch_related for Django query optimization

Keywords: SQL Database Django Python

1. select_related query optimization

    select_related obtains all data at one time through multi-table join Association query, and improves performance by reducing the number of database queries, but the association table can not be too many, because the join operation is inherently more performance-consuming. This article uses Django debug toolbar tool to visually display the number of queries and query statements. If you don't use the "Django debug toolbar" tool, you can read my blog and configure it!

model.tb.objects.all().select_related()
model.tb.objects.all().select_related('Foreign key field')
model.tb.objects.all().select_related('Foreign key field__Foreign key field')

models.py

from django.db import models

class Publisher(models.Model):
    name = models.CharField(max_length=30, verbose_name="Name")
    address = models.CharField("address", max_length=50)
    city = models.CharField('City', max_length=60)
    state_province = models.CharField(max_length=30)
    country = models.CharField(max_length=50)
    website = models.URLField()

    class Meta:
        verbose_name = 'Publisher'
        verbose_name_plural = verbose_name

    def __str__(self):
        return self.name

class Author(models.Model):
    name = models.CharField(max_length=30)
    hobby = models.CharField(max_length=20, default="", blank=True)

    def __str__(self):
        return self.name

class Book(models.Model):
    title = models.CharField(max_length=100, verbose_name="Title")
    authors = models.ManyToManyField(Author)
    publisher = models.ForeignKey(Publisher, verbose_name="Press")
    publication_date = models.DateField(null=True)
    price = models.DecimalField(max_digits=5, decimal_places=2, default=10, verbose_name="Price")

    def __str__(self):
        return self.title

views.py

def index(request):
    obj = Book.objects.all()
    return render(request, "index.html", locals())

index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
    <p>django debug toolbar!</p>
    {% for item in obj %}
        <div>{{ item.title }} {{ item.price }} {{ item.publisher.name }}</div>
    {% endfor %}
</body>
</html>

    When we don't use select_related, in the front-end template, every cycle sends a request to the database, because I have very few data in the table, all of which only initiate seven queries, but in actual production, the data of each table must be thousands of, traditional operation has a great impact on the performance of the database!


When we use select_related table join operation, please see the following example, only two queries were initiated!!!

def index(request):
    obj = Book.objects.all().select_related("publisher")
    return render(request, "index.html", locals())

Conclusion:

  1. select_related optimizes the one-to-one and many-to-one relationships of the main pins.

  2. select_related uses JOIN statements of SQL to optimize and improve performance by reducing the number of SQL queries.

  3. You can specify the field name that requires select_related by using variable length parameters. The specified recursive query (i.e. foreign keys of foreign keys, multi-level linked list queries) can also be implemented by using double underscores "" to connect field names. No specified fields will not be cached, no specified depth will not be cached, and Django will do the SQL query again if it wants to access.


2. Preetch_related query optimization

    Preetch_related() and select_related() are designed to reduce the number of SQL queries, but they are implemented in different ways. The latter is to solve the problem in the SQL query through JOIN statement. However, for many-to-many relationships, it is not wise to use SQL statements to solve them, because the tables obtained by JOIN will be very long, which will lead to the increase of running time and memory occupation of SQL statements. The solution to prefetch_related() is to query each table separately and then use Python to handle their relationship!

models.tb.objects.prefetch_related('Foreign key field')

We also use the example above to illustrate:

def index(request):
    obj = Book.objects.all().prefetch_related("publisher")
    return render(request, "index.html", locals())



    Using prefetch_related to optimize queries seems to initiate four database requests, but only two are actually SQL statements with horizontal lines in the graph. The other two are session-related, which we don't need to ignore. Let me explain how prefetch_related initiated the request. First step: get all the data from the book table; second step: select.. From... where... in (the foreign key ID of all publishers in the book table). By making two requests, the data of the book table and the publisher table associated with the book table are obtained (not all publisher table data, only the data associated with the book table!). Then the corresponding association of data is processed through Python.

Conclusion:

  1. Preetch_related optimizes one-to-many and many-to-many relationships.

  2. Preetch_related optimizes the relationships between tables by retrieving their contents separately and then using Python to process them.


The final outcome:

    select_related associates multiple tables by join s, retrieves data at a time and stores it in memory, but if there are too many tables associated, it will seriously affect the performance of the database.

    Preetch_related is a way to get the data of each table, store it in memory, and then process the association between them through Python.



Posted by ale_jrb on Thu, 13 Dec 2018 16:27:08 -0800