Model layer - 9: Query set API

Keywords: Database Django SQL Python

This section details the API of the query set, which is based on the following model, the same as the one in the previous section:

from django.db import models

class Blog(models.Model):
    name = models.CharField(max_length=100)
    tagline = models.TextField()

    def __str__(self):              # __unicode__ on Python 2
        return self.name

class Author(models.Model):
    name = models.CharField(max_length=200)
    email = models.EmailField()

    def __str__(self):              # __unicode__ on Python 2
        return self.name

class Entry(models.Model):
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
    headline = models.CharField(max_length=255)
    body_text = models.TextField()
    pub_date = models.DateField()
    mod_date = models.DateField()
    authors = models.ManyToManyField(Author)
    n_comments = models.IntegerField()
    n_pingbacks = models.IntegerField()
    rating = models.IntegerField()

    def __str__(self):              # __unicode__ on Python 2
        return self.headline

1. When will the QuerySet be submitted

Internally, creating, filtering, slicing, and passing a QuerySet does not actually operate on the database, and no actual database operations occur until you submit the query set.

QuerySet can submit query operations using the following methods:

  • iteration

QuerySet is iterative and executes the actual database query when the query set is first iterated.For example, the following statement prints the headline s of all Entries in the database:

for e in Entry.objects.all():
    print(e.headline)
  • Slice: If you use the sliced "step" parameter, Django executes a database query and returns a list.
  • Pickling/Caching
  • repr()
  • len(): When you call len() on a QuerySet, the database operation is submitted.
  • List(): Calling list() on a QuerySet will force the submission operation entry_list = list(Entry.objects.all())
  • bool()

Test Boolean values, like this:

if Entry.objects.filter(headline="Test"):
   print("There is at least one Entry with the headline Test")

Note: If you need to know if there is at least one record (and not a real object), using exists() will be more efficient.

2. QuerySet

The following is a formal definition of QuerySet:

class QuerySet(model=None, query=None, using=None)[source]

The QuerySet class has two public attributes for introspection:

Ordered: True if QuerySet is ordered, False otherwise.

db: If executed now, the database used is returned.

3. Return to the API of the new QuerySets

The following methods will all return a new QuerySets.The emphasis is on the bold API s, with few other usage scenarios.

Method Name explain
filter() Filter query objects.
exclude() Exclude objects that meet the criteria
annotate() Using aggregate functions
order_by() Sort Query Sets
reverse() Reverse sort
distinct() Reduplicate Query Set
values() Returns a QuerySet containing a dictionary of object specific values
values_list() Similar to values(), only tuples are returned, not dictionaries.
dates() Get Query Set from Date
datetimes() Get Query Set by Time
none() Create an empty query set
all() Get all objects
union() Union
intersection() intersection
difference() Difference set
select_related() Included Query Associated Objects
prefetch_related() Pre-query
extra() Additional SQL Queries
defer() Do not load specified fields
only() Load only specified fields
using() Select Database
select_for_update() Lock the selected object until the end of the transaction.
raw() Receive an original SQL query

1. filter()

filter(**kwargs)

Returns a collection of objects that satisfy the query parameters.

The parameters found (**kwargs) should match the format in the field lookup below.The relationship between multiple parameters is that of ADD.

2. exclude()

exclude(**kwargs)

Returns a new QuerySet containing objects that do not satisfy the given lookup parameters.

The parameters found (**kwargs) should match the format in the field lookup below.Multiple parameters are connected through an AND, and all the content is put into NOT().

The following example excludes all records where pub_date is later than 2005-1-3 and headline is Hello:

Entry.objects.exclude(pub_date__gt=datetime.date(2005, 1, 3), headline='Hello')

The following example excludes all records where pub_date is later than January 3, 2005 or headline is Hello:

Entry.objects.exclude(pub_date__gt=datetime.date(2005, 1, 3)).exclude(headline='Hello')

3. annotate()

annotate(args, *kwargs)

Query the object using the provided aggregate expression.

Expressions can be simple values, references to fields on a model (or any associated model), or aggregate expressions (mean, sum, and so on).

Each parameter of annotate() is an annotation, which is added to each object of the returned QuerySet.

The Annotation specified by the keyword parameter will use the keyword as an alias for Annotation.Aliases for anonymous parameters are generated based on the name of the aggregate function and the field of the model.Anonymous parameters can only be used in aggregate expressions that refer to a single field.All other forms must use keyword parameters.

For example, if you are working on a list of blogs, you might want to know how many Entries each Blog has:

>>> from django.db.models import Count
>>> q = Blog.objects.annotate(Count('entry'))
# The name of the first blog
>>> q[0].name
'Blogasaurus'
# The number of entries on the first blog
>>> q[0].entry__count
42

The Blog model itself does not define the entry_u count attribute, but by specifying an aggregate function with a keyword parameter, you can control the name of the Annotation:

>>> q = Blog.objects.annotate(number_of_entries=Count('entry'))
# The number of entries on the first blog, using the name provided
>>> q[0].number_of_entries
42

4. order_by()

order_by(*fields)

By default, objects in QuerySet are sorted according to the ordering property in the model's Meta class

Entry.objects.filter(pub_date__year=2005).order_by('-pub_date', 'headline')

The results above will be sorted in descending order by pub_date and then in ascending order by headline.The negative sign before'-pub_date'indicates the descending order.Ascending order is the default.To sort randomly, use'?', as follows:

Entry.objects.order_by('?')

Note: order_by('?') can be resource intensive and slow, depending on the database used.

To sort by fields in another model, you can use the syntax of the query Association model.That is, through the name of the field followed by two underscores (u), plus the name of the field in the new model until you want to join the model.Like this:

Entry.objects.order_by('blog__name', 'headline')

If the sorted field is associated with another model, Django will use the default sorting of the associated model, or if Meta.ordering is not specified, it will be sorted by the primary key of the associated model.For example, because the Blog model does not specify a default sort:

Entry.objects.order_by('blog')

Same as the following:

Entry.objects.order_by('blog__id')

If Blog sets ordering = ['name'], the first QuerySet will be equivalent to:

Entry.objects.order_by('blog__name')

You can also call the desc() or asc() method of the expression:

Entry.objects.order_by(Coalesce('summary', 'headline').desc())

Consider the following scenario, specifying a multi-value field to sort (for example, a ManyToManyField field or a reverse association of ForeignKey fields):

class Event(Model):
   parent = models.ForeignKey(
       'self',
       on_delete=models.CASCADE,
       related_name='children',
   )
   date = models.DateField()

Event.objects.order_by('children__date')

Here, each Event may have multiple sort data; each Event with multiple Childrens will be returned multiple times to the new QuerySet created by order_by().In other words, an operation on a QuerySet object using the order_by() method returns an expanded version of the new QuerySet object.Therefore, be careful when using multivalue fields to sort results.

There is no way to specify whether the sort is case-sensitive or not.For case sensitivity, Django will sort the results according to how they are sorted in the database.

A field can be sorted by Lower converting it to lowercase, which results in a case-matched sort:

Entry.objects.order_by(Lower('headline').desc())

You can check the QuerySet.ordered property to see if the query is sorted.

Each order_by() will clear any previous sorting.For example, the following queries will be sorted by pub_date instead of headline:

Entry.objects.order_by('headline').order_by('pub_date')

5. reverse()

reverse()

Reverse the ordering of the elements returned in the QuerySet.The second call to reverse() restores the original sort.

To get the last five elements in a QuerySet, do this:

my_queryset.reverse()[:5]

This is slightly different from using negative indexes directly in Python.Django does not support negative indexing and can only save the country by curves.

6. distinct()

distinct(*fields)

Remove duplicate rows from query results.

By default, QuerySet does not remove duplicate rows.When querying data across multiple tables, QuerySet may get duplicate results, and distinct() can be used to weight it.

7. values()

values(fields, *expressions)

Returns a queryset containing a dictionary of data instead of a model instance.

Each dictionary represents an object and the key corresponds to the attribute name of the model object.

The following example compares values() to a common model object:

# List contains Blog objects
>>> Blog.objects.filter(name__startswith='Beatles')
<QuerySet [<Blog: Beatles Blog>]>
# The list contains a data dictionary
>>> Blog.objects.filter(name__startswith='Beatles').values()
<QuerySet [{'id': 1, 'name': 'Beatles Blog', 'tagline': 'All the latest Beatles news.'}]>

This method receives the optional location parameter * fields, which specifies which fields values() should restrict.If a field is specified, each dictionary will contain only the keys/values of the specified field.If no fields are specified, each dictionary will contain keys and values for all fields in the database table.

For example:

>>> Blog.objects.values()
<QuerySet [{'id': 1, 'name': 'Beatles Blog', 'tagline': 'All the latest Beatles news.'}]>
>>> Blog.objects.values('id', 'name')
<QuerySet [{'id': 1, 'name': 'Beatles Blog'}]>

The values() method also has a keyword parameter **expressions, which is passed to annotate():

>>> from django.db.models.functions import Lower
>>> Blog.objects.values(lower_name=Lower('name'))
<QuerySet [{'lower_name': 'beatles blog'}]>

Before aggregation in the values() clause applies to other parameters in the same values() clause.If you need to group by another value, add it to the earlier values() clause.Like this:

>>> from django.db.models import Count
>>> Blog.objects.values('author', entries=Count('entry'))
<QuerySet [{'author': 1, 'entries': 20}, {'author': 1, 'entries': 13}]>
>>> Blog.objects.values('author').annotate(entries=Count('entry'))
<QuerySet [{'author': 1, 'entries': 33}]>

Be careful:

If you have a field foo that is a ForeignKey, the dictionary returned by the default foo_id parameter will have a key called foo, because this is the name of the hidden model property that holds the actual value.When foo_id is called and the name of the field is passed, either foo or values() can be passed, and the result is the same.Like this:

>>> Entry.objects.values()
<QuerySet [{'blog_id': 1, 'headline': 'First Entry', ...}, ...]>
>>> Entry.objects.values('blog')
<QuerySet [{'blog': 1}, ...]>
>>> Entry.objects.values('blog_id')
<QuerySet [{'blog_id': 1}, ...]>

When values() are used with distinct(), be aware that sorting may affect the final result.

If the values() clause is after an extra() call, the fields defined by the select parameter in extra() must be explicitly included in the values() call.Values (extra after the call (the call will ignore the extra fields selected).

It is not reasonable to call only() and defer() after values(), so a NotImplementedError will be raised.

The fields of the associated model can be referenced back through the ManyToManyField, ForeignKey, and OneToOneFiel properties:

>>> Blog.objects.values('name', 'entry__headline')
<QuerySet [{'name': 'My blog', 'entry__headline': 'An entry'},
     {'name': 'My blog', 'entry__headline': 'Another entry'}, ...]>

8. values_list()

values_list(*fields, flat=False)

Similar to values(), only tuples, not dictionaries, are returned during iteration.Each tuple contains the value of the corresponding field or expression passed to the values_list() call, so the first item is the first field, and so on.Like this:

>>> Entry.objects.values_list('id', 'headline')
<QuerySet [(1, 'First entry'), ...]>
>>> from django.db.models.functions import Lower
>>> Entry.objects.values_list('id', Lower('headline'))
<QuerySet [(1, 'first entry'), ...]>

If you pass only one field, you can also pass the flat parameter.If True, it means the result returned is a single value, not a tuple.As follows:

>>> Entry.objects.values_list('id').order_by('id')
<QuerySet[(1,), (2,), (3,), ...]>
>>> Entry.objects.values_list('id', flat=True).order_by('id')
<QuerySet [1, 2, 3, ...]>

If there are multiple fields, passing the flat will result in an error.

If no value is passed to values_list(), it returns all the fields in the model in the order defined in the model.

It is common to get specific field values for a model instance.You can use values_list(), then call get():

>>> Entry.objects.values_list('headline', flat=True).get(pk=1)
'First entry'

Both values() and values_list() are used for optimization in specific situations: retrieving a subset of data without creating a model instance.

Note the behavior when querying through ManyToManyField:

>>> Author.objects.values_list('name', 'entry__headline')
<QuerySet [('Noam Chomsky', 'Impressions of Gaza'),
 ('George Orwell', 'Why Socialists Do Not Believe in Fun'),
 ('George Orwell', 'In Defence of English Cooking'),
 ('Don Quixote', None)]>

Similarly, when querying the reverse foreign key, None is returned for entries without any authors.

>>> Entry.objects.values_list('authors')
<QuerySet [('Noam Chomsky',), ('George Orwell',), (None,)]>

9. dates()

dates(field, kind, order='ASC')

Returns a QuerySet representing a list of datetime.date objects for all available dates of a specific type in the QuerySet content.

The field parameter is the name of the DateField of the model.The kind parameter should be "year", "month" or "day".Each datetime.date object in the result list is truncated to the given type.

"Year" returns a list of all the different year values for that field.

"Month" returns a list of all the different year/month values for the field.

"Day" returns a list of all the different year/month/day values for the field.

The order parameter defaults to'ASC', or'DESC'.It specifies how to sort the results.

Example:

>>> Entry.objects.dates('pub_date', 'year')
[datetime.date(2005, 1, 1)]
>>> Entry.objects.dates('pub_date', 'month')
[datetime.date(2005, 2, 1), datetime.date(2005, 3, 1)]
>>> Entry.objects.dates('pub_date', 'day')
[datetime.date(2005, 2, 20), datetime.date(2005, 3, 20)]
>>> Entry.objects.dates('pub_date', 'day', order='DESC')
[datetime.date(2005, 3, 20), datetime.date(2005, 2, 20)]
>>> Entry.objects.filter(headline__contains='Lennon').dates('pub_date', 'day')
[datetime.date(2005, 3, 20)]

10. datetimes()

datetimes(field_name, kind, order='ASC', tzinfo=None)

Returns a QuerySet, a list of datetime.datetime objects, representing all available dates of a specific type in the QuerySet content.

field_name should be the name of the DateTimeField for the model.

The kind parameter should be "hour", "minute", "month", "year", "second" or "day".

Each datetime.datetime object in the result list is truncated to the given type.

The order parameter defaults to'ASC', or'DESC'.It specifies how to sort the results.

The tzinfo parameter defines the time zone to which data time is converted before interception.

11. none()

none()

Calling none() creates a query set that does not return any objects and does not execute any queries when accessing the results.

Example:

>>> Entry.objects.none()
<QuerySet []>
>>> from django.db.models.query import EmptyQuerySet
>>> isinstance(Entry.objects.none(), EmptyQuerySet)
True

12. all()

all()

Returns a copy of the current QuerySet (or QuerySet subclass).Usually used to get all QuerySet objects.

13. union()

union(*other_qs, all=False)

New feature in Django 1.11.That is the concept of union in a set!

Combine the results of two or more QuerySet s using the UNION operator of SQL.For example:

>>> qs1.union(qs2, qs3)

By default, the UNION operator selects only different values.To allow duplicate values, use the all=True parameter.

14. intersection()

intersection(*other_qs)

New feature in Django 1.11.That is the concept of intersection in a set!

Returns the common elements of two or more QuerySet s using the SQL INTERSECT operator.For example:

>>> qs1.intersection(qs2, qs3)

15. difference()

difference(*other_qs)

New feature in Django 1.11.That is the concept of difference set in a set!

The EXCEPT operator using SQL only preserves elements in QuerySets, but not in other QuerySets.For example:

>>> qs1.difference(qs2, qs3)

16. select_related()

select_related(*fields)

Query the data of the associated object along the foreign key relationship.This results in a complex query and a loss of performance, but no further database queries are required when using foreign key relationships in the future.

The following example explains the difference between a normal query and a select_related() query.Here is a standard query:

# Access the database.
e = Entry.objects.get(id=5)
# Access the database again to get the associated Blog object.
b = e.blog

The following is a select_related query:

# Access the database.
e = Entry.objects.select_related('blog').get(id=5)
# The database will not be accessed because e.blog has already been acquired in the previous query.
b = e.blog

select_related() can be used with any set of queries for objects:

from django.utils import timezone

# Find all the blogs with entries scheduled to be published in the future.
blogs = set()

for e in Entry.objects.filter(pub_date__gt=timezone.now()).select_related('blog'):
    # Without select_related(), the following statement will generate a database query for each iteration of the loop to get each entry associated blog.
    blogs.add(e.blog)

The order of filter() and select_related() is not important.The following sets of queries are identical:

Entry.objects.filter(pub_date__gt=timezone.now()).select_related('blog')
Entry.objects.select_related('blog').filter(pub_date__gt=timezone.now())

You can query along foreign keys.If you have the following models:

from django.db import models

class City(models.Model):
    # ...
    pass

class Person(models.Model):
    # ...
    hometown = models.ForeignKey(
        City,
        on_delete=models.SET_NULL,
        blank=True,
        null=True,
    )

class Book(models.Model):
    # ...
    author = models.ForeignKey(Person, on_delete=models.CASCADE)

Calling Book.objects.select_related ('author_u cluster').get (id=4) will cache the associated Person and related City:

b = Book.objects.select_related('author__hometown').get(id=4)
p = b.author         # Doesn't hit the database.
c = p.hometown       # Doesn't hit the database.
b = Book.objects.get(id=4) # No select_related() in this example.
p = b.author         # Hits the database.
c = p.hometown       # Hits the database.

Any ForeignKey and OneToOneField can be used in the fields passed to select_related().

You can also reverse-reference OneToOneField in the fields passed to select_related.That is, you can go back to the field that defines OneToOneField.Instead of specifying the name of the field, you can use the related_name of the associated object field.

17. prefetch_related()

prefetch_related(*lookups)

Automatically retrieves the related objects for each specified lookup in a single batch.

Similar to select_related, but with a completely different strategy.

Suppose you have these models:

from django.db import models

class Topping(models.Model):
    name = models.CharField(max_length=30)

class Pizza(models.Model):
    name = models.CharField(max_length=50)
    toppings = models.ManyToManyField(Topping)

    def __str__(self):              # __unicode__ on Python 2
        return "%s (%s)" % (
            self.name,
            ", ".join(topping.name for topping in self.toppings.all()),
        )

And run:

>>> Pizza.objects.all()
["Hawaiian (ham, pineapple)", "Seafood (prawns, smoked salmon)"...

The problem is that each QuerySet requires Pizza.objects.all() to query the database, so self.toppings.all() will run the query on the Toppings table for each item in Pizza Pizza. u str_().

You can use prefetch_related to reduce to just two queries:

>>> Pizza.objects.all().prefetch_related('toppings')

This means that now each time self.toppings.all() is called, instead of going to the database to find it, it is looking in a prefetched QuerySet cache.

You can also use normal join syntax to execute related fields for related fields.Suppose you add an additional model to the example above:

class Restaurant(models.Model):
    pizzas = models.ManyToManyField(Pizza, related_name='restaurants')
    best_pizza = models.ForeignKey(Pizza, related_name='championed_by')

The following are legal:

>>> Restaurant.objects.prefetch_related('pizzas__toppings')

This will prefetch all pizzas that belong to the restaurant, and all the ingredients that belong to those pizzas.This will result in a total of three queries - one for restaurants, one for pizzas, and one for ingredients.

>>> Restaurant.objects.prefetch_related('best_pizza__toppings')

This will get the best pizza and all the ingredients for the best pizza in every restaurant.This will be queried in three tables - one for restaurants, one for Best Pizza and one for ingredients.

Of course, you can also use best_pizza to get the select_related relationship to reduce the number of queries to 2:

>>> Restaurant.objects.select_related('best_pizza').prefetch_related('best_pizza__toppings')

18. extra()

extra(select=None, where=None, params=None, tables=None, order_by=None, select_params=None)

In some cases, Django's query syntax is difficult to simply express complex WHERE clauses, in which case new clauses can be injected into the SQL clause generated by extra().As a last resort, this is an old API that may be discarded at some point in the future.Use it only when you cannot express a query using other query methods.

For example:

>>> qs.extra(
...     select={'val': "select col from sometable where othercol = %s"},
...     select_params=(someparam,),
... )

Amount to:

>>> qs.annotate(val=RawSQL("select col from sometable where othercol = %s", (someparam,)))

19. defer()

defer(*fields)

In some complex data modeling scenarios, models may contain a large number of fields, some of which may contain large-sized data (such as text fields), and converting them to Python objects can be costly.

If you are using the results of a query set, you can tell Django not to retrieve them from the database when you don't know if these specific fields are needed when you first get the data.

Do not load by passing field names to defer():

Entry.objects.defer("headline", "body")

Query sets with delayed load fields will still return model instances.

Each delayed field is retrieved from the database when you access it (one at a time, not all delayed fields at a time).

defer() can be called multiple times.Each call adds a new field to the delay set:

# Delay body and headline fields.
Entry.objects.defer("body").filter(rating=5).defer("headline")

The order in which fields are added to the delay set does not matter.It is OK to defer() the defered field name again (the field will still be deferred).

You can use standard double underline symbols to separate the associated fields to load the fields in the association model:

Blog.objects.select_related().defer("entry__headline", "entry__body")

If you want to clear the set of deferred fields, pass None as a parameter to defer():

# Load all fields now.
my_queryset.defer(None)

The defer() method (and its sibling, only()) only applies to advanced use cases, and they provide an optimization method for data loading.

20. only()

only(*fields)

The only() method is the opposite of defer().

If you have a model where almost all fields need to be delayed, specifying a supplementary set of fields using only() makes the code simpler.

Suppose you have a model with fields biography, age, and name.The following two sets of queries are identical, in the case of deferred fields:

Person.objects.defer("age", "biography")
Person.objects.only("name")

Whenever you call only(), it replaces the immediately loaded set of fields.Consequently, consecutive calls to only() result in only the last field being considered:

# This will defer all fields except the headline.
Entry.objects.only("body", "rating").only("headline")

Since defer() acts incrementally (adding fields to the delay list), you can combine only() with defer() calls:

# Final result is that everything except "headline" is deferred.
Entry.objects.only("headline", "body").defer("body")
# Final result loads headline and body immediately (only() replaces any
# existing set of fields).
Entry.objects.defer("body").only("headline", "body")

When save() is called on an instance with a deferred field, only the loaded fields are saved.

21. using()

using(alias)

If you are using more than one database, this method is used to specify on which database to query QuerySet.The only parameter to the method is the database alias, defined in DATABASES.

For example:

# queries the database with the 'default' alias.
>>> Entry.objects.all()
# queries the database with the 'backup' alias
>>> Entry.objects.using('backup')

22. select_for_update()

select_for_update(nowait=False, skip_locked=False)

Returns a set of queries that lock rows until the end of the transaction and, if supported by the database, generates a SELECT... FOR UPDATE statement.

For example:

entries = Entry.objects.select_for_update().filter(author=request.user)

All matching rows will be locked until the end of the transaction.This means that locks can prevent data from being modified by other transactions.

Typically, if another transaction locks a related row, the query will be blocked until the lock is released.Using select_for_update(nowait=True) will not block the query.If other transactions hold conflicting locks, the query raises a DatabaseError exception.You can also use select_for_update(skip_locked=True) to ignore locked rows.Nowait and skip_locked are mutually exclusive.

Currently, the postgresql, oracle, and MySQL database backends support select_for_update().However, MySQL does not support the nowait and skip_locked parameters.

23. raw()

raw(raw_query, params=None, translations=None)

Receives an original SQL query, executes it, and returns an instance of django.db.models.query.RawQuerySet.

This RawQuerySet instance can be iterated, just like a normal QuerySet.

Posted by Sneo on Sat, 02 May 2020 12:02:14 -0700