- For those of you who have been using Django. Queryset is a great tool to assist us in database queries.
- Just by adding a relationship, you can easily query from one table to another easily. However, sometimes overuse of queries with a relationship can have a significant impact on the speed and performance of your application.
- In today's article, I will introduce you one way to optimize database queries in Django Framework that is using
select_related và fetch_related
Prepare
- First you need a Django project. Then create an app named
books
with the models as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | from django.db import models class Book(models.Model): name = models.CharField(max_length=200) public_at = models.DateField() author = models.ForeignKey('Author', on_delete=models.CASCADE) categories = models.ManyToManyField('Category', related_name='books') def __str__(self): return self.name class Author(models.Model): full_name = models.CharField(max_length=200) dob = models.DateField() def __str__(self): return self.full_name class Category(models.Model): name = models.CharField(max_length=200) description = models.CharField(max_length=200) def __str__(self): return self.name |
- Here I created 3 models. Author and Book are 1-n relations. Book and Category are relations
- Next add a decorator function to measure function execution time and the number of queries so that we can compare the efficiency before and after optimization.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | from django.db import connection, reset_queries import time import functools def query_debugger(func): @functools.wraps(func) def inner_func(*args, **kwargs): reset_queries() start_queries = len(connection.queries) start = time.perf_counter() result = func(*args, **kwargs) end = time.perf_counter() end_queries = len(connection.queries) print("Function : " + func.__name__) print("Number of Queries : {}".format(end_queries - start_queries)) print("Finished in : {}".format(end - start)) return result return inner_func |
Select_related
Question
- Now, I have a function that retrieves the names of all the books in the database and the author names of each book. Usually we do the following:
1 2 3 4 5 6 7 8 9 10 | @query_debugger def book_list(): queryset = Book.objects.all() books = [] for book in queryset: books.append({'name': book.name, 'author': book.author.full_name}) return books |
- I tested this function and produced the following result:
1 2 3 4 5 | Function : book_list Number of Queries : 6 Finished in : 0.01809368800604716 [{'name': 'Thiên long truyện', 'author': 'Trần Quang Hiệp'}, {'name': 'Linh vũ thiên hạ', 'author': 'Nguyễn Văn Diện'}, {'name': 'Vũ luyện điên phong', 'author': 'Nguyễn Quang Diệu'}, {'name': 'Lên bờ xuống ruộng', 'author': 'Nguyễn Văn Diện'}, {'name': 'Độc tôn', 'author': 'Lê Huỳnh Đức'}] |
- Now try to analyze:
- The system will first query the database and retrieve the information of all the books. Here is a query.
- Next iterate all the newly acquired books. Add information to the books variable. and query to retrieve author name. Each loop is 1 query. As in the above example, my database has 5 books, so the loop will execute 5 queries.
- The total function will be 1 + 5 = 6 queries as the result above.
- So where does the above function matter? Suppose now your database has 100 books, so in the above way, the system will have to perform 101 queries. And imagine if you had thousands of book rights? Obviously, it will significantly affect the performance of the system is not it.
Solution
- To fix the above problem, Django provides us with the
select_related
function. This function was born to prevent misuse of queries via a relationship, creating too many queries to the database. - The
select_related
function works by JOIN the fields of the related tables. Therefore,select_related
retrieves related objects in the same database query. -
select_related
only used forone-to-one
relationships orone-to-many
that containforeign-key
- Rewrite the above function with
select_related
:
1 2 3 4 5 6 7 8 9 10 11 | @query_debugger def book_list(): queryset = Book.objects.select_related('author') books = [] for book in queryset: books.append({'name': book.name, 'author': book.author.full_name}) return books |
- Result :
1 2 3 4 5 | Function : book_list Number of Queries : 1 Finished in : 0.004872963996604085 [{'name': 'Thiên long truyện', 'author': 'Trần Quang Hiệp'}, {'name': 'Linh vũ thiên hạ', 'author': 'Nguyễn Văn Diện'}, {'name': 'Vũ luyện điên phong', 'author': 'Nguyễn Quang Diệu'}, {'name': 'Lên bờ xuống ruộng', 'author': 'Nguyễn Văn Diện'}, {'name': 'Độc tôn', 'author': 'Lê Huỳnh Đức'}] |
- Can clearly see the difference. The number of queries decreased from 6 to 1. Because
select_related
only queries once. Get all book information and join related tables to Queryset. So then if you use relation with queryset, it will automatically retrieve the data in Queryset without querying the database again.
Prefetch_related
- If
select_related
is used forone-to-one
relationships orone-to-many
that containforeign-key
,prefetch_related
is used formany-to-many
relationships andone-to-many
. -
prefetch_related
does a separate lookup for each relationship. - For example: I want to retrieve all categories. Each category will contain the full names of the books of that category. Usually we do the following:
1 2 3 4 5 6 7 8 9 10 11 12 13 | @query_debugger def category_with_book_list(): queryset = Category.objects.all() categories = [] for category in queryset: books = [book.name for book in category.books.all()] categories.append({'name': category.name, 'books': books}) return categories |
- Result
1 2 3 4 | Function : category_with_book_list Number of Queries : 6 Finished in : 0.009519919003651012 |
1 2 3 4 5 6 7 8 | [ {'books': ['Vũ luyện điên phong', 'Độc tôn'], 'name': 'Viễn tưởng'}, {'books': ['Thiên long truyện', 'Vũ luyện điên phong'], 'name': 'Ngôn tình'}, {'books': ['Linh vũ thiên hạ', 'Lên bờ xuống ruộng'], 'name': 'Trinh thám'}, {'books': ['Thiên long truyện', 'Linh vũ thiên hạ', 'Lên bờ xuống ruộng'], 'name': 'Tâm lý'}, {'books': ['Độc tôn'], 'name': 'Kinh dị'} ] |
- The problem with this function is similar to the example in the
select_related
section, each time thecategory.books.all()
runs, it generates a query to the DB. - Try replacing
prefetch_related
to see the difference.
1 2 3 4 5 6 7 8 9 10 11 12 13 | @query_debugger def category_with_book_list(): queryset = Category.objects.prefetch_related('books') categories = [] for category in queryset: books = [book.name for book in category.books.all()] categories.append({'name': category.name, 'books': books}) return categories |
- Result
1 2 3 4 5 6 7 8 9 10 11 | Function : category_with_book_list Number of Queries : 2 Finished in : 0.013129211998602841 [ {'books': ['Vũ luyện điên phong', 'Độc tôn'], 'name': 'Viễn tưởng'}, {'books': ['Thiên long truyện', 'Vũ luyện điên phong'], 'name': 'Ngôn tình'}, {'books': ['Linh vũ thiên hạ', 'Lên bờ xuống ruộng'], 'name': 'Trinh thám'}, {'books': ['Thiên long truyện', 'Linh vũ thiên hạ', 'Lên bờ xuống ruộng'], 'name': 'Tâm lý'}, {'books': ['Độc tôn'], 'name': 'Kinh dị'} ] |
- With
prefetch_related
, you will only need 2 queries to get all the above information. - The
Category.objects.prefetch_related('books')
function takes information of all categories and prefetches related books into Queryset Cache. Whenever thecategory.books.all()
function is called, instead of querying the DB to get information, it looks in the Queryset cache.
Conclude
- The problems may seem simple, but sometimes it's not simple at all, right?
- Hopefully, through this article, everyone can take a closer look at the code. Code optimization is essential for every developer.