Optimize database queries in Django with select_related and prefetch_related

Tram Ho

  • For those of you who have been using Django. Queryset is a great tool to assist us in database queries.
  • Just by adding a relationship, you can easily query from one table to another easily. However, sometimes overuse of queries with a relationship can have a significant impact on the speed and performance of your application.
  • In today's article, I will introduce you one way to optimize database queries in Django Framework that is using select_related và fetch_related

Prepare

  • First you need a Django project. Then create an app named books with the models as follows:

  • Here I created 3 models. Author and Book are 1-n relations. Book and Category are relations
  • Next add a decorator function to measure function execution time and the number of queries so that we can compare the efficiency before and after optimization.

Select_related

Question

  • Now, I have a function that retrieves the names of all the books in the database and the author names of each book. Usually we do the following:

  • I tested this function and produced the following result:

  • Now try to analyze:
    • The system will first query the database and retrieve the information of all the books. Here is a query.
    • Next iterate all the newly acquired books. Add information to the books variable. and query to retrieve author name. Each loop is 1 query. As in the above example, my database has 5 books, so the loop will execute 5 queries.
    • The total function will be 1 + 5 = 6 queries as the result above.
  • So where does the above function matter? Suppose now your database has 100 books, so in the above way, the system will have to perform 101 queries. And imagine if you had thousands of book rights? Obviously, it will significantly affect the performance of the system is not it.

Solution

  • To fix the above problem, Django provides us with the select_related function. This function was born to prevent misuse of queries via a relationship, creating too many queries to the database.
  • The select_related function works by JOIN the fields of the related tables. Therefore, select_related retrieves related objects in the same database query.
  • select_related only used for one-to-one relationships or one-to-many that contain foreign-key
  • Rewrite the above function with select_related :

  • Result :

  • Can clearly see the difference. The number of queries decreased from 6 to 1. Because select_related only queries once. Get all book information and join related tables to Queryset. So then if you use relation with queryset, it will automatically retrieve the data in Queryset without querying the database again.

Prefetch_related

  • If select_related is used for one-to-one relationships or one-to-many that contain foreign-key , prefetch_related is used for many-to-many relationships and one-to-many .
  • prefetch_related does a separate lookup for each relationship.
  • For example: I want to retrieve all categories. Each category will contain the full names of the books of that category. Usually we do the following:

  • Result

  • The problem with this function is similar to the example in the select_related section, each time the category.books.all() runs, it generates a query to the DB.
  • Try replacing prefetch_related to see the difference.

  • Result

  • With prefetch_related , you will only need 2 queries to get all the above information.
  • The Category.objects.prefetch_related('books') function takes information of all categories and prefetches related books into Queryset Cache. Whenever the category.books.all() function is called, instead of querying the DB to get information, it looks in the Queryset cache.

Conclude

  • The problems may seem simple, but sometimes it's not simple at all, right?
  • Hopefully, through this article, everyone can take a closer look at the code. Code optimization is essential for every developer.

References

Share the news now

Source : Viblo