Window Functions in MySQL, Advanced and extremely useful (Part II).

Tram Ho

Hi everyone, it’s me again. In the previous section, I introduced you to Window Functions Part I. If it is unclear what it is, everyone should read it again first, to grasp the definition and key words, avoid the eyes of the letter O but A because in this section, I will mainly practice with Window Functions .


The function name says it all, numbering the records according to the order of each partition. For example: numbering students by class, for example, try to think about what to do if you do not use Window Functions , it is quite hard =)) And when using Row_number() , it is short.

You see, "PARTITION BY s.class_id" divided the table into 3 blocks based on class_id and counted in turn the records corresponding to the block – window of the same record. Simple, right.


Rank ranking, yes. Really is in the process of work, his very ĐƯỢC Customer requests retrieve data for investigation, tracking, rating awarded rank Typically, bla bla. . . Ranking here is not simply just ordering it out, but you have to take the rank and rank it according to what criteria. Tired. Continuing with the above example, I add the column score_AVG the average score of the students r rank them offline.

Simple, right, but a query like this connit it also works, I need to get the ranking position. There, try it …

Too fast, too dangerous, right? =))) But the same class ranked together do * damn what. Ranking by class … OK, right away

Twelve, it sounds delicious, right, but have you noticed the case that if the ranks are equal, then will the next rank miss? for example vs class_id = 1 , there are 2 people ranked 1 and 3 but no one ranked 2 and 4. Not really, it will be useful in some problems. In response from AZ, please go to the next function.

III. dense_rank ()

In the previous post we said, plus a little Spoil in the above function, it is similar to the Rank() function, but the rank order will be in turn, not from 1 -> 3 as on. Continue with the above example. Ranking students by grade

Delicious yet !.

IV. LAG () and LEAD ()

These two functions are quite useful for tracking a company’s growth, monthly revenue, or student progress year-by-year by subject. By comparing the current row with the previous rows – LAG () or the following rows – LEAD (). Full syntax:

  • LAG (expr [, N [, default]])

The function returns the value of the previous n rows since the current row, if there is no row, it returns the default value. In case of leaving N & default blank, default N = 1 and default = NULL.

  • LEAD (expr [, N [, default]])

Similar to the LAG function, but returns the value of the following row from the current row. Let’s test our mọc sừng year by year.

From the table above, based on LAG_DIFF , every year I grow horns until 2018 onwards, I have found my true love. =))


In general, these 3 functions are the same two functions above, also to return the value of 1 row, different from the position of that rotten row.

  • 2 previous functions (LAG – LEAD), then return the value of n rows before / after from the current row
  • These 3 functions then return the terminal position or any partition – window based on ORDER BY order

The syntax is simple:

  • FIRST_VALUE (expr)
  • LAST_VALUE (expr)
  • NTH_VALUE (exp, n)

The rest of the functions I find are quite rare, if you want to learn, you can read the dochere .

The problem x10 salary

Just kidding everyone, although you don’t get a salary, you will be appreciated if you can solve this problem (Personally, I think – maybe it’s still easy for everyone, don’t laugh at me) This is exactly what happened to my project: (I will change the problem but still keep the main idea to ensure the security of the project) Summary : 1 user’s wallet table has more than 60 million records . Each user has many accounts (can be up to several hundred thousand records 1 user) with different amounts, the accounts have priority, that is to use up this account to be used to another account. For each transaction, you have to deduct money from accounts (made sure to check enough money). And 1 user transact many times an hour. Please solve and optimize the query to solve the above problem.

* Example data for people to visualize:

  • QuanLx user has 10 records, totaling $ 561.3
  • A transaction requires $ 250
  • -> Get at least the number of wallets records to complete the above transaction, note deducting according to account’s STT for each user.

Think about it before scrolling down to see the suggestions.

Solution : It seems lonely at first glance, right? every query takes out all the user’s wallet in order and then For subtracting money gradually, when it is enough, it will break . Looks ok, but remember, the table has up to 60 million records, each user can have more than 100k records. For example, a transaction only requires 1 record to pull out all the records, like carrying a buffalo scalpel to chop chicken . Not good: Need to get enough records to handle it !!

Method 1: This way I did it before I learned about Window Functions


Method 2: Use Window Functions


Both methods above match the results, do not see the short but simple query, but in fact my problem has to join 3-4 tables to get the original table for everyone, not available like that. Where, I just demo the main idea.

Problem 2: (outside) This article I am doing in the project as well. In summary, when playing Baracat (betting) 1 user can win-lose-draw (there are 3 doors: Player-Banker-Tie, the rules can be found on GG), now the Customer want to create an event format: rewarding TOP of the highest consecutive winners , TOP the smallest consecutive losers … (There are many targets for 1 user to be ranked very much, whichever is the minimum play amount game, which door to bet … If you say all the criteria to rank, you will be stunned). For easy visualization, I will convert to a post format that we still often do algorithms.

Find the longest consecutive sequence of positive numbers (0 then skip), but instead of an array, this is the database, and not a sequence, but a lot of ranges corresponding to many users. But, let’s just develop from base first😄Sample data for everyone to imagine (for 1 user)

The answer is 4 , everyone try to see if it comes out =)) If you have any questions, please comment. we exchange together The detailed solution I will upload later (Or everyone can ping me)

Thank you to everyone who followed my post! Have a nice working day!

Related document . . .

Share the news now

Source : Viblo