Hi everyone, it’s me again. In the previous section, I introduced you to Window Functions
Part I. If it is unclear what it is, everyone should read it again first, to grasp the definition and key words, avoid the eyes of the letter O but A because in this section, I will mainly practice with Window Functions
.
I. ROW_NUMBER ()
The function name says it all, numbering the records according to the order of each partition. For example: numbering students by class, for example, try to think about what to do if you do not use Window Functions
, it is quite hard =)) And when using Row_number()
, it is short.
1 2 3 4 5 6 |
SELECT row_number <span class="token punctuation">(</span> <span class="token punctuation">)</span> over <span class="token punctuation">(</span> PARTITION BY s <span class="token punctuation">.</span> class_id ORDER BY s <span class="token punctuation">.</span> name <span class="token punctuation">)</span> AS STT <span class="token punctuation">,</span> c <span class="token punctuation">.</span> name AS class_name <span class="token punctuation">,</span> s <span class="token punctuation">.</span> <span class="token operator">*</span> FROM students AS s JOIN classes AS c ON s <span class="token punctuation">.</span> class_id <span class="token operator">=</span> c <span class="token punctuation">.</span> <span class="token builtin">id</span> |
You see, "PARTITION BY s.class_id"
divided the table into 3 blocks based on class_id
and counted in turn the records corresponding to the block – window of the same record. Simple, right.
II. RANK ()
Rank ranking, yes. Really is in the process of work, his very ĐƯỢC
Customer requests retrieve data for investigation, tracking, rating awarded rank Typically, bla bla. . . Ranking here is not simply just ordering it out, but you have to take the rank and rank it according to what criteria. Tired. Continuing with the above example, I add the column score_AVG
the average score of the students r rank them offline.
1 2 3 |
Select * from students order by score_AVG DESC |
Simple, right, but a query like this connit
it also works, I need to get the ranking position. There, try it …
1 2 |
Select *, rank() over(order by score_AVG DESC) as ranking from students |
Too fast, too dangerous, right? =))) But the same class ranked together do * damn what. Ranking by class … OK, right away
1 2 3 4 5 6 |
SELECT c <span class="token punctuation">.</span> name <span class="token punctuation">,</span> s <span class="token punctuation">.</span> <span class="token operator">*</span> <span class="token punctuation">,</span> rank <span class="token punctuation">(</span> <span class="token punctuation">)</span> over <span class="token punctuation">(</span> PARTITION BY s <span class="token punctuation">.</span> class_id ORDER BY s <span class="token punctuation">.</span> score_AVG DESC <span class="token punctuation">)</span> AS ranking FROM students AS s JOIN classes AS c ON s <span class="token punctuation">.</span> class_id <span class="token operator">=</span> c <span class="token punctuation">.</span> <span class="token builtin">id</span> |
Twelve, it sounds delicious, right, but have you noticed the case that if the ranks are equal, then will the next rank miss? for example vs class_id = 1
, there are 2 people ranked 1 and 3 but no one ranked 2 and 4. Not really, it will be useful in some problems. In response from AZ, please go to the next function.
III. dense_rank ()
In the previous post we said, plus a little Spoil
in the above function, it is similar to the Rank()
function, but the rank order will be in turn, not from 1 -> 3 as on. Continue with the above example. Ranking students by grade
1 2 3 4 5 6 |
SELECT c <span class="token punctuation">.</span> name <span class="token punctuation">,</span> s <span class="token punctuation">.</span> <span class="token operator">*</span> <span class="token punctuation">,</span> dense_rank <span class="token punctuation">(</span> <span class="token punctuation">)</span> over <span class="token punctuation">(</span> PARTITION BY s <span class="token punctuation">.</span> class_id ORDER BY s <span class="token punctuation">.</span> score_AVG DESC <span class="token punctuation">)</span> AS ranking FROM students AS s JOIN classes AS c ON s <span class="token punctuation">.</span> class_id <span class="token operator">=</span> c <span class="token punctuation">.</span> <span class="token builtin">id</span> |
Delicious yet !.
IV. LAG () and LEAD ()
These two functions are quite useful for tracking a company’s growth, monthly revenue, or student progress year-by-year by subject. By comparing the current row with the previous rows – LAG () or the following rows – LEAD (). Full syntax:
- LAG (expr [, N [, default]])
The function returns the value of the previous n rows since the current row, if there is no row, it returns the default value. In case of leaving N & default blank, default N = 1 and default = NULL.
- LEAD (expr [, N [, default]])
Similar to the LAG function, but returns the value of the following row from the current row. Let’s test our mọc sừng
year by year.
1 2 3 4 5 6 7 |
SELECT *, LAG(heighs) OVER(PARTITION BY name ORDER BY years ASC) AS LAG_heigh, (heighs - LAG(heighs) OVER(PARTITION BY name ORDER BY years ASC)) AS LAG_DIFF, LEAD(heighs) OVER(PARTITION BY name ORDER BY years ASC) AS LEAD_heigh, LAG(heighs, 2, -1) OVER(PARTITION BY name ORDER BY years ASC) AS LAG_heigh2 FROM heigh_infos |
From the table above, based on LAG_DIFF
, every year I grow horns until 2018 onwards, I have found my true love. =))
V FIRST_VALUE () & LAST_VALUE () & NTH_VALUE ()
In general, these 3 functions are the same two functions above, also to return the value of 1 row, different from the position of that rotten row.
- 2 previous functions (LAG – LEAD), then return the value of n rows before / after from the current row
- These 3 functions then return the terminal position or any partition – window based on ORDER BY order
The syntax is simple:
- FIRST_VALUE (expr)
- LAST_VALUE (expr)
- NTH_VALUE (exp, n)
1 2 3 4 5 6 |
SELECT <span class="token operator">*</span> <span class="token punctuation">,</span> FIRST_VALUE <span class="token punctuation">(</span> heighs <span class="token punctuation">)</span> OVER <span class="token punctuation">(</span> PARTITION BY name ORDER BY heighs ASC <span class="token punctuation">)</span> AS min_heigh <span class="token punctuation">,</span> LAST_VALUE <span class="token punctuation">(</span> heighs <span class="token punctuation">)</span> OVER <span class="token punctuation">(</span> PARTITION BY name ORDER BY heighs ASC <span class="token punctuation">)</span> AS max_heigh <span class="token punctuation">,</span> NTH_VALUE <span class="token punctuation">(</span> heighs <span class="token punctuation">,</span> <span class="token number">3</span> <span class="token punctuation">)</span> OVER <span class="token punctuation">(</span> PARTITION BY name ORDER BY years ASC <span class="token punctuation">)</span> AS random_heigh FROM heigh_infos |
The rest of the functions I find are quite rare, if you want to learn, you can read the dochere .
The problem x10 salary
Just kidding everyone, although you don’t get a salary, you will be appreciated if you can solve this problem (Personally, I think – maybe it’s still easy for everyone, don’t laugh at me) This is exactly what happened to my project: (I will change the problem but still keep the main idea to ensure the security of the project) Summary : 1 user’s wallet table has more than 60 million records . Each user has many accounts (can be up to several hundred thousand records 1 user) with different amounts, the accounts have priority, that is to use up this account to be used to another account. For each transaction, you have to deduct money from accounts (made sure to check enough money). And 1 user transact many times an hour. Please solve and optimize the query to solve the above problem.
* Example data for people to visualize:
QuanLx
user has 10 records, totaling $ 561.3- A transaction requires $ 250
- -> Get at least the number of
wallets
records to complete the above transaction, note deducting according to account’sSTT
for each user.
Think about it before scrolling down to see the suggestions.
Solution : It seems lonely at first glance, right? every query takes out all the user’s wallet in order and then For subtracting money gradually, when it is enough, it will break . Looks ok, but remember, the table has up to 60 million records, each user can have more than 100k records. For example, a transaction only requires 1 record to pull out all the records, like carrying a buffalo scalpel to chop chicken . Not good: Need to get enough records to handle it !!
Method 1: This way I did it before I learned about Window Functions
1 2 3 4 5 6 7 8 9 |
SET @amount = 250.0; SELECT *, @amount:=(@amount-value) FROM (SELECT * FROM wallets ORDER BY STT ASC) AS tmp WHERE @amount >= 0; |
result:
Method 2: Use Window Functions
1 2 3 4 5 6 7 |
SET @amount = 0.0; SELECT *, @amount:=amount from (SELECT *, SUM(value) over(PARTITION BY user_name ORDER BY STT ASC) AS amount FROM wallets) AS tmp WHERE @amount < 250 |
Result:
Both methods above match the results, do not see the short but simple query, but in fact my problem has to join 3-4 tables to get the original table for everyone, not available like that. Where, I just demo the main idea.
Problem 2: (outside) This article I am doing in the project as well. In summary, when playing Baracat (betting) 1 user can win-lose-draw (there are 3 doors: Player-Banker-Tie, the rules can be found on GG), now the Customer want to create an event format: rewarding TOP of the highest consecutive winners , TOP the smallest consecutive losers … (There are many targets for 1 user to be ranked very much, whichever is the minimum play amount game, which door to bet … If you say all the criteria to rank, you will be stunned). For easy visualization, I will convert to a post format that we still often do algorithms.
Find the longest consecutive sequence of positive numbers (0 then skip), but instead of an array, this is the database, and not a sequence, but a lot of ranges corresponding to many users. But, let’s just develop from base first
The answer is 4 , everyone try to see if it comes out =)) If you have any questions, please comment. we exchange together The detailed solution I will upload later (Or everyone can ping me)
Thank you to everyone who followed my post! Have a nice working day!
Related document
https://dev.mysql.com/doc/refman/8.0/en/window-function-descriptions.html .
https://dev.mysql.com/doc/refman/8.0/en/window-functions-usage.html .
https://dev.mysql.com/doc/refman/8.0/en/window-functions-frames.html .