Window Functions in MySQL, Advanced and extremely useful (Part II).

Tram Ho

3 years ago

Hi everyone, it’s me again. In the previous section, I introduced you to Window Functions Part I. If it is unclear what it is, everyone should read it again first, to grasp the definition and key words, avoid the eyes of the letter O but A because in this section, I will mainly practice with Window Functions .

I. ROW_NUMBER ()

The function name says it all, numbering the records according to the order of each partition. For example: numbering students by class, for example, try to think about what to do if you do not use Window Functions , it is quite hard =)) And when using Row_number() , it is short.

SELECT row_number <span class="token punctuation">(</span> <span class="token punctuation">)</span> over <span class="token punctuation">(</span> PARTITION BY s <span class="token punctuation">.</span> class_id ORDER BY s <span class="token punctuation">.</span> name <span class="token punctuation">)</span> AS STT <span class="token punctuation">,</span>
       c <span class="token punctuation">.</span> name AS class_name <span class="token punctuation">,</span>
       s <span class="token punctuation">.</span> <span class="token operator">*</span>
FROM students AS s
JOIN classes AS c ON s <span class="token punctuation">.</span> class_id <span class="token operator">=</span> c <span class="token punctuation">.</span> <span class="token builtin">id</span>

SELECT row_number ( ) over ( PARTITION BY s . class_id ORDER BY s . name ) AS STT ,

c . name AS class_name ,

s . *

FROM students AS s

JOIN classes AS c ON s . class_id = c . id

You see, "PARTITION BY s.class_id" divided the table into 3 blocks based on class_id and counted in turn the records corresponding to the block – window of the same record. Simple, right.

II. RANK ()

Rank ranking, yes. Really is in the process of work, his very ĐƯỢC Customer requests retrieve data for investigation, tracking, rating awarded rank Typically, bla bla. . . Ranking here is not simply just ordering it out, but you have to take the rank and rank it according to what criteria. Tired. Continuing with the above example, I add the column score_AVG the average score of the students r rank them offline.

Select * from students
order by score_AVG DESC

Select * from students

order by score_AVG DESC

Simple, right, but a query like this connit it also works, I need to get the ranking position. There, try it …

Select *, rank() over(order by score_AVG DESC) as ranking from students

1 2	Select *, rank() over(order by score_AVG DESC) as ranking from students

Too fast, too dangerous, right? =))) But the same class ranked together do * damn what. Ranking by class … OK, right away

SELECT c <span class="token punctuation">.</span> name <span class="token punctuation">,</span>
       s <span class="token punctuation">.</span> <span class="token operator">*</span> <span class="token punctuation">,</span>
       rank <span class="token punctuation">(</span> <span class="token punctuation">)</span> over <span class="token punctuation">(</span> PARTITION BY s <span class="token punctuation">.</span> class_id ORDER BY s <span class="token punctuation">.</span> score_AVG DESC <span class="token punctuation">)</span> AS ranking
FROM students AS s
JOIN classes AS c ON s <span class="token punctuation">.</span> class_id <span class="token operator">=</span> c <span class="token punctuation">.</span> <span class="token builtin">id</span>

SELECT c . name ,

s . * ,

rank ( ) over ( PARTITION BY s . class_id ORDER BY s . score_AVG DESC ) AS ranking

FROM students AS s

JOIN classes AS c ON s . class_id = c . id

Twelve, it sounds delicious, right, but have you noticed the case that if the ranks are equal, then will the next rank miss? for example vs class_id = 1 , there are 2 people ranked 1 and 3 but no one ranked 2 and 4. Not really, it will be useful in some problems. In response from AZ, please go to the next function.

III. dense_rank ()

In the previous post we said, plus a little Spoil in the above function, it is similar to the Rank() function, but the rank order will be in turn, not from 1 -> 3 as on. Continue with the above example. Ranking students by grade

SELECT c <span class="token punctuation">.</span> name <span class="token punctuation">,</span>
       s <span class="token punctuation">.</span> <span class="token operator">*</span> <span class="token punctuation">,</span>
       dense_rank <span class="token punctuation">(</span> <span class="token punctuation">)</span> over <span class="token punctuation">(</span> PARTITION BY s <span class="token punctuation">.</span> class_id  ORDER BY s <span class="token punctuation">.</span> score_AVG DESC <span class="token punctuation">)</span> AS ranking
FROM students AS s
JOIN classes AS c ON s <span class="token punctuation">.</span> class_id <span class="token operator">=</span> c <span class="token punctuation">.</span> <span class="token builtin">id</span>

SELECT c . name ,

s . * ,

dense_rank ( ) over ( PARTITION BY s . class_id ORDER BY s . score_AVG DESC ) AS ranking

FROM students AS s

JOIN classes AS c ON s . class_id = c . id

Delicious yet !.

IV. LAG () and LEAD ()

These two functions are quite useful for tracking a company’s growth, monthly revenue, or student progress year-by-year by subject. By comparing the current row with the previous rows – LAG () or the following rows – LEAD (). Full syntax:

LAG (expr [, N [, default]])

The function returns the value of the previous n rows since the current row, if there is no row, it returns the default value. In case of leaving N & default blank, default N = 1 and default = NULL.

LEAD (expr [, N [, default]])

Similar to the LAG function, but returns the value of the following row from the current row. Let’s test our mọc sừng year by year.

SELECT *,
       LAG(heighs) OVER(PARTITION BY name ORDER BY years ASC) AS LAG_heigh,
       (heighs - LAG(heighs) OVER(PARTITION BY name ORDER BY years ASC)) AS LAG_DIFF,
       LEAD(heighs) OVER(PARTITION BY name ORDER BY years ASC) AS LEAD_heigh,
       LAG(heighs, 2, -1) OVER(PARTITION BY name ORDER BY years ASC) AS LAG_heigh2
FROM heigh_infos

SELECT *,

LAG(heighs) OVER(PARTITION BY name ORDER BY years ASC) AS LAG_heigh,

(heighs - LAG(heighs) OVER(PARTITION BY name ORDER BY years ASC)) AS LAG_DIFF,

LEAD(heighs) OVER(PARTITION BY name ORDER BY years ASC) AS LEAD_heigh,

LAG(heighs, 2, -1) OVER(PARTITION BY name ORDER BY years ASC) AS LAG_heigh2

FROM heigh_infos

From the table above, based on LAG_DIFF , every year I grow horns until 2018 onwards, I have found my true love. =))

V FIRST_VALUE () & LAST_VALUE () & NTH_VALUE ()

In general, these 3 functions are the same two functions above, also to return the value of 1 row, different from the position of that rotten row.

2 previous functions (LAG – LEAD), then return the value of n rows before / after from the current row
These 3 functions then return the terminal position or any partition – window based on ORDER BY order

The syntax is simple:

FIRST_VALUE (expr)
LAST_VALUE (expr)
NTH_VALUE (exp, n)

SELECT <span class="token operator">*</span> <span class="token punctuation">,</span>
       FIRST_VALUE <span class="token punctuation">(</span> heighs <span class="token punctuation">)</span> OVER <span class="token punctuation">(</span> PARTITION BY name ORDER BY heighs ASC <span class="token punctuation">)</span> AS min_heigh <span class="token punctuation">,</span>
       LAST_VALUE <span class="token punctuation">(</span> heighs <span class="token punctuation">)</span> OVER <span class="token punctuation">(</span> PARTITION BY name ORDER BY heighs ASC <span class="token punctuation">)</span> AS max_heigh <span class="token punctuation">,</span>
       NTH_VALUE <span class="token punctuation">(</span> heighs <span class="token punctuation">,</span> <span class="token number">3</span> <span class="token punctuation">)</span> OVER <span class="token punctuation">(</span> PARTITION BY name ORDER BY years ASC <span class="token punctuation">)</span> AS random_heigh
FROM heigh_infos

SELECT * ,

FIRST_VALUE ( heighs ) OVER ( PARTITION BY name ORDER BY heighs ASC ) AS min_heigh ,

LAST_VALUE ( heighs ) OVER ( PARTITION BY name ORDER BY heighs ASC ) AS max_heigh ,

NTH_VALUE ( heighs , 3 ) OVER ( PARTITION BY name ORDER BY years ASC ) AS random_heigh

FROM heigh_infos

The rest of the functions I find are quite rare, if you want to learn, you can read the dochere .

The problem x10 salary

Just kidding everyone, although you don’t get a salary, you will be appreciated if you can solve this problem (Personally, I think – maybe it’s still easy for everyone, don’t laugh at me) This is exactly what happened to my project: (I will change the problem but still keep the main idea to ensure the security of the project) Summary : 1 user’s wallet table has more than 60 million records . Each user has many accounts (can be up to several hundred thousand records 1 user) with different amounts, the accounts have priority, that is to use up this account to be used to another account. For each transaction, you have to deduct money from accounts (made sure to check enough money). And 1 user transact many times an hour. Please solve and optimize the query to solve the above problem.

* Example data for people to visualize:

QuanLx user has 10 records, totaling $ 561.3
A transaction requires $ 250
-> Get at least the number of wallets records to complete the above transaction, note deducting according to account’s STT for each user.

Think about it before scrolling down to see the suggestions.

Solution : It seems lonely at first glance, right? every query takes out all the user’s wallet in order and then For subtracting money gradually, when it is enough, it will break . Looks ok, but remember, the table has up to 60 million records, each user can have more than 100k records. For example, a transaction only requires 1 record to pull out all the records, like carrying a buffalo scalpel to chop chicken . Not good: Need to get enough records to handle it !!

Method 1: This way I did it before I learned about Window Functions

SET @amount = 250.0;

SELECT *, @amount:=(@amount-value)
FROM
  (SELECT *
   FROM wallets
   ORDER BY STT ASC) AS tmp
WHERE @amount &gt;= 0;

SET @amount = 250.0;

SELECT *, @amount:=(@amount-value)

FROM

(SELECT *

FROM wallets

ORDER BY STT ASC) AS tmp

WHERE @amount >= 0;

result:

Method 2: Use Window Functions

SET @amount = 0.0;

SELECT *, @amount:=amount from
  (SELECT *, SUM(value) over(PARTITION BY user_name ORDER BY STT ASC) AS amount
   FROM wallets) AS tmp
WHERE @amount &lt; 250

SET @amount = 0.0;

SELECT *, @amount:=amount from

(SELECT *, SUM(value) over(PARTITION BY user_name ORDER BY STT ASC) AS amount

FROM wallets) AS tmp

WHERE @amount < 250

Result:

Both methods above match the results, do not see the short but simple query, but in fact my problem has to join 3-4 tables to get the original table for everyone, not available like that. Where, I just demo the main idea.

Problem 2: (outside) This article I am doing in the project as well. In summary, when playing Baracat (betting) 1 user can win-lose-draw (there are 3 doors: Player-Banker-Tie, the rules can be found on GG), now the Customer want to create an event format: rewarding TOP of the highest consecutive winners , TOP the smallest consecutive losers … (There are many targets for 1 user to be ranked very much, whichever is the minimum play amount game, which door to bet … If you say all the criteria to rank, you will be stunned). For easy visualization, I will convert to a post format that we still often do algorithms.

Find the longest consecutive sequence of positive numbers (0 then skip), but instead of an array, this is the database, and not a sequence, but a lot of ranges corresponding to many users. But, let’s just develop from base firstSample data for everyone to imagine (for 1 user)

The answer is 4 , everyone try to see if it comes out =)) If you have any questions, please comment. we exchange together The detailed solution I will upload later (Or everyone can ping me)

Thank you to everyone who followed my post! Have a nice working day!

I. ROW_NUMBER ()

II. RANK ()

III. dense_rank ()

IV. LAG () and LEAD ()

V FIRST_VALUE () & LAST_VALUE () & NTH_VALUE ()

The problem x10 salary

Related document