Multi-head attention mechanism
https://data-science-blog.com/blog/2021/04/07/multi-head-attention-mechanism/
Mathematical explanation of the multi-head attention mechanism.
The Illustrated Transformer
http://jalammar.github.io/illustrated-transformer/
Pretty good explanation, with a visual illustration of the Transformer’s probability mechanism.
The Attention Mechanism from Scratch
https://machinelearningmastery.com/the-attention-mechanism-from-scratch/
Mathematical explanation with example Python code, quite close to practice, but it can be difficult for newcomers to follow.
Getting meaning from text: self-attention step-by-step video
https://peltarion.com/blog/data-science/self-attention-video
The explanation is also quite easy to understand. There is a step-by-step video tutorial, and the visual model looks pretty good too.
How Attention works in Deep Learning: understanding the attention mechanism in sequence models
Explains step by step how sequence models evolved to reach the attention method, giving better insight into its roots and development.