[Recommended material] Self-attention in Transformer, clearly explained

This is an article I found useful when trying to understand the multi-head attention structure in the Transformer.

Multi-head attention mechanism: “queries”, “keys”, and “values”, over and over again.
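
As a quick illustration of what those queries, keys, and values do, here is a minimal sketch of scaled dot-product self-attention in NumPy. The shapes, variable names, and random projections are illustrative assumptions for this sketch, not taken from the recommended article:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Similarity of each query with every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax over the keys for each query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V

# Toy example: 4 tokens, model dimension 8 (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                       # token embeddings
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))

# Self-attention: queries, keys, and values are all projections of the same input x.
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)  # (4, 8)
```

Multi-head attention simply repeats this computation several times in parallel, each head with its own learned projections, and concatenates the per-head outputs before a final linear projection.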
