[Recommended material] self attention in Transformer, clearly explained
This is an article which I found useful when attempting to understand the multi-head attention structure in the Transformer: "Multi-head attention mechanism: 'queries', 'keys', and 'values', over and over again".
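
For reference, here is a minimal NumPy sketch of single-head scaled dot-product self-attention, the queries/keys/values mechanism the article explains. The tensor sizes, variable names, and random toy inputs are illustrative assumptions, not taken from the article.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Single-head self-attention: every token attends to every other token."""
    Q = X @ W_q                        # queries
    K = X @ W_k                        # keys
    V = X @ W_v                        # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # similarity of each query to each key
    weights = softmax(scores, axis=-1) # attention weights, rows sum to 1
    return weights @ V                 # weighted sum of values

# Toy usage: 4 tokens, model width 8 (hypothetical sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

Multi-head attention, as described in the article, simply runs several such heads in parallel on lower-dimensional projections and concatenates their outputs.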