Study notes: attention mechanisms (MSA, W-MSA, Local Attention, Stride Attention, ...)
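Self-attention in computer vision

Head

How queries, keys, and values are computed

The queries, keys, and values are obtained by passing the input $I^{N\times C}$ through fully connected layers, as follows:

queries: $Q^{N\times d_k} = I^{N\times C} W_Q^{C\times d_k}$ (the weight matrix must have shape $C\times d_k$ for the product to have shape $N\times d_k$)

key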