Feed-Forward Layers

Paper walkthrough: Do You Even Need Attention?

Open-source code: Do You Even Need Attention?

Since I have not yet worked through the full transformer framework (leaving a placeholder here to fill in later), this post analyzes only the feed-forward layers described in the paper; an analysis of the complete open-source code will follow. The paper replaces the attention layers of the Vision Transformer with feed-forward layers applied over the patch dimension, so the resulting architecture is simply a sequence of feed-forward layers applied alternately to the patch and feature dimensions.
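Before looking at the module printout, here is a minimal sketch of that idea (the sizes 197 = 196 patch tokens + 1 CLS token and dim = 192 are taken from the printout below; the variable names are mine):

import torch
import torch.nn as nn

x = torch.randn(8, 197, 192)   # (batch, patch tokens, feature dim), ViT-Tiny-sized

# A Linear layer on the last axis mixes features *within* each patch
feature_mlp = nn.Linear(192, 192)
print(feature_mlp(x).shape)    # torch.Size([8, 197, 192])

# Transposing first makes the same kind of layer mix information *across* patches,
# which is what replaces self-attention in this paper
token_mlp = nn.Linear(197, 197)
y = token_mlp(x.transpose(-2, -1)).transpose(-2, -1)
print(y.shape)                 # torch.Size([8, 197, 192])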

[Figure 1: overall architecture]

As the figure shows, when multiple layers are stacked, the feed-forward layers alternately process the feature and patch dimensions, and each LinearBlock uses a residual structure. Printing the module gives:

LinearBlock(
      (mlp1): Mlp(
        (fc1): Linear(in_features=192, out_features=768, bias=True)
        (act): GELU()
        (fc2): Linear(in_features=768, out_features=192, bias=True)
        (drop): Dropout(p=0.0, inplace=False)
      )
      (norm1): LayerNorm((192,), eps=1e-06, elementwise_affine=True)
      (mlp2): Mlp(
        (fc1): Linear(in_features=197, out_features=788, bias=True)
        (act): GELU()
        (fc2): Linear(in_features=788, out_features=197, bias=True)
        (drop): Dropout(p=0.0, inplace=False)
      )
      (norm2): LayerNorm((197,), eps=1e-06, elementwise_affine=True)
      (drop_path): Identity()
)
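The hidden widths in the printout follow directly from the default mlp_ratio = 4 in the constructor below:

# Hidden sizes are in_features * mlp_ratio, with mlp_ratio = 4
assert 192 * 4 == 768   # mlp1: feature-mixing MLP (dim = 192)
assert 197 * 4 == 788   # mlp2: token-mixing MLP (196 patches + 1 CLS token)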

From this analysis, the resulting FeedForward network structure is diagrammed below:

[Figure 2]

[Figure 3]

Source code of the LinearBlock module:

import torch.nn as nn
# Mlp and DropPath here are the standard timm implementations
# (https://github.com/rwightman/pytorch-image-models), which the original
# repository copies; importing them from timm is an assumption of this sketch.
from timm.models.layers import Mlp, DropPath


class LinearBlock(nn.Module):

    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
                 drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, num_tokens=197):
        super().__init__()
        # num_heads, qkv_bias, qk_scale and attn_drop are unused; they are kept
        # only so the signature matches the attention-based ViT block.

        # First stage: MLP over the feature dimension (mixes features per token)
        self.mlp1 = Mlp(in_features=dim, hidden_features=int(dim * mlp_ratio), act_layer=act_layer, drop=drop)
        self.norm1 = norm_layer(dim)

        # Second stage: MLP over the token dimension (mixes information across patches)
        self.mlp2 = Mlp(in_features=num_tokens, hidden_features=int(
            num_tokens * mlp_ratio), act_layer=act_layer, drop=drop)
        self.norm2 = norm_layer(num_tokens)

        # Dropout (or a variant): stochastic depth on the residual branches
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        # x: (batch, num_tokens, dim)
        x = x + self.drop_path(self.mlp1(self.norm1(x)))  # feature mixing
        x = x.transpose(-2, -1)                           # -> (batch, dim, num_tokens)
        x = x + self.drop_path(self.mlp2(self.norm2(x)))  # token (patch) mixing
        x = x.transpose(-2, -1)                           # -> (batch, num_tokens, dim)
        return x
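As a quick sanity check, a minimal smoke test (a sketch: num_heads is a dummy value, since the block ignores it):

import torch

block = LinearBlock(dim=192, num_heads=3, mlp_ratio=4., num_tokens=197)
x = torch.randn(2, 197, 192)   # (batch, tokens, features)
out = block(x)
assert out.shape == x.shape    # the block preserves the input shape
print(block)                   # reproduces the printout shown earlier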

