Several Transformer+CNN (U-Net) Networks

I. Comparison

| | U-Net | Transformer |
|---|---|---|
| Strengths | Fuses deep semantic information with the information carried by high-resolution features | Captures global information |
| Weaknesses | Cannot model contextual relationships between distant features | Lacks local detail information |

II. Networks

1. TransUNet

Paper: TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation
[Figure 1]
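TransUNet's key idea is a hybrid encoder: a CNN first downsamples the image into a low-resolution feature map, that map is flattened into patch tokens for a transformer, and the attended tokens are reshaped back into a 2D grid for the CNN decoder's cascaded upsampling. The following is a shape-level sketch in plain NumPy with random, untrained weights; the single-head attention, the 14×14/768-channel feature map, and the 64-dim projection are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, d_k):
    # Single-head scaled dot-product attention with random (untrained) projections
    rng = np.random.default_rng(0)
    d = tokens.shape[-1]
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) * 0.02 for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (N, N): every token attends to all others
    return attn @ V

# CNN stage output: e.g. a 224x224 image downsampled 16x to a 14x14 map, 768 channels
H = W = 14
C = 768
feat = np.random.default_rng(1).standard_normal((H, W, C))

# Tokenize: flatten the spatial grid into 196 patch tokens, run transformer attention
tokens = feat.reshape(H * W, C)
out = self_attention(tokens, d_k=64)

# Reshape back to a 2D grid for the CNN decoder's upsampling path
decoder_in = out.reshape(H, W, -1)
print(tokens.shape, decoder_in.shape)  # (196, 768) (14, 14, 64)
```

The reshape in both directions is the whole trick: the transformer only ever sees a sequence of tokens, so the CNN feature map is flattened going in and re-gridded coming out.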

2. TransFuse

Paper: TransFuse: Fusing Transformers and CNNs for Medical Image Segmentation
Most prior work either replaces the convolutional layers in U-Net with transformers or cascades the two. This paper instead proposes a strategy for fusing the two.

[Figure 2]
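In TransFuse, a CNN branch and a transformer branch run in parallel and their feature maps are then fused. The sketch below is a deliberate simplification of that fusion step (the paper's actual BiFusion module is more elaborate); the gating scheme, shapes, and variable names here are illustrative assumptions. The intent is only to show the shape of the idea: each branch's features re-weight the other's before they are combined.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
H, W, C = 14, 14, 256
cnn_feat = rng.standard_normal((H, W, C))    # CNN branch: local detail
trans_feat = rng.standard_normal((H, W, C))  # transformer branch: global context

# Spatial gate from the CNN branch: a per-pixel importance map (H, W, 1)
spatial_gate = sigmoid(cnn_feat.mean(axis=-1, keepdims=True))

# Channel gate from the transformer branch: global channel statistics (C,)
channel_gate = sigmoid(trans_feat.mean(axis=(0, 1)))

# Fuse: each branch is modulated by the other's gate, then summed
fused = cnn_feat * channel_gate + trans_feat * spatial_gate
print(fused.shape)  # (14, 14, 256)
```

Because the two branches are never stacked sequentially, the transformer's global context and the CNN's local detail reach the fusion point at the same depth, which is the contrast with the replace-or-cascade designs mentioned above.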

3. U-Transformer

Paper: U-Net Transformer: Self and Cross Attention for Medical Image Segmentation
U-Transformer builds both self-attention and cross-attention through its MHSA and MHCA modules.

U-Transformer models long-range contextual interactions and spatial dependencies by using two types of attention modules: Multi-Head Self-Attention (MHSA) and Multi-Head Cross-Attention (MHCA).

  • MHSA module (self-attention): extracts long-range structural features from the image.

The MHSA module is designed to extract long-range structural information from the images.
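Multi-head self-attention splits the embedding into several heads, lets each head attend over all spatial positions independently, then concatenates the heads back. A minimal NumPy sketch with random, untrained weights is below; applying it to a flattened 7×7 bottleneck map with 128 channels and 8 heads is an illustrative assumption, not the paper's configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa(x, num_heads):
    # x: (N, d) tokens from a flattened bottleneck feature map
    N, d = x.shape
    assert d % num_heads == 0
    d_h = d // num_heads
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))

    def split(t):
        # (N, d) -> (heads, N, d_h): each head gets its own slice of channels
        return t.reshape(N, num_heads, d_h).transpose(1, 0, 2)

    Q, K, V = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(d_h))  # (heads, N, N)
    out = (attn @ V).transpose(1, 0, 2).reshape(N, d)        # concat heads
    return out

# Flattened 7x7 bottleneck feature map, 128 channels
bottleneck = np.random.default_rng(1).standard_normal((7 * 7, 128))
result = mhsa(bottleneck, num_heads=8)
print(result.shape)  # (49, 128)
```

Each position's output is a weighted mix of every other position, which is exactly the long-range structural modeling that plain convolutions, with their limited receptive field, cannot do in one layer.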

  • MHCA module (cross-attention): filters out irrelevant information and highlights the features that matter for the result.

The idea behind the MHCA module is to turn off irrelevant or noisy areas from the skip connection features and highlight regions that present a significant interest for the application.
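A rough NumPy sketch of that filtering idea is below. One common cross-attention arrangement is assumed here: queries come from the semantically rich decoder features, keys and values from the skip-connection features, and the result gates the skip features before concatenation. This wiring, the sigmoid gate, and all shapes are illustrative assumptions and may differ from the paper's exact MHCA module.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
N, d = 14 * 14, 128
decoder = rng.standard_normal((N, d))  # semantically rich decoder features
skip = rng.standard_normal((N, d))     # high-resolution skip-connection features

# Cross-attention: queries from the decoder, keys/values from the skip connection
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))
Q, K, V = decoder @ Wq, skip @ Wk, skip @ Wv
attn = softmax(Q @ K.T / np.sqrt(d))   # (N, N): decoder positions attend to skip
filtered = attn @ V                    # skip content re-weighted by relevance

# Gate: suppress irrelevant skip regions before concatenation with the decoder
gated_skip = skip * (1.0 / (1.0 + np.exp(-filtered)))
print(gated_skip.shape)  # (196, 128)
```

The difference from MHSA is only where Q, K, and V come from: because the queries carry decoder semantics, the attention map acts as a learned relevance filter over the raw skip features rather than a feature mixer within one map.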

[Figure 3]

[Figure 4] [Figure 5]
