Key points from the abstract:
1. Existing limitation: the locality of the convolution operations in the U-Net architecture limits its ability to capture long-range dependencies.
2. Addressing the limitation: a Transformer-based U-Net architecture is proposed, replacing the CNN blocks with Swin Transformer modules to capture both local and global representations.
3. Network model: Att-SwinU-Net, an attention-based extension of Swin U-Net.
4. Key idea: the skip connection paths are redesigned to improve feature reuse within the network.
5. Improvement: an attention mechanism is incorporated into the classical concatenation operation used in the skip connection paths (a minimal sketch is given after this list).
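To make point 5 concrete, here is a minimal PyTorch sketch of how an attention map can gate the encoder features before the usual skip-connection concatenation. This is only an illustration of the idea, not the authors' exact module; the class name `AttentionGatedConcat` and all channel sizes are assumptions.

```python
# Sketch (assumption, not the paper's exact implementation): an attention map
# derived from the decoder feature gates the encoder feature before the
# classical skip-connection concatenation.
import torch
import torch.nn as nn

class AttentionGatedConcat(nn.Module):
    def __init__(self, enc_channels: int, dec_channels: int, inter_channels: int):
        super().__init__()
        # project both feature maps into a common intermediate space
        self.theta = nn.Conv2d(enc_channels, inter_channels, kernel_size=1)
        self.phi = nn.Conv2d(dec_channels, inter_channels, kernel_size=1)
        # produce a single-channel spatial attention map in [0, 1]
        self.psi = nn.Sequential(
            nn.ReLU(inplace=True),
            nn.Conv2d(inter_channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, enc_feat: torch.Tensor, dec_feat: torch.Tensor) -> torch.Tensor:
        # enc_feat: (B, C_enc, H, W) from the encoder (skip path)
        # dec_feat: (B, C_dec, H, W) from the decoder (same spatial size assumed)
        attn = self.psi(self.theta(enc_feat) + self.phi(dec_feat))  # (B, 1, H, W)
        gated = enc_feat * attn          # suppress uninformative spatial positions
        return torch.cat([gated, dec_feat], dim=1)  # concatenation, now attention-weighted

if __name__ == "__main__":
    skip = AttentionGatedConcat(enc_channels=96, dec_channels=96, inter_channels=48)
    enc, dec = torch.randn(2, 96, 56, 56), torch.randn(2, 96, 56, 56)
    print(skip(enc, dec).shape)  # torch.Size([2, 192, 56, 56])
```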
Researchers have developed several extensions of U-Net, including H-DenseUNet [4], BCDU-Net [3], and U-Net++ [5]. These extensions capture more semantic information passed from the encoder to the decoder and aggregate it directly into the inter-block feature maps of the decoder path, mitigating the loss of fine-grained detail features.
[3] Reza Azad, Maryam Asadi-Aghbolaghi, Mahmood Fathy, and Sergio Escalera, “Bi-directional ConvLSTM U-Net with densely connected convolutions,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, Oct 2019, pp. 406–415.
[4] Xiaomeng Li, Hao Chen, Xiaojuan Qi, Qi Dou, Chi-Wing Fu, and Pheng-Ann Heng, “H-DenseUNet: Hybrid densely connected UNet for liver and tumor segmentation from CT volumes,” IEEE Transactions on Medical Imaging, vol. 37, no. 12, pp. 2663–2674, 2018.
[5] Zongwei Zhou, Md Mahfuzur Rahman Siddiquee, Nima Tajbakhsh, and Jianming Liang, “UNet++: A nested U-Net architecture for medical image segmentation,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 3–11. Springer, 2018.
[6] Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane, and Martin Jagersand, “U2-Net: Going deeper with nested U-structure for salient object detection,” Pattern Recognition, vol. 106, pp. 107404, 2020.
A common characteristic of CNN-based medical image segmentation methods is that their global representations are weak because of the local receptive field. To address this problem, the self-attention mechanism has been proposed (see the sketch below).
[7] Ozan Oktay, Jo Schlemper, Loic Le Folgoc, Matthew Lee, Mattias Heinrich, Kazunari Misawa, Kensaku Mori, Steven McDonagh, Nils Y. Hammerla, Bernhard Kainz, et al., “Attention U-Net: Learning where to look for the pancreas,” arXiv preprint arXiv:1804.03999, 2018.
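As a minimal sketch of why self-attention helps here: every position attends to every other position, so the effective receptive field is global, unlike a convolution. The module below is a generic scaled dot-product self-attention over a flattened feature map; it is illustrative only, and the class name and sizes are assumptions.

```python
# Minimal sketch of scaled dot-product self-attention over a flattened
# feature map (illustrative only; not any specific paper's module).
import torch
import torch.nn as nn

class SelfAttention2d(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.qkv = nn.Linear(channels, 3 * channels)
        self.proj = nn.Linear(channels, channels)
        self.scale = channels ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> tokens: (B, H*W, C)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)
        q, k, v = self.qkv(tokens).chunk(3, dim=-1)
        # every token attends to every other token -> global receptive field
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        out = self.proj(attn @ v)
        return out.transpose(1, 2).reshape(b, c, h, w)

if __name__ == "__main__":
    x = torch.randn(1, 64, 14, 14)
    print(SelfAttention2d(64)(x).shape)  # torch.Size([1, 64, 14, 14])
```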
By applying attention weights computed in the encoder path to highlight informative tokens, the classical concatenation used in the skip connection paths of purely Transformer-based methods is further enhanced. A cross-contextual attention method is also proposed to recalibrate the extracted feature set (a sketch follows the references below).
[8] Jieneng Chen, Yongyi Lu, Qihang Yu, Xiangde Luo, Ehsan Adeli, Yan Wang, Le Lu, Alan L. Yuille, and Yuyin Zhou, “TransUNet: Transformers make strong encoders for medical image segmentation,” arXiv preprint arXiv:2102.04306, 2021.
[9] Reza Azad, Mohammad T. AL-Antary, Moein Heidari, and Dorit Merhof, “TransNorm: Transformer provides a strong spatial normalization mechanism for a deep segmentation model,” IEEE Access, 2022.
[10] Hu Cao, Yueyue Wang, Joy Chen, Dongsheng Jiang, Xiaopeng Zhang, Qi Tian, and Manning Wang, “Swin-Unet: Unet-like pure Transformer for medical image segmentation,” arXiv preprint arXiv:2105.05537, 2021.
[11] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo, “Swin Transformer: Hierarchical vision Transformer using shifted windows,” in ICCV, 2021, pp. 10012–10022.
[12] Reza Azad, Moein Heidari, Yuli Wu, and Dorit Merhof, “Contextual attention network: Transformer meets U-Net,” in MICCAI International Workshop on Machine Learning in Medical Imaging. Springer, 2022.
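The cross-contextual attention is only described at a high level above, so the sketch below shows one plausible reading of "recalibrating the extracted feature set": channel weights computed from the global context of one branch rescale the features of the other branch (SE-style). This is an assumption for illustration, not the authors' exact cross-contextual module; the name `CrossContextRecalibration` and the reduction ratio are made up.

```python
# Illustrative sketch only: one possible way to "recalibrate" skip features
# using context from another branch (SE-style channel weighting). This is an
# assumption about the idea, not the paper's exact cross-contextual attention.
import torch
import torch.nn as nn

class CrossContextRecalibration(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global context per channel
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # feat, context: (B, C, H, W); channel weights are derived from `context`
        b, c, _, _ = context.shape
        w = self.mlp(self.pool(context).view(b, c)).view(b, c, 1, 1)
        return feat * w  # recalibrated feature set

if __name__ == "__main__":
    m = CrossContextRecalibration(channels=96)
    feat, ctx = torch.randn(2, 96, 28, 28), torch.randn(2, 96, 28, 28)
    print(m(feat, ctx).shape)  # torch.Size([2, 96, 28, 28])
```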
====================================================================================
I'm a complete beginner, so corrections and feedback from experienced readers are very welcome!