My self-compiled SCI writing sentence patterns ~~and vocabulary~~

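As the title says. This is just for my own use. If it helps anyone else, great, but if you can't make sense of it, please don't call me out; I only ever meant it for myself~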

  • Its performance surpasses the previous state-of-the-art by a large (significant) margin of +2.7 box AP and +2.6 mask AP on COCO, and +3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based models as vision backbones.

  • gnConv can serve as a plug-and-play module to improve various vision Transformers and convolution-based models.

  • HorNet also shows favorable scalability to more training data and a larger model size.

  • Apart from the effectiveness in visual encoders, we also show gnConv can be applied to task-specific decoders and consistently improve dense prediction performance with less computation. Our results demonstrate that gnConv can be a new basic module for visual modeling that effectively combines the merits of both vision Transformers and CNNs.

  • The emergence of Transformer-based architectures [16, 52, 42] greatly challenges the dominance of CNNs. By combining some successful designs in CNN architectures and the new self-attention mechanism, vision Transformers have shown leading performance on various vision tasks such as …

  • Some efforts have been made to improve the CNN architectures by learning from the new designs in vision Transformers.

  • While previous work has successfully migrated …, a higher-order spatial interaction mechanism has not been studied. We show that …

  • Transformers lack some of the inductive biases inherent to CNNs, such as translation equivariance and locality, and therefore do not generalize well when trained on insufficient amounts of data.

  • However, ViT achieves inferior performance to CNNs when trained from scratch on a midsize dataset like ImageNet. We find it is because: 1) the simple tokenization of input images fails to model the important local structure such as edges and lines among neighboring pixels, leading to low training sample efficiency; 2) the redundant attention backbone design of ViT leads to limited feature richness for fixed computation budgets and limited training samples.