Open-source pre-training framework MMPreTrain official documentation (Model Zoo Summary)

This page lists all the algorithms we support. You can click a link to jump to the corresponding model page.
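Besides browsing this page, the same catalogue can be queried programmatically. The following is a minimal sketch, assuming MMPreTrain is installed (`pip install mmpretrain`); it uses the library's `list_models()` helper, and the `"mobilenet-v3"` substring filter is only an illustrative assumption about the registered model-name pattern.

```python
# Minimal sketch: enumerate the registered models behind this page.
# Assumes MMPreTrain is installed; the "mobilenet-v3" substring is an
# illustrative guess at the naming pattern, not a guaranteed value.
from mmpretrain import list_models

# list_models() returns the names of all registered models as strings.
all_models = list_models()
print(f"{len(all_models)} models registered in total")

# Filter by substring to inspect one algorithm family, e.g. the
# MobileNetV3 checkpoints listed below.
mobilenet_v3 = [name for name in all_models if "mobilenet-v3" in name]
print(mobilenet_v3)
```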

All supported algorithms

  • [Algorithm] MobileNetV2: Inverted Residuals and Linear Bottlenecks (1 ckpt)

  • [Algorithm] Searching for MobileNetV3 (6 ckpts)

  • [Algorithm] Deep Residual Learning for Image Recognition (22 ckpts)

  • [Algorithm] Res2Net: A New Multi-scale Backbone Architecture (3 ckpts)

  • [Algorithm] Aggregated Residual Transformations for Deep Neural Networks (4 ckpts)

  • [Algorithm] Squeeze-and-Excitation Networks (2 ckpts)

  • [Algorithm] ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (1 ckpt)

  • [Algorithm] ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design (1 ckpt)

  • [Algorithm] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (12 ckpts)

  • [Algorithm] Very Deep Convolutional Networks for Large-Scale Image Recognition (8 ckpts)

  • [Algorithm] RepVGG: Making VGG-style ConvNets Great Again (12 ckpts)

  • [Algorithm] Transformer in Transformer (1 ckpt)

  • [Algorithm] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (4 ckpts)

  • [Algorithm] Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet (3 ckpts)

  • [Algorithm] TinyViT: Fast Pretraining Distillation for Small Vision Transformers (8 ckpts)

  • [Algorithm] MLP-Mixer: An all-MLP Architecture for Vision (2 ckpts)

  • [Algorithm] Conformer: Local Features Coupling Global Representations for Visual Recognition (4 ckpts)

  • [Algorithm] Designing Network Design Spaces (8 ckpts)

  • [Algorithm] Training data-efficient image transformers & distillation through attention (9 ckpts)

  • [Algorithm] Twins: Revisiting the Design of Spatial Attention in Vision Transformers (6 ckpts)

  • [Algorithm] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (33 ckpts)

  • [Algorithm] A ConvNet for the 2020s (24 ckpts)

  • [Algorithm] Deep High-Resolution Representation Learning for Visual Recognition (9 ckpts)

  • [Algorithm] RepMLP: Re-parameterizing Convolutions into Fully-connected Layers for Image Recognition (2 ckpts)

  • [Algorithm] Wide Residual Networks (3 ckpts)

  • [Algorithm] Visual Attention Network (4 ckpts)

  • [Algorithm] CSPNet: A New Backbone that can Enhance Learning Capability of CNN (3 ckpts)

  • [Algorithm] Patches Are All You Need? (3 ckpts)

  • [Algorithm] Densely Connected Convolutional Networks (4 ckpts)

  • [Algorithm] MetaFormer is Actually What You Need for Vision (5 ckpts)

  • [Algorithm] Rethinking the Inception Architecture for Computer Vision (1 ckpt)

  • [Algorithm] MViTv2: Improved Multiscale Vision Transformers for Classification and Detection (4 ckpts)

  • [Algorithm] EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications (6 ckpts)

  • [Algorithm] An Improved One millisecond Mobile Backbone (5 ckpts)

  • [Algorithm] EfficientFormer: Vision Transformers at MobileNet Speed (3 ckpts)

  • [Algorithm] Swin Transformer V2: Scaling Up Capacity and Resolution (12 ckpts)

  • [Algorithm] DeiT III: Revenge of the ViT (16 ckpts)

  • [Algorithm] HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions (6 ckpts)

  • [Algorithm] MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer (3 ckpts)

  • [Algorithm] DaViT: Dual Attention Vision Transformers (3 ckpts)

  • [Algorithm] Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs (6 ckpts)

  • [Algorithm] Residual Attention: A Simple but Effective Method for Multi-Label Recognition (1 ckpt)

  • [Algorithm] BEiT: BERT Pre-Training of Image Transformers (3 ckpts)

  • [Algorithm] BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers (3 ckpts)

  • [Algorithm] EVA: Exploring the Limits of Masked Visual Representation Learning at Scale (14 ckpts)

  • [Algorithm] Reversible Vision Transformers (2 ckpts)

  • [Algorithm] Learning Transferable Visual Models From Natural Language Supervision (14 ckpts)

  • [Algorithm] MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning (2 ckpts)

  • [Algorithm] EfficientNetV2: Smaller Models and Faster Training (15 ckpts)

  • [Algorithm] Co-designing and Scaling ConvNets with Masked Autoencoders (26 ckpts)

  • [Algorithm] LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference (5 ckpts)

  • [Algorithm] Vision GNN: An Image is Worth Graph of Nodes (7 ckpts)

  • [Algorithm] ArcFace: Additive Angular Margin Loss for Deep Face Recognition (1 ckpt)

  • [Algorithm] XCiT: Cross-Covariance Image Transformers (42 ckpts)

  • [Algorithm] Bootstrap your own latent: A new approach to self-supervised Learning (2 ckpts)

  • [Algorithm] Dense contrastive learning for self-supervised visual pre-training (2 ckpts)

  • [Algorithm] Improved Baselines with Momentum Contrastive Learning (2 ckpts)

  • [Algorithm] An Empirical Study of Training Self-Supervised Vision Transformers (13 ckpts)

  • [Algorithm] A simple framework for contrastive learning of visual representations (4 ckpts)

  • [Algorithm] Exploring simple siamese representation learning (4 ckpts)

  • [Algorithm] Unsupervised Learning of Visual Features by Contrasting Cluster Assignments (2 ckpts)

  • [Algorithm] Masked Autoencoders Are Scalable Vision Learners (11 ckpts)

  • [Algorithm] SimMIM: A Simple Framework for Masked Image Modeling (6 ckpts)

  • [Algorithm] Barlow Twins: Self-Supervised Learning via Redundancy Reduction (2 ckpts)

  • [Algorithm] Context Autoencoder for Self-Supervised Representation Learning (2 ckpts)

  • [Algorithm] Masked Feature Prediction for Self-Supervised Visual Pre-Training (2 ckpts)

  • [Algorithm] MILAN: Masked Image Pretraining on Language Assisted Representation (3 ckpts)

  • [Algorithm] OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework (4 ckpts)

  • [Algorithm] RIFormer: Keep Your Vision Backbone Effective But Removing Token Mixer (10 ckpts)

  • [Algorithm] Segment Anything (3 ckpts)

  • [Algorithm] Grounded Language-Image Pre-training (2 ckpts)

  • [Algorithm] EVA-02: A Visual Representation for Neon Genesis (11 ckpts)

  • [Algorithm] DINOv2: Learning Robust Visual Features without Supervision (4 ckpts)

  • [Algorithm] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (5 ckpts)

  • [Algorithm] Flamingo: a Visual Language Model for Few-Shot Learning (1 ckpt)

  • [Algorithm] BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models (2 ckpts)

  • [Algorithm] Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese (4 ckpts)

  • [Algorithm] Integrally Pre-Trained Transformer Pyramid Networks (0 ckpts)

  • [Algorithm] HiViT: A Simple and More Efficient Design of Hierarchical Vision Transformer (0 ckpts)

  • [Algorithm] Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling (4 ckpts)

  • [Algorithm] MiniGPT-4: Enhancing Vision-language Understanding with Advanced Large Language Models (1 ckpt)

  • [Algorithm] Visual Instruction Tuning (0 ckpts)

  • [Algorithm] Otter: A Multi-Modal Model with In-Context Instruction Tuning (1 ckpt)
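Any checkpoint in the list above can be loaded and tried out through the same Python API. The sketch below assumes MMPreTrain is installed; the model name `resnet50_8xb32_in1k` and the image path `demo.jpg` are illustrative placeholders, so substitute a name returned by `list_models()` and a local image of your own.

```python
# Minimal sketch: load one of the listed checkpoints and run inference.
# "resnet50_8xb32_in1k" and "demo.jpg" are illustrative placeholders.
from mmpretrain import get_model, inference_model

# get_model() builds the network; pretrained=True downloads its checkpoint.
model = get_model("resnet50_8xb32_in1k", pretrained=True)

# inference_model() runs preprocessing, the forward pass and post-processing;
# for classification models it returns a result dict with the prediction.
result = inference_model(model, "demo.jpg")
print(result["pred_class"], result["pred_score"])
```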
