[1] An Image Patch is a Wave: Quantum Inspired Vision MLP
paper | code | code
[2] A ConvNet for the 2020s
paper | code
Commentary: the "Renaissance" ConvNet makes a comeback and overtakes the Transformer! FAIR redesigns a new pure-convolution architecture
[1] CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
keywords: Self-Supervised Learning, Contrastive Learning, 3D Point Cloud, Representation Learning, Cross-Modal Learning
paper | code
[2] A Unified Query-based Paradigm for Point Cloud Understanding
paper
[3] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
keywords: Image Captioning and Dense Captioning, Knowledge Distillation, Transformer, 3D Vision
paper
[4] CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
keywords: NeRF, Image Generation and Manipulation, Language-Image Pre-Training (CLIP)
paper | code
[1] MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
keywords: 3D Human Pose Estimation, Transformer
paper
[2] H4D: Human 4D Modeling by Learning Neural Compositional Representation
keywords: 4D Representation, Human Body Estimation, Fine-grained Human Reconstruction
paper
[3] Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation
keywords: Top-Down Pose Estimation, Limb-based Grouping, Direct Regression
paper
[1] Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
keywords: Image Inpainting, Transformer, Image Generation
paper | code
[1] DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
keywords: Detection Transformer
paper | code
[1] HairCLIP: Design Your Hair by Text and Reference Image
keywords: Language-Image Pre-Training (CLIP), Generative Adversarial Networks
paper | project
[2] Vision-Language Pre-Training with Triple Contrastive Learning
keywords: Vision-language representation learning, Contrastive Learning
paper | code
[1] Crafting Better Contrastive Views for Siamese Representation Learning
paper | code
[1] OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
keywords: Monocular Depth Estimation, Transformer
paper
[1] Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
paper | code
[1] Colar: Effective and Efficient Online Action Detection by Consulting Exemplars
keywords: Online Action Detection
paper
[1] Protecting Celebrities with Identity Consistency Transformer
paper
[1] Targeted Supervised Contrastive Learning for Long-Tailed Recognition
keywords: Long-Tailed Recognition, Contrastive Learning
paper