西红柿炒番茄31

[CVPR 2023] The latest papers are fresh out of the oven, come and dig in...

CVPR 2023: latest information and paper downloads

Official website: https://cvpr.thecvf.com/Conferences/2023

1. Detection

2D Object Detection
Video Object Detection
3D Object Detection
Human-Object Interaction (HOI) Detection
Camouflaged Object Detection
Rotated Object Detection
Salient Object Detection
Keypoint Detection
Lane Detection
Edge Detection
Vanishing Point Detection
Anomaly Detection

2. Segmentation

Image Segmentation
Panoptic Segmentation
Semantic Segmentation
Instance Segmentation
Superpixel
Video Object Segmentation
Matting
Dense Prediction

3. Image Processing

Super Resolution
Image Restoration/Image Enhancement/Image Reconstruction
Image Shadow Removal/Image Reflection Removal
Image Denoising/Deblurring/Deraining/Dehazing
Image Editing/Image Inpainting
Image Translation
Image Quality Assessment
Style Transfer
Image Registration

4. Video Processing

Video Editing
Video Generation/Video Synthesis
Video Super-Resolution

5. Estimation

Optical Flow/Motion Estimation
Depth Estimation
Human Parsing/Human Pose Estimation
Gesture Estimation

6. Image & Video Retrieval/Video Understanding

Action/Activity Recognition/Detection/Segmentation
Person Re-Identification/Detection
Image/Video Captioning

7. Face

Face Recognition/Detection
Face Generation/Face Synthesis/Face Reconstruction/Face Editing
Face Forgery/Face Anti-Spoofing

8. 3D Vision

Point Cloud
3D Reconstruction
Scene Reconstruction/View Synthesis/Novel View Synthesis

9. Object Tracking

10. Medical Imaging

11. Text Detection/Recognition/Understanding

12. Remote Sensing Image

13. GAN/Generative/Adversarial

14. Image Generation/Image Synthesis

15. Scene Graph

Scene Graph Generation
Scene Graph Prediction
Scene Graph Understanding

16. Visual Localization/Pose Estimation

17. Visual Reasoning/VQA

18. Vision-based Prediction

19. Neural Network Structure Design

CNN
Transformer
Graph Neural Network (GNN)
Neural Architecture Search (NAS)
MLP

20. Neural Network Interpretability

21. Dataset

22. Data Processing

Data Augmentation
Normalization/Regularization (Batch Normalization)
Image Clustering
Image Compression

23. Image Feature Extraction and Matching

24. Visual Representation Learning

25. Model Training/Generalization

Noisy Label
Long-Tailed Distribution

26. Model Compression

Knowledge Distillation
Pruning
Quantization

27. Model Evaluation

28. Image Classification

29. Image Counting

30. Robotics

31. Semi-supervised/Weakly-supervised/Unsupervised/Self-supervised Learning

32. Multi-Modal Learning

Audio-visual Learning
Vision-language

33. Active Learning

34. Few-shot/Zero-shot Learning

35. Continual Learning/Life-long Learning

36. Transfer Learning/Domain Adaptation

37. Metric Learning

38. Contrastive Learning

39. Incremental Learning

40. Reinforcement Learning

41. Meta Learning

42. Federated Learning

43. Autonomous Driving

Other

Detection

2D Object Detection

[7]NeRF-RPN: A general framework for object detection in NeRFs

paper

[6]Detecting Everything in the Open World: Towards Universal Object Detection

paper

[5]Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

paper

[4]CapDet: Unifying Dense Captioning and Open-World Detection Pretraining

paper

[3]Enhanced Training of Query-Based Object Detection via Selective Query Recollection

paper | code

[2]DETRs with Hybrid Matching

paper | code

[1]YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors

paper | code

Video Object Detection

[1]SCOTCH and SODA: A Transformer Video Shadow Detection Framework

paper

3D Object Detection

[22]Neural Part Priors: Learning to Optimize Part-Based Object Completion in RGB-D Scans

paper

[21]itKD: Interchange Transfer-based Knowledge Distillation for 3D Object Detection

paper

[20]Omni3D: A Large Benchmark and Model for 3D Object Detection in the Wild

paper | code

[19]FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection

paper | code

[18]NS3D: Neuro-Symbolic Grounding of 3D Objects and Relations

paper

[17]Benchmarking Robustness of 3D Object Detection to Common Corruptions in Autonomous Driving

paper

[16]VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

paper | code

[15]OcTr: Octree-based Transformer for 3D Object Detection

paper

[14]MonoATT: Online Monocular 3D Object Detection with Adaptive Token Transformer

paper

[13]CAPE: Camera View Position Embedding for Multi-View 3D Object Detection

paper | code

[12]Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency

paper

[11]AeDet: Azimuth-invariant Multi-view 3D Object Detection

paper | code

[10]Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection

paper

[9]PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

paper | code

[8]MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences

paper

[7]Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View

paper

[6]X3KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection

paper

[5]Virtual Sparse Convolution for Multimodal 3D Object Detection

paper | code

[4]MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

paper | code

[3]Uni3D: A Unified Baseline for Multi-dataset 3D Object Detection

paper | code

[2]LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion

paper | code

[1]ConQueR: Query Contrast Voxel-DETR for 3D Object Detection

paper | code

Human-Object Interaction (HOI) Detection

[2]Category Query Learning for Human-Object Interaction Classification

paper

[1]Detecting Human-Object Contact in Images

paper

Camouflaged Object Detection

Rotated Object Detection

Salient Object Detection

[2]Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings

paper

[1]Texture-guided Saliency Distilling for Unsupervised Salient Object Detection

paper | code

Keypoint Detection

Lane Detection

[1]BEV-LaneDet: a Simple and Effective 3D Lane Detection Baseline

paper

Edge Detection

[2]The Treasure Beneath Multiple Annotations: An Uncertainty-aware Edge Detector

paper | code

[1]Iterative Next Boundary Detection for Instance Segmentation of Tree Rings in Microscopy Images of Shrub Cross Sections

paper | code

Vanishing Point Detection

Anomaly Detection

[8]SQUID: Deep Feature In-Painting for Unsupervised Anomaly Detection

paper

[7]Normalizing Flow based Feature Synthesis for Outlier-Aware Object Detection

paper

[6]Hierarchical Semantic Contrast for Scene-aware Video Anomaly Detection

paper

[5]DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection

paper

[4]Diversity-Measurable Anomaly Detection

paper

[3]Block Selection Method for Using Feature Norm in Out-of-distribution Detection

paper

[2]Lossy Compression for Robust Unsupervised Time-Series Anomaly Detection

paper

[1]Multimodal Industrial Anomaly Detection via Hybrid Fusion

paper | code

Segmentation

Image Segmentation

[3]Focused and Collaborative Feedback Integration for Interactive Image Segmentation

paper | code

[2]MP-Former: Mask-Piloted Transformer for Image Segmentation

paper | code

[1]Interactive Segmentation as Gaussian Process Classification

paper

Panoptic Segmentation

[2]UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration

paper

[1]Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models

paper

Semantic Segmentation

[20]LaserMix for Semi-Supervised LiDAR Semantic Segmentation

paper | code

[19]Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation

paper | code

[18]Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs

paper | code

[17]Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation

paper | code

[16]Reliability in Semantic Segmentation: Are We on the Right Track?

paper | code

[15]Generative Semantic Segmentation

paper | code

[14]Novel Class Discovery for 3D Point Cloud Semantic Segmentation

paper | code

[13]MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving

paper | code

[12]Side Adapter Network for Open-Vocabulary Semantic Segmentation

paper | code

[11]Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes

paper

[10]Token Contrast for Weakly-Supervised Semantic Segmentation

paper | code

[9]Delivering Arbitrary-Modal Semantic Segmentation

paper | code

[8]Out-of-Candidate Rectification for Weakly Supervised Semantic Segmentation

paper

[7]Open-Vocabulary Semantic Segmentation with Mask-adapted CLIP

paper | code

[6]Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos

paper | code

[5]SCPNet: Semantic Scene Completion on Point Cloud

paper

[4]On Calibrating Semantic Segmentation Models: Analyses and An Algorithm

paper

[3]Learning Open-vocabulary Semantic Segmentation Models From Natural Language Supervision

paper

[2]Revisiting Weak-to-Strong Consistency in Semi-Supervised Semantic Segmentation

paper | code

[1]Foundation Model Drives Weakly Incremental Learning for Semantic Segmentation

paper

Instance Segmentation

[7]A Generalized Framework for Video Instance Segmentation

paper | code

[6]FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation

paper

[5]SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation

paper | code

[4]DynaMask: Dynamic Mask Selection for Instance Segmentation

paper | code

[3]Beyond mAP: Towards better evaluation of instance segmentation

paper

[2]ISBNet: a 3D Point Cloud Instance Segmentation Network with Instance-aware Sampling and Box-aware Dynamic Convolution

paper

[1]PolyFormer: Referring Image Segmentation as Sequential Polygon Generation

paper

Superpixel

Video Object Segmentation

[4]Two-shot Video Object Segmentation

paper

[3]Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation

paper

[2]MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation

paper

[1]InstMove: Instance Motion for Object-centric Video Segmentation

paper | code

Matting

Dense Prediction

[2]One-to-Few Label Assignment for End-to-End Dense Detection

paper | code

[1]DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction

paper

Video Processing

[6]A Unified Pyramid Recurrent Network for Video Frame Interpolation

paper

[5]Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior

paper | code

[4]Blind Video Deflickering by Neural Filtering with a Flawed Atlas

paper | code

[3]Learning Transferable Spatiotemporal Representations from Natural Script Knowledge

paper | code

[2]UV Volumes for Real-time Rendering of Editable Free-view Human Performance

paper | code

[1]Exploring Discontinuity for Video Frame Interpolation

paper (arXiv:2202.07291)

Video Editing

[3]Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding

paper

[2]Text-Visual Prompting for Efficient 2D Temporal Video Grounding

paper

[1]Extracting Motion and Appearance via Inter-Frame Attention for Efficient Video Frame Interpolation

paper | code

Video Generation/Video Synthesis

[7]Towards End-to-End Generative Modeling of Long Videos with Memory-Efficient Bidirectional Transformers

paper | code

[6]Conditional Image-to-Video Generation with Latent Flow Diffusion Models

paper

[5]3D Cinemagraphy from a Single Image

paper

[4]VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation

paper | code

[3]MOSO: Decomposing MOtion, Scene and Object for Video Prediction

paper | code

[2]SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation

paper | code

[1]Video Probabilistic Diffusion Models in Projected Latent Space

paper | project

Video Super-Resolution

[2]Structured Sparsity Learning for Efficient Video Super-Resolution

paper

[1]Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting

paper

Estimation

Optical Flow/Motion Estimation

[2]DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling

paper

[1]Rethinking Optical Flow from Geometric Matching Consistent Perspective

paper | code

Depth Estimation

[5]SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates

paper

[4]PlaneDepth: Self-supervised Depth Estimation via Orthogonal Planes

paper | code

[3]HRDFuse: Monocular 360° Depth Estimation by Collaboratively Learning Holistic-with-Regional Depth Distributions

paper

[2]Fully Self-Supervised Depth Estimation from Defocus Clue

paper | code

[1]Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation

paper | code

Human Parsing/Human Pose Estimation

[11]Self-Correctable and Adaptable Inference for Generalizable Human Pose Estimation

paper

[10]3D Human Mesh Estimation from Virtual Markers

paper

[9]Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation

paper

[8]Rigidity-Aware Detection for 6D Object Pose Estimation

paper

[7]Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video

paper

[6]Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer

paper

[5]TexPose: Neural Texture Learning for Self-Supervised 6D Object Pose Estimation

paper

[4]Trajectory-Aware Body Interaction Transformer for Multi-Person Pose Forecasting

paper

[3]PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation

paper

[2]DistilPose: Tokenized Pose Regression with Heatmap Distillation

paper

[1]Relightable Neural Human Assets from Multi-view Gradient Illuminations

paper

Gesture Estimation

[5]Bringing Inputs to Shared Domains for 3D Interacting Hands Recovery in the Wild

paper

[4]Natural Language-Assisted Sign Language Recognition

paper | code

[3]CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment

paper | code

[2]Diverse 3D Hand Gesture Prediction from Body Dynamics by Bilateral Hand Disentanglement

paper

[1]Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos

paper | code

Image Processing

[3]Exploiting Unlabelled Photos for Stronger Fine-Grained SBIR

paper

[2]PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment

paper

[1]DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation

paper | code

Super Resolution

[6]Activating More Pixels in Image Super-Resolution Transformer

paper | code

[5]Super-Resolution Neural Operator

paper | code

[4]Local Implicit Normalizing Flow for Arbitrary-Scale Image Super-Resolution

paper

[3]Perception-Oriented Single Image Super-Resolution using Optimal Objective Estimation

paper | code

[2]N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution

paper | code

[1]Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild

paper | project

Image Restoration/Image Enhancement/Image Reconstruction

[13]CLIP for All Things Zero-Shot Sketch-Based Image Retrieval, Fine-Grained or Not

paper

[12]Instant Volumetric Head Avatars

paper

[11]Contrastive Semi-supervised Learning for Underwater Image Restoration via Reliable Bank

paper | code

[10]ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction

paper | code

[9]Masked Image Modeling with Local Multi-Scale Reconstruction

paper | code

[8]Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective

paper | code

[7]DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration

paper

[6]Robust Unsupervised StyleGAN Image Restoration

paper

[5]Raw Image Reconstruction with Learned Compact Metadata

paper

[4]Efficient and Explicit Modelling of Image Hierarchies for Image Restoration

paper | code

[3]Imagic: Text-Based Real Image Editing with Diffusion Models

paper | project

[2]High-resolution image reconstruction with latent diffusion models from human brain activity

paper | project

[1]Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models

paper

Image Shadow Removal/Image Reflection Removal

[1]LightPainter: Interactive Portrait Relighting with Freehand Scribble

paper

Image Denoising/Deblurring/Deraining/Dehazing

[6]Masked Image Training for Generalizable Deep Image Denoising

paper | code

[5]Learning A Sparse Transformer Network for Effective Image Deraining

paper | code

[4]Uncertainty-Aware Unsupervised Image Deblurring with Deep Residual Prior

paper

[3]Polarized Color Image Denoising using Pocoformer

paper

[2]Blur Interpolation Transformer for Real-World Motion from Blur

paper | code

[1]Structured Kernel Estimation for Photon-Limited Deconvolution

paper | code

Image Editing/Image Inpainting

[6]SIEDOB: Semantic Image Editing by Disentangling Object and Background

paper | code

[5]CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing

paper

[4]SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model

paper

[3]Interactive Cartoonization with Controllable Perceptual Factors

paper

[2]Delving StyleGAN Inversion for Image Editing: A Foundation Latent Space Viewpoint

paper | code

[1]LANIT: Language-Driven Image-to-Image Translation for Unlabeled Data

paper | code

Image Translation

Image Quality Assessment

[2]CR-FIQA: Face Image Quality Assessment by Learning Sample Relative Classifiability

paper

[1]Quality-aware Pre-trained Models for Blind Image Quality Assessment

paper

Style Transfer

[3]Neural Preset for Color Style Transfer

paper | code

[2]StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields

paper

[1]Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN

paper | code

Image Registration

[1]Indescribable Multi-modal Spatial Evaluator

paper | code

Face

Face Recognition/Detection

[3]Sibling-Attack: Rethinking Transferable Adversarial Attacks against Face Recognition

paper

[2]Local Region Perception and Relationship Learning Combined with Feature Fusion for Facial Action Unit Detection

paper

[1]Multi Modal Facial Expression Recognition with Transformer-Based Fusion Networks and Dynamic Sampling

paper

Face Generation/Face Synthesis/Face Reconstruction/Face Editing

[7]SunStage: Portrait Reconstruction and Relighting using the Sun as a Light Stage

paper

[6]MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation

paper | code

[5]NeuFace: Realistic 3D Neural Face Rendering from Multi-view Images

paper

[4]Graphics Capsule: Learning Hierarchical 3D Face Representations from 2D Images

paper

[3]Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation

paper | code

[2]A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images

paper

[1]MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation

paper | code

Face Forgery/Face Anti-Spoofing

[3]Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment

paper

[2]Implicit Identity Leakage: The Stumbling Block to Improving Deepfake Detection Generalization

paper | code

[1]Physical-World Optical Adversarial Attacks on 3D Face Recognition

paper

Object Tracking

[6]MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking

paper

[5]Visual Prompt Multi-Modal Tracking

paper | code

[4]Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking

paper | code

[3]Focus On Details: Online Multi-object Tracking with Diverse Fine-grained Representation

paper

[2]Referring Multi-Object Tracking

paper

[1]Simple Cues Lead to a Strong Multi-Object Tracker

paper

Image & Video Retrieval/Video Understanding

[10]Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?

paper | code

[9]NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory

paper

[8]Aligning Step-by-Step Instructional Diagrams to Video Demonstrations

paper

[7]Query-Dependent Video Representation for Moment Retrieval and Highlight Detection

paper

[6]Cross-Modal Implicit Relation Reasoning and Aligning for Text-to-Image Person Retrieval

paper | code

[5]Dual-path Adaptation from Image to Video Transformers

paper | code

[4]Data-Free Sketch-Based Image Retrieval

paper

[3]DAA: A Delta Age AdaIN operation for age estimation via binary code transformer

paper

[2]VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval

paper | code

[1]Towards Fast Adaptation of Pretrained Contrastive Models for Multi-channel Video-Language Retrieval

paper

Action/Activity Recognition/Detection/Segmentation/Localization

[8]Box-Level Active Detection

paper

[7]Actionlet-Dependent Contrastive Learning for Unsupervised Skeleton-Based Action Recognition

paper

[6]Open Set Action Recognition via Multi-Label Evidential Learning

paper

[5]Video Test-Time Adaptation for Action Recognition

paper

[4]Post-Processing Temporal Action Detection

paper

[3]TriDet: Temporal Action Detection with Relative Boundary Modeling

paper | code

[2]Learning Discriminative Representations for Skeleton Based Action Recognition

paper

[1]Continuous Sign Language Recognition with Correlation Network

paper | code

Person Re-Identification/Detection

[2]TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification

paper | code

[1]MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID

paper | code

Image/Video Captioning

[5]Text with Knowledge Graph Augmented Transformer for Video Captioning

paper

[4]Dual-Stream Transformer for Generic Event Boundary Captioning

paper | code

[3]ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing

paper | code

[2]Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training

paper

[1]Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning

paper | code

Medical Imaging

[7]RepMode: Learning to Re-parameterize Diverse Experts for Subcellular Structure Prediction

paper | code

[6]Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation

paper | code

[5]Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification

paper

[4]Neuron Structure Modeling for Generalizable Remote Physiological Measurement

paper | code

[3]Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses

paper | code

[2]Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images

paper | code

[1]Label-Free Liver Tumor Segmentation

paper | code

Text Detection/Recognition/Understanding

[6]Images Speak in Images: A Generalist Painter for In-Context Visual Learning

paper | code

[5]Context De-confounded Emotion Recognition

paper

[4]Joint Visual Grounding and Tracking with Natural Language Specification

paper

[3]Unifying Vision, Text, and Layout for Universal Document Processing

paper

[2]Improving Table Structure Recognition with Visual-Alignment Sequential Coordinate Modeling

paper

[1]DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting

paper | code

Remote Sensing Image

GAN/Generative/Adversarial

[7]Fine-Grained Face Swapping via Regional GAN Inversion

paper

[6]Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models

paper

[5]Graph Transformer GANs for Graph-Constrained House Generation

paper

[4]Improving GAN Training via Feature Space Shrinkage

paper | code

[3]Adversarial Attack with Raindrops

paper

[2]T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations

paper | project

[1]Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars

paper | project

Image Generation/Image Synthesis

[22]All are Worth Words: A ViT Backbone for Diffusion Models

paper | code

[21]Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars

paper | code

[20]Shifted Diffusion for Text-to-image Generation

paper | code

[19]Towards Practical Plug-and-Play Diffusion Models

paper

[18]Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis

paper

[17]Wavelet Diffusion Models are fast and scalable Image Generators

paper | code

[16]Learning 3D-aware Image Synthesis with Unknown Pose Distribution

paper

[15]Picture that Sketch: Photorealistic Image Generation from Abstract Sketches

paper

[14]3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process

paper | code

[13]A Dynamic Multi-Scale Voxel Flow Network for Video Prediction

paper | code

[12]Regularized Vector Quantization for Tokenized Image Synthesis

paper

[11]SpaText: Spatio-Textual Representation for Controllable Image Generation

paper

[10]Unifying Layout Generation with a Decoupled Diffusion Model

paper

[9]Scaling up GANs for Text-to-Image Synthesis

paper

[8]Inversion-Based Style Transfer with Diffusion Models

paper | code

[7]Perspective Fields for Single Image Camera Calibration

paper

[6]VGFlow: Visibility guided Flow Network for Human Reposing

paper

[5]DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

paper | code

[4]Progressive Open Space Expansion for Open-Set Model Attribution

paper | code

[3]Person Image Synthesis via Denoising Diffusion Model

paper

[2]Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models

paper

[1]Parallel Diffusion Models of Operator and Image for Blind Inverse Problems

paper

3D Vision

[2]Learning a 3D Morphable Face Reflectance Model from Low-cost Data

paper | code

[1]Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction

paper | code

Point Cloud

[15]CLIP2: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data

paper

[14]Unsupervised Deep Probabilistic Approach for Partial Point Cloud Registration

paper | code

[13]Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration

paper | code

[12]Controllable Mesh Generation Through Sparse Latent Point Diffusion Models

paper

[11]Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis

paper | code

[10]Rotation-Invariant Transformer for Point Cloud Matching

paper

[9]GraVoS: Voxel Selection for 3D Point-Cloud Detection

paper

[8]DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets

paper | code

[7]PointCert: Point Cloud Classification with Deterministic Certified Robustness Guarantees

paper

[6]ACL-SPC: Adaptive Closed-Loop system for Self-Supervised Point Cloud Completion

paper | code

[5]DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization

paper

[4]Frequency-Modulated Point Cloud Rendering with Easy Editing

paper

[3]Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss

paper

[2]ProxyFormer: Proxy Alignment Assisted Point Cloud Completion with Missing Part Sensitive Transformer

paper | code

[1]Point Cloud Forecasting as a Proxy for 4D Occupancy Forecasting

paper | code

3D Reconstruction

[25]HexPlane: A Fast Representation for Dynamic Scenes

paper

[24]Seeing Through the Glass: Neural 3D Reconstruction of Object Inside a Transparent Container

paper

[23]BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects

paper

[22]Structured 3D Features for Reconstructing Controllable Avatars

paper

[21]PanoHead: Geometry-Aware 3D Full-Head Synthesis in 360°

paper

[20]Transforming Radiance Field with Lipschitz Network for Photorealistic 3D Scene Stylization

paper

[19]TAPS3D: Text-Guided 3D Textured Shape Generation from Pseudo Supervision

paper | code

[18]MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

paper | code

[17]PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision

paper

[16]SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation

paper | code

[15]Masked Wavelet Representation for Compact Neural Radiance Fields

paper

[14]Decoupling Human and Camera Motion from Videos in the Wild

paper

[13]Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

paper

[12]NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images

paper

[11]Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion

paper | code

[10]MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices

paper | code

[9]Unsupervised 3D Shape Reconstruction by Part Retrieval and Assembly

paper

[8]NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction

paper

[7]HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling

paper

[6]MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision

paper

[4]Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness

paper | code

[3]Im2Hands: Learning Attentive Implicit Representation of Interacting Two-Hand Shapes

paper | code

[2]ECON: Explicit Clothed humans Obtained from Normals

paper | code

[1]Structured 3D Features for Reconstructing Relightable and Animatable Avatars

paper | project

Scene Reconstruction/View Synthesis/Novel View Synthesis

[32]Magic3D: High-Resolution Text-to-3D Content Creation

paper

[31]DiffRF: Rendering-Guided 3D Radiance Field Diffusion

paper

[30]Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization

paper | code

[29]Interactive Segmentation of Radiance Fields

paper

[28]MAIR: Multi-view Attention Inverse Rendering with 3D Spatially-Varying Lighting Estimation

paper

[27]GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images

paper

[26]Progressively Optimized Local Radiance Fields for Robust View Synthesis

paper

[25]ABLE-NeRF: Attention-Based Rendering with Learnable Embeddings for Neural Radiance Field

paper

[24]HandNeRF: Neural Radiance Fields for Animatable Interacting Hands

paper

[23]Grid-guided Neural Radiance Fields for Large Urban Scenes

paper

[22]EventNeRF: Neural Radiance Fields from a Single Colour Event Camera

paper

[21]SPARF: Neural Radiance Fields from Sparse and Noisy Poses

paper

[20]RUST: Latent Neural Scene Representations from Unposed Imagery

paper

[19]SINE: Semantic-driven Image-based NeRF Editing with Prior-guided Editing Field

paper

[18]ShadowNeuS: Neural SDF Reconstruction by Shadow Ray Supervision

paper | code

[17]Balanced Spherical Grid for Egocentric View Synthesis

paper | code

[16]Semantic Ray: Learning a Generalizable Semantic Field with Cross-Reprojection Attention

paper

[15]MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures

paper | code

[14]Robust Dynamic Radiance Fields

paper

[13]I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs

paper

[12]Learning Detailed Radiance Manifolds for High-Fidelity and 3D-Consistent Portrait Synthesis from Monocular Image

paper

[11]Nerflets: Local Radiance Fields for Efficient Structure-Aware 3D Scene Representation from 2D Supervision

paper

[10]Local-to-Global Registration for Bundle-Adjusting Neural Radiance Fields

paper

[9]DP-NeRF: Deblurred Neural Radiance Field with Physical Scene Priors

paper | code

[8]SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields

paper

[7]3D Video Loops from Asynchronous Input

paper | code

[6]NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer

paper | code

[5]NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation

paper

[4]Renderable Neural Radiance Map for Visual Navigation

paper

[3]Real-Time Neural Light Field on Mobile Devices

paper | project

[2]Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures

paper | code

[1]NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior

paper | project

Model Compression

[1]Neural Video Compression with Diverse Contexts

paper | code

Knowledge Distillation

[3]Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation

paper

[2]Generic-to-Specific Distillation of Masked Autoencoders

paper | code

[1]CLIPPING: Distilling CLIP-based Models for Video-Language Understanding

paper

Pruning

[2]CP3: Channel Pruning Plug-in for Point-based Networks

paper

[1]DepGraph: Towards Any Structural Pruning

paper | code

Quantization

[4]Hard Sample Matters a Lot in Zero-Shot Quantization

paper

[3]Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective

paper

[2]Post-training Quantization on Diffusion Models

paper | code

[1]Adaptive Data-Free Quantization

paper | code

Neural Network Structure Design

[6]LINe: Out-of-Distribution Detection by Leveraging Important Neurons

paper

[5]Towards Scalable Neural Representation for Diverse Videos

paper

[4]Boundary Unlearning

paper

[3]Equiangular Basis Vectors

paper | code

[2]LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs

paper | code

[1]Run, Don’t Walk: Chasing Higher FLOPS for Faster Neural Networks

paper | code

CNN

[5]Randomized Adversarial Training via Taylor Expansion

paper | code

[4]Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations

paper | code

[3]DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network

paper | code

[2]Demystify Transformers & Convolutions in Modern Image Deep Networks

paper | code

[1]InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

paper | code

Transformer

[15]CAT: LoCalization and IdentificAtion Cascade Detection Transformer for Open-World Object Detection

paper | code

[14]Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers

paper

[13]POTTER: Pooling Attention Transformer for Efficient Human Mesh Recovery

paper

[12]FeatER: An Efficient Network for Human Reconstruction via Feature Map-Based TransformER

paper

[11]Spherical Transformer for LiDAR-based 3D Recognition

paper | code

[10]MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models

paper | code

[9]Top-Down Visual Attention from Analysis by Synthesis

paper

[8]BiFormer: Vision Transformer with Bi-Level Routing Attention

paper | code

[7]Making Vision Transformers Efficient from A Token Sparsification View

paper

[6]Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves

paper

[5]Learning Imbalanced Data with Vision Transformers

paper | code

[4]SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency

paper

[3]Masked Jigsaw Puzzle: A Versatile Position Embedding for Vision Transformers

paper | code

[2]Lite DETR: An Interleaved Multi-Scale Encoder for Efficient DETR

paper | code

[1]Integrally Pre-Trained Transformer Pyramid Networks

paper | code

Graph Neural Networks (GNN)

[2]Turning Strengths into Weaknesses: A Certified Robustness Inspired Attack Framework against Graph Neural Networks

paper

[1]From Node Interaction to Hop Interaction: New Effective and Scalable Graph Learning Paradigm

paper

Neural Architecture Search (NAS)

[3]Polynomial Implicit Neural Representations For Large Diverse Datasets

paper | code

[2]PA&DA: Jointly Sampling PAth and DAta for Consistent NAS

paper | code

[1]Stitchable Neural Networks

paper | code

MLP

[1]ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization

paper | code

MAE

[1]Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

paper | code

Data Processing

[1]TINC: Tree-structured Implicit Neural Compression

paper | code

Data Augmentation

Normalization/Regularization (Batch Normalization)

[1]Masked Images Are Counterfactual Samples for Robust Fine-tuning

paper

Image Clustering

[1]On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering

paper | code

Image Compression

[1]Context-Based Trit-Plane Coding for Progressive Image Compression

paper | code

Model Training/Generalization

[16]Generalist: Decoupling Natural and Robust Generalization

paper

[15]Feature Separation and Recalibration for Adversarial Robustness

paper

[14]Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck

paper

[13]FlexiViT: One Model for All Patch Sizes

paper | code

[12]Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization

paper | code

[11]Improving Generalization with Domain Convex Game

paper

[10]TWINS: A Fine-Tuning Framework for Improved Transferability of Adversarial Robustness and Generalization

paper | code

[9]An Extended Study of Human-like Behavior under Adversarial Training

paper

[8]Sharpness-Aware Gradient Matching for Domain Generalization

paper | code

[7]HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining

paper

[6]Universal Instance Perception as Object Discovery and Retrieval

paper | code

[5]Practical Network Acceleration with Tiny Sets

paper | code

[4]Towards Bridging the Performance Gaps of Joint Energy-based Models

paper | code

[3]DropKey

paper

[2]Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization

paper

[1]DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks

paper

Noisy Label

[2]Fine-Grained Classification with Noisy Labels

paper

[1]Combating noisy labels in object detection datasets

paper

Long-Tailed Distribution

[1]Curvature-Balanced Feature Manifold Learning for Long-Tailed Classification

paper

Image Feature Extraction and Matching

[3]Referring Image Matting

paper | code

[2]Iterative Geometry Encoding Volume for Stereo Matching

paper | code

[1]Modality-Agnostic Debiasing for Single Domain Generalization

paper

Visual Representation Learning

[12]Masked Scene Contrast: A Scalable Framework for Unsupervised 3D Representation Learning

paper

[11]CrOC: Cross-View Online Clustering for Dense Visual Representation Learning

paper | code

[10]Masked Motion Encoding for Self-Supervised Video Representation Learning

paper | code

[9]Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos

paper | code

[8]MARLIN: Masked Autoencoder for facial video Representation LearnINg

paper | code

[7]Hierarchical discriminative learning improves visual representations of biomedical microscopy

paper

[6]Fine-tuned CLIP Models are Efficient Video Learners

paper | code

[5]Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

paper | code

[4]Open-Set Representation Learning through Combinatorial Embedding

paper

[3]NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction

paper

[2]Stare at What You See: Masked Image Modeling without Reconstruction

paper | code

[1]Switchable Representation Learning Framework with Self-compatibility

paper

Model Evaluation

[2]Physically Adversarial Infrared Patches with Learnable Shapes and Locations

paper

[1]TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

paper | code

Multi-Modal Learning

[13]CLIP2Scene: Towards Label-Efficient 3D Scene Understanding by CLIP

paper | code

[12]MaPLe: Multi-modal Prompt Learning

paper | code

[11]Decoupled Multimodal Distilling for Emotion Recognition

paper

[10]MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

paper | code

[9]BiCro: Noisy Correspondence Rectification for Multi-modality Data via Bi-directional Cross-modal Similarity Consistency

paper | code

[8]Mutilmodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos

paper | code

[7]Emotional Reaction Intensity Estimation Based on Multimodal Data

paper

[6]Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers

paper

[5]Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning

paper

[4]Multimodal Prompting with Missing Modalities for Visual Recognition

paper | code

[3]Align and Attend: Multimodal Summarization with Dual Contrastive Losses

paper | code

[2]Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information

paper | code

[1]Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks

paper | code

Audio-Visual Learning

[6]Egocentric Audio-Visual Object Localization

paper | code

[5]Learning Audio-Visual Source Localization via False Negative Aware Contrastive Learning

paper

[4]Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

paper

[3]Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring

paper | code

[2]CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective

paper | code

[1]A Light Weight Model for Active Speaker Detection

paper | code

Vision-Language

[18]MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model

paper | code

[17]Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning

paper

[16]Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

paper | code

[15]Test of Time: Instilling Video-Language Models with a Sense of Time

paper | code

[14]Accelerating Vision-Language Pretraining with Free Language Modeling

paper

[13]Task Residual for Tuning Vision-Language Models

paper | code

[12]MAGVLT: Masked Generative Vision-and-Language Transformer

paper

[11]Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding

paper | code

[10]Lana: A Language-Capable Navigator for Instruction Following and Generation

paper | code

[9]FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks

paper | code

[8]Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding

paper

[7]Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing

paper

[6]Connecting Vision and Language with Video Localized Narratives

paper | code

[5]Policy Adaptation from Foundation Model Feedback

paper

[4]Open-vocabulary Attribute Detection

paper

[3]Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training

paper

[2]Turning a CLIP Model into a Scene Text Detector

paper | code

[1]GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods

paper

Vision-based Prediction

[4]TBP-Former: Learning Temporal Bird’s-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving

paper

[3]Intention-Conditioned Long-Term Human Egocentric Action Forecasting

paper | code

[2]Computational Choreography using Human Motion Synthesis

paper

[1]IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction

paper

Dataset

[15]GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts

paper

[14]ARKitTrack: A New Diverse Dataset for Tracking Using Mobile RGB-D Data

paper

[13]Fantastic Breaks: A Dataset of Paired 3D Scans of Real-World Broken Objects and Their Complete Counterparts

paper

[12]A Bag-of-Prototypes Representation for Dataset-Level Applications

paper

[11]Music-Driven Group Choreography

paper

[10]RaBit: Parametric Modeling of 3D Biped Cartoon Characters with a Topological-consistent Dataset

paper

[9]Backdoor Defense via Adaptively Splitting Poisoned Dataset

paper | code

[8]Learning a Practical SDR-to-HDRTV Up-conversion using New Dataset and Degradation Models

paper | code

[7]SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments

paper | code

[6]A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others

paper | code

[5]MVImgNet: A Large-scale Dataset of Multi-view Images

paper

[4]Spring: A High-Resolution High-Detail Dataset and Benchmark for Scene Flow, Optical Flow and Stereo

paper

[3]CUDA: Convolution-based Unlearnable Datasets

paper

[2]V2V4Real: A Real-world Large-scale Dataset for Vehicle-to-Vehicle Cooperative Perception

paper

[1]Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes

paper

Active Learning

Few-shot Learning/Zero-shot Learning

[8]CF-Font: Content Fusion for Few-shot Font Generation

paper

[7]DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection

paper | code

[6]Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings

paper | code

[5]Bi-directional Distribution Alignment for Transductive Zero-Shot Learning

paper | code

[4]Zero-Shot Text-to-Parameter Translation for Game Character Auto-Creation

paper

[3]Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners

paper | code

[2]NIFF: Alleviating Forgetting in Generalized Few-Shot Object Detection via Neural Instance Feature Forging

paper

[1]FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization

paper | code

Continual Learning/Life-long Learning

[2]Computationally Budgeted Continual Learning: What Does Matter?

paper | code

[1]Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning

paper | code

Scene Graph

[1]Probabilistic Debiasing of Scene Graphs

paper | code

Scene Graph Generation

[1]Prototype-based Embedding Network for Scene Graph Generation

paper

Scene Graph Prediction

Scene Graph Understanding

[2]SceneTrilogy: On Human Scene-Sketch and its Complementarity with Photo and Text

paper

[1]PLA: Language-Driven Open-Vocabulary 3D Scene Understanding

paper | code

Visual Localization/Pose Estimation

[5]Human Pose as Compositional Tokens

paper

[4]Data-efficient Large Scale Place Recognition with Graded Similarity Supervision

paper | code

[3]PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers

paper

[2]StructVPR: Distill Structural Knowledge with Weighting Samples for Visual Place Recognition

paper

[1]PyramidFlow: High-Resolution Defect Contrastive Localization using Pyramid Normalizing Flow

paper

Visual Reasoning/VQA

[7]3D Concept Learning and Reasoning from Multi-View Images

paper

[6]Abstract Visual Reasoning: An Algebraic Approach for Solving Raven’s Progressive Matrices

paper | code

[5]Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning

paper | code

[4]Generative Bias for Robust Visual Question Answering

paper

[3]MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering

paper | code

[2]Prompting Large Language Models with Answer Heuristics for Knowledge-based Visual Question Answering

paper | code

[1]From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models

paper | code

Image Classification

[4]Rawgment: Noise-Accounted RAW Augmentation Enables Recognition in a Wide Variety of Environments

paper

[3]Semantic Prompt for Few-Shot Image Recognition

paper

[2]Boosting Verified Training for Robust Image Classifications via Abstraction

paper | code

[1]I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification

paper

Transfer Learning/Domain Adaptation

[11]Deep Frequency Filtering for Domain Generalization

paper

[10]Semi-Supervised Domain Adaptation with Source Label Adaptation

paper | code

[9]Unsupervised Continual Semantic Adaptation through Neural Rendering

paper

[8]MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation

paper | code

[7]Patch-Mix Transformer for Unsupervised Domain Adaptation: A Game Perspective

paper

[6]Manipulating Transfer Learning for Property Inference

paper | code

[5]Trainable Projected Gradient Method for Robust Fine-tuning

paper

[4]DA-DETR: Domain Adaptive Detection Transformer with Information Fusion

paper

[3]Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection

paper | code

[2]Guiding Pseudo-labels with Uncertainty Estimation for Source-free Unsupervised Domain Adaptation

paper | code

[1]Adaptive Assignment for Geometry Aware Local Feature Matching

paper

Metric Learning

Contrastive Learning

[8]PromptCAL: Contrastive Affinity Learning via Auxiliary Prompts for Generalized Novel Category Discovery

paper | code

[7]Best of Both Worlds: Multimodal Contrastive Learning with Tabular and Imaging Data

paper

[6]Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss

paper

[5]Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

paper | code

[4]MaskCon: Masked Contrastive Learning for Coarse-Labelled Dataset

paper | code

[3]CiCo: Domain-Aware Sign Language Retrieval via Cross-Lingual Contrastive Learning

paper | code

[2]Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation

paper | code

[1]Twin Contrastive Learning with Noisy Labels

paper | code

Incremental Learning

[2]Class-Incremental Exemplar Compression for Class-Incremental Learning

paper

[1]Dense Network Expansion for Class Incremental Learning

paper

Reinforcement Learning

[3]Open-World Multi-Task Control Through Goal-Aware Representation Learning and Adaptive Horizon Prediction

paper | code

[2]ProphNet: Efficient Agent-Centric Motion Forecasting with Anchor-Informed Proposals

paper

[1]EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning

paper | code

Meta Learning

[1]A Meta-Learning Approach to Predicting Performance and Data Requirements

paper

Robotics

[2]Efficient Map Sparsification Based on 2D and 3D Discretized Grids

paper

[1]PyPose: A Library for Robot Learning with Physics-based Optimization

paper | code

Semi-supervised/Weakly Supervised/Unsupervised/Self-supervised Learning

[21]Can’t Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders

paper

[20]Conflict-Based Cross-View Consistency for Semi-Supervised Semantic Segmentation

paper | code

[19]ProtoCon: Pseudo-label Refinement via Online Clustering and Prototypical Consistency for Efficient Semi-supervised Learning

paper

[18]Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels

paper

[17]Self-Supervised Learning for Multimodal Non-Rigid 3D Shape Matching

paper | code

[16]Boosting Semi-Supervised Learning by Exploiting All Unlabeled Data

paper

[15]Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning

paper

[14]Correlational Image Modeling for Self-Supervised Visual Pre-Training

paper

[13]Extracting Class Activation Maps from Non-Discriminative Features as well

paper | code

[12]TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation

paper | code

[11]LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding

paper

[10]MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection

paper | code

[9]Semi-supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination

paper

[8]Non-Contrastive Unsupervised Learning of Physiological Signals from Video

paper

[7]Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems

paper | code

[6]Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models

paper

[5]The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training

paper | code

[4]Three Guidelines You Should Know for Universally Slimmable Self-Supervised Learning

paper | code

[3]Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors

paper

[2]Siamese Image Modeling for Self-Supervised Vision Representation Learning

paper | code

[1]Cut and Learn for Unsupervised Object Detection and Instance Segmentation

paper | project

Neural Network Interpretability

[3]OCTET: Object-aware Counterfactual Explanations

paper | code

[2]Don’t Lie to Me! Robust and Efficient Explainability with Verified Perturbation Analysis

paper

[1]SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries

paper | code

Image Counting

[1]Zero-shot Object Counting

paper

Federated Learning

[3]Make Landscape Flatter in Differentially Private Federated Learning

paper

[2]STDLens: Model Hijacking-resilient Federated Learning for Object Detection

paper | code

[1]Re-thinking Federated Active Learning based on Inter-class Diversity

paper | code

Autonomous Driving

[1]BEVFormer v2: Adapting Modern Image Backbones to Bird’s-Eye-View Recognition via Perspective Supervision

paper

Others

[57]Level-S2fM: Structure from Motion on Neural Level Set of Implicit Surfaces

paper

[56]FeatureBooster: Boosting Feature Descriptors with a Lightweight Neural Network

paper

[55]ARO-Net: Learning Implicit Fields from Anchored Radial Observations

paper | code

[54]Unknown Sniffer for Object Detection: Don’t Turn a Blind Eye to Unknown Objects

paper

[53]Robust Test-Time Adaptation in Dynamic Scenarios

paper

[52]LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction

paper

[51]Doubly Right Object Recognition: A Why Prompt for Visual Rationales

paper

[50]CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching

paper

[49]Marching-Primitives: Shape Abstraction from Signed Distance Function

paper

[48]Modeling Inter-Class and Intra-Class Constraints in Novel Class Discovery

paper | code

[47]ActMAD: Activation Matching to Align Distributions for Test-Time-Training

paper | code

[46]Robust Mean Teacher for Continual and Gradual Test-Time Adaptation

paper | code

[45]Planning-oriented Autonomous Driving

paper | code

[44]Explicit Visual Prompting for Low-Level Structure Segmentations

paper | code

[43]Leapfrog Diffusion Model for Stochastic Trajectory Prediction

paper | code

[42]Feature Alignment and Uniformity for Test Time Adaptation

paper

[41]Attribute-preserving Face Dataset Anonymization via Latent Code Optimization

paper | code

[40]Fix the Noise: Disentangling Source Feature for Controllable Domain Translation

paper | code

[39]Effective Ambiguity Attack Against Passport-based DNN Intellectual Property Protection Schemes through Fully Connected Layer Substitution

paper

[38]Visibility Constrained Wide-band Illumination Spectrum Design for Seeing-in-the-Dark

paper | code

[37]Learning a Depth Covariance Function

paper

[36]VecFontSDF: Learning to Reconstruct and Synthesize High-quality Vector Fonts via Signed Distance Functions

paper

[35]Dense Distinct Query for End-to-End Object Detection

paper | code

[34]Facial Affective Analysis based on MAE and Multi-modal Information for 5th ABAW Competition

paper

[33]Partial Network Cloning

paper | code

[32]Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection

paper | code

[31]Adversarial Counterfactual Visual Explanations

paper | code

[30]A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation

paper | code

[29]Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation

paper | code

[28]Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry

paper | code

[27]Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations

paper | code

[26]Backdoor Defense via Deconfounded Representation Learning

paper | code

[25]Label Information Bottleneck for Label Enhancement

paper

[24]LayoutDM: Discrete Diffusion Model for Controllable Layout Generation

paper | code

[23]Diversity-Aware Meta Visual Prompting

paper | code

[22]ABAW: Valence-Arousal Estimation, Expression Recognition, Action Unit Detection & Emotional Reaction Intensity Estimation Challenges

paper

[21]Visual Exemplar Driven Task-Prompting for Unified Perception in Autonomous Driving

paper

[20]UniHCP: A Unified Model for Human-Centric Perceptions

paper | code

[19]Where We Are and What We’re Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes

paper

[18]Revisiting Rotation Averaging: Uncertainties and Robust Losses

paper | code

[17]3D-Aware Object Goal Navigation via Simultaneous Exploration and Identification

paper

[16]Phase-Shifting Coder: Predicting Accurate Orientation in Oriented Object Detection

paper | code

[15]Understanding and Improving Visual Prompting: A Label-Mapping Perspective

paper | code

[14]vMAP: Vectorised Object Mapping for Neural Field SLAM

paper | code

[13]EcoTTA: Memory-Efficient Continual Test-time Adaptation via Self-distilled Regularization

paper

[12]Upcycling Models under Domain and Category Shift

paper | code

[11]Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images

paper | code

[10]Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies

paper

[9]Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples

paper | code

[8]Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation

paper

[6]Physical-World Optical Adversarial Attacks on 3D Face Recognition

paper

[5]Improving Cross-Modal Retrieval with Set of Diverse Embeddings

paper

[4]Neural Video Compression with Diverse Contexts

paper | code

[3]Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger

paper

[2]Single Image Backdoor Inversion via Robust Smoothed Classifiers

paper | code

[1]Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision

paper | code

3. CVPR2023 Paper Interpretation Roundup

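1. CVPR2023 | Debunking the misconception about the data-scaling ability of MIM (masked image modeling)!

2. CVPR 2023 | A new CLIP-based fine-tuning paradigm! Training speed and performance both reach new highs!

3. CVPR 2023 | Zhejiang University proposes PyramidFlow, a fully normalizing-flow model: a new paradigm for high-resolution defect anomaly localization

4. CVPR 2023 | Stable Diffusion reconstructs images from visual brain signals! "Human schemes and lies no longer exist"

5. CVPR 2023 | HKUST DA-BEV: a new SOTA for 3D object detection, a powerful method for mining depth information

6. CVPR 23 | Representation learning surpasses MAE: Google and others propose MAGE; unsupervised image generation surpasses Latent Diffusion

7. CVPR2023 | Excuse me, I'm speeding up! FasterNet: higher FLOPS is what backs faster and stronger networks

8. CVPR 2023 | With large models in vogue, SN-Net gives a distinctive answer

9. CVPR 2023 | iTPNs: self-supervised learning combined with a feature pyramid structure

10. CVPR 2023 | SQR: explorations and reflections on training DETR-family object detectors

11. CVPR 2023 | New COCO record of 65.4 mAP! InternImage: injecting new mechanisms, extending DCNv3, exploring large vision models

12. CVPR 2023 | YOLOv7 accepted! After 6 years, the YOLO series returns to CVPR!

13. CVPR 2023 | Google proposes Imagic: diffusion models can now edit photos with text alone!

14. CVPR 2023 | Lite DETR: 60% less computation! An efficient interleaved multi-scale encoder

15. CVPR 2023 | New work from Xiang Bai's team: scene text detection with the help of CLIP

16. CVPR'23 | Plug-and-play series! A lightweight, efficient self-attention mechanism helps image restoration networks reach SOTA

17. CVPR 2023 | NVIDIA proposes VoxFormer: a new SOTA for monocular 3D semantic scene completion

18. CVPR 2023 | EMA-VFI: efficient video frame interpolation that extracts motion and appearance information via inter-frame attention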

Original link: https://github.com/extreme-assistant/CVPR2023-Paper-Code-Interpretation/blob/master/CVPR2023.md?plain=1

你可能感兴趣的:(计算机视觉,目标检测,深度学习)

机器学习与深度学习间关系与区别 ℒℴѵℯ心·动ꦿ໊ོ꫞ 人工智能学习深度学习 python
一、机器学习概述定义机器学习（MachineLearning,ML）是一种通过数据驱动的方法，利用统计学和计算算法来训练模型，使计算机能够从数据中学习并自动进行预测或决策。机器学习通过分析大量数据样本，识别其中的模式和规律，从而对新的数据进行判断。其核心在于通过训练过程，让模型不断优化和提升其预测准确性。主要类型1.监督学习（SupervisedLearning）监督学习是指在训练数据集中包含输入
将cmd中命令输出保存为txt文本文件落难Coder Windows cmd window
最近深度学习本地的训练中我们常常要在命令行中运行自己的代码，无可厚非，我们有必要保存我们的炼丹结果，但是复制命令行输出到txt是非常麻烦的，其实Windows下的命令行为我们提供了相应的操作。其基本的调用格式就是：运行指令>输出到的文件名称或者具体保存路径测试下，我打开cmd并且ping一下百度：pingwww.baidu.com>./data.txt看下相同目录下data.txt的输出：如果你再
【目标检测数据集】卡车数据集1073张VOC+YOLO格式熬夜写代码的平头哥∰ 目标检测 YOLO 人工智能
数据集格式：PascalVOC格式+YOLO格式(不包含分割路径的txt文件，仅仅包含jpg图片以及对应的VOC格式xml文件和yolo格式txt文件)图片数量(jpg文件个数)：1073标注数量(xml文件个数)：1073标注数量(txt文件个数)：1073标注类别数：1标注类别名称:["truck"]每个类别标注的框数：truck框数=1120总框数：1120使用标注工具：labelImg标注
番茄西红柿叶子病害分类数据集12882张11类别 futureflsl 数据集分类数据挖掘人工智能
数据集类型：图像分类用，不可用于目标检测无标注文件数据集格式：仅仅包含jpg图片，每个类别文件夹下面存放着对应图片图片数量(jpg文件个数)：12882分类类别数：11类别名称:["Bacterial_Spot_Bacteria","Early_Blight_Fungus","Healthy","Late_Blight_Water_Mold","Leaf_Mold_Fungus","Powdery
推荐3家毕业AI论文可五分钟一键生成！文末附免费教程！小猪包333 写论文人工智能 AI写作深度学习计算机视觉
在当前的学术研究和写作领域，AI论文生成器已经成为许多研究人员和学生的重要工具。这些工具不仅能够帮助用户快速生成高质量的论文内容，还能进行内容优化、查重和排版等操作。以下是三款值得推荐的AI论文生成器：千笔-AIPassPaper、懒人论文以及AIPaperPass。千笔-AIPassPaper千笔-AIPassPaper是一款基于深度学习和自然语言处理技术的AI写作助手，旨在帮助用户快速生成高质
AI大模型的架构演进与最新发展季风泯灭的季节 AI大模型应用技术二人工智能架构
随着深度学习的发展，AI大模型（LargeLanguageModels,LLMs）在自然语言处理、计算机视觉等领域取得了革命性的进展。本文将详细探讨AI大模型的架构演进，包括从Transformer的提出到GPT、BERT、T5等模型的历史演变，并探讨这些模型的技术细节及其在现代人工智能中的核心作用。一、基础模型介绍：Transformer的核心原理Transformer架构的背景在Transfo
[实践应用] 深度学习之模型性能评估指标 YuanDaima2048 深度学习工具使用深度学习人工智能损失函数性能评估 pytorch python 机器学习
文章总览：YuanDaiMa2048博客文章总览深度学习之模型性能评估指标分类任务回归任务排序任务聚类任务生成任务其他介绍在机器学习和深度学习领域，评估模型性能是一项至关重要的任务。不同的学习任务需要不同的性能指标来衡量模型的有效性。以下是对一些常见任务及其相应的性能评估指标的详细解释和总结。分类任务分类任务是指模型需要将输入数据分配到预定义的类别或标签中。以下是分类任务中常用的性能指标：准确率(
[实践应用] 深度学习之优化器 YuanDaima2048 深度学习工具使用 pytorch 深度学习人工智能机器学习 python 优化器
文章总览：YuanDaiMa2048博客文章总览深度学习之优化器1.随机梯度下降（SGD）2.动量优化（Momentum）3.自适应梯度（Adagrad）4.自适应矩估计（Adam）5.RMSprop总结其他介绍在深度学习中，优化器用于更新模型的参数，以最小化损失函数。常见的优化函数有很多种，下面是几种主流的优化器及其特点、原理和PyTorch实现：1.随机梯度下降（SGD）原理:随机梯度下降通过
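Generative cartography Bwywb_3 deep learning machine learning deep learning generative adversarial network
Generative cartography is a technique that uses generative algorithms and AI to create maps automatically. It combines traditional geographic information system (GIS) technology with modern generative models (such as deep learning and GANs) to generate maps that meet requirements from input data. The approach has potential applications in urban planning, virtual environment design, game development, and other fields. Main features: Automated generation: through algorithms and models, the system generates maps automatically from input geographic or spatial data, without manual...
[Dataset][Object detection] Car front and rear detection dataset, VOC+YOLO format, 5319 images, 3 classes FL1623863129 dataset object detection car YOLO
Dataset produced by: 未来自主研究中心 (FIRC). Copyright holder: 未来自主研究中心 (FIRC). Copyright notice: the dataset is for personal use only and may not be publicly sold on trading sites such as Taobao or Xianyu without authorization; whoever does so bears any resulting legal liability. Dataset format: Pascal VOC format + YOLO format (does not include txt files of split paths; contains only the jpg images and the corresponding VOC-format xml files and YOLO-format txt files). Number of images (jpg files): 5319. Number of annotations (xml files...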
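Andrew Ng deep learning notes (30): an explanation of regularization 极客Array
Regularization. Deep learning may suffer from overfitting, that is, high variance. There are two remedies: one is regularization, the other is to gather more data. The latter is very reliable, but you may not always be able to prepare enough training data, or acquiring more may be expensive; regularization, however, usually helps avoid overfitting or reduce your network's error. If you suspect the neural network is overfitting the data, i.e. has a high-variance problem, the first method to come to mind is probably regularization; another way to address high variance is to prepare more data, which is also a very...
Personal study notes 7-6: Dive into Deep Learning (PyTorch edition), Mu Li 浪子L deep learning deep learning notes computer vision python artificial intelligence neural network pytorch
#artificial intelligence# #deep learning# #semantic segmentation# #computer vision# #neural network# Computer vision 13.11 Fully convolutional networks. A fully convolutional network (FCN) uses a convolutional neural network to transform image pixels into pixel classes. This is achieved by introducing the transposed convolution, so the output class predictions correspond one-to-one with the input image at the pixel level: the output along the channel dimension is the class prediction for the pixel at that position. 13.11.1 Constructing the model. Below...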
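Deep learning: click-through rate prediction: research papers quick read 2024-09-14 sp_fyf_2024 deep learning artificial intelligence
Deep learning: click-through rate prediction: research papers quick read 2024-09-14. 1. Deep Target Session Interest Network for Click-Through Rate Prediction. H Zhong, J Ma, X Duan, S Gu, J Yao - 2024 International Joint Conference on Neural Networks, 2024. Abstract: this paper proposes a new...
The role of pooling in computer vision Wils0nEdwards computer vision artificial intelligence
In computer vision, pooling is a common operation, used mainly in convolutional neural networks (CNNs). It downsamples the feature map, reducing the spatial dimensions of the data while retaining important feature information. The role of pooling can be summarized as follows: 1. Lower computational complexity and memory requirements. Pooling downsamples the feature map and so reduces its spatial resolution (for example, height and width). The network then has less data to process, which lowers computation and memory requirements. For large neural networks this...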
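To make the downsampling concrete, here is a small sketch of max pooling on a 2D feature map written in plain Python. The post does not say which pooling variant or window size it has in mind, so max pooling with a 2x2 window and stride 2 is an assumption, as is the function name `max_pool_2d`.

```python
def max_pool_2d(feature_map, size=2, stride=2):
    """Max-pool a 2D list of numbers with a square window.

    Each output cell is the maximum of one `size` x `size` window, so a
    4x4 input with size=2 and stride=2 shrinks to 2x2.
    """
    rows = len(feature_map)
    cols = len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, stride):
        pooled_row = []
        for c in range(0, cols - size + 1, stride):
            window = [
                feature_map[r + i][c + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
pooled = max_pool_2d(feature_map)  # [[6, 8], [14, 16]]
```

The output holds a quarter of the input values, which is the reduction in computation and memory the snippet describes.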
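OpenCV image processing techniques (Python): getting started 森屿_ opencv
©FuXianjun. All Rights Reserved. Getting started with OpenCV. Images are the visual basis of human perception of the world and an important means for people to obtain and express information. OpenCV is an open-source computer vision library containing several hundred easy-to-use imaging and vision functions; it can be used for academic research as well as in industry. It was started in 1999 by Gary Bradski at Intel. The OpenCV library is written mainly in C and C++ and runs on multiple operating systems. 1.1 Basic image processing operations...
Loss functions and backpropagation Star_. PyTorch pytorch deep learning python
Loss functions: definition and role. In deep learning, a loss function computes the error between the model's predicted output and the true value. 1. The smaller the loss, the better. 2. It measures the gap between the actual output and the target. 3. It provides the basis for updating the output (backpropagation). Common loss functions: common regression losses include mean squared error (MSE), mean absolute error (Mean Absolute Error Loss, MAE), and Huber Loss, which takes MSE and MAE...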
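The three regression losses named above can be written out directly. This is a plain-Python sketch using the standard textbook definitions rather than the PyTorch classes the post presumably goes on to use; the Huber threshold `delta=1.0` is an assumed default and the function names are illustrative.

```python
def mse(pred, target):
    """Mean squared error: average of squared differences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def huber(pred, target, delta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for p, t in zip(pred, target):
        err = abs(p - t)
        if err <= delta:
            total += 0.5 * err ** 2  # MSE-like region
        else:
            total += delta * (err - 0.5 * delta)  # MAE-like region
    return total / len(pred)


# Hypothetical predictions with one large error (3.0 versus 5.0).
pred = [1.0, 2.0, 3.0]
target = [1.0, 2.0, 5.0]
losses = (mse(pred, target), mae(pred, target), huber(pred, target))
```

On this data the single outlier contributes 4 to the squared error but only 1.5 to the Huber sum, which shows why Huber is less sensitive to outliers than MSE.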
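[Deep learning] An OOM problem during training that was really hard to track down weixin_40293999 deep learning deep learning artificial intelligence
Symptom: has anyone else run into this problem on Ubuntu? During training, ssh stops connecting; the machine still answers ping and has not crashed, but the monitor and mouse on the Ubuntu PC stop responding. The only option is to reboot. Cause: an OOM of 95G!!!! PyTorch ran out of memory, journald then hung, and after the watchdog killed journald the system crashed. With memory exhaustion on this scale, even if the process gets OOM-killed the machine usually stalls for a long time, and that is indeed what happens. Can one configure...
CV, NLP, data mining and recommendation, quant (量化) 海的那边- AI algorithms natural language processing artificial intelligence
Below is a brief overview of CV (computer vision), NLP (natural language processing), data mining and recommendation, and quant (量化), with an introduction to their application areas: 1. CV (Computer Vision). Definition: computer vision is the discipline of enabling computers to extract useful information from images or video and make decisions. It recognizes, processes, and understands visual information by imitating the human visual system. Main tasks: Image classification: identify and classify the objects in an image, such as cats, dogs, and cars. Object detection: locate and identify multiple objects in an image or video, such as face detection...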
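Cloud services industry news brief 20180128 Captain7
I. QingCloud: QingCloud launched the deep learning platform Deep Learning on QingCloud, which includes the mainstream deep learning frameworks and data science toolkits. Deployed and delivered with one click through QingCloud AppCenter, it lets algorithm engineers and data scientists build a deep learning development environment quickly and spend more effort on model and algorithm tuning. II. Tencent Cloud: 1. Tencent Cloud officially released the Tencent private cloud TCE (Tencent Cloud Enterprise) matrix, covering the enterprise edition, big data edition, AI...
Machine learning vs deep learning nfgo machine learning
Machine learning (ML) and deep learning (DL) are two subfields of artificial intelligence (AI). They have much in common but differ significantly in technical implementation and scope of application. The two are distinguished below from several angles: 1. Concept. Machine learning: a technique that lets computers learn and improve automatically from data through algorithms. It relies on hand-designed features and mathematical models; common models include decision trees, support vector machines, and linear regression. Deep learning: a subfield of machine learning...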
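Big data graduation project: hadoop+spark+hive knowledge graph, rental data analysis, visualization dashboard, rental recommendation system, 58同城 rental crawler, listing recommendation system, housing price prediction system, computer science graduation project, machine learning, deep learning, artificial intelligence 2401_84572577 programmer big data hadoop artificial intelligence
After so many years in development and teaching myself many programming languages, I know well how much learning resources matter when picking up a new language. Over the years I have collected a lot of solid Python material. I no longer need it myself, but for someone about to learn Python on their own it may be a treasure that saves a lot of time and effort. Stop learning haphazardly online. I have recently updated some of these resources, and as long as you follow me you can take all of them. I will first explain how to use them; pick them up at the end of the article. (1) Learning roadmap for all Python directions (...
Deep learning 13: using the small language model SmolLM 皮皮冰燃 deep learning deep learning
Article contents: Appendix. 1 SmolLM overview 1.1 Introduction to SmolLM 1.2 Downloading the model 2 Running 2.1 Running the model on CPU/GPU/multiple GPUs 2.2 Using torch.bfloat16 2.3 Quantized versions through bits and bytes (bitsandbytes) 3 Application examples 4 Problems and solutions 4.1 attention_mask and pad_token_id error 4.2 max_new_tokens=20 5 References. Appendix. 1 SmolLM overview. 1.1 Introduction to SmolLM: SmolLM is a series of state-of-the-art small language models, available in three...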
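Crop disease detection based on deep learning SEU-WYL deep learning dnn deep learning artificial intelligence
Deep-learning-based crop disease detection uses convolutional neural networks (CNNs), generative adversarial networks (GANs), Transformers, and other deep learning techniques to identify and classify crop diseases automatically, helping agricultural workers manage crops more efficiently and reduce losses. 1. Challenges of crop disease detection. Many kinds of disease: crop diseases are diverse; different diseases can look very different on the same crop, and the same disease may show different symptoms at different growth stages. Environmental influence: external factors such as weather, light, and humidity affect how crops appear, making disease...
Text-guided image editing based on deep learning SEU-WYL deep learning dnn deep learning artificial intelligence
Deep-learning-based text-guided image editing is a technique for editing or modifying images through natural-language text instructions. It combines recent advances in image generation and natural language processing (NLP), letting users adjust and manipulate image content precisely through descriptive text. 1. Challenges of text-guided image editing. Alignment between text and image: accurately mapping the semantic information in text onto specific regions or elements of an image is a key challenge. It involves aligning and understanding multimodal data...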
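Deep learning: generative adversarial network (GAN, Generative Adversarial Network) Ambition_LAO deep learning generative adversarial network
A generative adversarial network (GAN) is a deep learning model proposed by Ian Goodfellow et al. in 2014. GANs are used mainly to generate data: two neural networks compete against each other to produce new data that passes for real. Below is a detailed account of GANs, covering the concept, purpose, key points, implementation process, code implementation, and applicable scenarios. 1. Concept: a GAN consists of two neural networks, a generator (Generator) and a discriminator (Discrimina...
Deep learning: how to view the parameters in a pth file 奥利给少年 deep learning artificial intelligence
A .pth file is a PyTorch model weight file; it usually contains the parameters of a trained model. To view or use the file, follow these steps: 1. Make sure you have the model definition. You need the code for the model that was used when the .pth file was created, which means the model's class definition and architecture. 2. Load the model weights. Use PyTorch's load_state_dict method to load the weights. Here is how: import torch import torch.nn as nn # define the model structure...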
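ChatGPT-empowered Python: how to install the Keras library in Python? turensu ChatGpt python chatgpt keras computer
How to install the Keras library in Python? Keras is a simple, easy-to-use neural network library written by François Chollet. It implements deep learning functionality in the Python programming language and makes it easier to build and experiment with different types of neural networks. If you are a Python developer, you will want to know how to install Keras in your Python project. In this article we show how to install and configure the Keras library. Step 1: Install Python. To use the Keras library, you...
How to understand the deep learning training process 奋斗的草莓熊 deep learning artificial intelligence python scikit-learn virtualenv numpy pandas
Contents: 1. What does training do? 2. When training from a pretrained model, what in the pretrained model is mainly changed? 1. What does training do? Taking yolov5 as an example, the aim of training is to feed a set of input cat and dog images into the neural network and obtain an output model that can be used directly next time to tell which is a cat and which is a dog. 2. When training from a pretrained model, what in the pretrained model is mainly changed? Hyperparameters: these are parameters defined in the model structure, for example: convolution kernel size (kernel_size...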
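Keras deep learning framework: introduction and practical guide 司莹嫣Maude
Keras deep learning framework: introduction and practical guide. keras (keras-team/keras) is a Python-based deep learning library that does not use a database. It suits the development and implementation of deep learning tasks, especially where a Python deep learning library is needed. Characteristics: deep learning library, Python, no database. Project address: https://gitcode.com/gh_mirrors/ke/keras I. Project introduction. About Keras: Keras is a high-level neural network...
Deep-learning-driven license plate recognition: technical evolution and future challenges 逼子歌 deep learning license plate recognition neural network character recognition YOLO convolutional neural network
I. Introduction. 1.1 Research background. In today's society, the development of intelligent transportation systems is increasingly important, and license plate recognition, as a key component, plays a vital role. License plate recognition is widely applied in traffic management, parking management, security monitoring, and other fields. In traffic management it can be used for vehicle identification, traffic violation monitoring, and traffic flow statistics, improving the efficiency and accuracy of traffic management. In parking management it enables automatic vehicle identification and charging, raising the level of management and service. In security monitoring it can be used to track suspects and criminal activity. The emergence of deep learning has brought license plate recognition...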
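Java encapsulation, inheritance, polymorphism, etc. 麦田的设计者 java eclipse jvm c encapsulation
I have watched many videos recently but forgot to write summaries, so now I can only write down whatever comes to mind, hoping it helps with recall and consolidation. 1. The final keyword means "final, ultimate"...
The difference between F5 and a cluster bijian1013 weblogic cluster F5
HTTP requests are configured to go through F5 rather than the cluster; the cluster belongs to the weblogic container, and EJB interfaces go through the cluster. The difference between F5 and a cluster is mainly a matter of session replication. F5 is generally used to distribute HTTP requests: because HTTP services are stateless, there is no need to worry about sessions, similar to...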
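LeetCode[Math] - #7 Reverse Integer Cwind java solution Math LeetCode Algorithm
Original problem: #7 Reverse Integer. Requirement: reverse the digits of the input number. Example 1: input x = 123, return 321. Example 2: input x = -123, return -321. Difficulty: easy. Analysis: in the general case, first save the sign of the input, then repeatedly take the last digit of the input (x%10) as the next higher digit of the output (result = result*10 + x%10). But...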
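The analysis above (save the sign, then build the result with result = result*10 + x%10) can be sketched as follows. The post is in Java and is cut off at "But...", so this Python version is only an illustration; returning 0 when the reversed value falls outside the signed 32-bit range is an assumption taken from the usual statement of this LeetCode problem, not from the snippet itself.

```python
INT_MIN = -2 ** 31
INT_MAX = 2 ** 31 - 1


def reverse_integer(x):
    """Reverse the decimal digits of x, keeping its sign."""
    sign = -1 if x < 0 else 1
    remaining = abs(x)  # work on the magnitude so % and // behave simply
    result = 0
    while remaining:
        # The last digit of the input becomes the next digit of the output.
        result = result * 10 + remaining % 10
        remaining //= 10
    result *= sign
    # Assumed rule: report 0 if the result does not fit in 32 bits.
    if result < INT_MIN or result > INT_MAX:
        return 0
    return result
```

Python integers do not overflow, so the range check is done once at the end; a Java version would need to detect overflow before each multiplication.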
BufferedOutputStream 周凡杨
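First, "large batch" here means tens of millions of rows. Example: an SMS history table holds tens of millions of rows and must be backed up to a text file, which means running the following SQL and writing the result set to a file! select t.msisd
Simulating key input and mouse events on linux linux
To see what type of event /dev/input/eventX is: cat /proc/bus/input/devices. The device has its own special key codes, and I need to map some keys, such as 0-9 and X-Z, to standard keys such as KEY_0 and KEY_Z, so key simulation is needed. The method is to operate on the /dev/input/event1 file: writing an input_event struct to it simulates a key press. linux/in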
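First experience with ContentProvider 肆无忌惮_ ContentProvider
ContentProvider is very important in Android development. Together with Activity, Service, and BroadcastReceiver it is one of the four major Android components. Its role in Android is to share data with other applications. An Android app's database files are stored under data/data/packagename, and files there are private by default, so other programs cannot access them. If QQ游戏 wants to access the mobile QQ account information for one-tap login, it needs to use the content provider COnte...
Uploading files with fileupload in a Spring MVC (maven) project 843977358 mybatis spring mvc change avatar file upload upload
Uploading files with fileupload in Spring MVC, in a project managed with maven. 1. Uploading files first requires the supporting jars: commons-fileupload.jar, commons-io.jar. Since I manage the project with maven, they are configured in the pom file (jar locations depend on each person's setup) <!-- file upload start by zhangyd-c -->...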
Using the svnkit API to operate svn in pure Java: commit, update, and other operations aigo svnkit
Original: http://blog.csdn.net/hardwin/article/details/7963318 import java.io.File; import org.apache.log4j.Logger; import org.tmatesoft.svn.core.SVNCommitInfo; import org.tmateso
Comparing Header information from the browser, casperjs, and httpclient alleni123 crawler crawler header
@Override protected void doGet(HttpServletRequest req, HttpServletResponse res) throws ServletException, IOException { String type=req.getParameter("type"); Enumeration es=re
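java.io operations: DataInputStream and DataOutputStream basic data streams 百合不是茶 java stream
1. In Java, if you do not want to save a whole object but only the attributes of a class, you can use the method in this article; saving a whole object requires instantiating the class first, which a later article covers in detail. 2. DataInputStream is a data input stream in the java.io package that lets an application read primitive Java data types from an underlying input stream in a machine-independent way. An application can use a data output stream to write data that a data input stream later reads.
Vehicle insurance claim case bijian1013 auto insurance
Claim case: a transport company bought commercial motor insurance and compulsory traffic insurance (交强险) for a freight truck, plus work safety liability insurance. While carrying a load of fireworks the truck exploded en route, destroying the vehicle, damaging the cargo, killing the driver, killing a passer-by, and destroying a house. How should each of these be compensated? Suggested compensation plan: the compulsory traffic insurance does not apply here, because its precondition for payment is that "a motor vehicle has a road traffic accident"; if the explosion was caused by a traffic accident, the compulsory traffic insurance terms apply first, and the shortfall is covered by the commercial...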
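Java fundamentals you must learn for Spring (5): annotations bijian1013 java spring
Source: http://www.iteye.com/topic/1123823. I am collecting it on my blog for two reasons: first, the original is really good and easy to follow, and this pushes me to finish the author's whole series on Spring; second, in case the author deletes the original, I keep a record here for later reading. It is necessary to...
[Struts2 1] Struts2 Hello World bit1129 Hello world
Basic steps for a Struts2 Hello World application. Creating a Struts2 Hello World application involves these steps: 1. Configure web.xml 2. Create the Action 3. Create struts.xml and configure the Action 4. Start the web server and visit it in a browser. Configuring web.xml: <?xml version="1.0" encoding="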
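[Avro 2] Avro RPC framework bit1129 rpc
1. Introduction to Avro RPC. 1.1. RPC. RPC logically has two layers: the transport layer, responsible for network communication, and the protocol layer, which packs and unpacks data according to a protocol format. In terms of serialization, Apache Thrift, Google's Protocol Buffers, and Avro are frameworks of the same class: all are cross-language, perform well, and produce compact data. But Avro's dynamic schema (no code generation needed, with good performance) is a very attractive feature, well suited to R...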
lua　set get cookie ronin47 lua cookie
lua: local access_token = ngx.var.cookie_SGAccessToken if access_token then ngx.header["Set-Cookie"] = "SGAccessToken="..access_token.."; path=/;Max-Age=3000" end
java: printing the primes not greater than N bylijinnan java
public class PrimeNumber { /** * Find the primes not greater than N */ public static void main(String[] args) { int n=100; PrimeNumber pn=new PrimeNumber(); pn.printPrimeNumber(n); System.out.print
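The Java listing above is cut off before `printPrimeNumber` is shown, so the author's method is unknown. As a stand-in, here is a sketch of the same task (listing the primes not greater than N) using the Sieve of Eratosthenes in Python; the choice of sieve and the function name `primes_up_to` are assumptions, not the original implementation.

```python
def primes_up_to(n):
    """Return all primes p with p <= n using the Sieve of Eratosthenes."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # Every multiple of p from p*p upward is composite.
            for multiple in range(p * p, n + 1, p):
                is_prime[multiple] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]


# Same bound as the Java snippet (n = 100).
primes = primes_up_to(100)
```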
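Spring source code study: PropertyPlaceholderHelper bylijinnan java spring
Today, reading the Spring 3.0.0.RELEASE source, I found a bug in PropertyPlaceholderHelper. It seemed odd at the time, and a web search confirmed it is a bug, though someone found it long ago and it has been fixed. See: http://forum.spring.io/forum/spring-projects/container/88107-propertyplaceholderhelper-bug
[Logic and topology] What does combining Boolean logic with topological structure produce? comsci topology
If we have already embedded code capable of logical reasoning in a workflow node, and hundreds or thousands of such nodes form a topological network that can be traversed automatically, what results from combining a nonlinear topological computation model with the Boolean logic inside each node? Could it form a new model for fuzzy language recognition and processing? Anyone interested can try it. Doing this in software has one advantage: it costs little, and even if it does not...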
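ITEYE has switched to Baidu Promotion cuisuqiang Google AdSense Baidu Promotion advertising side income
ITEYE's ads used to be Google AdSense; now they have all switched to Baidu Promotion. Why do the personal blog settings still show Google AdSense? Everyone knows Google AdSense is hard to get approved for, and this has been discussed on ITEYE for a long time. I strongly suggest ITEYE drop Google AdSense, or at least use something easier to apply for. When will I be able to earn a little side income from ITEYE, even a small...
Analysis of the Sina Weibo technical architecture dalan_123 Sina Weibo architecture
Sina Weibo grew from zero to fifty million users in just one year, and our underlying architecture went through several versions. The first version was very fast to build; we could implement our modules very quickly. Looking at the technical characteristics: architecturally, the microblog product has to solve publishing and subscribing. The first version used a push messaging model: if a celebrity user has 100,000 followers, then when that user posts, we store the post as 100,000 copies. That is simple, and the first-version architecture really amounts to these two lines...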
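Mastering ARP attacks dcj3sjt126com r
I wrote this article only to show the benefit of understanding a protocol deeply. Experts can skip it. I take no responsibility for anything anyone does using this article. There is already plenty of material on ARP online, so I need not repeat it. In the words of one expert, "there is a lot we can do; the only limits are our creativity and imagination." ARP is no exception. The machines discussed below include one target machine: 10.5.4.178, hardware address: 52:54:4C:98
PHP coding standard dcj3sjt126com coding standard
I. File format. 1. For files containing only PHP code, we omit the "?>" at the end of the file. This prevents extra whitespace or other characters from affecting the code. For example: <?php $foo = 'foo'; 2. Indentation should reflect the logical structure of the code. Use four spaces wherever possible and never the TAB character, since this keeps the code flexible across client editors. For...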
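linux offline job management (nohup) eksliang linux nohup nohup
Offline management with nohup. If you repost, please cite the source: http://eksliang.iteye.com/blog/2166699 nohup lets a job keep running after you go offline or log out of the system. Its syntax is: nohup [command and arguments] -- runs in the terminal foreground; nohup [command and arguments] & -- runs in the terminal background. Note, however, that nohup does not support bash built-in commands, so...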
BusinessObjects Enterprise Java SDK greemranqq java BO SAP Crystal Reports
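Our recent project uses oracle_ADF to call Crystal Reports from SAP/BO. Material is scarce, so I am sharing a simple write-up to make things easier for beginners like me. First, I tried accessing it with Java JSP. Official API: http://devlibrary.businessobjects.com/BusinessObjectsxi/en/en/BOE_SDK/boesdk_ja
Control strategies under drastic changes in system load iamzhongyong high concurrency
Suppose the current system has 100 machines and can support 100 million clicks a day (just a rough illustration), and then traffic changes drastically. How do I respond, and what strategies does the system have? Here is a summary of some past approaches. 1. Horizontal scaling. This is the easiest to understand: add machines. It places fairly high demands on the system's initial scalability design, which must allow machines to be added flexibly to cope with traffic changes. 2. System grouping. If the system serves different businesses, some high priority and some low, then have the different business calls grouped in advance...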
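Chinese translation of the BitTorrent DHT protocol justjavac bit
Preface: I built a search engine for magnet links and BT torrents, {Magnet & Torrent}, so I reread the DHT protocol. BEP: 5 Title: DHT Protocol Version: 3dec52cb3ae103ce22358e3894b31cad47a6f22b Last-Modified: Tue Apr 2 16:51:45 2013 -070
Setting up a Java environment on Ubuntu macroli java work ubuntu
Configuration command: $sudo apt-get install ubuntu-restricted-extras Then run: $sudo apt-get install sun-java6-jdk After installation finishes, select the default Java: $sudo update-alternatives --config java When prompted to choose during installation, enter "2" and press Enter to confirm.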
js string to date (compatible with all IE versions) qiaolevip TO Date String IE
/** * String to time (yyyy-MM-dd HH:mm:ss) * result (minutes) */ stringToDate : function(fDate){ var fullDate = fDate.split(" ")[0].split("-"); var fullTime = fDate.split("
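[Data mining study] Learning the Apriori association rule algorithm, with a simple SQL implementation of market basket analysis superlxw1234 sql data mining association rules
Association rule mining looks for interesting associations or correlations among the items in a given data set. Association rules reveal unknown dependencies between data items; from the mined associations, information about one data object can be inferred from information about another. Market basket analysis is an example. milk ⇒ bread [support: 3%, confidence: 40%]. Support of 3% means 3% of customers buy both milk and bread. Confidence of 40% means 40% of customers who buy milk also buy bread. The support and confidence of a rule are two...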
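The two measures defined above can be computed directly from a list of transactions. The sketch below is in Python rather than the SQL the post goes on to use, and it covers only support and confidence, not the full Apriori candidate generation; the toy baskets and the function names are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for basket in transactions if wanted <= set(basket))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that
    also contain `consequent`: support(A and B) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base


# Hypothetical baskets: 5 customers.
transactions = [
    {"milk", "bread"},
    {"milk"},
    {"bread"},
    {"milk", "bread", "eggs"},
    {"eggs"},
]
rule_support = support(transactions, {"milk", "bread"})          # 2 of 5 baskets
rule_confidence = confidence(transactions, {"milk"}, {"bread"})  # 2 of 3 milk baskets
```

Here the rule milk ⇒ bread has support 0.4 because two of five baskets hold both items, and confidence 2/3 because two of the three milk baskets also hold bread.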
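System requirements for Spring 5.0: your feedback wanted wiselyman spring
Spring 5.0 will be released in 2016 and will support JDK 9. The feature plan for Spring 5.0 is still being worked out, so stay tuned; the author hopes to get feedback from users on the system requirements for Spring 5.0.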