Computer Vision Papers - 2021-06-04

This column collects computer vision papers. Date: June 4, 2021. Source: Paper Digest

1, TITLE: SMURF: SeMantic and Linguistic UndeRstanding Fusion for Caption Evaluation Via Typicality Analysis
AUTHORS: Joshua Feinglass ; Yezhou Yang
CATEGORY: cs.CL [cs.CL, cs.CV]
HIGHLIGHT: We introduce "typicality", a new formulation of evaluation rooted in information theory, which is uniquely suited for problems lacking a definite ground truth.

2, TITLE: Towards Urban Scenes Understanding Through Polarization Cues
AUTHORS: Marc Blanchon ; Désiré Sidibé ; Olivier Morel ; Ralph Seulin ; Fabrice Meriaudeau
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a two-axis pipeline based on polarization indices to analyze dynamic urban scenes.

3, TITLE: Less Is More: Sparse Sampling for Dense Reaction Predictions
AUTHORS: Kezhou Lin ; Xiaohan Wang ; Zhedong Zheng ; Linchao Zhu ; Yi Yang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this report, we present our method for the 2021 Evoked Expression from Videos Challenge.

4, TITLE: Spline Positional Encoding for Learning 3D Implicit Signed Distance Fields
AUTHORS: Peng-Shuai Wang ; Yang Liu ; Yu-Qi Yang ; Xin Tong
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: In this paper, we propose a novel positional encoding scheme, called Spline Positional Encoding, to map the input coordinates to a high dimensional space before passing them to MLPs, for helping to recover 3D signed distance fields with fine-scale geometric details from unorganized 3D point clouds.

5, TITLE: GMAIR: Unsupervised Object Detection Based on Spatial Attention and Gaussian Mixture
AUTHORS: Weijin Zhu ; Yao Shen ; Linfeng Yu ; Lizeth Patricia Aguirre Sanchez
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper presents a framework, GMAIR, for unsupervised object detection.

6, TITLE: When Vision Transformers Outperform ResNets Without Pretraining or Strong Data Augmentations
AUTHORS: Xiangning Chen ; Cho-Jui Hsieh ; Boqing Gong
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: Hence, this paper investigates ViTs and MLP-Mixers from the lens of loss geometry, intending to improve the models' data efficiency at training and generalization at inference.

7, TITLE: SSMD: Semi-Supervised Medical Image Detection with Adaptive Consistency and Heterogeneous Perturbation
AUTHORS: Hong-Yu Zhou et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel Semi-Supervised Medical image Detector (SSMD).

8, TITLE: NTIRE 2021 Challenge on High Dynamic Range Imaging: Dataset, Methods and Results
AUTHORS: Eduardo Pérez-Pellitero ; Sibi Catley-Chandar ; Aleš Leonardis ; Radu Timofte
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: This manuscript focuses on the newly introduced dataset, the proposed methods and their results.

9, TITLE: Attention-Guided Supervised Contrastive Learning for Semantic Segmentation
AUTHORS: Ho Hin Lee et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this paper, we propose an attention-guided supervised contrastive learning approach to highlight a single semantic object every time as the target.

10, TITLE: You Never Cluster Alone
AUTHORS: Yuming Shen et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we extend the mainstream contrastive learning paradigm to a cluster-level scheme, where all the data subjected to the same cluster contribute to a unified representation that encodes the context of each data group.

11, TITLE: DeepCompress: Efficient Point Cloud Geometry Compression
AUTHORS: Ryan Killea ; Yun Li ; Saeed Bastani ; Paul McLachlan
CATEGORY: cs.CV [cs.CV, cs.GR, cs.LG, eess.IV]
HIGHLIGHT: We propose a more efficient deep learning-based encoder architecture for point cloud compression that incorporates principles from established 3D object detection and image compression architectures.

12, TITLE: Barbershop: GAN-based Image Compositing Using Segmentation Masks
AUTHORS: Peihao Zhu ; Rameen Abdal ; John Femiani ; Peter Wonka
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: We present a novel solution to image blending, particularly for the problem of hairstyle transfer, based on GAN-inversion.

13, TITLE: ProtoRes: Proto-Residual Architecture for Deep Modeling of Human Pose
AUTHORS: Boris N. Oreshkin ; Florent Bocquelet ; Félix H. Harvey ; Bay Raitt ; Dominic Laflamme
CATEGORY: cs.CV [cs.CV, cs.GR, cs.LG]
HIGHLIGHT: To solve this problem, we propose a novel neural architecture that combines residual connections with prototype encoding of a partially specified pose to create a new complete pose from the learned latent space. Furthermore, we introduce two new datasets representing the static human pose modeling problem, based on high-quality human motion capture data, which will be released publicly along with model code.

14, TITLE: Unsharp Mask Guided Filtering
AUTHORS: Zenglin Shi ; Yunlu Chen ; Efstratios Gavves ; Pascal Mettes ; Cees G. M. Snoek
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: The goal of this paper is guided image filtering, which emphasizes the importance of structure transfer during filtering by means of an additional guidance image.

15, TITLE: A Comparison for Anti-noise Robustness of Deep Learning Classification Methods on A Tiny Object Image Dataset: from Convolutional Neural Network to Visual Transformer and Performer
AUTHORS: Ao Chen et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: A Comparison for Anti-noise Robustness of Deep Learning Classification Methods on A Tiny Object Image Dataset: from Convolutional Neural Network to Visual Transformer and Performer

16, TITLE: Multi-Scale Feature Aggregation By Cross-Scale Pixel-to-Region Relation Operation for Semantic Segmentation
AUTHORS: Yechao Bai et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we aim to enable the low-level feature to aggregate the complementary context from adjacent high-level feature maps by a cross-scale pixel-to-region relation operation.

17, TITLE: Learning to Select: A Fully Attentive Approach for Novel Object Captioning
AUTHORS: Marco Cagrandi ; Marcella Cornia ; Matteo Stefanini ; Lorenzo Baraldi ; Rita Cucchiara
CATEGORY: cs.CV [cs.CV, cs.CL]
HIGHLIGHT: In this paper, we present a novel approach for NOC that learns to select the most relevant objects of an image, regardless of their adherence to the training set, and to constrain the generative process of a language model accordingly.

18, TITLE: APES: Audiovisual Person Search in Untrimmed Video
AUTHORS: Juan Leon Alcazar et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present the Audiovisual Person Search dataset (APES), a new dataset composed of untrimmed videos whose audio (voices) and visual (faces) streams are densely annotated. To showcase the potential of our new dataset, we propose an audiovisual baseline and benchmark for person retrieval.

19, TITLE: Noise Doesn't Lie: Towards Universal Detection of Deep Inpainting
AUTHORS: Ang Li et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we make the first attempt towards universal detection of deep inpainting, where the detection network can generalize well when detecting different deep inpainting methods.

20, TITLE: Deconfounded Video Moment Retrieval with Causal Intervention
AUTHORS: Xun Yang ; Fuli Feng ; Wei Ji ; Meng Wang ; Tat-Seng Chua
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To fill the research gap, we propose a causality-inspired VMR framework that builds a structural causal model to capture the true effect of query and video content on the prediction.

21, TITLE: E2E-VLP: End-to-End Vision-Language Pre-training Enhanced By Visual Learning
AUTHORS: Haiyang Xu et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.CL]
HIGHLIGHT: In this paper, we propose the first end-to-end vision-language pre-trained model for both V+L understanding and generation, namely E2E-VLP, where we build a unified Transformer framework to jointly learn visual representation, and semantic alignments between image and text.

22, TITLE: Semantic Palette: Guiding Scene Generation with Class Proportions
AUTHORS: Guillaume Le Moing ; Tuan-Hung Vu ; Himalaya Jain ; Patrick Pérez ; Matthieu Cord
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose to condition layout generation as well for higher semantic control: given a vector of class proportions, we generate layouts with matching composition.

23, TITLE: Cross-Domain First Person Audio-Visual Action Recognition Through Relative Norm Alignment
AUTHORS: Mirco Planamente ; Chiara Plizzari ; Emanuele Alberti ; Barbara Caputo
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose to leverage the intrinsic complementary nature of audio-visual signals to learn a representation that works well on data seen during training, while being able to generalize across different domains.

24, TITLE: Self-Supervised Learning of Event-Based Optical Flow with Spiking Neural Networks
AUTHORS: Federico Paredes-Vallés ; Jesse Hagenaars ; Guido de Croon
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this article, we focus on the self-supervised learning problem of optical flow estimation from event-based camera inputs, and investigate the changes that are necessary to the state-of-the-art ANN training pipeline in order to successfully tackle it with SNNs.

25, TITLE: Robust Reference-based Super-Resolution Via C2-Matching
AUTHORS: Yuming Jiang ; Kelvin C. K. Chan ; Xintao Wang ; Chen Change Loy ; Ziwei Liu
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: To tackle these challenges, we propose C2-Matching in this work, which produces explicit robust matching crossing transformation and resolution.

26, TITLE: NeRFactor: Neural Factorization of Shape and Reflectance Under An Unknown Illumination
AUTHORS: Xiuming Zhang et al.
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: We address the problem of recovering the shape and spatially-varying reflectance of an object from posed multi-view images of the object illuminated by one unknown lighting condition.

27, TITLE: Multiscale Domain Adaptive YOLO for Cross-Domain Object Detection
AUTHORS: Mazin Hnewa ; Hayder Radha
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we introduce a novel MultiScale Domain Adaptive YOLO (MS-DAYOLO) framework that employs multiple domain adaptation paths and corresponding domain classifiers at different scales of the recently introduced YOLOv4 object detector to generate domain-invariant features.

28, TITLE: DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
AUTHORS: Yongming Rao et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: Based on this observation, we propose a dynamic token sparsification framework to prune redundant tokens progressively and dynamically based on the input.

29, TITLE: Domain Adaptation for Facial Expression Classifier Via Domain Discrimination and Gradient Reversal
AUTHORS: Kamil Akhmetov
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: We propose a new architecture for the task of FER and examine the impact of domain discrimination loss regularization on the learning process.

30, TITLE: Single Image Depth Estimation Using Wavelet Decomposition
AUTHORS: Michaël Ramamonjisoa ; Michael Firman ; Jamie Watson ; Vincent Lepetit ; Daniyar Turmukhambetov
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present a novel method for predicting accurate depths from monocular images with high efficiency.

31, TITLE: Generalized Domain Adaptation
AUTHORS: Yu Mitsuzumi ; Go Irie ; Daiki Ikami ; Takashi Shibata
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we give a general representation of UDA problems, named Generalized Domain Adaptation (GDA).

32, TITLE: Personalizing Pre-trained Models
AUTHORS: Mina Khan et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We developed a technique, called Multi-label Weight Imprinting (MWI), for multi-label, continual, and few-shot learning, and CLIPPER uses MWI with image representations from CLIP.

33, TITLE: Neural Actor: Neural Free-view Synthesis of Human Actors with Pose Control
AUTHORS: Lingjie Liu et al.
CATEGORY: cs.CV [cs.CV, cs.GR, cs.LG]
HIGHLIGHT: We propose Neural Actor (NA), a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses.

34, TITLE: Anticipative Video Transformer
AUTHORS: Rohit Girdhar ; Kristen Grauman
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.MM]
HIGHLIGHT: We propose Anticipative Video Transformer (AVT), an end-to-end attention-based video modeling architecture that attends to the previously observed video in order to anticipate future actions.

35, TITLE: Imperceptible Adversarial Examples for Fake Image Detection
AUTHORS: Quanyu Liao et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we propose a novel method to disrupt the fake image detection by determining key pixels to a fake image detector and attacking only the key pixels, which results in the $L_0$ and the $L_2$ norms of adversarial perturbations much less than those of existing works.

36, TITLE: Learning High-Precision Bounding Box for Rotated Object Detection Via Kullback-Leibler Divergence
AUTHORS: Xue Yang et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: Taking the perspective that horizontal detection is a special case for rotated object detection, in this paper, we are motivated to change the design of rotation regression loss from induction paradigm to deduction methodology, in terms of the relation between rotation and horizontal detection.

37, TITLE: Container: Context Aggregation Network
AUTHORS: Peng Gao ; Jiasen Lu ; Hongsheng Li ; Roozbeh Mottaghi ; Aniruddha Kembhavi
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present the CONTAINER (CONText AggregatIon NEtwoRk), a general-purpose building block for multi-head context aggregation that can exploit long-range interactions a la Transformers while still exploiting the inductive bias of the local convolution operation leading to faster convergence speeds, often seen in CNNs.

38, TITLE: Transferable Adversarial Examples for Anchor Free Object Detection
AUTHORS: Quanyu Liao et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present the first adversarial attack on anchor-free object detectors.

39, TITLE: CT-Net: Channel Tensorization Network for Video Classification
AUTHORS: Kunchang Li ; Xianhang Li ; Yali Wang ; Jun Wang ; Yu Qiao
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: For this reason, we propose a concise and novel Channel Tensorization Network (CT-Net), by treating the channel dimension of input feature as a multiplication of K sub-dimensions.

40, TITLE: Exploring Memorization in Adversarial Training
AUTHORS: Yinpeng Dong et al.
CATEGORY: cs.LG [cs.LG, cs.CV, stat.ML]
HIGHLIGHT: In this paper, we investigate the memorization effect in adversarial training (AT) for promoting a deeper understanding of capacity, convergence, generalization, and especially robust overfitting of adversarially trained classifiers.

41, TITLE: Grounding Complex Navigational Instructions Using Scene Graphs
AUTHORS: Michiel de Jong ; Satyapriya Krishna ; Anuva Agarwal
CATEGORY: cs.LG [cs.LG, cs.CL, cs.CV]
HIGHLIGHT: We adapt the CLEVR visual question answering dataset to generate complex natural language navigation instructions and accompanying scene graphs, yielding an environment-agnostic supervised dataset.

42, TITLE: One Representation to Rule Them All: Identifying Out-of-Support Examples in Few-shot Learning with Generic Representations
AUTHORS: Henry Kvinge et al.
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV, math.MG]
HIGHLIGHT: In this paper we describe this challenge of identifying what we term 'out-of-support' (OOS) examples.

43, TITLE: PDPGD: Primal-Dual Proximal Gradient Descent Adversarial Attack
AUTHORS: Alexander Matyasko ; Lap-Pui Chau
CATEGORY: cs.LG [cs.LG, cs.CR, cs.CV]
HIGHLIGHT: In this work, we introduce a fast, general and accurate adversarial attack that optimises the original non-convex constrained minimisation problem.

44, TITLE: Partial Graph Reasoning for Neural Network Regularization
AUTHORS: Tiange Xiang et al.
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: Toward better descriptions of latent representations, we present DropGraph that learns a regularization function by constructing a stand-alone graph from the backbone features.

45, TITLE: LLC: Accurate, Multi-purpose Learnt Low-dimensional Binary Codes
AUTHORS: Aditya Kusupati et al.
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this work, we propose a novel method for Learning Low-dimensional binary Codes (LLC) for instances as well as classes.

46, TITLE: Not All Knowledge Is Created Equal
AUTHORS: Ziyun Li et al.
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: Therefore, we focus on studying selective MKD and highlight its importance in this work.

47, TITLE: Improving The Transferability of Adversarial Examples with New Iteration Framework and Input Dropout
AUTHORS: Pengfei Xie et al.
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this paper, we propose a new gradient iteration framework, which redefines the relationship between the above three.

48, TITLE: Convolutional Neural Network(CNN/ConvNet) in Stock Price Movement Prediction
AUTHORS: Kunal Bhardwaj
CATEGORY: cs.NE [cs.NE, cs.AI, cs.CV, cs.LG]
HIGHLIGHT: In this paper, I have tried to use a specific type of Neural Network known as Convolutional Neural Network(CNN/ConvNet) in the stock market.

49, TITLE: Simultaneous Multi-View Object Recognition and Grasping in Open-Ended Domains
AUTHORS: Hamidreza Kasaei ; Sha Luo ; Remo Sasso ; Mohammadreza Kasaei
CATEGORY: cs.RO [cs.RO, cs.CV]
HIGHLIGHT: To address this problem, we propose a deep learning architecture with augmented memory capacities to handle open-ended object recognition and grasping simultaneously.

50, TITLE: Fast Improvement of TEM Image with Low-dose Electrons By Deep Learning
AUTHORS: Hiroyasu Katsuno ; Yuki Kimura ; Tomoya Yamazaki ; Ichigaku Takigawa
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: Using a dataset that includes short-exposure images and long-exposure images, we develop a pipeline for processed short-exposure images, based on end-to-end training.

51, TITLE: Machine Learning Based Texture Analysis of Patella from X-Rays for Detecting Patellofemoral Osteoarthritis
AUTHORS: Neslihan Bayramoglu ; Miika T. Nieminen ; Simo Saarakkala
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: We present the first study that analyses patellar bone texture for diagnosing PFOA.

52, TITLE: Pathology-Aware Generative Adversarial Networks for Medical Image Augmentation
AUTHORS: Changhee Han
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: In terms of interpolation, the GAN-based medical image augmentation is reliable because medical modalities can display the human body's strong anatomical consistency at fixed position while clearly reflecting inter-subject variability; thus, we propose to use noise-to-image GANs (e.g., random noise samples to diverse pathological images) for (i) medical Data Augmentation (DA) and (ii) physician training.

53, TITLE: Robotic Inspection and 3D GPR-based Reconstruction for Underground Utilities
AUTHORS: Jinglun Feng ; Liang Yang ; Jiang Biao ; Jizhong Xiao
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: To address the above challenges, this paper presents a novel robotic system to collect GPR data, interpret GPR data, localize the underground utilities, reconstruct and visualize the underground objects' dense point cloud model in a user-friendly manner.

54, TITLE: Noisy Labels Are Treasure: Mean-Teacher-Assisted Confident Learning for Hepatic Vessel Segmentation
AUTHORS: Zhe Xu et al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: To address this issue, we propose a novel mean-teacher-assisted confident learning framework to robustly exploit the noisy labeled data for the challenging hepatic vessel segmentation task.

55, TITLE: Effort-free Automated Skeletal Abnormality Detection of Rat Fetuses on Whole-body Micro-CT Scans
AUTHORS: Akihiro Fukuda ; Changhee Han ; Kazumi Hakamada
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: Therefore, we propose various bone feature engineering techniques to thoroughly automate the skeletal localization/labeling/abnormality detection of rat fetuses on whole-body micro-CT scans with minimum effort.

56, TITLE: Advances in Classifying The Stages of Diabetic Retinopathy Using Convolutional Neural Networks in Low Memory Edge Devices
AUTHORS: Aditya Jyoti Paul
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG, 68T45, 68T10, 68T07, 68U10, I.2.10; I.4.8; I.5.1; J.3; I.4.1; K.4.2]
HIGHLIGHT: Advances in Classifying The Stages of Diabetic Retinopathy Using Convolutional Neural Networks in Low Memory Edge Devices

57, TITLE: Denoising and Optical and SAR Image Classifications Based on Feature Extraction and Sparse Representation
AUTHORS: Battula Balnarsaiah ; G Rajitha
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: This paper presents a method for denoising and feature extraction, and compares classifications of optical and SAR images.
