This column collects papers in computer vision. Date: June 11, 2021. Source: Paper Digest.
1, TITLE: Data Augmentation to Improve Robustness of Image Captioning Solutions
AUTHORS: Shashank Bujimalla ; Mahesh Subedar ; Omesh Tickoo
CATEGORY: cs.CL [cs.CL, cs.CV]
HIGHLIGHT: In this paper, we study the impact of motion blur, a common quality flaw in real world images, on a state-of-the-art two-stage image captioning solution, and notice a degradation in solution performance as blur intensity increases.
2, TITLE: Deciphering Implicit Hate: Evaluating Automated Detection Algorithms for Multimodal Hate
AUTHORS: Austin Botelho ; Bertie Vidgen ; Scott A. Hale
CATEGORY: cs.CL [cs.CL, cs.CV, cs.CY, cs.LG]
HIGHLIGHT: We show that both text- and visual- enrichment improves model performance, with the multimodal model (0.771) outperforming other models' F1 scores (0.544, 0.737, and 0.754).
3, TITLE: ImaginE: An Imagination-Based Automatic Evaluation Metric for Natural Language Generation
AUTHORS: Wanrong Zhu ; Xin Eric Wang ; An Yan ; Miguel Eckstein ; William Yang Wang
CATEGORY: cs.CL [cs.CL, cs.AI, cs.CV]
HIGHLIGHT: In this work, we propose ImaginE, an imagination-based automatic evaluation metric for natural language generation.
4, TITLE: An Adaptive Origin-Destination Flows Cluster-detecting Method to Identify Urban Mobility Trends
AUTHORS: MENGYUAN FANG et al.
CATEGORY: cs.CG [cs.CG, cs.CV]
HIGHLIGHT: To address these limitations, in this paper, we proposed a novel OD flows cluster-detecting method based on the OPTICS algorithm which can identify OD flow clusters with various aggregation scales.
5, TITLE: Improving White-box Robustness of Pre-processing Defenses Via Joint Adversarial Training
AUTHORS: DAWEI ZHOU et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Motivated by the above analyses, we propose a method called Joint Adversarial Training based Pre-processing (JATP) defense.
6, TITLE: RLCorrector: Reinforced Proofreading for Connectomics Image Segmentation
AUTHORS: Khoa Tuan Nguyen ; Ganghee Jang ; Won-ki Jeong
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Herein, we propose a fully automatic proofreading method based on reinforcement learning.
7, TITLE: Multi-resolution Outlier Pooling for Sorghum Classification
AUTHORS: CHAO REN et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we introduce the Sorghum-100 dataset, a large dataset of RGB imagery of sorghum captured by a state-of-the-art gantry system, a multi-resolution network architecture that learns both global and fine-grained features on the crops, and a new global pooling strategy called Dynamic Outlier Pooling which outperforms standard global pooling strategies on this task.
8, TITLE: The 2021 Hotel-ID to Combat Human Trafficking Competition Dataset
AUTHORS: Rashmi Kamath ; Greg Rolwes ; Samuel Black ; Abby Stylianou
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Here, we present the 2021 Hotel-ID dataset to help raise awareness for this problem and generate novel approaches.
9, TITLE: A Dataset And Benchmark Of Underwater Object Detection For Robot Picking
AUTHORS: CHONGWEI LIU et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Towards these challenges we introduce a dataset, Detecting Underwater Objects (DUO), and a corresponding benchmark, based on the collection and re-annotation of all relevant datasets.
10, TITLE: Distribution-Aware Semantics-Oriented Pseudo-label for Imbalanced Semi-Supervised Learning
AUTHORS: Youngtaek Oh ; Dong-Jin Kim ; In So Kweon
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: To this end, we propose a general pseudo-labeling framework to address the bias motivated by this observation.
11, TITLE: Unsupervised Video Person Re-identification Via Noise and Hard Frame Aware Clustering
AUTHORS: Pengyu Xie ; Xin Xu ; Zheng Wang ; Toshihiko Yamasaki
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper proposes a Noise and Hard frame Aware Clustering (NHAC) method.
12, TITLE: Implicit Feature Alignment: Learn to Convert Text Recognizer to Text Spotter
AUTHORS: TIANWEI WANG et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a simple, elegant and effective paradigm called Implicit Feature Alignment (IFA), which can be easily integrated into current text recognizers, resulting in a novel inference mechanism called IFAinference.
13, TITLE: FetReg: Placental Vessel Segmentation and Registration in Fetoscopy Challenge Dataset
AUTHORS: SOPHIA BANO et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, eess.IV]
HIGHLIGHT: In this paper, we provide an overview of the FetReg dataset, challenge tasks, evaluation metrics and baseline methods for both segmentation and registration. Through the Fetoscopic Placental Vessel Segmentation and Registration (FetReg) challenge, we present a large-scale multi-centre dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms for the fetal environment with a focus on creating drift-free mosaics from long duration fetoscopy videos.
14, TITLE: What Does Rotation Prediction Tell Us About Classifier Accuracy Under Varying Testing Environments?
AUTHORS: Weijian Deng ; Stephen Gould ; Liang Zheng
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this work, we train semantic classification and rotation prediction in a multi-task way.
15, TITLE: Cross-Modal Discrete Representation Learning
AUTHORS: ALEXANDER H. LIU et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work we present a self-supervised learning framework that is able to learn a representation that captures finer levels of granularity across different modalities such as concepts or events represented by visual objects or spoken words.
16, TITLE: Self-Supervised 3D Hand Pose Estimation from Monocular RGB Via Contrastive Learning
AUTHORS: Adrian Spurr ; Aneesh Dahiya ; Xucong Zhang ; Xi Wang ; Otmar Hilliges
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address this issue, we propose an equivariant contrastive objective and demonstrate its effectiveness in the context of 3D hand pose estimation.
17, TITLE: AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection
AUTHORS: Hongsong Wang ; Shengcai Liao ; Ling Shao
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address these limitations, we propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training into a unified framework.
18, TITLE: Adversarial Motion Modelling Helps Semi-supervised Hand Pose Estimation
AUTHORS: Adrian Spurr ; Pavlo Molchanov ; Umar Iqbal ; Jan Kautz ; Otmar Hilliges
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Embracing this challenge, we propose to combine ideas from adversarial training and motion modelling to tap into unlabeled videos.
19, TITLE: Very Compact Clusters with Structural Regularization Via Similarity and Connectivity
AUTHORS: Xin Ma ; Won Hwa Kim
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this regard, we propose an end-to-end deep clustering algorithm, i.e., Very Compact Clusters (VCC), for the general datasets, which takes advantage of distributions of local relationships of samples near the boundary of clusters, so that they can be properly separated and pulled to cluster centers to form compact clusters.
20, TITLE: Revisiting Point Cloud Shape Classification with A Simple and Effective Baseline
AUTHORS: Ankit Goyal ; Hei Law ; Bowei Liu ; Alejandro Newell ; Jia Deng
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: We study the key ingredients of this progress and uncover two critical results.
21, TITLE: CAT: Cross Attention in Vision Transformer
AUTHORS: HEZHENG LIN et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we propose a new attention mechanism in Transformer termed Cross Attention, which alternates between attention within each image patch (rather than over the whole image) to capture local information and attention between image patches, which are divided from single-channel feature maps, to capture global information.
22, TITLE: Adaptive Streaming Perception Using Deep Reinforcement Learning
AUTHORS: Anurag Ghosh ; Akshay Nambi ; Aditya Singh ; Harish YVS ; Tanuja Ganu
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: To this end, we describe a new approach based on deep reinforcement learning to learn these tradeoffs at runtime for streaming perception.
23, TITLE: Visual Sensor Pose Optimisation Using Rendering-based Visibility Models for Robust Cooperative Perception
AUTHORS: Eduardo Arnold ; Sajjad Mozaffari ; Mehrdad Dianati ; Paul Jennings
CATEGORY: cs.CV [cs.CV, cs.MA]
HIGHLIGHT: This paper proposes two novel sensor pose optimisation methods, based on gradient-ascent and Integer Programming techniques, which maximise the visibility of multiple target objects in cluttered environments.
24, TITLE: Validation of Simulation-Based Testing: Bypassing Domain Shift with Label-to-Image Synthesis
AUTHORS: JULIA ROSENZWEIG et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: We propose a novel framework consisting of a generative label-to-image synthesis model together with different transferability measures to inspect to what extent we can transfer testing results of semantic segmentation models from synthetic data to equivalent real-life data.
25, TITLE: Curiously Effective Features for Image Quality Prediction
AUTHORS: Sören Becker ; Thomas Wiegand ; Sebastian Bosse
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In contrast to this, we find feature extractors constructed from random noise to be sufficient to learn a linear regression model whose quality predictions reach high correlations with human visual quality ratings, on par with a model with learned features.
26, TITLE: Plan2Scene: Converting Floorplans to 3D Scenes
AUTHORS: Madhawa Vidanapathirana ; Qirui Wu ; Yasutaka Furukawa ; Angel X. Chang ; Manolis Savva
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: We address the task of converting a floorplan and a set of associated photos of a residence into a textured 3D mesh model, a task which we call Plan2Scene. To train and evaluate our system we create indoor surface texture datasets, and augment a dataset of floorplans and photos from prior work with rectified surface crops and additional annotations.
27, TITLE: Face Mask Detection Using Convolution Neural Network
AUTHORS: Riya Shah ; Rutva Shah
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: This paper proposes a method to detect whether a face mask is being worn, for offices or any other workplace with a lot of people coming to work.
28, TITLE: Keeping Your Eye on The Ball: Trajectory Attention in Video Transformers
AUTHORS: MANDELA PATRICK et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To this end, we propose a new drop-in block for video transformers -- trajectory attention -- that aggregates information along implicitly determined motion paths.
29, TITLE: Implicit-PDF: Non-Parametric Representation of Probability Distributions on The Rotation Manifold
AUTHORS: Kieran Murphy ; Carlos Esteves ; Varun Jampani ; Srikumar Ramalingam ; Ameesh Makadia
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To this end, we introduce a method to estimate arbitrary, non-parametric distributions on SO(3). This is the most general way of representing distributions on manifolds, and to showcase the rich expressive power, we introduce a dataset of challenging symmetric and nearly-symmetric objects.
30, TITLE: DUET: Detection Utilizing Enhancement for Text in Scanned or Captured Documents
AUTHORS: EUN-SOO JUNG et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: We present a novel deep neural model for text detection in document images.
31, TITLE: To The Point: Correspondence-driven Monocular 3D Category Reconstruction
AUTHORS: Filippos Kokkinos ; Iasonas Kokkinos
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present To The Point (TTP), a method for reconstructing 3D objects from a single image using 2D to 3D correspondences learned from weak supervision.
32, TITLE: Learning to See By Looking at Noise
AUTHORS: Manel Baradad ; Jonas Wulff ; Tongzhou Wang ; Phillip Isola ; Antonio Torralba
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper we go a step further and ask if we can do away with real image datasets entirely, instead learning from noise processes.
33, TITLE: Space-time Mixing Attention for Video Transformer
AUTHORS: Adrian Bulat ; Juan-Manuel Perez-Rua ; Swathikiran Sudhakaran ; Brais Martinez ; Georgios Tzimiropoulos
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this work, we propose a Video Transformer model the complexity of which scales linearly with the number of frames in the video sequence and hence induces no overhead compared to an image-based Transformer model.
34, TITLE: MST: Masked Self-Supervised Transformer for Visual Representation
AUTHORS: ZHAOWEN LI et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present a novel Masked Self-supervised Transformer approach named MST, which can explicitly capture the local context of an image while preserving the global semantic information.
35, TITLE: Dynamics-Regulated Kinematic Policy for Egocentric Pose Estimation
AUTHORS: Zhengyi Luo ; Ryo Hachiuma ; Ye Yuan ; Kris Kitani
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.RO]
HIGHLIGHT: We propose a method for object-aware 3D egocentric pose estimation that tightly integrates kinematics modeling, dynamics modeling, and scene object information.
36, TITLE: Deep Neural Network Loses Attention to Adversarial Images
AUTHORS: Shashank Kotyan ; Danilo Vasconcellos Vargas
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: Adversarial algorithms have been shown to be effective against neural networks for a variety of tasks.
37, TITLE: Learning By Watching
AUTHORS: Jimuyang Zhang ; Eshed Ohn-Bar
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.RO]
HIGHLIGHT: Motivated by this key insight, we propose the Learning by Watching (LbW) framework which enables learning a driving policy without requiring full knowledge of either the state or expert actions.
38, TITLE: Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations
AUTHORS: Wouter Van Gansbeke ; Simon Vandenhende ; Stamatios Georgoulis ; Luc Van Gool
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: We show that learning additional invariances -- through the use of multi-scale cropping, stronger augmentations and nearest neighbors -- improves the representations.
39, TITLE: Unsupervised Co-part Segmentation Through Assembly
AUTHORS: Qingzhe Gao ; Bin Wang ; Libin Liu ; Baoquan Chen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose an unsupervised learning approach for co-part segmentation from images.
40, TITLE: Consistent Instance False Positive Improves Fairness in Face Recognition
AUTHORS: XINGKUN XU et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a false positive rate penalty loss, which mitigates face recognition bias by increasing the consistency of instance False Positive Rate (FPR).
41, TITLE: Deep Implicit Surface Point Prediction Networks
AUTHORS: RAHUL VENKATESH et al.
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: This paper presents a novel approach that models such surfaces using a new class of implicit representations called the closest surface-point (CSP) representation.
42, TITLE: Date Estimation in The Wild of Scanned Historical Photos: An Image Retrieval Approach
AUTHORS: Adrià Molina ; Pau Riba ; Lluis Gomez ; Oriol Ramos-Terrades ; Josep Lladós
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper presents a novel method for date estimation of historical photographs from archival sources.
43, TITLE: Context-Free TextSpotter for Real-Time and Mobile End-to-End Text Detection and Recognition
AUTHORS: Ryota Yoshihashi ; Tomohiro Tanaka ; Kenji Doi ; Takumi Fujino ; Naoaki Yamashita
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To this end, we propose a text-spotting method that consists of simple convolutions and a few post-processes, named Context-Free TextSpotter.
44, TITLE: SVMA: A GAN-based Model for Monocular 3D Human Pose Estimation
AUTHORS: Yicheng Deng ; Yongqi Sun ; Jiahui Zhu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present an unsupervised GAN-based model to recover 3D human pose from 2D joint locations extracted from a single image.
45, TITLE: Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification
AUTHORS: Yang Liu ; Weifeng Zhang ; Chao Xiang ; Tu Zheng ; Deng Cai
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a new method Mutual Centralized Learning (MCL) to fully affiliate the two disjoint sets of dense features in a bidirectional paradigm.
46, TITLE: Spatially Invariant Unsupervised 3D Object Segmentation with Graph Neural Networks
AUTHORS: Tianyu Wang ; Kee Siong Ng ; Miaomiao Liu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we tackle the problem of unsupervised 3D object segmentation from a point cloud without RGB information.
47, TITLE: Enforcing Morphological Information in Fully Convolutional Networks to Improve Cell Instance Segmentation in Fluorescence Microscopy Images
AUTHORS: WILLARD ZAMORA-CARDENAS et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Within this framework, we propose a novel cell instance segmentation approach based on the well-known U-Net architecture.
48, TITLE: MiDeCon: Unsupervised and Accurate Fingerprint and Minutia Quality Assessment Based on Minutia Detection Confidence
AUTHORS: Philipp Terhörst ; André Boller ; Naser Damer ; Florian Kirchbuchner ; Arjan Kuijper
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a novel concept of assessing minutia and fingerprint quality based on minutia detection confidence (MiDeCon).
49, TITLE: Match What Matters: Generative Implicit Feature Replay for Continual Learning
AUTHORS: Kevin Thandiackal ; Tiziano Portenier ; Andrea Giovannini ; Maria Gabrani ; Orcun Goksel
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Following a similar direction, we propose GenIFeR (Generative Implicit Feature Replay) for class-incremental learning.
50, TITLE: Supervising The Transfer of Reasoning Patterns in VQA
AUTHORS: Corentin Kervadec ; Christian Wolf ; Grigory Antipov ; Moez Baccouche ; Madiha Nadri
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: We propose a method for knowledge transfer based on a regularization term in our loss function, supervising the sequence of required reasoning operations.
51, TITLE: Cross-domain Contrastive Learning for Unsupervised Domain Adaptation
AUTHORS: RUI WANG et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this work, we build upon contrastive self-supervised learning to align features so as to reduce the domain discrepancy between training and testing sets.
52, TITLE: Pivotal Tuning for Latent-based Editing of Real Images
AUTHORS: Daniel Roich ; Ron Mokady ; Amit H. Bermano ; Daniel Cohen-Or
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present an approach to bridge this gap.
53, TITLE: Multi-Dataset Benchmarks for Masked Identification Using Contrastive Representation Learning
AUTHORS: Sachith Seneviratne ; Nuran Kasthuriaarachchi ; Sanka Rasnayaka
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: To address this unique requirement presented due to the current circumstance, we propose a set of re-purposed datasets and a benchmark for researchers to use.
54, TITLE: Tensor Feature Hallucination for Few-shot Learning
AUTHORS: Michalis Lazarou ; Yannis Avrithis ; Tania Stathaki
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We follow a different approach and investigate how a simple and straightforward synthetic data generation method can be used effectively.
55, TITLE: Progressive Stage-wise Learning for Unsupervised Feature Representation Enhancement
AUTHORS: ZEFAN LI et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we explore new dimensions of unsupervised learning by proposing the Progressive Stage-wise Learning (PSL) framework.
56, TITLE: Hierarchical Agglomerative Graph Clustering in Nearly-Linear Time
AUTHORS: Laxman Dhulipala ; David Eisenstat ; Jakub Łącki ; Vahab Mirrokni ; Jessica Shi
CATEGORY: cs.DS [cs.DS, cs.AI, cs.CV, cs.LG]
HIGHLIGHT: We define an algorithmic framework for hierarchical agglomerative graph clustering that provides the first efficient $\tilde{O}(m)$ time exact algorithms for classic linkage measures, such as complete- and WPGMA-linkage, as well as other measures.
57, TITLE: Beyond BatchNorm: Towards A General Understanding of Normalization in Deep Learning
AUTHORS: Ekdeep Singh Lubana ; Robert P. Dick ; Hidenori Tanaka
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this work, we take a first step towards this goal by extending known properties of BatchNorm in randomly initialized deep neural networks (DNNs) to nine recently proposed normalization layers.
58, TITLE: Optimizing Reusable Knowledge for Continual Learning Via Metalearning
AUTHORS: Julio Hurtado ; Alain Raymond-Saez ; Alvaro Soto
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: To address this issue, we propose MetA Reusable Knowledge or MARK, a new method that fosters weight reusability instead of overwriting when learning a new task.
59, TITLE: 3D Semantic Mapping from Arthroscopy Using Out-of-distribution Pose and Depth and In-distribution Segmentation Training
AUTHORS: YAQUB JONMOHAMADI et al.
CATEGORY: cs.RO [cs.RO, cs.CV]
HIGHLIGHT: In this paper, we propose the first 3D semantic mapping system from knee arthroscopy that solves the three challenges above.
60, TITLE: Quantized Conditional COT-GAN for Video Prediction
AUTHORS: Tianlin Xu ; Beatrice Acciaio
CATEGORY: stat.ML [stat.ML, cs.AI, cs.CV, cs.LG]
HIGHLIGHT: Relying on Xu et al. (2020), the contribution of the present paper is twofold.
61, TITLE: CALTeC: Content-Adaptive Linear Tensor Completion for Collaborative Intelligence
AUTHORS: Ashiv Dhondea ; Robert A. Cohen ; Ivan V. Bajić
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper we propose a method called Content-Adaptive Linear Tensor Completion (CALTeC) to recover the missing feature data.