This column collects computer vision papers. Date: July 5, 2021. Source: Paper Digest
1, TITLE: MMF: Multi-Task Multi-Structure Fusion for Hierarchical Image Classification
AUTHORS: Xiaoni Li ; Yucan Zhou ; Yu Zhou ; Weiping Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we consider that different label structures provide a variety of prior knowledge for category recognition, thus fusing them is helpful to achieve better hierarchical classification results.
2, TITLE: Polarized Self-Attention: Towards High-quality Pixel-wise Regression
AUTHORS: Huajun Liu ; Fuqiang Liu ; Xinyi Fan ; Dong Huang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present the Polarized Self-Attention (PSA) block that incorporates two critical designs towards high-quality pixel-wise regression: (1) Polarized filtering: keeping high internal resolution in both channel and spatial attention computation while completely collapsing input tensors along their counterpart dimensions.
3, TITLE: UTNet: A Hybrid Transformer Architecture for Medical Image Segmentation
AUTHORS: Yunhe Gao ; Mu Zhou ; Dimitris Metaxas
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this study, we present UTNet, a simple yet powerful hybrid Transformer architecture that integrates self-attention into a convolutional neural network for enhancing medical image segmentation.
4, TITLE: HandVoxNet++: 3D Hand Shape and Pose Estimation Using Voxel-Based Neural Networks
AUTHORS: JAMEEL MALIK et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this journal extension of our previous approach presented at CVPR 2020, we gain 41.09% and 13.7% higher shape alignment accuracy on SynHand5M and HANDS19 datasets, respectively.
5, TITLE: Passing A Non-verbal Turing Test: Evaluating Gesture Animations Generated from Speech
AUTHORS: Manuel Rebol ; Christian Gütl ; Krzysztof Pietroszek
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel, data-driven technique for generating gestures directly from speech. We create a large dataset which consists of speech and corresponding gestures in a 3D human pose format from which our model learns the speaker-specific correlation.
6, TITLE: Parasitic Egg Detection and Classification in Low-cost Microscopic Images Using Transfer Learning
AUTHORS: THANAPHON SUWANNAPHONG et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we propose a CNN-based technique using transfer learning strategy to enhance the efficiency of automatic parasite classification in poor-quality microscopic images.
7, TITLE: Evaluating The Usefulness of Unsupervised Monitoring in Cultural Heritage Monuments
AUTHORS: CHARALAMPOS ZAFEIROPOULOS et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we scrutinize the effectiveness of various clustering techniques, investigating their applicability in Cultural Heritage monitoring applications.
8, TITLE: Visual Relationship Forecasting in Videos
AUTHORS: Li Mi ; Yangjun Ou ; Zhenzhong Chen
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: To meet this challenge, we present a new task named Visual Relationship Forecasting (VRF) in videos to explore the prediction of visual relationships in a reasoning manner. To evaluate the VRF task, we introduce two video datasets named VRF-AG and VRF-VidOR, with a series of spatio-temporally localized visual relation annotations in a video.
9, TITLE: Ultrasound Video Transformers for Cardiac Ejection Fraction Estimation
AUTHORS: HADRIEN REYNAUD et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a novel approach to ultrasound video analysis using a transformer architecture based on a Residual Auto-Encoder Network and a BERT model adapted for token classification.
10, TITLE: Magnification-independent Histopathological Image Classification with Similarity-based Multi-scale Embeddings
AUTHORS: Yibao Sun ; Xingru Huang ; Yaqi Wang ; Huiyu Zhou ; Qianni Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To exploit this fact, we propose an approach that learns similarity-based multi-scale embeddings (SMSE) for magnification-independent histopathological image classification.
11, TITLE: How Incomplete Is Contrastive Learning? An Inter-intra Variant Dual Representation Method for Self-supervised Video Recognition
AUTHORS: Lin Zhang ; Qi She ; Zhengyang Shen ; Changhu Wang
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we find that existing contrastive learning based solutions for self-supervised video recognition focus on inter-variance encoding but ignore the intra-variance existing in clips within the same video.
12, TITLE: NTIRE 2021 Multi-modal Aerial View Object Classification Challenge
AUTHORS: JERRICK LIU et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR.
13, TITLE: Unsupervised Single Image Super-resolution Under Complex Noise
AUTHORS: Zongsheng Yue ; Qian Zhao ; Jianwen Xie ; Lei Zhang ; Deyu Meng
CATEGORY: cs.CV [cs.CV, I.4.4]
HIGHLIGHT: To address these issues, this paper proposes a model-based unsupervised SISR method to deal with the general SISR task with unknown degradations.
14, TITLE: Sub-millisecond Video Synchronization of Multiple Android Smartphones
AUTHORS: Azat Akhmetyanov ; Anastasiia Kornilova ; Marsel Faizullin ; David Pozo ; Gonzalo Ferrer
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present a generalized mathematical model of timestamping for Android smartphones and prove its applicability on 47 different physical devices.
15, TITLE: Comparison of End-to-end Neural Network Architectures and Data Augmentation Methods for Automatic Infant Motility Assessment Using Wearable Sensors
AUTHORS: Manu Airaksinen ; Sampsa Vanhatalo ; Okko Räsänen
CATEGORY: cs.CV [cs.CV, cs.HC]
HIGHLIGHT: This study investigates the use of different end-to-end neural network architectures for processing infant motility data from wearable sensors.
16, TITLE: Unsupervised Image Segmentation By Mutual Information Maximization and Adversarial Regularization
AUTHORS: S. Ehsan Mirsadeghi ; Ali Royat ; Hamid Rezatofighi
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel fully unsupervised semantic segmentation method, the so-called Information Maximization and Adversarial Regularization Segmentation (InMARS).
17, TITLE: Aerial Map-Based Navigation Using Semantic Segmentation and Pattern Matching
AUTHORS: Youngjoo Kim
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: This paper proposes a novel approach to map-based navigation for unmanned aircraft.
18, TITLE: 1st Place Solutions for UG2+ Challenge 2021 -- (Semi-)supervised Face Detection in The Low Light Condition
AUTHORS: Pengcheng Wang ; Lingqiao Ji ; Zhilong Ji ; Yuan Gao ; Xiao Liu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this technical report, we briefly introduce the solution of our team "TAL-ai" for (Semi-) supervised Face detection in the low light condition in UG2+ Challenge in CVPR 2021.
19, TITLE: Cooperative Training and Latent Space Data Augmentation for Robust Medical Image Segmentation
AUTHORS: CHEN CHEN et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, q-bio.QM]
HIGHLIGHT: In this paper, we present a cooperative framework for training image segmentation models and a latent space augmentation method for generating hard examples.
20, TITLE: Optical Braille Recognition Using Circular Hough Transform
AUTHORS: Zeba Khanam ; Atiya Usmani
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This gap has fuelled researchers to propose Optical Braille Recognition techniques to convert Braille documents to natural language.
21, TITLE: Cross-view Geo-localization with Evolving Transformer
AUTHORS: Hongji Yang ; Xiufan Lu ; Yingying Zhu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we address the problem of cross-view geo-localization, which estimates the geospatial location of a street view image by matching it with a database of geo-tagged aerial images.
22, TITLE: Mixed Supervision Learning for Whole Slide Image Classification
AUTHORS: JIAHUI LI et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To handle those problems, we propose a mixed supervision learning framework for super high-resolution images to effectively utilize their various labels (e.g., sufficient image-level coarse annotations and a few pixel-level fine labels).
23, TITLE: MSN: Multi-Style Network for Trajectory Prediction
AUTHORS: Conghao Wong ; Beihao Xia ; Qinmu Peng ; Xinge You
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose the Multi-Style Network (MSN) to focus on this problem by adaptively dividing agents' preference styles into several hidden behavior categories and training each category's prediction network separately, thereby giving agents all styles of predictions simultaneously.
24, TITLE: Blind Image Super-Resolution Via Contrastive Representation Learning
AUTHORS: Jiahui Zhang ; Shijian Lu ; Fangneng Zhan ; Yingchen Yu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: The recent blind SR studies address this issue via degradation estimation, but they do not generalize well to multi-source degradation and cannot handle spatially variant degradation.
25, TITLE: Intrinsic Image Transfer for Illumination Manipulation
AUTHORS: Junqing Huang ; Michael Ruzhansky ; Qianying Zhang ; Haihui Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper presents a novel intrinsic image transfer (IIT) algorithm for illumination manipulation, which creates a local image translation between two illumination surfaces.
26, TITLE: Ensemble of Loss Functions to Improve Generalizability of Deep Metric Learning Methods
AUTHORS: Davood Zabihzadeh
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, 68T07 (primary), 68T05, 68T45 (secondary), I.2.6]
HIGHLIGHT: To address these challenges, we propose novel approaches to combine different losses built on top of a shared deep feature extractor.
27, TITLE: A Survey on Deep Learning Technique for Video Segmentation
AUTHORS: Wenguan Wang ; Tianfei Zhou ; Fatih Porikli ; David Crandall ; Luc Van Gool
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this survey, we comprehensively review two basic lines of research in this area, i.e., generic object segmentation (of unknown categories) in videos and video semantic segmentation, by introducing their respective task settings, background concepts, perceived need, development history, and main challenges.
28, TITLE: Collaborative Visual Navigation
AUTHORS: Haiyang Wang ; Wenguan Wang ; Xizhou Zhu ; Jifeng Dai ; Liwei Wang
CATEGORY: cs.CV [cs.CV, cs.AI, cs.RO]
HIGHLIGHT: To narrow this gap and emphasize the crucial role of perception in MAS, we propose a large-scale 3D dataset, CollaVN, for multi-agent visual navigation (MAVN).
29, TITLE: HO-3D_v3: Improving The Accuracy of Hand-Object Annotations of The HO-3D Dataset
AUTHORS: Shreyas Hampali ; Sayan Deb Sarkar ; Vincent Lepetit
CATEGORY: cs.CV [cs.CV, cs.HC]
HIGHLIGHT: In this report, we elaborate on the improvements to the HOnnotate method and provide evaluations to compare the accuracy of HO-3D_v2 and HO-3D_v3.
30, TITLE: Audio-visual Attentive Fusion for Continuous Emotion Recognition
AUTHORS: Su Zhang ; Yi Ding ; Ziquan Wei ; Cuntai Guan
CATEGORY: cs.CV [cs.CV, cs.MM]
HIGHLIGHT: We propose an audio-visual spatial-temporal deep neural network with: (1) a visual block containing a pretrained 2D-CNN followed by a temporal convolutional network (TCN); (2) an aural block containing several parallel TCNs; and (3) a leader-follower attentive fusion block combining the audio-visual information.
31, TITLE: Long-Short Ensemble Network for Bipolar Manic-Euthymic State Recognition Based on Wrist-worn Sensors
AUTHORS: ULYSSE CÔTÉ-ALLARD et al.
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: This paper proposes a new deep learning-based ensemble method leveraging long (20h) and short (5 minutes) time-intervals to discriminate between the mood-states.
32, TITLE: Mitigating Uncertainty of Classifier for Unsupervised Domain Adaptation
AUTHORS: Shanu Kumar ; Vinod Kumar Kurmi ; Praphul Singh ; Vinay P Namboodiri
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this paper, we thoroughly examine the role of a classifier in terms of matching source and target distributions.
33, TITLE: ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
AUTHORS: Chen Dun ; Cameron R. Wolfe ; Christopher M. Jermaine ; Anastasios Kyrillidis
CATEGORY: cs.LG [cs.LG, cs.CV, cs.DC, math.OC]
HIGHLIGHT: We propose ResIST, a novel distributed training protocol for Residual Networks (ResNets).
34, TITLE: Rapid Neural Architecture Search By Learning to Generate Graphs from Datasets
AUTHORS: Hayeon Lee ; Eunyoung Hyung ; Sung Ju Hwang
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this paper, we propose an efficient NAS framework that is trained once on a database consisting of datasets and pretrained networks and can rapidly search for a neural architecture for a novel dataset.
35, TITLE: SIMILAR: Submodular Information Measures Based Active Learning In Realistic Scenarios
AUTHORS: Suraj Kothawade ; Nathan Beck ; Krishnateja Killamsetty ; Rishabh Iyer
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this work, we propose SIMILAR (Submodular Information Measures based actIve LeARning), a unified active learning framework using recently proposed submodular information measures (SIM) as acquisition functions.
36, TITLE: Overcoming Obstructions Via Bandwidth-Limited Multi-Agent Spatial Handshaking
AUTHORS: Nathaniel Glaser ; Yen-Cheng Liu ; Junjiao Tian ; Zsolt Kira
CATEGORY: cs.RO [cs.RO, cs.CV, cs.MA]
HIGHLIGHT: In this paper, we address bandwidth-limited and obstruction-prone collaborative perception, specifically in the context of multi-agent semantic segmentation.
37, TITLE: Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions
AUTHORS: Motonari Kambara ; Komei Sugiura
CATEGORY: cs.RO [cs.RO, cs.CL, cs.CV]
HIGHLIGHT: In this paper, our aim is to augment the datasets based on a crossmodal language generation model.
38, TITLE: Target-dependent UNITER: A Transformer-Based Multimodal Language Comprehension Model for Domestic Service Robots
AUTHORS: Shintaro Ishikawa ; Komei Sugiura
CATEGORY: cs.RO [cs.RO, cs.CL, cs.CV]
HIGHLIGHT: In this paper, we propose Target-dependent UNITER, which learns the relationship between the target object and other objects directly by focusing on the relevant regions within an image, rather than the whole image.
39, TITLE: Enhancing Multi-Robot Perception Via Learned Data Association
AUTHORS: Nathaniel Glaser ; Yen-Cheng Liu ; Junjiao Tian ; Zsolt Kira
CATEGORY: cs.RO [cs.RO, cs.CV, cs.MA]
HIGHLIGHT: In this paper, we address the multi-robot collaborative perception problem, specifically in the context of multi-view infilling for distributed semantic segmentation.
40, TITLE: Simpler, Faster, Stronger: Breaking The Log-K Curse On Contrastive Learners With FlatNCE
AUTHORS: JUNYA CHEN et al.
CATEGORY: stat.ML [stat.ML, cs.AI, cs.CV, cs.IT, cs.LG, math.IT]
HIGHLIGHT: In this work, we reveal mathematically why contrastive learners fail in the small-batch-size regime, and present a novel simple, non-trivial contrastive objective named FlatNCE, which fixes this issue.
41, TITLE: On Measuring and Controlling The Spectral Bias of The Deep Image Prior
AUTHORS: Zenglin Shi ; Pascal Mettes ; Subhransu Maji ; Cees G. M. Snoek
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: We present a Lipschitz-controlled approach for the convolution and a Gaussian-controlled approach for the upsampling layer.
42, TITLE: LensID: A CNN-RNN-Based Framework Towards Lens Irregularity Detection in Cataract Surgery Videos
AUTHORS: NEGIN GHAMSARIAN et al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we propose a novel framework as the major step towards lens irregularity detection.