This column collects recent computer vision papers. Date: May 28, 2021. Source: Paper Digest.
1, TITLE: Blind Motion Deblurring Super-Resolution: When Dynamic Spatio-Temporal Learning Meets Static Image Understanding
AUTHORS: Wenjia Niu et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Blind Motion Deblurring Super-Resolution: When Dynamic Spatio-Temporal Learning Meets Static Image Understanding
2, TITLE: The Imaginative Generative Adversarial Network: Automatic Data Augmentation for Dynamic Skeleton-Based Hand Gesture and Human Action Recognition
AUTHORS: Junxiao Shen ; John Dudley ; Per Ola Kristensson
CATEGORY: cs.CV [cs.CV, cs.HC]
HIGHLIGHT: In this paper, we present a novel automatic data augmentation model, the Imaginative Generative Adversarial Network (GAN) that approximates the distribution of the input data and samples new data from this distribution.
3, TITLE: Pose2Drone: A Skeleton-Pose-based Framework For Human-Drone Interaction
AUTHORS: Zdravko Marinov et al.
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: In this paper, we introduce an HDI framework building upon skeleton-based pose estimation. To perform comprehensive experiments and quantitative analysis, we create a customized testing dataset.
4, TITLE: Dynamic Network Selection for The Object Detection Task: Why It Matters and What We (didn't) Achieve
AUTHORS: Emanuele Vitali ; Anton Lokhmotov ; Gianluca Palermo
CATEGORY: cs.CV [cs.CV, cs.DC]
HIGHLIGHT: In this paper, we want to show the potential benefit of a dynamic auto-tuning approach for the inference process in the Deep Neural Network (DNN) context, tackling the object detection challenge.
5, TITLE: Multi-Modal Semantic Inconsistency Detection in Social Media News Posts
AUTHORS: Scott McCrae ; Kehan Wang ; Avideh Zakhor
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we introduce a novel classification architecture for identifying semantic inconsistencies between video appearance and text caption in social media news posts.
6, TITLE: I3dLoc: Image-to-range Cross-domain Localization Robust to Inconsistent Environmental Conditions
AUTHORS: Peng Yin ; Lingyun Xu ; Ji Zhang ; Sebastian Scherer
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: We present a method for localizing a single camera with respect to a point cloud map in indoor and outdoor scenes.
7, TITLE: SSAN: Separable Self-Attention Network for Video Representation Learning
AUTHORS: Xudong Guo ; Xun Guo ; Yan Lu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a separable self-attention (SSA) module, which models spatial and temporal correlations sequentially, so that spatial contexts can be efficiently used in temporal modeling.
8, TITLE: 3D Segmentation Learning from Sparse Annotations and Hierarchical Descriptors
AUTHORS: Peng Yin ; Lingyun Xu ; Jianmin Ji
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: To alleviate manual efforts, we propose GIDSeg, a novel approach that can simultaneously learn segmentation from sparse annotations via reasoning global-regional structures and individual-vicinal properties.
9, TITLE: When Liebig's Barrel Meets Facial Landmark Detection: A Practical Model
AUTHORS: Haibo Jin ; Jinpeng Li ; Shengcai Liao ; Ling Shao
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In recent years, significant progress has been made in the research of facial landmark detection.
10, TITLE: RSCA: Real-time Segmentation-based Context-Aware Scene Text Detection
AUTHORS: Jiachen Li ; Yuan Lin ; Rongrong Liu ; Chiu Man Ho ; Humphrey Shi
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose RSCA: a Real-time Segmentation-based Context-Aware model for arbitrary-shaped scene text detection, which sets a strong baseline for scene text detection with two simple yet effective strategies: Local Context-Aware Upsampling and Dynamic Text-Spine Labeling, which model local spatial transformation and simplify label assignments separately.
11, TITLE: Benchmarking Scientific Image Forgery Detectors
AUTHORS: João P. Cardenuto ; Anderson Rocha
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To mitigate this bottleneck, we present an extendable open-source library that reproduces the most common image forgery operations reported by the research integrity community: duplication, retouching, and cleaning. Using this library and realistic scientific images, we create a large scientific forgery image benchmark (39,423 images) with an enriched ground-truth.
12, TITLE: Stylizing 3D Scene Via Implicit Representation and HyperNetwork
AUTHORS: Pei-Ze Chiang ; Meng-Shiun Tsai ; Hung-Yu Tseng ; Wei-sheng Lai ; Wei-Chen Chiu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we aim to address the 3D scene stylization problem - generating stylized images of the scene at arbitrary novel view angles.
13, TITLE: DSLR: Dynamic to Static LiDAR Scan Reconstruction Using Adversarially Trained Autoencoder
AUTHORS: Prashant Kumar et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.RO]
HIGHLIGHT: Using Unsupervised Domain Adaptation, we propose DSLR-UDA for transfer to real world data and experimentally show that this performs well in real world settings.
14, TITLE: How Saccadic Vision Might Help with The Interpretability of Deep Networks
AUTHORS: Iana Sereda ; Grigory Osipov
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We describe how some problems (interpretability, lack of object-orientedness) of modern deep networks potentially could be solved by adapting a biologically plausible saccadic mechanism of perception.
15, TITLE: YOLO5Face: Why Reinventing A Face Detector
AUTHORS: Delong Qi ; Weijun Tan ; Qi Yao ; Jingfeng Liu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We design detectors with different model sizes, from a large model to achieve the best performance, to a super small model for real-time detection on an embedded or mobile device.
16, TITLE: Unsupervised Adaptive Semantic Segmentation with Local Lipschitz Constraint
AUTHORS: Guanyu Cai ; Lianghua He
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To solve these problems, we propose a two-stage adaptive semantic segmentation method based on the local Lipschitz constraint that satisfies both domain alignment and domain-specific exploration under a unified principle.
17, TITLE: ICDAR 2021 Competition on Historical Map Segmentation
AUTHORS: Joseph Chazalon et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper presents the final results of the ICDAR 2021 Competition on Historical Map Segmentation (MapSeg), encouraging research on a series of historical atlases of Paris, France, drawn at 1/5000 scale between 1894 and 1937.
18, TITLE: An Online Learning System for Wireless Charging Alignment Using Surround-view Fisheye Cameras
AUTHORS: Ashok Dahal ; Varun Ravi Kumar ; Senthil Yogamani ; Ciaran Eising
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a system based on the surround-view camera architecture to detect, localize and automatically align the vehicle with the inductive chargepad.
19, TITLE: An Efficient Style Virtual Try on Network
AUTHORS: Shanchen Pang ; Xixi Tao ; Yukun Dong
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper proposed a new stylized virtual try on network, which can not only retain the authenticity of clothing texture and pattern, but also obtain the undifferentiated stylized try on.
20, TITLE: Self-Ensembling Contrastive Learning for Semi-Supervised Medical Image Segmentation
AUTHORS: Jinxi Xiang ; Zhuowei Li ; Wenji Wang ; Qing Xia ; Shaoting Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we aim to boost the performance of semi-supervised learning for medical image segmentation with limited labels using a self-ensembling contrastive learning technique.
21, TITLE: Image-Based Plant Wilting Estimation
AUTHORS: Changye Yang et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we examine plant wilting caused by bacterial infection.
22, TITLE: A Dataset for Provident Vehicle Detection at Night
AUTHORS: Sascha Saralajew et al.
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: In this paper, we study the problem of how to map this intuitive human behavior to computer vision algorithms to detect oncoming vehicles at night just from the light reflections they cause by their headlights. For that, we present an extensive open-source dataset containing 59746 annotated grayscale images out of 346 different scenes in a rural environment at night.
23, TITLE: Using Early-Learning Regularization to Classify Real-World Noisy Data
AUTHORS: Alessio Galatolo ; Alfred Nilsson ; Roderick Karlemstrand ; Yineng Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Liu et al. propose a technique called Early-Learning Regularization, which improves accuracy on the CIFAR datasets when label noise is present.
24, TITLE: Issues in Object Detection in Videos Using Common Single-Image CNNs
AUTHORS: Spencer Ploeger ; Lucas Dasovic
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: A method is proposed that can generate these datasets.
25, TITLE: ViPTT-Net: Video Pretraining of Spatio-temporal Model for Tuberculosis Type Classification from Chest CT Scans
AUTHORS: Hasib Zunair ; Aimon Rahman ; Nabeel Mohammed
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: To incorporate both spatial and temporal features, we develop a hybrid convolutional neural network (CNN) and recurrent neural network (RNN) model, where the features are extracted from each axial slice of the CT scan by a CNN, and this sequence of image features is input to an RNN for classification of the CT scan.
26, TITLE: DFPN: Deformable Frame Prediction Network
AUTHORS: M. Akın Yılmaz ; A. Murat Tekalp
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: To this effect, we propose a deformable frame prediction network (DFPN) for task oriented implicit motion modeling and next frame prediction.
27, TITLE: PSRR-MaxpoolNMS: Pyramid Shifted MaxpoolNMS with Relationship Recovery
AUTHORS: Tianyi Zhang ; Jie Lin ; Peng Hu ; Bin Zhao ; Mohamed M. Sabry Aly
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a general, parallelizable and configurable approach PSRR-MaxpoolNMS, to completely replace GreedyNMS at all stages in all detectors.
28, TITLE: Joint-DetNAS: Upgrade Your Detector with NAS, Pruning and Dynamic Distillation
AUTHORS: Lewei Yao et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We propose Joint-DetNAS, a unified NAS framework for object detection, which integrates 3 key components: Neural Architecture Search, pruning, and Knowledge Distillation.
29, TITLE: Tracking Without Re-recognition in Humans and Machines
AUTHORS: Drew Linsley et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: For this, we introduce PathTracker, a synthetic visual challenge that asks human observers and machines to track a target object in the midst of identical-looking "distractor" objects.
30, TITLE: Unsupervised Activity Segmentation By Joint Representation Learning and Online Clustering
AUTHORS: Sateesh Kumar et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present a novel approach for unsupervised activity segmentation, which uses video frame clustering as a pretext task and simultaneously performs representation learning and online clustering.
31, TITLE: Feature Reuse and Fusion for Real-time Semantic Segmentation
AUTHORS: Tan Sixiang
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We have conducted extensive experiments on two semantic segmentation benchmarks.
32, TITLE: COFGA: A Dataset for Fine Grained Classification of Objects from Aerial Imagery
AUTHORS: Eran Dahan ; Tzvi Diskin ; Amit Amram ; Amit Moryossef ; Omer Koren
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we introduce COFGA, a new open dataset for the advancement of fine-grained classification research.
33, TITLE: Computer Vision and Conflicting Values: Describing People with Automated Alt Text
AUTHORS: Margot Hanley ; Solon Barocas ; Karen Levy ; Shiri Azenkot ; Helen Nissenbaum
CATEGORY: cs.CY [cs.CY, cs.AI, cs.CV]
HIGHLIGHT: In this paper, we investigate the ethical dilemmas faced by companies that have adopted the use of computer vision for producing alt text: textual descriptions of images for blind and low vision people. We use Facebook's automatic alt text tool as our primary case study.
34, TITLE: Passing Multi-Channel Material Textures to A 3-Channel Loss
AUTHORS: Thomas Chambon ; Eric Heitz ; Laurent Belcour
CATEGORY: cs.GR [cs.GR, cs.CV, I.3.7; I.2.10]
HIGHLIGHT: Our objective is to compute a textural loss that can be used to train texture generators with multiple material channels typically used for physically based rendering such as albedo, normal, roughness, metalness, ambient occlusion, etc.
35, TITLE: Graph-Based Deep Learning for Medical Diagnosis and Analysis: Past, Present and Future
AUTHORS: David Ahmedt-Aristizabal ; Mohammad Ali Armin ; Simon Denman ; Clinton Fookes ; Lars Petersson
CATEGORY: cs.LG [cs.LG, cs.CV, q-bio.QM]
HIGHLIGHT: In this survey, we thoroughly review the different types of graph architectures and their applications in healthcare.
36, TITLE: Drawing Multiple Augmentation Samples Per Image During Training Efficiently Decreases Test Error
AUTHORS: Stanislav Fort ; Andrew Brock ; Razvan Pascanu ; Soham De ; Samuel L. Smith
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this work, we provide a detailed empirical evaluation of how the number of augmentation samples per unique image influences performance on held out data.
37, TITLE: Robust Navigation for Racing Drones Based on Imitation Learning and Modularization
AUTHORS: Tianqi Wang ; Dong Eui Chang
CATEGORY: cs.RO [cs.RO, cs.AI, cs.CV, cs.LG]
HIGHLIGHT: This paper presents a vision-based modularized drone racing navigation system that uses a customized convolutional neural network (CNN) for the perception module to produce high-level navigation commands and then leverages a state-of-the-art planner and controller to generate low-level control commands, thus exploiting the advantages of both data-based and model-based approaches.
38, TITLE: HDRUNet: Single Image HDR Reconstruction with Denoising and Dequantization
AUTHORS: Xiangyu Chen ; Yihao Liu ; Zhengwen Zhang ; Yu Qiao ; Chao Dong
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this work, we propose a novel learning-based approach using a spatially dynamic encoder-decoder network, HDRUNet, to learn an end-to-end mapping for single image HDR reconstruction with denoising and dequantization.
39, TITLE: Efficient High-Resolution Image-to-Image Translation Using Multi-Scale Gradient U-Net
AUTHORS: Kumarapu Laxman ; Shiv Ram Dubey ; Baddam Kalyan ; Satya Raj Vineel Kojjarapu
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we propose a Multi-Scale Gradient based U-Net (MSG U-Net) model for high-resolution image-to-image translation up to 2048×1024 resolution.
40, TITLE: Cardiac Segmentation on CT Images Through Shape-Aware Contour Attentions
AUTHORS: Sanguk Park ; Minyoung Chung
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, to improve the segmentation accuracy between proximate organs, we introduce a novel model to exploit shape and boundary-aware features.