This column collects computer vision papers. Date: June 29, 2021. Source: Paper Digest.
1, TITLE: The Deep Neural Network Based Photometry Framework for Wide Field Small Aperture Telescopes
AUTHORS: Peng Jia ; Yongyang Sun ; Qiang Liu
CATEGORY: astro-ph.IM [astro-ph.IM, astro-ph.GA, astro-ph.SR, cs.CV]
HIGHLIGHT: The photometry framework proposed in this paper could be used as an end-to-end quick data processing framework for WFSATs, which can further increase response speed and scientific outputs of WFSATs.
2, TITLE: Visual Conceptual Blending with Large-scale Language and Vision Models
AUTHORS: Songwei Ge ; Devi Parikh
CATEGORY: cs.CL [cs.CL, cs.AI, cs.CV]
HIGHLIGHT: Given an arbitrary object, we identify a relevant object and generate a single-sentence description of the blend of the two using a language model.
3, TITLE: UMIC: An Unreferenced Metric for Image Captioning Via Contrastive Learning
AUTHORS: Hwanhee Lee ; Seunghyun Yoon ; Franck Dernoncourt ; Trung Bui ; Kyomin Jung
CATEGORY: cs.CL [cs.CL, cs.CV]
HIGHLIGHT: In this paper, we introduce a new metric UMIC, an Unreferenced Metric for Image Captioning which does not require reference captions to evaluate image captions. We release the benchmark dataset and pre-trained models to compute the UMIC.
4, TITLE: Exploring Temporal Context and Human Movement Dynamics for Online Action Detection in Videos
AUTHORS: Vasiliki I. Vasileiou ; Nikolaos Kardaris ; Petros Maragos
CATEGORY: cs.CV [cs.CV, cs.HC, cs.RO]
HIGHLIGHT: In this paper, based on the recently proposed framework of Temporal Recurrent Networks, we explore how temporal context and human movement dynamics can be effectively employed for online action detection.
5, TITLE: Semantics-aware Multi-modal Domain Translation: From LiDAR Point Clouds to Panoramic Color Images
AUTHORS: Tiago Cortinhal ; Fatih Kurnaz ; Eren Aksoy
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we present a simple yet effective framework to address the domain translation problem between different sensor modalities with unique data formats.
6, TITLE: Dataset Bias Mitigation Through Analysis of CNN Training Scores
AUTHORS: Ekberjan Derman
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we proposed a novel, domain-independent approach, called score-based resampling (SBR), to locate the under-represented samples of the original training dataset based on the model prediction scores obtained with that training set.
7, TITLE: Hyperspectral Remote Sensing Image Classification Based on Multi-scale Cross Graphic Convolution
AUTHORS: Yunsong Zhao ; Yin Li ; Zhihan Chen ; Tianchong Qiu ; Guojin Liu
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: To fully mine and utilize image features, a new multi-scale feature-mining learning algorithm (MGRNet) is proposed.
8, TITLE: In-N-Out: Towards Good Initialization for Inpainting and Outpainting
AUTHORS: Changho Jo ; Woobin Im ; Sung-Eui Yoon
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In experiments, we compare our method to the traditional procedure and analyze the effectiveness of our method on different applications: image inpainting, image extrapolation, and environment map estimation.
9, TITLE: Spectral-Spatial Graph Reasoning Network for Hyperspectral Image Classification
AUTHORS: Di Wang ; Bo Du ; Liangpei Zhang
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: In this paper, we propose a spectral-spatial graph reasoning network (SSGRN) for hyperspectral image (HSI) classification.
10, TITLE: Scene Uncertainty and The Wellington Posterior of Deterministic Image Classifiers
AUTHORS: Stephanie Tsuei ; Aditya Golatkar ; Stefano Soatto
CATEGORY: cs.CV [cs.CV, cs.LG, stat.ML]
HIGHLIGHT: We propose a method to estimate the uncertainty of the outcome of an image classifier on a given input datum.
11, TITLE: Real-Time Multi-View 3D Human Pose Estimation Using Semantic Feedback to Smart Edge Sensors
AUTHORS: Simon Bultmann ; Sven Behnke
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: We present a novel method for estimation of 3D human poses from a multi-camera setup, employing distributed smart edge sensors coupled with a backend through a semantic feedback loop.
12, TITLE: K-Net: Towards Unified Image Segmentation
AUTHORS: Wenwei Zhang ; Jiangmiao Pang ; Kai Chen ; Chen Change Loy
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: This paper presents a unified, simple, and effective framework for these essentially similar tasks.
13, TITLE: SDOF-Tracker: Fast and Accurate Multiple Human Tracking By Skipped-Detection and Optical-Flow
AUTHORS: Hitoshi Nishimura ; Satoshi Komorita ; Yasutomo Kawanishi ; Hiroshi Murase
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a method that complements the detection results with optical flow, based on the fact that someone's appearance does not change much between adjacent frames.
14, TITLE: ShapeEditer: A StyleGAN Encoder for Face Swapping
AUTHORS: Shuai Yang ; Kai Qiao
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we propose a novel encoder, called ShapeEditor, for high-resolution, realistic and high-fidelity face exchange. In addition, for learning to map into the latent space of StyleGAN, we propose a set of self-supervised loss functions with which the training data do not need to be labeled manually.
15, TITLE: Descriptive Modeling of Textiles Using FE Simulations and Deep Learning
AUTHORS: Arturo Mendoza ; Roger Trullo ; Yanneck Wielhorski
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work we propose a novel and fully automated method for extracting the yarn geometrical features in woven composites so that a direct parametrization of the textile reinforcement is achieved (e.g., FE mesh).
16, TITLE: Change Detection for Geodatabase Updating
AUTHORS: Rongjun Qin
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: This article aims to provide an overview of the state-of-the-art change detection methods in the field of Remote Sensing and Geomatics to support the task of updating geodatabases.
17, TITLE: Learning Mesh Representations Via Binary Space Partitioning Tree Networks
AUTHORS: Zhiqin Chen ; Andrea Tagliasacchi ; Hao Zhang
CATEGORY: cs.CV [cs.CV, cs.GR, cs.LG]
HIGHLIGHT: We overcome these challenges by employing a classical spatial data structure from computer graphics, Binary Space Partitioning (BSP), to facilitate 3D learning.
18, TITLE: Learning Without Forgetting for 3D Point Cloud Objects
AUTHORS: Townim Chowdhury ; Mahira Jalisha ; Ali Cheraghian ; Shafin Rahman
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, considering the growth of depth camera technology, we address the same problem for the 3D point cloud object data.
19, TITLE: 3D Reconstruction Through Fusion of Cross-View Images
AUTHORS: Rongjun Qin ; Shuang Song ; Xiao Ling ; Mostafa Elhashash
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this chapter, the authors utilize the imaging geometry and present approaches that perform 3D reconstruction from cross-view images that are drastically different in their viewpoints.
20, TITLE: Geometric Processing for Image-based 3D Object Modeling
AUTHORS: Rongjun Qin ; Xu Huang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This article summarizes the overall geometric processing workflow, with a focus on introducing the state-of-the-art methods of three major components of geometric processing: 1) geo-referencing; 2) image dense matching; 3) texture mapping.
21, TITLE: Saying The Unseen: Video Descriptions Via Dialog Agents
AUTHORS: Ye Zhu ; Yu Wu ; Yi Yang ; Yan Yan
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: As a step towards the more practical application scenarios, we introduce a novel task that aims to describe a video using the natural language dialog between two agents as a supplementary information source given incomplete visual data.
22, TITLE: Interflow: Aggregating Multi-layer Feature Mappings with Attention Mechanism
AUTHORS: Zhicheng Cai
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper proposes the Interflow algorithm specially for traditional CNN models.
23, TITLE: Blind Non-Uniform Motion Deblurring Using Atrous Spatial Pyramid Deformable Convolution and Deblurring-Reblurring Consistency
AUTHORS: Dong Huo ; Abbas Masoumzadeh ; Yee-Hong Yang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a new architecture which consists of multiple Atrous Spatial Pyramid Deformable Convolution (ASPDC) modules to deblur an image end-to-end with more flexibility.
24, TITLE: Identifying High Accuracy Regions in Traffic Camera Images to Enhance The Estimation of Road Traffic Metrics: A Quadtree Based Method
AUTHORS: Yue Lin ; Ningchuan Xiao
CATEGORY: cs.CV [cs.CV, cs.CY]
HIGHLIGHT: In this work, a quadtree based algorithm is developed to continuously partition the image extent until only regions with high detection accuracy remain.
25, TITLE: Fast Computation of Mutual Information in The Frequency Domain with Applications to Global Multimodal Image Alignment
AUTHORS: Johan Öfverstedt ; Joakim Lindblad ; Nataša Sladoje
CATEGORY: cs.CV [cs.CV, 92C55, 94A08, 94A15, 94A17, 68U10, 68W01]
HIGHLIGHT: We propose an efficient algorithm for computing MI for all discrete displacements (formalized as the cross-mutual information function (CMIF)), which is based on cross-correlation computed in the frequency domain.
26, TITLE: Representation Based Regression for Object Distance Estimation
AUTHORS: Mete Ahishali ; Mehmet Yamac ; Serkan Kiranyaz ; Moncef Gabbouj
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: In this study, we propose a novel approach to predict the distances of the detected objects in an observed scene.
27, TITLE: Fractal Pyramid Networks
AUTHORS: Zhiqiang Deng ; Huimin Yu ; Yangqi Long
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We propose a new network architecture, the Fractal Pyramid Networks (PFNs) for pixel-wise prediction tasks as an alternative to the widely used encoder-decoder structure.
28, TITLE: Mining Atmospheric Data
AUTHORS: Chaabane Djeraba ; Jérôme Riedi
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: The first issue relates to building new public datasets and benchmarks, which are a hot priority of the remote sensing community.
29, TITLE: CLIPDraw: Exploring Text-to-Drawing Synthesis Through Language-Image Encoders
AUTHORS: Kevin Frans ; L. B. Soros ; Olaf Witkowski
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This work presents CLIPDraw, an algorithm that synthesizes novel drawings based on natural language input.
30, TITLE: Iris Presentation Attack Detection By Attention-based and Deep Pixel-wise Binary Supervision Network
AUTHORS: Meiling Fang ; Naser Damer ; Fadi Boutros ; Florian Kirchbuchner ; Arjan Kuijper
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Hence, we propose a novel attention-based deep pixel-wise binary supervision (A-PBS) method.
31, TITLE: CAMS: Color-Aware Multi-Style Transfer
AUTHORS: Mahmoud Afifi ; Abdullah Abuolaim ; Mostafa Hussien ; Marcus A. Brubaker ; Michael S. Brown
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a color-aware multi-style transfer method that generates aesthetically pleasing results while preserving the style-color correlation between style and generated images.
32, TITLE: Dual-Stream Reciprocal Disentanglement Learning for Domain Adaption Person Re-Identification
AUTHORS: Huafeng Li et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To tackle this problem, in this paper we propose a novel method named Dual-stream Reciprocal Disentanglement Learning (DRDL), which is quite efficient in learning domain-invariant features.
33, TITLE: Inverting and Understanding Object Detectors
AUTHORS: Ang Cao ; Justin Johnson
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose using inversion as a primary tool to understand modern object detectors and develop an optimization-based approach to layout inversion, allowing us to generate synthetic images recognized by trained detectors as containing a desired configuration of objects.
34, TITLE: Domain Adaptive YOLO for One-Stage Cross-Domain Detection
AUTHORS: Shizhao Zhang ; Hongya Tuo ; Jian Hu ; Zhongliang Jing
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, a novel Domain Adaptive YOLO (DA-YOLO) is proposed to improve cross-domain performance for one-stage detectors.
35, TITLE: Darker Than Black-Box: Face Reconstruction from Similarity Queries
AUTHORS: Anton Razzhigaev ; Klim Kireev ; Igor Udovichenko ; Aleksandr Petiushko
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a novel approach that allows reconstructing the face querying only similarity scores of the black-box model.
36, TITLE: Progressive Class-based Expansion Learning For Image Classification
AUTHORS: Hui Wang ; Hanbin Zhao ; Xi Li
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel image process scheme called class-based expansion learning for image classification, which aims at improving the supervision-stimulation frequency for the samples of the confusing classes.
37, TITLE: Multi-Compound Transformer for Accurate Biomedical Image Segmentation
AUTHORS: Yuanfeng Ji et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we tackle the above issues by proposing a unified transformer network, termed Multi-Compound Transformer (MCTrans), which incorporates rich feature learning and semantic structure mining into a unified framework.
38, TITLE: Explicit Clothing Modeling for An Animatable Full-Body Avatar
AUTHORS: Donglai Xiang et al.
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: To address the difficulties, we propose a method to build an animatable clothed body avatar with an explicit representation of the clothing on the upper body from multi-view captured videos.
39, TITLE: HDMapGen: A Hierarchical Graph Generative Model of High Definition Maps
AUTHORS: Lu Mi et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we explore several autoregressive models using different data representations, including sequence, plain graph, and hierarchical graph.
40, TITLE: Early Convolutions Help Transformers See Better
AUTHORS: Tete Xiao et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we conjecture that the issue lies with the patchify stem of ViT models, which is implemented by a stride-p p×p convolution (p=16 by default) applied to the input image.
41, TITLE: Rethinking Token-Mixing MLP for MLP-based Vision Backbone
AUTHORS: Tan Yu ; Xu Li ; Yunfeng Cai ; Mingming Sun ; Ping Li
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we re-think the design of the token-mixing MLP.
42, TITLE: Rail-5k: A Real-World Dataset for Rail Surface Defects Detection
AUTHORS: Zihao Zhang ; Shaozuo Yu ; Siwei Yang ; Yu Zhou ; Bingchen Zhao
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper presents the Rail-5k dataset for benchmarking the performance of visual algorithms in a real-world application scenario, namely the rail surface defects detection task.
43, TITLE: EARLIN: Early Out-of-Distribution Detection for Resource-efficient Collaborative Inference
AUTHORS: Sumaiya Tabassum Nimi ; Md Adnan Arefeen ; Md Yusuf Sarwar Uddin ; Yugyung Lee
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel lightweight OOD detection approach that mines important features from the shallow layers of a pretrained CNN model and detects an input sample as ID (In-Distribution) or OOD based on a distance function defined on the reduced feature space.
44, TITLE: A CNN Segmentation-Based Approach to Object Detection and Tracking in Ultrasound Scans with Application to The Vagus Nerve Detection
AUTHORS: Abdullah F. Al-Battal et al.
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: In this paper, we propose a deep learning framework to automatically detect and track a specific anatomical target structure in ultrasound scans.
45, TITLE: Recurrent Neural Network Transducer for Japanese and Chinese Offline Handwritten Text Recognition
AUTHORS: Trung Tan Ngo ; Hung Tuan Nguyen ; Nam Tuan Ly ; Masaki Nakagawa
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose an RNN-Transducer model for recognizing Japanese and Chinese offline handwritten text line images.
46, TITLE: Motion Projection Consistency Based 3D Human Pose Estimation with Virtual Bones from Monocular Videos
AUTHORS: Guangming Wang ; Honghao Zeng ; Ziliang Wang ; Zhe Liu ; Hesheng Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, the concept of virtual bones is proposed to solve such a challenge.
47, TITLE: Dizygotic Conditional Variational AutoEncoder for Multi-Modal and Partial Modality Absent Few-Shot Learning
AUTHORS: Yi Zhang ; Sheng Huang ; Xi Peng ; Dan Yang
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we present a novel multi-modal data augmentation approach named Dizygotic Conditional Variational AutoEncoder (DCVAE) for addressing the aforementioned issue.
48, TITLE: False Negative Reduction in Video Instance Segmentation Using Uncertainty Estimates
AUTHORS: Kira Maag
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we present a false negative detection method for image sequences based on inconsistencies in time series of tracked instances given the availability of image sequences in online applications.
49, TITLE: A More Compact Object Detector Head Network with Feature Enhancement and Relational Reasoning
AUTHORS: Wen Chao Zhang et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To this end, we propose a more compact object detector head network (CODH), which can not only preserve global context information and condense the information density, but also allows instance-wise feature enhancement and relational reasoning in a larger matrix space.
50, TITLE: Prior-Induced Information Alignment for Image Matting
AUTHORS: Yuhao Liu ; Jiake Xie ; Yu Qiao ; Yong Tang ; Xin Yang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel network named Prior-Induced Information Alignment Matting Network (PIIAMatting), which can efficiently model the distinction of pixel-wise response maps and the correlation of layer-wise feature maps.
51, TITLE: VAT-Mart: Learning Visual Action Trajectory Proposals for Manipulating 3D ARTiculated Objects
AUTHORS: Ruihai Wu et al.
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: In this paper, we propose object-centric actionable visual priors as a novel perception-interaction handshaking point that the perception system outputs more actionable guidance than kinematic structure estimation, by predicting dense geometry-aware, interaction-aware, and task-aware visual action affordance and trajectory proposals.
52, TITLE: Feature Combination Meets Attention: Baidu Soccer Embeddings and Transformer Based Temporal Detection
AUTHORS: Xin Zhou ; Le Kang ; Zhiyu Cheng ; Bo He ; Jingyu Xin
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this tech report, we present a two-stage paradigm to detect what and when events happen in soccer broadcast videos.
53, TITLE: Real-Time Human Pose Estimation on A Smart Walker Using Convolutional Neural Networks
AUTHORS: Manuel Palermo ; Sara Moccia ; Lucia Migliorelli ; Emanuele Frontoni ; Cristina P. Santos
CATEGORY: cs.CV [cs.CV, cs.HC]
HIGHLIGHT: We present a novel approach to patient monitoring and data-driven human-in-the-loop control in the context of smart walkers.
54, TITLE: OffRoadTranSeg: Semi-Supervised Segmentation Using Transformers on OffRoad Environments
AUTHORS: Anukriti Singh ; Kartikeya Singh ; P. B. Sujit
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: We present OffRoadTranSeg, the first end-to-end framework for semi-supervised segmentation in unstructured outdoor environments using transformers and automatic data selection for labelling.
55, TITLE: Building A Video-and-Language Dataset with Human Actions for Multimodal Logical Inference
AUTHORS: Riko Suzuki ; Hitomi Yanaka ; Koji Mineshima ; Daisuke Bekki
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper introduces a new video-and-language dataset with human actions for multimodal logical inference, which focuses on intentional and aspectual expressions that describe dynamic human actions.
56, TITLE: Multimodal Few-Shot Learning with Frozen Language Models
AUTHORS: Maria Tsimpoukelli et al.
CATEGORY: cs.CV [cs.CV, cs.CL, cs.LG]
HIGHLIGHT: Here, we present a simple, yet effective, approach for transferring this few-shot learning ability to a multimodal setting (vision and language).
57, TITLE: Semi-Supervised Raw-to-Raw Mapping
AUTHORS: Mahmoud Afifi ; Abdullah Abuolaim
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: Specifically, we present a semi-supervised raw-to-raw mapping method trained on a small set of paired images alongside an unpaired set of images captured by each camera device. We have generated a new dataset of raw images from two different smartphone cameras as part of this effort.
58, TITLE: Unsupervised Discovery of Actions in Instructional Videos
AUTHORS: AJ Piergiovanni ; Anelia Angelova ; Michael S. Ryoo ; Irfan Essa
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we address the problem of automatically discovering atomic actions in an unsupervised manner from instructional videos.
59, TITLE: Image Content Dependent Semi-fragile Watermarking with Localized Tamper Detection
AUTHORS: Samira Hosseini ; Mojtaba Mahdavi
CATEGORY: cs.CV [cs.CV, cs.MM]
HIGHLIGHT: In this paper, to achieve the objectives of semi-fragile watermarking techniques, a method is proposed that avoids the mentioned shortcomings.
60, TITLE: A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning
AUTHORS: Pan Zhou ; Caiming Xiong ; Xiao-Tong Yuan ; Steven Hoi
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, math.OC]
HIGHLIGHT: Inspired by this theory, we propose a novel self-labeling refinement approach for contrastive learning.
61, TITLE: Post-Training Quantization for Vision Transformer
AUTHORS: Zhenhua Liu ; Yunhe Wang ; Kai Han ; Siwei Ma ; Wen Gao
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present an effective post-training quantization algorithm for reducing the memory storage and computational costs of vision transformers.
62, TITLE: Dataset and Benchmarking of Real-Time Embedded Object Detection for RoboCup SSL
AUTHORS: Roberto Fernandes ; Walber M. Rodrigues ; Edna Barros
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: This paper presents an open-source dataset to be used as a benchmark for real-time object detection in SSL.
63, TITLE: Hear Me Out: Fusional Approaches for Audio Augmented Temporal Action Localization
AUTHORS: Anurag Bagchi ; Jazib Mahmood ; Dolton Fernandes ; Ravi Kiran Sarvadevabhatla
CATEGORY: cs.CV [cs.CV, cs.MM]
HIGHLIGHT: In this paper, we propose simple but effective fusion-based approaches for TAL.
64, TITLE: One-Shot Affordance Detection
AUTHORS: Hongchen Luo ; Wei Zhai ; Jing Zhang ; Yang Cao ; Dacheng Tao
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To empower robots with this ability in unseen scenarios, we consider the challenging one-shot affordance detection problem in this paper, i.e., given a support image that depicts the action purpose, all objects in a scene with the common affordance should be detected. Besides, we build a Purpose-driven Affordance Dataset (PAD) by collecting and labeling 4k images from 31 affordance and 72 object categories.
65, TITLE: Attention-guided Progressive Mapping for Profile Face Recognition
AUTHORS: Junyang Huang ; Changxing Ding
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present a method for progressively transforming profile face representations to the canonical pose with an attentive pair-wise loss.
66, TITLE: R2RNet: Low-light Image Enhancement Via Real-low to Real-normal Network
AUTHORS: Jiang Hai et al.
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: In this paper, we propose a novel Real-low to Real-normal Network for low-light image enhancement, dubbed R2RNet, based on the Retinex theory, which includes three subnets: a Decom-Net, a Denoise-Net, and a Relight-Net. Unlike most previous methods trained on synthetic images, we collect the first Large-Scale Real-World paired low/normal-light images dataset (LSRW dataset) for training.
67, TITLE: Robust Pose Transfer with Dynamic Details Using Neural Video Rendering
AUTHORS: Yang-Tian Sun et al.
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: In this paper, we demonstrate that the dynamic details can be preserved even when trained from short monocular videos.
68, TITLE: Semi-supervised Semantic Segmentation with Directional Context-aware Consistency
AUTHORS: Xin Lai et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Therefore, in this paper, we focus on the semi-supervised segmentation problem where only a small set of labeled data is provided with a much larger collection of totally unlabeled images.
69, TITLE: Radar Voxel Fusion for 3D Object Detection
AUTHORS: Felix Nobis ; Ehsan Shafiei ; Phillip Karle ; Johannes Betz ; Markus Lienkamp
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper develops a low-level sensor fusion network for 3D object detection, which fuses lidar, camera, and radar data.
70, TITLE: The Story in Your Eyes: An Individual-difference-aware Model for Cross-person Gaze Estimation
AUTHORS: Jun Bao ; Buyu Liu ; Jun Yu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a novel method for refining the cross-person gaze prediction task with eye/face images only by explicitly modelling the person-specific differences.
71, TITLE: Memory Guided Road Detection
AUTHORS: Praveen Venkatesh ; Rwik Rana ; Varun Jain
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose an architecture that allows us to increase the speed and robustness of road detection without a large hit in accuracy by introducing an underlying shared feature space that is propagated over time, which serves as a flowing dynamic memory.
72, TITLE: Generalized Zero-Shot Learning Using Multimodal Variational Auto-Encoder with Semantic Concepts
AUTHORS: Nihar Bendre ; Kevin Desai ; Peyman Najafirad
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: To overcome this problem, we propose a Multimodal Variational Auto-Encoder (M-VAE) which can learn the shared latent space of image features and the semantic space.
73, TITLE: DONet: Learning Category-Level 6D Object Pose and Size Estimation from Depth Observation
AUTHORS: Haitao Lin et al.
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image, without external pose-annotated real-world training data.
74, TITLE: Learning to Solve Geometric Construction Problems from Images
AUTHORS: J. Macke ; J. Sedlar ; M. Olsak ; J. Urban ; J. Sivic
CATEGORY: cs.CV [cs.CV, cs.AI, cs.CG, cs.LG, cs.LO]
HIGHLIGHT: We describe a purely image-based method for finding geometric constructions with a ruler and compass in the Euclidea geometric game.
75, TITLE: Mitigating Severe Over-parameterization in Deep Convolutional Neural Networks Through Forced Feature Abstraction and Compression with An Entropy-based Heuristic
AUTHORS: Nidhi Gowdra ; Roopak Sinha ; Stephen MacDonell ; Wei Qi Yan
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we propose an Entropy-Based Convolutional Layer Estimation (EBCLE) heuristic which is robust and simple, yet effective in resolving the problem of over-parameterization with regard to the network depth of CNN models.
76, TITLE: DenseTNT: Waymo Open Dataset Motion Prediction Challenge 1st Place Solution
AUTHORS: Junru Gu ; Qiao Sun ; Hang Zhao
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: In this work, we propose an anchor-free model, named DenseTNT, which performs dense goal probability estimation for trajectory prediction.
77, TITLE: A Diffeomorphic Aging Model for Adult Human Brain from Cross-Sectional Data
AUTHORS: Alphin J Thottupattu ; Jayanthi Sivaswamy ; Venkateswaran P. Krishnan
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a method to develop an aging model for a given population, in the absence of longitudinal data, by using images from different subjects at different time points, the so-called cross-sectional data.
78, TITLE: Few-Shot Domain Expansion for Face Anti-Spoofing
AUTHORS: Bowen Yang ; Jing Zhang ; Zhenfei Yin ; Jing Shao
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address the problem, this paper proposes a Style transfer-based Augmentation for Semantic Alignment (SASA) framework.
79, TITLE: Indoor Panorama Planar 3D Reconstruction Via Divide and Conquer
AUTHORS: Cheng Sun ; Chi-Wei Hsiao ; Ning-Hsu Wang ; Min Sun ; Hwann-Tzong Chen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We thus propose a yaw-invariant V-planar reparameterization for CNNs to learn. We create a benchmark for indoor panorama planar reconstruction by extending existing 360 depth datasets with ground truth H&V-planes (referred to as PanoH&V dataset) and adopt state-of-the-art planar reconstruction methods to predict H&V-planes as our baselines.
80, TITLE: Privacy-Preserving Image Acquisition Using Trainable Optical Kernel
AUTHORS: Yamin Sepehri ; Pedram Pad ; Pascal Frossard ; L. Andrea Dunbar
CATEGORY: cs.CV [cs.CV, cs.CR, cs.LG, eess.IV, I.2.10; I.5.0]
HIGHLIGHT: In this work, for the first time, we propose a trainable image acquisition method that removes the sensitive identity revealing information in the optical domain before it reaches the image sensor.
81, TITLE: Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering Based on Scene Graphs
AUTHORS: Daniel Reich ; Felix Putze ; Tanja Schultz
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: With the expressed goal of improving system transparency and visual grounding in the reasoning process in VQA, we present a modular system for the task of compositional VQA based on scene graphs.
82, TITLE: Real-time 3D Object Detection Using Feature Map Flow
AUTHORS: Youshaa Murhij ; Dmitry Yudin
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we present a real-time 3D detection approach considering time-spatial feature map aggregation from different time steps of deep neural model inference (named feature map flow, FMF).
83, TITLE: Image Classification with CondenseNeXt for ARM-Based Computing Platforms
AUTHORS: Priyank Kalgaonkar ; Mohamed El-Sharkawy
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this paper, we demonstrate the implementation of our ultra-efficient deep convolutional neural network architecture: CondenseNeXt on NXP BlueBox, an autonomous driving development platform developed for self-driving vehicles.
84, TITLE: Cheating Detection Pipeline for Online Interviews and Exams
AUTHORS: Azmi Can Özgen ; Mahiye Uluyağmur Öztürk ; Umut Bayraktar
CATEGORY: cs.CV [cs.CV, cs.AI, cs.HC, cs.LG, cs.MM]
HIGHLIGHT: In this work, we present a cheating analysis pipeline for online interviews and exams. To evaluate the performance of the pipeline we collected a private video dataset.
85, TITLE: An Image Classifier Can Suffice Video Understanding
AUTHORS: Quanfu Fan ; Chun-Fu Chen ; Rameswar Panda
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a new perspective on video understanding by casting the video recognition problem as an image recognition task.
86, TITLE: Semi-Supervised Deep Ensembles for Blind Image Quality Assessment
AUTHORS: Zhihua Wang ; Dingquan Li ; Kede Ma
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: Here we investigate a semi-supervised ensemble learning strategy to produce generalizable blind image quality assessment models.
87, TITLE: Making Images Real Again: A Comprehensive Survey on Deep Image Composition
AUTHORS: Li Niu et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this survey, we summarize the datasets and methods for the above research directions.
88, TITLE: Contrastive Counterfactual Visual Explanations With Overdetermination
AUTHORS: Adam White et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: A novel explainable AI method called CLEAR Image is introduced in this paper.
89, TITLE: A Graph-based Approach to Derive The Geodesic Distance on Statistical Manifolds: Application to Multimedia Information Retrieval
AUTHORS: Zakariae Abbad ; Ahmed Drissi El Maliani ; Said Ouatik El Alaoui ; Mohammed El Hassouni
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we leverage the properties of non-Euclidean Geometry to define the Geodesic distance (GD) on the space of statistical manifolds.
90, TITLE: Speech2Properties2Gestures: Gesture-Property Prediction As A Tool for Generating Representational Gestures from Speech
AUTHORS: TARAS KUCHERENKO et. al.
CATEGORY: cs.HC [cs.HC, cs.CV, cs.GR, cs.LG, I.2.7; I.2.6; I.3.7]
HIGHLIGHT: We propose a new framework for gesture generation, aiming to allow data-driven approaches to produce more semantically rich gestures.
91, TITLE: Core Challenges in Embodied Vision-Language Planning
AUTHORS: JONATHAN FRANCIS et. al.
CATEGORY: cs.LG [cs.LG, cs.CL, cs.CV]
HIGHLIGHT: In this survey paper, we discuss Embodied Vision-Language Planning (EVLP) tasks, a family of prominent embodied navigation and manipulation problems that jointly use computer vision and natural language.
92, TITLE: Self-paced Principal Component Analysis
AUTHORS: Zhao Kang ; Hongfei Liu ; Jiangxin Li ; Xiaofeng Zhu ; Ling Tian
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV, stat.ML]
HIGHLIGHT: Based on this principle, we propose a novel method called Self-paced PCA (SPCA) to further reduce the effect of noise and outliers.
93, TITLE: Understanding Dynamics of Nonlinear Representation Learning and Its Application
AUTHORS: Kenji Kawaguchi ; Linjun Zhang ; Zhun Deng
CATEGORY: cs.LG [cs.LG, cs.CV, math.OC, stat.ML]
HIGHLIGHT: In this paper, we study the dynamics of such implicit nonlinear representation learning.
94, TITLE: Deep Learning for Technical Document Classification
AUTHORS: Shuo Jiang ; Jianxi Luo ; Jie Hu ; Christopher L. Magee
CATEGORY: cs.LG [cs.LG, cs.CV, cs.IR]
HIGHLIGHT: This paper describes a novel multimodal deep learning architecture, called TechDoc, for technical document classification, which utilizes both natural language and descriptive images to train hierarchical classifiers.
95, TITLE: Deep Learning Image Recognition for Non-images
AUTHORS: Boris Kovalerchuk ; Divya Chandrika Kalla ; Bedant Agarwal
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: The CPC-R algorithm presented in this chapter converts non-image data into images by visualizing non-image data.
96, TITLE: Co$^2$L: Contrastive Continual Learning
AUTHORS: Hyuntak Cha ; Jaeho Lee ; Jinwoo Shin
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this paper, we found that a similar result holds in the continual learning context: contrastively learned representations are more robust against catastrophic forgetting than jointly trained representations.
97, TITLE: Midpoint Regularization: from High Uncertainty Training to Conservative Classification
AUTHORS: Hongyu Guo
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: We extend label smoothing (LS) by considering example pairs, an approach coined PLS.
98, TITLE: Machine Learning Detection Algorithm for Large Barkhausen Jumps in Cluttered Environment
AUTHORS: Roger Alimi ; Amir Ivry ; Elad Fisher ; Eyal Weiss
CATEGORY: cs.LG [cs.LG, cs.CV, physics.ins-det]
HIGHLIGHT: This work fills this gap and presents algorithms that distinguish dc jumps embedded in natural magnetic field data.
99, TITLE: Domain Conditional Predictors for Domain Adaptation
AUTHORS: Joao Monteiro ; Xavier Gibert ; Jianqiao Feng ; Vincent Dumoulin ; Dar-Shyang Lee
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: In this contribution, we tackle the problem of generalizing across data sources by approaching it from the opposite direction: we consider a conditional modeling approach in which predictions, in addition to being dependent on the input data, use information relative to the underlying data-generating distribution.
100, TITLE: FreeTickets: Accurate, Robust and Efficient Deep Ensemble By Training with Dynamic Sparsity
AUTHORS: SHIWEI LIU et. al.
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this work, we attempt to address this cost-reducing problem by introducing the FreeTickets concept, as the first solution which can boost the performance of sparse convolutional neural networks over their dense network equivalents by a large margin, while using for complete training only a fraction of the computational resources required by the latter.
101, TITLE: Vision-driven Compliant Manipulation for Reliable, High-Precision Assembly Tasks
AUTHORS: ANDREW S. MORGAN et. al.
CATEGORY: cs.RO [cs.RO, cs.AI, cs.CV, cs.SY, eess.SY]
HIGHLIGHT: This paper describes in detail the system components and showcases its efficacy with extensive experiments involving tight tolerance peg-in-hole insertion tasks of various geometries as well as open-world constrained placement tasks.
102, TITLE: Kimera-Multi: Robust, Distributed, Dense Metric-Semantic SLAM for Multi-Robot Systems
AUTHORS: YULUN TIAN et. al.
CATEGORY: cs.RO [cs.RO, cs.CV, cs.MA]
HIGHLIGHT: This paper presents Kimera-Multi, the first multi-robot system that (i) is robust and capable of identifying and rejecting incorrect inter and intra-robot loop closures resulting from perceptual aliasing, (ii) is fully distributed and only relies on local (peer-to-peer) communication to achieve distributed localization and mapping, and (iii) builds a globally consistent metric-semantic 3D mesh model of the environment in real-time, where faces of the mesh are annotated with semantic labels.
103, TITLE: Disentangling Semantic Features of Macromolecules in Cryo-Electron Tomography
AUTHORS: Kai Yi ; Jianye Pang ; Yungeng Zhang ; Xiangrui Zeng ; Min Xu
CATEGORY: q-bio.BM [q-bio.BM, cs.CV]
HIGHLIGHT: This paper addresses the problem by proposing a 3D Spatial Variational Autoencoder that explicitly disentangles the structure, orientation, and shift of macromolecules.
104, TITLE: Functional Classwise Principal Component Analysis: A Novel Classification Framework
AUTHORS: Avishek Chatterjee ; Satyaki Mazumder ; Koel Das
CATEGORY: stat.ML [stat.ML, cs.CV, cs.LG]
HIGHLIGHT: In this paper, we present a novel classification framework using functional data and classwise Principal Component Analysis (PCA).
105, TITLE: Tiled Sparse Coding in Eigenspaces for The COVID-19 Diagnosis in Chest X-ray Images
AUTHORS: Juan E. Arco ; Andrés Ortiz ; Javier Ramírez ; Juan M Gorriz
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: In this work, we propose a classification framework based on sparse coding in order to identify the pneumonia patterns associated with different pathologies.
106, TITLE: Using Deep Learning to Detect Patients at Risk for Prostate Cancer Despite Benign Biopsies
AUTHORS: BOJING LIU et. al.
CATEGORY: eess.IV [eess.IV, cs.CV, q-bio.QM]
HIGHLIGHT: As a proof-of-principle, we developed and validated a deep convolutional neural network model to distinguish between morphological patterns in benign prostate biopsy whole slide images from men with and without established cancer.
107, TITLE: MTrans: Multi-Modal Transformer for Accelerated MR Imaging
AUTHORS: CHUN-MEI FENG et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: To this end, we propose a multi-modal transformer (MTrans), which is capable of transferring multi-scale features from the target modality to the auxiliary modality, for accelerated MR imaging.
108, TITLE: Knee Osteoarthritis Severity Prediction Using An Attentive Multi-Scale Deep Convolutional Neural Network
AUTHORS: Rohit Kumar Jain ; Prasen Kumar Sharma ; Sibaji Gaj ; Arijit Sur ; Palash Ghosh
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: This paper presents a deep learning-based framework, namely OsteoHRNet, that automatically assesses the Knee OA severity in terms of Kellgren and Lawrence (KL) grade classification from X-rays.
109, TITLE: Learning Stochastic Object Models from Medical Imaging Measurements By Use of Advanced AmbientGANs
AUTHORS: Weimin Zhou ; Sayantan Bhadra ; Frank J. Brooks ; Hua Li ; Mark A. Anastasio
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG, stat.ML]
HIGHLIGHT: To circumvent this, in this work, a modified AmbientGAN training strategy is proposed that is suitable for modern progressive or multi-resolution training approaches such as employed in the Progressive Growing of GANs and Style-based GANs.
110, TITLE: A Machine Learning Model for Early Detection of Diabetic Foot Using Thermogram Images
AUTHORS: AMITH KHANDAKAR et. al.
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: We have compared a machine learning-based scoring technique with feature selection and optimization techniques and learning classifiers to several state-of-the-art Convolutional Neural Networks (CNNs) on foot thermogram images and propose a robust solution to identify the diabetic foot.
111, TITLE: Progressive Joint Low-light Enhancement and Noise Removal for Raw Images
AUTHORS: Yucheng Lu ; Seung-Won Jung
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: To tackle this problem, in this paper, we propose a low-light image processing framework that performs joint illumination adjustment, color enhancement, and denoising.
112, TITLE: A 3D CNN Network with BERT For Automatic COVID-19 Diagnosis From CT-Scan Images
AUTHORS: Weijun Tan ; Jingfeng Liu
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: We present an automatic COVID-19 diagnosis framework from lung CT-scan slice images.
113, TITLE: Benchmarking Convolutional Neural Networks for Diagnosing Lyme Disease from Images
AUTHORS: SK IMRAN HOSSAIN et. al.
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: The main objective of this study is to extensively analyze the effectiveness of CNNs for diagnosing Lyme disease from images and to find out the best CNN architecture for the purpose.
114, TITLE: Weighted Multi-level Deep Learning Analysis and Framework for Processing Breast Cancer WSIs
AUTHORS: Peter Bokor ; Lukas Hudec ; Ondrej Fabian ; Wanda Benesova
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG, I.4.6; I.4.10]
HIGHLIGHT: We present a deep learning-based solution and framework for processing WSI based on a novel approach utilizing the advantages of image levels.
115, TITLE: An XAI Approach to Deep Learning Models in The Detection of Ductal Carcinoma in Situ
AUTHORS: Michele La Ferla ; Matthew Montebello ; Dylan Seychell
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: Following the research by Montavon and Binder, we used the DeepTaylor Layer-wise Relevance Propagation (LRP) model to highlight those pixels and regions within a mammogram which contribute most to its classification.
116, TITLE: BiX-NAS: Searching Efficient Bi-directional Architecture for Medical Image Segmentation
AUTHORS: XINYI WANG et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this work, we study a multi-scale upgrade of a bi-directional skip connected network and then automatically discover an efficient architecture by a novel two-phase Neural Architecture Search (NAS) algorithm, namely BiX-NAS.
117, TITLE: Residual Moment Loss for Medical Image Segmentation
AUTHORS: QUANZIANG WANG et. al.
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: In this paper, we propose a novel loss function, namely residual moment (RM) loss, to explicitly embed the location information of segmentation targets during the training of deep learning networks.
118, TITLE: ACN: Adversarial Co-training Network for Brain Tumor Segmentation with Missing Modalities
AUTHORS: YIXIN WANG et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we propose a novel Adversarial Co-training Network (ACN) to solve this issue, in which a series of independent yet related models are trained dedicated to each missing situation with significantly better results.