This column collects and accumulates papers in computer vision. Date: July 15, 2021. Source: Paper Digest.
Follow the original WeChat official account 【计算机视觉联盟】 and reply 【西瓜书手推笔记】 to get my handwritten machine learning notes!
Direct link to the notes: handwritten machine learning notes (GitHub).
1, TITLE: Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering
AUTHORS: RAJAT KONER et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose Graphhopper, a novel method that approaches the task by integrating knowledge graph reasoning, computer vision, and natural language processing techniques.
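The highlight does not spell out the reasoning procedure; as a rough, hedged illustration of multi-hop traversal over a scene graph (the toy graph, relation names, and question are invented for the example, and the actual Graphhopper agent is learned rather than rule-based), a small Python sketch with networkx:

```python
# Toy illustration of multi-hop reasoning over a scene graph (not the
# Graphhopper model itself; graph, relations, and query are made up).
import networkx as nx

# Build a tiny scene graph: nodes are objects, edges carry relations.
G = nx.DiGraph()
G.add_edge("woman", "horse", relation="riding")
G.add_edge("horse", "field", relation="standing_in")
G.add_edge("woman", "hat", relation="wearing")

def hop(graph, node, relation):
    """Follow one outgoing edge labeled with the given relation, if any."""
    for _, dst, data in graph.out_edges(node, data=True):
        if data["relation"] == relation:
            return dst
    return None

# Question: "Where is the animal the woman is riding?"
# Answered as two hops: woman --riding--> ? --standing_in--> ?
animal = hop(G, "woman", "riding")     # -> "horse"
place = hop(G, animal, "standing_in")  # -> "field"
print(place)
```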
2, TITLE: Synthesis in Style: Semantic Segmentation of Historical Documents Using Synthetic Data
AUTHORS: Christian Bartz ; Hendrik Rätz ; Haojin Yang ; Joseph Bethge ; Christoph Meinel
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we propose a novel method for the synthesis of training data for semantic segmentation of document images.
3, TITLE: AdvFilter: Predictive Perturbation-aware Filtering Against Adversarial Attack Via Multi-domain Learning
AUTHORS: YIHAO HUANG et al.
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: To address this problem, we propose predictive perturbation-aware pixel-wise filtering, where dual-perturbation filtering and an uncertainty-aware fusion module are designed and employed to automatically perceive the perturbation amplitude during the training and testing process.
4, TITLE: Artificial Intelligence in PET: An Industry Perspective
AUTHORS: ARKADIUSZ SITEK et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: This paper provides an overview of these industry-specific challenges for the development, standardization, commercialization, and clinical adoption of AI, and explores the potential enhancements to PET imaging brought on by AI in the near future.
5, TITLE: Few-shot Neural Human Performance Rendering from Sparse RGBD Videos
AUTHORS: ANQI PANG et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We introduce a two-branch neural blending to combine the neural point renderer and the classical graphics texturing pipeline, which integrates reliable observations over sparse key-frames.
6, TITLE: Dynamic Event Camera Calibration
AUTHORS: Kun Huang ; Yifu Wang ; Laurent Kneip
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: We present the first dynamic event camera calibration algorithm.
7, TITLE: Unsupervised Neural Rendering for Image Hazing
AUTHORS: BOYUN LI et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To this end, we propose a neural rendering method for image hazing, dubbed HazeGEN.
8, TITLE: GREN: Graph-Regularized Embedding Network for Weakly-Supervised Disease Localization in X-ray Images
AUTHORS: BAOLIAN QI et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we argue that the cross-region and cross-image relationship, as contextual and compensating information, is vital to obtain more consistent and integral regions.
9, TITLE: Semi-Supervised Hypothesis Transfer for Source-Free Domain Adaptation
AUTHORS: NING MA et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address this issue, we propose a novel adaptation method via hypothesis transfer without accessing source data at the adaptation stage.
10, TITLE: BiSTF: Bilateral-Branch Self-Training Framework for Semi-Supervised Large-scale Fine-Grained Recognition
AUTHORS: Hao Chang ; Guochen Xie ; Jun Yu ; Qiang Ling
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose Bilateral-Branch Self-Training Framework (BiSTF), a simple yet effective framework to improve existing semi-supervised learning methods on class-imbalanced and domain-shifted fine-grained data.
11, TITLE: Detection of Abnormal Behavior with Self-Supervised Gaze Estimation
AUTHORS: Suneung-Kim ; Seong-Whan Lee
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: In this paper, i) we present a single video-conferencing solution that uses gaze estimation in preparation for these problems; ii) for anomaly detection, we present a new dataset that aggregates gaze, head-pose, and related values; iii) we train Multi-Layer Perceptron (MLP) models on the newly created data to detect anomalous behavior with deep learning.
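As a hedged sketch of step iii) only, the snippet below trains a small MLP on per-frame gaze/head-pose feature vectors; the feature dimension, layer sizes, and dummy data are assumptions, not the paper's dataset or architecture.

```python
# Minimal sketch of an MLP anomaly classifier over gaze/head-pose features.
# Feature dimension, layer sizes, and the synthetic batch are assumptions.
import torch
import torch.nn as nn

class GazeMLP(nn.Module):
    def __init__(self, in_dim=6, hidden=64):  # e.g. gaze (x, y) + head pose (yaw, pitch, roll) + blink
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),              # normal vs. abnormal
        )

    def forward(self, x):
        return self.net(x)

model = GazeMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch standing in for per-frame feature vectors and labels.
features = torch.randn(32, 6)
labels = torch.randint(0, 2, (32,))

logits = model(features)
loss = loss_fn(logits, labels)
loss.backward()
opt.step()
```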
12, TITLE: Graph Jigsaw Learning for Cartoon Face Recognition
AUTHORS: Yong Li ; Lingjie Lao ; Zhen Cui ; Shiguang Shan ; Jian Yang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To mitigate this issue, we propose the GraphJigsaw that constructs jigsaw puzzles at various stages in the classification network and solves the puzzles with the graph convolutional network (GCN) in a progressive manner.
13, TITLE: Self-Supervised Multi-Modal Alignment for Whole Body Medical Imaging
AUTHORS: Rhydian Windsor ; Amir Jamaludin ; Timor Kadir ; Andrew Zisserman
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We make three contributions: (i) We introduce a multi-modal image-matching contrastive framework that is able to learn to match different-modality scans of the same subject with high accuracy.
14, TITLE: SurgeonAssist-Net: Towards Context-Aware Head-Mounted Display-Based Augmented Reality for Surgical Guidance
AUTHORS: Mitchell Doughty ; Karan Singh ; Nilesh R. Ghugre
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present SurgeonAssist-Net: a lightweight framework that makes action- and workflow-driven virtual assistance for a set of predefined surgical tasks accessible to commercially available optical see-through head-mounted displays (OST-HMDs). To demonstrate the feasibility of our approach for inference on the HoloLens 2, we created a sample dataset that included video of several surgical tasks recorded from a user-centric point of view.
15, TITLE: A Convolutional Neural Network Approach to The Classification of Engineering Models
AUTHORS: Bharadwaj Manda ; Pranjal Bhaskare ; Ramanathan Muthuganapathy
CATEGORY: cs.CV [cs.CV, cs.AI, cs.GR, cs.LG]
HIGHLIGHT: This paper presents a deep learning approach for the classification of Engineering (CAD) models using Convolutional Neural Networks (CNNs).
16, TITLE: How Much Can CLIP Benefit Vision-and-Language Tasks?
AUTHORS: SHENG SHEN et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.CL, cs.LG]
HIGHLIGHT: To further study the advantage brought by CLIP, we propose to use CLIP as the visual encoder in various V&L models in two typical scenarios: 1) plugging CLIP into task-specific fine-tuning; 2) combining CLIP with V&L pre-training and transferring to downstream tasks.
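A minimal sketch of using the CLIP vision tower as the visual encoder, assuming the Hugging Face transformers CLIP implementation; the checkpoint name, the toy answer head, and the choice to freeze the encoder are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of "plugging CLIP in" as the visual encoder of a V&L model: the CLIP
# vision tower produces image features consumed by a downstream head (here a
# toy answer classifier). In a real V&L model these features are fused with
# text features before classification.
import torch
import torch.nn as nn
from transformers import CLIPVisionModel, CLIPImageProcessor
from PIL import Image

vision = CLIPVisionModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Fine-tune or freeze the encoder depending on the scenario; frozen here
# only to keep the sketch light.
for p in vision.parameters():
    p.requires_grad = False

answer_head = nn.Linear(vision.config.hidden_size, 3129)  # e.g. a VQA answer vocab of 3129 classes (assumed)

image = Image.new("RGB", (224, 224))  # placeholder image
pixel_values = processor(images=image, return_tensors="pt").pixel_values
with torch.no_grad():
    feats = vision(pixel_values=pixel_values).pooler_output  # (1, hidden_size)
logits = answer_head(feats)
```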
17, TITLE: HDMapNet: An Online HD Map Construction and Evaluation Framework
AUTHORS: Qi Li ; Yue Wang ; Yilun Wang ; Hang Zhao
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we argue that online map learning, which dynamically constructs the HD maps based on local sensor observations, is a more scalable way to provide semantic and geometry priors to self-driving vehicles than traditional pre-annotated HD maps.
18, TITLE: Domain Generalization with Pseudo-Domain Label for Face Anti-Spoofing
AUTHORS: Young Eun Kim ; Seong-Whan Lee
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we propose a method that enables the network to judge its domain by itself, using clustered convolutional feature statistics from its intermediate layers, without labeling domains by dataset.
19, TITLE: Deep Learning Based Novel View Synthesis
AUTHORS: Amit More ; Subhasis Chaudhuri
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a deep convolutional neural network (CNN) which learns to predict novel views of a scene from a given collection of images.
20, TITLE: Uncertainty-Guided Mixup for Semi-Supervised Domain Adaptation Without Source Data
AUTHORS: Ning Ma ; Jiajun Bu ; Zhen Zhang ; Sheng Zhou
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: More specifically, we propose uncertainty-guided Mixup to reduce the representation's intra-domain discrepancy and perform inter-domain alignment without directly accessing the source data.
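As a hedged illustration of the idea (not necessarily the paper's exact formulation), one can scale per-sample Mixup coefficients by normalized predictive entropy so that uncertain target samples contribute less to the mixed batch:

```python
# Generic sketch of uncertainty-guided Mixup: samples with low predictive
# entropy receive larger mixing weights. Illustrative only.
import torch
import torch.nn.functional as F

def uncertainty_guided_mixup(x_a, x_b, logits_b, base_lam=0.6):
    """Mix batch x_a with x_b; the share taken from x_b shrinks as its
    predictive uncertainty (normalized entropy) grows."""
    probs = F.softmax(logits_b, dim=1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    uncertainty = entropy / torch.log(torch.tensor(float(probs.size(1))))  # in [0, 1]
    lam = base_lam * (1.0 - uncertainty)          # per-sample weight for x_b
    lam = lam.view(-1, 1, 1, 1)
    return (1.0 - lam) * x_a + lam * x_b, lam

# Example with dummy image batches and dummy target-domain logits.
x_src_like = torch.randn(8, 3, 32, 32)
x_tgt = torch.randn(8, 3, 32, 32)
logits_tgt = torch.randn(8, 10)
mixed, lam = uncertainty_guided_mixup(x_src_like, x_tgt, logits_tgt)
```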
21, TITLE: Multi-Label Generalized Zero Shot Learning for The Classification of Disease in Chest Radiographs
AUTHORS: Nasir Hayat ; Hazem Lashen ; Farah E. Shamout
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Here, we propose a multi-label generalized zero shot learning (CXR-ML-GZSL) network that can simultaneously predict multiple seen and unseen diseases in CXR images.
22, TITLE: DVMN: Dense Validity Mask Network for Depth Completion
AUTHORS: Laurenz Reichardt ; Patrick Mangat ; Oliver Wasenmüller
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: To this end, we introduce a novel layer with spatially variant and content-dependent dilation to include additional data from the sparse input.
23, TITLE: Faces in The Wild: Efficient Gender Recognition in Surveillance Conditions
AUTHORS: Tiago Roxo ; Hugo Proença
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To overcome these limitations, we: 1) present frontal and wild face versions of three well-known surveillance datasets; and 2) propose a model that effectively and dynamically combines facial and body information, which makes it suitable for gender recognition in wild conditions.
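A minimal sketch of dynamically weighting facial against body cues with a learned gate; the embedding dimensions and the gating design are assumptions rather than the published architecture.

```python
# Hedged sketch of combining face and body embeddings with a learned gate:
# when the face cue is unreliable (e.g. surveillance conditions), the gate
# can lean on the body embedding instead.
import torch
import torch.nn as nn

class GatedFaceBodyFusion(nn.Module):
    def __init__(self, face_dim=512, body_dim=512, num_classes=2):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(face_dim + body_dim, 1), nn.Sigmoid())
        self.classifier = nn.Linear(face_dim, num_classes)  # assumes face_dim == body_dim

    def forward(self, face_feat, body_feat):
        g = self.gate(torch.cat([face_feat, body_feat], dim=1))  # trust placed in the face cue
        fused = g * face_feat + (1.0 - g) * body_feat
        return self.classifier(fused)

model = GatedFaceBodyFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 512))  # dummy embeddings
```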
24, TITLE: MSFNet: Multi-scale Features Network for Monocular Depth Estimation
AUTHORS: Meiqi Pei
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To this end, we design a Multi-scale Features Network (MSFNet), which consists of an Enhanced Diverse Attention (EDA) module and an Upsample-Stage Fusion (USF) module.
25, TITLE: PDC: Piecewise Depth Completion Utilizing Superpixels
AUTHORS: Dennis Teutscher ; Patrick Mangat ; Oliver Wasenmüller
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Thus, we propose our novel Piecewise Depth Completion (PDC), which works completely without deep learning.
26, TITLE: BRIMA: Low-overhead BRowser-only IMage Annotation Tool (Preprint)
AUTHORS: Tuomo Lahtinen ; Hannu Turtiainen ; Andrei Costin
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In order to address such challenges, we develop and present BRIMA -- a flexible and open-source browser extension that allows BRowser-only IMage Annotation at considerably lower overheads.
27, TITLE: Real-Time Pothole Detection Using Deep Learning
AUTHORS: Anas Al Shaghouri ; Rami Alkhatib ; Samir Berjaoui
CATEGORY: cs.CV [cs.CV, cs.LG, I.2; I.4]
HIGHLIGHT: Real-Time Pothole Detection Using Deep Learning
28, TITLE: Developmental Stage Classification of Embryos Using Two-Stream Neural Network with Linear-Chain Conditional Random Field
AUTHORS: STANISLAV LUKYANENKO et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a two-stream model for developmental stage classification.
29, TITLE: Generative and Reproducible Benchmarks for Comprehensive Evaluation of Machine Learning Classifiers
AUTHORS: Patryk Orzechowski ; Jason H. Moore
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV, cs.NE, stat.ML, 68T09 (Primary) 62R07, 68-04, 68-11 (Secondary), I.5.2; I.1.2; I.5.1; I.6.5; I.2.0; G.1.6]
HIGHLIGHT: Here, we introduce the DIverse and GENerative ML Benchmark (DIGEN) - a collection of synthetic datasets for comprehensive, reproducible, and interpretable benchmarking of machine learning algorithms for classification of binary outcomes.
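The DIGEN generator itself is not reproduced here; as a hedged illustration of the reproducible-benchmarking workflow it supports, one can score several scikit-learn classifiers on a seeded synthetic binary-outcome dataset:

```python
# Illustration of benchmarking classifiers on synthetic binary-outcome data
# with scikit-learn; DIGEN's own dataset generator differs and is not shown.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Fixed seeds keep the benchmark reproducible.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           random_state=42)

classifiers = {
    "logreg": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: AUC = {scores.mean():.3f} +/- {scores.std():.3f}")
```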
30, TITLE: AID-Purifier: A Light Auxiliary Network for Boosting Adversarial Defense
AUTHORS: Duhun Hwang ; Eunjung Lee ; Wonjong Rhee
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: We propose an AID-purifier that can boost the robustness of adversarially-trained networks by purifying their inputs.
31, TITLE: The Foes of Neural Network's Data Efficiency Among Unnecessary Input Dimensions
AUTHORS: Vanessa D'Amario ; Sanjana Srivastava ; Tomotake Sasaki ; Xavier Boix
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this letter, we investigate the impact of unnecessary input dimensions on a central issue of DNNs: their data efficiency, i.e.
32, TITLE: Deep Neural Networks Are Surprisingly Reversible: A Baseline for Zero-Shot Inversion
AUTHORS: Xin Dong ; Hongxu Yin ; Jose M. Alvarez ; Jan Kautz ; Pavlo Molchanov
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: This paper presents a zero-shot direct model inversion framework that recovers the input to the trained model given only the internal representation.
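The paper's contribution is a direct (non-iterative) inversion; for context only, the sketch below shows the classic optimization-based feature-inversion baseline on a toy encoder, a common point of comparison for such direct methods. The model and sizes are made-up assumptions.

```python
# Classic optimization-based feature inversion (reference only; this is NOT
# the paper's direct inversion method). Recover an input whose internal
# representation matches an observed one.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(),
                        nn.Linear(128, 64))
encoder.eval()

x_true = torch.rand(1, 1, 28, 28)        # the "unknown" input
with torch.no_grad():
    target_repr = encoder(x_true)        # the internal representation we observe

x_hat = torch.rand(1, 1, 28, 28, requires_grad=True)
opt = torch.optim.Adam([x_hat], lr=0.05)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(encoder(x_hat), target_repr)
    loss.backward()
    opt.step()
    x_hat.data.clamp_(0.0, 1.0)          # keep pixels in a valid range
```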
33, TITLE: Transformer with Peak Suppression and Knowledge Guidance for Fine-grained Image Recognition
AUTHORS: Xinda Liu ; Lili Wang ; Xiaoguang Han
CATEGORY: cs.MM [cs.MM, cs.CV, eess.IV]
HIGHLIGHT: In this paper, we analyze the difficulties of fine-grained image recognition from a new perspective and propose a transformer architecture with the peak suppression module and knowledge guidance module, which respects the diversification of discriminative features in a single image and the aggregation of discriminative clues among multiple images.
34, TITLE: RCLC: ROI-based Joint Conventional and Learning Video Compression
AUTHORS: Trinh Man Hoang ; Jinjia Zhou
CATEGORY: cs.MM [cs.MM, cs.CV, eess.IV]
HIGHLIGHT: Noting that the background information rarely changes in most remote-meeting cases, we introduce a Region-Of-Interest (ROI) based video compression framework (named RCLC) that leverages cutting-edge learning-based and conventional technologies.
35, TITLE: High-Speed and High-Quality Text-to-Lip Generation
AUTHORS: Jinglin Liu ; Zhiying Zhu ; Yi Ren ; Zhou Zhao
CATEGORY: cs.MM [cs.MM, cs.CV]
HIGHLIGHT: In this work, we propose a novel parallel decoding model for high-speed and high-quality text-to-lip generation (HH-T2L).
36, TITLE: Probabilistic Human Motion Prediction Via A Bayesian Neural Network
AUTHORS: Jie Xu ; Xingyu Chen ; Xuguang Lan ; Nanning Zheng
CATEGORY: cs.RO [cs.RO, cs.CV]
HIGHLIGHT: To solve this problem, we propose a probabilistic model for human motion prediction in this paper.
37, TITLE: Multi-Attention Generative Adversarial Network for Remote Sensing Image Super-Resolution
AUTHORS: Meng Xu ; Zhihao Wang ; Jiasong Zhu ; Xiuping Jia ; Sen Jia
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we propose a network based on the generative adversarial network (GAN) to generate high resolution remote sensing images, named the multi-attention generative adversarial network (MA-GAN).
38, TITLE: Learned Image Compression with Discretized Gaussian-Laplacian-Logistic Mixture Model and Concatenated Residual Modules
AUTHORS: HAISHENG FU et al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we propose a more flexible discretized Gaussian-Laplacian-Logistic mixture model (GLLMM) for the latent representations, which can adapt to different contents in different images and different regions of one image more accurately.
39, TITLE: RCDNet: An Interpretable Rain Convolutional Dictionary Network for Single Image Deraining
AUTHORS: Hong Wang ; Qi Xie ; Qian Zhao ; Yong Liang ; Deyu Meng
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: To handle such an ill-posed single image deraining task, in this paper, we specifically build a novel deep architecture, called rain convolutional dictionary network (RCDNet), which embeds the intrinsic priors of rain streaks and has clear interpretability.
40, TITLE: End-to-end Ultrasound Frame to Volume Registration
AUTHORS: Hengtao Guo ; Xuanang Xu ; Sheng Xu ; Bradford J. Wood ; Pingkun Yan
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we propose an end-to-end frame-to-volume registration network (FVR-Net), which can efficiently bridge the previous research gaps by aligning a 2D TRUS frame with a 3D TRUS volume without requiring hardware tracking.
41, TITLE: Hierarchical Analysis of Visual COVID-19 Features from Chest Radiographs
AUTHORS: SHRUTHI BANNUR et al.
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: To address some of those shortcomings, we model radiological features with a human-interpretable class hierarchy that aligns with the radiological decision process.