This column collects papers in the field of computer vision. Date: July 12, 2021. Source: Paper Digest.
1, TITLE: Hoechst Is All You Need: Lymphocyte Classification with Deep Learning
AUTHORS: Jessica Cooper ; In Hwa Um ; Ognjen Arandjelović ; David J Harrison
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this work we show otherwise, training a deep convolutional neural network to identify cells expressing three proteins (T lymphocyte markers CD3 and CD8, and the B lymphocyte marker CD20) with greater than 90% precision and recall, from Hoechst 33342 stained tissue only.
2, TITLE: Action Unit Detection with Joint Adaptive Attention and Graph Relation
AUTHORS: Chenggong Zhang et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper describes an approach to facial action unit (AU) detection.
3, TITLE: Graph-based Deep Generative Modelling for Document Layout Generation
AUTHORS: Sanket Biswas ; Pau Riba ; Josep Lladós ; Umapada Pal
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this work, we have proposed an automated deep generative model using Graph Neural Networks (GNNs) to generate synthetic data with highly variable and plausible document layouts that can be used to train document interpretation systems, in this case, especially in digital mailroom applications.
4, TITLE: RGB Stream Is Enough for Temporal Action Detection
AUTHORS: Chenhao Wang ; Hongxiang Cai ; Yuxin Zou ; Yichao Xiong
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we argue that optical flow is dispensable in high-accuracy temporal action detection and image level data augmentation (ILDA) is the key solution to avoid performance degradation when optical flow is removed.
5, TITLE: Semantic Segmentation on Multiple Visual Domains
AUTHORS: Floris Naber
CATEGORY: cs.CV [cs.CV, cs.AI, eess.IV]
HIGHLIGHT: In this paper, a method for semantic segmentation across multiple visual domains is proposed for the datasets Cityscapes, SUIM and SUN RGB-D, by creating a label space that spans all classes of the datasets.
6, TITLE: Seven Basic Expression Recognition Using ResNet-18
AUTHORS: Satnam Singh ; Doris Schicker
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose to use a ResNet-18 architecture that was pre-trained on the FER+ dataset for tackling the problem of affective behavior analysis in-the-wild (ABAW) for classification of the seven basic expressions, namely, neutral, anger, disgust, fear, happiness, sadness and surprise.
7, TITLE: Score Refinement for Confidence-based 3D Multi-object Tracking
AUTHORS: Nuri Benbarka ; Jona Schröder ; Andreas Zell
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: Our work focuses on a neglected part of the tracking system: score refinement and tracklet termination.
8, TITLE: StyleCariGAN: Caricature Generation Via StyleGAN Feature Map Modulation
AUTHORS: Wonjong Jang et al.
CATEGORY: cs.CV [cs.CV, cs.GR, I.4.0]
HIGHLIGHT: We present a caricature generation framework based on shape and style manipulation using StyleGAN.
9, TITLE: Memes in The Wild: Assessing The Generalizability of The Hateful Memes Challenge Dataset
AUTHORS: Hannah Rose Kirk et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we collect hateful and non-hateful memes from Pinterest to evaluate out-of-sample performance on models pre-trained on the Facebook dataset.
10, TITLE: ViTGAN: Training GANs with Vision Transformers
AUTHORS: Kwonjoon Lee et al.
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: In this paper, we investigate if such observation can be extended to image generation.
11, TITLE: MutualEyeContact: A Conversation Analysis Tool with Focus on Eye Contact
AUTHORS: Alexander Schäfer ; Tomoko Isomura ; Gerd Reis ; Katsumi Watanabe ; Didier Stricker
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a tool called MutualEyeContact which excels in those tasks and can help scientists to understand the importance of (mutual) eye contact in social interactions.
12, TITLE: Semantic and Geometric Unfolding of StyleGAN Latent Space
AUTHORS: Mustafa Shukor ; Xu Yao ; Bharath Bhushan Damodaran ; Pierre Hellier
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We thus propose a new method to learn a proxy latent representation using normalizing flows to remedy these limitations, and show that this leads to a more efficient space for face image editing.
13, TITLE: Deep Image Synthesis from Intuitive User Input: A Review and Perspectives
AUTHORS: Yuan Xue et al.
CATEGORY: cs.CV [cs.CV, cs.GR, cs.LG]
HIGHLIGHT: This paper reviews recent works for image synthesis given intuitive user input, covering advances in input versatility, image generation methodology, benchmark datasets, and evaluation metrics.
14, TITLE: Learning Cascaded Detection Tasks with Weakly-Supervised Domain Adaptation
AUTHORS: Niklas Hanselmann ; Nick Schneider ; Benedikt Ortelt ; Andreas Geiger
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose a weakly supervised domain adaptation setting which exploits the structure of cascaded detection tasks.
15, TITLE: Prior-Guided Multi-View 3D Head Reconstruction
AUTHORS: Xueying Wang ; Yudong Guo ; Zhongqi Yang ; Juyong Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we consider this problem with a few multi-view portrait images as input.
16, TITLE: Gradient-Based Quantification of Epistemic Uncertainty for Deep Object Detectors
AUTHORS: Tobias Riedlinger ; Matthias Rottmann ; Marius Schubert ; Hanno Gottschalk
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Here, we introduce novel gradient-based uncertainty metrics and investigate them for different object detection architectures.
17, TITLE: Multi-Modal Association Based Grouping for Form Structure Extraction
AUTHORS: Milan Aggarwal ; Mausoom Sarkar ; Hiresh Gupta ; Balaji Krishnamurthy
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we present a novel multi-modal approach for form structure extraction. We also introduce our new rich human-annotated Forms Dataset.
18, TITLE: Unity Perception: Generate Synthetic Data for Computer Vision
AUTHORS: Steve Borkman et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We introduce the Unity Perception package which aims to simplify and accelerate the process of generating synthetic datasets for computer vision tasks by offering an easy-to-use and highly customizable toolset.
19, TITLE: Wavelet Transform-assisted Adaptive Generative Modeling for Colorization
AUTHORS: Jin Li ; Wanyun Li ; Zichen Xu ; Yuhao Wang ; Qiegen Liu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This study presents a novel scheme that exploits a score-based generative model in the wavelet domain to address the issue.
20, TITLE: Towards Robust General Medical Image Segmentation
AUTHORS: Laura Daza ; Juan C. Pérez ; Pablo Arbeláez
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a new framework to assess the robustness of general medical image segmentation systems. Our contributions are two-fold: (i) we propose a new benchmark to evaluate robustness in the context of the Medical Segmentation Decathlon (MSD) by extending the recent AutoAttack natural image classification framework to the domain of volumetric data segmentation, and (ii) we present a novel lattice architecture for RObust Generic medical image segmentation (ROG).
21, TITLE: Multitask Multi-database Emotion Recognition
AUTHORS: Manh Tu Vu ; Marie Beurton-Aimar
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this work, we introduce our submission to the 2nd Affective Behavior Analysis in-the-wild (ABAW) 2021 competition.
22, TITLE: Multimodal Icon Annotation For Mobile Applications
AUTHORS: Xiaoxue Zang ; Ying Xu ; Jindong Chen
CATEGORY: cs.CV [cs.CV, cs.AI, cs.HC]
HIGHLIGHT: We propose a novel deep learning based multi-modal approach that combines the benefits of both pixel and view hierarchy features as well as leverages the state-of-the-art object detection techniques. In order to demonstrate the utility provided, we create a high quality UI dataset by manually annotating the most commonly used 29 icons in Rico, a large scale mobile design dataset consisting of 72k UI screenshots.
23, TITLE: Cross-modal Attention for MRI and Ultrasound Volume Registration
AUTHORS: Xinrui Song et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper aims to develop a self-attention mechanism specifically for cross-modal image registration.
24, TITLE: Beyond Farthest Point Sampling in Point-Wise Analysis
AUTHORS: Yiqun Lin et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present a novel data-driven sampler learning strategy for point-wise analysis tasks.
25, TITLE: Effectiveness of State-of-the-Art Super Resolution Algorithms in Surveillance Environment
AUTHORS: Muhammad Ali Farooq ; Ammar Ali Khan ; Ansar Ahmad ; Rana Hammad Raza
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: For the proposed research work, we have inspected the effectiveness of four conventional yet effective SR algorithms and three deep learning-based SR algorithms to seek the finest method that executes well in a surveillance environment with limited training data options.
26, TITLE: White-Box Cartoonization Using An Extended GAN Framework
AUTHORS: Amey Thakur ; Hasan Rizvi ; Mega Satish
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In the present study, we propose to implement a new framework for estimating generative models via an adversarial process to extend an existing GAN framework and develop a white-box controllable image cartoonization, which can generate high-quality cartooned images/videos from real-world photos and videos.
27, TITLE: A Multi-task Mean Teacher for Semi-supervised Facial Affective Behavior Analysis
AUTHORS: Lingfeng Wang ; Shisen Wang
CATEGORY: cs.CV [cs.CV, cs.HC]
HIGHLIGHT: To boost its performance, this paper presents a multi-task mean teacher model for semi-supervised Affective Behavior Analysis to learn from missing labels and explore the learning of multiple correlated tasks simultaneously.
28, TITLE: Event-Based Feature Tracking in Continuous Time with Sliding Window Optimization
AUTHORS: Jason Chui ; Simon Klenk ; Daniel Cremers
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a novel method for continuous-time feature tracking in event cameras.
29, TITLE: Joint Matrix Decomposition for Deep Convolutional Neural Networks Compression
AUTHORS: Shaowu Chen ; Jihao Zhou ; Weize Sun ; Lei Huang
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In particular, three joint matrix decomposition schemes are developed, and the corresponding optimization approaches based on Singular Values Decomposition are proposed.
30, TITLE: Activated Gradients for Deep Neural Networks
AUTHORS: Mei Liu ; Liangming Chen ; Xiaohao Du ; Long Jin ; Mingsheng Shang
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, a novel method that applies a gradient activation function (GAF) to the gradient is proposed to handle these challenges.
31, TITLE: Fast Pixel-Matching for Video Object Segmentation
AUTHORS: Siyue Yu ; Jimin Xiao ; BingFeng Zhang ; Eng Gee Lim
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we aim to design a new model that strikes a good balance between speed and performance.
32, TITLE: Mutually-aware Sub-Graphs Differentiable Architecture Search
AUTHORS: Haoxian Tan ; Sheng Guo ; Yujie Zhong ; Weilin Huang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a conceptually simple yet efficient method to bridge these two paradigms, referred to as Mutually-aware Sub-Graphs Differentiable Architecture Search (MSG-DAS).
33, TITLE: JPGNet: Joint Predictive Filtering and Generative Network for Image Inpainting
AUTHORS: Xiaoguang Li et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we formulate image inpainting as a mix of two problems, i.e., predictive filtering and deep generation.
34, TITLE: Interpretable Compositional Convolutional Neural Networks
AUTHORS: Wen Shen et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper proposes a method to modify a traditional convolutional neural network (CNN) into an interpretable compositional CNN, in order to learn filters that encode meaningful visual patterns in intermediate convolutional layers.
35, TITLE: UrbanScene3D: A Large Scale Urban Scene Dataset and Simulator
AUTHORS: Yilin Liu ; Fuyou Xue ; Hui Huang
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: We present a large scale urban scene dataset associated with a handy simulator based on Unreal Engine 4 and AirSim, which consists of both man-made and real-world reconstruction scenes in different scales, referred to as UrbanScene3D.
36, TITLE: Emotion Recognition with Incomplete Labels Using Modified Multi-task Learning Technique
AUTHORS: Phan Tran Dac Thinh ; Hoang Manh Hung ; Hyung-Jeong Yang ; Soo-Hyung Kim ; Guee-Sang Lee
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this study, we propose a method that utilizes the association between seven basic emotions and twelve action units from the AffWild2 dataset.
37, TITLE: On The Challenges of Open World Recognition under Shifting Visual Domains
AUTHORS: Dario Fontanel ; Fabio Cermelli ; Massimiliano Mancini ; Barbara Caputo
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: To this end, recent works tried to empower visual object recognition methods with the capability to i) detect unseen concepts and ii) extend their knowledge over time, as images of new semantic classes arrive.
38, TITLE: A Multi-modal and Multi-task Learning Method for Action Unit and Expression Recognition
AUTHORS: Yue Jin ; Tianqing Zheng ; Chao Gao ; Guoqiang Xu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we introduce a multi-modal and multi-task learning method by using both visual and audio information.
39, TITLE: ANCER: Anisotropic Certification Via Sample-wise Volume Maximization
AUTHORS: Francisco Eiras et al.
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: We introduce ANCER, a practical framework for obtaining anisotropic certificates for a given test set sample via volume maximization.
40, TITLE: Learning to Detect Adversarial Examples Based on Class Scores
AUTHORS: Tobias Uelwer ; Felix Michels ; Oliver De Candido
CATEGORY: cs.LG [cs.LG, cs.CR, cs.CV]
HIGHLIGHT: In this work, we take a closer look at adversarial attack detection based on the class scores of an already trained classification model.
41, TITLE: Differentially Private Training of Neural Networks with Langevin Dynamics for Calibrated Predictive Uncertainty
AUTHORS: Moritz Knolle et al.
CATEGORY: cs.LG [cs.LG, cs.CR, cs.CV]
HIGHLIGHT: We highlight and exploit parallels between stochastic gradient Langevin dynamics, a scalable Bayesian inference technique for training deep neural networks, and DP-SGD, in order to train differentially private, Bayesian neural networks with minor adjustments to the original (DP-SGD) algorithm.
42, TITLE: Does Form Follow Function? An Empirical Exploration of The Impact of Deep Neural Network Architecture Design on Hardware-Specific Acceleration
AUTHORS: Saad Abbasi ; Mohammad Javad Shafiee ; Ellick Chan ; Alexander Wong
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: In this study, a comprehensive empirical exploration is conducted to investigate the impact of deep neural network architecture design on the degree of inference speedup that can be achieved via hardware-specific acceleration.
43, TITLE: Exploring Dropout Discriminator for Domain Adaptation
AUTHORS: Vinod K Kurmi ; Venkatesh K Subramanian ; Vinay P. Namboodiri
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: Specifically, we propose a curriculum based dropout discriminator that gradually increases the variance of the sample based distribution and the corresponding reverse gradients are used to align the source and target feature representations.
44, TITLE: Understanding The Distributions of Aggregation Layers in Deep Neural Networks
AUTHORS: Eng-Jon Ong ; Sameed Husain ; Miroslaw Bober
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: To achieve this, we propose a novel mathematical formulation for analytically modelling the probability distributions of output values of layers involved with deep feature aggregation.
45, TITLE: Hacking VMAF and VMAF NEG: Metrics Vulnerability to Different Preprocessing
AUTHORS: Maksim Siniukov ; Anastasia Antsiferova ; Dmitriy Kulikov ; Dmitriy Vatolin
CATEGORY: cs.MM [cs.MM, cs.CV, cs.GR]
HIGHLIGHT: In this paper, we show how popular quality metrics VMAF and its tuning-resistant version VMAF NEG can be artificially increased by video preprocessing.
46, TITLE: EasyCom: An Augmented Reality Dataset to Support Algorithms for Easy Communication in Noisy Environments
AUTHORS: Jacob Donley et al.
CATEGORY: cs.SD [cs.SD, cs.CV, cs.LG, eess.AS, eess.SP]
HIGHLIGHT: In this work, we describe, evaluate and release a dataset that contains over 5 hours of multi-modal data useful for training and testing algorithms for the application of improving conversations for an AR glasses wearer. We have created and are releasing this dataset to facilitate research in multi-modal AR solutions to the cocktail party problem.
47, TITLE: Comparison of 2D Vs. 3D U-Net Organ Segmentation in Abdominal 3D CT Images
AUTHORS: Nico Zettler ; Andre Mastmeyer
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: In this work, we focus on comparing 2D U-Nets vs. 3D U-Net counterparts.
48, TITLE: CASPIANET++: A Multidimensional Channel-Spatial Asymmetric Attention Network with Noisy Student Curriculum Learning Paradigm for Brain Tumor Segmentation
AUTHORS: Andrea Liew ; Chun Cheng Lee ; Boon Leong Lan ; Maxine Tan
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we introduce a channel and spatial wise asymmetric attention (CASPIAN) by leveraging the inherent structure of tumors to detect regions of saliency.
49, TITLE: Hepatocellular Carcinoma Segmentation From Digital Subtraction Angiography Videos Using Learnable Temporal Difference
AUTHORS: Wenting Jiang et al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we raise the problem of HCC segmentation in DSA videos, and build our own DSA dataset.
50, TITLE: A Deep Discontinuity-Preserving Image Registration Network
AUTHORS: Xiang Chen ; Nishant Ravikumar ; Yan Xia ; Alejandro F Frangi
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: To tackle this issue, we propose a weakly-supervised Deep Discontinuity-preserving Image Registration network (DDIR), to obtain better registration performance and realistic deformation fields.
51, TITLE: 3D RegNet: Deep Learning Model for COVID-19 Diagnosis on Chest CT Image
AUTHORS: Haibo Qi ; Yuhan Wang ; Xinyu Liu
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: In this paper, a 3D-RegNet-based neural network is proposed for diagnosing the physical condition of patients with coronavirus (Covid-19) infection.
52, TITLE: Modality Specific U-Net Variants for Biomedical Image Segmentation: A Survey
AUTHORS: Narinder Singh Punn ; Sonali Agarwal
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: This article presents the success of these approaches by describing the U-Net framework, followed by a comprehensive analysis of the U-Net variants for different medical imaging modalities such as magnetic resonance imaging, X-ray, computerized tomography/computerized axial tomography, ultrasound, positron emission tomography, etc.
53, TITLE: LIFE: A Generalizable Autodidactic Pipeline for 3D OCT-A Vessel Segmentation
AUTHORS: Dewei Hu et al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: We propose a learning-based method that is only supervised by a self-synthesized modality named local intensity fusion (LIF).
54, TITLE: Retinal OCT Denoising with Pseudo-Multimodal Fusion Network
AUTHORS: Dewei Hu ; Joseph D. Malone ; Yigit Atay ; Yuankai K. Tao ; Ipek Oguz
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this study, we propose a learning-based method that exploits information from the single-frame noisy B-scan and a pseudo-modality that is created with the aid of the self-fusion method.
55, TITLE: Deep Learning Models for Benign and Malign Ocular Tumor Growth Estimation
AUTHORS: Mayank Goswami
CATEGORY: eess.IV [eess.IV, cs.CV, q-bio.TO]
HIGHLIGHT: Exhaustive sensitivity analysis of deep learning models is performed with respect to the number of training and testing images using eight performance indices to study accuracy, reliability/reproducibility, and speed.