
This column collects computer vision papers. Date: June 18, 2021. Source: Paper Digest.

1, TITLE: Probing Image-Language Transformers for Verb Understanding
AUTHORS: Lisa Anne Hendricks ; Aida Nematzadeh
CATEGORY: cs.CL [cs.CL, cs.CV]
HIGHLIGHT: To do so, we collect a dataset of image-sentence pairs (in English) consisting of 421 verbs that are either visual or commonly found in the pretraining data (i.e., the Conceptual Captions dataset).

2, TITLE: Trilateral Attention Network for Real-time Medical Image Segmentation
AUTHORS: Ghada Zamzmi ; Vandana Sachdev ; Sameer Antani
HIGHLIGHT: In this work, we propose an end-to-end network, called Trilateral Attention Network (TaNet), for real-time detection and segmentation in medical images.

3, TITLE: XCiT: Cross-Covariance Image Transformers
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: We propose a "transposed" version of self-attention that operates across feature channels rather than tokens, where the interactions are based on the cross-covariance matrix between keys and queries.
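
The highlight's central idea, attention computed between feature channels rather than between tokens, can be sketched in NumPy. This is a minimal single-head illustration, not the authors' implementation. The projection matrices `Wq`, `Wk`, `Wv`, the fixed temperature `tau` (XCiT learns it), and the function names are assumptions, and multi-head splitting and XCiT's other blocks are omitted. The point to notice is that the attention map is d × d, so its size does not depend on the number of tokens N.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along `axis`
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_covariance_attention(X, Wq, Wk, Wv, tau=1.0):
    """Single-head cross-covariance attention on an (N, d) token matrix X.

    Returns the (N, d) output and the (d, d) channel attention map.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                  # each (N, d)
    # L2-normalise every channel (column) across the token dimension
    Qn = Q / np.linalg.norm(Q, axis=0, keepdims=True)
    Kn = K / np.linalg.norm(K, axis=0, keepdims=True)
    # (d, d) key-query cross-covariance; each column of A is a distribution
    # over the value channels that are mixed into one output channel
    A = softmax(Kn.T @ Qn / tau, axis=0)
    return V @ A, A

rng = np.random.default_rng(0)
N, d = 196, 64
X = rng.standard_normal((N, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
out, A = cross_covariance_attention(X, Wq, Wk, Wv)
print(out.shape, A.shape)  # (196, 64) (64, 64)
```

Doubling N doubles the cost of the (N, d) products but leaves A at d × d, which is why this form scales linearly rather than quadratically in the number of tokens.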

4, TITLE: Semi-Autoregressive Transformer for Image Captioning
AUTHORS: Yuanen Zhou ; Yong Zhang ; Zhenzhen Hu ; Meng Wang
HIGHLIGHT: To make a better trade-off between speed and quality, we introduce a semi-autoregressive model for image captioning (dubbed SATIC), which keeps the autoregressive property globally but generates words in parallel locally.

5, TITLE: JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting
AUTHORS: Ron Mokady ; Rotem Tzaban ; Sagie Benaim ; Amit H. Bermano ; Daniel Cohen-Or
HIGHLIGHT: To alleviate this problem, we introduce JOKR - a JOint Keypoint Representation that captures the motion common to both the source and target videos, without requiring any object prior or data collection.

6, TITLE: The 2021 Image Similarity Dataset and Challenge
HIGHLIGHT: This paper introduces a new benchmark for large-scale image similarity detection.

7, TITLE: To Fit or Not to Fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision
AUTHORS: Chunlu Li ; Andreas Morel-Forster ; Thomas Vetter ; Bernhard Egger ; Adam Kortylewski
HIGHLIGHT: In this work, we enable model-based face autoencoders to segment occluders accurately without requiring any additional supervision during training, and this separates regions where the model will be fitted from those where it will not be fitted.

8, TITLE: NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go
HIGHLIGHT: We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes and produces in one go, i.e. in a single feed forward pass, a smooth interpolation and point-to-point correspondences between them.

9, TITLE: Unsupervised Training Data Generation of Handwritten Formulas Using Generative Adversarial Networks with Self-Attention
AUTHORS: Matthias Springstein ; Eric Müller-Budack ; Ralph Ewerth
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we introduce a system that creates a large set of synthesized training examples of mathematical expressions which are derived from LaTeX documents.

10, TITLE: Indian Masked Faces in The Wild Dataset
AUTHORS: Shiksha Mishra ; Puspita Majumdar ; Richa Singh ; Mayank Vatsa
HIGHLIGHT: In this paper, we present a novel Indian Masked Faces in the Wild (IMFW) dataset which contains images with variations in pose, illumination, resolution, and the variety of masks worn by the subjects.

11, TITLE: Deep Contrastive Graph Representation Via Adaptive Homotopy Learning
AUTHORS: Rui Zhang ; Chengjun Lu ; Ziheng Jiao ; Xuelong Li
HIGHLIGHT: To address the problem above, we propose a novel adaptive homotopy framework (AH) in which the Maclaurin duality is employed, such that the homotopy parameters can be adaptively obtained.

12, TITLE: Federated CycleGAN for Privacy-Preserving Image-to-Image Translation
AUTHORS: Joonyoung Song ; Jong Chul Ye
CATEGORY: cs.CV [cs.CV, cs.LG, stat.ML]
HIGHLIGHT: To address this, here we propose a novel federated CycleGAN architecture that can learn image translation in an unsupervised manner while maintaining the data privacy.

13, TITLE: AttDLNet: Attention-based DL Network for 3D LiDAR Place Recognition
AUTHORS: Tiago Barros ; Luís Garrote ; Ricardo Pereira ; Cristiano Premebida ; Urbano J. Nunes
HIGHLIGHT: To address the problem of place recognition using LiDAR data, this paper proposes a novel 3D LiDAR-based deep learning network (named AttDLNet) that comprises an encoder network and exploits an attention mechanism to selectively focus on long-range context and interfeature relationships.

14, TITLE: An Evaluation of Self-Supervised Pre-Training for Skin-Lesion Analysis
AUTHORS: Levy Chaves ; Alceu Bissoto ; Eduardo Valle ; Sandra Avila
HIGHLIGHT: In this work, we assess self-supervision for the diagnosis of skin lesions, comparing three self-supervised pipelines to a challenging supervised baseline, on five test datasets comprising in- and out-of-distribution samples.

15, TITLE: Unsupervised Video Prediction from A Single Frame By Estimating 3D Dynamic Scene Structure
AUTHORS: Paul Henderson ; Christoph H. Lampert ; Bernd Bickel
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: Our goal in this work is to generate realistic videos given just one initial frame as input.

16, TITLE: Dynamic Knowledge Distillation with A Single Stream Structure for RGB-D Salient Object Detection
AUTHORS: Guangyu Ren ; Tania Stathaki
HIGHLIGHT: To tackle this dilemma, we propose a dynamic distillation method along with a lightweight framework, which significantly reduces the parameters.

17, TITLE: A Random CNN Sees Objects: One Inductive Bias of CNN and Its Applications
AUTHORS: Yun-Hao Cao ; Jianxin Wu
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: That is, a CNN has an inductive bias to naturally focus on objects, named Tobias ("The object is at sight") in this paper.

18, TITLE: Deformation Driven Seq2Seq Longitudinal Tumor and Organs-at-Risk Prediction for Radiotherapy
HIGHLIGHT: Methods: To deal with the aforementioned challenges and to comply with the clinical requirements, we present a novel 3D sequence-to-sequence model based on Convolutional Long Short-Term Memory (ConvLSTM) that makes use of a series of deformation vector fields (DVF) between individual timepoints and reference pre-treatment/planning CTs to predict future anatomical deformations and changes in gross tumor volume as well as critical OARs.

19, TITLE: How Can We Learn (more) from Challenges? A Statistical Approach to Driving Future Algorithm Development
HIGHLIGHT: To address this gap in the literature, we (1) present a statistical framework for learning from challenges and (2) instantiate it for the specific task of instrument instance segmentation in laparoscopic videos.

20, TITLE: Optical Mouse: 3D Mouse Pose From Single-View Video
AUTHORS: Bo Hu et al.
HIGHLIGHT: We present a method to infer the 3D pose of mice, including the limbs and feet, from monocular videos.

21, TITLE: Multi-level Motion Attention for Human Motion Prediction
AUTHORS: Wei Mao ; Miaomiao Liu ; Mathieu Salzmann ; Hongdong Li
HIGHLIGHT: Here, we introduce an attention based feed-forward network that explicitly leverages this observation.

22, TITLE: Adversarial Visual Robustness By Causal Intervention
AUTHORS: Kaihua Tang ; Mingyuan Tao ; Hanwang Zhang
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we provide a causal viewpoint of adversarial vulnerability: the cause is the confounder ubiquitously existing in learning, where attackers are precisely exploiting the confounding effect.

23, TITLE: ShuffleBlock: Shuffle to Regularize Deep Convolutional Neural Networks
AUTHORS: Sudhakar Kumawat ; Gagan Kanojia ; Shanmuganathan Raman
HIGHLIGHT: We provide several ablation studies on selecting various hyperparameters of the ShuffleBlock module and propose a new scheduling method that further enhances its performance.

24, TITLE: Knowledge Distillation from Multi-modal to Mono-modal Segmentation Networks
CATEGORY: cs.CV [cs.CV, cs.AI, stat.ML]
HIGHLIGHT: In this paper, we propose KD-Net, a framework to transfer knowledge from a trained multi-modal network (teacher) to a mono-modal one (student).
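
As a reference point for the teacher-student setup, here is a generic Hinton-style distillation loss applied per voxel, sketched in NumPy. It is not KD-Net's exact objective, which the digest does not give; the temperature `T`, the mixing weight `alpha`, and the function names are illustrative assumptions. In the multi-modal to mono-modal case, `teacher_logits` would come from the network that saw all modalities and `student_logits` from the one that sees a single modality.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled, numerically stable softmax over the class axis
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hinton-style KD for segmentation, averaged over voxels.

    student_logits, teacher_logits: (num_voxels, num_classes)
    labels: (num_voxels,) integer ground-truth class per voxel
    """
    eps = 1e-12
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL(teacher || student) on softened outputs, rescaled by T^2
    kl = (p_t * (np.log(p_t + eps) - np.log(p_s + eps))).sum(axis=-1).mean() * T ** 2
    # Ordinary cross-entropy against the ground truth at T = 1
    p_hard = softmax(student_logits)
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + eps).mean()
    return alpha * kl + (1 - alpha) * ce

rng = np.random.default_rng(0)
teacher = rng.standard_normal((1000, 4))   # e.g. 1000 voxels, 4 classes
student = rng.standard_normal((1000, 4))
labels = rng.integers(0, 4, size=1000)
print(distillation_loss(student, teacher, labels))
```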

25, TITLE: Insights Into Data Through Model Behaviour: An Explainability-driven Strategy for Data Auditing for Responsible Computer Vision Applications
AUTHORS: Alexander Wong ; Adam Dorfman ; Paul McInnis ; Hayden Gunraj
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this study, we take a departure and explore an explainability-driven strategy to data auditing, where actionable insights into the data at hand are discovered through the eyes of quantitative explainability on the behaviour of a dummy model prototype when exposed to data.

26, TITLE: The Fishnet Open Images Database: A Dataset for Fish Detection and Fine-Grained Categorization in Fisheries
AUTHORS: Justin Kay ; Matt Merrifield
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: To address this, we present the Fishnet Open Images Database, a large dataset of EM imagery for fish detection and fine-grained categorization onboard commercial fishing vessels.

27, TITLE: A Two-stage Multi-modal Affect Analysis Framework for Children with Autism Spectrum Disorder
AUTHORS: Jicheng Li ; Anjana Bhat ; Roghayeh Barmaki
CATEGORY: cs.CV [cs.CV, cs.HC, cs.MM]
HIGHLIGHT: In this paper, we present an open-source two-stage multi-modal approach leveraging acoustic and visual cues to predict three main affect states (positive, negative, and neutral) of children with ASD in real-world play therapy scenarios, achieving an overall accuracy of 72.40%.

28, TITLE: Deep Subdomain Adaptation Network for Image Classification
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: Based on this, we present Deep Subdomain Adaptation Network (DSAN) which learns a transfer network by aligning the relevant subdomain distributions of domain-specific layer activations across different domains based on a local maximum mean discrepancy (LMMD).
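
The class-conditional alignment can be sketched with a linear kernel, which reduces LMMD to comparing class-weighted feature means per class. This is a simplification: DSAN defines LMMD in a reproducing kernel Hilbert space (e.g. with Gaussian kernels), and the function name, array shapes, and the rule of skipping classes absent from a batch are assumptions made here. Source samples are weighted by their one-hot labels and target samples by predicted probabilities, since target labels are unknown.

```python
import numpy as np

def lmmd_linear(Zs, Ys, Zt, Pt):
    """Local MMD between source and target features, linear kernel.

    Zs: (ns, k) source features,  Ys: (ns, C) one-hot source labels
    Zt: (nt, k) target features,  Pt: (nt, C) predicted target probabilities
    """
    ms, mt = Ys.sum(axis=0), Pt.sum(axis=0)   # per-class mass, shape (C,)
    present = (ms > 0) & (mt > 0)             # skip classes missing from the batch
    Ws = Ys[:, present] / ms[present]         # w_ic = y_ic / sum_j y_jc
    Wt = Pt[:, present] / mt[present]
    mu_s = Ws.T @ Zs                          # class-weighted source means
    mu_t = Wt.T @ Zt                          # class-weighted target means
    # Squared distance between matching class means, averaged over classes
    return float(np.mean(np.sum((mu_s - mu_t) ** 2, axis=1)))

rng = np.random.default_rng(0)
Z = rng.standard_normal((6, 4))
Y = np.eye(3)[[0, 1, 2, 0, 1, 2]]
print(lmmd_linear(Z, Y, Z, Y))                   # 0.0
print(round(lmmd_linear(Z, Y, Z + 1.0, Y), 6))   # 4.0
```

Shifting every target feature by 1 in each of the k = 4 dimensions moves each class mean by the same amount, hence a squared distance of 4 per class.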

29, TITLE: Learning to Associate Every Segment for Video Panoptic Segmentation
AUTHORS: Sanghyun Woo ; Dahun Kim ; Joon-Young Lee ; In So Kweon
HIGHLIGHT: Specifically, we aim to learn coarse segment-level matching and fine pixel-level matching together.

30, TITLE: Privacy-Preserving Eye-tracking Using Deep Learning
AUTHORS: Salman Seyedi ; Zifan Jiang ; Allan Levey ; Gari D. Clifford
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this work, we focus on the case of a deep network model trained on images of individual faces.

31, TITLE: BABEL: Bodies, Action and Behavior with English Labels
AUTHORS: Abhinanda R. Punnakkal ; Arjun Chandrasekaran ; Nikos Athanasiou ; Alejandra Quiros-Ramirez ; Michael J. Black
CATEGORY: cs.CV [cs.CV, cs.GR, cs.LG]
HIGHLIGHT: To address this, we present BABEL, a large dataset with language labels describing the actions being performed in mocap sequences.

32, TITLE: Long-Short Temporal Contrastive Learning of Video Transformers
AUTHORS: Jue Wang ; Gedas Bertasius ; Du Tran ; Lorenzo Torresani
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we empirically demonstrate that self-supervised pretraining of video transformers on video-only datasets can lead to action recognition results that are on par or better than those obtained with supervised pretraining on large-scale image datasets, even massive ones such as ImageNet-21K.

33, TITLE: Scale-Consistent Fusion: from Heterogeneous Local Sampling to Global Immersive Rendering
AUTHORS: Wenpeng Xing ; Jie Chen ; Zaifeng Yang ; Qiang Wang
HIGHLIGHT: To overcome this challenge, we propose a novel scale-consistent volume rescaling algorithm that robustly aligns the disparity probability volumes (DPV) among different captures for scale-consistent global geometry fusion.

34, TITLE: Layer Folding: Neural Network Depth Reduction Using Activation Linearization
HIGHLIGHT: We propose a method that learns whether non-linear activations can be removed, allowing consecutive linear layers to be folded into one.
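
Once the activation between two linear layers is removed, the layers collapse algebraically into one: W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). A small NumPy check of that identity follows; the function name and layer sizes are illustrative, and how the method decides which activations can be removed is not shown.

```python
import numpy as np

def fold_linear_layers(W1, b1, W2, b2):
    """Fold y = W2 @ (W1 @ x + b1) + b2 into one affine layer y = W @ x + b.

    Only valid when nothing non-linear sits between the two layers.
    """
    W = W2 @ W1          # (out2, in1)
    b = W2 @ b1 + b2     # (out2,)
    return W, b

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((32, 16)), rng.standard_normal(32)
W2, b2 = rng.standard_normal((8, 32)), rng.standard_normal(8)
x = rng.standard_normal(16)

W, b = fold_linear_layers(W1, b1, W2, b2)
two_layers = W2 @ (W1 @ x + b1) + b2
one_layer = W @ x + b
print(np.allclose(two_layers, one_layer))  # True
```

The folded layer needs a single 8 × 16 matrix instead of a 32 × 16 and an 8 × 32 one, which is where the depth and compute reduction comes from.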

35, TITLE: SIFT Matching By Context Exposed
AUTHORS: Fabio Bellavia
HIGHLIGHT: This paper investigates how to step up local image descriptor matching by exploiting matching context information.

36, TITLE: Always Be Dreaming: A New Approach for Data-Free Class-Incremental Learning
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this work, we consider the high-impact problem of Data-Free Class-Incremental Learning (DFCIL), where an incremental learning agent must learn new concepts over time without storing generators or training data from past tasks.

37, TITLE: Visual Correspondence Hallucination: Towards Geometric Reasoning
AUTHORS: Hugo Germain ; Vincent Lepetit ; Guillaume Bourmaud
HIGHLIGHT: In this paper, we bridge this gap by training a network to output a peaked probability distribution over the correspondent's location, regardless of this correspondent being visible, occluded, or outside the field of view.

38, TITLE: Deep HDR Hallucination for Inverse Tone Mapping
AUTHORS: Demetris Marnerides ; Thomas Bashford-Rogers ; Kurt Debattista
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: This work presents a GAN-based method that hallucinates missing information from badly exposed areas in LDR images and compares its efficacy with alternative variations.

39, TITLE: IFCNet: A Benchmark Dataset for IFC Entity Classification
AUTHORS: Christoph Emunds ; Nicolas Pauen ; Veronika Richter ; Jérôme Frisch ; Christoph van Treeck
HIGHLIGHT: This work presents IFCNet, a dataset of single-entity IFC files spanning a broad range of IFC classes containing both geometric and semantic information.

40, TITLE: Learning to Predict Visual Attributes in The Wild
HIGHLIGHT: In this paper, we introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances. To this end, we propose several techniques that systematically tackle these challenges, including a base model that utilizes both low- and high-level CNN features with multi-hop attention, reweighting and resampling techniques, a novel negative label expansion scheme, and a novel supervised attribute-aware contrastive learning algorithm.

41, TITLE: Multi-Label Learning from Single Positive Labels
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: We extend existing multi-label losses to this setting and propose novel variants that constrain the number of expected positive labels during training.
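
In this setting each training image has exactly one observed positive label and every other label is unobserved. The sketch below combines binary cross-entropy on that observed positive with a squared penalty that pulls the expected number of positives (the sum of predicted probabilities) towards an assumed prior `k`. It only illustrates the idea of constraining expected positives; the paper's exact regularizer may differ, and `k`, `lam`, and the function name are hypothetical.

```python
import numpy as np

def single_positive_loss(probs, pos_idx, k, lam=1.0, eps=1e-7):
    """Loss for single-positive multi-label training (illustrative).

    probs:   (n, L) predicted probability of each of L labels per image
    pos_idx: (n,) index of the one observed positive label per image
    k:       assumed average number of positive labels per image
    """
    n, L = probs.shape
    # BCE on the observed positive only (other labels are unobserved, not negative)
    p_pos = probs[np.arange(n), pos_idx]
    bce = -np.log(np.clip(p_pos, eps, 1.0)).mean()
    # Penalise deviation of the expected positive count from k, scaled by L
    expected = probs.sum(axis=1)
    penalty = (((expected - k) / L) ** 2).mean()
    return bce + lam * penalty

all_ones = np.ones((2, 5))
sparse = np.array([[0.9, 0.1, 0.1, 0.1, 0.1],
                   [0.1, 0.1, 0.9, 0.1, 0.1]])
idx = np.array([0, 2])
print(single_positive_loss(all_ones, idx, k=1.3) > single_positive_loss(sparse, idx, k=1.3))  # True
```

Without the penalty, predicting 1 for every label would trivially minimise the positive-only term; the penalty makes that degenerate solution costly.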

42, TITLE: MoDist: Motion Distillation for Self-supervised Video Representation Learning
AUTHORS: Fanyi Xiao ; Joseph Tighe ; Davide Modolo
HIGHLIGHT: We present MoDist as a novel method to explicitly distill motion information into self-supervised video representations.

43, TITLE: THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers
HIGHLIGHT: We present THUNDR, a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people, given monocular RGB images.

44, TITLE: Automatic Main Character Recognition for Photographic Studies
AUTHORS: Mert Seker ; Anssi Männistö ; Alexandros Iosifidis ; Jenni Raitoharju
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we analyze the feasibility of solving the main character recognition needed for photographic studies automatically and propose a method for identifying the main characters. To evaluate both the subjectivity of the task and the performance of our method, we collected a dataset of 300 varied images from multiple sources and asked five people, a photographic researcher and four others, to annotate the main characters.

45, TITLE: SPeCiaL: Self-Supervised Pretraining for Continual Learning
AUTHORS: Lucas Caccia ; Joelle Pineau
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: This paper presents SPeCiaL: a method for unsupervised pretraining of representations tailored for continual learning.

46, TITLE: Episode Adaptive Embedding Networks for Few-shot Learning
AUTHORS: Fangbing Liu ; Qing Wang
HIGHLIGHT: In this paper, we propose a novel approach, namely Episode Adaptive Embedding Network (EAEN), to learn episode-specific embeddings of instances.

47, TITLE: Wavelet-Packet Powered Deepfake Image Detection
AUTHORS: Moritz Wolter ; Felix Blanke ; Charles Tapley Hoyt ; Jochen Garcke
HIGHLIGHT: This paper aims to fill this gap and describes a wavelet-based approach to GAN-generated image analysis and detection.
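
A wavelet-packet transform differs from the ordinary wavelet transform in that every sub-band, not only the low-pass one, is split again at each level, giving 4**level sub-bands for a 2D image. The NumPy sketch below uses the Haar wavelet for simplicity; the wavelet choice, the function names, and any use of the coefficients as detector features are assumptions, since the digest does not describe the paper's exact pipeline.

```python
import numpy as np

def haar_2d(x):
    """One level of the orthonormal 2D Haar transform of an (H, W) array
    with even H and W. Returns four (H/2, W/2) sub-bands: the low-pass
    band followed by three detail bands (naming conventions vary)."""
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    return [(a + b + c + d) / 2,
            (a - b + c - d) / 2,
            (a + b - c - d) / 2,
            (a - b - c + d) / 2]

def wavelet_packet(x, level):
    """Full 2D wavelet-packet decomposition: every node is split again at each
    level, giving 4**level sub-bands of shape (H / 2**level, W / 2**level)."""
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(level):
        nodes = [band for node in nodes for band in haar_2d(node)]
    return nodes

img = np.random.default_rng(0).standard_normal((64, 64))
bands = wavelet_packet(img, level=2)
print(len(bands), bands[0].shape)  # 16 (16, 16)
```

Because each Haar step is orthonormal, total energy is preserved, so any spectral difference between real and generated images appears as a redistribution of energy across sub-bands rather than a change in the total.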

48, TITLE: Positional Contrastive Learning for Volumetric Medical Image Segmentation
CATEGORY: cs.CV [cs.CV, I.4.6]
HIGHLIGHT: To address this issue, we propose a novel positional contrastive learning (PCL) framework to generate contrastive data pairs by leveraging the position information in volumetric medical images.
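
One plausible reading of "leveraging the position information" is that 2D slices taken from similar normalised positions within their volumes are treated as positive pairs, regardless of which volume they come from. The sketch below only builds that positive-pair mask; the threshold value, the normalisation to [0, 1], and the function name are assumptions rather than the paper's settings, and the contrastive loss itself is omitted.

```python
import numpy as np

def positional_pairs(positions, threshold=0.1):
    """Positive-pair mask from normalised slice positions.

    positions: (n,) position of each 2D slice within its own volume, scaled to [0, 1]
    Returns an (n, n) boolean matrix that is True where two different slices
    lie closer than `threshold` and so would be treated as a positive pair.
    """
    pos = np.asarray(positions, dtype=float)
    mask = np.abs(pos[:, None] - pos[None, :]) < threshold
    np.fill_diagonal(mask, False)   # a slice is not paired with itself here
    return mask

positions = [0.0, 0.05, 0.5, 0.52, 0.9]
mask = positional_pairs(positions, threshold=0.1)
print(np.argwhere(np.triu(mask)).tolist())  # [[0, 1], [2, 3]]
```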

49, TITLE: Using Multiple Losses for Accurate Facial Age Estimation
AUTHORS: Yi Zhou ; Heikki Huttunen ; Tapio Elomaa
HIGHLIGHT: In this paper, we propose a simple yet effective approach for age estimation, which improves the performance compared to classification-based methods.

50, TITLE: Invisible for Both Camera and LiDAR: Security of Multi-Sensor Fusion Based Perception in Autonomous Driving Under Physical-World Attacks
CATEGORY: cs.CR [cs.CR, cs.CV, cs.LG]
HIGHLIGHT: We propose a novel attack pipeline that addresses two main design challenges: (1) non-differentiable target camera and LiDAR sensing systems, and (2) non-differentiable cell-level aggregated features popularly used in LiDAR-based AD perception.

51, TITLE: Learning Perceptual Manifold of Fonts
AUTHORS: Haoran Xie ; Yuki Fujita ; Kazunori Miyata
CATEGORY: cs.GR [cs.GR, cs.CV]
HIGHLIGHT: Motivated by this methodology, this work aims to adjust the machine generated character fonts with the effort of human workers in the perception study. After we obtained the distribution data of specific preferences, we utilize manifold learning approach to visualize the font distribution.

52, TITLE: SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV, cs.RO]
HIGHLIGHT: In this work, we consider robust policy learning which targets zero-shot generalization to unseen visual environments with large distributional shift.

53, TITLE: Class Balancing GAN with A Classifier in The Loop
AUTHORS: Harsh Rangwani ; Konda Reddy Mopuri ; R. Venkatesh Babu
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: In this work we introduce a novel theoretically motivated Class Balancing regularizer for training GANs.

54, TITLE: Evaluating The Robustness of Bayesian Neural Networks Against Different Types of Attacks
AUTHORS: Yutian Pang ; Sheng Cheng ; Jueming Hu ; Yongming Liu
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: To evaluate the robustness gain of Bayesian neural networks on image classification tasks, we apply input perturbations and adversarial attacks to state-of-the-art Bayesian neural networks, with a benchmark CNN model as reference.

55, TITLE: On Anytime Learning at Macroscale
AUTHORS: Lucas Caccia ; Jing Xu ; Myle Ott ; Marc'Aurelio Ranzato ; Ludovic Denoyer
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this work, we consider such a streaming learning setting, which we dub anytime learning at macroscale (ALMA).

56, TITLE: LiRA: Learning Visual Speech Representations from Audio Through Self-supervision
AUTHORS: Pingchuan Ma ; Rodrigo Mira ; Stavros Petridis ; Björn W. Schuller ; Maja Pantic
CATEGORY: cs.LG [cs.LG, cs.CV, cs.SD, eess.AS]
HIGHLIGHT: In this work, we propose Learning visual speech Representations from Audio via self-supervision (LiRA).

57, TITLE: Regularization of Mixture Models for Robust Principal Graph Learning
AUTHORS: Tony Bonnaire ; Aurélien Decelle ; Nabila Aghanim
CATEGORY: cs.LG [cs.LG, cond-mat.dis-nn, cs.CV]
HIGHLIGHT: In the particular case of manifold learning for ridge detection, we assume that the underlying manifold can be modeled as a graph structure acting like a topological prior for the Gaussian clusters turning the problem into a maximum a posteriori estimation.

58, TITLE: On The Dark Side of Calibration for Modern Neural Networks
AUTHORS: Aditya Singh ; Alessandro Bay ; Biswa Sengupta ; Andrea Mirabile
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: This paper presents a theoretically and empirically supported exposition for reviewing a model's calibration and refinement.
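
Expected calibration error (ECE) is a standard way to quantify calibration: predictions are binned by confidence, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. The digest does not say which metrics the paper reports, so treat this NumPy sketch (equal-width bins, `n_bins=15`, hypothetical function name) as background rather than the paper's procedure. Refinement, i.e. how well confidences separate correct from incorrect predictions, needs a separate measure.

```python
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """ECE with equal-width confidence bins.

    confidences: (n,) max predicted probability per sample
    predictions: (n,) predicted class;  labels: (n,) true class
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # |accuracy - mean confidence| in this bin, weighted by its share of samples
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

conf = np.full(10, 0.9)
preds = np.zeros(10, dtype=int)
labels = np.array([0] * 5 + [1] * 5)   # 50% accuracy at 90% confidence
print(round(expected_calibration_error(conf, preds, labels), 3))  # 0.4
```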

59, TITLE: Scaling-up Diverse Orthogonal Convolutional Networks with A Paraunitary Framework
AUTHORS: Jiahao Su ; Wonmin Byeon ; Furong Huang
CATEGORY: cs.LG [cs.LG, cs.CV, cs.NA, math.NA]
HIGHLIGHT: To address this problem, we propose a theoretical framework for orthogonal convolutional layers, which establishes the equivalence between various orthogonal convolutional layers in the spatial domain and the paraunitary systems in the spectral domain.

60, TITLE: Orthogonal-Padé Activation Functions: Trainable Activation Functions for Smooth and Faster Convergence in Deep Networks
AUTHORS: Koushik Biswas ; Shilpak Banerjee ; Ashish Kumar Pandey
CATEGORY: cs.NE [cs.NE, cs.AI, cs.CV, cs.LG]
HIGHLIGHT: We propose orthogonal-Padé activation functions, which are trainable activation functions, and show that they learn faster and improve accuracy on standard deep learning datasets and models.
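
A trainable rational activation is a ratio of two polynomials whose coefficients are learned together with the network weights. The NumPy sketch below writes both polynomials in an orthogonal (Hermite) basis and uses a "safe" denominator that is always at least 1, so the ratio cannot blow up. The choice of Hermite polynomials, the exact denominator form, and the function names are assumptions for illustration; the digest does not specify which orthogonal basis the authors use.

```python
import numpy as np

def hermite_basis(x, degree):
    """Physicists' Hermite polynomials H_0..H_degree at x, via the recurrence
    H_{n+1} = 2x H_n - 2n H_{n-1}."""
    H = [np.ones_like(x), 2 * x]
    for n in range(1, degree):
        H.append(2 * x * H[n] - 2 * n * H[n - 1])
    return H[: degree + 1]

def rational_activation(x, num_coeffs, den_coeffs):
    """Padé-style activation P(x) / Q(x) with trainable coefficients.

    P(x) = sum_i a_i H_i(x)            (i = 0 .. len(num_coeffs) - 1)
    Q(x) = 1 + sum_j |b_j| |H_j(x)|    (j = 1 .. len(den_coeffs)), so Q >= 1
    """
    x = np.asarray(x, dtype=float)
    Hn = hermite_basis(x, len(num_coeffs) - 1)
    P = sum(a * h for a, h in zip(num_coeffs, Hn))
    Hd = hermite_basis(x, len(den_coeffs))
    Q = 1 + sum(abs(b) * np.abs(h) for b, h in zip(den_coeffs, Hd[1:]))
    return P / Q

x = np.linspace(-3, 3, 7)
# With a_1 = 0.5 and a zero denominator, P(x) = 0.5 * H_1(x) = x, i.e. the identity
print(rational_activation(x, num_coeffs=[0.0, 0.5], den_coeffs=[0.0, 0.0]))
```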

61, TITLE: Improving On-Screen Sound Separation for Open Domain Videos with Audio-Visual Self-attention
AUTHORS: Efthymios Tzinis ; Scott Wisdom ; Tal Remez ; John R. Hershey
CATEGORY: cs.SD [cs.SD, cs.CV, cs.LG]
HIGHLIGHT: We introduce a state-of-the-art audio-visual on-screen sound separation system which is capable of learning to separate sounds and associate them with on-screen objects by looking at in-the-wild videos.

62, TITLE: Localized Uncertainty Attacks
CATEGORY: stat.ML [stat.ML, cs.CR, cs.CV, cs.LG]
HIGHLIGHT: In this paper, we present localized uncertainty attacks, a novel class of threat models against deterministic and stochastic classifiers.

63, TITLE: Automatic Segmentation of The Prostate on 3D Trans-rectal Ultrasound Images Using Statistical Shape Models and Convolutional Neural Networks
AUTHORS: Golnoosh Samei ; Davood Karimi ; Claudia Kesch ; Septimiu Salcudean
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this work we propose to segment the prostate on a challenging dataset of trans-rectal ultrasound (TRUS) images using convolutional neural networks (CNNs) and statistical shape models (SSMs).

64, TITLE: A Multi-task Convolutional Neural Network for Blind Stereoscopic Image Quality Assessment Using Naturalness Analysis
AUTHORS: Salima Bourbia ; Ayoub Karine ; Aladine Chetouani ; Mohammed El Hassouni
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this work, we propose to integrate these characteristics to estimate the quality of stereoscopic images without reference through a convolutional neural network.

65, TITLE: Controllable Confidence-Based Image Denoising
AUTHORS: Haley Owsianko ; Florian Cassayre ; Qiyuan Liang
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: To mitigate these problems, in this project, we present a framework that is capable of controllable, confidence-based noise removal. We introduce a set of techniques to fuse the two components smoothly in the frequency domain.
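
The digest does not describe the two components or the fusion rule, so the following is only a generic illustration of blending two estimates smoothly in the frequency domain: a Gaussian low-pass weight takes low frequencies mostly from one image and high frequencies mostly from the other. The Gaussian weighting, the `sigma` parameter, and the function name are hypothetical.

```python
import numpy as np

def fuse_in_frequency(img_a, img_b, sigma):
    """Blend two (H, W) images in the Fourier domain.

    Low frequencies come mostly from img_a and high frequencies mostly from
    img_b; the Gaussian weight makes the transition between them smooth.
    """
    Fa = np.fft.fftshift(np.fft.fft2(img_a))
    Fb = np.fft.fftshift(np.fft.fft2(img_b))
    H, W = img_a.shape
    yy, xx = np.mgrid[:H, :W]
    dist2 = (yy - H // 2) ** 2 + (xx - W // 2) ** 2   # squared distance from the DC term
    w = np.exp(-dist2 / (2.0 * sigma ** 2))            # 1 at DC, decays towards 0
    F = w * Fa + (1.0 - w) * Fb
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

rng = np.random.default_rng(0)
a, b = rng.random((32, 32)), rng.random((32, 32))
fused = fuse_in_frequency(a, b, sigma=4.0)
print(fused.shape)  # (32, 32)
```

A very small `sigma` keeps only the mean brightness of `img_a` on top of `img_b`, while a very large `sigma` returns `img_a` almost unchanged.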
