This column collects papers in computer vision. Date: September 10, 2021. Source: Paper Digest.
You are welcome to follow the original WeChat official account 【计算机视觉联盟】 and reply 【西瓜书手推笔记】 to get my fully handwritten machine learning notes!
Direct link to the notes: handwritten machine learning notes (GitHub link)
1, TITLE: Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
AUTHORS: Stella Frank ; Emanuele Bugliarello ; Desmond Elliott
CATEGORY: cs.CL [cs.CL, cs.CV]
HIGHLIGHT: We propose a diagnostic method based on cross-modal input ablation to assess the extent to which these models actually integrate cross-modal information.
2, TITLE: Single Image 3D Object Estimation with Primitive Graph Networks
AUTHORS: Qian He ; Desen Zhou ; Bo Wan ; Xuming He
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: To address those challenges, we adopt a primitive-based representation for 3D objects and propose a two-stage graph network for primitive-based 3D object estimation, which consists of a sequential proposal module and a graph reasoning module.
3, TITLE: Preservational Learning Improves Self-supervised Medical Image Models By Reconstructing Diverse Contexts
AUTHORS: Hong-Yu Zhou ; Chixiang Lu ; Sibei Yang ; Xiaoguang Han ; Yizhou Yu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: From this perspective, we introduce Preservational Learning to reconstruct diverse image contexts in order to preserve more information in learned representations.
4, TITLE: Tiny CNN for Feature Point Description for Document Analysis: Approach and Dataset
AUTHORS: A. Sheshkus ; A. Chirvonaya ; V. L. Arlazarov
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we study the problem of feature point description in the context of document analysis and template matching, and we construct and provide a dataset along with a method for training-patch retrieval.
5, TITLE: OSSR-PID: One-Shot Symbol Recognition in P&ID Sheets Using Path Sampling and GCN
AUTHORS: Shubham Paliwal ; Monika Sharma ; Lovekesh Vig
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Since many symbols in P&ID sheets are structurally very similar to each other, we utilize ArcFace loss during DGCNN training, which helps maximize symbol class separability by producing highly discriminative embeddings.
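For readers unfamiliar with the ArcFace loss mentioned above, here is a minimal PyTorch sketch of a standard additive-angular-margin classifier head; the `scale` and `margin` values are illustrative defaults, and this is not the authors' DGCNN training code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceLoss(nn.Module):
    """Additive angular margin loss (ArcFace): encourages highly
    discriminative, well-separated class embeddings."""
    def __init__(self, embed_dim, num_classes, scale=30.0, margin=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.scale, self.margin = scale, margin

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalised embeddings and class centres.
        cos = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7))
        # Add the angular margin only to the target-class angle.
        target = F.one_hot(labels, cos.size(1)).bool()
        logits = torch.where(target, torch.cos(theta + self.margin), cos)
        return F.cross_entropy(self.scale * logits, labels)

# Usage (hypothetical tensors): loss = ArcFaceLoss(128, 50)(symbol_embeddings, symbol_labels)
```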
6, TITLE: Copy-Move Image Forgery Detection Based on Evolving Circular Domains Coverage
AUTHORS: SHILIN LU et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: The aim of this paper is to improve the accuracy of copy-move forgery detection (CMFD) in image forensics by proposing a novel scheme.
7, TITLE: IFBiD: Inference-Free Bias Detection
AUTHORS: Ignacio Serna ; Aythami Morales ; Julian Fierrez ; Javier Ortega-Garcia
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: For the face models, we achieved 90% accuracy in distinguishing between models biased towards Asian, Black, or Caucasian ethnicity. To do so, we generated two databases containing 36K and 48K biased models, respectively.
8, TITLE: HSMD: An Object Motion Detection Algorithm Using A Hybrid Spiking Neural Network Architecture
AUTHORS: Pedro Machado ; Andreas Oikonomou ; Joao Filipe Ferreira ; T. M. McGinnity
CATEGORY: cs.CV [cs.CV, cs.NE]
HIGHLIGHT: The Hybrid Sensitive Motion Detector (HSMD) algorithm proposed in this work enhances the GSOC dynamic background subtraction (DBS) algorithm with a customised 3-layer spiking neural network (SNN) that outputs spiking responses akin to the OMS-GC.
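The GSOC dynamic background subtractor that HSMD builds on ships with OpenCV's contrib package; below is a minimal usage sketch, assuming opencv-contrib-python is installed and a hypothetical input video. The paper's 3-layer SNN post-processing is its own contribution and is not reproduced here.

```python
import cv2

# Minimal sketch of the GSOC dynamic background subtraction baseline that HSMD enhances.
# Requires opencv-contrib-python; "traffic.mp4" is a hypothetical input video.
cap = cv2.VideoCapture("traffic.mp4")
gsoc = cv2.bgsegm.createBackgroundSubtractorGSOC()

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = gsoc.apply(frame)                # per-pixel foreground/motion mask
    cv2.imshow("GSOC foreground mask", fg_mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```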
9, TITLE: NEAT: Neural Attention Fields for End-to-End Autonomous Driving
AUTHORS: Kashyap Chitta ; Aditya Prakash ; Andreas Geiger
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.RO]
HIGHLIGHT: We present NEural ATtention fields (NEAT), a novel representation that enables such reasoning for end-to-end imitation learning models.
10, TITLE: ConvMLP: Hierarchical Convolutional MLPs for Vision
AUTHORS: Jiachen Li ; Ali Hassani ; Steven Walton ; Humphrey Shi
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To tackle these problems, we propose ConvMLP: a hierarchical Convolutional MLP for visual recognition, which is a light-weight, stage-wise co-design of convolution layers and MLPs.
11, TITLE: Taming Self-Supervised Learning for Presentation Attack Detection: In-Image De-Folding and Out-of-Image De-Mixing
AUTHORS: HAOZHE LIU et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this work, we propose to use self-supervised learning to find a reasonable initialization that avoids local traps, so as to improve the generalization ability in detecting PAs on biometric systems. The proposed method, denoted as IF-OM, is based on a global-local view coupled with De-Folding and De-Mixing to derive the task-specific representation for PAD. During De-Folding, the proposed technique learns region-specific features to represent samples in a local pattern by explicitly maximizing cycle consistency.
12, TITLE: Towards Robust Cross-domain Image Understanding with Unsupervised Noise Removal
AUTHORS: LEI ZHU et al.
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: In this paper, we propose a novel method, termed Noise Tolerant Domain Adaptation, for WSDA.
13, TITLE: Continuous Event-Line Constraint for Closed-Form Velocity Initialization
AUTHORS: Xin Peng ; Wangting Xu ; Jiaqi Yang ; Laurent Kneip
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose the continuous event-line constraint, which relies on a constant-velocity motion assumption as well as trifocal tensor geometry in order to express a relationship between line observations given by event clusters as well as first-order camera dynamics.
14, TITLE: Fine-grained Data Distribution Alignment for Post-Training Quantization
AUTHORS: YUNSHAN ZHONG et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To alleviate this limitation, in this paper, we leverage the synthetic data introduced by zero-shot quantization together with the calibration dataset, and propose a fine-grained data distribution alignment (FDDA) method to boost the performance of post-training quantization.
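As background on the post-training quantization setting, here is a minimal NumPy sketch of plain min/max calibration to uint8; it illustrates what calibration data is used for, not the FDDA alignment proposed in the paper.

```python
import numpy as np

def calibrate_uint8(activations):
    """Generic PTQ calibration (per-tensor min/max). A baseline sketch only,
    not the fine-grained distribution alignment (FDDA) of the paper."""
    a_min, a_max = activations.min(), activations.max()
    scale = (a_max - a_min) / 255.0
    zero_point = int(np.round(-a_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    return np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

# Usage on hypothetical calibration activations:
acts = np.random.randn(1024).astype(np.float32)
s, z = calibrate_uint8(acts)
err = np.abs(dequantize(quantize(acts, s, z), s, z) - acts).mean()
print("mean absolute quantization error:", err)
```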
15, TITLE: Deep Hough Voting for Robust Global Registration
AUTHORS: Junha Lee ; Seungwook Kim ; Minsu Cho ; Jaesik Park
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present an efficient and robust framework for pairwise registration of real-world 3D scans, leveraging Hough voting in the 6D transformation parameter space. We then construct a set of triplets of correspondences to cast votes on the 6D Hough space, representing the transformation parameters in sparse tensors.
16, TITLE: UCTransNet: Rethinking The Skip Connections in U-Net from A Channel-wise Perspective with Transformer
AUTHORS: Haonan Wang ; Peng Cao ; Jiaqi Wang ; Osmar R. Zaiane
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: Based on our findings, we propose a new segmentation framework, named UCTransNet (with a proposed CTrans module in U-Net), from the channel perspective with attention mechanism.
17, TITLE: Towards Transferable Adversarial Attacks on Vision Transformers
AUTHORS: ZHIPENG WEI et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we posit that adversarial attacks on transformers should be specially tailored for their architecture, jointly considering both patches and self-attention, in order to achieve high transferability.
18, TITLE: Neural-IMLS: Learning Implicit Moving Least-Squares for Surface Reconstruction from Unoriented Point Clouds
AUTHORS: ZIXIONG WANG et al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.GR]
HIGHLIGHT: In this paper, we propose Neural-IMLS, a novel approach that learns a noise-resistant signed distance function (SDF) for reconstruction.
19, TITLE: M5Product: A Multi-modal Pretraining Benchmark for E-commercial Product Downstream Tasks
AUTHORS: XIAO DONG et al.
CATEGORY: cs.CV [cs.CV, cs.MM]
HIGHLIGHT: In this paper, we aim to advance the research of multi-modal pre-training on E-commerce and subsequently contribute a large-scale dataset, named M5Product, which consists of over 6 million multimodal pairs, covering more than 6,000 categories and 5,000 attributes.
20, TITLE: PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition
AUTHORS: ZHI QIAO et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a Parallel, Iterative and Mimicking Network (PIMNet) to balance accuracy and efficiency.
21, TITLE: Modified Supervised Contrastive Learning for Detecting Anomalous Driving Behaviours
AUTHORS: Shehroz S. Khan ; Ziting Shen ; Haoying Sun ; Ax Patel ; Ali Abedi
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We formulate this problem as a supervised contrastive learning approach to learn a visual representation that detects normal driving as well as seen and unseen anomalous driving behaviours.
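Below is a minimal PyTorch sketch of the standard supervised contrastive (SupCon) loss that such a formulation starts from; the paper's modifications for seen/unseen anomalous behaviours are not included.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Minimal supervised contrastive (SupCon) loss sketch: samples sharing a
    label are pulled together, all other samples are pushed apart."""
    z = F.normalize(features, dim=1)                    # (N, D) unit embeddings
    sim = z @ z.t() / temperature                       # pairwise similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # Log-softmax over all non-self pairs for each anchor.
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # Average log-probability over the positives of each anchor that has positives.
    pos_counts = pos_mask.sum(1)
    valid = pos_counts > 0
    loss = -(log_prob * pos_mask).sum(1)[valid] / pos_counts[valid]
    return loss.mean()
```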
22, TITLE: Multi-Tensor Network Representation for High-Order Tensor Completion
AUTHORS: Chang Nie ; Huan Wang ; Zhihui Lai
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a fundamental tensor decomposition (TD) framework: Multi-Tensor Network Representation (MTNR), which can be regarded as a linear combination of a range of TD models, e.g., CANDECOMP/PARAFAC (CP) decomposition, Tensor Train (TT), and Tensor Ring (TR).
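The constituent decompositions that MTNR linearly combines are available in TensorLy; the sketch below fits CP and Tensor-Train factorizations to a random tensor (function names assume a recent TensorLy version). The MTNR combination itself is the paper's contribution and is not shown.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac, tensor_train

tl.set_backend("numpy")
X = tl.tensor(np.random.rand(8, 9, 10))

# CANDECOMP/PARAFAC (CP) decomposition and reconstruction.
cp_factors = parafac(X, rank=4)
X_cp = tl.cp_to_tensor(cp_factors)

# Tensor-Train (TT) decomposition and reconstruction (boundary ranks are 1).
tt_factors = tensor_train(X, rank=[1, 4, 4, 1])
X_tt = tl.tt_to_tensor(tt_factors)

print("CP reconstruction error:", np.linalg.norm(X - X_cp) / np.linalg.norm(X))
print("TT reconstruction error:", np.linalg.norm(X - X_tt) / np.linalg.norm(X))
```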
23, TITLE: Automated LoD-2 Model Reconstruction from Very-High-Resolution Satellite-derived Digital Surface Model and Orthophoto
AUTHORS: Shengxi Gui ; Rongjun Qin
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a model-driven method that reconstructs LoD-2 building models following a "decomposition-optimization-fitting" paradigm.
24, TITLE: Talk-to-Edit: Fine-Grained Facial Editing Via Dialog
AUTHORS: Yuming Jiang ; Ziqi Huang ; Xingang Pan ; Chen Change Loy ; Ziwei Liu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose Talk-to-Edit, an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system.
25, TITLE: TxT: Crossmodal End-to-End Learning with Transformers
AUTHORS: Jan-Martin O. Steitz ; Jonas Pfeiffer ; Iryna Gurevych ; Stefan Roth
CATEGORY: cs.CV [cs.CV, cs.CL]
HIGHLIGHT: We address both shortcomings with TxT, a transformer-based crossmodal pipeline that enables fine-tuning both language and visual components on the downstream task in a fully end-to-end manner.
26, TITLE: Learning Cross-Scale Visual Representations for Real-Time Image Geo-Localization
AUTHORS: Tianyi Zhang ; Matthew Johnson-Roberson
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: In this study, we aim to address this problem by localizing image observations in a 2D multi-modal geospatial map. We introduce the cross-scale dataset and a methodology to produce additional data from cross-modality sources.
27, TITLE: Reconstructing and Grounding Narrated Instructional Videos in 3D
AUTHORS: DIMITRI ZHUKOV et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work we aim to reconstruct such objects and to localize associated narrations in 3D.
28, TITLE: Improving Deep Metric Learning By Divide and Conquer
AUTHORS: Artsiom Sanakoyeu ; Pingchuan Ma ; Vadim Tschernezki ; Björn Ommer
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose to build a more expressive representation by jointly splitting the embedding space and the data hierarchically into smaller sub-parts.
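A simplified, non-hierarchical sketch of the divide-and-conquer idea is given below: the data is clustered into sub-parts and each cluster is paired with its own slice of the embedding dimensions. The use of K-means and the function names here are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_embedding_and_data(embeddings, num_learners=4):
    """Assign each data cluster its own slice of embedding dimensions, so each
    'learner' specialises on one sub-problem of the metric learning task."""
    n, d = embeddings.shape
    # Divide the data into sub-parts (one flat level of a hierarchy).
    cluster_ids = KMeans(n_clusters=num_learners, n_init=10).fit_predict(embeddings)
    # Divide the embedding space into matching slices of dimensions.
    dim_slices = np.array_split(np.arange(d), num_learners)
    return cluster_ids, dim_slices

# Usage on hypothetical embeddings:
ids, slices = split_embedding_and_data(np.random.randn(500, 128))
```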
29, TITLE: Self Supervision to Distillation for Long-Tailed Visual Recognition
AUTHORS: Tianhao Li ; Limin Wang ; Gangshan Wu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we show that soft labels can serve as a powerful solution to incorporate label correlation into a multi-stage training scheme for long-tailed recognition.
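To illustrate how soft labels carry label correlations, here is a generic temperature-softened distillation loss in PyTorch; the paper's multi-stage self-distillation scheme is more involved than this sketch.

```python
import torch
import torch.nn.functional as F

def soft_label_distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic soft-label distillation: the student matches the teacher's
    temperature-softened class distribution, which encodes label correlations."""
    soft_targets = F.softmax(teacher_logits / temperature, dim=1)
    log_student = F.log_softmax(student_logits / temperature, dim=1)
    # KL divergence, scaled by T^2 (standard practice to keep gradient magnitudes comparable).
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * temperature ** 2
```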
30, TITLE: IICNet: A Generic Framework for Reversible Image Conversion
AUTHORS: Ka Leong Cheng ; Yueqi Xie ; Qifeng Chen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This work develops Invertible Image Conversion Net (IICNet) as a generic solution to various RIC tasks due to its strong capacity and task-independent design.
31, TITLE: Leveraging Local Domains for Image-to-Image Translation
AUTHORS: Anthony Dell'Eva ; Fabio Pizzati ; Massimo Bertozzi ; Raoul de Charette
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, cs.RO]
HIGHLIGHT: In this paper, we leverage human knowledge about spatial domain characteristics which we refer to as 'local domains' and demonstrate its benefit for image-to-image translation.
32, TITLE: ACP++: Action Co-occurrence Priors for Human-Object Interaction Detection
AUTHORS: Dong-Jin Kim ; Xiao Sun ; Jinsoo Choi ; Stephen Lin ; In So Kweon
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we model the correlations as action co-occurrence matrices and present techniques to learn these priors and leverage them for more effective training, especially on rare classes.
33, TITLE: Application of The Singular Spectrum Analysis on Electroluminescence Images of Thin-film Photovoltaic Modules
AUTHORS: Evgenii Sovetkin ; Bart E. Pieters
CATEGORY: cs.CV [cs.CV, stat.AP]
HIGHLIGHT: We propose an EL image decomposition as a sum of three components: global intensity, cell, and aperiodic components.
34, TITLE: Improving Video-Text Retrieval By Multi-Stream Corpus Alignment and Dual Softmax Loss
AUTHORS: Xing Cheng ; Hezheng Lin ; Xiangyu Wu ; Fan Yang ; Dong Shen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a multi-stream Corpus Alignment network with single gate Mixture-of-Experts (CAMoE) and a novel Dual Softmax Loss (DSL) to address the two types of heterogeneity.
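A sketch of a dual-softmax retrieval loss in the spirit of DSL is shown below, following common re-implementations: each direction's similarities are re-weighted by a softmax prior computed along the opposite direction before the usual contrastive cross-entropy. The temperature and exact weighting are assumptions and may differ in detail from CAMoE.

```python
import torch
import torch.nn.functional as F

def dual_softmax_loss(sim, temperature=0.05):
    """`sim` is a video-text similarity matrix (rows: videos, cols: texts,
    matched pairs on the diagonal). Each retrieval direction is re-weighted
    by a softmax prior taken along the opposite direction."""
    sim = sim / temperature
    labels = torch.arange(sim.size(0), device=sim.device)
    v2t = sim * F.softmax(sim, dim=0)   # video->text logits, prior over videos per text
    t2v = sim * F.softmax(sim, dim=1)   # text->video logits, prior over texts per video
    return 0.5 * (F.cross_entropy(v2t, labels) + F.cross_entropy(t2v.t(), labels))
```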
35, TITLE: Improving Building Segmentation for Off-Nadir Satellite Imagery
AUTHORS: HANXIANG HAO et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a method that is able to provide accurate building segmentation for satellite imagery captured from a large range of off-nadir angles.
36, TITLE: Multilingual Audio-Visual Smartphone Dataset And Evaluation
AUTHORS: HAREESH MANDALAPU et al.
CATEGORY: cs.CR [cs.CR, cs.CV]
HIGHLIGHT: In this work, we present an audio-visual smartphone dataset captured with five different recent smartphones.
37, TITLE: Energy Attack: On Transferring Adversarial Examples
AUTHORS: Ruoxi Shi ; Borui Yang ; Yangzhou Jiang ; Chenglong Zhao ; Bingbing Ni
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In this work we propose Energy Attack, a transfer-based black-box $L_\infty$-adversarial attack.
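For context on the threat model, below is a generic score-based black-box attack sketch that only keeps perturbations inside an $L_\infty$ ball; `predict_fn` is a hypothetical function returning class probabilities, and the actual Energy Attack constructs its perturbations quite differently.

```python
import numpy as np

def random_search_linf_attack(predict_fn, x, label, eps=8 / 255, steps=1000):
    """Generic score-based black-box L_inf attack sketch: random perturbation
    updates are kept only if they reduce the true-class confidence, and every
    candidate is projected back into the eps-ball and the valid pixel range."""
    x_adv = np.clip(x + np.random.uniform(-eps, eps, x.shape), 0.0, 1.0)
    best = predict_fn(x_adv)[label]                       # true-class score to minimise
    for _ in range(steps):
        step = np.random.choice([-eps, eps], size=x.shape) * 0.1
        candidate = x + np.clip(x_adv - x + step, -eps, eps)
        candidate = np.clip(candidate, 0.0, 1.0)
        score = predict_fn(candidate)[label]
        if score < best:                                  # keep only improving changes
            best, x_adv = score, candidate
    return x_adv
```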
38, TITLE: ErfAct: Non-monotonic Smooth Trainable Activation Functions
AUTHORS: Koushik Biswas ; Sandeep Kumar ; Shilpak Banerjee ; Ashish Kumar Pandey
CATEGORY: cs.NE [cs.NE, cs.AI, cs.CV, cs.LG]
HIGHLIGHT: We propose two novel non-monotonic smooth trainable activation functions, called ErfAct-1 and ErfAct-2.
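As a rough illustration of a trainable, smooth, non-monotonic activation built from the error function, here is a PyTorch module with two learnable shape parameters; the parameterization is illustrative only, and the exact ErfAct-1/ErfAct-2 definitions are given in the paper.

```python
import torch
import torch.nn as nn

class TrainableErfActivation(nn.Module):
    """Illustrative trainable, smooth, non-monotonic activation built from erf.
    The (alpha, beta) parameterization is a hypothetical example, not the
    paper's exact ErfAct-1/ErfAct-2 definition."""
    def __init__(self, alpha=1.0, beta=1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(float(alpha)))
        self.beta = nn.Parameter(torch.tensor(float(beta)))

    def forward(self, x):
        # Smoothly gate x by an erf of a learnable exponential of x (Swish-like shape).
        return x * torch.erf(self.alpha * torch.exp(self.beta * x))
```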
39, TITLE: Dynamic Modeling of Hand-Object Interactions Via Tactile Sensing
AUTHORS: QIANG ZHANG et al.
CATEGORY: cs.RO [cs.RO, cs.AI, cs.CV, cs.LG]
HIGHLIGHT: In this work, we employ a high-resolution tactile glove to perform four different interactive activities on a diversified set of objects.
40, TITLE: SORNet: Spatial Object-Centric Representations for Sequential Manipulation
AUTHORS: Wentao Yuan ; Chris Paxton ; Karthik Desingh ; Dieter Fox
CATEGORY: cs.RO [cs.RO, cs.CV, cs.LG]
HIGHLIGHT: In this work, we propose SORNet (Spatial Object-Centric Representation Network), which extracts object-centric representations from RGB images conditioned on canonical views of the objects of interest.
41, TITLE: Fair Conformal Predictors for Applications in Medical Imaging
AUTHORS: Charles Lu ; Andreanne Lemay ; Ken Chang ; Katharina Hoebel ; Jayashree Kalpathy-Cramer
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: In this paper, we conduct a field survey with clinicians to assess clinical use-cases of conformal predictions.
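For readers new to conformal prediction, the following is a minimal split-conformal sketch for classification in NumPy (the `method="higher"` quantile argument assumes a recent NumPy); it shows the generic recipe for building prediction sets with coverage guarantees, not the paper's clinical evaluation.

```python
import numpy as np

def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split-conformal classification: calibrate a score threshold on held-out
    data so that prediction sets contain the true class with probability >= 1 - alpha."""
    n = len(cal_labels)
    # Nonconformity score: one minus the softmax probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q_hat = np.quantile(scores, q_level, method="higher")
    # A class enters the prediction set whenever its score is below the threshold.
    return [np.where(1.0 - p <= q_hat)[0] for p in test_probs]
```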
42, TITLE: Towards Fully Automated Segmentation of Rat Cardiac MRI By Leveraging Deep Learning Frameworks
AUTHORS: DANIEL FERNANDEZ-LLANEZA et al.
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: Combined with a novel cardiac phase selection strategy, our work presents an important first step towards a fully automated segmentation pipeline in the context of rat cardiac analysis.
43, TITLE: PhysGNN: A Physics-Driven Graph Neural Network Based Model for Predicting Soft Tissue Deformation in Image-Guided Neurosurgery
AUTHORS: Yasmin Salehi ; Dennis Giannacopoulos
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: Therefore, this work proposes a novel framework, PhysGNN, a data-driven model that approximates the solution of FEA by leveraging graph neural networks (GNNs), which are capable of accounting for the mesh structural information and inductive learning over unstructured grids and complex topological structures.