This column collects papers in computer vision. Date: September 14, 2021. Source: Paper Digest.
1, TITLE: COSMic: A Coherence-Aware Generation Metric for Image Descriptions
AUTHORS: MERT İNAN et al.
CATEGORY: cs.CL [cs.CL, cs.AI, cs.CV, cs.LG]
HIGHLIGHT: We present a dataset of image–description pairs annotated with coherence relations.
2, TITLE: DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval
AUTHORS: AICHUN ZHU et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To this end, we propose a novel Deep Surroundings-person Separation Learning (DSSL) model in this paper to effectively extract and match person information, and hence achieve a superior retrieval accuracy.
3, TITLE: DSNet: A Dual-Stream Framework for Weakly-Supervised Gigapixel Pathology Image Analysis
AUTHORS: TIANGE XIANG et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present a novel weakly-supervised framework for classifying whole slide images (WSIs).
4, TITLE: The State of The Art When Using GPUs in Devising Image Generation Methods Using Deep Learning
AUTHORS: Yasuko Kawahata
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: In this case study, a core dump occurs when the number of pixels is 512 or more, so we consider that the vector calculation part should be improved.
5, TITLE: LEA-Net: Layer-wise External Attention Network for Efficient Color Anomaly Detection
AUTHORS: Ryoya Katafuchi ; Terumasa Tokunaga
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we propose a novel model called Layer-wise External Attention Network (LEA-Net) for efficient image anomaly detection.
6, TITLE: ArtiBoost: Boosting Articulated 3D Hand-Object Pose Estimation Via Online Exploration and Synthesis
AUTHORS: KAILIN LI et al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: To address the above issues, we propose ArtiBoost, a lightweight online data enrichment method that boosts articulated hand-object pose estimation from the data perspective.
7, TITLE: Facial Anatomical Landmark Detection Using Regularized Transfer Learning with Application to Fetal Alcohol Syndrome Recognition
AUTHORS: Zeyu Fu ; Jianbo Jiao ; Michael Suttie ; J. Alison Noble
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: To address this restriction, we develop a new regularized transfer learning approach that exploits the knowledge of a network learned on large facial recognition datasets.
8, TITLE: Nonlocal Patch-Based Fully-Connected Tensor Network Decomposition for Remote Sensing Image Inpainting
AUTHORS: Wen-Jie Zheng ; Xi-Le Zhao ; Yu-Bang Zheng ; Zhi-Feng Pang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Besides, we propose an efficient proximal alternating minimization-based algorithm to solve the proposed NL-FCTN decomposition-based model with a theoretical convergence guarantee.
9, TITLE: FaceGuard: Proactive Deepfake Detection
AUTHORS: Yuankun Yang ; Chenyue Liang ; Hongyu He ; Xiaoyu Cao ; Neil Zhenqiang Gong
CATEGORY: cs.CV [cs.CV, cs.CR, cs.LG]
HIGHLIGHT: In this work, we propose FaceGuard, a proactive deepfake-detection framework.
10, TITLE: Graph-Based 3D Multi-Person Pose Estimation Using Multi-View Images
AUTHORS: SIZE WU et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: And we propose three task-specific graph neural networks for effective message passing.
11, TITLE: Online Unsupervised Learning of Visual Representations and Categories
AUTHORS: Mengye Ren ; Tyler R. Scott ; Michael L. Iuzzolino ; Michael C. Mozer ; Richard Zemel
CATEGORY: cs.CV [cs.CV, cs.LG, stat.ML]
HIGHLIGHT: In this work, we propose an unsupervised model that simultaneously performs online visual representation learning and few-shot learning of new categories without relying on any class labels.
12, TITLE: UMPNet: Universal Manipulation Policy Network for Articulated Objects
AUTHORS: Zhenjia Xu ; Zhanpeng He ; Shuran Song
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: We introduce the Universal Manipulation Policy Network (UMPNet) -- a single image-based policy network that infers closed-loop action sequences for manipulating arbitrary articulated objects.
13, TITLE: What Happens in Face During A Facial Expression? Using Data Mining Techniques to Analyze Facial Expression Motion Vectors
AUTHORS: MOHAMAD ROSHANZAMIR et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, optical flow algorithm was used to extract deformation or motion vectors created in the face because of facial expressions.
14, TITLE: CANS: Communication Limited Camera Network Self-Configuration for Intelligent Industrial Surveillance
AUTHORS: Jingzheng Tu ; Qimin Xu ; Cailian Chen
CATEGORY: cs.CV [cs.CV, cs.MM]
HIGHLIGHT: In this paper, an adaptive camera network self-configuration method (CANS) of video surveillance is proposed to cope with multiple video streams of heterogeneous quality of service (QoS) demands for edge-enabled IIoT.
15, TITLE: Unsupervised Domain Adaptation for Cross-modality Liver Segmentation Via Joint Adversarial Learning and Self-learning
AUTHORS: Jin Hong ; Simon Chun Ho Yu ; Weitian Chen
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: In this work, we report a novel unsupervised domain adaptation framework for cross-modality liver segmentation via joint adversarial learning and self-learning.
16, TITLE: Shape-Biased Domain Generalization Via Shock Graph Embeddings
AUTHORS: Maruthi Narayanan ; Vickram Rajendran ; Benjamin Kimia
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: The inclusion of a role of shape alleviates these vulnerabilities and some approaches have achieved this by training on negative images, images endowed with edge maps, or images with conflicting shape and texture information.
17, TITLE: Learning to Predict Diverse Human Motions from A Single Image Via Mixture Density Networks
AUTHORS: Chunzhi Gu ; Yan Zhao ; Chao Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel approach to predict future human motions from a much weaker condition, i.e., a single image, with mixture density networks (MDN) modeling.
18, TITLE: Deep Joint Source-Channel Coding for Multi-Task Network
AUTHORS: Mengyang Wang ; Zhicong Zhang ; Jiahui Li ; Mengyao Ma ; Xiaopeng Fan
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this work, we propose an MTL network with a deep joint source-channel coding (JSCC) framework, which allows operating under CI scenarios.
19, TITLE: Conditional MoCoGAN for Zero-Shot Video Generation
AUTHORS: Shun Kimura ; Kazuhiko Kawamoto
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a conditional generative adversarial network (GAN) model for zero-shot video generation.
20, TITLE: Cylindrical and Asymmetrical 3D Convolution Networks for LiDAR-based Perception
AUTHORS: XINGE ZHU et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Motivated by this investigation, we propose a new framework for the outdoor LiDAR segmentation, where cylindrical partition and asymmetrical 3D convolution networks are designed to explore the 3D geometric pattern while maintaining these inherent properties.
21, TITLE: Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search
AUTHORS: Jialu Wang ; Yang Liu ; Xin Eric Wang
CATEGORY: cs.CV [cs.CV, cs.CL, I.2.7]
HIGHLIGHT: Therefore, we introduce two novel debiasing approaches: an in-processing fair sampling method to address the gender imbalance issue for training models, and a post-processing feature clipping method based on mutual information to debias multimodal representations of pre-trained models.
22, TITLE: Prioritized Subnet Sampling for Resource-Adaptive Supernet Training
AUTHORS: BOHONG CHEN et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose Prioritized Subnet Sampling to train a resource-adaptive supernet, termed PSS-Net.
23, TITLE: Global-Local Dynamic Feature Alignment Network for Person Re-Identification
AUTHORS: ZHANGQIANG MING et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To solve these problems, we propose a simple and efficient Local Sliding Alignment (LSA) strategy to dynamically align the local features of two images by setting a sliding window on the local stripes of the pedestrian.
24, TITLE: Single-stream CNN with Learnable Architecture for Multi-source Remote Sensing Data
AUTHORS: YI YANG et al.
CATEGORY: cs.CV [cs.CV, cs.LG, eess.IV]
HIGHLIGHT: In this paper, we propose an efficient and generalizable framework based on deep convolutional neural network (CNN) for multi-source remote sensing data joint classification.
25, TITLE: Sparse MLP for Image Recognition: Is Self-Attention Really Necessary?
AUTHORS: CHUANXIN TANG et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we explore whether the core self-attention module in Transformer is the key to achieving excellent performance in image recognition.
26, TITLE: Adversarially Trained Object Detector for Unsupervised Domain Adaptation
AUTHORS: Kazuma Fujii ; Hiroshi Kera ; Kazuhiko Kawamoto
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this study, we demonstrate that adversarial training in the source domain can be employed as a new approach for unsupervised domain adaptation.
27, TITLE: Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization
AUTHORS: Jingtang Liang ; Xiaodong Cun ; Chi-Man Pun
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To this end, we propose a novel spatial-separated curve rendering network (S2CRNet) for efficient and high-resolution image harmonization for the first time.
28, TITLE: On Pursuit of Designing Multi-modal Transformer for Video Grounding
AUTHORS: Meng Cao ; Long Chen ; Mike Zheng Shou ; Can Zhang ; Yuexian Zou
CATEGORY: cs.CV [cs.CV, cs.CL]
HIGHLIGHT: To this end, we reformulate video grounding as a set prediction task and propose a novel end-to-end multi-modal Transformer model, dubbed GTR.
29, TITLE: Meta Navigator: Search for A Good Adaptation Policy for Few-shot Learning
AUTHORS: CHI ZHANG et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Inspired by the recent success in Automated Machine Learning literature (AutoML), in this paper, we present Meta Navigator, a framework that attempts to solve the aforementioned limitation in few-shot learning by seeking a higher-level strategy and automating the selection from various few-shot learning designs.
30, TITLE: Explain Me The Painting: Multi-Topic Knowledgeable Art Description Generation
AUTHORS: Zechen Bai ; Yuta Nakashima ; Noa Garcia
CATEGORY: cs.CV [cs.CV, cs.AI, cs.CL]
HIGHLIGHT: This work presents a framework to bring art closer to people by generating comprehensive descriptions of fine-art paintings.
31, TITLE: ChangeChip: A Reference-Based Unsupervised Change Detection for PCB Defect Detection
AUTHORS: Yehonatan Fridman ; Matan Rusanovsky ; Gal Oren
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we introduce ChangeChip, an automated and integrated change detection system for defect detection in PCBs, from soldering defects to missing or misaligned electronic elements, based on Computer Vision (CV) and unsupervised learning (UL).
32, TITLE: HCDG: A Hierarchical Consistency Framework for Domain Generalization on Medical Image Segmentation
AUTHORS: Yijun Yang ; Shujun Wang ; Pheng-Ann Heng ; Lequan Yu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present a novel Hierarchical Consistency framework for Domain Generalization (HCDG) by ensembling Extrinsic Consistency and Intrinsic Consistency.
33, TITLE: CDTrans: Cross-domain Transformer for Unsupervised Domain Adaptation
AUTHORS: TONGKUN XU et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: Along with the pseudo labels, a weight-sharing triple-branch transformer framework is proposed to apply self-attention and cross-attention for source/target feature learning and source-target domain alignment, respectively.
34, TITLE: Efficient Tensor Completion Via Element-wise Weighted Low-rank Tensor Train with Overlapping Ket Augmentation
AUTHORS: Yang Zhang ; Yao Wang ; Zhi Han ; Xi'ai Chen ; Yandong Tang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To remedy such two issues, in this work, we propose a novel tensor completion approach via the element-wise weighted technique.
35, TITLE: Pose with Style: Detail-Preserving Pose-Guided Image Synthesis with Conditional StyleGAN
AUTHORS: BADOUR ALBAHAR et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present an algorithm for re-rendering a person from a single image under arbitrary poses.
36, TITLE: Single-stage Keypoint-based Category-level Object Pose Estimation from An RGB Image
AUTHORS: Yunzhi Lin ; Jonathan Tremblay ; Stephen Tyree ; Patricio A. Vela ; Stan Birchfield
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation that operates on unknown object instances within a known category using a single RGB image as input.
37, TITLE: On The Sins of Image Synthesis Loss for Self-supervised Depth Estimation
AUTHORS: ZHAOSHUO LI et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We show empirically that - contrary to common belief - improvements in image synthesis do not necessitate improvement in depth estimation.
38, TITLE: DAFNe: A One-Stage Anchor-Free Deep Model for Oriented Object Detection
AUTHORS: Steven Lang ; Fabrizio Ventola ; Kristian Kersting
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this work, we present DAFNe: A Dense one-stage Anchor-Free deep Network for oriented object detection.
39, TITLE: ADNet: Leveraging Error-Bias Towards Normal Direction in Face Alignment
AUTHORS: Yangyu Huang ; Hao Yang ; Chong Li ; Jongyoo Kim ; Fangyun Wei
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we investigate the error-bias issue in face alignment, where the distributions of landmark errors tend to spread along the tangent line to landmark curves.
40, TITLE: Image Shape Manipulation from A Single Augmented Training Sample
AUTHORS: Yael Vinker ; Eliahu Horwitz ; Nir Zabari ; Yedid Hoshen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present DeepSIM, a generative model for conditional image manipulation based on a single image.
41, TITLE: Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories
AUTHORS: FAIT POMS et al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: We propose a statistical validation algorithm that accurately estimates the F-score of binary classifiers for rare categories, where finding relevant examples to evaluate on is particularly challenging.
42, TITLE: Fine-Grained Few Shot Learning with Foreground Object Transformation
AUTHORS: Chaofei Wang ; Shiji Song ; Qisen Yang ; Xiang Li ; Gao Huang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address this challenging task, we propose a novel method named foreground object transformation (FOT), which is composed of a foreground object extractor and a posture transformation generator.
43, TITLE: CarNet: A Lightweight and Efficient Encoder-Decoder Architecture for High-quality Road Crack Detection
AUTHORS: Kai Li ; Yingjie Tian ; Zhiquan Qi
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present a lightweight encoder-decoder architecture, CarNet, for efficient and high-quality crack detection.
44, TITLE: Learning to Ground Visual Objects for Visual Dialog
AUTHORS: Feilong Chen ; Xiuyi Chen ; Can Xu ; Daxin Jiang
CATEGORY: cs.CV [cs.CV, cs.CL]
HIGHLIGHT: In this paper, we propose a novel approach to Learn to Ground visual objects for visual dialog, which employs a novel visual objects grounding mechanism where both prior and posterior distributions over visual objects are used to facilitate visual objects grounding.
45, TITLE: Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color
AUTHORS: MOSTAFA ABDOU et al.
CATEGORY: cs.CV [cs.CV, cs.CL]
HIGHLIGHT: Pretrained language models have been shown to encode relational information, such as the relations between entities or concepts in knowledge-bases -- (Paris, Capital, France).
46, TITLE: Partially-supervised Novel Object Captioning Leveraging Context from Paired Data
AUTHORS: Shashank Bujimalla ; Mahesh Subedar ; Omesh Tickoo
CATEGORY: cs.CV [cs.CV, cs.CL]
HIGHLIGHT: In this paper, we propose an approach to improve image captioning solutions for images with novel objects that do not have caption labels in the training dataset. We create synthetic paired captioning data for these novel objects by leveraging context from existing image-caption pairs.
47, TITLE: U-Net Convolutional Network for Recognition of Vessels and Materials in Chemistry Lab
AUTHORS: Zhihao Shang ; Di Bo
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: U-Net Convolutional Network for Recognition of Vessels and Materials in Chemistry Lab
48, TITLE: Multiresolution Deep Implicit Functions for 3D Shape Representation
AUTHORS: ZHANG CHEN et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We introduce Multiresolution Deep Implicit Functions (MDIF), a hierarchical representation that can recover fine geometry detail, while being able to perform global operations such as shape completion.
49, TITLE: Adversarial Bone Length Attack on Action Recognition
AUTHORS: Nariki Tanaka ; Hiroshi Kera ; Kazuhiko Kawamoto
CATEGORY: cs.CV [cs.CV, cs.AI, cs.CR, cs.LG]
HIGHLIGHT: In this paper, we show that adversarial attacks can be performed on skeleton-based action recognition models, even in a significantly low-dimensional setting without any temporal manipulation.
50, TITLE: Variational Disentanglement for Domain Generalization
AUTHORS: Yufei Wang ; Haoliang Li ; Lap-Pui Chau ; Alex C. Kot
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose to tackle the problem of domain generalization by delivering an effective framework named Variational Disentanglement Network (VDN), which is capable of disentangling the domain-specific features and task-specific features, where the task-specific features are expected to be better generalized to unseen but related test data.
51, TITLE: Improving Robustness of Adversarial Attacks Using An Affine-Invariant Gradient Estimator
AUTHORS: Wenzhao Xiang ; Hang Su ; Chang Liu ; Yandong Guo ; Shibao Zheng
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: To address this issue, we propose an affine-invariant adversarial attack which can consistently construct adversarial examples robust over a distribution of affine transformation.
52, TITLE: Leveraging Clinical Characteristics for Improved Deep Learning-Based Kidney Tumor Segmentation on CT
AUTHORS: Christina B. Lund ; Bas H. M. van der Velden
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: Leveraging Clinical Characteristics for Improved Deep Learning-Based Kidney Tumor Segmentation on CT
53, TITLE: MLFW: A Database for Face Recognition on Masked Faces
AUTHORS: Chengrui Wang ; Han Fang ; Yaoyao Zhong ; Weihong Deng
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To figure out the impact of masks on face recognition models, we build a simple but effective tool to generate masked faces from unmasked faces automatically, and construct a new database called Masked LFW (MLFW) based on the Cross-Age LFW (CALFW) database.
54, TITLE: Preliminary Wildfire Detection Using State-of-the-art PTZ (Pan, Tilt, Zoom) Camera Technology and Convolutional Neural Networks
AUTHORS: Samarth Shah
CATEGORY: cs.CV [cs.CV, cs.AI, 68T45, I.4.1; I.4.6; I.4.7; E.1; I.2.10]
HIGHLIGHT: The objective of the research is to detect forest fires in their earlier stages to prevent them from spreading, prevent them from causing damage to a variety of things, and most importantly, reduce or eliminate the chances of someone dying from a wildfire. We propose a more representative and evenly distributed data through better settings, lighting, atmospheres, etc., and class distribution in the entire dataset.
55, TITLE: Challenges and Solutions in DeepFakes
AUTHORS: Jatin Sharma ; Sahil Sharma
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: So, to counter this emerging problem, we introduce a dataset of 140k real and fake faces, which contains 70k real faces from the Flickr dataset collected by Nvidia, as well as 70k fake faces sampled from 1 million fake faces generated by StyleGAN.
56, TITLE: A Semi-supervised Self-training Method to Develop Assistive Intelligence for Segmenting Multiclass Bridge Elements from Inspection Videos
AUTHORS: Muhammad Monjurul Karim ; Ruwen Qin ; Zhaozheng Yin ; Genda Chen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper is motivated to develop an assistive intelligence model for segmenting multiclass bridge elements from inspection videos captured by an aerial inspection platform.
57, TITLE: Instance-Conditioned GAN
AUTHORS: Arantxa Casanova ; Marlène Careil ; Jakob Verbeek ; Michal Drozdzal ; Adriana Romero-Soriano
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we take inspiration from kernel density estimation techniques and introduce a non-parametric approach to modeling distributions of complex datasets.
58, TITLE: Discovering The Unknown Knowns: Turning Implicit Knowledge in The Dataset Into Explicit Training Examples for Visual Question Answering
AUTHORS: Jihyung Kil ; Cheng Zhang ; Dong Xuan ; Wei-Lun Chao
CATEGORY: cs.CV [cs.CV, cs.CL]
HIGHLIGHT: In this paper, we take a drastically different approach.
59, TITLE: Weakly Supervised Person Search with Region Siamese Networks
AUTHORS: CHUCHU HAN et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present a weakly supervised setting where only bounding box annotations are available.
60, TITLE: BGT-Net: Bidirectional GRU Transformer Network for Scene Graph Generation
AUTHORS: Naina Dhingra ; Florian Ritter ; Andreas Kunz
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We propose a bidirectional GRU (BiGRU) transformer network (BGT-Net) for the scene graph generation for images.
61, TITLE: Border-SegGCN: Improving Semantic Segmentation By Refining The Border Outline Using Graph Convolutional Network
AUTHORS: Naina Dhingra ; George Chogovadze ; Andreas Kunz
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We present Border-SegGCN, a novel architecture to improve semantic segmentation by refining the border outline using graph convolutional networks (GCN).
62, TITLE: DeepPyram: Enabling Pyramid View and Deformable Pyramid Reception for Semantic Segmentation in Cataract Surgery Videos
AUTHORS: Negin Ghamsarian ; Mario Taschwer ; Klaus Schoeffmann
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: This paper proposes a semantic segmentation network termed as DeepPyram that can achieve superior performance in segmenting relevant objects in cataract surgery videos with varying issues.
63, TITLE: Class-Distribution-Aware Calibration for Long-Tailed Visual Recognition
AUTHORS: Mobarakol Islam ; Lalithkumar Seenivasan ; Hongliang Ren ; Ben Glocker
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this study, we propose class-distribution-aware temperature scaling (CDA-TS) and label smoothing (CDA-LS) by incorporating class frequency information in model calibration in the context of long-tailed distribution.
64, TITLE: MovieCuts: A New Dataset and Benchmark for Cut Type Recognition
AUTHORS: Alejandro Pardo ; Fabian Caba Heilbron ; Juan León Alcázar ; Ali Thabet ; Bernard Ghanem
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper introduces the cut type recognition task, which requires modeling of multi-modal information. To ignite research in the new task, we construct a large-scale dataset called MovieCuts, which contains more than 170K videoclips labeled among ten cut types.
65, TITLE: PQ-Transformer: Jointly Parsing 3D Objects and Layouts from Point Clouds
AUTHORS: Xiaoxue Chen ; Hao Zhao ; Guyue Zhou ; Ya-Qin Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Along with the novel quad representation, we propose a tailored physical constraint loss function that discourages object-layout interference.
66, TITLE: SphereFace Revived: Unifying Hyperspherical Face Recognition
AUTHORS: Weiyang Liu ; Yandong Wen ; Bhiksha Raj ; Rita Singh ; Adrian Weller
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In order to address this problem, we introduce a unified framework to understand large angular margin in hyperspherical face recognition.
67, TITLE: Evaluating Computer Vision Techniques for Urban Mobility on Large-Scale, Unconstrained Roads
AUTHORS: Harish Rithish ; Raghava Modhugu ; Ranjith Reddy ; Rohit Saluja ; C. V. Jawahar
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Code, models, and datasets used in this work will be publicly released.
68, TITLE: Learning Statistical Representation with Joint Deep Embedded Clustering
AUTHORS: Mina Rezaei ; Emilio Dorigatti ; David Ruegamer ; Bernd Bischl
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To overcome these limitations, we introduce StatDEC, a new unsupervised framework for joint statistical representation learning and clustering.
69, TITLE: Bornon: Bengali Image Captioning with Transformer-based Deep Learning Approach
AUTHORS: Faisal Muhammad Shah ; Mayeesha Humaira ; Md Abidur Rahman Khan Jim ; Amit Saha Ami ; Shimul Paul
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Image captioning using Encoder-Decoder based approach where CNN is used as the Encoder and sequence generator like RNN as Decoder has proven to be very effective.
70, TITLE: PAT: Pseudo-Adversarial Training For Detecting Adversarial Videos
AUTHORS: Nupur Thakur ; Baoxin Li
CATEGORY: cs.CV [cs.CV, I.4.9; I.5.1]
HIGHLIGHT: In this paper, we propose a novel yet simple algorithm called Pseudo-Adversarial Training (PAT), to detect the adversarial frames in a video without requiring knowledge of the attack.
71, TITLE: Unsupervised Domain Adaptive Learning Via Synthetic Data for Person Re-identification
AUTHORS: Qi Wang ; Sikai Bai ; Junyu Gao ; Yuan Yuan ; Xuelong Li
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we attempt to remedy these problems from two aspects, namely data and methodology.
72, TITLE: RobustART: Benchmarking Robustness on Architecture Design and Training Techniques
AUTHORS: SHIYU TANG et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Thus, we propose RobustART, the first comprehensive Robustness investigation benchmark on ImageNet (including open-source toolkit, pre-trained model zoo, datasets, and analyses) regarding ARchitecture design (44 human-designed off-the-shelf architectures and 1200+ networks from neural architecture search) and Training techniques (10+ general techniques, e.g., data augmentation) towards diverse noises (adversarial, natural, and system noises).
73, TITLE: Convolutional Hough Matching Networks for Robust and Efficient Visual Correspondence
AUTHORS: Juhong Min ; Seungwook Kim ; Minsu Cho
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work we introduce a Hough transform perspective on convolutional matching and propose an effective geometric matching algorithm, dubbed Convolutional Hough Matching (CHM).
74, TITLE: Contrastive Quantization with Code Memory for Unsupervised Image Retrieval
AUTHORS: Jinpeng Wang ; Ziyun Zeng ; Bin Chen ; Tao Dai ; Shu-Tao Xia
CATEGORY: cs.CV [cs.CV, cs.AI, cs.IR]
HIGHLIGHT: This paper provides a novel solution to unsupervised deep quantization, namely Contrastive Quantization with Code Memory (MeCoQ).
75, TITLE: Evolving Architectures with Gradient Misalignment Toward Low Adversarial Transferability
AUTHORS: Kevin Richard G. Operiano ; Wanchalerm Pora ; Hitoshi Iba ; Hiroshi Kera
CATEGORY: cs.CV [cs.CV, cs.NE]
HIGHLIGHT: In this study, we address this problem from a novel perspective through investigating the contribution of the network architecture to transferability.
76, TITLE: Pyramid Hybrid Pooling Quantization for Efficient Fine-Grained Image Retrieval
AUTHORS: Ziyun Zeng ; Jinpeng Wang ; Bin Chen ; Tao Dai ; Shu-Tao Xia
CATEGORY: cs.CV [cs.CV, cs.AI, cs.IR]
HIGHLIGHT: To improve fine-grained image hashing, we propose Pyramid Hybrid Pooling Quantization (PHPQ).
77, TITLE: Conditional Generation of Synthetic Geospatial Images from Pixel-level and Feature-level Inputs
AUTHORS: Xuerong Xiao ; Swetava Ganguli ; Vipul Pandey
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: Towards this goal, we present a deep conditional generative model, called VAE-Info-cGAN, that combines a Variational Autoencoder (VAE) with a conditional Information Maximizing Generative Adversarial Network (InfoGAN), for synthesizing semantically rich images simultaneously conditioned on a pixel-level condition (PLC) and a macroscopic feature-level condition (FLC).
78, TITLE: Spatial and Semantic Consistency Regularizations for Pedestrian Attribute Recognition
AUTHORS: Jian Jia ; Xiaotang Chen ; Kaiqi Huang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To fully exploit inter-image relations and aggregate human prior in the model learning process, we construct a Spatial and Semantic Consistency (SSC) framework that consists of two complementary regularizations to achieve spatial and semantic consistency for each attribute.
79, TITLE: A Decidability-Based Loss Function
AUTHORS: PEDRO SILVA et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, a loss function based on the decidability index is proposed to improve the quality of embeddings for the verification routine.
80, TITLE: Constructing Phrase-level Semantic Labels to Form Multi-Grained Supervision for Image-Text Retrieval
AUTHORS: ZHIHAO FAN et al.
CATEGORY: cs.CV [cs.CV, cs.CL]
HIGHLIGHT: In this paper, we explore introducing additional phrase-level supervision for the better identification of mismatched units in the text.
81, TITLE: An Unsupervised Deep-Learning Method for Fingerprint Classification: The CCAE Network and The Hybrid Clustering Strategy
AUTHORS: Yue-Jie Hou ; Zai-Xin Xie ; Jian-Hu ; Yao-Shen ; Chi-Chun Zhou
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: In this paper, we propose a new and efficient unsupervised deep learning method that can extract fingerprint features and classify fingerprint patterns automatically.
82, TITLE: Generating Datasets of 3D Garments with Sewing Patterns
AUTHORS: Maria Korosteleva ; Sung-Hee Lee
CATEGORY: cs.CV [cs.CV, cs.AI, cs.GR, cs.LG]
HIGHLIGHT: To facilitate research in these directions, we propose a method for generating large synthetic datasets of 3D garment designs and their sewing patterns. With this pipeline, we created the first large-scale synthetic dataset of 3D garment models with their sewing patterns.
83, TITLE: Learning Indoor Inverse Rendering with 3D Spatially-Varying Lighting
AUTHORS: Zian Wang ; Jonah Philion ; Sanja Fidler ; Jan Kautz
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we address the problem of jointly estimating albedo, normals, depth and 3D spatially-varying lighting from a single image.
84, TITLE: Mutual Supervision for Dense Object Detection
AUTHORS: Ziteng Gao ; Limin Wang ; Gangshan Wu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we break the convention of the same training samples for these two heads in dense detectors and explore a novel supervisory paradigm, termed as Mutual Supervision (MuSu), to respectively and mutually assign training samples for the classification and regression head to ensure this consistency.
85, TITLE: MSGDD-cGAN: Multi-Scale Gradients Dual Discriminator Conditional Generative Adversarial Network
AUTHORS: Mohammadreza Naderi ; Zahra Nabizadeh ; Nader Karimi ; Shahram Shirani ; Shadrokh Samavi
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: In this work, we propose a method called MSGDD-cGAN, which first stabilizes the performance of the cGANs using multi-connections gradients flow.
86, TITLE: A Self-Supervised Deep Framework for Reference Bony Shape Estimation in Orthognathic Surgical Planning
AUTHORS: DEQIANG XIAO et al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Therefore, we propose a self-supervised deep framework to automatically estimate reference facial bony shape models.
87, TITLE: Task Guided Compositional Representation Learning for ZDA
AUTHORS: Shuang Liu ; Mete Ozay
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we address learning feature representations which are invariant to and shared among different domains considering task characteristics for zero-shot domain adaptation (ZDA).
88, TITLE: Vision-based System Identification and 3D Keypoint Discovery Using Dynamics Constraints
AUTHORS: Miguel Jaques ; Martin Asenov ; Michael Burke ; Timothy Hospedales
CATEGORY: cs.CV [cs.CV, cs.AI, stat.ML]
HIGHLIGHT: This paper introduces V-SysId, a novel method that enables simultaneous keypoint discovery, 3D system identification, and extrinsic camera calibration from an unlabeled video taken from a static camera, using only the family of equations of motion of the object of interest as weak supervision.
89, TITLE: Check Your Other Door! Establishing Backdoor Attacks in The Frequency Domain
AUTHORS: Hasan Abed Al Kader Hammoud ; Bernard Ghanem
CATEGORY: cs.CR [cs.CR, cs.CV, cs.LG]
HIGHLIGHT: In this work, we propose a complete pipeline for generating a dynamic, efficient, and invisible backdoor attack in the frequency domain.
90, TITLE: DHA: End-to-End Joint Optimization of Data Augmentation Policy, Hyper-parameter and Architecture
AUTHORS: KAICHEN ZHOU et. al.
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: In view of these, we propose DHA, which achieves joint optimization of Data augmentation policy, Hyper-parameter and Architecture.
91, TITLE: The Mathematics of Adversarial Attacks in AI -- Why Deep Learning Is Unstable Despite The Existence of Stable Neural Networks
AUTHORS: Alexander Bastounis ; Anders C Hansen ; Verner Vlačić
CATEGORY: cs.LG [cs.LG, cs.CV, cs.NA, math.NA, stat.ML]
HIGHLIGHT: Our paper addresses why there has been no solution to the problem, as we prove the following mathematical paradox: any training procedure based on training neural networks for classification problems with a fixed architecture will yield neural networks that are either inaccurate or unstable (if accurate) -- despite the provable existence of both accurate and stable neural networks for the same classification problems.
92, TITLE: BioLCNet: Reward-modulated Locally Connected Spiking Neural Networks
AUTHORS: Hafez Ghaemi ; Erfan Mirzaei ; Mahbod Nouri ; Saeed Reza Kheradpisheh
CATEGORY: cs.NE [cs.NE, cs.CV, cs.LG, q-bio.NC, I.2.6; I.5.1]
HIGHLIGHT: To propose a more biologically plausible solution, we designed a locally connected spiking neural network (SNN) trained using spike-timing-dependent plasticity (STDP) and its reward-modulated variant (R-STDP) learning rules.
93, TITLE: RVMDE: Radar Validated Monocular Depth Estimation for Robotics
AUTHORS: Muhammad Ishfaq Hussain ; Muhammad Aasim Rafique ; Moongu Jeon
CATEGORY: cs.RO [cs.RO, cs.CV, cs.LG]
HIGHLIGHT: This work explores the utility of coarse signals from radar when fused with fine-grained data from a monocular camera for depth estimation in harsh environmental conditions.
94, TITLE: Towards Robust Monocular Visual Odometry for Flying Robots on Planetary Missions
AUTHORS: MARTIN WUDENKA et. al.
CATEGORY: cs.RO [cs.RO, cs.CV]
HIGHLIGHT: In this paper, we present an advanced robust monocular odometry algorithm that uses efficient optical flow tracking to obtain feature correspondences between images and a refined keyframe selection criterion.
95, TITLE: Balancing The Budget: Feature Selection and Tracking for Multi-Camera Visual-Inertial Odometry
AUTHORS: Lintong Zhang ; David Wisth ; Marco Camurri ; Maurice Fallon
CATEGORY: cs.RO [cs.RO, cs.CV]
HIGHLIGHT: To overcome these challenges, we introduce two novel methods to improve multi-camera feature tracking.
96, TITLE: A Complex Constrained Total Variation Image Denoising Algorithm with Application to Phase Retrieval
AUTHORS: Yunhui Gao ; Liangcai Cao
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In particular, we introduce two types of complex TV in both isotropic and anisotropic forms.
97, TITLE: Team NeuroPoly: Description of The Pipelines for The MICCAI 2021 MS New Lesions Segmentation Challenge
AUTHORS: Uzay Macar ; Enamundram Naga Karthik ; Charley Gros ; Andréanne Lemay ; Julien Cohen-Adad
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: This paper gives a detailed description of the pipelines used for the 2nd edition of the MICCAI 2021 Challenge on Multiple Sclerosis Lesion Segmentation.
98, TITLE: Efficient Re-parameterization Residual Attention Network For Nonhomogeneous Image Dehazing
AUTHORS: Tian Ye ; ErKang Chen ; XinRui Huang ; Peng Chen
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: The contribution of this paper mainly has the following three aspects: 1) A novel Multi-branch Attention (MA) block.
99, TITLE: Domain and Content Adaptive Convolution for Domain Generalization in Medical Image Segmentation
AUTHORS: Shishuai Hu ; Zehui Liao ; Jianpeng Zhang ; Yong Xia
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we propose a multi-source domain generalization model, namely domain and content adaptive convolution (DCAC), for medical image segmentation.
100, TITLE: WeakSTIL: Weak Whole-slide Image Level Stromal Tumor Infiltrating Lymphocyte Scores Are All You Need
AUTHORS: YONI SCHIRRIS et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: We present WeakSTIL, an interpretable two-stage weak label deep learning pipeline for scoring the percentage of stromal tumor infiltrating lymphocytes (sTIL%) in H&E-stained whole-slide images (WSIs) of breast cancer tissue.
101, TITLE: CAN3D: Fast 3D Medical Image Segmentation Via Compact Context Aggregation
AUTHORS: WEI DAI et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: To address these challenges, we present a compact convolutional neural network with a shallow memory footprint to efficiently reduce the number of model parameters required for state-of-the-art performance.
102, TITLE: Co-Correcting: Noise-tolerant Medical Image Classification Via Mutual Label Correction
AUTHORS: Jiarun Liu ; Ruirui Li ; Chuan Sun
CATEGORY: eess.IV [eess.IV, cs.AI, cs.CV, cs.LG]
HIGHLIGHT: To fill the gap, this paper proposes a noise-tolerant medical image classification framework named Co-Correcting, which significantly improves classification accuracy and obtains more accurate labels through dual-network mutual learning, label probability estimation, and curriculum label correcting.
103, TITLE: Dual-view Snapshot Compressive Imaging Via Optical Flow Aided Recurrent Neural Network
AUTHORS: RUIYING LU et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we propose an optical flow-aided recurrent neural network for dual video SCI systems, which provides high-quality decoding in seconds.
104, TITLE: Blood Vessel Segmentation in En-face OCTA Images: A Frequency Based Method
AUTHORS: Anna Breger ; Felix Goldbach ; Bianca S. Gerendas ; Ursula Schmidt-Erfurth ; Martin Ehler
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: We present a novel method for the vessel identification based on frequency representations of the image, in particular, using so-called Gabor filter banks.
105, TITLE: IceNet for Interactive Contrast Enhancement
AUTHORS: Keunsoo Ko ; Chang-Su Kim
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: A CNN-based interactive contrast enhancement algorithm, called IceNet, is proposed in this work, which enables a user to adjust image contrast easily according to his or her preference.
106, TITLE: Self Supervised Learning Improves DMMR/MSI Detection from Histology Slides Across Multiple Cancers
AUTHORS: CHARLIE SAILLARD et. al.
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: In this study, we leverage recent advances in self-supervised learning by training neural networks on histology images from the TCGA dataset using MoCo V2.
107, TITLE: Sickle Cell Disease Severity Prediction from Percoll Gradient Images Using Graph Convolutional Networks
AUTHORS: ARIO SADAFI et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: Here, we propose a novel approach combining a graph convolutional network, a convolutional neural network, fast Fourier transform, and recursive feature elimination to predict the severity of SCD directly from a Percoll image.
108, TITLE: A Joint Graph and Image Convolution Network for Automatic Brain Tumor Segmentation
AUTHORS: Camillo Saueressig ; Adam Berkley ; Reshma Munbodh ; Ritambhara Singh
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG, q-bio.TO]
HIGHLIGHT: We present a joint graph convolution-image convolution neural network as our submission to the Brain Tumor Segmentation (BraTS) 2021 challenge.
109, TITLE: Low-Light Image Enhancement with Normalizing Flow
AUTHORS: YUFEI WANG et. al.
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we investigate modeling this one-to-many relationship via a proposed normalizing flow model.
110, TITLE: Differential Diagnosis of Frontotemporal Dementia and Alzheimer's Disease Using Generative Adversarial Network
AUTHORS: Ma Da ; Lu Donghuan ; Popuri Karteek ; Beg Mirza Faisal
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this study, a novel framework was proposed by using the Generative Adversarial Network technique to distinguish FTD, AD and normal control subjects, using volumetric features extracted at coarse-to-fine structural scales from Magnetic Resonance Imaging scans.
111, TITLE: Follow The Curve: Robotic-Ultrasound Navigation with Learning Based Localization of Spinous Processes for Scoliosis Assessment
AUTHORS: Maria Victorova ; Michael Ka-Shing Lee ; David Navarro-Alarcon ; Yongping Zheng
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: This paper introduces a robotic-ultrasound approach for spinal curvature tracking and automatic navigation.
112, TITLE: Real-Time EMG Signal Classification Via Recurrent Neural Networks
AUTHORS: Reza Bagherian Azhiri ; Mohammad Esmaeili ; Mehrdad Nourani
CATEGORY: eess.SP [eess.SP, cs.CV, cs.LG, cs.RO]
HIGHLIGHT: In this paper, after extracting features from a hybrid time-frequency domain (discrete Wavelet transform), we utilize a set of recurrent neural network-based architectures to increase the classification accuracy and reduce the prediction delay time.