Paper Reading Notes: A Survey on Graph Neural Networks and Graph Transformers in Computer Vision (GNN Survey)

Paper Reading Notes: GNN Survey

The survey mainly introduces GNNs and their applications in various fields.

2D NATURAL IMAGES

Image Classification

Multi-Label Classification

ML-GCN: builds a directed graph over the label space, where each node stands for an object label (represented by its word embedding) and the connections model the inter-dependencies between labels.

attention-driven GCN: models the label dependencies via a more elaborate GNN architecture.

hypergraph neural networks: model the label dependencies via a more elaborate GNN architecture.
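
As a rough illustration of the label-graph idea, the sketch below runs a single graph-convolution step over a tiny label graph in plain Python. The three labels, the adjacency values, the 2-d "word embeddings", and the identity weight matrix are all made up for the example, and row normalization is just one common choice; this is not the ML-GCN implementation.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize(adj):
    """Scale each row of the adjacency matrix so that it sums to 1."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def gcn_layer(adj, feats, weight):
    """One propagation step: ReLU(normalized_adj @ feats @ weight)."""
    hidden = matmul(matmul(row_normalize(adj), feats), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


# Hypothetical directed label graph with self-loops (values are assumptions).
# Row i lists which labels node i aggregates from.
adj = [[1.0, 1.0, 0.0],
       [1.0, 1.0, 1.0],
       [0.0, 0.0, 1.0]]
label_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy stand-ins for word embeddings
weight = [[1.0, 0.0], [0.0, 1.0]]                 # identity instead of learned weights

out = gcn_layer(adj, label_emb, weight)
```

Each output row is a label representation that now mixes in the embeddings of the labels it depends on.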

Few-Shot Learning

Paper title | Venue | Main idea
Few-shot learning with graph neural networks | ICLR, 2018 | Formulates FSL as a supervised interpolation problem on a densely connected graph, where the vertices stand for images in the collection and the adjacency is learnable with trainable similarity kernels.
Learning to propagate labels: Transductive propagation network for few-shot learning | ICLR, 2019 | Constructs graphs on top of the embedding space to fully exploit the manifold structure of the novel classes. Label information is propagated from the support set to the query set based on the constructed graphs.
Edge-labeling graph neural network for few-shot learning | CVPR, 2019 | Proposes an edge-labeling GNN framework that learns to predict edge labels, explicitly constraining the intra- and inter-class similarities.
Learning from the past: Continual meta-learning via Bayesian graph modeling | AAAI, 2020 | Formulates meta-learning-based FSL as continual learning over a sequence of tasks and resorts to a Bayesian GNN to capture the intra- and inter-task correlations.
DPGN: Distribution propagation graph network for few-shot learning | CVPR, 2020 | Devises a dual complete graph network to model both distribution- and instance-level relations.
Hierarchical graph neural networks for few-shot learning | TCSVT, 2021 | Exploits the hierarchical relationships among graph nodes via bottom-up and top-down reasoning modules.
Hybrid graph neural networks for few-shot learning | AAAI, 2022 | Introduces an instance GNN and a prototype GNN as feature-embedding task adaptation modules for quickly adapting learned features to new tasks.
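
To make "label information is propagated from the support set to the query set" more concrete, here is a minimal sketch of classic iterative label propagation on a similarity graph. It is an assumption that this textbook update, F <- alpha * S * F + (1 - alpha) * Y, is close in spirit to what the transductive propagation paper does; the similarity values, `alpha`, and the step count are invented for the example.

```python
def propagate_labels(sim, labels, alpha=0.5, steps=50):
    """Iterate F <- alpha * S @ F + (1 - alpha) * Y, with S = D^-1/2 W D^-1/2."""
    n, c = len(sim), len(labels[0])
    deg = [sum(row) for row in sim]
    # Symmetrically normalized similarity matrix.
    s = [[sim[i][j] / (deg[i] * deg[j]) ** 0.5 if deg[i] and deg[j] else 0.0
          for j in range(n)] for i in range(n)]
    f = [row[:] for row in labels]
    for _ in range(steps):
        f = [[alpha * sum(s[i][k] * f[k][j] for k in range(n))
              + (1 - alpha) * labels[i][j] for j in range(c)]
             for i in range(n)]
    return f


# Nodes 0 and 1 are labelled support samples (classes 0 and 1),
# nodes 2 and 3 are unlabelled queries. Similarities are made up:
# query 2 is close to support 0, query 3 is close to support 1.
sim = [[0.0, 0.1, 0.9, 0.1],
       [0.1, 0.0, 0.1, 0.9],
       [0.9, 0.1, 0.0, 0.1],
       [0.1, 0.9, 0.1, 0.0]]
labels = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]

scores = propagate_labels(sim, labels)
predictions = [row.index(max(row)) for row in scores]
```

Each query ends up with the class of the support sample it is most strongly connected to, which is the behaviour the graph construction is meant to encourage.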

Zero-Shot Learning (ZSL)

Paper title | Venue | Main idea
Rethinking knowledge graph propagation for zero-shot learning | CVPR, 2019 | Proposes a Dense Graph Propagation (DGP) module to exploit the hierarchical structure of the knowledge graph. It consists of two phases that iteratively propagate knowledge between a node and its ancestors and descendants.
Region graph embedding network for zero-shot learning | ECCV, 2020 | Represents each input image as a region graph, where each node stands for an attended region in the image and the edges are appearance similarities among these region nodes.
Attribute propagation network for graph zero-shot learning | AAAI, 2020 | Generates and updates attribute vectors with an attribute propagation network to optimize the attribute space.
Isometric propagation network for generalized zero-shot learning | ICLR, 2021 | Introduces visual and semantic prototype propagation on auto-generated graphs to enhance the inter-class relations and align the corresponding class-wise dependencies in the visual and semantic spaces.
Learning graph embeddings for open world compositional zero-shot learning | IEEE TPAMI, 2022 | Introduces a Compositional Cosine Graph Embedding (Co-CGE) model to learn the relationship between primitives and compositions through a GCN. The authors quantitatively measure the feasibility scores of a state-object composition and incorporate the computed scores into Co-CGE in two ways.
GNDAN: Graph navigated dual attention network for zero-shot learning | IEEE TNNLS, 2022 | Resorts to GAT to exploit the appearance relations between local regions and the cooperation between local and global features.

Transfer Learning

Paper title | Venue | Main idea
GCAN: Graph convolutional adversarial network for unsupervised domain adaptation | CVPR, 2019 | Proposes a Graph Convolutional Adversarial Network (GCAN) for DA, where a GCN is built on top of densely connected instance graphs to encode data structure information.
Heterogeneous graph attention network for unsupervised multiple-target domain adaptation | IEEE TPAMI, 2020 | Builds a heterogeneous relation graph and introduces GAT to propagate the semantic information and generate reliable pseudo-labels.
Curriculum graph co-teaching for multi-target domain adaptation | CVPR, 2021 | Introduces a GCN to aggregate information from different domains, along with a co-teaching and curriculum learning strategy to achieve progressive adaptation.
Progressive graph learning for open-set domain adaptation | ICML, 2020 | Studies open-set DA via a progressive graph learning framework that selects pseudo-labels and thus avoids negative transfer.
Prototype-matching graph network for heterogeneous domain adaptation | ACM MM, 2020 | Attains cross-domain prototype alignment based on features learned from different stages of GNNs.
Learning to combine: Knowledge aggregation for multi-source domain adaptation | ECCV, 2020 | Introduces a knowledge graph based on the prototypes of different domains to perform information propagation among semantically adjacent representations.
Compound domain generalization via meta-knowledge encoding | CVPR, 2022 | Builds global prototypical relation graphs and introduces a graph self-attention mechanism.

Current focus

Current work focuses on extracting ad hoc knowledge graphs from the data for a specific task, which is heuristic and relies on human priors.

Future directions

(1) develop general and automatic graph construction procedures;

(2) enhance the interactions between abstract graph structures and task-specific classifiers;

(3) excavate more fine-grained building blocks (nodes and edges) to increase the capability of the constructed graphs.

Object Detection

Paper title | Venue | Main idea
Reasoning-RCNN: Unifying adaptive global reasoning into large-scale object detection | CVPR, 2019 | Presents an adaptive global reasoning network for large-scale object detection that incorporates commonsense knowledge (a category-wise knowledge graph) and propagates visual information globally.
Spatial-aware graph relation network for large-scale object detection | CVPR, 2019 | Adaptively discovers semantic and spatial relationships without requiring prior handcrafted linguistic knowledge.
Relation networks for object detection | CVPR, 2018 | Introduces an adapted attention module into detection head networks, explicitly learning information between objects by encoding long-range dependencies.
RelationNet++: Bridging visual representations for object detection via transformer decoder | NeurIPS, 2020 | Presents a self-attention-based decoder module to embrace the strengths of different object/part representations within a single detection framework.
GAR: Graph assisted reasoning for object detection | WACV, 2020 | Introduces a heterogeneous graph to jointly model object-object and object-scene relations.
GraphFPN: Graph feature pyramid network for object detection | ICCV, 2021 | Proposes a graph feature pyramid network (GraphFPN), which explores the contextual and hierarchical structures of an input image based on a superpixel hierarchy.
Relation matters: Foreground-aware graph-based relational reasoning for domain adaptive object detection | IEEE TPAMI, 2022 | First builds intra- and inter-domain relation graphs by virtue of cyclic between-domain consistency, without any prior knowledge about the target distribution.
SIGMA: Semantic-complete graph matching for domain adaptive object detection | ICCV, 2021 | Formulates DAOD as a graph matching problem by establishing cross-image graphs to model class-conditional distributions on both domains.
Semantic relation reasoning for shot-stable few-shot object detection | CVPR, 2022 | Introduces a semantic relation reasoning module to integrate semantic information between base and novel classes for novel object detection.

Note: DAOD stands for domain adaptive object detection.

Current focus

Exploit between-object, cross-scale, or cross-domain relationships, as well as relationships between base and novel classes.

Future directions

(1) design better region-to-node feature mapping methods;

(2) incorporate Transformer (or pure GNN) encoders to improve the expressive power of the initial node features;

(3) perform reasoning directly in the original feature space to better preserve the intrinsic structure of images.

Image Segmentation

General segmentation

Paper title | Venue | Main idea
Dual graph convolutional network for semantic segmentation | BMVC, 2019 | Models the global context of the input features via a dual GCN framework, where a coordinate-space GCN models spatial relationships between pixels in the image and a feature-space GCN models dependencies along the channel dimensions of the network's feature map.
Graph-based global reasoning networks | CVPR, 2019 | Designs a global reasoning unit that projects features globally aggregated in coordinate space to the node domain and performs relational reasoning on a fully connected graph.
Dynamic graph message passing networks | CVPR, 2020 | Dynamically samples the neighborhood of a node and then predicts the node dependencies, filter weights, and affinity matrix to propagate information.
Representative graph neural network | ECCV, 2020 | Proposes to dynamically sample some representative nodes for relational modeling.
Spatial pyramid based graph reasoning for semantic segmentation | CVPR, 2020 | Proposes an improved Laplacian formulation that enables graph reasoning in the original feature space, fully exploiting the contextual relations at different feature scales.
Class-wise dynamic graph convolution for semantic segmentation | ECCV, 2020 | Introduces a class-wise dynamic graph convolution module to conduct graph reasoning over the pixels that belong to the same class.
Bidirectional graph reasoning network for panoptic segmentation | CVPR, 2020 | Designs a bidirectional graph reasoning network to bridge the things branch and the stuff branch for panoptic segmentation.
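
Several of the methods above share a project, reason, re-project pattern: pixel features are pooled into a few graph nodes, the nodes exchange information, and the result is scattered back to the pixels. The sketch below shows that pattern in plain Python under strong simplifying assumptions: the pixel-to-node assignment and the node adjacency are fixed, hand-picked matrices rather than learned ones, and there are no learned weights. The function name `global_reasoning` is hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def global_reasoning(pixels, assign, node_adj):
    """pixels: N x C features, assign: K x N pooling weights, node_adj: K x K graph."""
    nodes = matmul(assign, pixels)                 # pool pixels into K node features
    reasoned = matmul(node_adj, nodes)             # message passing between nodes
    back = matmul(transpose(assign), reasoned)     # scatter node features to pixels
    # Residual connection: add the graph context to the original features.
    return [[p + b for p, b in zip(prow, brow)] for prow, brow in zip(pixels, back)]


# Four "pixels" with two channels; the first two pixels go to node 0,
# the last two to node 1 (assumed assignment, not learned).
pixels = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
assign = [[0.5, 0.5, 0.0, 0.0],
          [0.0, 0.0, 0.5, 0.5]]
node_adj = [[0.5, 0.5],
            [0.5, 0.5]]  # fully connected, row-normalized

out = global_reasoning(pixels, assign, node_adj)
```

After the step, every pixel carries a small amount of information from the other region, which is the long-range context these modules aim to add.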

One-Shot Semantic Segmentation

Paper title | Venue | Main idea
Pyramid graph networks with connection attentions for region-based one-shot semantic segmentation | ICCV, 2019 | Introduces a pyramid graph attention module to model the connection between query and support feature maps.

Few-Shot Semantic Segmentation

Paper title | Venue | Main idea
Scale-aware graph neural network for few-shot semantic segmentation | CVPR, 2021 | Proposes a scale-aware GNN to perform cross-scale relational reasoning among support-query images. A self-node collaboration mechanism is introduced to perceive different resolutions of the same object.

Weakly Supervised Semantic Segmentation

Paper title | Venue | Main idea
Affinity attention graph neural network for weakly supervised semantic segmentation | IEEE TPAMI, 2021 | An image is first converted to a weighted graph via an affinity CNN, and then an affinity attention layer is devised to obtain long-range interactions from the constructed graph and propagate semantic information to the unlabeled pixels.

Current focus

Explore contextual information at the local or global level with pyramid pooling, dilated convolutions, or the self-attention mechanism.

Scene Graph Generation (SGG)

Task overview: SGG detects pairs of objects in an image together with their relationships in order to generate a visual scene graph. It provides a high-level understanding of the visual scene rather than treating individual objects in isolation.

Paper title | Venue | Main idea
Factorizable net: an efficient subgraph-based framework for scene graph generation | ECCV, 2018 | A subgraph-based approach (each subgraph is regarded as a node) with a spatially weighted message passing structure that refines the features of objects and subgroups by passing messages among them with attention-like schemes.
Graph R-CNN for scene graph generation | ECCV, 2018 | First obtains a sparse candidate graph by pruning the densely connected graph generated from the RPN via a relation proposal network; an attentional GCN is then introduced to aggregate contextual information and update node features and edge relationships.
Attentive relational networks for mapping images to scene graphs | CVPR, 2019 | Proposes attentive relational networks, which first transform label word embeddings and visual features into a shared semantic space and then rely on GAT to perform feature aggregation for the final relation inference.
Bipartite graph network with adaptive message passing for unbiased scene graph generation | CVPR, 2021 | Introduces a bipartite GNN to estimate and propagate relation confidence in a multi-stage manner.
Energy-based learning for scene graph generation | CVPR, 2021 | Proposes an energy-based framework, which depends on a graph message passing algorithm for computing the energy of configurations.
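
The "prune a dense graph into a sparse candidate graph" step described for Graph R-CNN can be pictured as scoring every ordered object pair and keeping only the top few. The sketch below is a toy version of that idea: the dot-product scoring between separate subject and object embeddings, the embedding values, and the function name `prune_relations` are assumptions made for illustration, not the paper's relation proposal network.

```python
def prune_relations(subj_emb, obj_emb, k):
    """Score every ordered pair (i, j), i != j, and keep the k highest-scoring edges."""
    scores = {}
    n = len(subj_emb)
    for i in range(n):
        for j in range(n):
            if i != j:
                # Relatedness of "i as subject" with "j as object".
                scores[(i, j)] = sum(a * b for a, b in zip(subj_emb[i], obj_emb[j]))
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy embeddings for three detected objects (made-up numbers).
subj_emb = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
obj_emb = [[1.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

kept = prune_relations(subj_emb, obj_emb, k=2)
```

Only the edges in `kept` would be passed on to the later graph reasoning stage, which keeps the cost well below the quadratic number of pairs in the dense graph.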

VIDEO UNDERSTANDING

Video Action Recognition

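Task introduction: Human action recognition in video is one of the fundamental tasks in video processing and understanding. It aims to recognize and classify human actions in RGB/depth videos or skeleton data.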

Action Recognition

Paper title | Venue | Main idea
 | | Captures long-range temporal contexts via graph-based reasoning over human-object and object-object relationships.
 | | Constructs an actor-centric object-level graph and applies GCNs to capture the contexts among objects in an actor-centric way. A relation-level graph is built to infer the contexts among relation nodes.
 | | Proposes multi-scale reasoning on the temporal graph of a video, in which each node is a frame of the video and the pairwise relations between nodes are represented as a learnable adjacency matrix.
 | | Extends GCN-based relation modeling to zero-shot action recognition and leverages knowledge graphs to jointly model the relations among actions and attributes.
 | | Introduces a graph-based high-order relation modeling method for long-term action recognition.

Skeleton-Based Action Recognition

Paper title | Venue | Main idea
 | | Proposes ST-GCN, which first connects the joints in a frame according to the natural connectivity of the human body and then connects the same joints in two consecutive frames to maintain temporal information.
 | | Introduces a fully connected graph with learnable edge weights between joints and a data-dependent graph learned from the input skeleton.
 | | Connects physically distant skeleton joints to capture the patterns of collaboratively moving joints.
 | | Improves the joint connections within a single frame by adding edges between limbs and head. It uses GCNs to capture the joint relations in single frames and adopts an LSTM to capture the temporal dynamics.
 | | Maintains edge features and learns both node and edge feature representations via directed graph convolution.
 | | First constructs multiple dilated windows over the temporal dimension, then applies GCNs separately on multiple graphs with different scales, and finally aggregates the GCN results over all graphs in the multiple windows to capture multi-scale and long-range dependencies.
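
The spatio-temporal skeleton graph described in the first row (joints linked by body connectivity within a frame, and each joint linked to itself in the next frame) can be built in a few lines. The sketch below is a minimal illustration; the three-joint "skeleton", the bone list, and the node numbering `t * num_joints + j` are assumptions chosen for the example.

```python
def build_skeleton_graph(num_frames, num_joints, bones):
    """Return (spatial_edges, temporal_edges) over nodes numbered t * num_joints + j."""
    def node(t, j):
        return t * num_joints + j

    # Spatial edges: natural body connectivity, repeated in every frame.
    spatial = [(node(t, a), node(t, b))
               for t in range(num_frames) for a, b in bones]
    # Temporal edges: the same joint in two consecutive frames.
    temporal = [(node(t, j), node(t + 1, j))
                for t in range(num_frames - 1) for j in range(num_joints)]
    return spatial, temporal


# Toy skeleton: joint 0 - joint 1 - joint 2, observed over 3 frames.
bones = [(0, 1), (1, 2)]
spatial, temporal = build_skeleton_graph(num_frames=3, num_joints=3, bones=bones)
```

With T frames, J joints, and B bones this gives T * B spatial edges and (T - 1) * J temporal edges, so the graph grows linearly with the clip length.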

Temporal Action Localization
