GCN/GAT Research Overview

Table of Contents

  • Dataset
    • The official Cora dataset
      • THE DIRECTORY CONTAINS TWO FILES
  • The mini-batch idea
  • GCN
    • Layer-to-layer propagation rule
  • GAT
    • attention

Dataset

The official Cora dataset

  • The Cora dataset consists of Machine Learning papers.

  • These papers are classified into one of the following seven classes:

    • Case_Based
    • Genetic_Algorithms
    • Neural_Networks
    • Probabilistic_Methods
    • Reinforcement_Learning
    • Rule_Learning
    • Theory
  • The papers were selected in a way such that in the final corpus every paper cites or is cited by at least one other paper.

  • There are 2708 papers in the whole corpus, i.e. the dataset contains 2708 sample points.

  • Vocabulary: After stemming and removing stopwords we were left with a vocabulary of size 1433 unique words (All words with document frequency less than 10 were removed).

THE DIRECTORY CONTAINS TWO FILES

  • .content file: contains descriptions of the papers in the following format:

       <paper_id> <word_attributes>+ <class_label>
    
    • The first entry in each line contains the unique string ID of the paper followed by binary values indicating whether each word in the vocabulary is present (indicated by 1) or absent (indicated by 0) in the paper.
  • There are 2708 lines in total; each line represents one sample point, i.e. one paper.

  • Each paper is represented by a 1433-dimensional word vector, so each sample point has 1433 features.

    • Each element of the word vector corresponds to one word, and the element takes only the values 0 or 1.

    • A value of 0 means the corresponding word does not appear in the paper; a value of 1 means it does.

    • All words come from a vocabulary of 1433 words.

    • Finally, the last entry in the line contains the class label of the paper.

  • .cites file: contains the citation graph of the corpus. Each line describes a link in the following format:

       <ID of cited paper> <ID of citing paper>
    
    • There are 5429 lines in total; each line contains two paper IDs.
    • The first entry is the ID of the paper being cited and the second ID stands for the paper which contains the citation.
    • The direction of the link is from right to left. If a line is represented by “paper1 paper2” then the link is “paper2->paper1”.
  • Every paper cites at least one other paper or is cited by at least one other paper; in other words, every sample point is linked to at least one other sample point, and no sample point is completely unconnected. Viewing the sample points as nodes of a graph, the graph contains no isolated node (note that the absence of isolated nodes does not by itself imply the whole graph is connected).

  • If papers are viewed as nodes of a graph, these 5429 lines are the 5429 edges between the nodes (a minimal parsing sketch follows at the end of this section).

  • Reference:
    • https://blog.csdn.net/yeziand01/java/article/details/93374216
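Putting the two files together, the sketch below shows one way to parse the raw Cora data in plain Python/NumPy. The file paths "cora/cora.content" and "cora/cora.cites" are assumptions about where the raw files live; everything else follows the format described above.

```python
import numpy as np

# paths to the raw files are assumptions
with open("cora/cora.content") as f:
    rows = [line.split() for line in f]

paper_ids = [r[0] for r in rows]                                # unique string ID of each paper
features = np.array([r[1:-1] for r in rows], dtype=np.float32)  # 2708 x 1433 binary word vectors
labels = [r[-1] for r in rows]                                  # one of the seven class names
id_to_idx = {pid: i for i, pid in enumerate(paper_ids)}

with open("cora/cora.cites") as f:
    # each line is "<cited_id> <citing_id>", so the edge direction is citing -> cited
    edges = [(id_to_idx[citing], id_to_idx[cited])
             for cited, citing in (line.split() for line in f)]

print(features.shape, len(labels), len(edges))   # (2708, 1433) 2708 5429
```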

The mini-batch idea

  • Parallelization is achieved by building a sparse block-diagonal matrix from the adjacency matrices of the graphs in one batch,
  • and by concatenating the node feature matrices and the target matrices along the node dimension. This makes it easy to operate on different batches (a minimal sketch follows below).
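Below is a minimal NumPy/SciPy sketch of this batching scheme (it mirrors how libraries such as PyTorch Geometric batch graphs). The toy graphs, the feature dimension, and the helper name collate_graphs are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def collate_graphs(adjs, feats, targets):
    """Merge several graphs into one batch: adjacency matrices become blocks
    on the diagonal of a single sparse matrix (so there are no edges between
    different graphs), while features and targets are concatenated along the
    node dimension."""
    adj_batch = sp.block_diag(adjs).tocsr()
    x_batch = np.concatenate(feats, axis=0)     # (total number of nodes, feature dim)
    y_batch = np.concatenate(targets, axis=0)   # one target per node
    return adj_batch, x_batch, y_batch

# two toy graphs with 3 and 2 nodes, 8 features per node
a1 = sp.csr_matrix(np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]]))
a2 = sp.csr_matrix(np.array([[0, 1], [1, 0]]))
x1, x2 = np.random.rand(3, 8), np.random.rand(2, 8)
y1, y2 = np.zeros(3, dtype=int), np.ones(2, dtype=int)

adj, x, y = collate_graphs([a1, a2], [x1, x2], [y1, y2])
print(adj.shape, x.shape, y.shape)   # (5, 5) (5, 8) (5,)
```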


  • Reference:
    • https://zhuanlan.zhihu.com/p/78452993

GCN

Layer-to-layer propagation rule

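The rule for propagating from layer l to layer l+1 in a GCN (Kipf & Welling, 2017) is H^(l+1) = σ( D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l) ), where Ã = A + I is the adjacency matrix with self-loops and D̃ is its degree matrix. Below is a minimal NumPy sketch of one such layer; the toy graph, the random weights, and the choice of ReLU as σ are illustrative assumptions.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H_next = ReLU(D_hat^{-1/2} (A + I) D_hat^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])                          # add self-loops
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))  # D_hat^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt                # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)                  # ReLU as the non-linearity

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])      # toy 3-node path graph
H = np.random.rand(3, 4)          # input node features
W = np.random.rand(4, 2)          # layer weights (random here, learned in practice)
print(gcn_layer(A, H, W).shape)   # (3, 2)
```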

GAT

attention

  • Take node 3 as an example: its only neighbours are nodes 2 and 4, but that does not mean the two neighbours are equally important to node 3. This "importance" can be quantified, and it can be learned by the network. In the paper this "importance" is called attention, and it is obtained through training. This is the core innovation of GAT (a minimal single-head sketch follows below).
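Below is a minimal NumPy sketch of single-head GAT attention: a raw score e_ij = LeakyReLU(a^T [W h_i || W h_j]) is computed for every neighbour j of node i, normalized with a softmax over the neighbourhood to give α_ij, and used to aggregate the transformed neighbour features. The dense double loop, the toy graph, and the random parameters are illustrative assumptions (the paper also uses sparse, multi-head attention).

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_layer(h, W, a, adj):
    """Single-head GAT layer: alpha_ij is a softmax over the neighbours j of
    node i of LeakyReLU(a^T [W h_i || W h_j])."""
    n = h.shape[0]
    Wh = h @ W                                             # transformed features, (n, out_dim)
    e = np.array([[leaky_relu(a @ np.concatenate([Wh[i], Wh[j]]))
                   for j in range(n)] for i in range(n)])  # raw attention logits
    mask = (adj + np.eye(n)) > 0                           # attend only to neighbours (and self)
    e = np.where(mask, e, -np.inf)
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha = alpha / alpha.sum(axis=1, keepdims=True)       # row-wise softmax
    return alpha @ Wh                                      # attention-weighted aggregation

# toy 4-node graph: node 3 (index 2) is connected only to nodes 2 and 4, as above
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 1],
                [0, 1, 1, 0]], dtype=float)
h = np.random.rand(4, 5)               # input node features
W = np.random.rand(5, 3)               # shared linear transform
a = np.random.rand(6)                  # attention vector of size 2 * out_dim
print(gat_layer(h, W, a, adj).shape)   # (4, 3)
```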


  • Reference:
    • https://zhuanlan.zhihu.com/p/99927545
