This walkthrough covers the code released by the authors of "SEMI-SUPERVISED CLASSIFICATION WITH GRAPH CONVOLUTIONAL NETWORKS", available at https://github.com/tkipf/gcn.
"""
ind.dataset_str.x => the feature vectors of the training instances as scipy.sparse.csr.csr_matrix object;
ind.dataset_str.tx => the feature vectors of the test instances as scipy.sparse.csr.csr_matrix object;
ind.dataset_str.allx => the feature vectors of both labeled and unlabeled training instances
(a superset of ind.dataset_str.x) as scipy.sparse.csr.csr_matrix object;
ind.dataset_str.y => the one-hot labels of the labeled training instances as numpy.ndarray object;
ind.dataset_str.ty => the one-hot labels of the test instances as numpy.ndarray object;
ind.dataset_str.ally => the labels for instances in ind.dataset_str.allx as numpy.ndarray object;
ind.dataset_str.graph => a dict in the format {index: [index_of_neighbor_nodes]} as collections.defaultdict
object;
ind.dataset_str.test.index => the indices of test instances in graph, for the inductive setting as list object.
"""
In \gcn-master\gcn\utils.py, def load_data(dataset_str) loads the data.
Using the Cora dataset, the pickle module deserializes the files into the variables x, y, tx, ty, allx, ally, graph:
"""
ind.cora.x => x.shape:(140,1433)
ind.cora.tx => tx.shape:(1000,1433)
ind.cora.allx => allx.shape:(1708,1433)
ind.cora.y => y.shape:(140,7)
ind.cora.ty => ty.shape:(1000,7)
ind.cora.ally => ally.shape:(1708,7)
ind.cora.graph => graph: {node_id: list}; the keys run from 0 to 2707, one per node, and each list holds the ids of that node's neighbors
ind.cora.test.index => test_idx_reorder.shape:(1000,), the ids of the test nodes
"""
Stacking allx with tx, and ally with ty, yields the features and labels of all nodes in the Cora dataset:
features = sp.vstack((allx, tx)).tolil()  # features.shape: (2708, 1433), node features
labels = np.vstack((ally, ty))  # labels.shape: (2708, 7), node labels
adj = nx.adjacency_matrix(nx.from_dict_of_lists(graph))  # adj.shape: (2708, 2708), adjacency matrix
Generate the train/validation/test masks and split the data:
Nodes [0, 140) form the training set, nodes [140, 640) the validation set, and nodes [1708, 2708) the test set.
idx_test = test_idx_range.tolist()
idx_train = range(len(y))
idx_val = range(len(y), len(y)+500)
train_mask = sample_mask(idx_train, labels.shape[0])
val_mask = sample_mask(idx_val, labels.shape[0])
test_mask = sample_mask(idx_test, labels.shape[0])
y_train = np.zeros(labels.shape)
y_val = np.zeros(labels.shape)
y_test = np.zeros(labels.shape)
# label rows for nodes outside the corresponding split are left as zeros
y_train[train_mask, :] = labels[train_mask, :]
y_val[val_mask, :] = labels[val_mask, :]
y_test[test_mask, :] = labels[test_mask, :]
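The sample_mask helper used above lives in utils.py; a sketch of what it does (modulo the exact dtype spelling):
def sample_mask(idx, l):
    """Return a length-l boolean array that is True at the positions in idx."""
    mask = np.zeros(l)
    mask[idx] = 1
    return np.array(mask, dtype=bool)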
In \gcn-master\gcn\utils.py, preprocess_adj(adj) normalizes the adjacency matrix adj. This corresponds to the $\tilde D^{-\frac{1}{2}}\tilde A\tilde D^{-\frac{1}{2}}$ term in the paper, where $\tilde A = A + I_{N}$ and $\tilde D_{ii}=\sum_{j}\tilde A_{ij}$:
def normalize_adj(adj):
"""Symmetrically normalize adjacency matrix."""
adj = sp.coo_matrix(adj)
rowsum = np.array(adj.sum(1))
d_inv_sqrt = np.power(rowsum, -0.5).flatten()
d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0.
d_mat_inv_sqrt = sp.diags(d_inv_sqrt)
return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt).tocoo()
def preprocess_adj(adj):
"""Preprocessing of adjacency matrix for simple GCN model and conversion to tuple representation."""
adj_normalized = normalize_adj(adj + sp.eye(adj.shape[0]))
return sparse_to_tuple(adj_normalized)
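To make the normalization concrete, here is a toy check (not from the repo) on a 3-node path graph; note how adding self-loops keeps every row sum positive, so no degree term is zero:
import numpy as np
import scipy.sparse as sp

adj = sp.csr_matrix([[0, 1, 0],
                     [1, 0, 1],
                     [0, 1, 0]])                        # 3-node path graph
adj_tilde = adj + sp.eye(3)                             # A~ = A + I_N
rowsum = np.array(adj_tilde.sum(1)).flatten()           # D~_ii = [2, 3, 2]
d_mat_inv_sqrt = sp.diags(np.power(rowsum, -0.5))
adj_norm = d_mat_inv_sqrt @ adj_tilde @ d_mat_inv_sqrt  # D~^{-1/2} A~ D~^{-1/2}
print(adj_norm.toarray())  # diagonal: [0.5, 1/3, 0.5]; off-diagonal entries: 1/sqrt(6)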
The GraphConvolution(Layer) class in \gcn-master\gcn\layers.py defines the graph convolution layer. When a GraphConvolution instance is initialized, self.vars['weights_' + str(i)] and self.vars['bias'] are the learnable weights and bias created by that layer, corresponding to the parameter $\Theta \in \mathbb{R}^{C \times F}$ in the paper.
with tf.variable_scope(self.name + '_vars'):
for i in range(len(self.support)):
self.vars['weights_' + str(i)] = glorot([input_dim, output_dim],
name='weights_' + str(i))
if self.bias:
self.vars['bias'] = zeros([output_dim], name='bias')
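The glorot and zeros helpers come from \gcn-master\gcn\inits.py; glorot is a Glorot/Xavier uniform initializer, roughly:
def glorot(shape, name=None):
    """Glorot & Bengio (AISTATS 2010) uniform initialization."""
    init_range = np.sqrt(6.0 / (shape[0] + shape[1]))
    initial = tf.random_uniform(shape, minval=-init_range, maxval=init_range, dtype=tf.float32)
    return tf.Variable(initial, name=name)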
Applying a GraphConvolution instance to inputs invokes __call__(self, inputs), which the subclass GraphConvolution inherits from the parent Layer class. __call__ in turn calls self._call(inputs); because GraphConvolution overrides _call(self, inputs), the subclass's version is the one that runs.
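A minimal sketch of this dispatch pattern (the repo's logging, naming, and sparse-input details omitted):
class Layer(object):
    def __call__(self, inputs):
        # inherited by all layers: delegate the actual computation to _call
        outputs = self._call(inputs)
        return outputs

    def _call(self, inputs):
        # default is the identity; subclasses override this
        return inputs

class GraphConvolution(Layer):
    def _call(self, inputs):
        # this override is what runs when a GraphConvolution instance is called
        ...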
Inside the _call(self, inputs) method, the block under the "# convolve" comment implements the graph convolution formula from the paper:
$$Z=\left(\tilde D^{-\frac{1}{2}}\tilde A\tilde D^{-\frac{1}{2}}\right)X\Theta$$
where $pre\_sup = X\Theta$, $self.support[i] = \tilde D^{-\frac{1}{2}}\tilde A\tilde D^{-\frac{1}{2}}$, and $support = \left(\tilde D^{-\frac{1}{2}}\tilde A\tilde D^{-\frac{1}{2}}\right)X\Theta$.
# convolve
supports = list()
for i in range(len(self.support)):
if not self.featureless:
pre_sup = dot(x, self.vars['weights_' + str(i)],
sparse=self.sparse_inputs)
else:
pre_sup = self.vars['weights_' + str(i)]
support = dot(self.support[i], pre_sup, sparse=True)
supports.append(support)
output = tf.add_n(supports)
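In dense numpy terms (illustrative names and shapes only, not repo code), one pass through this loop computes:
import numpy as np

A_hat = np.eye(4)            # stands in for self.support[i] = D~^{-1/2} A~ D~^{-1/2}
X = np.random.randn(4, 8)    # node features (the x above, after dropout)
W = np.random.randn(8, 3)    # self.vars['weights_' + str(i)], i.e. Theta
pre_sup = X @ W              # X Theta
support = A_hat @ pre_sup    # (D~^{-1/2} A~ D~^{-1/2}) X Theta
For the plain GCN model, self.support holds a single matrix (the output of preprocess_adj), so tf.add_n(supports) reduces to that one term; the list form exists so the same layer can also sum multiple Chebyshev polynomial supports.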
The GCN(Model) class in \gcn-master\gcn\models.py defines the network used in the experiments. The architecture is held in the self.layers attribute, a list populated in the _build(self) class method: the model consists of two graph convolution layers. Note the act argument: as the GraphConvolution class from Section 4 shows, act is applied to the result of each convolution, so the first layer's output passes through relu() before being returned (act=tf.nn.relu), while the second layer returns its output unchanged (act=lambda x: x).
def _build(self):
self.layers.append(GraphConvolution(input_dim=self.input_dim,
output_dim=FLAGS.hidden1,
placeholders=self.placeholders,
act=tf.nn.relu,
dropout=True,
sparse_inputs=True,
logging=self.logging))
self.layers.append(GraphConvolution(input_dim=FLAGS.hidden1,
output_dim=self.output_dim,
placeholders=self.placeholders,
act=lambda x: x,
dropout=True,
logging=self.logging))
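FLAGS.hidden1 comes from the TensorFlow flag definitions in train.py; the repo's defaults are approximately as follows (values quoted from the public repo, worth double-checking):
flags.DEFINE_float('learning_rate', 0.01, 'Initial learning rate.')
flags.DEFINE_integer('epochs', 200, 'Number of epochs to train.')
flags.DEFINE_integer('hidden1', 16, 'Number of units in hidden layer 1.')
flags.DEFINE_float('dropout', 0.5, 'Dropout rate (1 - keep probability).')
flags.DEFINE_float('weight_decay', 5e-4, 'Weight for L2 loss on embedding matrix.')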
\gcn-master\gcn\models.py also defines the Model(object) class, the parent of GCN(Model). When a GCN(Model) instance is initialized, it calls build(self), defined on the parent Model(object) class, which constructs the forward pass and the weight-update step.
def build(self):
""" Wrapper for _build() """
# Build the two-layer graph convolution stack, similar to Keras's Sequential.
with tf.variable_scope(self.name):
self._build()
# Feed the inputs through the two graph convolution layers to obtain self.outputs.
self.activations.append(self.inputs)
for layer in self.layers:
hidden = layer(self.activations[-1])
self.activations.append(hidden)
self.outputs = self.activations[-1]
# Store model variables for easy access
variables = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope=self.name)
self.vars = {var.name: var for var in variables}
# Compute loss and accuracy
self._loss()
self._accuracy()
# Weight optimization
self.opt_op = self.optimizer.minimize(self.loss)
The structure matches the paper's form:
$$self.outputs = f(self.inputs, A) = \hat A\ \mathrm{ReLU}\!\left(\hat A\,(self.inputs)\,W^{(0)}\right)W^{(1)}$$
Note that self.outputs is not passed through a softmax here; that happens while computing the cross-entropy loss. The loss computation first applies $softmax(\cdot)$ to self.outputs and then computes the cross entropy on the result, much like PyTorch's cross-entropy loss, which is why the network definition omits a final $softmax(\cdot)$. The cross-entropy loss is defined in the masked_softmax_cross_entropy(preds, labels, mask) function in \gcn-master\gcn\metrics.py.
def masked_softmax_cross_entropy(preds, labels, mask):
"""Softmax cross-entropy loss with masking."""
loss = tf.nn.softmax_cross_entropy_with_logits(logits=preds, labels=labels)
mask = tf.cast(mask, dtype=tf.float32)
mask /= tf.reduce_mean(mask)
loss *= mask
return tf.reduce_mean(loss)
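The line mask /= tf.reduce_mean(mask) rescales the mask so that the final tf.reduce_mean over all nodes equals a plain mean over just the masked nodes. A quick numpy check of that identity (illustrative, not repo code):
import numpy as np

loss = np.random.rand(2708)           # per-node cross-entropy values
mask = np.zeros(2708)
mask[:140] = 1.0                      # 140 labeled training nodes
scaled = loss * (mask / mask.mean())  # each kept term is scaled by 2708/140
assert np.isclose(scaled.mean(), loss[:140].mean())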