TensorFlow 2 study notes: learning how to implement a model training pipeline, using GCN code as an example

Contents

  • Modules
    • Layer
      • Function
      • Input
      • Output
      • Code
    • Model
      • Function
      • Input
      • Output
      • Code
    • Train
      • Function
      • Code
    • Config
      • Function
      • Code
    • Utils
      • Function
      • Code

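These are study notes on a TensorFlow 2 implementation of GCN (Semi-Supervised Classification with Graph Convolutional Networks, by Thomas N. Kipf). The goal is to learn how to build and train a deep learning model with TF2 by tracing how the author implemented GCN. To keep the thread clear, the notes cover only the code directly related to the GCN model and leave out the baseline code. Some of the code has been modified slightly. The code comes from https://github.com/dragen1860/GCN-TF2, which its author adapted from the code released by the paper's authors (https://github.com/tkipf/gcn).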

Modules

The GCN code can be divided into five modules: Layer, Model, Train, Utils (utilities), and Config (configuration).

Layer

The Layer module defines the neural network layers the experiment needs. Every layer type inherits from the base class tf.keras.layers.Layer.

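Function

  1. Define the layer's parameters, including the weights (w, b), input and output sizes, dropout rate, and activation function;
  2. Define the layer's computation, including the forward pass (linear transformation, activation) and dropout;
  3. Initialize the weights by calling an initialization method (defined in the Utils module)

Input

  1. Parameters that define the layer: input and output sizes, the activation function, and properties of the input (whether it is a sparse matrix, whether it has features, etc.);
  2. Parameters used in the computation: the features/activations, the preprocessed adjacency matrix, etc.

Output

  1. Activation, i.e. the output of the activation function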
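
As a rough illustration of what one such layer computes, the sketch below reproduces the forward pass in plain NumPy: a linear transformation, aggregation through the preprocessed adjacency matrix, then the activation. The function name `graph_conv_forward` and the toy matrices are assumptions made for this example; dropout and sparse inputs are left out.

```python
import numpy as np

def graph_conv_forward(x, support, weight, activation=lambda z: np.maximum(z, 0.0)):
    """Hypothetical dense version of one graph convolution: activation(support @ x @ weight)."""
    pre_sup = x @ weight          # linear transformation of the node features
    output = support @ pre_sup    # aggregate over neighbours via the preprocessed adjacency
    return activation(output)     # ReLU by default

# Toy inputs: 3 nodes, 2 input features, 4 output units (all values are made up).
x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
support = np.eye(3)               # identity stands in for a preprocessed adjacency matrix
weight = np.full((2, 4), 0.5)

hidden = graph_conv_forward(x, support, weight)
print(hidden.shape)
```

The real layer below follows the same order of operations but works on TensorFlow tensors and supports sparse inputs.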

Code

from    utils import *
import  tensorflow as tf
from    tensorflow import keras
from    tensorflow.keras import layers
from    config import args


class GraphConvolution(layers.Layer):
    """
    Graph convolution layer
    """
    def __init__(self, input_dim, output_dim, num_features_nonzero,
                 dropout=0.,
                 is_sparse_inputs=False,
                 activation=tf.nn.relu, # tf.nn: Wrappers for primitive Neural Net (NN) Operations.
                 bias=False,
                 featureless=False, **kwargs):
        super(GraphConvolution, self).__init__(**kwargs)  # super() calls the parent-class constructor; **kwargs collects any extra "name=value" arguments
        # Store the layer parameters
        self.dropout = dropout
        self.activation = activation
        self.is_sparse_inputs = is_sparse_inputs
        self.featureless = featureless
        self.use_bias = bias  # boolean flag, kept separate from the bias variable itself
        self.num_features_nonzero = num_features_nonzero

        # Create the trainable weight matrix (and optionally the bias) with add_weight
        self.weights_ = []
        for i in range(1):
            w = self.add_weight(name='weight' + str(i), shape=[input_dim, output_dim])
            self.weights_.append(w)
        if self.use_bias:
            self.bias = self.add_weight(name='bias', shape=[output_dim])


    def call(self, inputs, training=None):
        # support_ is the preprocessed adjacency matrix; the Utils module does the preprocessing
        x, support_ = inputs

        # Dropout (applied unless training is explicitly False)
        if training is not False and self.is_sparse_inputs:
            x = sparse_dropout(x, self.dropout, self.num_features_nonzero)
        elif training is not False:
            x = tf.nn.dropout(x, self.dropout)


        # Graph convolution
        supports = list()
        for i in range(len(support_)):
            if not self.featureless: # if it has features x
                pre_sup = dot(x, self.weights_[i], sparse=self.is_sparse_inputs)
            else:
                pre_sup = self.weights_[i]

            support = dot(support_[i], pre_sup, sparse=True)
            supports.append(support)

        output = tf.add_n(supports)

        # Bias
        if self.use_bias:
            output += self.bias

        # Activation
        return self.activation(output)
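
The listing calls `sparse_dropout` and `dot`, which are not shown in these notes. As a hedged sketch of the idea behind dropout on a sparse matrix (not the repository's implementation), the NumPy/SciPy function below drops each stored nonzero entry with probability `rate` and rescales the survivors by 1/(1 - rate). The name `sparse_dropout_np` and its arguments are hypothetical.

```python
import numpy as np
import scipy.sparse as sp

def sparse_dropout_np(mx, rate, rng):
    """Hypothetical helper: dropout applied only to the stored entries of a sparse matrix."""
    coo = sp.coo_matrix(mx)
    keep = rng.random(coo.nnz) >= rate            # keep each stored entry with probability 1 - rate
    scaled = coo.data[keep] / (1.0 - rate)        # inverted-dropout rescaling of the survivors
    return sp.coo_matrix((scaled, (coo.row[keep], coo.col[keep])), shape=coo.shape)

rng = np.random.default_rng(0)
features = sp.coo_matrix(np.array([[1.0, 0.0, 2.0],
                                   [0.0, 3.0, 0.0]]))
dropped = sparse_dropout_np(features, rate=0.5, rng=rng)
print(dropped.nnz, "of", features.nnz, "stored entries kept")
```

Only the stored entries are sampled, which is presumably why the layer needs `num_features_nonzero`: the random mask has one element per nonzero value rather than one per matrix cell.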

Model

Assembles the neural network layers into a model. It inherits from the base class tf.keras.Model.
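
For reference, the two-layer model assembled here corresponds to the forward rule in the GCN paper, written below in the paper's notation ($X$ is the feature matrix, $A$ the adjacency matrix, $W^{(0)}$ and $W^{(1)}$ the weights of the two layers). In the code, the softmax appears to be folded into the loss function rather than applied by the last layer, which uses an identity activation.

```latex
\hat{A} = \tilde{D}^{-1/2}\,(A + I_N)\,\tilde{D}^{-1/2},
\qquad \tilde{D}_{ii} = \sum_j (A + I_N)_{ij}

Z = \mathrm{softmax}\!\left(\hat{A}\,\mathrm{ReLU}\!\left(\hat{A} X W^{(0)}\right) W^{(1)}\right)
```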

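Function

  1. Define the model's parameters, including input and output sizes and the network layers;
  2. Define the layers and how they relate: set each layer's parameters (input and output sizes, activation function, dropout rate, etc.), the order of the layers, and any processing between layers;
  3. Forward pass, including calling the layers and applying softmax;
  4. Evaluation, including computing the loss and accuracy (the loss and accuracy functions are imported from the metrics module in the listing below)

Input

  1. Parameters that define the model: input and output sizes and properties of the input (whether it is a sparse matrix, whether it has features, etc.);
  2. Parameters used in the computation: the training/validation/test sets (x, y), the preprocessed adjacency matrix, etc.

Output

  1. Predictions
  2. Evaluation, i.e. the loss value and the accuracy value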
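
The model relies on `masked_softmax_cross_entropy` and `masked_accuracy`, whose bodies are not reproduced in these notes. The NumPy sketch below shows one plausible way to compute such masked metrics, so that only the nodes selected by the mask contribute. The `_np` function names and the toy data are assumptions for illustration, not the repository's code.

```python
import numpy as np

def masked_softmax_cross_entropy_np(logits, labels, mask):
    """Hypothetical NumPy version: mean cross-entropy over the masked nodes only."""
    shifted = logits - logits.max(axis=1, keepdims=True)           # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    per_node = -(labels * log_probs).sum(axis=1)
    weights = mask.astype(np.float64) / mask.mean()                # masked nodes get weight N / n_masked
    return float((per_node * weights).mean())

def masked_accuracy_np(logits, labels, mask):
    """Hypothetical NumPy version: fraction of masked nodes classified correctly."""
    correct = (logits.argmax(axis=1) == labels.argmax(axis=1)).astype(np.float64)
    weights = mask.astype(np.float64) / mask.mean()
    return float((correct * weights).mean())

logits = np.array([[2.0, 0.0], [0.0, 2.0], [2.0, 0.0]])
labels = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 0.0]])   # node 2 is unlabeled (all-zero row)
mask = np.array([True, True, False])

loss = masked_softmax_cross_entropy_np(logits, labels, mask)
acc = masked_accuracy_np(logits, labels, mask)
print(loss, acc)
```

Dividing the mask by its mean makes the final average over all nodes equal to the average over the masked nodes, so the label arrays can keep the full number of rows.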

Code

import  tensorflow as tf
from    tensorflow import keras
from    layers import *
from    metrics import *
from    config import args 

class GCN(keras.Model):

    def __init__(self, input_dim, output_dim, num_features_nonzero, **kwargs):
        super(GCN, self).__init__(**kwargs)

        self.input_dim = input_dim 
        self.output_dim = output_dim

        print('input dim:', input_dim)
        print('output dim:', output_dim)
        print('num_features_nonzero:', num_features_nonzero)

        self.layers_ = []
        self.layers_.append(GraphConvolution(input_dim=self.input_dim, 
                                            output_dim=args.hidden1, 
                                            num_features_nonzero=num_features_nonzero,
                                            activation=tf.nn.relu,
                                            dropout=args.dropout,
                                            is_sparse_inputs=True))

        self.layers_.append(GraphConvolution(input_dim=args.hidden1, 
                                            output_dim=self.output_dim, 
                                            num_features_nonzero=num_features_nonzero,
                                            activation=lambda x: x,
                                            dropout=args.dropout))

        for p in self.trainable_variables:
            print(p.name, p.shape)

    def call(self, inputs, training=None):
        """
        :param inputs: x, y, the mask, and the preprocessed adjacency matrix
        :param training: whether the model runs in training mode
        :return: the loss value and the accuracy value
        """
        x, label, mask, support = inputs

        outputs = [x]
        
        # Forward pass
        for layer in self.layers:
            hidden = layer((outputs[-1], support), training)
            outputs.append(hidden)
        output = outputs[-1]

        # Weight decay (L2) loss on the first layer's variables
        loss = tf.zeros([])
        for var in self.layers_[0].trainable_variables:
            loss += args.weight_decay * tf.nn.l2_loss(var)

        # Compute the loss and the accuracy
        loss += masked_softmax_cross_entropy(output, label, mask)

        acc = masked_accuracy(output, label, mask)

        return loss, acc

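    # Prediction: run a forward pass without dropout and apply softmax to the logits
    def predict(self, x, support):
        hidden = x
        for layer in self.layers_:
            hidden = layer((hidden, support), training=False)
        return tf.nn.softmax(hidden)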

Train

A script that calls the other modules to train and test the model.

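Function

  1. Load the dataset, including x and y for the training/validation/test sets and the preprocessed adjacency matrix;
  2. Load the global variables, including the identifiers of the model and dataset to use, the optimizer and its parameters (learning rate), the number of epochs, the dropout rate, etc.;
  3. Train the model, including the forward pass call, gradient computation, gradient descent, validation, and metric reporting;
  4. Test, i.e. run and evaluate the model on the test set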
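
The training step in item 3 follows the usual TF2 custom-loop pattern: record the forward pass on a `tf.GradientTape`, ask the tape for gradients, and hand them to the optimizer. The sketch below shows that pattern on a toy linear regression instead of GCN; the data, the variables `w` and `b`, and the learning rate are made up for illustration.

```python
import tensorflow as tf

tf.random.set_seed(0)

# Toy data for y = 2x + 1 (values are made up for illustration).
x = tf.constant([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * x + 1.0

w = tf.Variable(tf.zeros([1, 1]))
b = tf.Variable(tf.zeros([1]))
optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)

def loss_fn():
    pred = tf.matmul(x, w) + b
    return tf.reduce_mean(tf.square(pred - y))

initial_loss = float(loss_fn())
for step in range(200):
    with tf.GradientTape() as tape:           # record the forward pass
        loss = loss_fn()
    grads = tape.gradient(loss, [w, b])       # compute gradients of the loss
    optimizer.apply_gradients(zip(grads, [w, b]))  # one optimizer update
final_loss = float(loss_fn())
print(initial_loss, final_loss)
```

The GCN script has the same shape, with `model(...)` in place of `loss_fn()` and `model.trainable_variables` in place of `[w, b]`.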

Code

import time
import numpy as np
import tensorflow as tf
from tensorflow.keras import optimizers
from utils import *
from models import GCN
from config import args

import  os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# set random seed
seed = 123
np.random.seed(seed)
tf.random.set_seed(seed)

# load data
adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask = load_data(args.dataset)
print('adj:', adj.shape)
print('features:', features.shape)
print('y:', y_train.shape, y_val.shape, y_test.shape)
print('mask:', train_mask.shape, val_mask.shape, test_mask.shape)

# D^-1@X
features = preprocess_features(features) # [49216, 2], [49216], [2708, 1433]
print('features coordinates::', features[0].shape)
print('features data::', features[1].shape)
print('features shape::', features[2])

support = [preprocess_adj(adj)]
num_supports = 1
model_func = GCN

# Create model
model = GCN(input_dim=features[2][1], output_dim=y_train.shape[1], num_features_nonzero=features[1].shape) # [1433]


train_label = tf.convert_to_tensor(y_train)
train_mask = tf.convert_to_tensor(train_mask)
val_label = tf.convert_to_tensor(y_val)
val_mask = tf.convert_to_tensor(val_mask)
test_label = tf.convert_to_tensor(y_test)
test_mask = tf.convert_to_tensor(test_mask)
features = tf.SparseTensor(*features)
support = [tf.cast(tf.SparseTensor(*support[0]), dtype=tf.float32)]
num_features_nonzero = features.values.shape
dropout = args.dropout
optimizer = optimizers.Adam(learning_rate=args.learning_rate)


for epoch in range(args.epochs):

    with tf.GradientTape() as tape:
        loss, acc = model((features, train_label, train_mask,support))
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))

    _, val_acc = model((features, val_label, val_mask, support), training=False)

    if epoch % 20 == 0:

        print(epoch, float(loss), float(acc), '\tval:', float(val_acc))

test_loss, test_acc = model((features, test_label, test_mask, support), training=False)
print('\ttest:', float(test_loss), float(test_acc))

Config

Sets the global variables using an argparse.ArgumentParser object.

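Function

  1. Set the global variables, including the identifier of the dataset to use, the identifier of the model to use, the learning rate, the dropout rate, etc. The Train module imports them at training time.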
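
One detail worth checking in a config like this is the `type` of each argument: argparse leaves defaults untouched but parses values given on the command line as strings unless `type=` is set. The sketch below illustrates this with two of the options; the helper `build_parser` is a name introduced for this example only.

```python
import argparse

def build_parser():
    """Hypothetical helper that declares two typed options."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--learning_rate', type=float, default=0.01)
    parser.add_argument('--epochs', type=int, default=200)
    return parser

# Passing an explicit list stands in for sys.argv, so the example runs anywhere.
defaults = build_parser().parse_args([])
overridden = build_parser().parse_args(['--epochs', '50', '--learning_rate', '0.005'])
print(defaults.epochs, overridden.epochs, overridden.learning_rate)
```

With `type=int`, `range(args.epochs)` keeps working when the number of epochs is overridden from the command line.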

Code

import  argparse

args = argparse.ArgumentParser()
args.add_argument('--dataset', type=str, default='cora')
args.add_argument('--model', type=str, default='gcn')
args.add_argument('--learning_rate', type=float, default=0.01)
args.add_argument('--epochs', type=int, default=200)
args.add_argument('--hidden1', type=int, default=16)
args.add_argument('--dropout', type=float, default=0.5)
args.add_argument('--weight_decay', type=float, default=5e-4)
args.add_argument('--early_stopping', type=int, default=10)
args.add_argument('--max_degree', type=int, default=3)

args = args.parse_args()
print(args)

Utils

Defines helper functions used by the other modules.

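Function

  1. Load data: use pickle to build the dataset objects, the adjacency matrix object, and the index objects for the training/test/validation sets from files (the validation set does not actually have one, though it could);
  2. Preprocess data, including feature normalization, adjacency matrix format conversion, adjacency matrix transformation, and data cleaning;
  3. Initialize weights: several weight initialization methods are implemented;
  4. Miscellaneous small tasks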
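
To make the adjacency matrix transformation in item 2 concrete, here is a small worked example in NumPy of the symmetric normalization used by GCN, D^-0.5 (A + I) D^-0.5, on a three-node path graph. The graph itself is made up for illustration; the dense arrays stand in for the sparse matrices the real code uses.

```python
import numpy as np

adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])        # path graph 0 - 1 - 2

adj_tilde = adj + np.eye(3)              # add self-loops
degree = adj_tilde.sum(axis=1)           # node degrees including the self-loop
d_inv_sqrt = np.diag(degree ** -0.5)     # D^-0.5 as a diagonal matrix
adj_hat = d_inv_sqrt @ adj_tilde @ d_inv_sqrt
print(np.round(adj_hat, 3))
```

Each entry (i, j) ends up divided by sqrt(d_i * d_j), so the result stays symmetric and high-degree nodes do not dominate the aggregation.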

Code

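import  numpy as np
import  pickle as pkl
import  networkx as nx
import  scipy.sparse as sp
import  tensorflow as tf
from    scipy.sparse.linalg import eigsh
import  sys

# pickle is a module in the Python standard library. It serializes objects, i.e. writes an in-memory object to a file as a byte stream so that it persists; it also deserializes, i.e. reads such a file back and reconstructs the object
# networkx is a module for manipulating and analyzing network (graph) data structures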

def parse_index_file(filename):
    """
    Parse index file.
    """
    index = []
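    # A file object is iterable; each iteration returns one line of text.
    for line in open(filename):
        # str.strip(): removes leading and trailing characters. The argument is a string of the characters to remove; with no argument it strips whitespace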
        index.append(int(line.strip()))
    return index


def sample_mask(idx, l):
    """
    Create mask.
    """
    mask = np.zeros(l)
    mask[idx] = 1
    return np.array(mask, dtype=bool)


def load_data(dataset_str):
    """
    Loads input data from gcn/data directory

    ind.dataset_str.x => the feature vectors of the training instances as scipy.sparse.csr.csr_matrix object;
    ind.dataset_str.tx => the feature vectors of the test instances as scipy.sparse.csr.csr_matrix object;
    ind.dataset_str.allx => the feature vectors of both labeled and unlabeled training instances
        (a superset of ind.dataset_str.x) as scipy.sparse.csr.csr_matrix object;
    ind.dataset_str.y => the one-hot labels of the labeled training instances as numpy.ndarray object;
    ind.dataset_str.ty => the one-hot labels of the test instances as numpy.ndarray object;
    ind.dataset_str.ally => the labels for instances in ind.dataset_str.allx as numpy.ndarray object;
    ind.dataset_str.graph => a dict in the format {index: [index_of_neighbor_nodes]} as collections.defaultdict
        object;
    ind.dataset_str.test.index => the indices of test instances in graph, for the inductive setting as list object.

    All objects above must be saved using python pickle module.

    :param dataset_str: Dataset name
    :return: All data input files loaded (as well the training/test data).
    """
    names = ['x', 'y', 'tx', 'ty', 'allx', 'ally', 'graph']
    objects = []
    for i in range(len(names)):
        with open("data/ind.{}.{}".format(dataset_str, names[i]), 'rb') as f:
            if sys.version_info > (3, 0):
                # Load the dataset object with the pickle module
                objects.append(pkl.load(f, encoding='latin1'))
            else:
                objects.append(pkl.load(f))

    x, y, tx, ty, allx, ally, graph = tuple(objects)
    test_idx_reorder = parse_index_file("data/ind.{}.test.index".format(dataset_str))
    test_idx_range = np.sort(test_idx_reorder)

    if dataset_str == 'citeseer':
        # Fix citeseer dataset (there are some isolated nodes in the graph)
        # Find isolated nodes, add them as zero-vecs into the right position
        test_idx_range_full = range(min(test_idx_reorder), max(test_idx_reorder)+1)
        tx_extended = sp.lil_matrix((len(test_idx_range_full), x.shape[1]))
        tx_extended[test_idx_range-min(test_idx_range), :] = tx
        tx = tx_extended
        ty_extended = np.zeros((len(test_idx_range_full), y.shape[1]))
        ty_extended[test_idx_range-min(test_idx_range), :] = ty
        ty = ty_extended

    # features is a SciPy sparse matrix
    features = sp.vstack((allx, tx)).tolil()
    features[test_idx_reorder, :] = features[test_idx_range, :]

    # adj is a SciPy sparse matrix built from the networkx graph
    adj = nx.adjacency_matrix(nx.from_dict_of_lists(graph))

    labels = np.vstack((ally, ty))
    labels[test_idx_reorder, :] = labels[test_idx_range, :]

    idx_test = test_idx_range.tolist()
    idx_train = range(len(y))
    idx_val = range(len(y), len(y)+500)

    train_mask = sample_mask(idx_train, labels.shape[0])
    val_mask = sample_mask(idx_val, labels.shape[0])
    test_mask = sample_mask(idx_test, labels.shape[0])

    y_train = np.zeros(labels.shape)
    y_val = np.zeros(labels.shape)
    y_test = np.zeros(labels.shape)
    y_train[train_mask, :] = labels[train_mask, :]
    y_val[val_mask, :] = labels[val_mask, :]
    y_test[test_mask, :] = labels[test_mask, :]

    return adj, features, y_train, y_val, y_test, train_mask, val_mask, test_mask


def sparse_to_tuple(sparse_mx):
    """
    Convert sparse matrix to tuple representation.
    """
    def to_tuple(mx):
        if not sp.isspmatrix_coo(mx):
            mx = mx.tocoo()
        coords = np.vstack((mx.row, mx.col)).transpose()
        values = mx.data
        shape = mx.shape
        return coords, values, shape

    if isinstance(sparse_mx, list):
        for i in range(len(sparse_mx)):
            sparse_mx[i] = to_tuple(sparse_mx[i])
    else:
        sparse_mx = to_tuple(sparse_mx)

    return sparse_mx


def preprocess_features(features):
    """
    Row-normalize the feature matrix
    """
    rowsum = np.array(features.sum(1)) # row sums, [2708, 1]
    r_inv = np.power(rowsum, -1).flatten() # reciprocals, [2708]
    r_inv[np.isinf(r_inv)] = 0. # all-zero rows give 1/0 = inf; set those to 0
    r_mat_inv = sp.diags(r_inv) # diagonal matrix of the reciprocal row sums, [2708, 2708]
    features = r_mat_inv.dot(features) # multiply by X to normalize: D^-1:[2708, 2708]@X:[2708, num_features]
    return sparse_to_tuple(features) # convert the sparse matrix to its tuple form (coords, values, shape)


def normalize_adj(adj):
    """Symmetrically normalize the adjacency matrix as in the paper: D^-0.5 @ A @ D^-0.5"""
    adj = sp.coo_matrix(adj)
    rowsum = np.array(adj.sum(1)) # D
    d_inv_sqrt = np.power(rowsum, -0.5).flatten() # D^-0.5
    d_inv_sqrt[np.isinf(d_inv_sqrt)] = 0. # isolated nodes give inf; set those to 0
    d_mat_inv_sqrt = sp.diags(d_inv_sqrt) # D^-0.5 as a diagonal matrix
    return adj.dot(d_mat_inv_sqrt).transpose().dot(d_mat_inv_sqrt).tocoo() # D^-0.5 A D^-0.5


def preprocess_adj(adj):
    """Add self-loops, then normalize as in the paper: D^-0.5 @ (A + I) @ D^-0.5; returns the tuple representation."""
    adj_normalized = normalize_adj(adj + sp.eye(adj.shape[0]))
    return sparse_to_tuple(adj_normalized)


def chebyshev_polynomials(adj, k):
    """
    Calculate Chebyshev polynomials up to order k. Return a list of sparse matrices (tuple representation).
    """
    print("Calculating Chebyshev polynomials up to order {}...".format(k))

    adj_normalized = normalize_adj(adj)
    laplacian = sp.eye(adj.shape[0]) - adj_normalized
    largest_eigval, _ = eigsh(laplacian, 1, which='LM')
    scaled_laplacian = (2. / largest_eigval[0]) * laplacian - sp.eye(adj.shape[0])

    t_k = list()
    t_k.append(sp.eye(adj.shape[0]))
    t_k.append(scaled_laplacian)

    def chebyshev_recurrence(t_k_minus_one, t_k_minus_two, scaled_lap):
        s_lap = sp.csr_matrix(scaled_lap, copy=True)
        return 2 * s_lap.dot(t_k_minus_one) - t_k_minus_two

    for i in range(2, k+1):
        t_k.append(chebyshev_recurrence(t_k[-1], t_k[-2], scaled_laplacian))

    return sparse_to_tuple(t_k)

def uniform(shape, scale=0.05, name=None):
    """Uniform init."""
    initial = tf.random.uniform(shape, minval=-scale, maxval=scale, dtype=tf.float32)
    return tf.Variable(initial, name=name)


def glorot(shape, name=None):
    """Glorot & Bengio (AISTATS 2010) init."""
    init_range = np.sqrt(6.0/(shape[0]+shape[1]))
    initial = tf.random.uniform(shape, minval=-init_range, maxval=init_range, dtype=tf.float32)
    return tf.Variable(initial, name=name)


def zeros(shape, name=None):
    """All zeros."""
    initial = tf.zeros(shape, dtype=tf.float32)
    return tf.Variable(initial, name=name)


def ones(shape, name=None):
    """All ones."""
    initial = tf.ones(shape, dtype=tf.float32)
    return tf.Variable(initial, name=name)
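
The tuple representation returned by `sparse_to_tuple` is what the training script later unpacks with `tf.SparseTensor(*features)`: coordinates, values, and dense shape, in that order. The sketch below builds the same three pieces for a tiny made-up matrix with SciPy and NumPy and checks that they are enough to rebuild the dense matrix.

```python
import numpy as np
import scipy.sparse as sp

dense = np.array([[0.0, 2.0, 0.0],
                  [1.0, 0.0, 3.0]])      # toy matrix, values made up
coo = sp.coo_matrix(dense)

coords = np.vstack((coo.row, coo.col)).transpose()   # one (row, col) pair per stored entry
values = coo.data                                    # the stored nonzero values
shape = coo.shape                                    # dense shape of the matrix

# Rebuild the dense matrix from the tuple to check that nothing was lost.
rebuilt = np.zeros(shape)
rebuilt[coords[:, 0], coords[:, 1]] = values
print(coords.tolist(), values.tolist(), shape)
```

For Cora, this is why the script prints shapes such as [49216, 2] for the coordinates and [49216] for the values: one row per nonzero feature.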
