Notes on "MolGAN: An implicit generative model for small molecular graphs"

  • This paper studies chemical molecules, treating each one as an undirected graph. Each atom is a node of the molecular graph and carries a $T$-dimensional one-hot vector $x_i$ that encodes the atom type.
  • Formally, a molecule is represented by a feature matrix $X \in R^{N \times T}$ and an adjacency tensor $A \in R^{N \times N \times Y}$, where the last dimension (of size $Y$) one-hot encodes the edge type, e.g. the type of $A_{ij}$.
  • Concretely:
  1. The dataset is QM9. Its description:
    contains 133,885 organic compounds up to 9 heavy atoms:
    carbon (C), oxygen (O), nitrogen (N) and fluorine (F).
  2. $N=9$ is the maximum number of nodes; $T=5$ is the number of atom types (C, O, N, F, and one padding symbol); $Y=4$ is the number of edge types (single, double, triple and no bond).
  3. When running the code, however, these 5 edge types show up:
[rdkit.Chem.rdchem.BondType.ZERO,
 rdkit.Chem.rdchem.BondType.SINGLE,
 rdkit.Chem.rdchem.BondType.DOUBLE,
 rdkit.Chem.rdchem.BondType.TRIPLE,
 rdkit.Chem.rdchem.BondType.AROMATIC]
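
To make the representation concrete, here is a minimal NumPy sketch that one-hot encodes a single small molecule into X and A, using the paper's N = 9, T = 5, Y = 4. The label orderings (`ATOM_LABELS`, `BOND_LABELS`) and the helper `one_hot` are assumptions made for illustration, not the mapping used by the dataset code; the example molecule is CO2 (O=C=O), padded to 9 atoms.

```python
import numpy as np

N, T, Y = 9, 5, 4  # max nodes, atom types, bond types (values from the text)

# Hypothetical label orderings, chosen only for this example.
ATOM_LABELS = {"PAD": 0, "C": 1, "N": 2, "O": 3, "F": 4}
BOND_LABELS = {"NONE": 0, "SINGLE": 1, "DOUBLE": 2, "TRIPLE": 3}

def one_hot(labels, num_classes):
    """Turn an integer label array into a one-hot array with a new last axis."""
    return np.eye(num_classes, dtype=np.float32)[labels]

# CO2 (O=C=O): three real atoms, the remaining six slots are padding.
atom_ids = np.zeros(N, dtype=np.int64)
atom_ids[:3] = [ATOM_LABELS["O"], ATOM_LABELS["C"], ATOM_LABELS["O"]]

# Symmetric label matrix: entry (i, j) holds the bond label between atoms i and j.
bond_ids = np.zeros((N, N), dtype=np.int64)
for i, j in [(0, 1), (1, 2)]:
    bond_ids[i, j] = bond_ids[j, i] = BOND_LABELS["DOUBLE"]

X = one_hot(atom_ids, T)   # shape (9, 5)
A = one_hot(bond_ids, Y)   # shape (9, 9, 4)
```

Every row of X and every (i, j) entry of A sums to 1, and A stays symmetric in its first two axes because the graph is undirected.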

Discriminator

In the build_model function of the Solver file:

# d_conv_dim : [[128, 64], 128, [128, 64]]
# m_dim : self.data.atom_num_types  5
# b_dim : self.data.bond_num_types   5 (a_tensor below has 5 channels)
self.D = Discriminator(self.d_conv_dim, self.m_dim, self.b_dim, self.dropout)
self.D.to(self.device)

In the train function, calling self.D(a_tensor, None, x_tensor) runs the forward function defined in Discriminator:

# a (N, 9, 9) tensor
a = torch.from_numpy(a).to(self.device).long()            # Adjacency.
#  a (N, 9) matrix
x = torch.from_numpy(x).to(self.device).long()            # Nodes.
a_tensor = self.label2onehot(a, self.b_dim)
x_tensor = self.label2onehot(x, self.m_dim)
# a_tensor: (N, 9, 9, 5)   x_tensor: (N, 9, 5), with N = 16 in this run

logits_real, features_real = self.D(a_tensor, None, x_tensor)

The definition in the model file; __init__ creates the GraphConvolution and GraphAggregation layers:

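import torch
import torch.nn as nn
# Assumes the layers file is importable as `layers`.
from layers import GraphConvolution, GraphAggregation

class Discriminator(nn.Module):
    """Discriminator network with PatchGAN."""
    def __init__(self, conv_dim, m_dim, b_dim, dropout):
        super(Discriminator, self).__init__()
        # conv_dim : [[128, 64], 128, [128, 64]]
        # m_dim : 5 atom types
        # b_dim : 5 bond types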
        graph_conv_dim, aux_dim, linear_dim = conv_dim
        # (5, [128,64], 5, 0)
        self.gcn_layer = GraphConvolution(m_dim, graph_conv_dim, b_dim, dropout)
        # (64,128,5,0)
        self.agg_layer = GraphAggregation(graph_conv_dim[-1], aux_dim, b_dim, dropout)

        # multi dense layer
        layers = []
        for c0, c1 in zip([aux_dim]+linear_dim[:-1], linear_dim):
            layers.append(nn.Linear(c0,c1))
            layers.append(nn.Dropout(dropout))
        self.linear_layer = nn.Sequential(*layers)

        self.output_layer = nn.Linear(linear_dim[-1], 1)

    def forward(self, adj, hidden, node, activatation=None):
        # adj: a (N, 9, 9, 5) tensor
        # node: a (N, 9, 5) matrix

        adj = adj[:,:,:,1:].permute(0,3,1,2) # adj:torch.Size([16, 4, 9, 9])

        annotations = torch.cat((hidden, node), -1) if hidden is not None else node
        # h: 16x9x64
        h = self.gcn_layer(annotations, adj)
        # annotations: 16x9x69 (64 + 5)
        annotations = torch.cat((h, hidden, node) if hidden is not None\
                                 else (h, node), -1)
        # h: 16x128 (agg_layer sums over the 9 nodes)
        h = self.agg_layer(annotations, torch.tanh)
        # h: 16x64
        h = self.linear_layer(h)

        # Need to implement batch discriminator #
        ##########################################
        # output: 16x1
        output = self.output_layer(h)
        output = activatation(output) if activatation is not None else output

        return output, h
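
The `zip` expression that builds the dense stack is easy to misread, so here is a small pure-Python sketch of the (input, output) sizes it produces for the `conv_dim` shown above. The helper name `dense_layer_sizes` is hypothetical.

```python
def dense_layer_sizes(aux_dim, linear_dim):
    """Return the (in_features, out_features) pairs produced by the zip in __init__."""
    return list(zip([aux_dim] + linear_dim[:-1], linear_dim))

graph_conv_dim, aux_dim, linear_dim = [[128, 64], 128, [128, 64]]
sizes = dense_layer_sizes(aux_dim, linear_dim)
print(sizes)  # [(128, 128), (128, 64)]
```

So the stack is Linear(128, 128), Dropout, Linear(128, 64), Dropout, followed by the output layer Linear(64, 1). In the code as shown there is no nonlinearity between these linear layers.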

Because Discriminator's forward contains

h = self.gcn_layer(annotations, adj)  

and __init__ defines

self.gcn_layer = GraphConvolution(m_dim, graph_conv_dim, b_dim, dropout)

the forward function of the GraphConvolution class in the layers file is called.
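The goal is the formula below: each edge type has its own adjacency slice, which is multiplied by a transformation of the feature matrix X; the per-type results are then summed element-wise, which avoids an excessive number of model parameters.
[Image: graph convolution formula]
R-GCN defines the layer as follows, where W is the transformation matrix applied to the feature matrix X:
[Image: R-GCN layer definition]

However, at the output this GraphConvolution does not seem to apply a tanh activation? In the code, forward takes activation=None by default, and Discriminator.forward calls self.gcn_layer(annotations, adj) without passing one, so no nonlinearity is applied.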
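
Reading directly from the forward code of GraphConvolution (not from the paper's figure), each of its two stages appears to compute the following. The notation is an assumption introduced here: $X$ is the stage input, $W$ and $b$ are the weights of the single `nn.Linear` used in that stage, $A_y$ is the adjacency slice for edge type $y$, $Y'$ is the number of slices passed in (4 after the no-bond channel is dropped), and $\phi$ is the optional activation (the identity when `activation` is None).

```latex
H' = \sum_{y=1}^{Y'} A_y \,(X W + b) \;+\; (X W + b),
\qquad
H = \mathrm{Dropout}\bigl(\phi(H')\bigr)
```

Note that $W$ is shared across edge types in this code, since the stacked copies are all the same `linear1(input)` (or `linear2(hidden)`); the sum over edge types therefore acts on the adjacency slices only.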

import torch
import torch.nn as nn
from torch.nn import Module

# Each edge-type adjacency slice multiplies the same transformed feature matrix,
# the results are summed position-wise, and the transformed features are added on top.
# The same operation is then repeated to map to the next dimension.
class GraphConvolution(Module):
    # (5, [128,64], 5, 0); b_dim is accepted but not used in this class
    # self.gcn_layer = GraphConvolution(m_dim, graph_conv_dim, b_dim, dropout)
    def __init__(self, in_features, out_feature_list, b_dim, dropout):
        super(GraphConvolution, self).__init__()
        self.in_features = in_features
        self.out_feature_list = out_feature_list
        # 5 -> 128
        self.linear1 = nn.Linear(in_features, out_feature_list[0])
        # 128 -> 64
        self.linear2 = nn.Linear(out_feature_list[0], out_feature_list[1])

        self.dropout = nn.Dropout(dropout)
    # h = self.gcn_layer(annotations, adj)
    # adj:torch.Size([16, 4, 9, 9])
    # annotations equal to node: a (16, 9, 5) matrix
    def forward(self, input, adj, activation=None):
        # linear1(input) is 16x9x128; stacked 4 times along dim 1 -> (16, 4, 9, 128)
        hidden = torch.stack([self.linear1(input) for _ in range(adj.size(1))], 1)
        # 16x4x9x128
        hidden = torch.einsum('bijk,bikl->bijl', (adj, hidden))
        # 16x9x128 + 16x9x128
        hidden = torch.sum(hidden, 1) + self.linear1(input)
        hidden = activation(hidden) if activation is not None else hidden
        # dropout is applied after the activation
        hidden = self.dropout(hidden)
        # 16x4x9x64
        output = torch.stack([self.linear2(hidden) for _ in range(adj.size(1))], 1)
        # 16x4x9x64
        output = torch.einsum('bijk,bikl->bijl', (adj, output))
        # 16x9x64
        output = torch.sum(output, 1) + self.linear2(hidden)
        output = activation(output) if activation is not None else output
        output = self.dropout(output)

        return output
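
As a sanity check on the `einsum` pattern, the following NumPy sketch reproduces the first stage of the layer on random data. NumPy stands in for PyTorch here, and the random matrix `W` is an assumed stand-in for `linear1` (bias and dropout omitted). It also shows that, because the same transformed features are stacked for every edge type, the result equals a single multiplication by the adjacency summed over edge types.

```python
import numpy as np

rng = np.random.default_rng(0)
B, E, N, T, H = 16, 4, 9, 5, 128  # batch, edge types, nodes, atom types, hidden size

# Random integer bond labels in {0, ..., 4}, one-hot encoded to (B, N, N, 5).
labels = rng.integers(0, E + 1, size=(B, N, N))
a_tensor = np.eye(E + 1, dtype=np.float32)[labels]

# Same slicing as Discriminator.forward: drop channel 0, move channels first.
adj = a_tensor[:, :, :, 1:].transpose(0, 3, 1, 2)         # (16, 4, 9, 9)

x = rng.standard_normal((B, N, T)).astype(np.float32)      # node features (16, 9, 5)
W = rng.standard_normal((T, H)).astype(np.float32)         # stand-in for linear1

xw = x @ W                                                  # (16, 9, 128)
stacked = np.stack([xw for _ in range(adj.shape[1])], 1)    # (16, 4, 9, 128)
per_type = np.einsum('bijk,bikl->bijl', adj, stacked)       # (16, 4, 9, 128)
hidden = per_type.sum(1) + xw                               # (16, 9, 128)

# Equivalent form: sum the adjacency over edge types first, then one matmul.
hidden_alt = adj.sum(1) @ xw + xw
```

The two results agree up to floating-point error, which suggests that in this implementation the edge types only matter through which channels are kept in `adj`, not through separate weights.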

Likewise, after the GCN call the annotations matrix is rebuilt:

# 16x9x69
annotations = torch.cat((h, hidden, node) if hidden is not None\
                         else (h, node), -1)

Then the aggregation layer is called:

h = self.agg_layer(annotations, torch.tanh)

# (64,128,5,0)
self.agg_layer = GraphAggregation(graph_conv_dim[-1], aux_dim, b_dim, dropout)

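The output returned here is 16x128 (not 16x9x128, because forward sums over the 9 nodes), i.e. each molecular graph now has a single 128-dimensional embedding.
In the aggregation formula, i and j are transformation functions; after their activations the two results are multiplied element-wise, and $\sigma$ is the sigmoid function.
A tanh activation is applied to the final output.
Overall goal: concatenate $h_v$ (the hidden representation obtained by multiplying the adjacency by the transformed feature matrix) with the node's own features $x_v$, then reduce the result to a single vector that represents the whole graph.
[Image: graph aggregation formula]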
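
Reconstructed from the description above and from the GraphAggregation code (so a sketch, not a copy of the missing figure), the aggregation can be written as follows. Here $[h_v, x_v]$ denotes concatenation, $i$ and $j$ are the two linear maps, $\odot$ is the element-wise product, and the sum runs over the nodes $v$ of the graph.

```latex
h'_G = \sum_{v} \sigma\bigl(i([h_v, x_v])\bigr) \odot \tanh\bigl(j([h_v, x_v])\bigr),
\qquad
h_G = \tanh\bigl(h'_G\bigr)
```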

class GraphAggregation(Module):
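    # (64,128,5,0); the input width in_features + b_dim = 69 matches
    # cat(h, node) = 64 + m_dim only because m_dim == b_dim == 5 here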
    def __init__(self, in_features, out_features, b_dim, dropout):
        super(GraphAggregation, self).__init__()
        self.sigmoid_linear = nn.Sequential(nn.Linear(in_features+b_dim, out_features),
                                            nn.Sigmoid())
        self.tanh_linear = nn.Sequential(nn.Linear(in_features+b_dim, out_features),
                                         nn.Tanh())
        self.dropout = nn.Dropout(dropout)

    def forward(self, input, activation):
        i = self.sigmoid_linear(input)
        j = self.tanh_linear(input)
        # element-wise product of i and j, then sum over the node dimension
        # output : Nx128
        output = torch.sum(torch.mul(i,j), 1)
        output = activation(output) if activation is not None\
                 else output
        output = self.dropout(output)

        return output

Attaching a figure; I am not sure what the element-wise product in this output means.
[Image: diagram of the aggregation step]
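
One common reading of this product, offered here as an interpretation rather than something the source states, is soft gating: the sigmoid branch outputs values in (0, 1) that scale how much each node contributes in each feature dimension, while the tanh branch supplies the signed content. A minimal NumPy sketch, with random matrices `W_i` and `W_j` as assumed stand-ins for the two `nn.Linear` layers (bias and dropout omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
B, N, D_in, D_out = 16, 9, 69, 128   # batch, nodes, 64 + 5 input width, aux_dim

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

annotations = rng.standard_normal((B, N, D_in))
W_i = rng.standard_normal((D_in, D_out)) * 0.1   # stand-in for sigmoid_linear
W_j = rng.standard_normal((D_in, D_out)) * 0.1   # stand-in for tanh_linear

gate = sigmoid(annotations @ W_i)       # (16, 9, 128), entries in (0, 1)
content = np.tanh(annotations @ W_j)    # (16, 9, 128), entries in (-1, 1)

# Element-wise product, then sum over the node axis: one vector per graph.
graph_vec = np.tanh((gate * content).sum(axis=1))   # (16, 128)
```

Under this reading, a gate value near 0 effectively removes a node's contribution to that feature, and a value near 1 lets it through unchanged.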

Back in the train function:

# Compute loss with real images.
#  logits_real : 16x1
#  features_real : 16x64
logits_real, features_real = self.D(a_tensor, None, x_tensor)
d_loss_real = - torch.mean(logits_real)

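Discriminator update:

  • A gradient penalty is introduced as an alternative soft constraint on 1-Lipschitz continuity, improving on the weight clipping scheme of the original WGAN. The generator loss is unchanged, while the discriminator loss becomes:
    [Image: WGAN-GP discriminator loss]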
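
As a sketch of the penalty term, assuming the standard WGAN-GP form L = -D(x) + D(G(z)) + alpha * (||grad D(x_hat)|| - 1)^2, where x_hat is a random interpolation between a real and a generated sample: the NumPy toy below uses a linear critic D(x) = x . w, so the gradient is simply w and no autograd is needed. The critic, the random data, and `alpha = 10.0` are illustrative assumptions, not values taken from the MolGAN code.

```python
import numpy as np

rng = np.random.default_rng(0)

def critic(x, w):
    """Toy linear critic D(x) = x . w, scored per sample."""
    return x @ w

def wgan_gp_critic_loss(x_real, x_fake, w, alpha=10.0):
    """Critic loss: -D(real) + D(fake) + alpha * (||grad D(x_hat)|| - 1)^2."""
    eps = rng.uniform(size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake       # random interpolates
    # For a linear critic the gradient w.r.t. x_hat is w at every point.
    grad = np.broadcast_to(w, x_hat.shape)
    grad_norm = np.linalg.norm(grad, axis=1)
    penalty = (grad_norm - 1.0) ** 2
    return np.mean(-critic(x_real, w) + critic(x_fake, w) + alpha * penalty)

x_real = rng.standard_normal((16, 8))
x_fake = rng.standard_normal((16, 8))
w_unit = np.ones(8) / np.sqrt(8.0)    # ||w|| = 1, so the penalty vanishes
w_big = 3.0 * w_unit                  # ||w|| = 3, penalty = 10 * (3 - 1)^2 = 40

loss_unit = wgan_gp_critic_loss(x_real, x_fake, w_unit)
loss_big = wgan_gp_critic_loss(x_real, x_fake, w_big)
```

With a real network the gradient depends on x_hat and would be obtained through automatic differentiation (in PyTorch, for example, with `torch.autograd.grad`); the toy only illustrates that the penalty is zero exactly when the gradient norm is 1.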

Hmm...
