GCN Paper Reading and Code Walkthrough (3): ASTGCN


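ASTGCN consists of three independent components that model, respectively, the recent, daily-periodic, and weekly-periodic dependencies of traffic flow. Its main contributions are:

  • A spatial attention mechanism captures the correlations between different locations, and a temporal attention mechanism captures the correlations between different time steps;

  • A spatial-temporal convolution module is designed, consisting of graph convolution in the spatial dimension and standard convolution in the temporal dimension;

  • On real-world highway traffic flow datasets, the model outperforms the baselines it is compared against.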

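The model architecture is shown below:

[Figure: ASTGCN model architecture]

  • X_h is the recent segment: a stretch of historical time series directly adjacent to the forecast window

  • X_d is the daily-periodic segment: the time series from the past few days that covers the same time of day as the forecast window

  • X_w is the weekly-periodic segment: the time series from the past few weeks that covers the same weekday and time of day as the forecast window

As illustrated in the figure:

[Figure: how the input segments X_h, X_d, and X_w are taken from the time axis]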
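The three segments can be made concrete by listing the time indices each one covers. The following is a minimal sketch, not the author's data loader: the helper names `recent_indices` and `segment_indices`, the sampling rate `q = 288` (5-minute data), and the window sizes are all assumptions chosen for illustration.

```python
def recent_indices(t0, T_h):
    """Indices of the T_h most recent steps, ending at the current step t0."""
    return list(range(t0 - T_h + 1, t0 + 1))


def segment_indices(t0, T_p, num_segments, period):
    """Indices of `num_segments` windows of length T_p.

    Each window is the forecast window (t0+1 .. t0+T_p) shifted back by
    1..num_segments whole periods, listed oldest first.
    """
    idx = []
    for k in range(num_segments, 0, -1):
        start = t0 + 1 - k * period
        idx.extend(range(start, start + T_p))
    return idx


q = 288      # samples per day (assumed: one sample every 5 minutes)
T_p = 12     # length of the forecast window (assumed)
t0 = 3000    # index of the current time step (assumed)

recent = recent_indices(t0, T_h=24)                          # X_h
daily = segment_indices(t0, T_p, num_segments=2, period=q)   # X_d
weekly = segment_indices(t0, T_p, num_segments=1, period=7 * q)  # X_w
```

With these values, `daily` holds the same 12 time-of-day slots from two days ago and from yesterday, and `weekly` holds the same slots from exactly one week before the forecast window.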

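The paper proposes a spatial-temporal attention mechanism to capture the dynamic spatial-temporal correlations in the traffic network.

  • Spatial-temporal attention

    • Spatial attention: $S = V_s \cdot \sigma\left((X_h^{(r-1)} W_1) W_2 (W_3 X_h^{(r-1)})^T + b_s\right)$. The correlation strength between node $i$ and node $j$ is the softmax-normalized value $S'_{i,j} = \frac{\exp(S_{i,j})}{\sum_{j=1}^{N} \exp(S_{i,j})}$, where $X_h^{(r-1)} = (X_1, X_2, \ldots, X_{T_{r-1}})$ is the input to the $r$-th block. The corresponding code: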

    • import torch
      import torch.nn as nn
      import torch.nn.functional as F
      
      
      class SpatialAttentionLayer(nn.Module):
          """
          compute spatial attention scores
          """
      
          def __init__(self, DEVICE, in_channels, num_of_nodes, num_of_timesteps):
              super(SpatialAttentionLayer, self).__init__()
              self.W1 = nn.Parameter(torch.FloatTensor(num_of_timesteps).to(DEVICE))
              self.W2 = nn.Parameter(torch.FloatTensor(in_channels, num_of_timesteps).to(DEVICE))
              self.W3 = nn.Parameter(torch.FloatTensor(in_channels).to(DEVICE))
              self.bs = nn.Parameter(torch.FloatTensor(1, num_of_nodes, num_of_nodes).to(DEVICE))
              self.Vs = nn.Parameter(torch.FloatTensor(num_of_nodes, num_of_nodes).to(DEVICE))
      
          def forward(self, x):
              """
              :param x: (batch_size, N, F_in, T)
              :return: (B,N,N)
              """
      
              lhs = torch.matmul(torch.matmul(x, self.W1), self.W2)  # (b,N,F,T)(T)->(b,N,F)(F,T)->(b,N,T)
      
              rhs = torch.matmul(self.W3, x).transpose(-1, -2)  # (F)(b,N,F,T)->(b,N,T)->(b,T,N)
      
              product = torch.matmul(lhs, rhs)  # (b,N,T)(b,T,N) -> (B, N, N)
      
              S = torch.matmul(self.Vs, torch.sigmoid(product + self.bs))  # (N,N)(B, N, N)->(B,N,N)
      
              # softmax over dim=1: within each batch, every column of S sums to 1
              S_normalized = F.softmax(S, dim=1)
      
              return S_normalized

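    • Temporal attention: $E = V_e \cdot \sigma\left(((X_h^{(r-1)})^T U_1) U_2 (U_3 X_h^{(r-1)}) + b_e\right)$. The correlation strength between time step $i$ and time step $j$ is the softmax-normalized value $E'_{i,j} = \frac{\exp(E_{i,j})}{\sum_{j=1}^{T_{r-1}} \exp(E_{i,j})}$, where $X_h^{(r-1)} = (X_1, X_2, \ldots, X_{T_{r-1}})$. The corresponding code: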

    • class TemporalAttentionLayer(nn.Module):
          def __init__(self, DEVICE, in_channels, num_of_nodes, num_of_timesteps):
              super(TemporalAttentionLayer, self).__init__()
              self.U1 = nn.Parameter(torch.FloatTensor(num_of_nodes).to(DEVICE))
              self.U2 = nn.Parameter(torch.FloatTensor(in_channels, num_of_nodes).to(DEVICE))
              self.U3 = nn.Parameter(torch.FloatTensor(in_channels).to(DEVICE))
              self.be = nn.Parameter(torch.FloatTensor(1, num_of_timesteps, num_of_timesteps).to(DEVICE))
              self.Ve = nn.Parameter(torch.FloatTensor(num_of_timesteps, num_of_timesteps).to(DEVICE))
      
          def forward(self, x):
              """
              :param x: (batch_size, N, F_in, T)
              :return: (B, T, T)
              """
              _, num_of_nodes, num_of_features, num_of_timesteps = x.shape
      
              lhs = torch.matmul(torch.matmul(x.permute(0, 3, 2, 1), self.U1), self.U2)
              # x:(B, N, F_in, T) -> (B, T, F_in, N)
              # (B, T, F_in, N)(N) -> (B,T,F_in)
              # (B,T,F_in)(F_in,N)->(B,T,N)
      
              rhs = torch.matmul(self.U3, x)  # (F)(B,N,F,T)->(B, N, T)
      
              product = torch.matmul(lhs, rhs)  # (B,T,N)(B,N,T)->(B,T,T)
      
              E = torch.matmul(self.Ve, torch.sigmoid(product + self.be))  # (B, T, T)
      
              # softmax over dim=1: within each batch, every column of E sums to 1
              E_normalized = F.softmax(E, dim=1)
      
              return E_normalized
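Both attention layers normalize with a softmax along `dim=1` of a `(B, N, N)` or `(B, T, T)` tensor, whereas the formulas sum over the index $j$. The NumPy sketch below only illustrates which axis ends up summing to 1 in each case. The `softmax` helper and the random input are illustrative assumptions, not part of the original code.

```python
import numpy as np


def softmax(a, axis):
    """Numerically stable softmax along the given axis."""
    shifted = a - a.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
S = rng.normal(size=(2, 4, 4))        # stand-in attention scores, (B, N, N)

S_over_j = softmax(S, axis=2)         # as in the formula: each row sums to 1
S_over_i = softmax(S, axis=1)         # as in the code (dim=1): each column sums to 1

# Transposing the dim=1 result turns "columns sum to 1" into "rows sum to 1".
S_over_i_T = np.swapaxes(S_over_i, 1, 2)
```

This is why the graph convolution code further down transposes the attention-weighted matrix before left-multiplying the graph signal.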

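In spectral graph analysis, a graph is represented by its Laplacian matrix $L = D - A$, whose normalized form is $L = I_N - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$, where $A$ is the adjacency matrix and $D$ is the diagonal degree matrix with $D_{ii} = \sum_j A_{ij}$.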
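As a small worked example of these definitions, the sketch below builds both Laplacians for a hand-written 4-node undirected graph. The adjacency matrix is an arbitrary assumption, and the code assumes no node is isolated, so every degree is non-zero.

```python
import numpy as np

# Arbitrary undirected 4-node graph (assumed for illustration).
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

deg = A.sum(axis=1)                    # D_ii = sum_j A_ij
D = np.diag(deg)
L = D - A                              # combinatorial Laplacian

D_inv_sqrt = np.diag(deg ** -0.5)      # requires all degrees > 0
L_norm = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt

eigvals = np.linalg.eigvalsh(L_norm)   # real, since L_norm is symmetric
```

Every row of `L` sums to zero, and the eigenvalues of `L_norm` lie in the interval [0, 2].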

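The spatial-temporal convolution module proposed in the paper consists of graph convolution in the spatial dimension (capturing spatial dependencies from neighboring nodes) and convolution along the temporal dimension (capturing temporal dependencies from nearby time slices).

  • Spatial-temporal convolution

    • Graph convolution in the spectral domain:

    The graph convolution is defined as follows, where $U$ is the orthogonal matrix of eigenvectors of $L$ and $\Lambda$ is the diagonal matrix of its eigenvalues:

    $g_{\theta} *_G x = g_{\theta}(L) x = g_{\theta}(U \Lambda U^T) x = U g_{\theta}(\Lambda) U^T x$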
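The formula reads as three steps: project the signal onto the eigenvectors ($U^T x$), rescale each frequency component by $g_\theta(\lambda)$, and project back ($U \cdot$). A minimal NumPy sketch under assumed inputs follows. The path graph, the signal `x`, and the heat-kernel filter `exp(-lambda)` are illustrative choices, not taken from the paper.

```python
import numpy as np

# Path graph on 4 nodes (assumed for illustration).
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)
L = np.diag(A.sum(axis=1)) - A

lam, U = np.linalg.eigh(L)             # L = U diag(lam) U^T
x = np.array([1.0, -2.0, 3.0, 0.5])    # a graph signal, one value per node


def spectral_filter(x, g):
    """Apply the spectral filter g to x: U g(Lambda) U^T x."""
    x_hat = U.T @ x                    # graph Fourier transform
    return U @ (g(lam) * x_hat)        # filter, then inverse transform


y_identity = spectral_filter(x, lambda l: np.ones_like(l))  # g = 1 leaves x unchanged
y_laplacian = spectral_filter(x, lambda l: l)               # g(lambda) = lambda gives L x
y_lowpass = spectral_filter(x, lambda l: np.exp(-l))        # damps high frequencies
```

The low-pass result is smoother than `x` in the sense that its Dirichlet energy $y^T L y$ is no larger than $x^T L x$, which is the denoising view of graph filtering.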

     

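    When the graph is large, the eigendecomposition of the Laplacian is expensive, so the graph convolution kernel is approximated with Chebyshev polynomials:

    $g_{\theta} *_G x = g_{\theta}(L) x = \sum_{k=0}^{K-1} \theta_k T_k(\widetilde{L}) x$, where $\widetilde{L} = \frac{2}{\lambda_{max}} L - I_N$ and $\lambda_{max}$ is the largest eigenvalue of $L$.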
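The polynomials come from the Chebyshev recurrence $T_0 = I$, $T_1 = \widetilde{L}$, $T_k = 2\widetilde{L}T_{k-1} - T_{k-2}$, so no eigendecomposition is needed when the filter is applied. The sketch below is an illustration under assumptions: the helper names, the small graph, and the coefficients `theta` are made up, and $\lambda_{max}$ is computed exactly here only because the graph is tiny.

```python
import numpy as np
from numpy.polynomial.chebyshev import chebval


def scaled_laplacian(A):
    """L~ = 2 L / lambda_max - I with L = D - A (undirected graph with edges assumed)."""
    L = np.diag(A.sum(axis=1)) - A
    lam_max = np.linalg.eigvalsh(L).max()
    return 2.0 * L / lam_max - np.eye(A.shape[0])


def cheb_polynomials(L_tilde, K):
    """Return [T_0(L~), ..., T_{K-1}(L~)] using the Chebyshev recurrence."""
    polys = [np.eye(L_tilde.shape[0]), L_tilde.copy()]
    for _ in range(2, K):
        polys.append(2.0 * L_tilde @ polys[-1] - polys[-2])
    return polys[:K]


A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 0],
], dtype=float)
L_tilde = scaled_laplacian(A)
polys = cheb_polynomials(L_tilde, K=3)

theta = np.array([0.5, -1.0, 0.25])    # assumed filter coefficients
x = np.array([1.0, -2.0, 3.0, 0.5])    # assumed graph signal

# Polynomial form: sum_k theta_k T_k(L~) x, using only matrix products.
y_cheb = sum(t * (P @ x) for t, P in zip(theta, polys))

# Spectral form of the same filter, for comparison only.
lam, U = np.linalg.eigh(L_tilde)
y_spec = U @ (chebval(lam, theta) * (U.T @ x))
```

Both forms give the same output. The rescaling matters because Chebyshev polynomials are defined on [-1, 1], which is exactly the range of the eigenvalues of $\widetilde{L}$.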

     

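    Finally, the result is passed through a ReLU activation to form the output: $ReLU(g_\theta *_G x)$

    To adjust the correlations between nodes dynamically, each Chebyshev term is multiplied element-wise by the spatial attention matrix $S'$: $g_{\theta} *_G x = g_{\theta}(L) x = \sum_{k=0}^{K-1} \theta_k (T_k(\widetilde{L}) \odot S') x$. The corresponding code: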

    • class ChebConvWithSAt(nn.Module):
          """
          K-order chebyshev graph convolution
          """
      
          def __init__(self, K, cheb_polynomials, in_channels, out_channels):
              """
              :param K: int
              :param in_channels: int, num of channels in the input sequence
              :param out_channels: int, num of channels in the output sequence
              """
              super(ChebConvWithSAt, self).__init__()
              self.K = K
              self.cheb_polynomials = cheb_polynomials
              self.in_channels = in_channels
              self.out_channels = out_channels
              self.DEVICE = cheb_polynomials[0].device
              self.Theta = nn.ParameterList(
                  [nn.Parameter(torch.FloatTensor(in_channels, out_channels).to(self.DEVICE)) for _ in range(K)])
      
          def forward(self, x, spatial_attention):
              """
              Chebyshev graph convolution operation
              :param x: (batch_size, N, F_in, T)
              :return: (batch_size, N, F_out, T)
              """
      
              batch_size, num_of_nodes, in_channels, num_of_timesteps = x.shape
      
              outputs = []
      
              for time_step in range(num_of_timesteps):
      
                  graph_signal = x[:, :, :, time_step]  # (b, N, F_in)
      
                  output = torch.zeros(batch_size, num_of_nodes, self.out_channels).to(self.DEVICE)  # (b, N, F_out)
      
                  for k in range(self.K):
                      T_k = self.cheb_polynomials[k]  # (N,N)
      
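                      # (N,N)*(b,N,N) -> (b,N,N); spatial_attention was normalized along dim=1,
                      # so each of its columns sums to 1
                      T_k_with_at = T_k.mul(spatial_attention)
      
                      theta_k = self.Theta[k]  # (in_channel, out_channel)
      
                      # (b,N,N)(b,N,F_in) -> (b,N,F_in); transposing first means the left-multiplication
                      # sums over the axis along which the attention was normalized
                      rhs = T_k_with_at.permute(0, 2, 1).matmul(graph_signal)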
      
                      output = output + rhs.matmul(theta_k)  # (b, N, F_in)(F_in, F_out) = (b, N, F_out)
      
                  outputs.append(output.unsqueeze(-1))  # (b, N, F_out, 1)
      
              return F.relu(torch.cat(outputs, dim=-1))  # (b, N, F_out, T)
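The double loop over time steps and polynomial orders can also be written as a single tensor contraction. The NumPy sketch below re-implements the loop and an `einsum` version on random stand-in data and checks that they agree. It illustrates the index pattern only: the function names and all array contents are assumptions, and it is not a drop-in replacement for the PyTorch module.

```python
import numpy as np


def cheb_conv_loop(x, cheb, theta, att):
    """Loop form mirroring the forward pass above. x: (B, N, F_in, T)."""
    B, N, _, T = x.shape
    K, _, F_out = theta.shape
    out = np.zeros((B, N, F_out, T))
    for t in range(T):
        signal = x[:, :, :, t]                         # (B, N, F_in)
        for k in range(K):
            T_k_at = cheb[k] * att                     # (N,N)*(B,N,N) -> (B,N,N)
            rhs = np.swapaxes(T_k_at, 1, 2) @ signal   # (B, N, F_in)
            out[:, :, :, t] += rhs @ theta[k]          # (B, N, F_out)
    return np.maximum(out, 0.0)                        # ReLU


def cheb_conv_einsum(x, cheb, theta, att):
    """Same computation as one contraction over k (order), m (source node), f (F_in)."""
    T_at = cheb[None] * att[:, None]                   # (B, K, N, N)
    out = np.einsum("bkmn,bmft,kfo->bnot", T_at, x, theta)
    return np.maximum(out, 0.0)


rng = np.random.default_rng(0)
B, N, F_in, F_out, T, K = 2, 5, 3, 4, 6, 3
x = rng.normal(size=(B, N, F_in, T))
cheb = rng.normal(size=(K, N, N))          # stand-ins for T_k(L~)
theta = rng.normal(size=(K, F_in, F_out))
att = rng.random(size=(B, N, N))           # stand-in for the spatial attention

y_loop = cheb_conv_loop(x, cheb, theta, att)
y_einsum = cheb_conv_einsum(x, cheb, theta, att)
```

The subscript `bkmn` with output index `n` encodes the transpose in the original code: the contraction runs over the first node axis `m` of the attention-weighted polynomial.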

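    • Convolution in the temporal dimension: after the graph convolution has gathered neighborhood information in space, a standard convolution layer is stacked along the time axis to update each node's signal by merging information from adjacent time slices. For the $r$-th layer: $X_h^{(r)} = ReLU(\Phi * (ReLU(g_\theta *_G \hat{X}_h^{(r-1)})))$, where $\Phi$ denotes the parameters of the temporal convolution kernel and $*$ is a standard convolution. The corresponding code: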

    • # define the convolution layers and their parameters
      self.time_conv = nn.Conv2d(nb_chev_filter, nb_time_filter, kernel_size=(1, 3), stride=(1, time_strides), padding=(0, 1))
      self.residual_conv = nn.Conv2d(in_channels, nb_time_filter, kernel_size=(1, 1), stride=(1, time_strides))
      self.ln = nn.LayerNorm(nb_time_filter)  # LayerNorm needs the channel axis to be the last one
      
      # temporal convolution: convolve along the time axis
      time_conv_output = self.time_conv(spatial_gcn.permute(0, 2, 1, 3))  # (b,N,F,T)->(b,F,N,T), then a (1,3) kernel ->(b,F,N,T)
      
      # residual shortcut
      x_residual = self.residual_conv(x.permute(0, 2, 1, 3))  # (b,N,F,T)->(b,F,N,T), then a (1,1) kernel ->(b,F,N,T)
      
      x_residual = self.ln(F.relu(x_residual + time_conv_output).permute(0, 3, 2, 1)).permute(0, 2, 3, 1)
      # (b,F,N,T)->(b,T,N,F) -ln-> (b,T,N,F)->(b,N,F,T)
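The residual addition only works if both branches produce the same number of time steps. This can be checked with the standard output-length formula for a strided convolution, floor((L + 2p - d(k - 1) - 1) / s) + 1. The helpers `conv_out_len` and `branch_lengths` below are hypothetical names introduced for this check, and the tested values of `T` and `time_strides` are assumptions.

```python
def conv_out_len(length, kernel, stride, padding, dilation=1):
    """Output length of a 1-D convolution along one axis."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def branch_lengths(T, time_strides):
    """Time lengths produced by the two branches for an input of length T."""
    main = conv_out_len(T, kernel=3, stride=time_strides, padding=1)      # time_conv
    shortcut = conv_out_len(T, kernel=1, stride=time_strides, padding=0)  # residual_conv
    return main, shortcut


lengths = {(T, s): branch_lengths(T, s) for T in (12, 24, 36) for s in (1, 2, 3)}
```

With kernel 3 and padding 1 the main branch reduces to floor((T - 1) / s) + 1, which is the same expression as the (1, 1) shortcut, so the two outputs can be added for any stride.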

Finally, the outputs of the three components are fused:

$\hat{Y} = W_h \odot \hat{Y}_h + W_d \odot \hat{Y}_d + W_w \odot \hat{Y}_w$
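Because $\odot$ is an element-wise product, every node and forecast step gets its own mixing weights for the three components. A minimal NumPy sketch follows, with made-up shapes (`N = 3` nodes, `T_p = 2` forecast steps) and random values standing in for the component outputs and the learned weights. The `fuse` helper is a hypothetical name.

```python
import numpy as np


def fuse(preds, weights):
    """Element-wise weighted sum of component predictions, each of shape (N, T_p)."""
    return sum(w * y for w, y in zip(weights, preds))


rng = np.random.default_rng(0)
N, T_p = 3, 2
Y_h, Y_d, Y_w = (rng.normal(size=(N, T_p)) for _ in range(3))   # component outputs
W_h, W_d, W_w = (rng.normal(size=(N, T_p)) for _ in range(3))   # stand-ins for learned weights

Y_hat = fuse([Y_h, Y_d, Y_w], [W_h, W_d, W_w])

# With W_h = 1 and the other weights 0, the fusion reduces to the recent component.
Y_recent_only = fuse([Y_h, Y_d, Y_w],
                     [np.ones((N, T_p)), np.zeros((N, T_p)), np.zeros((N, T_p))])
```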

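Additional notes:

  • The core idea of graph convolution is message passing: through the convolution, each node sends messages to its neighbors and receives the messages its neighbors send back. The Laplacian matrix has the following basic properties: (1) it is symmetric (for an undirected graph), so it admits an eigendecomposition; (2) row i has non-zero entries only at node i itself and at its first-order (directly connected) neighbors, and all other entries are 0; (3) it is related to the Laplace operator, which measures, at every point in space, whether the gradient of the function tends to increase or decrease.

  • Graph convolution can be viewed as transforming the graph signal from the spatial (vertex) domain into the frequency domain, filtering its spectral components there, and then transforming the result back into the original spatial domain, which accomplishes denoising and feature extraction on the graph signal.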
