诸神缄默不语

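CS224W (Machine Learning with Graphs) Winter 2021 study notes 18, Colab 4: heterogeneous graphs

诸神缄默不语: index of my CSDN blog posts
Collected study notes for CS224W (Machine Learning with Graphs), Winter 2021

To discuss this post, search for the WeChat ID "PolarisRisingWar" and add the author as a contact.

Changelog:
2021.11.16 Improved the layout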

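Contents

Question 1. Introduction to heterogeneous graphs in DeepSNAP
- 1.1 Question 1.1: Assigning node types and node features
- 1.2 Question 1.2: Assigning edge types
- 1.3 Visualizing the NetworkX heterogeneous graph
- 1.4 Converting the NetworkX heterogeneous graph into a DeepSNAP heterogeneous graph
- 1.5 Question 1.3: How many nodes does each node type have?
- 1.6 Question 1.4: How many edges does each message type have?
- 1.7 Question 1.5: Dataset splitting: how many nodes are in each split?
- 1.8 Visualizing the DeepSNAP dataset
2. Node prediction on a heterogeneous graph
- 2.1 Imports
- 2.2 Heterogeneous GNN Layer
- 2.3 Heterogeneous GNN Wrapper Layer
- 2.4 Initializing the Heterogeneous GNN Layers
- 2.5 HeteroGNN
- 2.6 Building the `train()` and `test()` functions
- 2.7 Setting the hyperparameters
- 2.8 Loading and preprocessing the dataset
- 2.9 Training the Mean Aggregation
- 2.10 Training the Attention Aggregation
- 2.11 Attention for each Message Type
3. Other references not mentioned in the text or footnotes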

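This colab was really hard for me, and I mostly copied the answers directly. I understand it only partially, so I have written down whatever I do understand.
Comments and corrections are very welcome.

Original download link for the Colab 4 notebook

I uploaded my completed Colab 4 notebook, which includes some personal notes, to GitHub: cs224w-2021-winter-colab/CS224W_Colab_4.ipynb at master · PolarisRisingWar/cs224w-2021-winter-colab

What this colab covers:
Working with heterogeneous graphs (graphs with several types of nodes and edges) and implementing heterogeneous message passing, that is, passing different kinds of messages between different types of nodes and edges.
The colab mainly uses DeepSNAP classes to operate on heterogeneous graphs.¹
DeepSNAP official documentation: DeepSNAP Documentation - DeepSNAP 0.2.0 documentation
DeepSNAP official GitHub project: snap-stanford/deepsnap: Python library assists deep learning on graphs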

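Question 1. Introduction to heterogeneous graphs in DeepSNAP

Graph attributes needed to represent a heterogeneous graph:

node_feature: the feature of each node (torch.tensor)
edge_feature: the feature of each edge (torch.tensor)
node_label: the label of each node (int)
node_type: the node type of each node (string)
edge_type: the edge type of each edge (string)

In Question 1 we use the karate club network as the example dataset. For an introduction to this dataset, see my earlier note: 图数据集Zachary's karate club network详解，包括其在NetworkX、PyG上的获取和应用方式_诸神缄默不语的博客-CSDN博客 (a detailed walkthrough of Zachary's karate club network, including how to load and use it in NetworkX and PyG).

First, load the graph and visualize it, coloring the nodes by class (that is, by the club each node belongs to):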

from pylab import *
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities
import matplotlib.pyplot as plt
import copy

G = nx.karate_club_graph()
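community_map = {}  # key: node index, value: index of the node's community (0 or 1)
for node in G.nodes(data=True):
  # node[0] is the node index and node[1] is its data dict, here e.g. {'club': 'Mr. Hi'}
  # with the default data=False, only the indices are yielded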
  if node[1]["club"] == "Mr. Hi":
    community_map[node[0]] = 0
  else:
    community_map[node[0]] = 1
node_color = []
color_map = {0: 0, 1: 1}
node_color = [color_map[community_map[node]] for node in G.nodes()]
pos = nx.spring_layout(G)  # explained below
plt.figure(figsize=(7, 7))
nx.draw(G, pos=pos, cmap=plt.get_cmap('coolwarm'), node_color=node_color)
show()

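About nx.spring_layout(G): this function computes node positions, which makes the drawing of the graph easier to read.
Function documentation: networkx.drawing.layout.spring_layout - NetworkX 2.6.2 documentation
Roughly, it takes the graph (plus optional parameters) and returns a dict whose keys are node indices and whose values are the node coordinates. Example entry: 0: array([ 0.42143337, -0.10723518])
The layout algorithm is the Fruchterman-Reingold force-directed algorithm². The idea is to treat each edge as a spring that pulls its two endpoints together while all nodes repel one another, and to simulate the system until it reaches equilibrium.
The returned dict can be passed as the pos argument of nx.draw(), so that the nodes are drawn at these coordinates.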
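
To make the spring analogy concrete, here is a minimal, dependency-free sketch of a force-directed layout in the spirit of Fruchterman-Reingold. It is not NetworkX's implementation: the function name `toy_spring_layout`, the constants, and the cooling schedule are assumptions chosen for illustration only.

```python
import math
import random

def toy_spring_layout(nodes, edges, iterations=50, seed=0):
    """Return {node: (x, y)} from a simplified force-directed simulation."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = math.sqrt(1.0 / len(nodes))  # ideal edge length for a unit square
    temperature = 0.1                # caps how far a node may move per step
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion: every pair of nodes pushes apart with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction: each edge is a spring pulling its endpoints together with force d^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, limited by the current temperature.
        for v in nodes:
            length = max(math.hypot(disp[v][0], disp[v][1]), 1e-9)
            step = min(length, temperature)
            pos[v][0] += disp[v][0] / length * step
            pos[v][1] += disp[v][1] / length * step
        temperature *= 0.95  # cool down so the layout settles
    return {v: tuple(p) for v, p in pos.items()}

# Hypothetical 4-node path graph; the result has the same shape as a spring_layout dict.
layout = toy_spring_layout([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3)])
```

The returned dict maps each node to a coordinate pair, which is the same kind of structure that nx.draw() accepts through pos.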

1.1 Question 1.1: Assigning node types and node features

Using the dict community_map and the graph G, add the node_type and node_label attributes to G: nodes in the "Mr. Hi" club get node type n0 and node label 0, and nodes in the "Officer" club get node type n1 and node label 1.
Every node gets the feature [1, 1, 1, 1, 1].

Documentation for the NetworkX function used here, nx.classes.function.set_node_attributes: networkx.classes.function.set_node_attributes - NetworkX 2.6.2 documentation

Usage example:

G_eg = nx.path_graph(3)
bb = nx.betweenness_centrality(G_eg)  # bb is a dict: node -> betweenness centrality

nx.set_node_attributes(G_eg, bb, "betweenness")
G_eg.nodes[1]["betweenness"]

1.0

Answer code:

import torch

def assign_node_types(G, community_map):
  """
  Takes the NetworkX graph G and the community map (a dict mapping each node to a 0/1 label)
  and adds the node attribute node_type to G
  """
  
  new_cm={}
  for (k,v) in community_map.items():
    if v==0:
      new_cm[k]='n0'
    else:
      new_cm[k]='n1'
  # A more elegant version from the solution I consulted:
  #node_type_map = {0:'n0', 1:'n1'}
  #node_types = {node:node_type_map[community_map[node]] for node in G.nodes()}
  nx.set_node_attributes(G,new_cm,'node_type')

def assign_node_labels(G, community_map):
  """
  Takes the NetworkX graph G and the community map (a dict mapping each node to a 0/1 label)
  and adds the node attribute node_label to G
  """
  
  nx.set_node_attributes(G,community_map,'node_label')

def assign_node_features(G):
  """
  Takes the NetworkX graph G
  and adds the node attribute node_feature to G
  """
  
  feature_vector=[1, 1, 1, 1, 1]
  nx.set_node_attributes(G,feature_vector,'node_feature')



assign_node_types(G, community_map)
assign_node_labels(G, community_map)
assign_node_features(G)

Code to check the result:

for n in G.nodes(data=True):
    print(n)
    break

(0, {'club': 'Mr. Hi', 'node_type': 'n0', 'node_label': 0, 'node_feature': [1, 1, 1, 1, 1]})

1.2 Question 1.2: Assigning edge types

Assignment rule:

Edges within club “Mr. Hi”: e0
Edges within club “Officer”: e1
Edges between clubs: e2
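
The rule above can be written as a small pure function over the 0/1 community labels. This is a minimal sketch with no NetworkX dependency; `edge_type_for` and the 4-node toy data are hypothetical names introduced for illustration.

```python
def edge_type_for(label_u, label_v):
    """Map the community labels (0 = Mr. Hi, 1 = Officer) of an edge's endpoints to an edge type."""
    if label_u == label_v == 0:
        return "e0"  # both endpoints in club "Mr. Hi"
    if label_u == label_v == 1:
        return "e1"  # both endpoints in club "Officer"
    return "e2"      # the edge crosses the two clubs

# Hypothetical toy data: nodes 0 and 1 in "Mr. Hi", nodes 2 and 3 in "Officer".
toy_community_map = {0: 0, 1: 0, 2: 1, 3: 1}
toy_edges = [(0, 1), (2, 3), (1, 2)]
toy_edge_types = {
    (u, v): edge_type_for(toy_community_map[u], toy_community_map[v])
    for u, v in toy_edges
}
```

A dict keyed by edge tuples like `toy_edge_types` has the shape that nx.set_edge_attributes expects for its values argument.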

Documentation for the NetworkX function used here, nx.classes.function.set_edge_attributes: networkx.classes.function.set_edge_attributes - NetworkX 2.6.2 documentation

Answer code:

def assign_edge_types(G, community_map):
  """
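  Takes the NetworkX graph G and the community map (a dict mapping each node to a 0/1 label)
  and adds the edge attribute edge_type to G
  """

  # Note: I think the question intended community_map to be used here, but using the club attribute should work just as well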
  edge2attr_map={}
  for edge in G.edges():
    if G.nodes[edge[0]]['club']=='Mr. Hi' and G.nodes[edge[1]]['club']=='Mr. Hi':
      edge2attr_map[edge]='e0'
    elif G.nodes[edge[0]]['club']=='Officer' and G.nodes[edge[1]]['club']=='Officer':
      edge2attr_map[edge]='e1'
    else:
      edge2attr_map[edge]='e2'
  nx.set_edge_attributes(G,edge2attr_map,'edge_type')


  
assign_edge_types(G, community_map)

Code to check the result:

#PRW
for edge in G.edges(data=True):
    print(edge)
    break

(0, 1, {'edge_type': 'e0'})

1.3 Visualizing the NetworkX heterogeneous graph

edge_color = {}
for edge in G.edges():
  n1, n2 = edge
  if community_map[n1] == community_map[n2] and community_map[n1] == 0:
    edge_color[edge] = 'blue'
  elif community_map[n1] == community_map[n2] and community_map[n1] == 1:
    edge_color[edge] = 'red'
  else:
    edge_color[edge] = 'green'

G_orig = copy.deepcopy(G)
nx.classes.function.set_edge_attributes(G, edge_color, name='color')
colors = nx.get_edge_attributes(G,'color').values()
labels = nx.get_node_attributes(G, 'node_type')
plt.figure(figsize=(8, 8))
nx.draw(G, pos=pos, cmap=plt.get_cmap('coolwarm'), node_color=node_color, edge_color=colors, labels=labels, font_color='white')
show()

1.4 Converting the NetworkX heterogeneous graph into a DeepSNAP heterogeneous graph

from deepsnap.hetero_graph import HeteroGraph

hete = HeteroGraph(G_orig)

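Note that this step is a bit finicky: if you pass G instead of G_orig as the NetworkX backend, it raises TypeError: Unknown type color in edge attributes.
I looked at the corresponding source code (deepsnap.hetero_graph - DeepSNAP 0.2.0 documentation) and found the following.

Node attributes of G_orig:

G_orig.nodes(data=True)[0]

Output: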

{'club': 'Mr. Hi',
 'node_type': 'n0',
 'node_label': 0,
 'node_feature': [1, 1, 1, 1, 1]}

Edge attributes of G_orig:

for e in G_orig.edges(data=True):
    print(e)
    break

Output:

(0, 1, {'edge_type': 'e0'})

Edge attributes of G:

for e in G.edges(data=True):
    print(e)
    break

Output:

(0, 1, {'edge_type': 'e0', 'color': 'blue'})

The corresponding code in DeepSNAP:

def _get_edge_attributes(self, key: str):
    r"""
    Similar to the `_get_node_attributes`
    """
    attributes = {}
    indices = None
    # TODO: suspect edge_to_tensor_mapping and edge_to_graph_mapping not useful
    if key == "edge_type":
        indices = {}
    for edge_idx, (head, tail, edge_dict) in enumerate(
        self.G.edges(data=True)
    ):
        if key in edge_dict:
            head_type = self.G.nodes[head]["node_type"]
            tail_type = self.G.nodes[tail]["node_type"]
            edge_type = self._get_edge_type(edge_dict)
            message_type = (head_type, edge_type, tail_type)
            if message_type not in attributes:
                attributes[message_type] = []
            attributes[message_type].append(edge_dict[key])
            if indices is not None:
                if message_type not in indices:
                    indices[message_type] = []
                indices[message_type].append(edge_idx)

    if len(attributes) == 0:
        return None

    for message_type, val in attributes.items():
        if torch.is_tensor(attributes[message_type][0]):
            attributes[message_type] = torch.stack(val, dim=0)
        elif isinstance(attributes[message_type][0], float):
            attributes[message_type] = torch.tensor(val, dtype=torch.float)
        elif isinstance(attributes[message_type][0], int):
            attributes[message_type] = torch.tensor(val, dtype=torch.long)
        elif (
            isinstance(attributes[message_type][0], str)
            and key == "edge_type"
        ):
            continue
        else:
            raise TypeError(f"Unknown type {key} in edge attributes.")

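In short, apart from edge_type, no edge attribute may be a str, which is why the color attribute triggers the error.
That naturally raises the question of why the club node attribute, which is also a str, is accepted. I skimmed _get_node_attributes() and found that it simply has no such restriction.
I am not sure whether this is an oversight by the DeepSNAP authors or something I have misunderstood, and I have not asked them. I will look into it if I need DeepSNAP again.
Either way, the behavior is noted here.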
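
The type check in the quoted code can be restated in a few lines of plain Python, together with one possible workaround: dropping string-valued edge attributes other than edge_type before conversion. This is a simplified sketch, not DeepSNAP's code; it omits the tensor branch, and `classify_edge_attribute` and `drop_string_edge_attributes` are hypothetical helper names.

```python
def classify_edge_attribute(key, values):
    """Simplified restatement of the type dispatch quoted above (tensor branch omitted)."""
    first = values[0]
    if isinstance(first, float):
        return "float tensor"
    if isinstance(first, int):
        return "long tensor"
    if isinstance(first, str) and key == "edge_type":
        return "kept as list of str"
    raise TypeError(f"Unknown type {key} in edge attributes.")

def drop_string_edge_attributes(edges, keep=("edge_type",)):
    """Return a copy of (u, v, attrs) triples without string attributes other than those in `keep`."""
    cleaned = []
    for u, v, attrs in edges:
        kept = {k: val for k, val in attrs.items()
                if not isinstance(val, str) or k in keep}
        cleaned.append((u, v, kept))
    return cleaned

# Hypothetical edge carrying the extra 'color' attribute that caused the error.
toy_edges = [(0, 1, {"edge_type": "e0", "color": "blue"})]
cleaned_edges = drop_string_edge_attributes(toy_edges)
```

In the notebook itself the same effect is achieved by converting G_orig, the deep copy taken before the color attribute was added.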

You can print the attributes of the heterogeneous graph to take a look:

for hetero_feature in hete:
    print(hetero_feature)

Output omitted.

1.5 Question 1.3: How many nodes does each node type have?

The node_type attribute of hete is a dict keyed by node type (for example n0). If the key is a str, the value is a list like ['n0', 'n0', 'n0', 'n0', 'n0', 'n0', 'n0', 'n0', 'n0', 'n0', 'n0', 'n0', 'n0', 'n0', 'n0', 'n0', 'n0']; if the key is an int, the value is a Tensor.

def get_nodes_per_type(hete):
  num_nodes_n0=len(hete.node_type['n0'])
  num_nodes_n1=len(hete.node_type['n1'])

  return num_nodes_n0, num_nodes_n1

num_nodes_n0, num_nodes_n1 = get_nodes_per_type(hete)
print("Node type n0 has {} nodes".format(num_nodes_n0))
print("Node type n1 has {} nodes".format(num_nodes_n1))

Output:

Node type n0 has 17 nodes
Node type n1 has 17 nodes

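1.6 Question 1.4: How many edges does each message type have?

A message type combines node types with an edge type: it is a triple (head node type, edge type, tail node type).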

hete.message_types

Output:

[('n0', 'e0', 'n0'), ('n0', 'e2', 'n1'), ('n1', 'e1', 'n1')]
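
To illustrate how message types arise from node types and edge types, the following dependency-free sketch counts the edges of each (head node type, edge type, tail node type) triple on a toy graph. The function `count_message_types` and the toy data are hypothetical and only mirror the idea, not DeepSNAP's internals.

```python
from collections import Counter

def count_message_types(node_types, typed_edges):
    """Count edges per (head node type, edge type, tail node type) triple."""
    counts = Counter()
    for head, tail, edge_type in typed_edges:
        counts[(node_types[head], edge_type, node_types[tail])] += 1
    return dict(counts)

# Hypothetical toy graph: two nodes of type n0, two of type n1.
toy_node_types = {0: "n0", 1: "n0", 2: "n1", 3: "n1"}
toy_typed_edges = [(0, 1, "e0"), (1, 2, "e2"), (2, 3, "e1"), (0, 2, "e2")]
toy_counts = count_message_types(toy_node_types, toy_typed_edges)
```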

edge_type is a dict keyed by message type. Example of one entry:

hete.edge_type[('n0', 'e0', 'n0')]

The output is a list whose elements are all 'e0'; the details are omitted.

Answer code:

def get_num_message_edges(hete):
  """
  Returns a list of (message_type, num_edge) tuples
  """

  message_type_edges = []
  for message_type,num_edge in hete.edge_type.items():
    message_type_edges.append((message_type,len(num_edge)))

  return message_type_edges



message_type_edges = get_num_message_edges(hete)
for (message_type, num_edges) in message_type_edges:
  print("Message type {} has {} edges".format(message_type, num_edges))

Output:

Message type ('n0', 'e0', 'n0') has 35 edges
Message type ('n0', 'e2', 'n1') has 11 edges
Message type ('n1', 'e1', 'n1') has 32 edges

1.7 Question 1.5: Dataset splitting: how many nodes are in each split?

DeepSNAP has a built-in dataset splitting function.

Answer code:

from deepsnap.dataset import GraphDataset

def compute_dataset_split_counts(datasets):
  """
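  Argument: the dict produced by splitting the dataset (keys 'train'/'val'/'test', values the corresponding GraphDataset)
  Returns: a dict (keys 'train'/'val'/'test', values the number of labeled nodes in the corresponding split)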
  """
  
  data_set_splits = {}
  for ds_name,ds in datasets.items():
    #print(ds_name)  train
    #print(ds[0].node_label_index)  {'n0': tensor([10,  8,  3, 12,  0, 13]), 'n1': tensor([ 0,  8,  1, 15,  5,  7])}
    data_set_splits[ds_name]=ds[0].node_label_index['n0'].shape[0]+ds[0].node_label_index['n1'].shape[0]
    # The colab suggests node_label_index here, but my guess is that node_label would work too
    # node_label_index is explained below

  return data_set_splits



dataset = GraphDataset([hete], task='node')
# Splitting the dataset
dataset_train, dataset_val, dataset_test = dataset.split(transductive=True, split_ratio=[0.4, 0.3, 0.3])
datasets = {'train': dataset_train, 'val': dataset_val, 'test': dataset_test}

data_set_splits = compute_dataset_split_counts(datasets)
for dataset_name, num_nodes in data_set_splits.items():
  print("{} dataset has {} nodes".format(dataset_name, num_nodes))

Output:

train dataset has 12 nodes
val dataset has 10 nodes
test dataset has 12 nodes

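HeteroGraph.node_label_index: Slicing node label to get the corresponding split G.node_label[G.node_label_index]. (from Introduction - DeepSNAP 0.2.0 documentation)
That description is hard to follow. What it means is that node_label_index holds, for each node type, the indices of the nodes that belong to a split, so indexing the original node_label with it recovers the labels of that split. For example: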

data_train=dataset_train[0]
print(data_train.node_label)
print(data_train.node_label_index)
print(hete.node_label)
print(hete.node_label_index)
print(hete.node_label['n0'][data_train.node_label_index['n0']])

Output:

{'n0': tensor([0, 0, 0, 0, 0, 0]), 'n1': tensor([1, 1, 1, 1, 1, 1])}
{'n0': tensor([ 5, 13, 14,  9,  0,  2]), 'n1': tensor([ 6, 11,  4, 13,  9, 15])}
{'n0': tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]), 'n1': tensor([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])}
{'n0': tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16]), 'n1': tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16])}
tensor([0, 0, 0, 0, 0, 0])
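
The same indexing idea can be shown with plain Python lists. This is a minimal sketch on made-up data; `labels_for_split`, the node type "a", and the mixed labels are hypothetical and chosen so that the effect of the index is visible.

```python
def labels_for_split(labels, label_index):
    """For each node type, pick the labels at the positions listed in the split's index."""
    return {t: [labels[t][i] for i in label_index[t]] for t in label_index}

# Hypothetical full label set and training-split indices for a single node type "a".
full_labels = {"a": [0, 1, 1, 0, 1]}
train_label_index = {"a": [4, 0, 2]}
train_labels = labels_for_split(full_labels, train_label_index)
```

In the notebook output above, every n0 node has label 0, so the selected labels are all zeros regardless of which indices the split contains.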

1.8 Visualizing the DeepSNAP dataset

from deepsnap.dataset import GraphDataset

dataset = GraphDataset([hete], task='node')
# Splitting the dataset
dataset_train, dataset_val, dataset_test = dataset.split(transductive=True, split_ratio=[0.4, 0.3, 0.3])
titles = ['Train', 'Validation', 'Test']

for i, dataset in enumerate([dataset_train, dataset_val, dataset_test]):
  n0 = hete._convert_to_graph_index(dataset[0].node_label_index['n0'], 'n0').tolist()
  #[21, 5, 7, 8, 16, 11]
  # Judging from the context, this returns the graph indices of the n0 nodes in this split. _convert_to_graph_index() returns a Tensor
  n1 = hete._convert_to_graph_index(dataset[0].node_label_index['n1'], 'n1').tolist()

  plt.figure(figsize=(7, 7))
  plt.title(titles[i])
  nx.draw(G_orig, pos=pos, node_color="grey", edge_color=colors, labels=labels, font_color='white')
  nx.draw_networkx_nodes(G_orig.subgraph(n0), pos=pos, node_color="blue")
  # subgraph() returns the subgraph induced by the given nodes
  nx.draw_networkx_nodes(G_orig.subgraph(n1), pos=pos, node_color="red")
  show()

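2. Node prediction on a heterogeneous graph

The questions in this part appear to be adapted from DeepSNAP's official example for node prediction on heterogeneous graphs: deepsnap/node_classification_acm.py at master · snap-stanford/deepsnap
So my answers are copied partly from someone else's Colab 4 solution and partly from that example (my guess is that the instructors wrote this assignment by modifying the official code).

First, suppose we have a graph $G$ with 2 node types, $a$ and $b$, and 3 message types: $m_1=(a, r_1, a)$, $m_2=(a, r_2, b)$ and $m_3=(a, r_3, b)$.
One heterogeneous layer then contains 3 Heterogeneous GNN layers (HeteroGNNConv in this colab), and each HeteroGNNConv layer performs message passing and aggregation for exactly one message type.

Overall algorithm flow: [figure not reproduced]

In this colab, the $l$-th heterogeneous GNN layer is managed by the $l$-th Heterogeneous GNN Wrapper layer (HeteroGNNWrapperConv in this colab), which takes the node embeddings of the previous layer, passes messages, and aggregates them into the node embeddings of the next layer.
Overall algorithm flow: [figure not reproduced]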

2.1 Imports

import copy
import torch
import deepsnap
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch_geometric.nn as pyg_nn

from sklearn.metrics import f1_score
from deepsnap.hetero_gnn import forward_op
from deepsnap.hetero_graph import HeteroGraph
from torch_sparse import SparseTensor, matmul

2.2 Heterogeneous GNN Layer

Each message type is a triple $m = (s, r, d)$: source node type, relation (edge type), and destination node type.

The GraphSAGE-style update formula:
$h_v^{(l)[m]} = W^{(l)[m]} \cdot \text{CONCAT} \Big( W_d^{(l)[m]} \cdot h_v^{(l-1)}, W_s^{(l)[m]} \cdot\text{AGG}(\{h_u^{(l-1)}, \forall u \in N_{m}(v) \})\Big)$
To keep things simple, this colab uses mean as the aggregator:
$\text{AGG}(\{h_u^{(l-1)}, \forall u \in N_{m}(v) \}) = \frac{1}{|N_{m}(v)|} \sum_{u\in N_{m}(v)} h_u^{(l-1)}$

class HeteroGNNConv(pyg_nn.MessagePassing):
    def __init__(self, in_channels_src, in_channels_dst, out_channels):
        super(HeteroGNNConv, self).__init__(aggr="mean")

        self.in_channels_src = in_channels_src
        self.in_channels_dst = in_channels_dst
        self.out_channels = out_channels

        self.lin_dst=nn.Linear(in_channels_dst,out_channels)  #W_d^{(l)[m]}
        self.lin_src=nn.Linear(in_channels_src,out_channels)  #W_s^{(l)[m]}
        self.lin_update=nn.Linear(out_channels*2,out_channels)  #W^{(l)[m]}

    def forward(
        self,
        node_feature_src,
        node_feature_dst,
        edge_index,
        size=None,
        res_n_id=None,
    ):
        return self.propagate(edge_index,size=size,
               node_feature_src=node_feature_src,
               node_feature_dst=node_feature_dst,res_n_id=res_n_id)

    def message_and_aggregate(self, edge_index, node_feature_src):
        # Here edge_index is torch_sparse SparseTensor.
        out=matmul(edge_index,node_feature_src,reduce=self.aggr)
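        # Sparse-dense product with reduce="mean": row v of the result is the mean of the source
        # features of v's neighbours, so message passing and aggregation happen in a single step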

        return out

    def update(self, aggr_out, node_feature_dst, res_n_id):
        aggr_out=self.lin_src(aggr_out)
        node_feature_dst=self.lin_dst(node_feature_dst)
        concat_features = torch.cat((node_feature_dst, aggr_out),dim=-1)
        # for these 2-D tensors, dim=-1 is the same as dim=1
        aggr_out = self.lin_update(concat_features)

        return aggr_out
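
The mean aggregation performed by `matmul(edge_index, node_feature_src, reduce=self.aggr)` can be pictured with plain lists. The sketch below is an illustration, not torch_sparse code: `mean_aggregate` is a hypothetical helper, the adjacency is given as one list of source neighbours per destination node, and returning zeros for a node without neighbours is an assumption about the convention.

```python
def mean_aggregate(adj_rows, src_features):
    """adj_rows[v] lists the source neighbours of destination v; returns the mean feature per destination."""
    dim = len(src_features[0])
    out = []
    for neighbours in adj_rows:
        if not neighbours:
            out.append([0.0] * dim)  # assumed convention: no incoming message gives zeros
            continue
        summed = [sum(src_features[u][j] for u in neighbours) for j in range(dim)]
        out.append([s / len(neighbours) for s in summed])
    return out

# Hypothetical toy graph: node 0 receives from nodes 1 and 2, node 1 receives from node 0.
toy_adj_rows = [[1, 2], [0], []]
toy_features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_aggregated = mean_aggregate(toy_adj_rows, toy_features)
```

In update(), the layer then applies lin_src to this aggregate, lin_dst to the destination features, and lin_update to their concatenation, matching the three weight matrices in the formula.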

2.3 Heterogeneous GNN Wrapper Layer

After applying a GNN layer (HeteroGNNConv) to each message type, we need to aggregate the per-message-type results within each layer.
This colab uses two aggregation methods.

The first: mean
$h_v^{(l)} = \frac{1}{M}\sum_{m=1}^{M}h_v^{(l)[m]}$
Node $v$ has node type $d$, and $M$ is the number of message types whose destination node type is $d$.

The second: semantic-level attention, introduced in HAN (Wang et al. (2019))
$e_{m} = \frac{1}{|V_{d}|} \sum_{v \in V_{d}} q_{attn}^T \cdot \tanh \Big( W_{attn}^{(l)} \cdot h_v^{(l)[m]} + b \Big) \\ \alpha_{m} = \frac{\exp(e_{m})}{\sum_{m=1}^M \exp(e_{m})} \\ h_v^{(l)} = \sum_{m=1}^{M} \alpha_{m} \cdot h_v^{(l)[m]}$
$m$ is the message type and $d$ is the destination node type.
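
Both formulas can be checked by hand on tiny inputs. The following dependency-free sketch implements them with nested lists; `aggregate_mean`, `aggregate_attn`, and the parameter shapes (W as an A x D matrix, b and q as length-A vectors) are assumptions made for illustration and are not the layer code used in this colab.

```python
import math

def aggregate_mean(xs):
    """xs: list of M matrices (N x D, nested lists). Returns their element-wise mean (N x D)."""
    M, N, D = len(xs), len(xs[0]), len(xs[0][0])
    return [[sum(x[n][d] for x in xs) / M for d in range(D)] for n in range(N)]

def aggregate_attn(xs, W, b, q):
    """Semantic-level attention over message types. Returns (aggregated N x D matrix, alpha)."""
    scores = []
    for x in xs:  # one score e_m per message type
        total = 0.0
        for h in x:  # average q^T tanh(W h + b) over the N nodes
            hidden = [math.tanh(sum(W[a][d] * h[d] for d in range(len(h))) + b[a])
                      for a in range(len(b))]
            total += sum(q[a] * hidden[a] for a in range(len(q)))
        scores.append(total / len(x))
    top = max(scores)
    exps = [math.exp(e - top) for e in scores]  # numerically stable softmax
    alpha = [e / sum(exps) for e in exps]
    N, D = len(xs[0]), len(xs[0][0])
    out = [[sum(alpha[m] * xs[m][n][d] for m in range(len(xs))) for d in range(D)]
           for n in range(N)]
    return out, alpha

# Hypothetical embeddings for M = 2 message types, N = 2 nodes, D = 2 dimensions.
toy_xs = [[[1.0, 0.0], [0.0, 1.0]], [[3.0, 2.0], [2.0, 3.0]]]
toy_mean = aggregate_mean(toy_xs)
toy_attn, toy_alpha = aggregate_attn(toy_xs, W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0], q=[1.0, 1.0])
```

Note that the attention weights are shared by all nodes of the destination type: there is one alpha per message type, not one per node.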

class HeteroGNNWrapperConv(deepsnap.hetero_gnn.HeteroConv):
    # Documentation: https://snap.stanford.edu/deepsnap/modules/hetero_gnn.html
    def __init__(self, convs, args, aggr="mean"):
        super(HeteroGNNWrapperConv, self).__init__(convs, None)
        self.aggr = aggr

        # Map the index and message type
        self.mapping = {}

        # A numpy array that stores the final attention probability
        self.alpha = None

        self.attn_proj = None

        if self.aggr == "attn":
            self.attn_proj = nn.Sequential(
                nn.Linear(args['hidden_size'], args['attn_size']),
                nn.Tanh(),
                nn.Linear(args['attn_size'], 1, bias=False),
            )

    
    def reset_parameters(self):
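        super(HeteroGNNWrapperConv, self).reset_parameters()
        if self.aggr == "attn":
            for layer in self.attn_proj.children():
                # nn.Tanh has no parameters and no reset_parameters(), so skip such layers
                if hasattr(layer, "reset_parameters"):
                    layer.reset_parameters()
    
    
    def forward(self, node_features, edge_indices):
        # edge_indices: dict whose keys are message types and whose values are the corresponding edge_index Tensors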
        message_type_emb = {}
        for message_key, message_type in edge_indices.items():
            src_type, edge_type, dst_type = message_key
            node_feature_src = node_features[src_type]
            node_feature_dst = node_features[dst_type]
            edge_index = edge_indices[message_key]
            message_type_emb[message_key] = (
                self.convs[message_key](
                    node_feature_src,
                    node_feature_dst,
                    edge_index,
                )
            )
        node_emb = {dst: [] for _, _, dst in message_type_emb.keys()}
        mapping = {}        
        for (src, edge_type, dst), item in message_type_emb.items():
            mapping[len(node_emb[dst])] = (src, edge_type, dst)
            node_emb[dst].append(item)
        # example of mapping: {0: ('paper', 'author', 'paper'), 1: ('paper', 'subject', 'paper')}
        self.mapping = mapping
        for node_type, embs in node_emb.items():
            if len(embs) == 1:
                node_emb[node_type] = embs[0]
            else:
                node_emb[node_type] = self.aggregate(embs)
        return node_emb
    


    def aggregate(self, xs):
        # xs is a list of Tensors (one embedding matrix per message type)

        if self.aggr == "mean":
            x = torch.stack(xs, dim=-1)
            return x.mean(dim=-1)

        elif self.aggr == "attn":
            N = xs[0].shape[0] # Number of nodes for that node type
            M = len(xs) # Number of message types for that node type

            x = torch.cat(xs, dim=0).view(M, N, -1) # M * N * D
            z = self.attn_proj(x).view(M, N) # M * N
            z = z.mean(1) # M
            alpha = torch.softmax(z, dim=0) # M

            # Store the attention result to self.alpha as np array
            self.alpha = alpha.view(-1).data.cpu().numpy()
            # shape: (len(xs),)
            # self.alpha takes no part in backpropagation; it is kept only so that the attention
            # each layer pays to each message type can be inspected
  
            alpha = alpha.view(M, 1, 1)
            x = x * alpha
            return x.sum(dim=0)

2.4 初始化Heterogeneous GNN Layers

def generate_convs(hetero_graph, conv, hidden_size, first_layer=False):
    """
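    Arguments:
    hetero_graph: DeepSNAP `HeteroGraph` object
    conv: HeteroGNNConv
    First layer: the input dimension is the feature dimension and the output dimension is the hidden size
    Other layers: both the input and the output dimension are the hidden size

    Returns: a dict of `HeteroGNNConv` layers keyed by message type.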
    """

    convs = {}
    for message_type in hetero_graph.message_types:
        if first_layer is True:
            src_type = message_type[0]
            dst_type = message_type[2]
            src_size = hetero_graph.num_node_features(src_type)
            dst_size = hetero_graph.num_node_features(dst_type)
            convs[message_type] = conv(src_size,dst_size, hidden_size)
        else:
            convs[message_type] = conv(hidden_size, hidden_size, hidden_size)
    
    return convs

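Note that the colab recommends the method deepsnap.hetero_graph.HeteroGraph.num_node_features(node_type) here, but in my test, calling hete.num_node_features('n1') on the heterogeneous graph hete built in Question 1 raises AttributeError: 'list' object has no attribute 'shape'
This is presumably because the features on hete are lists rather than Tensors, which I consider a shortcoming of DeepSNAP.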

2.5 HeteroGNN

We build a HeteroGNN model with 2 HeteroGNNWrapperConv layers.
$\text{self.convs1} \rightarrow \text{self.bns1} \rightarrow \text{self.relus1} \rightarrow \text{self.convs2} \rightarrow \text{self.bns2} \rightarrow \text{self.relus2} \rightarrow \text{self.post\_mps}$

class HeteroGNN(torch.nn.Module):
    def __init__(self, hetero_graph, args, aggr="mean"):
        super(HeteroGNN, self).__init__()

        self.aggr = aggr
        self.hidden_size = args['hidden_size']

        self.bns1 = nn.ModuleDict()
        self.bns2 = nn.ModuleDict()
        self.relus1 = nn.ModuleDict()
        self.relus2 = nn.ModuleDict()
        self.post_mps = nn.ModuleDict()
        
        convs1 = generate_convs(hetero_graph, HeteroGNNConv, self.hidden_size, first_layer=True)
        convs2 = generate_convs(hetero_graph, HeteroGNNConv, self.hidden_size)

        self.convs1 = HeteroGNNWrapperConv(convs1, args, aggr=self.aggr)
        self.convs2 = HeteroGNNWrapperConv(convs2, args, aggr=self.aggr)

        for node_type in hetero_graph.node_types:
            self.bns1[node_type] = torch.nn.BatchNorm1d(self.hidden_size, eps=1)
            self.bns2[node_type] = torch.nn.BatchNorm1d(self.hidden_size, eps=1)
            self.post_mps[node_type] = nn.Linear(self.hidden_size, hetero_graph.num_node_labels(node_type))
            self.relus1[node_type] = nn.LeakyReLU()
            self.relus2[node_type] = nn.LeakyReLU()



    def forward(self, node_feature, edge_index):
        # node_feature is a dict: keys are node types, values are the corresponding feature Tensors
        # edge_index is also a dict: keys are message types, values are the corresponding edge_index Tensors

        x = node_feature
        x = self.convs1(x, edge_index)
        x = forward_op(x, self.bns1)  # forward_op is described below
        x = forward_op(x, self.relus1)
        x = self.convs2(x, edge_index)
        x = forward_op(x, self.bns2)
        x = forward_op(x, self.relus2)
        x = forward_op(x, self.post_mps)
        
        return x

    def loss(self, preds, y, indices):
        loss = 0
        loss_func = F.cross_entropy
        for node_type in preds:
            idx = indices[node_type]
            loss += loss_func(preds[node_type][idx], y[node_type][idx])

        return loss

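forward_op(x, module_dict, **kwargs):
Documentation: deepsnap.hetero_gnn.forward_op
Roughly speaking, given x and module_dict as dicts in the format shown in the code, forward_op() matches them by key: for each key, it passes the value of x through the module stored under the same key in module_dict, forwarding any extra keyword arguments, and returns the results as a dict.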
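
The key-by-key behavior described here fits in one dict comprehension. The sketch below is an approximation of that idea, not DeepSNAP's source: `forward_op_sketch` is a hypothetical name, and plain lambdas stand in for the torch modules.

```python
def forward_op_sketch(x, module_dict, **kwargs):
    """Apply module_dict[key] to x[key] for every key in x and collect the results in a dict."""
    return {key: module_dict[key](x[key], **kwargs) for key in x}

# Hypothetical stand-ins: a ReLU-like function for "paper" and a doubling function for "author".
toy_x = {"paper": [-1.0, 2.0], "author": [3.0, -4.0]}
toy_modules = {
    "paper": lambda v: [max(0.0, a) for a in v],
    "author": lambda v: [2 * a for a in v],
}
toy_out = forward_op_sketch(toy_x, toy_modules)
```

This is why HeteroGNN keeps bns1, relus1, and post_mps as nn.ModuleDict objects keyed by node type: each node type gets its own module, applied to that type's embeddings.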

2.6 Building the `train()` and `test()` functions

def train(model, optimizer, hetero_graph, train_idx):
    model.train()
    optimizer.zero_grad()
    preds = model(hetero_graph.node_feature, hetero_graph.edge_index)
    loss = model.loss(preds, hetero_graph.node_label, train_idx)
    loss.backward()
    optimizer.step()
    return loss.item()

def test(model, graph, indices, best_model=None, best_val=0):
    model.eval()
    accs = []
    for index in indices:
        preds = model(graph.node_feature, graph.edge_index)
        num_node_types = 0
        micro = 0
        macro = 0
        for node_type in preds:
            idx = index[node_type]
            pred = preds[node_type][idx]
            pred = pred.max(1)[1]
            label_np = graph.node_label[node_type][idx].cpu().numpy()
            pred_np = pred.cpu().numpy()
            micro += f1_score(label_np, pred_np, average='micro')
            macro += f1_score(label_np, pred_np, average='macro')
            num_node_types += 1
            
        # Note: averaging F1 scores across node types is not really meaningful,
        # but this example has only one node type, so it makes no difference here
        micro /= num_node_types
        macro /= num_node_types
        accs.append((micro, macro))
        
    if accs[1][0] > best_val:
        best_val = accs[1][0]
        best_model = copy.deepcopy(model)
        # Note the deep copy here: a plain reference would keep changing as training continues.
        # I have been bitten by deep vs shallow copies before and plan to write a separate post about it
    return accs, best_model, best_val
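
For readers unfamiliar with the two averages passed to f1_score, the following dependency-free sketch computes micro and macro F1 for single-label multiclass predictions. It is written from the standard definitions rather than taken from scikit-learn, and `f1_micro_macro` and the toy labels are hypothetical.

```python
def f1_micro_macro(y_true, y_pred):
    """Return (micro F1, macro F1) for single-label multiclass predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    tp_sum = fp_sum = fn_sum = 0
    per_class_f1 = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    # Micro: pool the counts over all classes, then compute one F1.
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(per_class_f1) / len(per_class_f1)
    return micro, macro

# Hypothetical labels for 6 nodes and 3 classes.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
toy_micro, toy_macro = f1_micro_macro(toy_true, toy_pred)
```

In this single-label setting micro F1 equals plain accuracy (4 of 6 correct in the toy data), while macro F1 weights every class equally, which is why the two numbers in the training logs are close but not identical.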

2.7 Setting the hyperparameters

args = {
    'device': torch.device('cuda' if torch.cuda.is_available() else 'cpu'),
    'hidden_size': 64,
    'epochs': 100,
    'weight_decay': 1e-5,
    'lr': 0.003,
    'attn_size': 32,
}

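2.8 Loading and preprocessing the dataset

In this part we use the Tensor backend instead of the NetworkX backend.
The ACM(3025) dataset used in this colab comes from HAN (Wang et al. (2019)); the version used here was extracted from DGL's ACM.mat.
The original ACM dataset has 3 node types and 2 edge (relation) types. For simplicity, we reduce it to 1 node type and 2 edge types.

So our dataset has only one node type (paper) and 2 message types: (paper, author, paper) and (paper, subject, paper).
For how to download the dataset, see the 2021/6/21 update in the README of my GitHub project: PolarisRisingWar/cs224w-2021-winter-colab: cs224w（图机器学习）2021冬季课程的colab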

print("Device: {}".format(args['device']))

# Load the data
data = torch.load("acm.pkl")
# data is a dict with str keys and Tensor values

# Message types
message_type_1 = ("paper", "author", "paper")
message_type_2 = ("paper", "subject", "paper")

# Dictionary of edge indices
edge_index = {}
edge_index[message_type_1] = data['pap']
edge_index[message_type_2] = data['psp']

# Dictionary of node features
node_feature = {}
node_feature["paper"] = data['feature']

# Dictionary of node labels
node_label = {}
node_label["paper"] = data['label']

# Load the train, validation and test indices
train_idx = {"paper": data['train_idx'].to(args['device'])}
val_idx = {"paper": data['val_idx'].to(args['device'])}
test_idx = {"paper": data['test_idx'].to(args['device'])}

# Construct a deepsnap tensor backend HeteroGraph
hetero_graph = HeteroGraph(
    node_feature=node_feature,
    node_label=node_label,
    edge_index=edge_index,
    directed=True
)

print(f"ACM heterogeneous graph: {hetero_graph.num_nodes()} nodes, {hetero_graph.num_edges()} edges")

# Node feature and node label to device
for key in hetero_graph.node_feature:
    hetero_graph.node_feature[key] = hetero_graph.node_feature[key].to(args['device'])
for key in hetero_graph.node_label:
    hetero_graph.node_label[key] = hetero_graph.node_label[key].to(args['device'])

# Edge_index to sparse tensor and to device
for key in hetero_graph.edge_index:
    edge_index = hetero_graph.edge_index[key]
    adj = SparseTensor(row=edge_index[0], col=edge_index[1], sparse_sizes=(hetero_graph.num_nodes('paper'), hetero_graph.num_nodes('paper')))
    hetero_graph.edge_index[key] = adj.t().to(args['device'])
print(hetero_graph.edge_index[message_type_1])
print(hetero_graph.edge_index[message_type_2])

Output:

Device: cuda
ACM heterogeneous graph: {'paper': 3025} nodes, {('paper', 'author', 'paper'): 26256, ('paper', 'subject', 'paper'): 2207736} edges
SparseTensor(row=tensor([   0,    0,    0,  ..., 3024, 3024, 3024], device='cuda:0'),
             col=tensor([   8,   20,   51,  ..., 2948, 2983, 2991], device='cuda:0'),
             size=(3025, 3025), nnz=26256, density=0.29%)
SparseTensor(row=tensor([   0,    0,    0,  ..., 3024, 3024, 3024], device='cuda:0'),
             col=tensor([  75,  434,  534,  ..., 3020, 3021, 3022], device='cuda:0'),
             size=(3025, 3025), nnz=2207736, density=24.13%)
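
The preprocessing loop builds an adjacency with sources as rows and destinations as columns and then transposes it, so that each row lists the sources that send messages to one destination. The sketch below shows the same bookkeeping with plain lists; `to_transposed_adjacency` and the toy edge list are hypothetical and do not use torch_sparse.

```python
def to_transposed_adjacency(edge_index, num_nodes):
    """edge_index = [sources, destinations]. Returns one row per destination, listing its sources."""
    rows = [[] for _ in range(num_nodes)]
    for src, dst in zip(edge_index[0], edge_index[1]):
        rows[dst].append(src)
    return [sorted(r) for r in rows]

# Hypothetical directed edges: 0->1, 0->2, 1->2, 2->0.
toy_edge_index = [[0, 0, 1, 2], [1, 2, 2, 0]]
toy_adj_t = to_transposed_adjacency(toy_edge_index, 3)
```

With rows indexed by destination, multiplying this adjacency by the source feature matrix aggregates the features of each node's incoming neighbours, which is what message_and_aggregate relies on.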

2.9 Training the Mean Aggregation

best_model = None
best_val = 0

model = HeteroGNN(hetero_graph, args, aggr="mean").to(args['device'])
optimizer = torch.optim.Adam(model.parameters(), lr=args['lr'], weight_decay=args['weight_decay'])

for epoch in range(args['epochs']):
    loss = train(model, optimizer, hetero_graph, train_idx)
    accs, best_model, best_val = test(model, hetero_graph, [train_idx, val_idx, test_idx], best_model, best_val)
    print(
        f"Epoch {epoch + 1}: loss {round(loss, 5)}, "
        f"train micro {round(accs[0][0] * 100, 2)}%, train macro {round(accs[0][1] * 100, 2)}%, "
        f"valid micro {round(accs[1][0] * 100, 2)}%, valid macro {round(accs[1][1] * 100, 2)}%, "
        f"test micro {round(accs[2][0] * 100, 2)}%, test macro {round(accs[2][1] * 100, 2)}%"
    )
best_accs, _, _ = test(best_model, hetero_graph, [train_idx, val_idx, test_idx])
print(
    f"Best model: "
    f"train micro {round(best_accs[0][0] * 100, 2)}%, train macro {round(best_accs[0][1] * 100, 2)}%, "
    f"valid micro {round(best_accs[1][0] * 100, 2)}%, valid macro {round(best_accs[1][1] * 100, 2)}%, "
    f"test micro {round(best_accs[2][0] * 100, 2)}%, test macro {round(best_accs[2][1] * 100, 2)}%"
)

Per-epoch output omitted. Output for the best model:

Best model: train micro 99.83%, train macro 99.83%, valid micro 98.33%, valid macro 98.33%, test micro 87.86%, test macro 87.78%

2.10 Training the Attention Aggregation

best_model = None
best_val = 0

output_size = hetero_graph.num_node_labels('paper')
model = HeteroGNN(hetero_graph, args, aggr="attn").to(args['device'])
optimizer = torch.optim.Adam(model.parameters(), lr=args['lr'], weight_decay=args['weight_decay'])

for epoch in range(args['epochs']):
    loss = train(model, optimizer, hetero_graph, train_idx)
    accs, best_model, best_val = test(model, hetero_graph, [train_idx, val_idx, test_idx], best_model, best_val)
    print(
        f"Epoch {epoch + 1}: loss {round(loss, 5)}, "
        f"train micro {round(accs[0][0] * 100, 2)}%, train macro {round(accs[0][1] * 100, 2)}%, "
        f"valid micro {round(accs[1][0] * 100, 2)}%, valid macro {round(accs[1][1] * 100, 2)}%, "
        f"test micro {round(accs[2][0] * 100, 2)}%, test macro {round(accs[2][1] * 100, 2)}%"
    )
best_accs, _, _ = test(best_model, hetero_graph, [train_idx, val_idx, test_idx])
print(
    f"Best model: "
    f"train micro {round(best_accs[0][0] * 100, 2)}%, train macro {round(best_accs[0][1] * 100, 2)}%, "
    f"valid micro {round(best_accs[1][0] * 100, 2)}%, valid macro {round(best_accs[1][1] * 100, 2)}%, "
    f"test micro {round(best_accs[2][0] * 100, 2)}%, test macro {round(best_accs[2][1] * 100, 2)}%"
)

Per-epoch output omitted. Output for the best model:

Best model: train micro 99.67%, train macro 99.67%, valid micro 97.67%, valid macro 97.66%, test micro 85.79%, test macro 85.27%

2.11 Attention for each Message Type

if model.convs1.alpha is not None and model.convs2.alpha is not None:
    for idx, message_type in model.convs1.mapping.items():
        print(f"Layer 1 has attention {model.convs1.alpha[idx]} on message type {message_type}")
    for idx, message_type in model.convs2.mapping.items():
        print(f"Layer 2 has attention {model.convs2.alpha[idx]} on message type {message_type}")

Output:

Layer 1 has attention 0.960588812828064 on message type ('paper', 'author', 'paper')
Layer 1 has attention 0.03941113129258156 on message type ('paper', 'subject', 'paper')
Layer 2 has attention 0.30975428223609924 on message type ('paper', 'author', 'paper')
Layer 2 has attention 0.6902456879615784 on message type ('paper', 'subject', 'paper')

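3. Other references not mentioned in the text or footnotes

CS224W_Winter2021/CS224W_Colab_4.ipynb at main · hdvvip/CS224W_Winter2021: another person's solution to Colab 4. On my first pass I borrowed some of its code for the first half. Later I found DeepSNAP's official heterogeneous graph node classification code and copied from that instead.

¹ Incidentally, while writing these notes I noticed that PyG has been updated again and now supports heterogeneous graphs, which is impressive.
² I did not study this in detail. A quick search shows that it belongs to a dedicated field called network layout algorithms. If you are interested, see: 网络布局算法之【FR算法(Fruchterman-Reingold)】_漫游学海之旅-CSDN博客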

