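PGL Graph Learning

First impressions of graph learning

  • What is a graph
  • What is graph learning
  • Applications of graph learning
  • How graph learning is done
  • Introduction to the graph learning library
  • Using PGL
    • Environment setup
    • Creating a graph with PGL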

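What is a graph

  • A graph has two basic elements: nodes and edges
  • A graph is a unified language for describing complex things
  • Common graphs: social networks, recommender systems, chemical molecular structures, and so on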
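
To make "nodes and edges" concrete, here is a minimal sketch in plain Python. The names (`nodes`, `edges`, `build_adjacency`) and the tiny friendship graph are made up for illustration and are not part of PGL.

```python
from collections import defaultdict

# A tiny hypothetical friendship graph: nodes are people, edges are friendships.
nodes = ["alice", "bob", "carol", "dave"]
edges = [("alice", "bob"), ("alice", "carol"), ("carol", "dave")]

def build_adjacency(edges):
    """Build an undirected adjacency list from a list of (u, v) pairs."""
    adjacency = defaultdict(list)
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    return dict(adjacency)

adjacency = build_adjacency(edges)
print(adjacency["alice"])  # neighbors of alice: ['bob', 'carol']
```

An edge list is compact and easy to store, while an adjacency list makes "who are the neighbors of this node" a direct lookup.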

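What is graph learning

  • Graph learning: a subfield of deep learning in which the data being processed are graphs.
  • Difference from general deep learning: it handles irregular data (trees, graphs) conveniently, and it can also handle regular data (such as images).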

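Applications of graph learning

Three broad categories of applications:

  • Node-level tasks: financial fraud detection (a typical node classification task), 3D point-cloud object detection in autonomous driving
  • Edge-level tasks: recommender systems (a typical edge prediction task)
  • Graph-level tasks: odor recognition (a typical graph classification task), discovering the "universe"
    [Figure 1]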

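How graph learning is done

  • Graph walk algorithms: walk over the graph to obtain multiple node sequences, then use the skip-gram model to learn node representations
  • Graph neural networks: end-to-end models built on a message-passing mechanism
  • Knowledge graph embedding algorithms: algorithms designed specifically for knowledge graphs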
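
The first family can be sketched in a few lines. The example below generates node sequences with a uniform random walk, roughly in the spirit of DeepWalk; it is an illustration under simple assumptions (undirected graph, fixed walk length), not PGL's implementation, and the helper names are hypothetical. The skip-gram step that would turn these sequences into embeddings is not shown.

```python
import random
from collections import defaultdict

def build_neighbors(edges):
    """Undirected neighbor lists from (u, v) pairs."""
    neighbors = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return neighbors

def random_walk(neighbors, start, walk_length, rng):
    """Walk from `start`, picking a uniformly random neighbor at each step."""
    walk = [start]
    while len(walk) < walk_length:
        candidates = neighbors[walk[-1]]
        if not candidates:  # dead end: stop early
            break
        walk.append(rng.choice(candidates))
    return walk

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
neighbors = build_neighbors(edges)
rng = random.Random(0)  # fixed seed so the walks are reproducible

# One walk of length 5 starting from every node
walks = [random_walk(neighbors, node, 5, rng) for node in sorted(neighbors)]
for walk in walks:
    print(walk)
```

Each walk plays the role of a "sentence" whose "words" are node IDs, which is why a skip-gram model can be applied to the result.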

Introduction to the graph learning library

  • GitHub link: https://github.com/PaddlePaddle/PGL
  • API documentation: https://pgl.readthedocs.io/en/latest/

Using PGL

Environment setup

# Install the PGL library
!pip install pgl
Installing collected packages: redis, redis-py-cluster, pgl
Successfully installed pgl-1.2.1 redis-3.5.3 redis-py-cluster-2.1.0

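Creating a graph with PGL

Suppose we have the graph below, which contains 10 nodes and 14 edges.
[Figure 2: the example graph, with nodes colored green or yellow]
Our goal is to train a graph model that can distinguish the green nodes from the yellow nodes on this graph. We can use the following code to build the graph.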

import pgl
from pgl import graph  # import the graph module from PGL
import paddle.fluid as fluid  # import the PaddlePaddle framework
import numpy as np

def build_graph():
    # Number of nodes in the graph; each node is identified by an integer
    num_nodes = 10

    # Edge set of the graph
    edge_list = [(2, 0), (2, 1), (3, 1), (4, 0), (5, 0), (6, 0), (6, 4), (6, 5), (7, 0), (7, 1),
                 (7, 2), (7, 3), (8, 0), (9, 7)]

    # Randomly initialize node features with dimension d
    d = 16
    feature = np.random.randn(num_nodes, d).astype("float32")

    # Randomly assign a weight to each edge
    edge_feature = np.random.randn(len(edge_list), 1).astype("float32")

    # Create the graph object; it takes up to four arguments
    g = graph.Graph(num_nodes=num_nodes,
                    edges=edge_list,
                    node_feat={'feature': feature},
                    edge_feat={'edge_feat': edge_feature})

    return g

# Build the graph
g = build_graph()
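
Before handing an edge list to a graph library, it can help to sanity-check it with plain NumPy. The sketch below reuses the same 14 edges and counts in-degrees and out-degrees, assuming each pair is read as (source, target); it does not call PGL at all.

```python
import numpy as np

num_nodes = 10
edge_list = [(2, 0), (2, 1), (3, 1), (4, 0), (5, 0), (6, 0), (6, 4), (6, 5),
             (7, 0), (7, 1), (7, 2), (7, 3), (8, 0), (9, 7)]

edges = np.array(edge_list, dtype="int64")
src, dst = edges[:, 0], edges[:, 1]  # assumption: pairs are (source, target)

# Count how many edges leave and enter each node
out_degree = np.bincount(src, minlength=num_nodes)
in_degree = np.bincount(dst, minlength=num_nodes)

print("number of edges:", len(edge_list))      # 14
print("in-degree: ", in_degree.tolist())       # [6, 3, 1, 1, 1, 1, 0, 1, 0, 0]
print("out-degree:", out_degree.tolist())      # [0, 0, 2, 1, 1, 1, 3, 4, 1, 1]
```

Node 0 receives six of the fourteen edges, so under sum aggregation it collects far more messages than nodes 6, 8, and 9, which receive none.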

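Defining the graph model

# Define a simple model layer that passes both node features and edge weights
def model_layer(gw, nfeat, efeat, hidden_size, name, activation):
    '''
    gw: GraphWrapper, a graph data container used while defining the model; real data is fed in later during training
    nfeat: node features
    efeat: edge weights
    hidden_size: hidden dimension of the layer
    name: name of the fully connected layer
    activation: activation function to use
    '''
    # Define the send function
    def send_func(src_feat, dst_feat, edge_feat):
        # Send the source node's features, scaled by the edge weight, as the message
        return src_feat['h'] * edge_feat['e']

    # Define the recv function
    def recv_func(feat):
        # The target node receives messages from its source nodes and aggregates them by summation
        return fluid.layers.sequence_pool(feat, pool_type='sum')

    # Trigger the message-passing mechanism
    msg = gw.send(send_func, nfeat_list=[('h', nfeat)], efeat_list=[('e', efeat)])
    output = gw.recv(msg, recv_func)
    output = fluid.layers.fc(output, size=hidden_size, bias_attr=False, act=activation, name=name)
    return output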
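
To see what the send/recv pair computes, the same arithmetic can be written in plain NumPy: every edge carries "source features times edge weight", and every target node sums the messages it receives. This is only a sketch of the idea, not PGL's implementation; `message_passing_sum` is a hypothetical helper, and the fully connected layer that follows in `model_layer` is left out.

```python
import numpy as np

def message_passing_sum(num_nodes, edges, node_feat, edge_weight):
    """One round of 'send feature * weight along each edge, sum at the target'."""
    src = edges[:, 0]
    dst = edges[:, 1]
    # send: one message per edge, shape (num_edges, d)
    messages = node_feat[src] * edge_weight
    # recv: sum the incoming messages of every target node
    out = np.zeros((num_nodes, node_feat.shape[1]), dtype=node_feat.dtype)
    np.add.at(out, dst, messages)  # unbuffered add, handles repeated targets
    return out

edges = np.array([(0, 2), (1, 2), (2, 0)], dtype="int64")
node_feat = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], dtype="float32")
edge_weight = np.array([[1.0], [0.5], [2.0]], dtype="float32")

out = message_passing_sum(3, edges, node_feat, edge_weight)
print(out)
# node 0 receives 2.0 * [5, 6]               -> [10, 12]
# node 1 receives nothing                    -> [0, 0]
# node 2 receives 1.0 * [1, 2] + 0.5 * [3, 4] -> [2.5, 4]
```

`np.add.at` is used instead of `out[dst] += messages` because the latter silently drops contributions when the same target index appears more than once.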

Model definition

class Model(object):
    def __init__(self, graph):
        """
        graph: the graph we created earlier
        """
        # Create the GraphWrapper, a graph data container used while defining the model; real data is fed in later during training
        self.gw = pgl.graph_wrapper.GraphWrapper(name='graph',
                    node_feat=graph.node_feat_info(),
                    edge_feat=graph.edge_feat_info())
        # Plays the same role as GraphWrapper; here it is the container for node labels
        self.node_label = fluid.layers.data("node_label", shape=[None, 1],
                    dtype="float32", append_batch_size=False)

    def build_model(self):
        # Stack two model_layer layers; the edge feature key must match the one used when building the graph
        output = model_layer(self.gw,
                             self.gw.node_feat['feature'],
                             self.gw.edge_feat['edge_feat'],
                             hidden_size=8,
                             name='layer_1',
                             activation='relu')
        output = model_layer(self.gw,
                             output,
                             self.gw.edge_feat['edge_feat'],
                             hidden_size=1,
                             name='layer_2',
                             activation=None)

        # For a binary classification task, the following API computes the loss
        loss = fluid.layers.sigmoid_cross_entropy_with_logits(x=output,
                                                              label=self.node_label)
        # Compute the mean loss
        loss = fluid.layers.mean(loss)

        # Compute the accuracy
        prob = fluid.layers.sigmoid(output)
        pred = fluid.layers.cast(prob > 0.5, dtype="float32")
        correct = fluid.layers.equal(pred, self.node_label)
        correct = fluid.layers.cast(correct, dtype="float32")
        acc = fluid.layers.reduce_mean(correct)

        return loss, acc
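
The loss and accuracy above can be mirrored in NumPy to make the arithmetic visible. The sketch below uses the standard numerically stable form of sigmoid cross-entropy on logits and a 0.5 threshold for accuracy. The logits and labels are made-up values, and the helper names are illustrative rather than Paddle APIs.

```python
import numpy as np

def sigmoid_cross_entropy_with_logits(logits, labels):
    """Stable form of -[z * log(s(x)) + (1 - z) * log(1 - s(x))], elementwise."""
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

def binary_accuracy(logits, labels, threshold=0.5):
    """Fraction of nodes whose thresholded probability matches the label."""
    prob = 1.0 / (1.0 + np.exp(-logits))
    pred = (prob > threshold).astype("float32")
    return float(np.mean(pred == labels))

# Hypothetical logits for four nodes, with their 0/1 labels
logits = np.array([[2.0], [-1.0], [0.5], [-3.0]], dtype="float32")
labels = np.array([[1.0], [0.0], [0.0], [1.0]], dtype="float32")

loss = float(np.mean(sigmoid_cross_entropy_with_logits(logits, labels)))
acc = binary_accuracy(logits, labels)
print("loss:", loss)
print("acc:", acc)  # two of the four predictions are correct: 0.5
```

Working on logits rather than probabilities avoids computing `log(0)` when the sigmoid saturates, which is the usual reason frameworks expose the loss in this form.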

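Preparing for training

# Whether to run on GPU (True) or CPU (False)
use_cuda = False
place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()

# Define the programs
startup_program = fluid.Program()  # initializes the model parameters
train_program = fluid.Program()    # main training program: forward pass and backward gradient computation
test_program = fluid.Program()     # test program: forward pass only

with fluid.program_guard(train_program, startup_program):
    model = Model(g)
    # Build the model and compute the loss
    loss, acc = model.build_model()
    # Use the Adam optimizer with a learning rate of 0.01
    adam = fluid.optimizer.Adam(learning_rate=0.01)
    adam.minimize(loss)  # compute gradients and run backpropagation

# Build test_program by cloning train_program; unlike train_program, it needs no gradient computation or backward pass.
test_program = train_program.clone(for_test=True)

# Define an Executor on place (CPU here) to run the programs
exe = fluid.Executor(place)
# Initialize the parameters
exe.run(startup_program)

# Get the real graph data
feed_dict = model.gw.to_feed(g)
# Get the real label data
# Since this is a node classification task, we can simply use 0 and 1 for the node classes: yellow nodes are labeled 0 and green nodes are labeled 1.
y = [0, 1, 1, 1, 0, 0, 0, 1, 0, 1]
label = np.array(y, dtype="float32")
label = np.expand_dims(label, -1)  # shape (10,) -> (10, 1), matching node_label
feed_dict['node_label'] = label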
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/pgl/graph_wrapper.py:151: UserWarning: The edge features in argument `efeat_list` should be fetched from a instance of `pgl.graph_wrapper.GraphWrapper`, because we have sorted the edges and the order of edges is changed.
Therefore, if you use external edge features, the order of features of each edge may not match its edge, which can cause serious errors.
If you use the `efeat_list` correctly, please ignore this warning.
  "The edge features in argument `efeat_list` should be fetched "

Training

for epoch in range(30):
    train_loss = exe.run(train_program, feed=feed_dict, fetch_list=[loss], return_numpy=True)[0]
    print('Epoch %d | Loss: %f' % (epoch, train_loss))
Epoch 0 | Loss: 0.906734
Epoch 1 | Loss: 0.839262
Epoch 2 | Loss: 0.777020
Epoch 3 | Loss: 0.722640
Epoch 4 | Loss: 0.678117
Epoch 5 | Loss: 0.642708
Epoch 6 | Loss: 0.621016
Epoch 7 | Loss: 0.607005
Epoch 8 | Loss: 0.597986
Epoch 9 | Loss: 0.592153
Epoch 10 | Loss: 0.588311
Epoch 11 | Loss: 0.585515
Epoch 12 | Loss: 0.583296
Epoch 13 | Loss: 0.581387
Epoch 14 | Loss: 0.579625
Epoch 15 | Loss: 0.577935
Epoch 16 | Loss: 0.576279
Epoch 17 | Loss: 0.574633
Epoch 18 | Loss: 0.572992
Epoch 19 | Loss: 0.571337
Epoch 20 | Loss: 0.569672
Epoch 21 | Loss: 0.568003
Epoch 22 | Loss: 0.566335
Epoch 23 | Loss: 0.564675
Epoch 24 | Loss: 0.563316
Epoch 25 | Loss: 0.562035
Epoch 26 | Loss: 0.560765
Epoch 27 | Loss: 0.559507
Epoch 28 | Loss: 0.558265
Epoch 29 | Loss: 0.557040

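Testing the model

# Note: this reuses feed_dict, so the accuracy is measured on the same graph and labels used for training
test_acc = exe.run(test_program, feed=feed_dict, fetch_list=[acc], return_numpy=True)[0]
print("Test Acc: %f" % test_acc)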
Test Acc: 0.700000
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py:613: UserWarning: The variable graph/edges_dst is not found in program. It is not declared or is pruned.
  % name)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py:613: UserWarning: The variable graph/indegree is not found in program. It is not declared or is pruned.
  % name)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py:613: UserWarning: The variable graph/graph_lod is not found in program. It is not declared or is pruned.
  % name)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py:613: UserWarning: The variable graph/num_graph is not found in program. It is not declared or is pruned.
  % name)
