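Main reference: https://docs.dgl.ai/guide/graph-graphs-nodes-edges.html
Earlier we covered several ways to create a graph with dgl.DGLGraph(). This section shows that dgl.graph() can also create a DGLGraph.
As with dgl.DGLGraph((u, v)), in dgl.graph((u, v)) the arguments u and v are the lists of source (head) nodes and destination (tail) nodes. The elements at the same position in the two lists define one edge.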
import dgl
import torch as th
# edges 0->1, 0->2, 0->3, 1->3
u, v = th.tensor([0, 0, 0, 1]), th.tensor([1, 2, 3, 3])
g = dgl.graph((u, v))
print(g) # the number of nodes is inferred from the max node ID in the given edges
Print the node IDs:
# Node IDs
print(g.nodes())
Print the edges. As at creation time, they are returned as a source-node list and a destination-node list.
# Edge end nodes
print(g.edges())
Print the full edge information: in addition to the source-node and destination-node lists, this also returns the edge IDs.
# Edge end nodes and edge IDs
print(g.edges(form='all'))
If the node with the largest ID is isolated, the number of nodes must be set explicitly when creating the graph:
# If the node with the largest ID is isolated (meaning no edges),
# then one needs to explicitly set the number of nodes
g = dgl.graph((u, v), num_nodes=8)
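The node-count rule described above can be modeled in plain Python. This is only a sketch of the behavior, not DGL's implementation; the helper name `infer_num_nodes` is hypothetical and the edge lists are ordinary Python lists rather than tensors.

```python
def infer_num_nodes(u, v, num_nodes=None):
    """Return the node count for an edge list given as two parallel ID lists."""
    if len(u) != len(v):
        raise ValueError("u and v must have the same length")
    # Without an explicit count, the largest ID seen in any edge decides it.
    inferred = max(u + v) + 1 if u else 0
    if num_nodes is None:
        return inferred
    if num_nodes < inferred:
        raise ValueError("num_nodes is smaller than the largest node ID + 1")
    return num_nodes


u, v = [0, 0, 0, 1], [1, 2, 3, 3]
print(infer_num_nodes(u, v))               # 4
print(infer_num_nodes(u, v, num_nodes=8))  # 8: nodes 4..7 have no edges
```

Because isolated nodes never appear in u or v, they can only be accounted for through the explicit count.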
A directed graph can also be converted into a bidirected one:
bg = dgl.to_bidirected(g)
bg.edges()
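Conceptually, making a graph bidirected means adding the reverse of every edge and dropping parallel duplicates. The following plain-Python sketch illustrates that idea on the edge list used above. The helper name `to_bidirected_edges` is hypothetical, and the sorted edge order is an assumption made only to keep the result deterministic; DGL may return the edges in a different order.

```python
def to_bidirected_edges(u, v):
    """Add the reverse of every edge and drop parallel duplicates."""
    pairs = set(zip(u, v)) | set(zip(v, u))
    ordered = sorted(pairs)  # sorted only to make the result deterministic
    return [s for s, _ in ordered], [d for _, d in ordered]


u, v = [0, 0, 0, 1], [1, 2, 3, 3]
bu, bv = to_bidirected_edges(u, v)
print(len(bu))            # 8: each of the 4 edges now has a reverse
print(list(zip(bu, bv)))
```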
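DGL can store node and edge IDs as either 32-bit or 64-bit integers. If a graph has fewer than $2^{31}-1$ nodes and edges, 32-bit integers can be used to save memory.
DGL uses 64-bit integers by default. The ID type can be set manually at creation time, or changed afterwards with long() and int().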
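edges = th.tensor([2, 5, 3]), th.tensor([3, 5, 0])  # edges 2->3, 5->5, 3->0
g64 = dgl.graph(edges)  # DGL uses int64 by default
print(g64.idtype)
g32 = dgl.graph(edges, idtype=th.int32)  # create an int32 graph
print(g32.idtype)
g64_2 = g32.long()  # convert to int64
print(g64_2.idtype)
g32_2 = g64.int()  # convert to int32
print(g32_2.idtype)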
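The size rule for choosing an ID type can be written as a small check. This is a plain-Python sketch of the rule stated above, not a DGL function; `smallest_idtype` is a hypothetical helper that returns a type name as a string.

```python
INT32_MAX = 2**31 - 1


def smallest_idtype(num_nodes, num_edges):
    """Pick 'int32' when both counts stay below 2**31 - 1, else 'int64'."""
    if num_nodes < INT32_MAX and num_edges < INT32_MAX:
        return "int32"
    return "int64"


print(smallest_idtype(6, 3))        # int32
print(smallest_idtype(2**31, 10))   # int64
```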
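1. Use g.ndata['x'] to add or access node features.
2. Use g.edata['x'] to add or access edge features.
3. Only numerical data can be used as features. A feature can be a scalar, a vector, or a multi-dimensional tensor.
4. A node feature may have the same name as an edge feature.
5. Features stored under the same name must have the same dimensionality and data type.
6. For a weighted graph, the weights can be stored as an edge feature.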
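To make the last point concrete, here is a plain-Python model of storing edge weights as an edge feature. It is a sketch under simplifying assumptions, not DGL code: `edata` is an ordinary dict, features are scalar only, and `set_edge_feature` is a hypothetical helper that enforces one numeric value per edge.

```python
def set_edge_feature(edata, name, values, num_edges):
    """Store one numeric value per edge under the given feature name."""
    if len(values) != num_edges:
        raise ValueError("a feature needs exactly one entry per edge")
    if not all(isinstance(x, (int, float)) for x in values):
        raise TypeError("only numerical values can be used as features")
    edata[name] = list(values)


u, v = [0, 0, 0, 1], [1, 2, 3, 3]
edata = {}
set_edge_feature(edata, "w", [0.1, 0.6, 0.9, 0.7], len(u))

# The edge ID is the position of the edge in the (u, v) lists.
eid = list(zip(u, v)).index((1, 3))
print(eid, edata["w"][eid])  # 3 0.7
```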
1. A graph can be created from a SciPy sparse matrix or a NetworkX graph.
import dgl
import torch as th
import scipy.sparse as sp
spmat = sp.rand(100, 100, density=0.05) # 5% nonzero entries
dgl.from_scipy(spmat) # from SciPy
import networkx as nx
nx_g = nx.path_graph(5) # a chain 0-1-2-3-4
dgl.from_networkx(nx_g) # from networkx
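Note that the graph converted from nx.path_graph(5) has 8 edges rather than 4. This is because nx.path_graph(5) builds an undirected graph while a DGLGraph is always directed, so each undirected edge is converted into two directed edges.
To avoid this, build a directed graph with networkx.DiGraph():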
nxg = nx.DiGraph([(2, 1), (1, 2), (2, 3), (0, 0)])
dgl.from_networkx(nxg)
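The edge doubling noted above for nx.path_graph(5) can be reproduced without either library. The sketch below is plain Python, and `undirected_to_directed` is a hypothetical helper; treating a self-loop as a single directed edge is an assumption of this sketch.

```python
def undirected_to_directed(undirected_edges):
    """Expand each undirected edge into two directed edges."""
    directed = []
    for a, b in undirected_edges:
        directed.append((a, b))
        if a != b:  # a self-loop has no distinct reverse edge
            directed.append((b, a))
    return directed


path_edges = [(i, i + 1) for i in range(4)]  # chain 0-1-2-3-4
directed = undirected_to_directed(path_edges)
print(len(path_edges), len(directed))  # 4 8
```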
2. A graph can be loaded from disk.
1) CSV format
2) JSON/GML format
3) DGL binary format
The two APIs dgl.save_graphs() and dgl.load_graphs() save and load graphs in the DGL binary format.
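For the CSV case, one common approach is to parse an edge list into the two parallel ID lists that dgl.graph() expects. The sketch below uses only the standard library. The column names `src` and `dst` and the helper `read_edge_csv` are assumptions made for illustration; a real file may use a different layout.

```python
import csv
import io


def read_edge_csv(text):
    """Parse CSV text with 'src' and 'dst' columns into two parallel ID lists."""
    u, v = [], []
    for row in csv.DictReader(io.StringIO(text)):
        u.append(int(row["src"]))
        v.append(int(row["dst"]))
    return u, v


csv_text = "src,dst\n0,1\n0,2\n0,3\n1,3\n"
u, v = read_edge_csv(csv_text)
print(u, v)  # [0, 0, 0, 1] [1, 2, 3, 3]
```

The resulting lists could then be wrapped in tensors and passed to dgl.graph((u, v)).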
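A heterogeneous graph can have nodes and edges of different types.
It is specified as a dictionary of 'relation: node tuple' entries.
A relation has the form (source node type, edge type, destination node type).
A node tuple has the form (U, V), where U and V are the source-node list and the destination-node list.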
import dgl
import torch as th
# Create a heterograph with 3 node types and 3 edge types.
graph_data = {
('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
('drug', 'interacts', 'gene'): (th.tensor([0, 1]), th.tensor([2, 3])),
('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))
}
g = dgl.heterograph(graph_data)
print(g.ntypes)
# ['disease', 'drug', 'gene']
print(g.etypes)
# ['interacts', 'interacts', 'treats']
print(g.canonical_etypes)
# [('drug', 'interacts', 'drug'),
# ('drug', 'interacts', 'gene'),
# ('drug', 'treats', 'disease')]
# A homogeneous graph
dgl.heterograph({
('node_type', 'edge_type', 'node_type'): (u, v)})
# A bipartite graph
dgl.heterograph({
('source_type', 'edge_type', 'destination_type'): (u, v)})
The metagraph contains only the node types and the edge types that connect them:
print(g)
# Graph(num_nodes={'disease': 3, 'drug': 3, 'gene': 4},
# num_edges={('drug', 'interacts', 'drug'): 2,
# ('drug', 'interacts', 'gene'): 2,
# ('drug', 'treats', 'disease'): 1},
# metagraph=[('drug', 'drug', 'interacts'),
# ('drug', 'gene', 'interacts'),
# ('drug', 'disease', 'treats')])
print(g.metagraph().edges())
# OutMultiEdgeDataView([('drug', 'drug'), ('drug', 'gene'), ('drug', 'disease')])
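The schema information printed above can be derived directly from the relation dictionary. The following plain-Python sketch shows how; it is not DGL's implementation. The helper `summarize` is hypothetical, the edges are plain lists instead of tensors, and sorting the node types by name is an assumption that happens to match the output shown above.

```python
graph_data = {
    ("drug", "interacts", "drug"): ([0, 1], [1, 2]),
    ("drug", "interacts", "gene"): ([0, 1], [2, 3]),
    ("drug", "treats", "disease"): ([1], [2]),
}


def summarize(graph_data):
    """Derive node types, edge types, per-type node counts and metagraph edges."""
    num_nodes = {}
    for (src_type, _, dst_type), (u, v) in graph_data.items():
        for ntype, ids in ((src_type, u), (dst_type, v)):
            # Node IDs are counted separately for each node type.
            num_nodes[ntype] = max([num_nodes.get(ntype, 0)] + [i + 1 for i in ids])
    ntypes = sorted(num_nodes)
    etypes = [etype for _, etype, _ in graph_data]
    meta_edges = [(s, d) for s, _, d in graph_data]
    return ntypes, etypes, {t: num_nodes[t] for t in ntypes}, meta_edges


ntypes, etypes, num_nodes, meta_edges = summarize(graph_data)
print(ntypes)      # ['disease', 'drug', 'gene']
print(etypes)      # ['interacts', 'interacts', 'treats']
print(num_nodes)   # {'disease': 3, 'drug': 3, 'gene': 4}
print(meta_edges)  # [('drug', 'drug'), ('drug', 'gene'), ('drug', 'disease')]
```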
1. When accessing nodes and edges, the node type and edge type must be specified.
2. To access node and edge features, use g.nodes['node_type'].data['feat_name'] and g.edges['edge_type'].data['feat_name'].
# Get the number of all nodes in the graph
print(g.num_nodes())
# 10
# Get the number of drug nodes
print(g.num_nodes('drug'))
# 3
# Nodes of different types have separate IDs,
# hence not well-defined without a type specified
print(g.nodes())
# DGLError: Node type name must be specified if there are more than one node types.
print(g.nodes('drug'))
# tensor([0, 1, 2])
3. If the graph has only one node type or one edge type, the type does not need to be specified.
g = dgl.heterograph({
    ('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
    ('drug', 'is similar', 'drug'): (th.tensor([0, 1]), th.tensor([2, 3]))
})
print(g.nodes())
# tensor([0, 1, 2, 3])
# To set/get feature with a single type, no need to use the new syntax
g.ndata['hv'] = th.ones(4, 1)
Extracting from a heterograph a subgraph that keeps only certain relations yields an edge type subgraph:
g = dgl.heterograph({
('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
('drug', 'interacts', 'gene'): (th.tensor([0, 1]), th.tensor([2, 3])),
('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))
})
g.nodes['drug'].data['hv'] = th.ones(3, 1)
# Retain relations ('drug', 'interacts', 'drug') and ('drug', 'treats', 'disease')
# All nodes for 'drug' and 'disease' will be retained
eg = dgl.edge_type_subgraph(g, [('drug', 'interacts', 'drug'),
('drug', 'treats', 'disease')])
print(eg)
# Graph(num_nodes={'disease': 3, 'drug': 3},
# num_edges={('drug', 'interacts', 'drug'): 2, ('drug', 'treats', 'disease'): 1},
# metagraph=[('drug', 'drug', 'interacts'), ('drug', 'disease', 'treats')])
# The associated features will be copied as well
print(eg.nodes['drug'].data['hv'])
# tensor([[1.],
# [1.],
# [1.]])
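The selection logic behind an edge type subgraph can be sketched in plain Python: keep the requested relations, and keep every node of each node type those relations touch. This is an illustration of the behavior shown above, not DGL's implementation; `edge_type_subgraph_sketch` is a hypothetical helper and the per-type node counts are passed in explicitly.

```python
def edge_type_subgraph_sketch(graph_data, num_nodes, relations):
    """Keep the given relations and all nodes of the node types they touch."""
    kept = {rel: graph_data[rel] for rel in relations}
    kept_types = {t for s, _, d in kept for t in (s, d)}
    kept_nodes = {t: n for t, n in num_nodes.items() if t in kept_types}
    return kept, kept_nodes


graph_data = {
    ("drug", "interacts", "drug"): ([0, 1], [1, 2]),
    ("drug", "interacts", "gene"): ([0, 1], [2, 3]),
    ("drug", "treats", "disease"): ([1], [2]),
}
num_nodes = {"disease": 3, "drug": 3, "gene": 4}

kept, kept_nodes = edge_type_subgraph_sketch(
    graph_data, num_nodes,
    [("drug", "interacts", "drug"), ("drug", "treats", "disease")],
)
print(kept_nodes)  # {'disease': 3, 'drug': 3}
print(list(kept))  # [('drug', 'interacts', 'drug'), ('drug', 'treats', 'disease')]
```

The 'gene' node type disappears because no retained relation refers to it, while all three 'drug' nodes stay even though node 0 has no 'treats' edge.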
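To convert a heterograph into a homogeneous graph, use dgl.to_homogeneous(). It renumbers the nodes and edges of all types consecutively from 0 and merges the features that are specified.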
g = dgl.heterograph({
('drug', 'interacts', 'drug'): (th.tensor([0, 1]), th.tensor([1, 2])),
('drug', 'treats', 'disease'): (th.tensor([1]), th.tensor([2]))})
g.nodes['drug'].data['hv'] = th.zeros(3, 1)
g.nodes['disease'].data['hv'] = th.ones(3, 1)
g.edges['interacts'].data['he'] = th.zeros(2, 1)
g.edges['treats'].data['he'] = th.zeros(1, 2)
# By default, it does not merge any features
hg = dgl.to_homogeneous(g)
print('hv' in hg.ndata)
# False
# Copy edge features
# For feature copy, it expects features to have
# the same size and dtype across node/edge types
hg = dgl.to_homogeneous(g, edata=['he'])
# DGLError: Cannot concatenate column ‘he’ with shape Scheme(shape=(2,), dtype=torch.float32) and shape Scheme(shape=(1,), dtype=torch.float32)
# Copy node features
hg = dgl.to_homogeneous(g, ndata=['hv'])
print(hg.ndata['hv'])
# tensor([[1.],
# [1.],
# [1.],
# [0.],
# [0.],
# [0.]])
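The renumbering can be pictured as giving each node type a contiguous block of new IDs. The sketch below is plain Python and not DGL's implementation; `homogeneous_ids` is a hypothetical helper. It assumes node types are laid out in sorted name order, which is consistent with the output above where the 'disease' features (ones) come before the 'drug' features (zeros). The order of the merged edges is likewise an assumption.

```python
def homogeneous_ids(num_nodes, graph_data):
    """Assign each node type an ID offset and rewrite edges with the new IDs."""
    offsets, total = {}, 0
    for ntype in sorted(num_nodes):  # assumed layout: node types in name order
        offsets[ntype] = total
        total += num_nodes[ntype]
    edges = []
    for (s, _, d), (u, v) in graph_data.items():
        edges += [(offsets[s] + a, offsets[d] + b) for a, b in zip(u, v)]
    return offsets, total, edges


num_nodes = {"drug": 3, "disease": 3}
graph_data = {
    ("drug", "interacts", "drug"): ([0, 1], [1, 2]),
    ("drug", "treats", "disease"): ([1], [2]),
}
offsets, total, edges = homogeneous_ids(num_nodes, graph_data)
print(offsets)  # {'disease': 0, 'drug': 3}
print(total)    # 6
print(edges)    # [(3, 4), (4, 5), (4, 2)]
```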
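There are two ways to create a DGLGraph on a GPU. One is to pass two GPU tensors when constructing the DGLGraph. The other is to construct the DGLGraph on the CPU first and then copy it to the GPU with to().
In addition, a DGLGraph on a GPU only accepts feature data that is also on the GPU.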
import dgl
import torch as th
u, v = th.tensor([0, 1, 2]), th.tensor([2, 3, 4])
g = dgl.graph((u, v))
g.ndata['x'] = th.randn(5, 3) # original feature is on CPU
print(g.device)
# device(type='cpu')
cuda_g = g.to('cuda:0') # accepts any device objects from backend framework
print(cuda_g.device)
# device(type='cuda', index=0)
print(cuda_g.ndata['x'].device) # feature data is copied to GPU too
# device(type='cuda', index=0)
# A graph constructed from GPU tensors is also on GPU
u, v = u.to('cuda:0'), v.to('cuda:0')
g = dgl.graph((u, v))
print(g.device)
# device(type='cuda', index=0)