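Sequence model: only considers the transition from the previous node to the current node | Graph model: considers more complex transitions, including the links between the current node and its outgoing and incoming neighbors |
Only considers the user's current interest | Uses an attention mechanism to consider both the user's current interest and overall interest |
The number of recurrent steps is determined by the sequence length | The number of propagation steps is a hyperparameter; SR-GNN defaults to 1 |
Negative sampling with a pairwise loss (BPR, TOP1) | Cross-entropy loss |
Mini-batching; the code is relatively complex | Data augmentation; simpler to implement (but not suitable for RNNs when sequences are long) |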
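import networkx as nx
import matplotlib.pyplot as plt

# Directed edges of the example session graph
edges = [(1, 2), (2, 3), (3, 2), (2, 4)]
# 1. Initialize a directed graph
G = nx.DiGraph()
# 2. Load the data from the edge list
G.add_edges_from(edges)
# 3. Print all nodes
print(G.nodes)
# 4. Print all edges
print(G.edges)
# 5. Draw the graph
nx.draw(G, with_labels=True)
# 6. Show the figure
plt.show()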
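As a rough sketch of how such a directed session graph becomes the normalized connection matrices that SR-GNN consumes, the snippet below builds the outgoing and incoming matrices for the same edge list with NumPy. The helper name `build_connection_matrices` is hypothetical, and the nodes are assumed to be labeled 1..n.

```python
import numpy as np

def build_connection_matrices(edges, n_nodes):
    """Return (A_out, A_in) for a directed graph with nodes labeled 1..n_nodes.

    A_out[u, v] is 1 / out-degree(u) if the edge u -> v exists, else 0.
    A_in[v, u]  is 1 / in-degree(v)  if the edge u -> v exists, else 0.
    """
    adj = np.zeros((n_nodes, n_nodes))
    for u, v in edges:
        adj[u - 1, v - 1] = 1.0  # node labels are 1-based

    out_deg = adj.sum(axis=1, keepdims=True)
    out_deg[out_deg == 0] = 1.0  # avoid division by zero for nodes without outgoing edges
    a_out = adj / out_deg

    in_deg = adj.T.sum(axis=1, keepdims=True)
    in_deg[in_deg == 0] = 1.0  # avoid division by zero for nodes without incoming edges
    a_in = adj.T / in_deg
    return a_out, a_in

edges = [(1, 2), (2, 3), (3, 2), (2, 4)]
A_out, A_in = build_connection_matrices(edges, n_nodes=4)
print(A_out)
print(A_in)
```

Node 2 has two outgoing edges (to 3 and 4) and two incoming edges (from 1 and 3), so its rows hold the value 1/2 in both matrices.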
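We then aggregate each node's neighbor information using formula (1). The aggregation is a matrix multiplication between the connection matrix $A_{s,i}$ and the embeddings of the items in the user's sequence, $\left[v_1^{t-1}, \ldots, v_n^{t-1}\right]^T \in R^{n \times d}$.
A : [batch,n,2n], the connection matrix of the graph
hidden : [batch,n,d], the embeddings of the items in the user's sequence
in matrix: A[:, :, :A.size(1)]
out matrix: A[:, :, A.size(1):2 * A.size(1)]
inputs : the a in formula (1)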
input_in = paddle.matmul(A[:, :, :A.shape[1]], self.linear_edge_in(hidden)) + self.b_iah
input_out = paddle.matmul(A[:, :, A.shape[1]:], self.linear_edge_out(hidden)) + self.b_ioh
# [batch_size, max_session_len, embedding_size * 2]
inputs = paddle.concat([input_in, input_out], 2)
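To make the shapes in this step concrete, here is a small NumPy sketch of the same aggregation. It is only an illustration: the random matrices `W_in` and `W_out` stand in for `linear_edge_in` and `linear_edge_out`, the biases are omitted, and the sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n, d = 2, 4, 8

# Connection matrix: the first n columns are the incoming part, the last n the outgoing part
A = rng.random((batch, n, 2 * n))
hidden = rng.random((batch, n, d))  # item embeddings v^{t-1}

# Stand-ins for the two linear layers (assumed, bias-free)
W_in = rng.random((d, d))
W_out = rng.random((d, d))

input_in = A[:, :, :n] @ (hidden @ W_in)    # [batch, n, d]
input_out = A[:, :, n:] @ (hidden @ W_out)  # [batch, n, d]
inputs = np.concatenate([input_in, input_out], axis=2)  # [batch, n, 2d]

print(inputs.shape)  # (2, 4, 16)
```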
After obtaining $a_{s,i}^t$ from formula (1), formulas (2) and (3) compute two intermediate variables, $z_{s,i}^t$ and $r_{s,i}^t$. By analogy with the gates of an LSTM, they can be read as the update gate and the reset gate of a GRU, respectively.

$$
z_{s,i}^t=\sigma\left(W_z a_{s,i}^t+U_z v_i^{t-1}\right) \in R^d \tag{2}
$$

$$
r_{s,i}^t=\sigma\left(W_r a_{s,i}^t+U_r v_i^{t-1}\right) \in R^d \tag{3}
$$

Note: $z_{s,i}^t$ and $r_{s,i}^t$ are computed in exactly the same way; the only difference is that they use different weight parameters. With the intermediate variables from formulas (2) and (3), formula (4) computes the candidate state that the update gate will blend in, and formula (5) gives the final result.

$$
\widetilde{v_i^t}=\tanh \left(W_o a_{s,i}^t+U_o\left(r_{s,i}^t \odot v_i^{t-1}\right)\right) \in R^d \tag{4}
$$

$$
v_i^t=\left(1-z_{s,i}^t\right) \odot v_i^{t-1}+z_{s,i}^t \odot \widetilde{v_i^t} \in R^d \tag{5}
$$
gi, gh : [batch_size, max_session_len, embedding_size * 3]
inputs : the a in formula (1)
hidden : the user's item sequence, i.e. v^{t-1}
# gi.size equals to gh.size, shape of [batch_size, max_session_len, embedding_size * 3]
gi = paddle.matmul(inputs, self.w_ih) + self.b_ih
gh = paddle.matmul(hidden, self.w_hh) + self.b_hh
# (batch_size, max_session_len, embedding_size)
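i_r, i_i, i_n = gi.chunk(3, 2)  # the three W * a terms
h_r, h_i, h_n = gh.chunk(3, 2)  # the three U * v terms
reset_gate = F.sigmoid(i_r + h_r)  # formula (3): reset gate r
input_gate = F.sigmoid(i_i + h_i)  # formula (2): update gate z
# formula (4); note that the reset gate is applied after the linear map U * v here
new_gate = paddle.tanh(i_n + reset_gate * h_n)
hy = (1 - input_gate) * hidden + input_gate * new_gate  # formula (5)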
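The gated update in formulas (2)-(5) can also be written out directly. The following NumPy sketch follows the formulas literally (the reset gate is applied before `U_o`), uses a row-vector convention, and omits biases; the parameter names in `params` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_update(a, v_prev, params):
    """One gated update following formulas (2)-(5).

    a:      [n, 2d] aggregated neighbor information from formula (1)
    v_prev: [n, d]  node embeddings v^{t-1}
    params: dict of weights (hypothetical names); W_* is [2d, d], U_* is [d, d]
    """
    z = sigmoid(a @ params["W_z"] + v_prev @ params["U_z"])              # formula (2)
    r = sigmoid(a @ params["W_r"] + v_prev @ params["U_r"])              # formula (3)
    v_tilde = np.tanh(a @ params["W_o"] + (r * v_prev) @ params["U_o"])  # formula (4)
    return (1.0 - z) * v_prev + z * v_tilde                              # formula (5)

rng = np.random.default_rng(0)
n, d = 4, 8
params = {name: rng.normal(size=(2 * d if name.startswith("W") else d, d))
          for name in ["W_z", "W_r", "W_o", "U_z", "U_r", "U_o"]}
a = rng.normal(size=(n, 2 * d))
v_prev = rng.normal(size=(n, d))
v_next = gated_update(a, v_prev, params)
print(v_next.shape)  # (4, 8)
```

Because $z$ lies in (0, 1), each output entry is a convex combination of the old state and the candidate state.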
After the GNN has produced the item embeddings, what remains is to fuse the embeddings of the items in a user's sequence into a single embedding for the whole sequence.

SR-GNN first uses an attention mechanism to compute, for each item in the sequence, an attention score with respect to the last item $v_n$ (the local embedding $s_1$), and then takes the weighted sum. The computation is:

$$
\begin{gathered}
a_i=\mathbf{q}^T \sigma\left(W_1 v_n+W_2 v_i+c\right) \in R^1 \\
s_g=\sum_{i=1}^n a_i v_i \in R^d
\end{gathered} \tag{6}
$$

After obtaining $s_g$, we combine it with the embedding of the last item in the sequence to get the final sequence embedding:

$$
s_h=W_3\left[s_1 ; s_g\right] \in R^d \tag{7}
$$
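seq_hidden : the embedding of each item in the sequence
ht : the embedding of the last item in the sequence, i.e. v_n (s_1) in formulas (6)-(7)
q1 : W_1 v_n in formula (6)
q2 : W_2 v_i in formula (6)
alpha : the attention weight a_i in formula (6)
a : s_g in formula (6)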
seq_hidden = paddle.take_along_axis(hidden,alias_inputs,1)
# fetch the last hidden state of last timestamp
item_seq_len = paddle.sum(mask,axis=1)
ht = self.gather_indexes(seq_hidden, item_seq_len - 1)
q1 = self.linear_one(ht).reshape([ht.shape[0], 1, ht.shape[1]])
q2 = self.linear_two(seq_hidden)
alpha = self.linear_three(F.sigmoid(q1 + q2))
a = paddle.sum(alpha * seq_hidden * mask.reshape([mask.shape[0], -1, 1]), 1)
user_emb = self.linear_transform(paddle.concat([a, ht], axis=1))
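For a single session, the attention readout can be sketched in NumPy as follows. This is a minimal illustration that follows the formulas rather than the Paddle code line by line: it concatenates in the order $[s_1; s_g]$, and all weight names are hypothetical stand-ins for the linear layers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_readout(seq_hidden, mask, W1, W2, c, q, W3):
    """Soft-attention readout for one session.

    seq_hidden: [L, d] item embeddings in click order (padded to length L)
    mask:       [L]    1.0 for real items, 0.0 for padding
    """
    last = int(mask.sum()) - 1
    v_n = seq_hidden[last]                                     # last clicked item
    scores = sigmoid(v_n @ W1 + seq_hidden @ W2 + c) @ q       # attention weights, shape [L]
    s_g = ((scores * mask)[:, None] * seq_hidden).sum(axis=0)  # global embedding
    return np.concatenate([v_n, s_g]) @ W3                     # final session embedding

rng = np.random.default_rng(0)
L, d = 5, 4
seq_hidden = rng.normal(size=(L, d))
mask = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
c, q = rng.normal(size=d), rng.normal(size=d)
W3 = rng.normal(size=(2 * d, d))

s_h = attention_readout(seq_hidden, mask, W1, W2, c, q, W3)
print(s_h.shape)  # (4,)
```

Multiplying the weights by the mask means padded positions contribute nothing to the global embedding, whatever values they hold.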
We use the cross-entropy loss:

$$
\mathcal{L}(\hat{\mathbf{y}})=-\sum_{i=1}^m \mathbf{y}_i \log \left(\hat{\mathbf{y}}_i\right)+\left(1-\mathbf{y}_i\right) \log \left(1-\hat{\mathbf{y}}_i\right)
$$
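As a hedged illustration of the training objective, the sketch below scores every candidate item by an inner product with the session embedding and applies a softmax cross-entropy. It mirrors what a softmax-based loss such as `nn.CrossEntropyLoss` computes, namely the negative log-probability of the target item; the function name and the toy sizes are assumptions.

```python
import numpy as np

def softmax_cross_entropy(scores, target):
    """Mean of -log softmax(scores)[target] over the batch.

    scores: [batch, n_items] inner products between session and item embeddings
    target: [batch]          index of the ground-truth next item
    """
    shifted = scores - scores.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(target)), target].mean()

rng = np.random.default_rng(0)
user_emb = rng.normal(size=(3, 8))   # session embeddings for 3 sessions
item_emb = rng.normal(size=(10, 8))  # embeddings of 10 candidate items
scores = user_emb @ item_emb.T       # one score per (session, item) pair
loss = softmax_cross_entropy(scores, np.array([1, 4, 7]))
print(loss)
```

With all scores equal, the loss reduces to the log of the number of items, which is a handy sanity check for an untrained model.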
class SeqnenceDataset(Dataset):
    def __init__(self, config, df, phase='train'):
        self.config = config
        self.df = df
        self.max_length = self.config['max_length']
        self.df = self.df.sort_values(by=['user_id', 'timestamp'])
        self.user2item = self.df.groupby('user_id')['item_id'].apply(list).to_dict()
        self.user_list = self.df['user_id'].unique()
        self.phase = phase

    def __len__(self, ):
        return len(self.user2item)

    def __getitem__(self, index):
        if self.phase == 'train':
            user_id = self.user_list[index]
            item_list = self.user2item[user_id]
            hist_item_list = []
            hist_mask_list = []

            k = random.choice(range(4, len(item_list)))  # pick a random index from [4, len(item_list))
            # k = np.random.randint(2,len(item_list))
            item_id = item_list[k]  # the item at that index is the prediction target

            if k >= self.max_length:  # keep the most recent max_length items
                hist_item_list.append(item_list[k - self.max_length: k])
                hist_mask_list.append([1.0] * self.max_length)
            else:  # otherwise right-pad with 0 and mask out the padding
                hist_item_list.append(item_list[:k] + [0] * (self.max_length - k))
                hist_mask_list.append([1.0] * k + [0.0] * (self.max_length - k))

            return paddle.to_tensor(hist_item_list).squeeze(0), paddle.to_tensor(hist_mask_list).squeeze(
                0), paddle.to_tensor([item_id])
        else:
            user_id = self.user_list[index]
            item_list = self.user2item[user_id]
            hist_item_list = []
            hist_mask_list = []

            k = int(0.8 * len(item_list))  # the first 80% is history, the rest is the test target
            # k = len(item_list)-1

            if k >= self.max_length:  # keep the most recent max_length items
                hist_item_list.append(item_list[k - self.max_length: k])
                hist_mask_list.append([1.0] * self.max_length)
            else:  # otherwise right-pad with 0 and mask out the padding
                hist_item_list.append(item_list[:k] + [0] * (self.max_length - k))
                hist_mask_list.append([1.0] * k + [0.0] * (self.max_length - k))

            return paddle.to_tensor(hist_item_list).squeeze(0), paddle.to_tensor(hist_mask_list).squeeze(
                0), item_list[k:]

    def get_test_gd(self):
        # Ground truth for evaluation: the last 20% of each user's items
        self.test_gd = {}
        for user in self.user2item:
            item_list = self.user2item[user]
            test_item_index = int(0.8 * len(item_list))
            self.test_gd[user] = item_list[test_item_index:]
        return self.test_gd
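The truncate-or-pad logic used in both branches of `__getitem__` can be isolated into a small pure-Python function. The sketch below is only for illustration; the helper name `build_history` is hypothetical and not part of the dataset class.

```python
def build_history(item_list, k, max_length):
    """Return (history, mask) for predicting item_list[k] from the items before it.

    Keeps the most recent `max_length` items, or right-pads with 0 when fewer exist.
    """
    if k >= max_length:
        history = item_list[k - max_length:k]
        mask = [1.0] * max_length
    else:
        history = item_list[:k] + [0] * (max_length - k)
        mask = [1.0] * k + [0.0] * (max_length - k)
    return history, mask

items = [11, 12, 13, 14, 15, 16]
print(build_history(items, k=2, max_length=4))  # ([11, 12, 0, 0], [1.0, 1.0, 0.0, 0.0])
print(build_history(items, k=5, max_length=4))  # ([12, 13, 14, 15], [1.0, 1.0, 1.0, 1.0])
```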
class GNN(nn.Layer):
    def __init__(self, embedding_size, step=1):
        super(GNN, self).__init__()
        self.step = step
        self.embedding_size = embedding_size
        self.input_size = embedding_size * 2
        self.gate_size = embedding_size * 3
        self.w_ih = self.create_parameter(shape=[self.input_size, self.gate_size])
        self.w_hh = self.create_parameter(shape=[self.embedding_size, self.gate_size])
        self.b_ih = self.create_parameter(shape=[self.gate_size])
        self.b_hh = self.create_parameter(shape=[self.gate_size])
        self.b_iah = self.create_parameter(shape=[self.embedding_size])
        self.b_ioh = self.create_parameter(shape=[self.embedding_size])

        self.linear_edge_in = nn.Linear(self.embedding_size, self.embedding_size)
        self.linear_edge_out = nn.Linear(self.embedding_size, self.embedding_size)

    def GNNCell(self, A, hidden):
        input_in = paddle.matmul(A[:, :, :A.shape[1]], self.linear_edge_in(hidden)) + self.b_iah
        input_out = paddle.matmul(A[:, :, A.shape[1]:], self.linear_edge_out(hidden)) + self.b_ioh
        # [batch_size, max_session_len, embedding_size * 2]
        inputs = paddle.concat([input_in, input_out], 2)

        # gi.size equals to gh.size, shape of [batch_size, max_session_len, embedding_size * 3]
        gi = paddle.matmul(inputs, self.w_ih) + self.b_ih
        gh = paddle.matmul(hidden, self.w_hh) + self.b_hh
        # (batch_size, max_session_len, embedding_size)
        i_r, i_i, i_n = gi.chunk(3, 2)
        h_r, h_i, h_n = gh.chunk(3, 2)
        reset_gate = F.sigmoid(i_r + h_r)
        input_gate = F.sigmoid(i_i + h_i)
        new_gate = paddle.tanh(i_n + reset_gate * h_n)
        hy = (1 - input_gate) * hidden + input_gate * new_gate
        return hy

    def forward(self, A, hidden):
        for i in range(self.step):
            hidden = self.GNNCell(A, hidden)
        return hidden
The model code is shown below; it uses the GNN class above. As described earlier, the attention output $s_g$ is combined with the embedding of the last item in the sequence to obtain the final sequence embedding:

$$
s_h=W_3\left[s_1 ; s_g\right] \in R^d
$$

The score of each item is the inner product of this user embedding $s_h$ and the item embedding (as shown in the figure above), and training uses the cross-entropy loss:

$$
\hat{\mathbf{z}}_i=\mathbf{s}_{\mathrm{h}}^{\top} \mathbf{v}_i
$$

$$
\hat{\mathbf{y}}=\operatorname{softmax}(\hat{\mathbf{z}})
$$

$$
\mathcal{L}(\hat{\mathbf{y}})=-\sum_{i=1}^m \mathbf{y}_i \log \left(\hat{\mathbf{y}}_i\right)+\left(1-\mathbf{y}_i\right) \log \left(1-\hat{\mathbf{y}}_i\right)
$$
class SRGNN(nn.Layer):
    r"""SRGNN regards the conversation history as a directed graph.
    In addition to considering the connection between the item and the adjacent item,
    it also considers the connection with other interactive items.

    Such as: An example of a session sequence(eg:item1, item2, item3, item2, item4) and the connection matrix A

    Outgoing edges:
        === ===== ===== ===== =====
         \    1     2     3     4
        === ===== ===== ===== =====
         1    0     1     0     0
         2    0     0    1/2   1/2
         3    0     1     0     0
         4    0     0     0     0
        === ===== ===== ===== =====

    Incoming edges:
        === ===== ===== ===== =====
         \    1     2     3     4
        === ===== ===== ===== =====
         1    0     0     0     0
         2   1/2    0    1/2    0
         3    0     1     0     0
         4    0     1     0     0
        === ===== ===== ===== =====
    """

    def __init__(self, config):
        super(SRGNN, self).__init__()

        # load parameters info
        self.config = config
        self.embedding_size = config['embedding_dim']
        self.step = config['step']
        self.n_items = self.config['n_items']

        # define layers and loss
        # item embedding
        self.item_emb = nn.Embedding(self.n_items, self.embedding_size, padding_idx=0)
        self.gnn = GNN(self.embedding_size, self.step)
        self.linear_one = nn.Linear(self.embedding_size, self.embedding_size)
        self.linear_two = nn.Linear(self.embedding_size, self.embedding_size)
        self.linear_three = nn.Linear(self.embedding_size, 1, bias_attr=False)
        self.linear_transform = nn.Linear(self.embedding_size * 2, self.embedding_size)
        self.loss_fun = nn.CrossEntropyLoss()

        # parameters initialization

    def gather_indexes(self, output, gather_index):
        """Gathers the vectors at the specific positions over a minibatch"""
        # gather_index = gather_index.view(-1, 1, 1).expand(-1, -1, output.shape[-1])
        gather_index = gather_index.reshape([-1, 1, 1])
        gather_index = paddle.repeat_interleave(gather_index,output.shape[-1],2)
        output_tensor = paddle.take_along_axis(output, gather_index, 1)
        return output_tensor.squeeze(1)

    def calculate_loss(self,user_emb,pos_item):
        all_items = self.item_emb.weight
        scores = paddle.matmul(user_emb, all_items.transpose([1, 0]))
        return self.loss_fun(scores,pos_item)

    def output_items(self):
        return self.item_emb.weight

    def reset_parameters(self, initializer=None):
        for weight in self.parameters():
            pass  # per-weight initialization goes here (body omitted)

    def _get_slice(self, item_seq):
        # Mask matrix, shape of [batch_size, max_session_len]
        mask = (item_seq>0).astype('int32')
        items, n_node, A, alias_inputs = [], [], [], []
        max_n_node = item_seq.shape[1]
        item_seq = item_seq.cpu().numpy()
        for u_input in item_seq:
            node = np.unique(u_input)
            items.append(node.tolist() + (max_n_node - len(node)) * [0])
            u_A = np.zeros((max_n_node, max_n_node))

            for i in np.arange(len(u_input) - 1):
                if u_input[i + 1] == 0:
                    break  # the rest of the sequence is padding
                u = np.where(node == u_input[i])[0][0]
                v = np.where(node == u_input[i + 1])[0][0]
                u_A[u][v] = 1

            # normalize by in-degree and out-degree, then stack as [in | out]
            u_sum_in = np.sum(u_A, 0)
            u_sum_in[np.where(u_sum_in == 0)] = 1
            u_A_in = np.divide(u_A, u_sum_in)
            u_sum_out = np.sum(u_A, 1)
            u_sum_out[np.where(u_sum_out == 0)] = 1
            u_A_out = np.divide(u_A.transpose(), u_sum_out)
            u_A = np.concatenate([u_A_in, u_A_out]).transpose()
            A.append(u_A)

            alias_inputs.append([np.where(node == i)[0][0] for i in u_input])
        # The relative coordinates of the item node, shape of [batch_size, max_session_len]
        alias_inputs = paddle.to_tensor(alias_inputs)
        # The connecting matrix, shape of [batch_size, max_session_len, 2 * max_session_len]
        A = paddle.to_tensor(A)
        # The unique item nodes, shape of [batch_size, max_session_len]
        items = paddle.to_tensor(items)

        return alias_inputs, A, items, mask

    def forward(self, item_seq, mask, item, train=True):
        if train:
            alias_inputs, A, items, mask = self._get_slice(item_seq)
            hidden = self.item_emb(items)
            hidden = self.gnn(A, hidden)
            alias_inputs = alias_inputs.reshape([-1, alias_inputs.shape[1],1])
            alias_inputs = paddle.repeat_interleave(alias_inputs, self.embedding_size, 2)
            seq_hidden = paddle.take_along_axis(hidden,alias_inputs,1)
            # fetch the last hidden state of last timestamp
            item_seq_len = paddle.sum(mask,axis=1)
            ht = self.gather_indexes(seq_hidden, item_seq_len - 1)
            q1 = self.linear_one(ht).reshape([ht.shape[0], 1, ht.shape[1]])
            q2 = self.linear_two(seq_hidden)

            # attention mechanism
            alpha = self.linear_three(F.sigmoid(q1 + q2))
            a = paddle.sum(alpha * seq_hidden * mask.reshape([mask.shape[0], -1, 1]), 1)
            # attention_emb + last_item_emb
            user_emb = self.linear_transform(paddle.concat([a, ht], axis=1))

            loss = self.calculate_loss(user_emb,item)
            output_dict = {
                'user_emb': user_emb,
                'loss': loss
            }
        else:
            alias_inputs, A, items, mask = self._get_slice(item_seq)
            hidden = self.item_emb(items)
            hidden = self.gnn(A, hidden)
            alias_inputs = alias_inputs.reshape([-1, alias_inputs.shape[1],1])
            alias_inputs = paddle.repeat_interleave(alias_inputs, self.embedding_size, 2)
            seq_hidden = paddle.take_along_axis(hidden, alias_inputs,1)
            # fetch the last hidden state of last timestamp
            item_seq_len = paddle.sum(mask, axis=1)
            ht = self.gather_indexes(seq_hidden, item_seq_len - 1)
            q1 = self.linear_one(ht).reshape([ht.shape[0], 1, ht.shape[1]])
            q2 = self.linear_two(seq_hidden)

            alpha = self.linear_three(F.sigmoid(q1 + q2))
            a = paddle.sum(alpha * seq_hidden * mask.reshape([mask.shape[0], -1, 1]), 1)
            user_emb = self.linear_transform(paddle.concat([a, ht], axis=1))
            output_dict = {
                'user_emb': user_emb,
            }
        return output_dict
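The role of `alias_inputs` is easy to miss: the GNN works on the unique nodes of a session, and `alias_inputs` maps each click position back to its node row. The NumPy sketch below illustrates this for one padded session; the item ids and the fake "GNN output" are made up for the example.

```python
import numpy as np

# One padded session: item ids in click order, 0 is padding
u_input = np.array([7, 3, 9, 3, 5, 0])

node = np.unique(u_input)  # sorted unique ids (padding included) -> graph nodes
alias = np.array([np.where(node == i)[0][0] for i in u_input])

# Pretend GNN output: one row per unique node (here just the id repeated twice)
node_hidden = np.stack([node, node], axis=1).astype(float)

# Gather node rows back into click order, which is what take_along_axis does in the model
seq_hidden = node_hidden[alias]

print(node)   # [0 3 5 7 9]
print(alias)  # [3 1 4 1 2 0]
```

Item 3 appears twice in the session but is a single node, so both of its click positions point to the same row.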
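In fact, to make writing GNNs more convenient, you can also use the PyG or DGL frameworks directly (the GNN layer of a GNN model implements the message function, aggregation function, and update function, as shown in the figure above). Installing PyG requires three packages: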
import os
if 'IS_GRADESCOPE_ENV' not in os.environ:
    !pip install torch-scatter -f
    !pip install torch-sparse -f
    !pip install torch-geometric
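Independent of any framework, the three functions of a GNN layer (message, aggregation, update) can be sketched in plain NumPy. This is not the PyG or DGL API: the function names are hypothetical, and sum aggregation with a `tanh` update is an arbitrary choice made for illustration.

```python
import numpy as np

def message(x_src, W):
    """Message function: transform the features of each edge's source node."""
    return x_src @ W

def aggregate(messages, dst, n_nodes):
    """Aggregation function: sum the incoming messages per destination node."""
    out = np.zeros((n_nodes, messages.shape[1]))
    np.add.at(out, dst, messages)  # unbuffered add handles repeated destinations
    return out

def update(x, agg):
    """Update function: combine the old state with the aggregated messages."""
    return np.tanh(x + agg)

# Edges 0->1, 1->2, 2->1, 1->3 (0-based node ids)
src = np.array([0, 1, 2, 1])
dst = np.array([1, 2, 1, 3])
x = np.eye(4)  # one-hot node features
W = np.eye(4)  # identity weights keep the example easy to read

agg = aggregate(message(x[src], W), dst, n_nodes=4)
x_new = update(x, agg)
print(agg)
```

Node 1 receives messages from nodes 0 and 2, so its aggregated row is the sum of their one-hot features.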
