ALS (Alternating Least Squares) is a collaborative filtering algorithm that solves a matrix factorization by least squares. In UserCF and ItemCF we have to compute a user-user or item-item similarity matrix, which becomes hard to handle at large scale. So, as with PCA or word embeddings, can we represent users and items with low-dimensional vectors?
The ALS algorithm factorizes the user-item rating matrix $R$ into two matrices $U$ and $V^T$, where the row $U_{u.}$ represents user $u$'s latent preferences along $d$ dimensions, and $V_{i.}$ represents item $i$'s features along the same $d$ dimensions:

$$U_{u.}=[U_{u1},\dots,U_{uk},\dots,U_{ud}], \qquad V_{i.}=[V_{i1},\dots,V_{ik},\dots,V_{id}]$$
We want to find suitable $U$ and $V$ such that
$$\hat R = UV^T \approx R$$
Finally, we use the learned $U$ and $V^T$ to predict an unknown rating $r_{ui}$:

$$\hat r_{ui}=U_{u.}V_{i.}^T$$
This is really an optimization problem: we need to find $U$ and $V$ that minimize the gap between $R$ and $\hat R$. The objective function is

$$\min_\theta \sum_{u=1}^n\sum_{i=1}^m y_{ui}\left[\frac{1}{2}\left(r_{ui}-U_{u.}V_{i.}^T\right)^2+\frac{\alpha_u}{2}\|U_{u.}\|^2+\frac{\alpha_v}{2}\|V_{i.}\|^2\right]$$
where $y_{ui}=1$ if user $u$ has rated item $i$, and $0$ otherwise. Writing $f$ for the objective function, we have

$$\nabla U_{u.}=\frac{\partial f}{\partial U_{u.}}=U_{u.}\sum_{i=1}^m y_{ui}\left(V_{i.}^T V_{i.}+\alpha_u I\right)-\sum_{i=1}^m y_{ui} r_{ui} V_{i.}$$
Setting the partial derivative to zero gives

$$\begin{aligned} &U_{u.}=b_u A_u^{-1} \qquad (1)\\ &b_u=\sum_{i=1}^m y_{ui} r_{ui} V_{i.} \\ &A_u=\sum_{i=1}^m y_{ui}\left(V_{i.}^T V_{i.}+\alpha_u I\right) \end{aligned}$$
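Equation (1) amounts to solving a small $d \times d$ linear system per user. A minimal NumPy sketch of one user's update (the function and variable names here are my own, for illustration only):

```python
import numpy as np

def update_user(V, r_u, y_u, alpha_u):
    """Solve equation (1) for one user's latent row U_u.
    V: (m, d) item factors; r_u: (m,) ratings; y_u: (m,) 0/1 mask."""
    d = V.shape[1]
    rated = y_u.astype(bool)
    Vr = V[rated]                                    # only rated items contribute
    A_u = Vr.T @ Vr + alpha_u * rated.sum() * np.eye(d)
    b_u = r_u[rated] @ Vr                            # sum_i y_ui * r_ui * V_i.
    return np.linalg.solve(A_u, b_u)                 # A_u symmetric, so this is b_u A_u^{-1}
```

Since $A_u$ is symmetric positive definite, `np.linalg.solve` is preferable to forming $A_u^{-1}$ explicitly.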
Similarly, for $V_{i.}$ we have

$$\nabla V_{i.}=\frac{\partial f}{\partial V_{i.}}=V_{i.}\sum_{u=1}^n y_{ui}\left(U_{u.}^T U_{u.}+\alpha_v I\right)-\sum_{u=1}^n y_{ui} r_{ui} U_{u.}$$
Setting this partial derivative to zero as well gives

$$\begin{aligned} &V_{i.}=b_i A_i^{-1} \qquad (2)\\ &b_i=\sum_{u=1}^n y_{ui} r_{ui} U_{u.} \\ &A_i=\sum_{u=1}^n y_{ui}\left(U_{u.}^T U_{u.}+\alpha_v I\right) \end{aligned}$$
With both closed-form solutions in hand, the algorithm is complete: in each iteration we simply update the factors with equations (1) and (2). Since every such pass has to sweep through all the samples, there is also a stochastic variant of the algorithm, SGD (stochastic gradient descent), which updates on a single sample at a time and also converges eventually.
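The alternating loop itself can be sketched as follows, again with hypothetical names and a dense rating matrix for simplicity:

```python
import numpy as np

def als(R, Y, d=10, alpha=0.1, iters=50, seed=0):
    """Alternate the closed-form updates (1) and (2) over a rating
    matrix R (n x m) with 0/1 mask Y of observed entries."""
    n, m = R.shape
    rng = np.random.default_rng(seed)
    U = rng.random((n, d)) * 0.01
    V = rng.random((m, d)) * 0.01
    I = np.eye(d)
    for _ in range(iters):
        for u in range(n):                  # fix V, solve each U_u. via (1)
            rated = Y[u].astype(bool)
            if rated.any():
                Vr = V[rated]
                A = Vr.T @ Vr + alpha * rated.sum() * I
                U[u] = np.linalg.solve(A, R[u, rated] @ Vr)
        for i in range(m):                  # fix U, solve each V_i. via (2)
            rated = Y[:, i].astype(bool)
            if rated.any():
                Ur = U[rated]
                A = Ur.T @ Ur + alpha * rated.sum() * I
                V[i] = np.linalg.solve(A, R[rated, i] @ Ur)
    return U, V
```

On a real dataset the per-user and per-item solves are independent, which is why ALS parallelizes well.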
The ALS algorithm above simply predicts unknown ratings with $\hat R=UV^T$, while RSVD additionally takes into account the user's bias $b_u$, the item's own bias $b_i$, and the global mean $\mu$. Its prediction function is
$$\hat r_{ui}=U_{u.}V_{i.}^T+b_u+b_i+\mu$$
Meanwhile, to prevent overfitting, $b_u$ and $b_i$ are penalized as well by adding regularization terms, and the objective becomes

$$\min_\theta \sum_{u=1}^n\sum_{i=1}^m y_{ui}\left[\frac{1}{2}\left(r_{ui}-\hat r_{ui}\right)^2+\frac{\alpha_u}{2}\|U_{u.}\|^2+\frac{\alpha_v}{2}\|V_{i.}\|^2+\frac{\beta_u}{2}b_u^2+\frac{\beta_v}{2}b_i^2\right]$$
We solve this with stochastic gradient descent, so we only need to consider the loss of a single sample:

$$f_{ui}=\frac{1}{2}\left(r_{ui}-\hat r_{ui}\right)^2+\frac{\alpha_u}{2}\|U_{u.}\|^2+\frac{\alpha_v}{2}\|V_{i.}\|^2+\frac{\beta_u}{2}b_u^2+\frac{\beta_v}{2}b_i^2$$
Let $e_{ui}=r_{ui}-\hat r_{ui}$; then
$$\begin{aligned} &\nabla \mu=-e_{ui}\\ &\nabla b_u=-e_{ui}+\beta_u b_u \\ &\nabla b_i=-e_{ui}+\beta_v b_i \\ &\nabla U_{u.}=-e_{ui}V_{i.}+\alpha_u U_{u.} \\ &\nabla V_{i.}=-e_{ui}U_{u.}+\alpha_v V_{i.} \end{aligned}$$
Then the parameters are updated with

$$\begin{aligned} &\mu=\mu-\gamma\nabla \mu \\ &b_u=b_u-\gamma\nabla b_u \\ &b_i=b_i-\gamma\nabla b_i \\ &U_{u.}=U_{u.}-\gamma\nabla U_{u.} \\ &V_{i.}=V_{i.}-\gamma\nabla V_{i.} \end{aligned}$$
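One SGD step on a single observed rating, following the gradients above (a sketch with hypothetical names; note that the $V_{i.}$ update must use the old value of $U_{u.}$):

```python
import numpy as np

def sgd_step(mu, bu, bi, Uu, Vi, r, gamma, alpha_u, alpha_v, beta_u, beta_v):
    """One stochastic update for a single (user, item, rating) sample.
    Uu, Vi are the latent rows; returns the updated parameters."""
    e = r - (mu + bu + bi + Uu @ Vi)        # e_ui = r_ui - r_hat_ui
    mu_new = mu + gamma * e                 # mu <- mu - gamma * grad_mu
    bu_new = bu + gamma * (e - beta_u * bu)
    bi_new = bi + gamma * (e - beta_v * bi)
    Uu_new = Uu + gamma * (e * Vi - alpha_u * Uu)
    Vi_new = Vi + gamma * (e * Uu - alpha_v * Uu * 0 - alpha_v * Vi)  # uses old Uu
    return mu_new, bu_new, bi_new, Uu_new, Vi_new
```

With a small enough learning rate, the prediction error on the sample shrinks after each step.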
For the initial values, the following scheme can be used, where $r$ denotes a uniform random number in $[0,1)$:

$$\begin{aligned} &\mu=\sum_{u=1}^n\sum_{i=1}^m y_{ui} r_{ui}\Big/\sum_{u=1}^n\sum_{i=1}^m y_{ui} \\ &b_u=\sum_{i=1}^m y_{ui}(r_{ui}-\mu)\Big/\sum_{i=1}^m y_{ui} \\ &b_i=\sum_{u=1}^n y_{ui}(r_{ui}-\mu)\Big/\sum_{u=1}^n y_{ui} \\ &U_{uk}=(r-0.5)\times 0.01,\quad k=1,2,\dots,d \\ &V_{ik}=(r-0.5)\times 0.01,\quad k=1,2,\dots,d \end{aligned}$$
Then we just run SGD until the weights converge.
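The initialization formulas above translate directly into NumPy (hypothetical names; the $r$ in the formulas is a fresh uniform draw in $[0,1)$ for every entry):

```python
import numpy as np

def init_params(R, Y, d, seed=0):
    """Initialize mu, bu, bi from the observed ratings, and U, V with
    small uniform noise, following the formulas above."""
    rng = np.random.default_rng(seed)
    mu = (Y * R).sum() / Y.sum()                    # global mean of observed ratings
    cnt_u = np.maximum(Y.sum(axis=1), 1)            # guard against users with no ratings
    cnt_i = np.maximum(Y.sum(axis=0), 1)
    bu = (Y * (R - mu)).sum(axis=1) / cnt_u         # per-user mean deviation from mu
    bi = (Y * (R - mu)).sum(axis=0) / cnt_i         # per-item mean deviation from mu
    U = (rng.random((R.shape[0], d)) - 0.5) * 0.01
    V = (rng.random((R.shape[1], d)) - 0.5) * 0.01
    return mu, bu, bi, U, V
```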
The full code is attached below (note that it uses random initialization rather than the scheme above):
```python
import random
import math
import pandas as pd
import numpy as np


class RSVD():
    def __init__(self, allfile, trainfile, testfile, latentFactorNum=20,
                 alpha_u=0.01, alpha_v=0.01, beta_u=0.01, beta_v=0.01,
                 learning_rate=0.01):
        data_fields = ['user_id', 'item_id', 'rating', 'timestamp']
        # all data file
        allData = pd.read_table(allfile, names=data_fields)
        # training set file
        self.train_df = pd.read_table(trainfile, names=data_fields)
        # testing set file
        self.test_df = pd.read_table(testfile, names=data_fields)
        # get factor number
        self.latentFactorNum = latentFactorNum
        # get user number
        self.userNum = len(set(allData['user_id'].values))
        # get item number
        self.itemNum = len(set(allData['item_id'].values))
        # learning rate
        self.learningRate = learning_rate
        # the regularization lambdas
        self.alpha_u = alpha_u
        self.alpha_v = alpha_v
        self.beta_u = beta_u
        self.beta_v = beta_v
        # initialize the model and parameters
        self.initModel()

    # initialize all parameters
    def initModel(self):
        self.mu = self.train_df['rating'].mean()
        self.bu = np.zeros(self.userNum)
        self.bi = np.zeros(self.itemNum)
        self.U = np.mat(np.random.rand(self.userNum, self.latentFactorNum))
        self.V = np.mat(np.random.rand(self.itemNum, self.latentFactorNum))
        # self.bu = [0.0 for i in range(self.userNum)]
        # self.bi = [0.0 for i in range(self.itemNum)]
        # temp = math.sqrt(self.latentFactorNum)
        # self.U = [[(0.1 * random.random() / temp) for i in range(self.latentFactorNum)] for j in range(self.userNum)]
        # self.V = [[0.1 * random.random() / temp for i in range(self.latentFactorNum)] for j in range(self.itemNum)]
        print("Initialization done. User number: %d, item number: %d" % (self.userNum, self.itemNum))

    def train(self, iterTimes=100):
        print("Beginning to train the model......")
        preRmse = 10000.0
        for it in range(iterTimes):
            for index in self.train_df.index:
                if index % 20000 == 0:
                    print("Epoch %s progress: %s%%" % (it, index / len(self.train_df.index) * 100))
                user = int(self.train_df.loc[index]['user_id']) - 1
                item = int(self.train_df.loc[index]['item_id']) - 1
                rating = float(self.train_df.loc[index]['rating'])
                pscore = self.predictScore(self.mu, self.bu[user], self.bi[item], self.U[user], self.V[item])
                eui = rating - pscore
                # update mu, bu and bi (global mean, user rating bias and item rating bias)
                self.mu += self.learningRate * eui
                self.bu[user] += self.learningRate * (eui - self.beta_u * self.bu[user])
                self.bi[item] += self.learningRate * (eui - self.beta_v * self.bi[item])
                # update U and V; copy U[user] so the V update uses the old value
                temp = self.U[user].copy()
                self.U[user] += self.learningRate * (eui * self.V[item] - self.alpha_u * self.U[user])
                self.V[item] += self.learningRate * (eui * temp - self.alpha_v * self.V[item])
                # for k in range(self.latentFactorNum):
                #     temp = self.U[user][k]
                #     # update U, V
                #     self.U[user][k] += self.learningRate * (eui * self.V[item][k] - self.alpha_u * self.U[user][k])
                #     self.V[item][k] += self.learningRate * (temp * eui - self.alpha_v * self.V[item][k])

            # calculate the current RMSE on the test set
            curRmse = self.test(self.mu, self.bu, self.bi, self.U, self.V)
            print("Iteration %d, RMSE: %f" % (it + 1, curRmse))
            if curRmse > preRmse:
                break
            else:
                preRmse = curRmse
        print("Iteration finished!")

    # test on the test set and calculate the RMSE
    def test(self, mu, bu, bi, U, V):
        cnt = self.test_df.shape[0]
        rmse = 0.0
        buT = bu.reshape(bu.shape[0], 1)
        predict_rate_matrix = mu + np.tile(buT, (1, self.itemNum)) + np.tile(bi, (self.userNum, 1)) + U * V.T
        for i in self.test_df.index:
            user = int(self.test_df.loc[i]['user_id']) - 1
            item = int(self.test_df.loc[i]['item_id']) - 1
            score = float(self.test_df.loc[i]['rating'])
            # pscore = self.predictScore(mu, bu[user], bi[item], U[user], V[item])
            pscore = predict_rate_matrix[user, item]
            rmse += math.pow(score - pscore, 2)
        RMSE = math.sqrt(rmse / cnt)
        return RMSE

    # calculate the inner product of two vectors
    def innerProduct(self, v1, v2):
        result = 0.0
        for i in range(len(v1)):
            result += v1[i] * v2[i]
        return result

    def predictScore(self, mu, bu, bi, U, V):
        # pscore = mu + bu + bi + self.innerProduct(U, V)
        pscore = mu + bu + bi + np.multiply(U, V).sum()
        # clamp predictions to the valid rating range [1, 5]
        if pscore < 1:
            pscore = 1
        if pscore > 5:
            pscore = 5
        return pscore


if __name__ == '__main__':
    s = RSVD("../datasets/ml-100k/u.data", "../datasets/ml-100k/u1.base", "../datasets/ml-100k/u1.test")
    s.train()
```