Collaborative Deep Learning for Recommender Systems
Authors: Hao Wang, Naiyan Wang, Dit-Yan Yeung
ABSTRACT
- Collaborative filtering (CF) is a successful approach commonly used by many recommender systems.
- Conventional CF-based methods use the ratings given to items by users as the sole source of information for learning to make recommendations.
- To address the ++ratings sparsity problem++, auxiliary information may be utilized.
- Collaborative topic regression (CTR) is an appealing recent method taking this approach which tightly couples the two components that learn from two different sources of information.
- The latent representation learned by CTR may not be very effective when ++the auxiliary information is very sparse++. To address this problem, we generalize recent advances in deep learning from i.i.d. input to non-i.i.d. (CF-based) input and propose in this paper a hierarchical Bayesian model called collaborative deep learning (==CDL==), which jointly performs ++deep representation learning for the content information++ and ++collaborative filtering for the ratings (feedback) matrix++.
- Result: CDL can significantly advance the state of the art.
Categories and Subject Descriptors:
[Information Systems]: Models and Principles - General;
[Computer Applications]: Social and Behavioral Sciences
Keywords:
Recommender systems; Deep learning; Topic model; Text mining
1. INTRODUCTION
Due to the abundance of choice in many online services, recommender systems (RS) now play an increasingly significant role.
Existing methods for RS can roughly be categorized into three classes:
- ==content-based methods==: make use of user profiles or product descriptions for recommendation.
- ==collaborative filtering (CF) based methods==: use the past activities or preferences, such as user ratings on items, without using user or product content information.
- ==hybrid methods==: seek to get the best of both worlds by combining content-based and CF-based methods.
For CF-based methods, prediction accuracy often drops significantly when ++the ratings are very sparse++. Moreover, ++they cannot be used for recommending new products++ which have yet to receive rating information from users. Consequently, it is inevitable for CF-based methods to exploit auxiliary information, and hence hybrid methods have gained popularity in recent years.
According to whether two-way interaction exists between ++the rating information++ and ++auxiliary information++, hybrid methods can be further divided into two sub-categories:
- ==Loosely coupled methods==: process the auxiliary information once and then use it to provide features for the CF models (information flow is one-way).
- ==Tightly coupled methods==: the rating information can guide the learning of features, and the extracted features can further improve the predictive power of the CF models (two-way interaction). This is the key point.
With two-way interaction, tightly coupled methods can automatically learn features from the auxiliary information and naturally balance the influence of the rating and auxiliary information.
The current best method, and the basis of collaborative deep learning (CDL), the method proposed in this paper: Collaborative topic regression (CTR) is a probabilistic graphical model that seamlessly integrates a topic model, latent Dirichlet allocation (LDA), and a model-based CF method, probabilistic matrix factorization (PMF).
Goal: integrate deep learning with CF by performing deep learning collaboratively.
Deep learning models for CF (survey):
- [28] uses restricted Boltzmann machines instead of the conventional matrix factorization formulation to perform CF. (A CF-based method, because it does not incorporate content information.)
- [9] extends this work by incorporating user-user and item-item correlations. (A CF-based method, because it does not incorporate content information.)
- [24] uses low-rank matrix factorization in the last weight layer of a deep network to significantly reduce the number of model parameters and speed up training.
- On music recommendation, [21, 39] directly use conventional CNN or deep belief networks (DBN) to assist representation learning for content information.
To address the challenges above, we develop a hierarchical Bayesian model called ==collaborative deep learning (CDL)== as a novel tightly coupled method for RS.
- We first present a Bayesian formulation of a deep learning model called stacked denoising autoencoder (SDAE).
- With this, we then present our CDL model which tightly couples deep representation learning for the content information and collaborative filtering for the ratings (feedback) matrix, allowing two-way interaction between the two.
Experiments show that CDL significantly outperforms the state of the art.
(Note: Although we present CDL as using SDAE for its feature learning component, CDL is actually a more general framework which can also admit other deep learning models such as deep Boltzmann machines, recurrent neural networks, and convolutional neural networks.)
==The main contribution:==
- By performing deep learning collaboratively, ++CDL can simultaneously extract an effective deep feature representation from content and capture the similarity and implicit relationship between items (and users).++ The learned representation may also be used for tasks other than recommendation.
- Unlike previous deep learning models which use simple target like classification and reconstruction, ++we propose to use CF as a more complex target in a probabilistic framework++.
- Besides the algorithm for attaining maximum a posteriori (MAP) estimates, ++we also derive a sampling-based algorithm for the Bayesian treatment of CDL++, which, interestingly, turns out to be a Bayesian generalized version of back-propagation.
- To the best of our knowledge, CDL is ++the first hierarchical Bayesian model to bridge the gap between state-of-the-art deep learning models and RS++. Besides, due to its Bayesian nature, CDL can be easily extended to incorporate other auxiliary information to further boost the performance.
- Extensive experiments on three real-world datasets from different domains show that ++CDL can significantly advance the state of the art++.
2. NOTATION AND PROBLEM FORMULATION
Definition:
- The entire collection of J items is represented by a J-by-S matrix $X_c$, where row j is the bag-of-words vector $X_{c,j*}$ for item j based on a vocabulary of size S.
- With I users, we define an I-by-J binary rating matrix $R=[R_{ij}]_{I \times J}$.
Given part of the ratings in R and the content information $X_c$, the problem is to predict the other ratings in R.
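The two inputs defined above can be built as in the minimal sketch below. The toy documents, vocabulary, and helper names (`build_content_matrix`, `build_rating_matrix`) are hypothetical and only illustrate the shapes of $X_c$ (J-by-S) and R (I-by-J); any normalization of the word counts is left out.

```python
from collections import Counter


def build_content_matrix(docs, vocabulary):
    """Return the J-by-S bag-of-words matrix X_c as a list of rows."""
    index = {word: s for s, word in enumerate(vocabulary)}
    X_c = []
    for text in docs:
        # Count only the words that belong to the vocabulary.
        counts = Counter(w for w in text.lower().split() if w in index)
        X_c.append([counts.get(word, 0) for word in vocabulary])
    return X_c


def build_rating_matrix(num_users, num_items, interactions):
    """Return the I-by-J binary matrix R; R[i][j] = 1 if user i has item j."""
    R = [[0] * num_items for _ in range(num_users)]
    for i, j in interactions:
        R[i][j] = 1
    return R


# Toy data (hypothetical): J = 3 items, S = 4 words, I = 2 users.
vocabulary = ["deep", "learning", "topic", "model"]
docs = ["deep learning model", "topic model", "deep deep learning"]
X_c = build_content_matrix(docs, vocabulary)
R = build_rating_matrix(2, 3, [(0, 0), (0, 2), (1, 1)])
```

The prediction task is then to fill in the zero entries of `R` that correspond to held-out ratings, using both `R` and `X_c`.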
(Note: an L/2-layer SDAE corresponds to an L-layer network.)
3. COLLABORATIVE DEEP LEARNING
3.1 Stacked Denoising Autoencoders
SDAE is a ++feedforward neural network++ for learning representations (encoding) of the input data by learning to predict the clean input itself in the output (Figure 2 in the paper).
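A minimal sketch of the idea, assuming masking noise, sigmoid layers, and a squared-error reconstruction target. Layer sizes, the noise level, and all function names are illustrative assumptions, and no training step is shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def corrupt(x, drop_prob, rng):
    """Masking noise: zero each entry independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def layer(x, W, b):
    """One sigmoid layer sigma(x W + b); W has shape (inputs, outputs)."""
    return [sigmoid(sum(x[i] * W[i][k] for i in range(len(x))) + b[k])
            for k in range(len(b))]


def forward(x0, weights, biases):
    """Pass the corrupted input through all L layers; keep every layer output."""
    outputs = []
    h = x0
    for W, b in zip(weights, biases):
        h = layer(h, W, b)
        outputs.append(h)
    return outputs


def reconstruction_loss(x_clean, x_out):
    """Squared error between the clean input and the network output."""
    return sum((c - o) ** 2 for c, o in zip(x_clean, x_out))


rng = random.Random(0)
S, K = 4, 2                    # hypothetical sizes: input width and code width
shapes = [(S, K), (K, S)]      # L = 2 layers: one encoder layer, one decoder layer
weights = [[[rng.gauss(0, 0.1) for _ in range(n_out)] for _ in range(n_in)]
           for n_in, n_out in shapes]
biases = [[0.0] * n_out for _, n_out in shapes]

x_clean = [1.0, 0.0, 1.0, 0.0]
x0 = corrupt(x_clean, drop_prob=0.3, rng=rng)   # corrupted input
outputs = forward(x0, weights, biases)
code = outputs[len(outputs) // 2 - 1]           # middle layer, the learned encoding
loss = reconstruction_loss(x_clean, outputs[-1])
```

The loss compares the output with the clean input, not the corrupted one, which is what forces the middle layer to learn a representation that is robust to the noise.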
3.2 Generalized Bayesian SDAE
(Note: If $\lambda_s$ goes to infinity, the Gaussian distribution in Equation (1) becomes a ++Dirac delta distribution++. The model then degenerates to a ++Bayesian formulation of SDAE++.)
(Note: the first L/2 layers of the network act as an encoder and the last L/2 layers act as a decoder.)
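For reference, the generative process these notes refer to can be written out as follows. This is a reconstruction, with $K_l$ (number of units in layer $l$), $\lambda_w$ (precision of the weights), and $\sigma(\cdot)$ (the sigmoid function) as assumed notation; the third draw is the Gaussian that Equation (1) refers to.

- For each layer $l$ of the network:
  - for each column $n$ of the weight matrix, draw $W_{l,*n} \sim \mathcal{N}(0, \lambda_w^{-1} I_{K_l})$;
  - draw the bias vector $b_l \sim \mathcal{N}(0, \lambda_w^{-1} I_{K_l})$;
  - for each row $j$, draw $X_{l,j*} \sim \mathcal{N}(\sigma(X_{l-1,j*} W_l + b_l), \lambda_s^{-1} I_{K_l})$.
- For each item $j$, draw the clean input $X_{c,j*} \sim \mathcal{N}(X_{L,j*}, \lambda_n^{-1} I_S)$.

As $\lambda_s \to \infty$ the variance $\lambda_s^{-1}$ shrinks to zero, so each layer output becomes a deterministic function of the previous layer, which is the degenerate case described in the note above.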
3.3 Collaborative Deep Learning
(Note: ++the middle layer++ $X_{L/2}$ serves as a bridge between the ratings and content information. This middle layer, along with the latent offset $\epsilon_j$, is the key that ++enables CDL to simultaneously learn an effective feature representation and capture the similarity and (implicit) relationship between items (and users)++.)
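The bridge role of the middle layer is easiest to see in the part of the generative process that CDL adds on top of the SDAE. The following is a reconstruction under assumed notation: $K$ is the dimensionality of the latent space, $\lambda_u$ and $\lambda_v$ are precision hyperparameters, and $C_{ij}$ is a confidence parameter.

- For each item $j$: draw a latent offset $\epsilon_j \sim \mathcal{N}(0, \lambda_v^{-1} I_K)$ and set the item vector $v_j = \epsilon_j + X_{L/2,j*}^T$.
- For each user $i$: draw a user vector $u_i \sim \mathcal{N}(0, \lambda_u^{-1} I_K)$.
- For each user-item pair $(i, j)$: draw $R_{ij} \sim \mathcal{N}(u_i^T v_j, C_{ij}^{-1})$, where $C_{ij} = a$ if $R_{ij} = 1$ and $C_{ij} = b$ otherwise, with $a > b > 0$.

The item vector is therefore the content encoding plus an offset that the ratings are free to adjust, which is where the two-way interaction comes from.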
The graphical model of CDL when $\lambda_s$ approaches positive infinity:
3.4 Maximum A Posteriori Estimates
An EM-style algorithm for obtaining the MAP estimates:
- ++When $\lambda_s$ approaches positive infinity++, training of the probabilistic graphical model of CDL degenerates to simultaneously training two neural networks overlaid together with a common input layer (the corrupted input) but different output layers.
- ++When the ratio $\lambda_n/\lambda_v$ approaches positive infinity++, CDL degenerates to a two-step model in which the latent representation learned using SDAE is put directly into the CTR.
- ++When $\lambda_n/\lambda_v$ goes to zero++, the decoder of the SDAE essentially vanishes.
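The alternating part of such an EM-style procedure can be sketched as below, assuming the usual confidence-weighted squared-error form: with the network fixed, each user vector and each item vector has a closed-form ridge-style update, and the item update is pulled toward the encoder output with strength $\lambda_v$. The helper names, the toy numbers, and the hyperparameter values are assumptions, and the gradient step on the network weights is omitted.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def ridge_update(factors, targets, confidences, lam, prior_mean=None):
    """Closed-form minimizer over x of
    sum_k conf_k * (target_k - x . factors[k])^2 + lam * ||x - prior_mean||^2."""
    K = len(factors[0])
    prior_mean = prior_mean or [0.0] * K
    A = [[lam if r == c else 0.0 for c in range(K)] for r in range(K)]
    rhs = [lam * m for m in prior_mean]
    for f, t, conf in zip(factors, targets, confidences):
        for r in range(K):
            rhs[r] += conf * t * f[r]
            for c in range(K):
                A[r][c] += conf * f[r] * f[c]
    return solve(A, rhs)


a, b = 1.0, 0.01                   # assumed confidences for R_ij = 1 and R_ij = 0
lam_u, lam_v = 0.1, 10.0           # hypothetical hyperparameters
R = [[1, 0, 1], [0, 1, 0]]         # I = 2 users, J = 3 items
C = [[a if r else b for r in row] for row in R]
U = [[0.1, 0.2], [0.3, -0.1]]      # user vectors u_i, K = 2
V = [[0.2, 0.1], [-0.1, 0.3], [0.4, 0.0]]            # item vectors v_j
encoded = [[0.5, 0.5], [0.1, 0.9], [0.6, 0.4]]       # stand-in for encoder outputs

# Update every user vector with the item vectors held fixed.
for i in range(len(U)):
    U[i] = ridge_update(V, R[i], C[i], lam_u)

# Update every item vector with the user vectors held fixed;
# the encoder output acts as the prior mean of v_j.
for j in range(len(V)):
    column_R = [R[i][j] for i in range(len(U))]
    column_C = [C[i][j] for i in range(len(U))]
    V[j] = ridge_update(U, column_R, column_C, lam_v, prior_mean=encoded[j])
```

A large `lam_v` keeps each item vector close to its content encoding, while a small one lets the ratings dominate, which mirrors the role the ratio of hyperparameters plays in the degenerate cases listed above.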
3.5 Prediction
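Let D be the observed test data. We use the point estimates of $u_i$, $W^+$, and $\epsilon_j$ to calculate the predicted rating:
$E[R_{ij}|D] \approx (E[u_i|D])^T (E[f_e(X_{0,j*}, W^+)^T|D] + E[\epsilon_j|D])$
where $f_e$ is the encoder applied to the corrupted input $X_{0,j*}$. In other words, we approximate the predicted rating as:
$R^*_{ij} \approx (u^*_i)^T (f_e(X_{0,j*}, W^{+*})^T + \epsilon^*_j) = (u^*_i)^T v^*_j$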
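In code, prediction with point estimates reduces to an inner product between a user vector and an item vector, where the item vector is the encoder output plus the latent offset. The sketch below assumes those quantities are already available as plain lists; the numbers and function names are hypothetical.

```python
def item_vector(encoded_j, offset_j):
    """v_j = encoder output for item j plus its latent offset epsilon_j."""
    return [e + o for e, o in zip(encoded_j, offset_j)]


def predict_rating(u_i, v_j):
    """Predicted rating as the inner product u_i^T v_j."""
    return sum(a * b for a, b in zip(u_i, v_j))


def recommend(u_i, item_vectors, seen, top_m):
    """Rank the items a user has not seen by predicted rating, best first."""
    scores = [(predict_rating(u_i, v), j)
              for j, v in enumerate(item_vectors) if j not in seen]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:top_m]]


u = [1.0, 2.0]                                    # point estimate of u_i
encoded = [[0.5, 0.0], [0.0, 0.5], [1.0, 1.0]]    # stand-in encoder outputs
offsets = [[0.0, 0.0], [0.5, 0.0], [0.0, -0.5]]   # point estimates of epsilon_j
V = [item_vector(e, o) for e, o in zip(encoded, offsets)]
top = recommend(u, V, seen={0}, top_m=2)
```

For a new item with no ratings, the offset has nothing to learn from, so the item vector falls back to the content encoding alone.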
4. EXPERIMENTS
4.1 Datasets
- citeulike-a: 5551 users and 16980 items.
- citeulike-t: 7947 users and 25975 items.
- Netflix: 407261 users, 9228 movies, and 15348808 ratings.
(Note: After removing stop words, the top S discriminative words according to the ++tf-idf values++ are chosen to form the vocabulary (S is 8000, 20000, and 20000 for the three datasets).)
4.2 Evaluation Scheme
We use recall as the performance measure because the rating information is in the form of implicit feedback.
Another evaluation metric is the mean average precision (mAP).
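Recall at a cutoff M is commonly computed per user as the number of liked items that appear in the top M recommendations divided by the total number of items the user likes, then averaged over users. A minimal sketch under that assumption (function names and toy rankings are hypothetical):

```python
def recall_at_m(ranked_items, liked_items, m):
    """Fraction of a user's liked items that appear in the top-m ranking."""
    liked = set(liked_items)
    if not liked:
        return 0.0
    hits = sum(1 for item in ranked_items[:m] if item in liked)
    return hits / len(liked)


def mean_recall_at_m(rankings, liked_by_user, m):
    """Average recall@m over all users."""
    values = [recall_at_m(r, l, m) for r, l in zip(rankings, liked_by_user)]
    return sum(values) / len(values)


rankings = [[3, 1, 4, 2], [0, 2, 1, 3]]    # ranked item ids per user
liked_by_user = [[1, 2, 5], [0]]           # held-out liked items per user
r_user0 = recall_at_m(rankings[0], liked_by_user[0], m=2)
r_mean = mean_recall_at_m(rankings, liked_by_user, m=2)
```

Recall suits implicit feedback because a zero entry may mean either dislike or unawareness, so only the known positives are counted.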
4.3 Baselines and Experimental Settings
- CMF: Collective Matrix Factorization is a model incorporating different sources of information by simultaneously factorizing multiple matrices. In this paper, the two factorized matrices are R and $X_c$.
- SVDFeature: SVDFeature is a model for feature-based collaborative filtering. In this paper we use the content information $X_c$ as raw features to feed into SVDFeature.
- DeepMusic: DeepMusic is a model for music recommendation. We use the variant, a loosely coupled method, that achieves the best performance as our baseline.
- CTR: Collaborative Topic Regression is a model performing topic modeling and collaborative filtering simultaneously as mentioned in the previous section.
- CDL: Collaborative Deep Learning is our proposed model as described above. It allows different levels of model complexity by varying the number of layers.
4.4 Quantitative Comparison
4.5 Qualitative Comparison
With a more effective representation, CDL can capture the key points of articles and the user preferences more accurately. Besides, it can model the co-occurrence and relations of words better.
CDL is sensitive enough to changes of user taste and hence can provide more accurate recommendation.
5. COMPLEXITY ANALYSIS AND IMPLEMENTATION
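The total time complexity is $O(JSK_1 + K^2J^2 + K^2I^2 + K^3)$, where $K$ is the dimensionality of the latent space and $K_1$ is the output dimensionality of the first layer.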
CDL is very scalable.
6. CONCLUSION AND FUTURE WORK
- We have demonstrated in this paper that state-of-the-art performance can be achieved by jointly performing ++deep representation learning for the content information++ and ++collaborative filtering for the ratings (feedback) matrix++.
- As far as we know, CDL is the first hierarchical Bayesian model to bridge the gap between state-of-the-art deep learning models and RS.
- The Bayesian nature of CDL also provides a potential performance boost if other side information is incorporated. Besides, as remarked above, CDL actually provides a framework that can also ++admit deep learning models other than SDAE++.
刘丽
2017-10-26