基于记忆(Memory-Based)与基于模型(Model-Based)的辨析

转载自:https://yasserebrahim.wordpress.com/2012/10/13/memory-based-vs-model-based-recommendation-systems/

对这两个定义有一个比较客观的分析。

Memory-Based vs. Model-Based Recommendation Systems

Anywhere you’d try to read on recommendation systems you’ll catch a mention of this categorization: memory-based versus model-based recommendation systems. I’ve seen some terrible explanations of this categorization, so I’ll try to put it as simple as I can.

Memory-based techniques use the data (likes, votes, clicks, etc) that you have to establish correlations (similarities?) between either users (Collaborative Filtering) or items (Content-Based Recommendation) to recommend an item i to a user u who’s never seen it before. In the case of collaborative filtering, we get the recommendations from items seen by the user’s who are closest to u, hence the term collaborative. In contrast, content-based recommendation tries to compare items using their characteristics (movie genre, actors, book’s publisher or author… etc) to recommend similar new items.

In a nutshell, memory-based techniques rely heavily on simple similarity measures (Cosine similarity, Pearson correlation, Jaccard coefficient… etc) to match similar people or items together. If we have a huge matrix with users on one dimension and items on the other, with the cells containing votes or likes, then memory-based techniques use similarity measures on two vectors (rows or columns) of such a matrix to generate a number representing similarity.

Model-based techniques on the other hand try to further fill out this matrix. They tackle the task of “guessing” how much a user will like an item that they did not encounter before. For that they utilize several machine learning algorithms to train on the vector of items for a specific user, then they can build a model that can predict the user’s rating for a new item that has just been added to the system.

Since I’ll be working on news recommendations, the latter technique sounds much more interesting. Particularly since news items emerge very quickly (and disappear also very quickly), it makes sense that the system develops some smart way of detecting when a new piece of news will be interesting to the user even before other users see/rate it.

Popular model-based techniques are Bayesian Networks, Singular Value Decomposition, and Probabilistic Latent Semantic Analysis (or Probabilistic Latent Semantic Indexing). For some reason, all model-based techniques do not enjoy particularly happy-sounding names.

你可能感兴趣的:(推荐系统)