Singular Value Decomposition vs. Matrix Factorization in Recommender Systems

Recently, after watching the Recommender Systems class of Prof. Andrew Ng’s Machine Learning course, I found myself very uncomfortable not understanding how Matrix Factorization works.

I know that sometimes the math in Machine Learning is very obscure. It’s better if we think about it as a black box, but that model was very “magical” by my standards.

In such situations, I usually try to search on Google for more references to better grasp the concept. This time I got even more confused. While Prof. Ng called the algorithm (Low Rank) Matrix Factorization, I found a different nomenclature on the internet: Singular Value Decomposition.

What confused me the most was that Singular Value Decomposition was very different from what Prof. Ng had taught, yet people kept suggesting they were the same thing.

In this text, I will summarize my findings and try to clear up some of the confusion those terms can cause.

Recommender Systems

Recommender Systems (RS) are just automated ways to recommend something to someone. Such systems are broadly used by e-commerce companies, streaming services and news websites. They help reduce the friction users face when trying to find something they like.

RS are definitely not a new thing: they have been featured since at least 1990. In fact, part of the recent Machine Learning hype can be attributed to interest in RS. In 2006, Netflix made a splash when they sponsored a competition to find the best RS for their movies. As we will see soon, that event is related to the nomenclature mess that followed.

The matrix representation

There are a lot of ways a person can think of recommending a movie to someone. One strategy that turned out to be very good is treating movie ratings as a Users x Movies matrix like this:

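(Illustrative numbers:)

             Movie 1   Movie 2   Movie 3   Movie 4
  User 1        5         4         ?         1
  User 2        ?         5         4         ?
  User 3        1         ?         2         5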

In that matrix, the question marks represent the movies a user has not rated. The strategy then is to predict those ratings somehow and recommend to users the movies they will probably like.

Matrix Factorization

A really smart realization made by the guys who entered Netflix’s competition (notably Simon Funk) was that the users’ ratings weren’t just random guesses. Raters probably follow some logic where they weigh the things they like in a movie (a specific actress or a genre) against the things they don’t like (long duration or bad jokes) and then come up with a score.

That process can be represented by a linear formula of the following kind:

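rating(u, m) = θᵤᵀ · xₘ = θᵤ₁·xₘ₁ + θᵤ₂·xₘ₂ + … + θᵤₖ·xₘₖ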

where xₘ is a column vector with the values of the features of the movie m and θᵤ is another column vector with the weights that user u gives to each feature. Each user has a different set of weights and each film has a different set of values for its features.

It turns out that if we arbitrarily fix the number of features and ignore the missing ratings, we can find a set of weights and feature values that create a new matrix with values close to the original rating matrix. This can be done with gradient descent, very much like the one used in linear regression, except that now we are optimizing two sets of parameters (weights and features) at the same time.

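To make that concrete, here is a minimal sketch of the idea in Python with NumPy. The toy matrix, the number of features k and the hyperparameters are all made up for illustration; this shows the general technique, not Funk’s actual code (which also added tricks such as regularization):

```python
import numpy as np

# Toy ratings matrix from the example above; NaN marks a missing rating.
R = np.array([
    [5, 4, np.nan, 1],
    [np.nan, 5, 4, np.nan],
    [1, np.nan, 2, 5],
])

n_users, n_movies = R.shape
k = 2            # number of features, fixed arbitrarily
lr = 0.01        # learning rate
epochs = 5000

rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=(n_users, k))   # per-user weights
x = rng.normal(scale=0.1, size=(n_movies, k))      # per-movie feature values

# Only the known ratings enter the loss.
known = [(u, m) for u in range(n_users)
         for m in range(n_movies) if not np.isnan(R[u, m])]

for _ in range(epochs):
    for u, m in known:
        err = R[u, m] - theta[u] @ x[m]   # error on one known rating
        theta_u = theta[u].copy()
        # One gradient descent step on both parameter sets at once.
        theta[u] += lr * err * x[m]
        x[m] += lr * err * theta_u

# The approximate matrix: every cell filled, including the former '?'s.
print(np.round(theta @ x.T, 1))
```

With enough iterations, the product recovers the known ratings almost exactly and fills in plausible values for the missing ones.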

Using the table I gave as an example above, the result of the optimization problem would be the following new matrix:

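(Again, the exact values are illustrative:)

             Movie 1   Movie 2   Movie 3   Movie 4
  User 1       4.9       4.0       3.3       1.1
  User 2       4.7       4.9       4.0       1.4
  User 3       1.1       1.7       2.0       5.0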

Notice that the resulting matrix can’t be an exact copy of the original one in most real datasets, because in real life people are not doing multiplications and summations to rate a movie. In most cases, the rating is just a gut feeling that can also be affected by all kinds of external factors. Still, our hope is that the linear formula is a good way to express the main logic that drives users’ ratings.

OK, now we have an approximate matrix. But how the heck does that help us predict the missing ratings? Remember that to build the new matrix, we created a formula to fill in all the values, including the ones missing from the original matrix. So if we want to predict a user’s missing rating for a movie, we just take all the feature values of that movie, multiply them by all the weights of that user, and sum everything up. So, in my example, if I want to predict User 2’s rating of Movie 1, I can do the following calculation:

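(With made-up numbers for k = 2 features, say Movie 1’s feature vector is x₁ = (2.0, 0.5) and User 2’s weight vector is θ₂ = (2.2, 0.6). Then:)

θ₂ᵀ · x₁ = 2.2 × 2.0 + 0.6 × 0.5 = 4.4 + 0.3 = 4.7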

To make things clearer, we can disassociate the θ’s and x’s and put them into their own matrices (say P and Q), so that the whole ratings matrix is approximated by the product PQᵀ. That is effectively a Matrix Factorization, hence the name used by Prof. Ng.

That Matrix Factorization is basically what Funk did. He got third place in Netflix’s competition, attracting a lot of attention (which is an interesting case of a third place being more famous than the winners). His approach has been replicated and refined since then and is still in use in many applications.

Singular Value Decomposition

Enter Singular Value Decomposition (SVD). SVD is a fancy way of factorizing a matrix into three other matrices (A = UΣVᵀ). The way SVD is done guarantees that those three matrices carry some nice mathematical properties.

There are many applications for SVD. One of them is Principal Component Analysis (PCA), which is just reducing a dataset of dimension n to dimension k (k < n).

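As a quick sketch of what the “real” SVD looks like, here it is with NumPy’s linear algebra routines (the matrix A is arbitrary), including the rank-k truncation that PCA-style dimensionality reduction relies on:

```python
import numpy as np

A = np.array([[5., 4., 3., 1.],
              [4., 5., 4., 1.],
              [1., 2., 2., 5.]])

# Full decomposition: A = U @ diag(s) @ Vt, with the singular
# values in s sorted in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(A, U @ np.diag(s) @ Vt)

# Dimensionality reduction keeps only the k largest singular
# values, giving the best rank-k approximation of A.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(A_k, 2))
```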

I won’t give you any further details on SVD because I don’t know them myself. The point is that it’s not the same thing as what we did with Matrix Factorization. The biggest evidence is that SVD creates three matrices while Funk’s Matrix Factorization creates only two.

So why does SVD keep popping up every time I search for Recommender Systems? I had to dig a little bit, but eventually I found some hidden gems. According to Luis Argerich:

The matrix factorization algorithms used for recommender systems try to find two matrices: P,Q such as P*Q matches the KNOWN values of the utility matrix.

This principle appeared in the famous SVD++ “Factorization meets the neighborhood” paper that unfortunately used the name “SVD++” for an algorithm that has absolutely no relationship with the SVD.

For the record, I think Funk, not the authors of SVD++, first proposed the mentioned matrix factorization for recommender systems. In fact, SVD++, as its name suggests, is an extension of Funk’s work.

Xavier Amatriain gives us a bigger picture:

Let’s start by pointing out that the method usually referred to as “SVD” that is used in the context of recommendations is not strictly speaking the mathematical Singular Value Decomposition of a matrix but rather an approximate way to compute the low-rank approximation of the matrix by minimizing the squared error loss. A more accurate, albeit more generic, way to call this would be Matrix Factorization. The initial version of this approach in the context of the Netflix Prize was presented by Simon Funk in his famous Try This at Home blogpost. It is important to note that the “true SVD” approach had been indeed applied to the same task years before, with not so much practical success.

Wikipedia also has similar information buried in its Matrix factorization (recommender systems) article:

The original algorithm proposed by Simon Funk in his blog post factorized the user-item rating matrix as the product of two lower-dimensional matrices, the first one has a row for each user, while the second has a column for each item. The row or column associated with a specific user or item is referred to as latent factors. Note that, despite its name, in FunkSVD no singular value decomposition is applied.

To summarize:

1. SVD is a somewhat complex mathematical technique that factorizes matrices into three new matrices and has many applications, including PCA and RS.

2. Simon Funk applied a very smart strategy in the 2006 Netflix competition, factorizing a matrix into two other ones and using gradient descent to find optimal values of features and weights. It’s not SVD, but he used that term anyway to describe his technique.

3. The more appropriate term for what Funk did is Matrix Factorization.

4. Because of the good results and the fame that followed, people still call that technique SVD because, well, that’s how the author named it.

I hope this helps to clarify things a bit.

Translated from: https://www.freecodecamp.org/news/singular-value-decomposition-vs-matrix-factorization-in-recommender-systems-b1e99bc73599/
