by Mariya Yao

How to develop a hyper-personalized recommendation system

Interview with Jack Chua of Expedia

As part of our AI For Growth executive education series, we interview top executives at leading global companies who have successfully applied AI to grow their enterprises. Today, we sit down with Jack Chua, Director of Data Science at Expedia.

Jack has built automated trading systems for multi-million dollar investment portfolios, terabyte-scale recommender systems for Amazon, and personalized marketing for an iconic American beverage brand. Previously he led industrial applications of machine learning for global clients of Boston Consulting Group (BCG). Now he heads the data science team at Expedia, where he applies his deep expertise in product and pricing optimization.

Watch the episode (or read the transcript below) to learn:

  1. A brief history of the evolution of the recommender system
  2. How to choose the right KPI metric to come up with the right mixture of recommendations
  3. Pitfalls to avoid when building your own recommender system
  4. Hot topics in applying AI to retail in the next three to five years

Marlene Jia: Thank you all for joining our AI for Growth Executive series. In this interview series, we learn from executives at leading global companies who have successfully applied AI to their enterprise and their team.

My name is Marlene, and you can call me MJ. Today we’ll be chatting with Jack Chua, who’s the Director of Data Science at Expedia. He also has a background at Amazon and BCG, so I think we’ll have a lot of really interesting things to learn from Jack today.

Jack, thanks for joining us. For our first question, we would love to hear more about you, your story, and how you first became interested in AI.

Jack Chua: It’s a field that a lot of us inadvertently land into. Obviously it’s super interesting and very high impact.

My path has been a bit unconventional. I started studying theoretical math at the University of Chicago. Maybe two or three years in, I thought to myself, “What the heck am I going to do with this degree?”

I looked into what was hot at the time. This was back in 2006. A lot of my peers were going into investment banking, hedge funds, that kind of space. I was thinking, “Is there a way to apply the theoretical math that I learned and merge that into the sexy M&A banking type of thing?”, and I stumbled into the field of quantitative finance.

I started voraciously reading books about option pricing, volatility trading, things where you can basically determine the underlying stochastic process of the instrument and trade, given its dynamics.

I was lucky. I got in before the whole subprime financial crisis hit, but I would say that was how I started bridging the gap between theoretical math and industry.

Another point in time came when I started doing high-frequency trading. It’s a field where not just the statistical elements of the trade are necessary but also the engineering elements, and I realized the gap between knowing the theory [and] actually being able to implement the theory, whether in C++ or Python or what-have-you.

At the time, coming from a pretty theoretical academic background and job, I had to sit down and learn how to code from scratch. That’s what led me to go back to graduate school at Georgia Tech in applied mathematics, and that’s where machine learning was really starting to burgeon. I started taking computational data analysis courses and learned from some of the best professors in the world there. A few years later, here we are today.

MJ: You’ve worked at Amazon, [and] you’ve worked on numerous projects within recommendation engines. How did you get started there and where did that lead you? Obviously you’re at Expedia now.

JC: Recommendation engines are a small part of what I do, but it’s a very important part. The way that I like to describe it is “surfacing the right content to the right customer at the right time and in the right channel.” On top of the content, it’s [a] whole ecosystem of how that content is displayed and what’s the context.

Maybe this is pre-empting other things you might ask me, but when you think about recommending something to someone, there’s a real business reason why you might want to do that, whether it’s to encourage cross-selling of products, or increasing the frequency of someone coming back to your website because you’ve got great information and so on. I think recommender engines have a great tie-in to the underlying KPI of what you want to drive.

I think that’s what drove Amazon to invest in data scientists and engineers to work on the recommender systems, because the underlying context of why someone buys something is so complex. It could be the fact that it’s seasonal, maybe there’s a big discount that’s about to happen, or maybe it’s utilitarian, so they actually just thought of something [like] “Hey, I want this USB stick”, so they just go to our website and buy it.

All these contextual clues combine in a way that classical business intelligence or business rules cannot capture. It’s like a pure play AI problem, it’s something that using business rules or human rules would be suboptimal.

MJ: Before we dive into all the details, I would love for you to tell us exactly what a recommendation engine really is, walk us through how you would go about building it, and what the factors of consideration are, which you started talking about.

JC: Just starting from pure axiomatic definition, a recommender system is real estate on a channel of some sort, whether it’s email or a website or mobile app, what-have-you. You’ve probably seen it on Amazon, it’s a ribbon that has multiple products on it. It could be an email that has multiple components that has different products. There’s building blocks that comprise these emails or these marketing materials, and it’s the job of the human to figure out what should go in there.

What drives the system is usually a whole engineering workstream and ecosystem that goes into a simple email that lands in your mailbox. It could have UX designers designing what the email looks like, whether it’s summer festive-looking or maybe something that looks more transactional.

It could also just be data-driven, which is what data scientists tend to do. This is called a batch process. They’ll train the models in the backend, and the models could look at historical transaction data, customer demographics, and the product metadata itself, [like] the difference between a USB stick, a TV, and a toy. They take all this information and provide the marketers with a list that maps customer IDs to all the products they think are going to be relevant.

There are also real-time processes where literally the minute someone clicks on the email, it sends a signal back to the data scientists, [who] then immediately incorporate it back into the next touch point. Where I’ve seen that done best is probably Amazon, or actually travel websites like Booking.com and Expedia.

Retail is a space where the margins [are] so tight that to really innovate in this space, you have to think of different contexts and ways to understand what the customer’s trying to buy that’s out of the ordinary.

MJ: You talked about Amazon doing it the best. Obviously they’ve been working on this for a long time. What spurred the initial development of this? It makes sense to us today, but Amazon and Google were some of the first companies to do this.

Can you tell us about the thought process and how it’s evolved since the time that you were there?

JC: Evolution might be the more natural point to start. All of these things in industry are generally driven by a business purpose. Recommender systems for retail [center around this idea of] “Hey, we have this dynamic real estate that no longer is limited to signposts or billboards, and it can change every second or maybe even every time someone lands on the website.” It’s now dynamic real estate, and in order to capture the dynamic-ness of the real estate, there needs to be some way to incorporate data seamlessly to update it.

Usually that’s how it started. There’s many things in practice that could be called recommendation systems, but we can just focus on websites for now.

If you land on Amazon, you have multiple ribbons, and every ribbon has a different purpose, which is a good example of how different KPIs can be targeted. One of them could be “these are things you’ve searched before” or “these are things we think you’ll like,” based on what you bought before. Amazon is usually pretty good about explaining what the ribbon is, so when you land on it there’s a contextual anchoring, and you see exactly what the recommendation is for.

Historically, those ribbons have been driven by business rules, so a very simple business rule might be “show things that you’ve bought before so you’ll buy them again.” Shockingly, that led to a lot of lift when Amazon implemented the Buy-It-Again module. That’s something that doesn’t require any intelligence, just looking at what someone has bought before and inserting the exact same thing. That can work for things that customers buy regularly, like pet food or various beauty products, but it tends not to work in things like fashion, where if you buy one shirt, you’re probably not going to buy the exact same shirt again.

That’s what drives the evolution of the recommender system: what is the product, what is the KPI for the product, and are you trying to incentivize a purchase of the same thing or something like it? Based on that, you can tailor your algorithm.

A simple business-rule algorithm for cross-sell might be association rule mining: looking at things that tend to be bought in the same basket.

In a grocery store context, there’s staples and things that are often cross-sold because they have a higher margin, maybe milk, bread, and eggs. Someone goes into the grocery store to buy milk, maybe then a natural thing to get next is bread.

That’s something that doesn’t require predictive analytics. It’s really just mining the data to see what patterns emerge.
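
As a rough illustration of the basket-mining idea, here is a minimal sketch that counts which pairs of items show up together and turns them into simple "if A then B" rules. The baskets, the `pair_rules` helper, and the 0.4 support threshold are all made up for the example; it is not how any particular retailer does it.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets; each set is one checkout.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"milk", "bread", "butter"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (antecedent, consequent, support, confidence) for frequent pairs."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(
        pair for b in baskets for pair in combinations(sorted(b), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # Each frequent pair yields two directed rules: a -> b and b -> a.
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    # Highest confidence first; ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

rules = pair_rules(baskets)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

Nothing here is predictive: the rules only summarize what already happened in past baskets, which matches the point about mining rather than forecasting.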

Another way of doing it is based on something called collaborative filtering. This is [before] predictive models…actually I would say it’s not outdated because it’s newer than some predictive models…but it’s a relatively simple approach.

The idea is to look at people that are similar to you in transactions.

Marlene and I are pretty similar at Amazon, so maybe we’ve both bought the exact same things except for one item. A natural thing to do is say, “Okay, because Marlene and Jack have been almost exactly the same until that one item, let’s just recommend that item to both Marlene and Jack and see what happens.”

It’s not as simple as that, because the similarity can go multiple ways. It could be I’m 10% of Marlene and 10% of someone else and 80% of someone else, and that combines into essentially a linear combination of different people. That score that’s extracted for every product for me is a combination of different people’s weights. That was a breakthrough, and Netflix and the movie recommendation problem spurred the development of collaborative filtering.
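
The "linear combination of different people" idea can be sketched in a few lines. This is a toy user-based collaborative filter, assuming cosine similarity as the weight between customers; the purchase data and the `recommend` helper are hypothetical, and production systems are far more involved.

```python
from math import sqrt

# Hypothetical purchase history: 1 means the customer bought the item.
purchases = {
    "jack":    {"usb_stick": 1, "tv": 1, "toy": 1},
    "marlene": {"usb_stick": 1, "tv": 1, "toy": 1, "headphones": 1},
    "sam":     {"toy": 1, "pet_food": 1},
}

def cosine(u, v):
    """Cosine similarity between two sparse purchase vectors."""
    dot = sum(u[i] * v[i] for i in u if i in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def recommend(user, purchases):
    """Score unseen items as a similarity-weighted sum over other customers."""
    me = purchases[user]
    scores = {}
    for other, items in purchases.items():
        if other == user:
            continue
        weight = cosine(me, items)  # how much of "other" is in "user"
        for item, value in items.items():
            if item not in me:
                scores[item] = scores.get(item, 0.0) + weight * value
    return sorted(scores.items(), key=lambda kv: -kv[1])

ranked = recommend("jack", purchases)
print(ranked)
```

In this toy data Jack overlaps heavily with Marlene and only slightly with Sam, so the one item Marlene has that Jack lacks ends up ranked above Sam's.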

Then it moved into, “Can we make collaborative filtering even better?” Are there ways to start incorporating variables into the collaborative filtering so [that] we know not only that Jack and Marlene are similar, but that they are similar because these products are similar? Or, maybe Jack and Marlene are similar because they both live in the same city and they’re both in the same demographic? That removed one of the drawbacks of the traditional collaborative filtering, which had no variables.

Another thing that came up was the idea of neural networks, and obviously that’s been a big thing with deep learning and so on. Because deep learning enables someone to take in the raw transactions, you can incorporate so much more information and just let the algorithm do things a priori.

With a neural network, you can take in transactions, you can take in product information, you can take in customer information…. All that can just flow in its raw form. There’s no need to create any new variables, and the algorithm will just figure out the rank order of products that you like to buy.

MJ: Deep learning is such a popular term and method [that] people are starting to explore. I would love to hear more.

JC: The traditional way to think about neural nets is as function approximators. Actually, a linear regression is a neural net, just with one layer and one set of weights. Think back to high school math: when you do the chain rule, you have f(g(h(x))) equal to something. That’s essentially what a neural net is, but it’s actually discovering the f, g, and h that most accurately model your data.

The most generic form of the deep neural net is a multi-layer feed-forward neural network. You can create multiple layers that are called dense layers. It’s dense because each node, which represents a variable, connects to every node in the next layer.
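
To make the composition picture concrete, here is a small sketch of dense layers as plain functions. The weights are hand-picked for illustration rather than learned, and the `dense` helper is a hypothetical name; a real network would discover the weights through training.

```python
def dense(weights, biases, activation=lambda z: z):
    """Build one dense layer: every input feeds every output node."""
    def layer(x):
        return [
            activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)
        ]
    return layer

def relu(z):
    return max(0.0, z)

# One layer with an identity activation is just linear regression: y = w.x + b
linear_regression = dense([[2.0, -1.0]], [0.5])

# Stacking layers gives the composition f(g(h(x))).
h = dense([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -1.0], relu)  # 2 -> 3 nodes
g = dense([[1.0, -1.0, 0.5], [0.5, 0.5, 0.5]], [0.0, 0.0], relu)        # 3 -> 2 nodes
f = dense([[1.0, 2.0]], [0.0])                                            # 2 -> 1 node

def network(x):
    return f(g(h(x)))

print(linear_regression([3.0, 1.0]))  # [5.5]
print(network([1.0, 2.0]))            # [5.0]
```

Training amounts to adjusting the numbers inside `h`, `g`, and `f` so that the composed function fits the data.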

A node could be the type of product you buy, or a node could be a customer. A neural net could have tens of thousands of nodes. With the advancements in back-propagation and all the new inference techniques, it’s now way easier to train a deep neural network than it has been in the past.

I’m not an expert in the underlying technology, but definitely an advanced practitioner.

MJ: You had mentioned earlier that some of these methods are actually not predictive. Which of the methods are predictive, then? What are the current methods that companies like Netflix and Amazon are using?

JC: Collaborative filtering, I think it maybe could be predictive, but most of how people use it is not predictive. That’s historically how Netflix has used it, and that’s just a limitation of the collaborative filtering and matrix factorization family of algorithms.

The reason is because the way you think about it is actually a matrix completion problem. If you think of a matrix, each customer would represent one row, and every column would represent the product, and every cell in a matrix is a score that represents how much a customer likes that product. In Netflix’s case, [it would be] how much a customer likes that movie. As you can imagine, that matrix is probably pretty sparse, because not every customer has seen every movie.

That’s one of the drawbacks of collaborative filtering: it’s not only not predictive, but often you’ve got a really sparse matrix.
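
The matrix completion framing can be sketched with a tiny matrix factorization fitted by stochastic gradient descent. Everything below is an illustrative assumption: the ratings, the rank `k`, the learning rate, and the regularization are arbitrary choices, and this is not a description of Netflix's actual system.

```python
import random

# Hypothetical ratings: (customer, movie) -> score. Missing pairs are unobserved cells.
ratings = {
    (0, 0): 5.0, (0, 1): 4.0,
    (1, 0): 4.0, (1, 1): 5.0, (1, 2): 1.0,
    (2, 1): 1.0, (2, 2): 5.0,
}
n_customers, n_movies, k = 3, 3, 2

def factorize(ratings, n_rows, n_cols, k, epochs=500, lr=0.02, reg=0.02, seed=0):
    """Fit customer factors P and movie factors Q on the observed cells only."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_cols)]
    for _ in range(epochs):
        for (u, i), r in ratings.items():
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    return sum(p * q for p, q in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    squared = [(r - predict(P, Q, u, i)) ** 2 for (u, i), r in ratings.items()]
    return (sum(squared) / len(squared)) ** 0.5

P0, Q0 = factorize(ratings, n_customers, n_movies, k, epochs=0)  # untrained baseline
P, Q = factorize(ratings, n_customers, n_movies, k)

print("RMSE before training:", rmse(P0, Q0, ratings))
print("RMSE after training: ", rmse(P, Q, ratings))
print("Filled-in score for customer 0, movie 2:", predict(P, Q, 0, 2))
```

The filled-in cell is an estimate of a historical-style score for a pair that was never observed, which is different from forecasting what the customer will do next week.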

The way to initialize this matrix is to use some historical figure. Maybe this is Jack’s rating of the movie, because he watched the movie last week and gave it a 5. It’s not predictive in the sense that it doesn’t tell you whether I would give the movie a 5 next week.

I think part of Netflix decided that this was fine because movie preference is generally pretty stable over time. If you watched a movie a year ago, there’s a good chance you probably still like the movie now, so the predictive component is not as important for this problem.

In transactions or retail, it’s quite possible that your preference changes rapidly, or the minute you buy something, it no longer has the same recommendation power.

That’s where the predictive elements come in, and you can frame recommender systems in a similar way to how you frame other problems like churn detection or anomaly detection. You have an X matrix that contains all your variables, and these X variables are historical. If you’re at time t, these are things from time t-1 and before. Your Y variable is the thing you’re predicting, so it’s actually at time t+1, or t plus a month, or whatever forward lead period you have.

That’s how most classification and regression problems are structured. If you train a deep neural network, that’s pretty much exactly the information that goes into training your neural network.

I’m not talking about tensors or anything that’s more dimensional than that, but generally speaking, you’ll have a dataset that contains your historical data, you’ll have an objective function or Y variable that contains the future information you want to predict.

What happens is, when you’re trying to predict something given your current point in time, you’re not leaking information. You’re only using the information you know at time t to predict time t+1.
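
One way to picture the no-leakage setup is a cutoff date that separates features from the label. The sketch below is a minimal example under made-up assumptions: the transaction log, the two features, and the `build_example` helper are all hypothetical.

```python
from datetime import date

# Hypothetical transaction log: (customer_id, product, purchase_date).
transactions = [
    ("c1", "usb_stick", date(2018, 1, 5)),
    ("c1", "tv",        date(2018, 2, 10)),
    ("c2", "toy",       date(2018, 1, 20)),
    ("c2", "usb_stick", date(2018, 3, 2)),
    ("c3", "tv",        date(2018, 3, 15)),
]

def build_example(transactions, customer, product, cutoff, horizon_end):
    """Features (x) use only purchases before `cutoff`; the label (y) only
    looks at the window [cutoff, horizon_end), so nothing leaks backwards."""
    past = [t for t in transactions if t[0] == customer and t[2] < cutoff]
    future = [
        t for t in transactions
        if t[0] == customer and cutoff <= t[2] < horizon_end
    ]
    x = {
        "n_past_purchases": len(past),
        "bought_product_before": int(any(t[1] == product for t in past)),
    }
    y = int(any(t[1] == product for t in future))
    return x, y

cutoff, horizon_end = date(2018, 3, 1), date(2018, 4, 1)
examples = {
    c: build_example(transactions, c, "usb_stick", cutoff, horizon_end)
    for c in ("c1", "c2", "c3")
}
for customer, (x, y) in examples.items():
    print(customer, x, "-> bought in the next month:", y)
```

Stacking the `x` dictionaries gives the X matrix and the `y` values give the target, and the same layout works whether the model is a logistic regression or a deep network.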

MJ: To your point about all of these use cases that you mentioned, like retail, movies, so many brands (both small and large) talk about building a recommender engine of some sort. What really makes it powerful?

With Amazon obviously you can say that they have a big data set, but what are some of the other variables that allow a recommendation engine to be better than the next?

JC: Data is number one, [like] the data assets that your company has housed. If it’s Amazon, obviously it’s the incredible number of customers and transactions they have, the incredible product diversity they have, so not just in the popular products but even in the long tail.

After data, then the algorithm is what can help differentiate. If you’re a company that’s using simple business rules versus deep neural networks or something like that, that makes a big difference as well.

Another thing that a lot of people underestimate but is super important is the customer experience. Instead of just throwing a ribbon with a bunch of recommendations, thinking through a whole ecosystem with things like “what are all the touch points that customers are receiving”, “am I fatiguing the customer with too many touch points”, “is it information overload”, “am I representing the intent correctly”?

Amazon, for instance, historically has been utilitarian. A good number of customers [go] to Amazon with an explicit agenda in mind: they want to buy specific things, [so] they type them in and buy them. [Amazon] tried really hard to leverage that, because it’s a good thing that people come to Amazon with a purpose. Leverage that, lean into that to see how we can cross-sell better, sell things in different channels [such as] digital media, and sell hardware [like] Kindles.

MJ: I don’t know if you saw the Mary Meeker report. She had said that 49% of people who go to Amazon start and basically end with Amazon. They search through Amazon and then they purchase through Amazon.

To your point, I think you’re right, most people do come with the intent of purchasing there. It’s very utilitarian.

JC: Another really great example that I love to give is Starbucks. If you’re part of the Starbucks loyalty program, more often than not you’ve seen games that pop up either through email or your app.

It’s a game so it’s a different medium than your traditional recommender system, but underlying that game is actually a heavy customer data-driven recommender engine, which determines what products you like, what is the type of engagement you need in order to be a more valuable customer, and so on.

The main point is that data scientists have to work in lockstep with designers, marketers, or business analysts in order to come up with the optimal experience.

Otherwise, something like a game requires so many different cross-functional lines of thought that I don’t think either a data scientist or a marketer could have done it by themselves. Data scientists would have just tried to figure out the recommender problem and not thought about the experience, whereas marketers would not even realize that data science could be used to optimize something. Literally, for as many customers as you have, you can pull that many levers. If you have 15 million customers, you literally could have 15 million variants using data science.

MJ: We’ve already talked about some examples where [companies have] built a really strong recommendation engine. Can you give some other examples of good recommendation engines you’ve seen out there, whether it be brands or specific use cases within companies?

JC: There’s so many. I love Spotify. I think they do a great job with both the customer-based recommender system, meaning finding out which other people are like you and what they like to listen to…

MJ: They squash SoundCloud at this point, in my opinion. I used SoundCloud before Spotify…

JC: I did too, actually. But yes, Spotify definitely is. In terms of engineering and the sophistication of what they recommend, they just do so much more.

The interesting thing — most likely, I don’t know for sure because I don’t work there — they actually look at the music itself. Whether it’s the tags of the music, like what kind of genre is it, who’s the artist…they also probably look at what’s the tempo, what are the instruments, and they’re actually digging into the DNA of the music, similar to what Pandora was doing and use that to figure out what types of music you’d like. So, going beyond genre as well.

I’ve noticed that sometimes Spotify recommends me things that I didn’t think I would ever like listening to, but the more I listen to it, the more I realize it actually is similar to things that I like.

Another one is YouTube. They have slightly different KPIs than retail, in the sense that most of their KPIs are not transaction-based. They center on engagement, so they’re designed to keep you on the website for longer.

I think their process probably starts with the design, so what is the actual UX that we want to enable someone that’s currently on the platform to do, so kind of like the whole customer journey mapping and figuring out at this point in a journey, what is the right experience for the customer.

An example might be when you’ve just finished watching a video, that’s a great piece of real estate to step in and figure out based on what was just watched, so contextual on Marlene or Jack finishing this video on fuzzy cats, what is the next thing that we think that they would like to watch?

Oftentimes, these contextual hints pop up [at] opportune moments on a customer journey, and I think YouTube figures out what these journeys are and what these optimal points are, and then builds models around that.

MJ: Now you’ve spurred a different question, which is how exactly people go about creating these engines? You mentioned that with different use cases, you have to start at different questions and different points. For example, you said with YouTube, you might start with a question of design and engagement, but with Spotify, it might be different.

What are the different ways you would even begin to design this engine for a good customer experience?

JC: That’s a good question. I wish there was a one-size-fits-all answer, but I don’t think there is. It’s kind of a gray thing because there’s multiple ways that YouTube or Spotify could have designed their experience but I think it’s generally just working cross-functionally and figuring out what it is, what is the rank order of things that customers care about.

For Amazon, it was, “Hey, our customers are really utilitarian, let’s cross-sell to them a bit more.” For Expedia or travel companies, it’s “Let’s figure out customer segments.” Can we figure out if someone is a luxury traveler or a business traveler? Are they [at] the end of the customer lifecycle, where they’re about to churn and go to some other website, or are they at the beginning, so it’s more about education?

There’s just so many ways to characterize a customer, but I think it generally starts with understanding the customer better, like what segments they’re in, how they engage. Of the people that are falling off, what are the leading indicators that might tell us you’re falling off? Of the people that are just starting, what are the indicators if someone’s actually growing into a stable trajectory? Those are just your typical customer lifecycle modeling, and from there, designing the right experience to maximize the life cycle.

MJ: I assume KPIs are similar. The KPIs for Spotify, for example, or for Amazon are probably different than YouTube, to your point around [whether] you’re going for the purchase or the next song or engagement.

What would you say are some baseline KPIs that you track for these engines?

JC: That’s a really great question. Conversion, whether it’s converting to a transaction or converting into a click, that’s generally the de facto standard for a recommender system. I’ve also seen recommender systems optimized for other things like revenue and profit and maybe even some more esoteric things, like revenue on items that have a higher margin than 30% net of returns, so that objective would be “I want to recommend things that are high margin and people don’t just return it.” So you can be very precise with these KPIs.
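
A precise KPI like that can be written down as a small function. The sketch below is one possible reading, assuming "net of returns" means returned orders are excluded; the order records, field names, and helper names are all hypothetical.

```python
# Hypothetical order records produced by recommendations.
orders = [
    {"item": "socks", "revenue": 10.0,  "margin": 0.40, "returned": False},
    {"item": "tv",    "revenue": 500.0, "margin": 0.10, "returned": False},
    {"item": "shirt", "revenue": 30.0,  "margin": 0.35, "returned": True},
    {"item": "toy",   "revenue": 20.0,  "margin": 0.50, "returned": False},
]

def conversion_rate(n_recommendations_shown, n_conversions):
    """Share of shown recommendations that led to a click or a transaction."""
    return n_conversions / n_recommendations_shown

def high_margin_revenue_net_of_returns(orders, min_margin=0.30):
    """Revenue from items above the margin threshold, excluding returned orders."""
    return sum(
        o["revenue"]
        for o in orders
        if o["margin"] > min_margin and not o["returned"]
    )

print(conversion_rate(200, 8))                      # 0.04
print(high_margin_revenue_net_of_returns(orders))   # 30.0
```

The TV dominates plain revenue but contributes nothing to the second KPI, which shows how the choice of objective changes what a model would be rewarded for recommending.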


The thing to realize, in choosing between these, is what am I enabling? In choosing this KPI, what is actually happening to the things that I recommend?


For the margin one, what’s most likely to happen is that if we build a data science model that optimizes profit, naturally you’re gonna start recommending things that are high margin. That may not necessarily be what the customer wants. Even if it doesn’t say high profit, like you don’t explicitly call that out on your website, it still biases your algorithm to pick things that are not representative of a broader set.


Another interesting KPI in the recommender system is diversity. In most models, the model will have a pretty strong conviction of what type of customer you are. It doesn’t take into account the cross-correlation between the things it recommends. Maybe from my going to Amazon, it determines that I really like socks. Most likely in the top ten, you’ll see socks, maybe you’ll see underwear, maybe you’ll see more socks. Although it may be true that I like socks, one slot is enough for that sock. There needs to be a way to take things that scored lower in the algorithm and surface them for the sake of diversity.
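One simple way to act on the "one slot is enough" idea is a per-category cap applied after scoring. The sketch below is only an illustration under assumptions: `cap_per_category`, the item IDs, and the category mapping are all hypothetical, and the interview does not say which re-ranking method was actually used.

```python
def cap_per_category(ranked, category_of, max_per_category=1, k=10):
    """Re-rank a best-first list so each category fills at most
    max_per_category slots; overflow items only backfill leftover slots."""
    counts = {}
    picked, overflow = [], []
    for item in ranked:
        category = category_of[item]
        if counts.get(category, 0) < max_per_category:
            counts[category] = counts.get(category, 0) + 1
            picked.append(item)
        else:
            overflow.append(item)
    # Lower-scored items from new categories are surfaced ahead of repeats.
    return (picked + overflow)[:k]


ranked = ["socks_a", "socks_b", "underwear", "socks_c", "umbrella", "novel"]
category_of = {
    "socks_a": "socks", "socks_b": "socks", "socks_c": "socks",
    "underwear": "underwear", "umbrella": "outdoor", "novel": "books",
}
print(cap_per_category(ranked, category_of, max_per_category=1, k=4))
# ['socks_a', 'underwear', 'umbrella', 'novel']
```

The cap is a blunt instrument: it ignores how much lower the surfaced items scored, so it trades some expected conversion for variety.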


There are these cross-hybridized objective functions where you do want to maximize for conversion, but you have to take into account the customer experience as well. If you give them diversity, then they explore the longer-tailed products, and [maybe] longer term [that] might increase customer value.
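A hybrid objective of this kind can be sketched as a greedy re-ranker in the style of maximal marginal relevance (MMR), a technique swapped in here for illustration rather than one named in the interview. The `rerank_mmr` function, the `trade_off` weight, the relevance scores, and the same-category similarity function are all assumptions.

```python
def rerank_mmr(relevance, similarity, k, trade_off=0.7):
    """Greedily pick k items, scoring each candidate as
    trade_off * relevance - (1 - trade_off) * (max similarity to picked items)."""
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def value(item):
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * relevance[item] - (1 - trade_off) * redundancy
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(candidates), key=value)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical model scores (e.g. predicted conversion probability).
relevance = {"socks_a": 0.9, "socks_b": 0.85, "umbrella": 0.6, "novel": 0.5}
category = {"socks_a": "socks", "socks_b": "socks", "umbrella": "outdoor", "novel": "books"}


def same_category(a, b):
    """Toy similarity: 1.0 for items in the same category, else 0.0."""
    return 1.0 if category[a] == category[b] else 0.0


print(rerank_mmr(relevance, same_category, k=3, trade_off=0.7))
# ['socks_a', 'umbrella', 'novel']
print(rerank_mmr(relevance, same_category, k=3, trade_off=1.0))
# ['socks_a', 'socks_b', 'umbrella']
```

Setting `trade_off` to 1.0 recovers a pure conversion ranking, so the weight is the knob for how much short-term conversion to give up for exposure to longer-tail products.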


MJ: I have so many more questions, but I think we’re running out of time. I want to just ask this final question: what are your final tips for building a good recommendation engine? How do you avoid the pitfalls? What are the things to watch out for?


JC: The number one tip: you know your product better than a vendor does. The vendor can give you advice on the algorithm and what to use, but [don’t] plug-and-play a recommender system into your website without really understanding your business and what drives it. Is it a milk-and-eggs-and-bread type of thing, or a fashion thing where you don’t want to recommend things that are too similar, or a lifecycle thing where you need to recommend things that fit the lifecycle? Only you know that. If a vendor claims to know this better than you, I think it’s a clear sign to stay away.


Number two, you have to consider whether your business is mature enough to actually take in something that’s as complicated as a deep neural network. Obviously a deep neural net will give you cutting-edge performance, [but] your business might not need that depending on where it is.


Something I found is that if you look at all the research papers about random forests versus gradient boosted decision trees versus neural networks, the gain from the cutting edge once you get past the gradient boosted tree is really small. You’ll generally only see a five to ten percent improvement in your accuracy by going to something that’s cutting edge. What this means is that you can get to 80% of what you need with something that’s fairly simple. Take that into account.


Another thing to take into account is the resourcing. If you do decide to build something in-house, it’s hard to find a deep learning expert or a machine learning expert that can maintain it over time. What this means is that your business has to be mature enough to support these engineers, because I think a lot of people have a notion that “once they build it for me, it’s done, I have the capability.”


In reality, it’s something that needs to be maintained and improved over time, and bugs can pop up. No machine learning pipeline is perfect. From a strategic perspective or a technical perspective, you have to think about it as a long-term investment rather than something you just build and throw over the fence. That again brings it back to why it’s more important to build a capability in-house than to have a vendor do it, because the vendor isn’t invested long-term in your business like you are.


MJ: Just thinking about this, I think you are probably the best recommender engine. What do you think would be a good topic for us to cover in another one of our AI for Growth series?


JC: Well, tapping into the human neural network, I think something around pricing would be interesting, [like] dynamic pricing. The ability to price on the fly, especially for companies that are moving away from brick-and-mortar channels or on order, and for companies like Uber where pricing is done on the spot, is an area where machine learning has really just scratched the surface in industry.


Another really interesting thing is the intersection between pricing and personalization. Not just serving the right content but also serving the right price in conjunction with that content, to give personalized promos that are dynamic and tailored to every customer. That looks like the next frontier for retail over the next three to five years. I think it’s going to be happening pretty quickly, because there’s so much value associated with it. I’m happy to come on and discuss more, depending on how many of these you have.


MJ: Thank you so much, Jack. This was such a wonderful conversation, and I’m sure we’ll have you on again another time, so thank you. And thank you everyone for tuning in to AI for Growth. We’ll see you guys at our next episode!



The full transcript from this interview was first published on TopBots.


Translated from: https://www.freecodecamp.org/news/how-to-develop-a-hyper-personalized-recommendation-system-ab9faf41b9a/

