Diversity Recommender Systems in Machine Learning and AI

Artificial Intelligence, Machine Learning

Every day, you are influenced by machine learning and AI recommendation algorithms.

What you consume on social media through Facebook, Twitter, and Instagram; the personalization you experience when you search, listen, and watch through Google, Spotify, and YouTube; what you discover using Airbnb and UberEats: all of these products are powered by machine learning recommender systems.

Recommender systems influence our everyday lives

80% of all content consumed on Netflix and $98 billion of Amazon’s annual revenue are driven by recommendation systems, and these companies continue to invest millions in building better versions of these algorithms.

There are two main types of recommender systems:

  1. Collaborative filtering: finding users similar to you and recommending items based on what those users liked.
  2. Content-based filtering: using your own past history and behavior to recommend items like the ones you already chose.

There is also a hybrid based recommender system, which mixes collaborative and content-based filtering. These machine learning and AI algorithms are what power the consumer products we use every day.
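
To make the three approaches concrete, here is a minimal sketch on made-up data. The `likes` matrix, the `item_features` matrix, and the 50/50 blend weight are all hypothetical; real systems use far larger data and more careful scoring.

```python
import numpy as np

# Hypothetical toy data: rows are users, columns are items, 1 = liked.
likes = np.array([
    [1, 1, 0, 0],   # user 0 (the user we recommend for)
    [1, 1, 1, 0],   # user 1, similar taste to user 0
    [0, 0, 0, 1],   # user 2, different taste
], dtype=float)

# Hypothetical item features: columns are (fantasy, business).
item_features = np.array([
    [1, 0],   # item 0
    [1, 0],   # item 1
    [1, 0],   # item 2
    [0, 1],   # item 3
], dtype=float)

def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def collaborative_scores(likes, user):
    """Score items by what other users liked, weighted by user similarity."""
    sims = np.array([cosine(likes[user], other) for other in likes])
    sims[user] = 0.0                      # ignore the user themself
    scores = sims @ likes
    scores[likes[user] > 0] = -np.inf     # hide items already liked
    return scores

def content_scores(likes, item_features, user):
    """Score items by similarity to the features of items the user liked."""
    profile = likes[user] @ item_features
    scores = np.array([cosine(profile, f) for f in item_features])
    scores[likes[user] > 0] = -np.inf
    return scores

def hybrid_scores(likes, item_features, user, weight=0.5):
    """Blend both score vectors; the weight is an arbitrary illustrative choice."""
    return (weight * collaborative_scores(likes, user)
            + (1 - weight) * content_scores(likes, item_features, user))

best_collab = int(np.argmax(collaborative_scores(likes, 0)))
best_content = int(np.argmax(content_scores(likes, item_features, 0)))
best_hybrid = int(np.argmax(hybrid_scores(likes, item_features, 0)))
print(best_collab, best_content, best_hybrid)
```

In this toy case all three approaches pick the same unseen fantasy item, which is exactly the similarity behavior the rest of this article questions.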

How recommender systems work

The problem is that these algorithms are fundamentally optimizing for the same thing: similarity.

Recommendation algorithms make optimizations based on the key assumption that only similarities are good. If you like fantasy books, you will get recommended more fantasy books; if you like progressive politics, you will get recommended more progressive politics. Following these algorithms limits our world view, and we fail to see new, interesting, and unique perspectives.

Recommender systems lead us down a one-track mind

Like a horse running with blinders, we fall into an echo chamber and the dangerous AI feedback loop where the algorithm’s outputs are reused to train new versions of the model. This narrows our thinking and reinforces biases. Recent events like the Facebook–Cambridge Analytica data breach demonstrate technology’s influence over human behavior and its impact on individuals and society.

Psychology and sociology agree: we fear what we do not know. When people become myopic, that is when the “us vs them” mentality is created and where prejudice is rooted. The civil unrest in the United States and around the world can be linked back to these concepts. Fortunately, research also demonstrates that diversity of perspectives creates understanding and connectedness.

This is also a business problem.

The typical consumer has 3 to 5 preferences:

  • 3 to 5 favorite book or movie genres
  • 3 to 5 most listened-to musical categories
  • 3 to 5 different fashion styles
  • 3 to 5 preferred cuisines

Consumers are diverse

Why is this diverse consumer behavior not better reflected in our technology’s behavior? In fact, if a business can persuade a customer to try a new category, such as turning a running customer into a new road biking customer, that customer is likely to spend 5 to 10x more through onboarding and purchases in that new activity. Every diverse category a business does not recommend represents lost sales and engagement.

The opportunity: how can we build a better recommender system that enables consumer diversity and increases customer lifetime value?

We can approach this problem through the customer lens. Let’s take Elon Musk as a model world citizen: he has publicly stated that he loved fantasy books growing up and that Lord of the Rings had a large impact on him.

But if Elon continued to follow the recommendations of today’s most visible machine learning algorithm on Amazon, he would continue down the path of fantasy, fantasy, and more fantasy. Elon has also stated that business books shaped his world view, with Zero to One as his recommendation. Technology should be enabling, not limiting, more of these connections for everyone.

The status quo leads to more of the same, so how can we better match the customer’s interests?

How can we build a recommendation engine that would take an input book like Lord of the Rings and recommend an output book like Zero to One?

If we can solve this for an individual case like Elon’s, then we can start to see how a better recommender system can diverge from the similarity path to create more meaningful diversity.

Building a diversity recommender system

The data science process:

  1. Define goal
  2. Gather, explore, and clean data
  3. Transform data
  4. Build machine learning recommendation engine
  5. Build diversity recommendation engine proof of concept
  6. Design mockups
  7. Business value hypothesis and target launch

The data science process

1. Define goal

Books are an ideal industry to explore because book categories are clearly delineated and tie directly to potential revenue, unlike music, where the dollar value of listening to new genres is less clear. The goal is to build a recommender system where we input a book and have it output:

  1. Recommendations based on similarities, the status quo algorithm
  2. Recommendations based on diversity, the evolution of the status quo

The long term goal is to build a recommender system that can be applied across various industries, enabling customers to open doors to diverse discoveries and increasing customer lifetime value for the company.

2. Gather, explore, and clean data

Dealing with data

Goodreads provides a good dataset. Within it, we need to determine what is useful, what can be removed, and which datasets to merge.

# Load book data from csv
import pandas as pd
books = pd.read_csv('../data/books.csv')
books
# Explore features
books.columns

There are 10,000 books in this dataset and we want “book tags” as a key feature because it has rich data about the books to help us with recommendations. That data lives in different datasets, so we have to wrangle the data and piece the puzzle together.

# Load book_tags and tags data from csv
book_tags = pd.read_csv('../data/book_tags.csv')
tags = pd.read_csv('../data/tags.csv')

# Merge book_tags and tags
tags_join = pd.merge(book_tags, tags, on='tag_id', how='inner')

# Merge tags_join and books
books_with_tags = pd.merge(books, tags_join, left_on='book_id', right_on='goodreads_book_id', how='inner')

# Store tags into the same book id row
temp_df = books_with_tags.groupby('book_id')['tag_name'].apply(' '.join).reset_index()
temp_df.head(5)

# Merge tag_names back into books
books = pd.merge(books, temp_df, on='book_id', how='inner')
books

We now have book tags all in one dataset.
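
To see what the merge and group-by steps do, here is the same pipeline on three tiny made-up tables. The `toy_*` frames and their values are hypothetical; only the column names mirror the ones used above.

```python
import pandas as pd

# Hypothetical miniature versions of the three tables.
toy_books = pd.DataFrame({'book_id': [1, 2], 'title': ['Book A', 'Book B']})
toy_tags = pd.DataFrame({'tag_id': [10, 11, 12],
                         'tag_name': ['fantasy', 'fiction', 'business']})
toy_book_tags = pd.DataFrame({'goodreads_book_id': [1, 1, 2],
                              'tag_id': [10, 11, 12]})

# Attach tag names to each (book, tag) pair, then attach the pairs to books.
pairs = toy_book_tags.merge(toy_tags, on='tag_id', how='inner')
long_form = toy_books.merge(pairs, left_on='book_id',
                            right_on='goodreads_book_id', how='inner')

# Collapse to one row per book with a space-separated tag string.
tag_strings = (long_form.groupby('book_id')['tag_name']
               .apply(' '.join).reset_index())
toy_result = toy_books.merge(tag_strings, on='book_id', how='inner')
print(toy_result)
```

Each book ends up on a single row whose `tag_name` column holds all of its tags as one string, which is the shape the vectorizer in the next step expects.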

3. Transform data

We have 10,000 books in the dataset each with 100 book tags. What do these book tags contain?

# Explore book tags
books['tag_name']
Example book tags for The Hunger Games and Harry Potter and the Philosopher’s Stone

We want to transform these texts into numerical values so we have data that the machine learning algorithm understands. TfidfVectorizer turns text into feature vectors.

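# Transform text to feature vectors
from sklearn.feature_extraction.text import TfidfVectorizer
# min_df=1 keeps every term; recent scikit-learn versions may reject an integer min_df of 0
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(books['tag_name'])
# Inspect the shape; a dense copy of a matrix this size would need more than 10 GB of memory
tfidf_matrix.shape
This becomes a 10000x144268 matrix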

TF-IDF (Term Frequency-Inverse Document Frequency) measures how important a word is to one document relative to the whole collection of documents. TF counts how often a given word appears within a document. IDF downscales words that appear frequently across documents. Multiplying the two gives each word a weight that is high when the word is frequent in one document but rare in the others.
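
The weighting can be worked by hand on a tiny example. This is a sketch under stated assumptions: the three tag strings are made up, only single words are used (no bigrams or stop-word removal), and the IDF formula is the smoothed form scikit-learn applies by default, followed by L2 normalization of each row.

```python
import math
from collections import Counter

# Hypothetical tag strings for three books.
docs = [
    'fantasy fiction magic',
    'fantasy fiction dragons',
    'business startups fiction',
]

tokenized = [d.split() for d in docs]
n_docs = len(tokenized)

# Document frequency: in how many documents each term appears.
df = Counter(term for doc in tokenized for term in set(doc))

def idf(term):
    """Smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + df[term])) + 1

def tfidf(doc):
    """Return L2-normalized TF-IDF weights for one tokenized document."""
    counts = Counter(doc)
    raw = {t: c * idf(t) for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in raw.values()))
    return {t: v / norm for t, v in raw.items()}

weights = tfidf(tokenized[0])
for term, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f'{term}: {weight:.3f}')
```

The tag that appears in only one book ("magic") gets the largest weight, while the tag shared by every book ("fiction") gets the smallest.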

4. Build machine learning recommendation engine

Now we build the recommender. We can use cosine similarity to calculate the numeric values that denote similarities between books.

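# Use numeric values to find similarities
# TfidfVectorizer L2-normalizes each row by default, so the dot product
# computed by linear_kernel equals the cosine similarity
from sklearn.metrics.pairwise import linear_kernel
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
cosine_sim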

Cosine similarity measures the cosine of the angle between two vectors projected in a multi-dimensional space. The smaller the angle, the higher the cosine similarity. In other words, the closer these book tags are to each other, the more similar the book.
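
A small numeric sketch of the same idea, using made-up three-dimensional vectors in place of real tag vectors. It also checks that for unit-length vectors a plain dot product gives the same number as cosine similarity.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical tag vectors.
book_a = np.array([3.0, 1.0, 0.0])
book_b = np.array([6.0, 2.0, 0.0])   # same direction as book_a
book_c = np.array([0.0, 0.0, 5.0])   # shares no tags with book_a

sim_ab = cosine_similarity(book_a, book_b)
sim_ac = cosine_similarity(book_a, book_c)

# For unit-length vectors the dot product alone is the cosine similarity.
unit_a = book_a / np.linalg.norm(book_a)
unit_b = book_b / np.linalg.norm(book_b)
dot_ab = float(np.dot(unit_a, unit_b))

print(sim_ab, sim_ac, dot_ab)
```

Vectors pointing in the same direction score close to 1 regardless of their length, and vectors with no overlap score 0.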

Example of how the cosine similarity matrix works

Next we write the machine learning algorithm.

# Get book recommendations based on the cosine similarity score of book tags

# Build a 1-dimensional array with book titles
titles = books['title']
tag_name = books['tag_name']
indices = pd.Series(books.index, index=books['title'])

# Function that gets similarity scores
def tags_recommendations(title):
    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:11]  # How many results to display
    book_indices = [i[0] for i in sim_scores]
    title_df = pd.DataFrame({'title': titles.iloc[book_indices].tolist(),
                             'similarity': [i[1] for i in sim_scores],
                             'tag_name': tag_name.iloc[book_indices].tolist()},
                            index=book_indices)
    return title_df

This is the foundational code we need for a recommendation engine. This is the building block for Amazon’s $98 billion revenue-generating algorithm and others like it. Almost seems too simple. We can stop here or we can expand our code to show more data insights.

# Get book tags, total tags, and percentage of common tags
def recommend_stats(target_book_title):
    # Get recommended books
    rec_df = tags_recommendations(target_book_title)
    # Get tags of the target book
    rec_book_tags = books_with_tags[books_with_tags['title'] == target_book_title]['tag_name'].to_list()
    # Create dictionary of tag lists by book title
    book_tag_dict = {}
    for title in rec_df['title'].tolist():
        book_tag_dict[title] = books_with_tags[books_with_tags['title'] == title]['tag_name'].to_list()
    # Create dictionary of tag statistics by book title
    tags_stats = {}
    for book, tags in book_tag_dict.items():
        tags_stats[book] = {}
        tags_stats[book]['total_tags'] = len(tags)
        same_tags = set(rec_book_tags).intersection(set(tags))  # Get tags in recommended book that are also in target book
        tags_stats[book]['%_common_tags'] = (len(same_tags) / len(tags)) * 100
    # Convert dictionary to dataframe
    tags_stats_df = pd.DataFrame.from_dict(tags_stats, orient='index').reset_index().rename(columns={'index': 'title'})
    # Merge tag statistics dataframe to recommended books dataframe
    all_stats_df = pd.merge(rec_df, tags_stats_df, on='title')
    return all_stats_df

Now we input Lord of the Rings into the recommendation engine and see the results.

# Find book recommendations
lor_recs = recommend_stats('The Fellowship of the Ring (The Lord of the Rings, #1)')
lor_recs
Top 10 recommended books based on book tag similarity score

We get a list of the top 10 most similar books to Lord of the Rings based on book tags. The recommendations look nearly identical to Amazon’s website:

Amazon.com as of August 2020

Success! Producing a similarity recommendation was part one. Part two is producing diversity.

5. Build diversity recommendation engine proof of concept

This next part is where evolution happens. The diversity recommendation algorithm does not currently exist (publicly) so there will be some art to this science. In lieu of spending months in the research and mathematics lab, how can we build a proof of concept that can either validate or invalidate the feasibility of producing a diversity recommendation? Let’s explore the data.

Since we are reverse engineering through the Elon Musk customer lens and want the recommender to output Zero to One, let’s find where this book is positioned in relation to Lord of the Rings.

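# Find Zero to One book
# Note: this lookup needs recommendations for every book, so first widen the
# sim_scores[1:11] slice in tags_recommendations (for example to sim_scores[1:])
lor_recs[lor_recs.title == 'Zero to One: Notes on Startups, or How to Build the Future']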

In relation to Lord of the Rings, Zero to One ranks 8,592nd out of 10,000 books by similarity. Pretty low. According to the algorithm, these two books are on opposite ends of the spectrum and not similar at all. Statistically, this book sits in the lowest quartile, which means neither you nor Elon would be recommended this diversity of thought.
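
A rank like this can be read off one row of the similarity matrix. A minimal sketch with made-up scores, where `sims` stands in for a row of `cosine_sim`, index 0 is the input book itself, and `target` is the hypothetical position of the book we are looking for:

```python
import numpy as np

# Hypothetical similarity scores of every book to the input book.
sims = np.array([1.0, 0.8, 0.1, 0.5, 0.3])
input_book = 0
target = 2

order = np.argsort(-sims)             # indices from most to least similar
order = order[order != input_book]    # drop the input book itself
rank = int(np.where(order == target)[0][0]) + 1

print(f'rank {rank} of {len(order)}')
```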

# Calculate statistical data
lor_recs.describe()

Using a boxplot, we can better visualize this positioning:

# Boxplot of similarity score
import matplotlib.pyplot as plt
lor_recs.boxplot(column=['similarity'])
plt.show()

# Boxplot of percentage of common tags
lor_recs.boxplot(column=['%_common_tags'])
plt.show()
Positioning based on cosine similarity and common book tags compared to all 10,000 books

Based on your own knowledge, would you say these two books are really so different?

We can explore the data further and find the most common book tags using NLTK (Natural Language Toolkit). First, we clean the text (for example, by replacing hyphens with spaces), tokenize it, and then remove all the stop words. Once the text is clean, we can calculate the 10 most frequent words that appear in the Lord of the Rings book tags.

# Store book tags into new dataframe
lor_tags = pd.DataFrame(books_with_tags[books_with_tags['title']=='The Fellowship of the Ring (The Lord of the Rings, #1)']['tag_name'])

# Find most frequent word used in book tags
import matplotlib
import nltk
# NLTK's tokenizer models and the 'stopwords' corpus must be downloaded first with nltk.download

top_N = 10
txt = lor_tags.tag_name.str.lower().str.replace('-', ' ', regex=False).str.cat(sep=' ')  # Replace hyphens with spaces
words = nltk.tokenize.word_tokenize(txt)
word_dist = nltk.FreqDist(words)

stopwords = nltk.corpus.stopwords.words('english')
words_except_stop_dist = nltk.FreqDist(w for w in words if w not in stopwords)

print('All frequencies, including STOPWORDS:')
print('=' * 60)
lor_rslt = pd.DataFrame(word_dist.most_common(top_N),
                        columns=['Word', 'Frequency'])
print(lor_rslt)
print('=' * 60)

lor_rslt = pd.DataFrame(words_except_stop_dist.most_common(top_N),
                        columns=['Word', 'Frequency']).set_index('Word')
matplotlib.style.use('ggplot')
lor_rslt.plot.bar(rot=0)
plt.show()
Most frequent words in the Lord of the Rings book tags

Since we want diversity and variety, we can take the most frequent words, “fantasy” and “fiction”, and filter by words that are unlike them in the context of book genres. These might be words like non-fiction, economics, or entrepreneurial.

# Filter by unlike words
lor_recs_filter = lor_recs[(lor_recs['tag_name'].str.contains('non-fiction')) & (lor_recs['tag_name'].str.contains('economics')) & (lor_recs['tag_name'].str.contains('entrepreneurial'))]
lor_recs_filter

This narrows down the list to include only books whose tags contain all three of “non-fiction”, “economics”, and “entrepreneurial”, because the filter combines the conditions with &. To ensure our reader is recommended a good book, we merge ‘average_rating’ back into the dataset and sort the results by the highest average book rating.
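
The choice of operator in the filter matters: `&` keeps a book only when every word is present, while `|` keeps it when any word is present. A minimal sketch on a made-up frame (the `toy` titles and tags are hypothetical):

```python
import pandas as pd

# Hypothetical recommendations with space-separated tag strings.
toy = pd.DataFrame({
    'title': ['Book A', 'Book B', 'Book C'],
    'tag_name': ['non-fiction economics entrepreneurial',
                 'non-fiction history',
                 'fantasy fiction'],
})

words = ['non-fiction', 'economics', 'entrepreneurial']
masks = [toy['tag_name'].str.contains(w, regex=False) for w in words]

all_mask = masks[0] & masks[1] & masks[2]   # every word must appear
any_mask = masks[0] | masks[1] | masks[2]   # at least one word must appear

titles_all = toy.loc[all_mask, 'title'].tolist()
titles_any = toy.loc[any_mask, 'title'].tolist()
print(titles_all, titles_any)
```

The stricter `&` form gives a shorter, more targeted list; the looser `|` form gives more candidates to sort by rating.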

# Merge recommendations with ratings
lor_recs_filter_merge = pd.merge(books[['title', 'average_rating']], lor_recs_filter, on='title', how='inner')

# Sort by highest average rating
lor_recs_filter_merge = lor_recs_filter_merge.sort_values(by=['average_rating'], ascending=False)
lor_recs_filter_merge

What appears at the top of the list? Zero to One. We engineered our way into recommending diversity.

There was only one leap of faith, but quite a large one, in linking the relationship between Lord of the Rings and Zero to One. The next step to moving this forward would be in programmatically identifying the relationship between these two books and other books so it becomes reproducible. What is the math and logic that might drive this? The majority of current machine learning and AI recommendation algorithms are based on finding similarities. How can we find diversity instead?
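
One possible starting point, which is not the method used in this article, is to re-rank candidates with a maximal marginal relevance (MMR) style score borrowed from information retrieval: each pick is rewarded for similarity to the input book and penalized for similarity to books already picked. The function name, the `trade_off` parameter, and the numbers below are all hypothetical.

```python
import numpy as np

def mmr_rerank(query_sims, item_sims, k, trade_off=0.7):
    """Pick k items, trading relevance to the query against redundancy.

    query_sims: 1-D array, similarity of each candidate to the input book.
    item_sims:  2-D array, pairwise similarity between candidates.
    trade_off:  1.0 is pure similarity; lower values favor diversity.
    """
    candidates = list(range(len(query_sims)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max((item_sims[i][j] for j in selected), default=0.0)
            return trade_off * query_sims[i] - (1 - trade_off) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Hypothetical candidates: 0 and 1 are near-duplicates, 2 is different.
query_sims = np.array([0.9, 0.85, 0.3])
item_sims = np.array([
    [1.0, 0.95, 0.1],
    [0.95, 1.0, 0.1],
    [0.1, 0.1, 1.0],
])

similar_only = mmr_rerank(query_sims, item_sims, k=2, trade_off=1.0)
diversified = mmr_rerank(query_sims, item_sims, k=2, trade_off=0.3)
print(similar_only, diversified)
```

With `trade_off=1.0` the two near-duplicates are returned; lowering it swaps the second pick for the dissimilar item. Whether such a score would surface a pairing like Lord of the Rings and Zero to One on the real data is untested here.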

Intuitively, we can understand that someone interested in fantasy books can also benefit from learning about entrepreneurship. However, our algorithms currently do not provide this nudge. If we can solve this problem and build a better algorithm, it will not only help the customer significantly but also increase a company’s revenue through more sales and more satisfied customers. First, we bridge the gap through domain knowledge and expertise for the book industry. Then we move towards applying this across other industries. If you have thoughts on this, let’s chat. This may be a game-changer.

6. Design mockups

How might we design this? We could deploy our algorithm to allocate a certain percentage to the exploration of diversity recommendations, such as a split of 70% similarity and 30% diversity recommendations.

One potential user gave feedback that they would like to see a “diversity switch”. Let’s mock up this potential design.

Design mockup of Amazon.com with a “diversity recommendation” switch

Once the customer switches it on, we can keep 3 books as the usual similarity recommendations and the next 2 as our diversity recommendations.
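
The switch logic can be sketched in a few lines. The function name `build_shelf`, its parameters, and the placeholder titles are hypothetical; the 3 plus 2 split follows the text.

```python
def build_shelf(similar, diverse, switch_on, slots=5, diverse_slots=2):
    """Return the titles to display for one recommendation shelf."""
    if not switch_on:
        return similar[:slots]
    # Keep the top similarity picks, then fill the rest with diversity picks.
    kept = similar[:slots - diverse_slots]
    extras = [t for t in diverse if t not in kept][:diverse_slots]
    return kept + extras

# Placeholder titles standing in for the two ranked lists.
similar = ['Sim 1', 'Sim 2', 'Sim 3', 'Sim 4', 'Sim 5']
diverse = ['Div 1', 'Div 2', 'Div 3']

shelf_off = build_shelf(similar, diverse, switch_on=False)
shelf_on = build_shelf(similar, diverse, switch_on=True)
print(shelf_off)
print(shelf_on)
```

Changing `diverse_slots` is one way to express other splits, such as the 70/30 allocation mentioned earlier.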

Switching on diversity

In our product metrics, we would track the number and percentage of times users interact with this switch, increase/decrease in the number of product pages visited, other subsequent user flow behaviors, and conversion rate of purchasing the recommended books or other products. As a potential customer, what are your thoughts on this?
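
As a rough illustration of how two of these metrics could be computed, here is a sketch over a made-up event log. The event names (`page_view`, `switch_on`, `purchase`) and the log format are assumptions, not an existing tracking schema.

```python
# Hypothetical event log: (user_id, event_name).
events = [
    ('u1', 'page_view'), ('u1', 'switch_on'), ('u1', 'purchase'),
    ('u2', 'page_view'),
    ('u3', 'page_view'), ('u3', 'switch_on'),
    ('u4', 'page_view'), ('u4', 'purchase'),
]

users = {u for u, _ in events}
switch_users = {u for u, e in events if e == 'switch_on'}
buyers = {u for u, e in events if e == 'purchase'}

# Share of users who turned the switch on at least once.
switch_rate = len(switch_users) / len(users)

# Purchase conversion for users with and without the switch.
conv_with_switch = len(buyers & switch_users) / len(switch_users)
conv_without_switch = len(buyers - switch_users) / len(users - switch_users)

print(switch_rate, conv_with_switch, conv_without_switch)
```

Comparing the two conversion rates in a controlled test would show whether the switch helps or hurts purchases.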

7. Business value hypothesis and target launch

What is the potential business value behind this idea? We can start by narrowing down the target customer launch to Amazon.com USA customers who, in the past 12 months, purchased a book, searched for fantasy books, and bought from multiple book categories. Conservative assumptions:

USA customers: 112 million
x book buying customers: 25%
x searches for fantasy: 25%
x bought from multiple book categories: 25%
= roll out to 1.75 million customers
x conversion rate: 10%
= 175,000 customers convert
x increase in average annual spend: $40
= $7 million additional annual revenue

In 2019, the average Amazon customer spend was about $600 per year and Amazon’s annual revenue was $280 billion. This estimate is light in comparison, which is good for an initial rollout test. If we increase the scope of the launch, we will get a larger potential value. Let’s expand our reach and roll this out to all USA Amazon customers who have purchased a book, with a conservative assumption of 25%:

USA customers: 112 million
x book buying customers: 25%
= roll out to 28 million customers
x conversion rate: 10%
= 2.8 million customers convert
x increase in average annual spend: $40
= $112 million additional annual revenue

Finally, if we are more aggressive and assume half of Amazon customers can be book buyers, with a higher conversion rate and a larger increase in average annual spend, we get into the billions of additional value:

USA customers: 112 million
x book buying customers: 50%
= roll out to 56 million customers
x conversion rate: 20%
= 11.2 million customers convert
x increase in average annual spend: $90
= $1 billion additional annual revenue
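
The three scenarios can be recomputed with one small function. The function name and argument layout are hypothetical, and the third 25% filter in the initial scenario (bought from multiple book categories) is inferred from the stated 1.75 million rollout. Note that 56 million customers at a 20% conversion rate gives 11.2 million converting customers, which at $90 each is about $1 billion.

```python
def funnel_revenue(customers, filters, conversion_rate, spend_increase):
    """Apply each filter rate in turn, then conversion and extra spend."""
    reach = customers
    for rate in filters:
        reach *= rate
    converted = reach * conversion_rate
    return reach, converted, converted * spend_increase

US_CUSTOMERS = 112_000_000  # figure taken from the text

# Third filter of 0.25 is an inferred assumption, see the note above.
initial = funnel_revenue(US_CUSTOMERS, [0.25, 0.25, 0.25], 0.10, 40)
book_buyers = funnel_revenue(US_CUSTOMERS, [0.25], 0.10, 40)
aggressive = funnel_revenue(US_CUSTOMERS, [0.50], 0.20, 90)

for name, (reach, converted, revenue) in [('initial', initial),
                                          ('book buyers', book_buyers),
                                          ('aggressive', aggressive)]:
    print(f'{name}: reach {reach:,.0f}, converted {converted:,.0f}, '
          f'revenue ${revenue:,.0f}')
```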

The potential pitfall is that this new recommender system negatively impacts the customer experience and decreases the conversion rate, which becomes a revenue loss. This is why initial customer validation and smaller launch and test plans are a good starting point.

The business value upside is significant, given that 35% ($98 billion) of Amazon’s revenues were generated through recommendation systems. Even a small percentage improvement in the algorithm would amount to millions or even billions of dollars in additional revenue.

The end to end data science process of building a recommender system

What’s Next?

This proof of concept illustrates the potential for diversity recommender systems to improve customer experiences, address societal problems, and add significant business value. It also outlines a feasible data science process to improve our technology.

Through better machine learning and AI recommender systems, technology can enable more diversity of thought. Society’s current ethos says that similarities are good, whereas diversity gets a mixed reception. Perhaps that is why the majority of recommendation systems research and development today has focused only on finding similarities. If we do not implement change within our algorithms, the status quo will give us more of the same.

Technology to enable a diversity of thought

Imagine a world where diversity of thought is enabled across all the products that influence us every day. How might that change the world for you, your friends, and people who think differently from you? Machine learning and AI recommendation algorithms can be a powerful change agent. If we continue pursuing this path of diversity, we can positively impact the people and the world around us.

Translated from: https://medium.com/@allenjiang/diversity-recommender-systems-in-machine-learning-and-ai-a56849c5a256
