A Deep Recommender System
The aim of this post is to describe how one can leverage a deep learning framework to create a hybrid recommender system, i.e. a model exploiting both content and collaborative-filtering data. The idea is to tackle the problem in two steps: first build the collaborative-filtering and content-based parts separately, then combine the two to get better results.
Introduction
Before diving into recommender systems, we have been recommended to introduce ourselves.
We are a team of four good friends with different coding backgrounds, ranging from very limited to advanced experience. We all come from Physics studies (different specialisations, though), and we work in different areas. By creating DeepBenBello.ai, we made a little dream come true. We wanted something to work on together, to start engaging with tech applications and to create something beautiful. Hence this story is the first of a long series, we hope. Stay tuned!
Back to the recommender system: the goal is to design and develop a hybrid algorithm based on the Bayesian Personalised Ranking triplet loss, and to implement such a system in Keras.
What Types of Recommender Systems Exist?
As a quick introduction, let's briefly describe the various types of recommender system. We can split recommender systems into two classes:
Collaborative filtering
Content-based
Collaborative filtering assumes that users with similar tastes in the past will have similar tastes in the future. Netflix uses a variant of CF to recommend movies and shows you might like based on similar users’ tastes in similar movies. Content-based filtering assumes that a user will like items in the future that share features — like brand, cast, genre, etc. — with items they liked in the past. Amazon uses a variant of CBF to recommend books you might want to read.
Encoding and Embeddings
Often data are not as simple as we would like them to be. To put them in a format an algorithm can digest (that is, numbers) we can make use of different tools.
Both encodings and embeddings map categorical data to numerical vectors. The difference between the two is that an encoder (like a one-hot encoder) is a predetermined function associating a vector to each data row, while embedding vectors are low-dimensional and learned. A neural network learns how to locate objects in an embedding space, placing similar entities close to each other.
To summarise, an encoding maps a categorical feature to an m-dimensional vector, where m is the number of categories of the feature. An embedding maps a categorical feature to an n-dimensional vector, where n is a hyper-parameter of the model.
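As a concrete illustration, here is a minimal sketch (the age values are made up) contrasting the two: a feature with m = 4 categories becomes a 4-dimensional one-hot vector, while an Embedding layer can map it to, say, n = 2 dimensions.

import numpy as np
import pandas as pd
from tensorflow.keras.layers import Embedding

age_ranges = pd.Series([18, 25, 35, 50])  # m = 4 categories

# Encoding: a fixed, predetermined function; one m-dimensional vector per category
onehot = pd.get_dummies(age_ranges)
print(onehot.shape)  # (4, 4)

# Embedding: learned; n-dimensional, with n a hyper-parameter of the model
embedding = Embedding(input_dim=4, output_dim=2)
print(embedding(np.array([0, 1, 2, 3])).shape)  # (4, 2)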
A more detailed discussion about embedding can be found in the excellent post below.
The reason we prefer embeddings here is that it does not really make sense to treat each categorical value as completely different from every other (which is what one-hot encoding does). In our working case, we aim to build a hybrid recommender system to propose movies to users. Imagine you want to take into account a user's age range to work out which movies they are more likely to appreciate. Is it correct to consider a 29-year-old user as "equally different" from a 31-year-old and from a 62-year-old? Embeddings solve this issue. Indeed, we can use this technique to "learn" the relationships and inner connections between each possible value and our target variable.
Embeddings are obtained by training a neural network on the categorical data and retrieving the weights of the Embedding layer. This gives a more meaningful input than a plain one-hot encoding; furthermore, it introduces a metric space (the embedding space) in which similar entities are close to each other.
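Once such a network has been trained, the learned vectors can be read directly from the layer weights (a sketch, assuming a trained Keras model with an Embedding layer named 'user_emb'):

# One n-dimensional row per category; nearby rows mean similar entities
emb_matrix = model.get_layer('user_emb').get_weights()[0]
print(emb_matrix.shape)  # (number of categories, emb_dim)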
Finally, let’s have a look at our data.
The dataset
For this application we use the well-known MovieLens dataset. We have in mind to build a hybrid recommender system, taking advantage of both user and movie features and of user/movie ratings. These data are stored in three different files: users.dat, movies.dat and ratings.dat.
The readme of the dataset contains a quite exhaustive explanation of the files. User information contains gender, occupation, age range and zip-code (note: we drop the latter, as there are plenty of zip-codes each referring to just one user), while the movie dataset contains (besides the movie id) title and genre. Ratings are what will drive the supervised training.
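For reference, the three files use '::' as a separator; a loading sketch with pandas (column names follow the readme) could look like this:

import pandas as pd

df_users = pd.read_csv('users.dat', sep='::', engine='python',
                       names=['UserId', 'Gender', 'Age', 'Occupation', 'ZipCode'])
df_movies = pd.read_csv('movies.dat', sep='::', engine='python', encoding='latin-1',
                        names=['MovieId', 'Title', 'Genres'])
df_rating = pd.read_csv('ratings.dat', sep='::', engine='python',
                        names=['UserId', 'MovieId', 'Rating', 'Timestamp'])

# Drop zip-code, as discussed above
df_users = df_users.drop(columns='ZipCode')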
The ratings dataframe can be reshaped into a (sparse) matrix as follows:
df_matrix = df_rating.pivot(index='UserId', columns='MovieId', values='Rating')
Here is how we are going to use the information stored in this matrix. Our training set will be made of triplets [user, liked movie, not liked movie]. This takes inspiration from a neural network architecture for image recognition, called the Siamese Architecture. According to Wikipedia, a Siamese Neural Network is defined as
A Siamese neural network is an artificial neural network that uses the same weights while working in tandem on two different input vectors to compute comparable output vectors. Often one of the output vectors is precomputed, thus forming a baseline against which the other output vector is compared. This is similar to comparing fingerprints, but can be described more technically as a distance function for locality-sensitive hashing.
This is often used in conjunction with a triplet loss; again according to Wikipedia:
Triplet loss is a loss function for artificial neural networks where a baseline (anchor) input is compared to a positive (truthy) input and a negative (falsy) input. The distance from the baseline (anchor) input to the positive (truthy) input is minimized, and the distance from the baseline (anchor) input to the negative (falsy) input is maximized.

It is often used for learning similarity for the purpose of learning embeddings, like word embeddings and even thought vectors, and metric learning.
Basically, in triplet loss we have a triplet as input: (anchor, positive, negative). The anchor can be any image of a person, the positive is some other image of the same person, and the negative is an image of a different person. The loss function can be described as

L = max(d(a, p) - d(a, n) + margin, 0)

where d(a, p) is the distance between the anchor image and the positive image. Similarly, d(a, n) is the distance between the anchor and the negative image.
In the training phase we try to minimise such a loss.
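To make the formula concrete, here is a minimal numpy sketch of this margin-based triplet loss (the margin value is an arbitrary choice):

import numpy as np

def triplet_loss(a, p, n, margin=0.2):
    """Margin-based triplet loss with Euclidean distances."""
    d_ap = np.linalg.norm(a - p)  # anchor-positive distance: pushed down
    d_an = np.linalg.norm(a - n)  # anchor-negative distance: pushed up
    return max(d_ap - d_an + margin, 0.0)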
A recommender system can be seen in the same way. We have an anchor user, and two recommended items, positive and negative, meaning the recommendation should associate the user to the positive item and penalise the negative one.
Bayesian Personalised Ranking
As one may notice, the main difference between our recommender system and the image recognition task is that users and items are different kinds of objects. We need a function to optimise, analogous to the triplet loss, such that in the training phase we push up the probability of recommending the positive sample and push down the probability of the negative sample by adjusting the model parameters.
The Bayesian Personalised Ranking criterion comes to help. The loss can be expressed as

L = 1 - σ(u · p - u · n)

where σ denotes the sigmoid function, while the user u, the positive item p and the negative item n are represented by their embedding vectors.
It is noteworthy that we are now able, in a quite simple way, to leverage user and item features while also taking into account previously expressed feedback. This is precisely a hybrid recommender system. I think this expresses quite clearly the power of the embedding tool.
The model
Let's put all of this in a neural network model. We are going to use the Keras functional API.
First we need to create a network to be trained. Keras does not have a triplet loss layer, so we have to define a custom one.
from tensorflow.keras import backend as K
from tensorflow.keras.layers import Layer

class TripletLossLayer(Layer):
    """
    Layer object to minimise the triplet loss.
    Here we implement the Bayesian Personalised Ranking triplet loss.
    """
    def __init__(self, **kwargs):
        super(TripletLossLayer, self).__init__(**kwargs)

    def bpr_triplet_loss(self, inputs):
        """
        Bayesian Personalised Ranking triplet loss.
        We actually use the log-loss for numerical purposes.
        """
        anchor, positive, negative = inputs
        # Dot-product similarity between the user vector and each item vector
        p_dist = K.sum(anchor * positive, axis=-1, keepdims=True)
        n_dist = K.sum(anchor * negative, axis=-1, keepdims=True)
        # log(1 - sigmoid(p - n)) is minimised when p_dist >> n_dist
        return K.log(1.0 - K.sigmoid(p_dist - n_dist))

    def call(self, inputs):
        # Attach the loss to the layer so the model can be
        # compiled with loss=None and trained directly
        loss = self.bpr_triplet_loss(inputs)
        self.add_loss(loss)
        return loss
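The model definition below also uses a ScoreLayer, which turns a (user, item) pair of vectors into a recommendation score. Its definition is not shown in this post; a minimal sketch, assuming a sigmoid of the dot product (consistent with the dot products used in the loss above), could be:

class ScoreLayer(Layer):
    """
    Layer computing a recommendation score for a (user, item) pair
    as the sigmoid of the dot product of their vectors.
    """
    def __init__(self, **kwargs):
        super(ScoreLayer, self).__init__(**kwargs)

    def call(self, inputs):
        user, item = inputs
        return K.sigmoid(K.sum(user * item, axis=-1, keepdims=True))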
Hence we can define the layers that receive the user and movie vectors. To do so we need to define an Embedding layer for users and one for items. After that we are ready to define the model. This is done by the following function:
from tensorflow.keras.layers import Input, Embedding, Dense, Flatten
from tensorflow.keras.models import Model

def build_model(n_users, n_items, emb_dim=30):
    '''
    Define the Keras models for training and inference.

    Parameters
    ----------
    n_users : int
        number of users
    n_items : int
        number of items
    emb_dim : int
        dimension of the embedding space
    '''
    n_user_features = 3   # gender, occupation, age range
    n_item_features = 18  # one slot per genre

    ### Input layers
    user_input = Input((n_user_features,), name='user_input')
    positive_item_input = Input((n_item_features,), name='pos_item_input')
    negative_item_input = Input((n_item_features,), name='neg_item_input')
    inputs = [user_input, positive_item_input, negative_item_input]

    ### Embedding layers
    user_emb = Embedding(n_users, emb_dim, input_length=n_user_features, name='user_emb')
    # Positive and negative items share the same embedding
    item_emb = Embedding(n_items, emb_dim, input_length=n_item_features, name='item_emb')

    # Layers to project the embedding vectors into same-dimensional vectors
    vec_conv32 = Dense(32, name='dense_vec32', activation='relu')
    vec_conv = Dense(emb_dim, name='dense_vec', activation='softmax')

    # Anchor (user)
    a = Flatten(name='flatten_usr_emb')(user_emb(user_input))
    a = Dense(emb_dim, name='dense_user', activation='softmax')(a)
    # Positive item
    p = Flatten(name='flatten_pos_emb')(item_emb(positive_item_input))
    p = vec_conv32(p)
    p = vec_conv(p)
    # Negative item
    n = Flatten(name='flatten_neg_emb')(item_emb(negative_item_input))
    n = vec_conv32(n)
    n = vec_conv(n)

    # Score layers
    p_rec_score = ScoreLayer(name='pos_recommendation_score')([a, p])
    n_rec_score = ScoreLayer(name='neg_recommendation_score')([a, n])

    # Triplet-loss layer
    loss_layer = TripletLossLayer(name='triplet_loss_layer')([a, p, n])

    # Connect the inputs with the outputs
    network_train = Model(inputs=inputs, outputs=loss_layer, name='training_model')
    network_predict = Model(inputs=inputs[:-1], outputs=p_rec_score, name='inference_model')

    return network_train, network_predict
This gives the model represented in the figure below
Some comments
At this stage, it is good to take a breath and have a look at the code defining the model above.
We made use of the Keras functional API to build the network. Note how we defined the layers before applying them to other layer objects. This is what allows the positive and negative items to share weights.
A further noteworthy aspect of the function above is that it actually returns two models: one to train, the other to predict. This is because, as mentioned above, the BPR triplet loss is not something already implemented in Keras. To be sure the two returned models are actually the same except for the last layer, we can print the `layers` attribute.
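For instance (a sketch; n_users and n_items come from the dataset, e.g. the 6040 users and roughly 3900 movies of MovieLens 1M):

network_train, network_predict = build_model(n_users=6040, n_items=3900, emb_dim=30)

# Every layer except the heads should appear in both lists
print([layer.name for layer in network_train.layers])
print([layer.name for layer in network_predict.layers])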
This is important because we need to train our network for one-shot learning, meaning it has to correctly predict preferences even for a never-seen user or an unknown item.
Training the model
We are now ready to move on to the training part of our recommender system.
Construction of training triplets
Before training this model we need to build the batches. Recall that our training set is made of triplets (user, positive movie, negative movie).
As one can see from the picture above, the model expects a list of three arrays as input, corresponding to the feature vectors of users and items.
Hence, the batch composer function will take information from the users dataset, from the movies dataset and from the ratings matrix to compose the triplets.
Thus, the function will return a list containing three batch_size-long arrays representing the triplets. In other words, we have a list of objects [A, P, N], where A is an array of shape (batch_size, n_user_features), and P and N are both arrays of shape (batch_size, n_item_features). Each row of an array corresponds to a feature vector. Indeed, the triplet referring to the first user (coupled to a positively rated movie and a negatively rated one) is [A[0], P[0], N[0]].
To get to this result, we defined a function.
Translating the ratings matrix into triplets
The function randomly selects a user and, among their positive/negative ratings, randomly picks a pair to compose the triplet.
import random
import numpy as np

def get_triplets_hard(batch_size, X_usr, X_item, df, return_cache=False):
    """
    Returns the list of three arrays to feed the model.

    Parameters
    ----------
    batch_size : int
        size of the batch.
    X_usr : numpy array of shape (n_users, n_user_features)
        array of user metadata.
    X_item : numpy array of shape (n_items, n_item_features)
        array of item metadata.
    df : Pandas DataFrame
        dataframe containing user-item ratings.
    return_cache : bool
        whether to also return the lists of ids corresponding to the triplets.
        default: False

    Returns
    -------
    triplets : list of numpy arrays
        list containing 3 tensors A, P, N corresponding to:
        - Anchor A : (batch_size, n_user_features)
        - Positive P : (batch_size, n_item_features)
        - Negative N : (batch_size, n_item_features)
    """
    # constant values
    n_user_features = X_usr.shape[1]
    n_item_features = X_item.shape[1]
    # define user_list
    user_list = list(df.index.values)
    # initialise result
    triplets = [np.zeros((batch_size, n_user_features)),  # anchor
                np.zeros((batch_size, n_item_features)),  # pos
                np.zeros((batch_size, n_item_features))   # neg
                ]
    user_ids = []
    p_ids = []
    n_ids = []
    for i in range(batch_size):
        # pick one random user for the anchor
        anchor_id = random.choice(user_list)
        user_ids.append(anchor_id)
        # all possible positive/negative samples for the selected anchor
        p_item_ids = get_pos(df, anchor_id)
        n_item_ids = get_neg(df, anchor_id)
        # pick one of the positive ids (0 is a placeholder when none exists)
        try:
            positive_id = random.choice(p_item_ids)
        except IndexError:
            positive_id = 0
        p_ids.append(positive_id)
        # pick the negative id most similar to the positive item ("hard" negative)
        try:
            n_min = np.argmin([cosine_dist(X_item[positive_id-1], X_item[k-1]) for k in n_item_ids])
            negative_id = n_item_ids[n_min]
        except ValueError:  # no candidates to compare against
            try:
                negative_id = random.choice(n_item_ids)
            except IndexError:
                negative_id = 0
        n_ids.append(negative_id)
        # define the triplet (an all-zero vector when no sample is available)
        triplets[0][i, :] = X_usr[anchor_id-1][:]
        if positive_id == 0:
            triplets[1][i, :] = np.zeros((n_item_features,))
        else:
            triplets[1][i, :] = X_item[positive_id-1][:]
        if negative_id == 0:
            triplets[2][i, :] = np.zeros((n_item_features,))
        else:
            triplets[2][i, :] = X_item[negative_id-1][:]
    if return_cache:
        cache = {'users': user_ids, 'positive': p_ids, 'negative': n_ids}
        return triplets, cache
    return triplets
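The function above relies on three small helpers defined elsewhere in the full code. A possible sketch, assuming ratings of 4 and above count as positive and the rest as negative, is the following:

def get_pos(df, user_id, threshold=4):
    """Ids of the items the user rated at or above the threshold."""
    row = df.loc[user_id]
    return list(row[row >= threshold].index.values)

def get_neg(df, user_id, threshold=4):
    """Ids of the items the user rated below the threshold."""
    row = df.loc[user_id]
    return list(row[row < threshold].index.values)

def cosine_dist(u, v):
    """Cosine distance between two feature vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)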
Training
Having obtained the training set, i.e. a bunch of triplets (user feature vector, liked-movie feature vector, not-liked-movie feature vector), we can train our model on it.
Once the triplets have been composed we can train the model. We have chosen a batch size of 32 and 10000 iterations.
print("Starting training process!")
print("-------------------------------------")
t_start = time.time()
for i in range(1, n_iter+1):
triplets = get_triplets_hard(batch_size, X_usr, X_item, df_matrix)
loss = network_train.train_on_batch(triplets, None)
n_iteration += 1
if i % evaluate_every == 0:
print("\n ------------- \n")
print("[{3}] Time for {0} iterations: {1:.1f} mins, Train Loss: {2}".format(i, (time.time()-t_start)/60.0,loss,n_iteration))
As one might expect, the triplet loss is close to 1/2 at the beginning (its log is -0.69), going down during the training to a value approximately equal to 0.1.
Evaluation and metrics
Now, the question everyone working in this field has to ask: how do we evaluate our results?
In our situation, we created a model producing embeddings (i.e. vectors) that we can use to compute distances. In other words, if an item i is likely to be appreciated by a user u, their distance will be small; otherwise, they will be separated by a great distance in the embedding space.
Thus, we need a threshold: if the distance is below the threshold we recommend the item to the user; if the distance is above the threshold, the item is not recommended. We have to choose this threshold carefully. This is a typical ROC curve problem. For this reason, one possible evaluation metric is the Area Under the Curve (AUC).
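As a sketch of how the AUC could be computed (user_test, item_test and labels are hypothetical held-out arrays: feature vectors for user/item pairs, and a binary flag for whether the user actually liked the item):

from sklearn.metrics import roc_auc_score

# Score each held-out (user, item) pair with the inference model
scores = network_predict.predict([user_test, item_test]).ravel()
print('AUC:', roc_auc_score(labels, scores))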
Predictions: Making actual recommendations
The aim of a recommender system is precisely to recommend (likely appreciated) items to users. Thus, given a user and a list of items, we can sort the list by "preference score".
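In practice, this can be done by scoring every candidate item with the inference model and sorting (a sketch; X_usr and X_item are the feature arrays used above, user_idx a hypothetical user index):

import numpy as np

# Repeat the user's feature vector once per candidate item, then score and sort
user_batch = np.tile(X_usr[user_idx], (X_item.shape[0], 1))
scores = network_predict.predict([user_batch, X_item]).ravel()
top10 = np.argsort(-scores)[:10]  # indices of the ten best-scored movies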
Conclusion
This was just a very simple application. The model was not optimised at all, but I think one can build upon this to achieve interesting results. In particular, more careful hyperparameter tuning and a better choice of the neural network architecture would improve performance a lot. It could be interesting to see whether, with tuned hyper-parameters, we can obtain much better performance.
Another possible improvement is to work on the input layers. Data cleaning, and perhaps enrichment with (for example) geographical data, would be beneficial.
Still, I think this can be a good example of creating a not-so-trivial neural network model to be applied to a concrete problem. It shows quite well the power of Embedding layers.
It can also serve as an example of how, nowadays, deep learning frameworks really do abstract away a lot of the heavy technical stuff for us!
The full code on which this post is based can be found on my GitHub.
Acknowledgements
I would like to thank my dear friend and exceptionally talented programmer @alessandro.angioi, for code revision and fruitful discussions.
Originally published at https://medium.com/deep-recommender-system/a-deep-recommender-system-e2b765d27350