Paper: https://dl.acm.org/doi/pdf/10.1145/3219819.3219885
Background: Airbnb is a short-term rental marketplace that aims to optimize for the preferences of both sides, hosts and guests. Two traits make the setting unusual: a user rarely books the same listing twice, and a listing can host only one guest for a given set of dates.
Correspondingly, at Airbnb, a short-term rental marketplace, search and recommendation problems are quite unique, being a two-sided marketplace in which one needs to optimize for host and guest preferences, in a world where a user rarely consumes the same item twice and one listing can accept only one guest for a certain set of dates.
The listing and user embedding techniques they developed and deployed power real-time personalization in two channels, Search Ranking and Similar Listing Recommendations, which together drive 99% of conversions. The embedding models capture guests' short-term and long-term interests and deliver effective listing recommendations.
In this paper we describe Listing and User Embedding techniques we developed and deployed for purposes of Real-time Personalization in Search Ranking and Similar Listing Recommendations, two channels that drive 99% of conversions. The embedding models were specifically tailored for Airbnb marketplace, and are able to capture guest’s short-term and long-term interests, delivering effective home listing recommendations.
Airbnb optimizes search results for both hosts and guests. For a given guest query, listings are ranked by how appealing their location, price, style, reviews, trip dates, stay length, and so on are to the guest. Listings whose hosts would likely reject the guest, because of bad reviews, pets, trip dates, group size, and similar factors, are ranked lower.
In the case of Airbnb, there is a clear need to optimize search results for both hosts and guests, meaning that given an input query with location and trip dates we need to rank high listings whose location, price, style, reviews, etc. are appealing to the guest and, at the same time, are a good match in terms of host preferences for trip duration and lead days. Furthermore, we need to detect listings that would likely reject the guest due to bad reviews, pets, length of stay, group size or any other factor, and rank these listings lower. To achieve this we resort to using Learning to Rank. Specifically, we formulate the problem as pairwise regression with positive utilities for bookings and negative utilities for rejections, which we optimize using a modified version of Lambda Rank [4] model that jointly optimizes ranking for both sides of the marketplace.
More on the setting: a guest typically clicks multiple listings before booking, so within a single search session these click signals can be fully exploited. There are also negative signals, such as the guest skipping listings ranked near the top, which indicate dissatisfaction with the results the recommender returned.
Since guests typically conduct multiple searches before booking, i.e. click on more than one listing and contact more than one host during their search session, we can use these in-session signals, i.e. clicks, host contacts, etc. for Real-time Personalization where the aim is to show to the guest more of the listings similar to the ones we think they liked since starting the search session. At the same time we can use the negative signal, e.g. skips of high ranked listings, to show to the guest less of the listings similar to the ones we think they did not like. To be able to calculate similarities between listings that guest interacted with and candidate listings that need to be ranked we propose to use listing embeddings, low-dimensional vector representations learned from search sessions. We leverage these similarities to create personalization features for our Search Ranking Model and to power our Similar Listing Recommendations, the two platforms that drive 99% of bookings at Airbnb.
Besides clicks (a proxy for the guest's short-term interest), bookings can also be used (a proxy for long-term interest). However, a guest may book only once or twice a year, so booking behavior is extremely sparse and long-tailed. Airbnb therefore groups guests into user types and trains embeddings at the user-type granularity.
In addition to Real-time Personalization using immediate user actions, such as clicks, that can be used as proxy signal for short-term user interest, we introduce another type of embeddings trained on bookings to be able to capture user's long-term interest. Due to the nature of travel business, where users travel 1-2 times per year on average, bookings are a sparse signal, with a long tail of users with a single booking. To tackle this we propose to train embeddings at a level of user type, instead of a particular user id, where type is determined using many-to-one rule-based mapping that leverages known user attributes. At the same time we learn listing type embeddings in the same vector space as user type embeddings. This enables us to calculate similarities between user type embedding of the user who is conducting a search and listing type embeddings of candidate listings that need to be ranked.
The paper claims the following contributions:
Real-time personalization: much prior work precomputes user-item embeddings offline and only reads them back at recommendation time; here, embeddings of the items a user most recently interacted with are combined online to score candidates.
Real-time Personalization - Most of the previous work on personalization and item recommendations using embeddings [8, 11] is deployed to production by forming tables of user-item and item-item recommendations offline, and then reading from them at the time of recommendation. We implemented a solution where embeddings of items that user most recently interacted with are combined in an online manner to calculate similarities to items that need to be ranked.
Adapting training for congregated search (arguably better described as hard negative mining): users rarely search across regions and usually search within one market, so adding market-specific negatives during negative sampling improves embedding quality. This is effectively Airbnb's way of choosing hard negatives for this setting.
Adapting Training for Congregated Search - Unlike in Web search, the search on travel platforms is often congregated, where users frequently search only within a certain market, e.g. Paris, and rarely across different markets. We adapted the embedding training algorithm to take this into account when doing negative sampling, which led to capturing better within-market listing similarities.
Conversions as global context: each click session that ends in a booking is trained with the booked listing appended as global context.
Leveraging Conversions as Global Context - We recognize the importance of click sessions that end up in conversion, in our case booking. When learning listing embeddings we treat the booked listing as global context that is always being predicted as the window moves over the session.
User type embeddings: booking behavior is very sparse, so to capture long-term interest users are grouped, and all users in the same group share one embedding.
User Type Embeddings - Previous work on training user embeddings to capture their long-term interest [6, 27] train a separate embedding for each user. When target signal is sparse, there is not enough data to train a good embedding representation for each user. Not to mention that storing embeddings for each user to perform online calculations would require a lot of memory. For that reason we propose to train embeddings at a level of user type, where groups of users with same type will have the same embedding.
Host rejections are treated as explicit negative feedback.
Rejections as Explicit Negatives - To reduce recommendations that result in rejections we encode host preference signal in user and listing type embeddings by treating host rejections as explicit negatives during training.
For short-term interest, embeddings were trained on more than 800 million click sessions. For long-term interest, the booking behavior of 50 million users was used to group users into types, and embeddings were trained at the type granularity.
For short-term interest personalization we trained listing embeddings using more than 800 million search click sessions, resulting in high quality listing representations. We used extensive offline and online evaluation on real search traffic which showed that adding embedding features to the ranking model resulted in significant booking gain. In addition to the search ranking algorithm, listing embeddings were successfully tested and launched for similar listing recommendations where they outperformed the existing algorithm's click-through rate (CTR) by 20%.
For long-term interest personalization we trained user type and listing type embeddings using sequences of booked listings by 50 million users. Both user and listing type embeddings were learned in the same vector space, such that we can calculate similarities between user type and listing types of listings that need to be ranked. The similarity was used as an additional feature for search ranking model and was also successfully tested and launched.
In NLP, classic language-modeling approaches represented words as high-dimensional sparse vectors; these have gradually been replaced by low-dimensional representations learned with neural networks, which take word order and co-occurrence into account. With the development of the scalable CBOW and skip-gram (SG) models, embeddings trained on large corpora perform increasingly well.
In a number of Natural Language Processing (NLP) applications classic methods for language modeling that represent words as highdimensional, sparse vectors have been replaced by Neural Language models that learn word embeddings, i.e. low-dimensional representations of words, through the use of neural networks [25, 27]. The networks are trained by directly taking into account the word order and their co-occurrence, based on the assumption that words frequently appearing together in the sentences also share more statistical dependence. With the development of highly scalable continuous bag-of-words (CBOW) and skip-gram (SG) language models for word representation learning [17], the embedding models have been shown to obtain state-of-the-art performance on many traditional language tasks after training on large text data.
Word-embedding ideas have spread to many domains and signals: user actions such as clicks, purchases, and search queries, and music/app/movie recommendation. They also help in cold-start settings, where a new user's or item's embedding is built from meta information (title, description, etc.). They are further applied in social network analysis, where random walks over a graph learn node embeddings that capture the graph structure.
More recently, the concept of embeddings has been extended beyond word representations to other applications outside of NLP domain. Researchers from the Web Search, E-commerce and Marketplace domains have quickly realized that just like one can train word embeddings by treating a sequence of words in a sentence as context, same can be done for training embeddings of user actions, e.g. items that were clicked or purchased [11, 18], queries and ads that were clicked [8, 9], by treating sequence of user actions as context. Ever since, we have seen embeddings being leveraged for various types of recommendations on the Web, including music recommendations [26], job search [13], app recommendations [21], movie recommendations [3, 7], etc. Furthermore, it has been shown that items which user interacted with can be leveraged to directly learn user embeddings in the same feature space as item embeddings, such that direct user-item recommendations can be made [6, 10, 11, 24, 27]. An alternative approach, specifically useful for cold-start recommendations, is to use text embeddings (e.g. ones publicly available at https://code.google.com/p/word2vec) and leverage item and/or user meta data (e.g. title and description) to compute their embeddings [5, 14, 19, 28]. Finally, similar extensions of embedding approaches have been proposed for Social Network analysis, where random walks on graphs can be used to learn embeddings of nodes in graph structure [12, 20].
Given the click sessions of $N$ users, let $s = (l_1, l_2, \ldots, l_M) \in \mathcal{S}$ denote the sequence of listings clicked in one session; a gap of more than 30 minutes between two consecutive clicks starts a new session. The goal is to learn a $d$-dimensional representation $v_{l_i} \in \mathbb{R}^d$ for each listing $l_i$ such that similar listings lie close in the embedding space.
The skip-gram model is used to maximize the log-likelihood objective $\mathcal{L}$.
Let us assume we are given a set $\mathcal{S}$ of $S$ click sessions obtained from $N$ users, where each session $s = (l_1, \ldots, l_M) \in \mathcal{S}$ is defined as an uninterrupted sequence of $M$ listing ids that were clicked by the user. A new session is started whenever there is a time gap of more than 30 minutes between two consecutive user clicks. Given this data set, the aim is to learn a $d$-dimensional real-valued representation $v_{l_i} \in \mathbb{R}^d$ of each unique listing $l_i$, such that similar listings lie nearby in the embedding space.
More formally, the objective of the model is to learn listing representations using the skip-gram model [17] by maximizing the objective function L over the entire set S of search sessions, defined as follows
$$\mathcal{L} = \sum_{s \in \mathcal{S}} \sum_{l_i \in s} \sum_{-m \le j \le m,\, j \neq 0} \log P(l_{i+j} \mid l_i) \qquad (1)$$

The probability in (1) is computed with a softmax:

$$P(l_{i+j} \mid l_i) = \frac{\exp(v_{l_i}^\top v'_{l_{i+j}})}{\sum_{l=1}^{|\mathcal{V}|} \exp(v_{l_i}^\top v'_{l})} \qquad (2)$$

Here $v_l$ and $v'_l$ are the input and output vector representations of listing $l$, and the hyperparameter $m$ is the size of the sliding context window. Because the listing vocabulary $\mathcal{V}$ contains millions of listings, computing (2) directly is impractical, so word2vec-style negative sampling is used instead. This requires two sets of pairs: $(l, c) \in \mathcal{D}_p$ are positive pairs, where $l$ is the central listing and $c$ is another listing clicked within the window $m$ in the same session, and $(l, c) \in \mathcal{D}_n$ are pairs with negatives sampled at random from the full listing set $\mathcal{V}$. The objective then becomes:

$$\operatorname*{argmax}_{\theta} \sum_{(l,c) \in \mathcal{D}_p} \log \frac{1}{1 + e^{-v'_c v_l}} + \sum_{(l,c) \in \mathcal{D}_n} \log \frac{1}{1 + e^{v'_c v_l}} \qquad (3)$$
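To make the negative-sampling update concrete, here is a minimal Python sketch of one SGD pass over objective (3). It assumes `v` and `v_prime` are dicts mapping listing ids to numpy vectors; the function names and the learning rate are illustrative, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(v, v_prime, pos_pairs, neg_pairs, lr=0.025):
    """One SGD pass over objective (3): pull the input vector v[l] toward
    the output vector v_prime[c] for positive pairs (label 1) and push
    them apart for sampled negative pairs (label 0)."""
    for pairs, label in ((pos_pairs, 1.0), (neg_pairs, 0.0)):
        for l, c in pairs:
            score = sigmoid(np.dot(v_prime[c], v[l]))
            g = lr * (label - score)   # gradient of the log-sigmoid term
            grad_c = g * v[l]          # update for the output vector
            grad_l = g * v_prime[c]    # update for the input vector
            v_prime[c] += grad_c
            v[l] += grad_l
```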
The click sessions can be split into booked sessions, a sequence of clicks that ends with a booking, and exploratory sessions, which end without one. For booked sessions the platform wants to optimize not only clicks but also bookings, so the booked listing is treated as global context appended to the session, adding one extra term that optimizes the booking objective.
Booked Listing as Global Context. We can break down the click sessions set S into 1) booked sessions, i.e. click sessions that end with user booking a listing to stay at, and 2) exploratory sessions, i.e. click sessions that do not end with booking, i.e. users were just browsing. Both are useful from the standpoint of capturing contextual similarity, however booked sessions can be used to adapt the optimization such that at each step we predict not only the neighboring clicked listings but the eventually booked listing as well. This adaptation can be achieved by adding booked listing as global context, such that it will always be predicted no matter if it is within the context window or not. Consequently, for booked sessions the embedding update rule becomes
The objective for booked sessions therefore becomes the following; note that the last term is a single term rather than a sum, representing the one final booking $l_b$:

$$\operatorname*{argmax}_{\theta} \sum_{(l,c) \in \mathcal{D}_p} \log \frac{1}{1 + e^{-v'_c v_l}} + \sum_{(l,c) \in \mathcal{D}_n} \log \frac{1}{1 + e^{v'_c v_l}} + \log \frac{1}{1 + e^{-v'_{l_b} v_l}} \qquad (4)$$

For exploratory sessions there is no booking, so the objective stays as in equation (3).
For exploratory sessions the updates are still conducted by optimizing objective (3).
As the figure in the paper shows, the booked listing remains in the context as the window slides, hence the name global context.
Guests mostly book within a single market and rarely across markets, while random negatives are drawn from all markets. For a given listing $l$, the pairs in the positive set $\mathcal{D}_p$ therefore mostly come from the same market, while the negative set $\mathcal{D}_n$ mostly comes from other markets. In practice this imbalance was found to hurt similarity learning, so an additional random negative set $\mathcal{D}_{m_n}$, drawn from the same market as $l$, is added. This is essentially Airbnb's way of adding hard negatives in this setting. The objective becomes:
Adapting Training for Congregated Search. Users of online travel booking sites typically search only within a single market, i.e. location they want to stay at. As a consequence, there is a high probability that $\mathcal{D}_p$ contains listings from the same market. On the other hand, due to random sampling of negatives, it is very likely that $\mathcal{D}_n$ contains mostly listings that are not from the same markets as listings in $\mathcal{D}_p$. At each step, for a given central listing $l$, the positive context mostly consists of listings from the same market as $l$, while the negative context mostly consists of listings that are not from the same market as $l$. We found that this imbalance leads to learning sub-optimal within-market similarities. To address this issue we propose to add a set of random negatives $\mathcal{D}_{m_n}$, sampled from the market of the central listing $l$:
$$\operatorname*{argmax}_{\theta} \sum_{(l,c) \in \mathcal{D}_p} \log \frac{1}{1 + e^{-v'_c v_l}} + \sum_{(l,c) \in \mathcal{D}_n} \log \frac{1}{1 + e^{v'_c v_l}} + \log \frac{1}{1 + e^{-v'_{l_b} v_l}} + \sum_{(l,m_n) \in \mathcal{D}_{m_n}} \log \frac{1}{1 + e^{v'_{m_n} v_l}} \qquad (5)$$
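The following sketch shows how the training pairs behind objectives (4) and (5) could be assembled for one booked session: in-window positives ($\mathcal{D}_p$), the booked listing as an always-present global context, uniform random negatives ($\mathcal{D}_n$), and same-market hard negatives ($\mathcal{D}_{m_n}$). The helper structures `market_of` and `market_listings` and the sampling counts are assumptions, not the paper's implementation.

```python
import random

def booked_session_pairs(session, booked, vocab, market_of, market_listings,
                         window=5, n_neg=5, n_market_neg=5):
    """Build (center, context, label) triples for one booked click session."""
    pairs = []
    for i, l in enumerate(session):
        # D_p: listings clicked within the context window around l
        for j in range(max(0, i - window), min(len(session), i + window + 1)):
            if j != i:
                pairs.append((l, session[j], 1))
        # global context: the booked listing is predicted at every position
        pairs.append((l, booked, 1))
        # D_n: negatives sampled uniformly from the whole vocabulary
        for _ in range(n_neg):
            pairs.append((l, random.choice(vocab), 0))
        # D_mn: hard negatives drawn from the central listing's own market
        same_market = market_listings[market_of[l]]
        for _ in range(n_market_neg):
            pairs.append((l, random.choice(same_market), 0))
    return pairs
```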
Cold-start listing embeddings: a new listing appears in no click session, so it has no embedding. Using its metadata (location, price, listing type, etc.), the 3 geographically closest listings that already have embeddings, share the same listing type, and fall in the same price bucket are found, and their embeddings are mean-pooled to form the new listing's embedding. This covers more than 98% of new listings.
Cold start listing embeddings. Every day new listings are created by hosts and made available on Airbnb. At that point these listings do not have an embedding because they were not present in the click sessions S training data. To create embeddings for new listings we propose to utilize existing embeddings of other listings. Upon listing creation the host is required to provide information about the listing, such as location, price, listing type, etc. We use the provided meta-data about the listing to find 3 geographically closest listings (within a 10 miles radius) that have embeddings, are of same listing type as the new listing (e.g. Private Room) and belong to the same price bucket as the new listing (e.g. $20 − $25 per night). Next, we calculate the mean vector using 3 embeddings of identified listings to form the new listing embedding. Using this technique we are able to cover more than 98% of new listings.
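A minimal sketch of this cold-start rule, assuming each listing is a dict with hypothetical `id`, `listing_type`, `price_bucket`, and `location` (lat/lon) fields:

```python
import math
import numpy as np

def distance_miles(a, b):
    """Haversine distance between two (lat, lon) pairs, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 3956 * 2 * math.asin(math.sqrt(h))

def cold_start_embedding(new_listing, existing, embeddings, radius=10.0, k=3):
    """Mean-pool the embeddings of the k=3 geographically closest listings
    that share the new listing's type and price bucket, per the rule above."""
    matches = [l for l in existing
               if l["listing_type"] == new_listing["listing_type"]
               and l["price_bucket"] == new_listing["price_bucket"]
               and distance_miles(l["location"], new_listing["location"]) <= radius]
    if not matches:
        return None  # part of the ~2% of new listings left uncovered
    matches.sort(key=lambda l: distance_miles(l["location"], new_listing["location"]))
    return np.mean([embeddings[l["id"]] for l in matches[:k]], axis=0)
```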
Evaluating listing embeddings: the learned embeddings are clustered with k-means to check whether clusters align with geography. For California listings, listings from the same area indeed fall into the same cluster (Figure 2).
Examining Listing Embeddings. To evaluate what characteristics of listings were captured by embeddings we examine the d = 32 dimensional embeddings trained using (5) on 800 million click sessions. First, by performing k-means clustering on learned embeddings we evaluate if geographical similarity is encoded. Figure 2, which shows resulting 100 clusters in California, confirms that listings from similar locations are clustered together. We found the clusters very useful for re-evaluating our definitions of travel markets.
Next, cosine similarities between embeddings of listings of different types and price ranges are computed: listings of the same type are more similar to each other, and listings of different types less similar (Table 1 and Table 2).
Some listing characteristics, such as price, are easy to read from the metadata, but architecture and style are hard to extract. To test whether the embeddings capture such abstract traits, the paper inspects the k nearest neighbors of listings in the embedding space and confirms that these traits are indeed learned (Figures 3 and 4).
While some listing characteristics, such as price, do not need to be learned because they can be extracted from listing meta-data, other types of listing characteristics, such as architecture, style and feel are much harder to extract in form of listing features. To evaluate if these characteristics are captured by embeddings we can examine k-nearest neighbors of unique architecture listings in the listing embedding space. Figure 3 shows one such case where for a listing of unique architecture on the left, the most similar listings are of the same style and architecture. To be able to conduct fast and easy explorations in the listing embedding space we developed an internal Similarity Exploration Tool shown in Figure 4.
Listing embeddings thus capture market, style, and architectural similarity, reflecting the guest's short-term interest and fitting real-time personalization. But when long-term history matters, e.g. a guest who previously booked in other markets, recommending similar listings requires modeling the guest's long-term interest.
Listing embeddings described in Section 3.1. that were trained using click sessions are very good at finding similarities between listings of the same market. As such, they are suitable for short-term, in-session, personalization where the aim is to show to the user listings that are similar to the ones they clicked during the immanent search session. However, in addition to in-session personalization, based on signals that just happened within the same session, it would be useful to personalize search based on signals from user's longer-term history. For example, given a user who is currently searching for a listing in Los Angeles, and has made past bookings in New York and London, it would be useful to recommend listings that are similar to those previously booked ones.
Let $s_b = (l_{b_1}, l_{b_2}, \ldots, l_{b_M})$ denote user $j$'s time-ordered booking sessions. Using these sessions directly is problematic: booking data is far smaller than click data; from the guest side, many users booked only once, so there is no sequence to learn from; from the listing side, a meaningful embedding needs a listing to appear roughly 5-10 times in booking sessions, yet many listings are booked fewer times than that; and long gaps between two bookings mean the guest's long-term preferences may have shifted in the meantime, e.g. due to a career change.
While some cross-market similarities are captured in listing embeddings trained using clicks, a more principled way of learning such cross-market similarities would be to learn from sessions constructed of listings that a particular user booked over time. Specifically, let us assume we are given a set $\mathcal{S}_b$ of booking sessions obtained from $N$ users, where each booking session $s_b = (l_{b_1}, \ldots, l_{b_M})$ is defined as a sequence of listings booked by user $j$ ordered in time. Attempting to learn embeddings $v_{l_{id}}$ for each listing_id using this type of data would be challenging in many ways:
First, booking sessions data Sb is much smaller than click sessions data S because bookings are less frequent events.
Second, many users booked only a single listing in the past and we cannot learn from a session of length 1.
Third, to learn a meaningful embedding for any entity from contextual information at least 5-10 occurrences of that entity are needed in the data, and there are many listing_ids on the platform that were booked less than 5-10 times.
Finally, long time intervals may pass between two consecutive bookings by the user, and in that time user preferences, such as price point, may change, e.g. due to career change.
The fix is simple: train at the level of listing groups rather than individual listings. As in Table 3, listings are bucketed by their attributes, and each bucket is a listing_type.
To address these very common marketplace problems in practice, we propose to learn embeddings at a level of listing_type instead of listing_id. Given meta-data available for a certain listing_id such as location, price, listing type, capacity, number of beds, etc., we use a rule-based mapping defined in Table 3 to determine its listing_type. For example, an Entire Home listing from US that has a 2 person capacity, 1 bed, 1 bedroom & 1 bathroom, with Average Price Per Night of $60.8, Average Price Per Night Per Guest of $29.3, 5 reviews, all 5 stars, and 100% New Guest Accept Rate would map into listing_type = US_lt1_pn3_pg3_r3_5s4_c2_b1_bd2_bt2_nu3. Buckets are determined in a data-driven manner to maximize for coverage in each listing_type bucket. The mapping from listing_id to a listing_type is a many-to-one mapping, meaning that many listings will map into the same listing_type.
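A sketch of what such a rule-based mapping could look like. The bucket edges below are illustrative placeholders; the paper chooses the real boundaries in a data-driven way to maximize coverage per bucket.

```python
def listing_type(meta):
    """Map listing metadata to a coarse listing_type string, Table-3 style."""
    def bucket(value, edges):
        # 1-based index of the first bucket whose upper edge exceeds value
        for i, e in enumerate(edges):
            if value < e:
                return i + 1
        return len(edges) + 1
    parts = [
        meta["country"],                                        # e.g. "US"
        f"lt{bucket(meta['listing_type_id'], [2, 3])}",         # listing type
        f"pn{bucket(meta['price_per_night'], [40, 60, 100, 200])}",
        f"pg{bucket(meta['price_per_guest'], [20, 30, 50, 100])}",
        f"c{bucket(meta['capacity'], [2, 4, 8])}",              # capacity
        f"nr{bucket(meta['num_reviews'], [5, 20, 100])}",       # review count
    ]
    return "_".join(parts)
```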
Guests are likewise bucketed into user_types (Table 4). Besides the guest's own attributes, the bucketing also uses booking history, such as number of past bookings and their prices, which conveniently supports cold-start for guests with no booking history.
To account for user ever-changing preferences over time we propose to learn user_type embeddings in the same vector space as listing_type embeddings. The user_type is determined using a similar procedure we applied to listings, i.e. by leveraging metadata about user and their previous bookings, defined in Table 4. For example, for a user from San Francisco with MacBook laptop, English language settings, full profile with user photo, 83.4% average Guest 5 star rating from hosts, who has made 3 bookings in the past, where the average statistics of booked listings were $52.52 Price Per Night, $31.85 Price Per Night Per Guest, 2.33 Capacity, 8.24 Reviews and 76.1% Listing 5 star rating, the resulting user_type is SF_lg1_dt1_fp1_pp1_nb1_ppn2_ppg3_c2_nr3_l5s3_g5s3. When generating booking sessions for training embeddings we calculate the user_type up to the latest booking. For users who made their first booking user_type is calculated based on the first 5 rows from Table 4 because at the time of booking we had no prior information about past bookings. This is convenient, because learned embeddings for user_types which are based on first 5 rows can be used for cold-start personalization for logged-out users and new users with no past bookings.
The listing_type and user_type embeddings are trained as follows.
Given the set $\mathcal{S}_b$ of booking sessions from $N$ users, each session $s_b = ((u_{type_1}, l_{type_1}), \ldots, (u_{type_M}, l_{type_M}))$ is a time-ordered sequence of (user_type, listing_type) booking events by the same user.
For a central user_type $u_t$, the objective is:

$$\operatorname*{argmax}_{\theta} \sum_{(u_t,c) \in \mathcal{D}_{book}} \log \frac{1}{1 + e^{-v'_c v_{u_t}}} + \sum_{(u_t,c) \in \mathcal{D}_{neg}} \log \frac{1}{1 + e^{v'_c v_{u_t}}} \qquad (6)$$

Here $\mathcal{D}_{book}$ contains the listing_types booked by the user_type, and $\mathcal{D}_{neg}$ contains randomly sampled user_types and listing_types.
Similarly, for a central listing_type $l_t$:

$$\operatorname*{argmax}_{\theta} \sum_{(l_t,c) \in \mathcal{D}_{book}} \log \frac{1}{1 + e^{-v'_c v_{l_t}}} + \sum_{(l_t,c) \in \mathcal{D}_{neg}} \log \frac{1}{1 + e^{v'_c v_{l_t}}} \qquad (7)$$

Because a user's booking sessions naturally span different markets, there is no need for the extra same-market negative sampling used for click sessions.
Since booking sessions by definition mostly contain listings from different markets, there is no need to sample additional negatives from same market as the booked listing, like we did in Section 3.1. to account for the congregated search in click sessions.
Host rejections as explicit negatives: hosts may reject user_types who tend to leave bad reviews or have incomplete or missing profiles, so rejection events are added to training as negative pairs, an extra constraint. In effect, this is the negative-sample choice for the user_type and listing_type embedding training.
Explicit Negatives for Rejections. Unlike clicks that only reflect guest-side preferences, bookings reflect host-side preferences as well, as there exists an explicit feedback from the host, in form of accepting guest's request to book or rejecting guest's request to book. Some of the reasons for host rejections are bad guest star ratings, incomplete or empty guest profile, no profile picture, etc. These characteristics are part of user_type definition from Table 4. Host rejections can be utilized during training to encode the host preference signal in the embedding space in addition to the guest preference signal. The whole purpose of incorporating the rejection signal is that some listing_types are less sensitive to user_types with no bookings, incomplete profiles and less than average guest star ratings than others, and we want the embeddings of those listing_types and user_types to be closer in the vector space, such that recommendations based on embedding similarities would reduce future rejections in addition to maximizing booking chances. We formulate the use of the rejections as explicit negatives in the following manner. In addition to sets $\mathcal{D}_{book}$ and $\mathcal{D}_{neg}$, we generate a set $\mathcal{D}_{reject}$ of pairs $(u_t, l_t)$ of user_type and listing_type that were involved in a rejection event. As depicted in Figure 5b (on the right), we specifically focus on the cases when host rejections (labeled with a minus sign) were followed by a successful booking (labeled with a plus sign) of another listing by the same user. The new optimization objective can then be formulated as
$\mathcal{D}_{reject}$ denotes the set of $(u_t, l_t)$ pairs involved in host rejection events.
For a central user_type, the objective becomes:

$$\operatorname*{argmax}_{\theta} \sum_{(u_t,c) \in \mathcal{D}_{book}} \log \frac{1}{1 + e^{-v'_c v_{u_t}}} + \sum_{(u_t,c) \in \mathcal{D}_{neg}} \log \frac{1}{1 + e^{v'_c v_{u_t}}} + \sum_{(u_t,l_t) \in \mathcal{D}_{reject}} \log \frac{1}{1 + e^{v'_{l_t} v_{u_t}}} \qquad (8)$$

And for a central listing_type:

$$\operatorname*{argmax}_{\theta} \sum_{(l_t,c) \in \mathcal{D}_{book}} \log \frac{1}{1 + e^{-v'_c v_{l_t}}} + \sum_{(l_t,c) \in \mathcal{D}_{neg}} \log \frac{1}{1 + e^{v'_c v_{l_t}}} + \sum_{(u_t,l_t) \in \mathcal{D}_{reject}} \log \frac{1}{1 + e^{v'_{l_t} v_{u_t}}} \qquad (9)$$
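Because the rejection terms in (8) and (9) have the same $\log\frac{1}{1+e^{x}}$ form as the sampled negatives, rejection pairs can simply be appended to the negative pair list. A sketch of the user_type side, reusing the hypothetical `sgd_step` helper from the earlier sketch:

```python
def user_type_updates(v, v_prime, d_book, d_neg, d_reject, lr=0.025):
    """Updates for objective (8): D_book pairs are positives, while random
    negatives D_neg and host-rejection pairs D_reject are both pushed away
    from the central user_type vector."""
    sgd_step(v, v_prime,
             pos_pairs=d_book,
             neg_pairs=list(d_neg) + list(d_reject),  # rejections as explicit negatives
             lr=lr)
```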
Evaluation: as shown in Table 5, for a user_type that books high-quality, spacious, well-reviewed listings, the most similar listing_types exhibit exactly these traits.
Given learned embeddings for all user_types and listing_types, we can recommend to the user the most relevant listings based on the cosine similarities between user's current user_type embedding and listing_type embeddings of candidate listings. For example, in Table 5 we show cosine similarities between user_type = SF_lg1_dt1_fp1_pp1_nb3_ppn5_ppg5_c4_nr3_l5s3_g5s3, who typically books high quality, spacious listings with lots of good reviews, and several different listing_types in US. It can be observed that listing types that best match these user preferences, i.e. entire home, lots of good reviews, large and above average price, have high cosine similarity, while the ones that do not match user preferences, i.e. ones with less space, lower price and small number of reviews, have low cosine similarity.
Training data: 800 million guest click sessions. Within each guest's history, clicks are ordered in time and split whenever two consecutive clicks are more than 30 minutes apart; clicks shorter than 30 seconds are dropped, and every session must contain at least 2 clicks. Sessions are categorized as booked or exploratory, and oversampling booked sessions 5x gave the best results.
For training listing embeddings we created 800 million click sessions from search, by taking all searches from logged-in users, grouping them by user id and ordering clicks on listing ids in time. This was followed by splitting one large ordered list of listing ids into multiple ones based on 30 minute inactivity rule. Next, we removed accidental and short clicks, i.e. clicks for which user stayed on the listing page for less than 30 seconds, and kept only sessions consisting of 2 or more clicks. Finally, the sessions were anonymized by dropping the user id column. As mentioned before, click sessions consist of exploratory sessions and booked sessions (sequences of clicks that end with booking). In light of offline evaluation results we oversampled booked sessions by 5x in the training data, which resulted in the best performing listing embeddings.
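A sketch of this session construction, assuming the click log is an iterable of `(user_id, listing_id, timestamp, dwell_secs)` rows with timestamps in seconds; the row format is an assumption.

```python
from collections import defaultdict

SESSION_GAP_SECS = 30 * 60   # a 30-minute gap starts a new session
MIN_DWELL_SECS = 30          # drop accidental/short clicks

def build_click_sessions(click_log):
    """Group clicks by user, order in time, drop short clicks, split on
    30-minute inactivity, and keep only sessions with 2+ clicks."""
    by_user = defaultdict(list)
    for user_id, listing_id, ts, dwell in click_log:
        if dwell >= MIN_DWELL_SECS:
            by_user[user_id].append((ts, listing_id))
    sessions = []
    for clicks in by_user.values():
        clicks.sort()
        current = [clicks[0][1]]
        for (prev_ts, _), (ts, listing_id) in zip(clicks, clicks[1:]):
            if ts - prev_ts > SESSION_GAP_SECS:
                if len(current) >= 2:
                    sessions.append(current)
                current = []
            current.append(listing_id)
        if len(current) >= 2:
            sessions.append(current)
    return sessions  # anonymized: no user id attached
```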
Experiments showed that retraining everything from scratch each day works better than incrementally continuing training on the existing vectors.
Setting up Daily Training. We learn listing embeddings for 4.5 million Airbnb listings and our training data practicalities and parameters were tuned using offline evaluation techniques presented below. Our training data is updated daily in a sliding window manner over multiple months, by processing the latest day search sessions and adding them to the dataset and discarding the oldest day search sessions from the dataset. We train embeddings for each listing_id, where we initialize vectors randomly before training (same random seed is used every time). We found that we get better offline performance if we re-train listing embeddings from scratch every day, instead of incrementally continuing training on existing vectors. The day-to-day vector differences do not cause discrepancies in our models because in our applications we use the cosine similarity as the primary signal and not the actual vectors themselves. Even with vector changes over time, the connotations of cosine similarity measure and its ranges do not change.
Offline tuning settled on a window size of 5, dimension 32, and 10 iterations over the training data.
Dimensionality of listing embeddings was set to d = 32, as we found that to be a good trade-off between offline performance and memory needed to store vectors in RAM memory of search machines for purposes of real-time similarity calculations. Context window size was set to m = 5, and we performed 10 iterations over the training data. To implement the congregated search change to the algorithm we modified the original word2vec c code. Training used MapReduce, where 300 mappers read data and a single reducer trains the model in a multi-threaded manner. End-to-end daily data generation and training pipeline is implemented using Airflow, which is Airbnb's open-sourced scheduling platform.
Evaluation uses the average rank position of the eventually booked listing in the ranked list; lower is better.
The paper compares several variants:
a. global random negative sampling
b. global random negatives + booked listing as global context
c. global random negatives + booked listing as global context + explicit same-market negatives
The last variant yields the lowest average rank of the booked listing, i.e. the best result.
To be able to make quick decisions regarding different ideas on optimization function, training data construction, hyperparameters, etc., we needed a way to quickly compare different embeddings. One way to evaluate trained embeddings is to test how good they are in recommending listings that user would book, based on the most recent user click. More specifically, let us assume we are given the most recently clicked listing and listing candidates that need to be ranked, which contain the listing that user eventually booked. By calculating cosine similarities between embeddings of clicked listing and candidate listings we can rank the candidates and observe the rank position of the booked listing. For purposes of evaluation we use a large number of such search, click and booking events, where rankings were already assigned by our Search Ranking model. In Figure 6 we show results of offline evaluation in which we compared several versions of d = 32 embeddings with regards to how they rank the booked listing based on clicks that precede it. Rankings of booked listing are averaged for each click leading to the booking, going as far back as 17 clicks before the booking to the Last click before the booking. Lower values mean higher ranking. Embedding versions that we compared were 1) d32: trained using (3), 2) d32 book: trained with bookings as global context (4) and 3) d32 book + neg: trained with bookings as global context and explicit negatives from same market (5). It can be observed that Search Ranking model gets better with more clicks as it uses memorization features. It can also be observed that re-ranking listings based on embedding similarity would be useful, especially in early stages of the search funnel. Finally, we can conclude that d32 book + neg outperforms the other two embedding versions. The same type of graphs were used to make decisions regarding hyperparameters, data construction, etc.
Every Airbnb listing page has a Similar Listings carousel. The recommendations are produced by computing cosine similarities between the current listing's embedding and those of other listings, and returning the K most similar.
Every Airbnb home listing page contains a Similar Listings carousel which recommends listings that are similar to it and available for the same set of dates. At the time of our test, the existing algorithm for Similar Listings carousel was calling the main Search Ranking model for the same location as the given listing followed by filtering on availability, price range and listing type of the given listing. We conducted an A/B test where we compared the existing similar listings algorithm to an embedding-based solution, in which similar listings were produced by finding the k-nearest neighbors in listing embedding space. Given learned listing embeddings, similar listings for a given listing l were found by calculating cosine similarity between its vector vl and vectors vj of all listings from the same market that are available for the same set of dates (if check-in and check-out dates are set). The K listings with the highest similarity were retrieved as similar listings. The calculations were performed online and happen in parallel using our sharded architecture, where parts of embeddings are stored on each of the search machines.
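A minimal sketch of the embedding-based scoring, assuming `candidates` has already been filtered to same-market listings available for the requested dates:

```python
import numpy as np

def similar_listings(query_id, candidates, embeddings, k=12):
    """Return the K candidates with the highest cosine similarity to the
    query listing's embedding. The default K is an arbitrary placeholder."""
    q = embeddings[query_id]
    q = q / np.linalg.norm(q)
    scored = []
    for lid in candidates:
        v = embeddings[lid]
        scored.append((float(np.dot(q, v / np.linalg.norm(v))), lid))
    scored.sort(reverse=True)
    return [lid for _, lid in scored[:k]]
```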
The A/B test showed that embedding-based solution led to a 21% increase in Similar Listing carousel CTR (23% in cases when listing page had entered dates and 20% in cases of dateless pages) and 4.9% increase in guests who find the listing they end up booking in Similar Listing carousel. In light of these results we deployed the embedding-based Similar Listings to production.
Let $D_s = \{(x_i, y_i)\}$, $i = 1, \ldots, K$, denote the training data from one search, where $K$ is the number of listings returned, $x_i$ is the feature vector of the $i$-th listing, and $y_i \in \{0, 0.01, 0.25, 1, -0.4\}$ is its label: 1 for a booking, 0.25 for contacting the host without booking, $-0.4$ for a host rejection, 0.01 for a click, and 0 for a view without a click. Only the most recent 30 days of searches with at least one booking are kept.
Background. To formally describe our Search Ranking Model, let us assume we are given training data about each search $D_s = (x_i, y_i)$, $i = 1, \ldots, K$, where $K$ is the number of listings returned by search, $x_i$ is a vector containing features of the $i$-th listing result and $y_i \in \{0, 0.01, 0.25, 1, -0.4\}$ is the label assigned to the $i$-th listing result. To assign the label to a particular listing from the search result we wait for 1 week after search happened to observe the final outcome, which can be $y_i = 1$ if listing was booked, $y_i = 0.25$ if listing host was contacted by the guest but booking did not happen, $y_i = -0.4$ if listing host rejected the guest, $y_i = 0.01$ if listing was clicked and $y_i = 0$ if listing was just viewed but not clicked. After that 1 week wait the set $D_s$ is also shortened to keep only search results up to the last result user clicked on ($K_c \le K$). Finally, to form data $\mathcal{D} = \bigcup_{s=1}^{N} D_s$ we only keep $D_s$ sets which contain at least one booking label. Every time we train a new ranking model we use the most recent 30 days of data.
The feature vector $x_i$ contains listing features, user features, query features, and cross-features.
Listing features: price per night, listing type, number of rooms, rejection rate, etc.
User features: average booked price, guest rating, etc.
Query features: number of guests, length of stay, etc.
Cross-features: query-listing distance (difference between the query location and the listing location); capacity fit (difference between the requested number of guests and the listing's capacity); price difference (listing price vs. the user's historical average booked price); rejection probability (how likely the host is to reject the query); click percentage (a real-time feature tracking what share of the user's clicks landed on this particular listing).
In total the model uses roughly 100 features.
Feature vector $x_i$ for the i-th listing result consists of listing features, user features, query features and cross-features. Listing features are features associated with the listing itself, such as price per night, listing type, number of rooms, rejection rate, etc. Query features are features associated with the issued query, such as number of guests, length of stay, lead days, etc. User features are features associated with the user who is conducting the search, such as average booked price, guest rating, etc. Cross-features are features derived from two or more of these feature sources: listing, user, query. Examples of such features are query listing distance: distance between query location and listing location, capacity fit: difference between query number of guests and listing capacity, price difference: difference between listing price and average price of user's historical bookings, rejection probability: probability that host will reject these query parameters, click percentage: real-time memorization feature that tracks what percentage of user's clicks were on that particular listing, etc. The model uses approximately 100 features. For conciseness we will not list all of them.
A GBDT model is trained as pairwise regression, with 80% of the data for training and 20% for testing, and NDCG as the offline evaluation metric.
Next, we formulate the problem as pairwise regression with search labels as utilities and use data $\mathcal{D}$ to train a Gradient Boosting Decision Trees (GBDT) model, using a package that was modified to support Lambda Rank. When evaluating different models offline, we use NDCG, a standard ranking metric, on a hold-out set of search sessions, i.e. 80% of $\mathcal{D}$ for training and 20% for testing.
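For reference, a textbook NDCG computation over the utility labels above; the paper does not spell out its exact DCU/NDCU definitions, so treat this as a plain-NDCG approximation.

```python
import math

def ndcg(utilities_in_ranked_order):
    """DCG of the model's ordering divided by DCG of the ideal ordering,
    where utilities are the search labels (1, 0.25, 0.01, 0, -0.4)."""
    def dcg(us):
        return sum(u / math.log2(i + 2) for i, u in enumerate(us))
    ideal_dcg = dcg(sorted(utilities_in_ranked_order, reverse=True))
    return dcg(utilities_in_ranked_order) / ideal_dcg if ideal_dcg > 0 else 0.0
```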
Listing embeddings are then added to the ranking model, loading the embeddings of all 4.5 million listings. The guest's actions from the last 2 weeks are grouped into the following sets:
$H_c$: listings the guest clicked
$H_{l_c}$: listings the guest long-clicked, where a long click means a dwell time of more than 60 seconds
$H_s$: listings the guest skipped
$H_w$: listings the guest added to a wishlist (i.e. favorites)
$H_i$: listings the guest inquired about (contacted the host) without booking
$H_b$: listings the guest booked
Each of these sets is further split into same-market subsets.
Listing Embedding Features. The first step in adding embedding features to our Search Ranking Model was to load the 4.5 million embeddings into our search backend such that they can be accessed in real-time for feature calculation and model scoring. Next, we introduced several user short-term history sets, that hold user actions from last 2 weeks, which are updated in real-time as new user actions happen. The logic was implemented using Kafka. Specifically, for each user_id we collect and maintain (regularly update) the following sets of listing ids:
(1) $H_c$: clicked listing_ids - listings that user clicked on in last 2 weeks.
(2) $H_{l_c}$: long-clicked listing_ids - listings that user clicked and stayed on the listing page for longer than 60 sec.
(3) $H_s$: skipped listing_ids - listings that user skipped in favor of a click on a lower positioned listing.
(4) $H_w$: wishlisted listing_ids - listings that user added to a wishlist in last 2 weeks.
(5) $H_i$: inquired listing_ids - listings that user contacted in last 2 weeks but did not book.
(6) $H_b$: booked listing_ids - listings that user booked in last 2 weeks.
We further split each of the short-term history sets $H_*$ into subsets that contain listings from the same market. For example, if user had clicked on listings from New York and Los Angeles, their set $H_c$ would be further split into $H_c(NY)$ and $H_c(LA)$.
Finally, we define the embedding features which utilize the defined sets and the listing embeddings to produce a score for each candidate listing. The features are summarized in Table 6.
$EmbClickSim$ measures how similar candidate listing $l_i$ is to the guest's recent clicks. The click set $H_c$ is split by market; the embeddings in each market subset are averaged into a market centroid; the cosine similarity between the candidate embedding $v_{l_i}$ and each centroid is computed; and the maximum is taken as the feature value:

$$EmbClickSim(l_i, H_c) = \max_{m \in M} \cos\Big(v_{l_i}, \sum_{l_h \in m,\, l_h \in H_c} v_{l_h}\Big)$$

where $M$ is the set of markets into which $H_c$ was split.
To compute $EmbClickSim$ for candidate listing $l_i$ we need to compute cosine similarity between its listing embedding $v_{l_i}$ and embeddings of listings in $H_c$. We do so by first computing $H_c$ market-level centroid embeddings. To illustrate, let us assume $H_c$ contains 5 listings from NY and 3 listings from LA. This would entail computing two market-level centroid embeddings, one for NY and one for LA, by averaging embeddings of listing ids from each of the markets. Finally, $EmbClickSim$ is calculated as the maximum of the similarities between listing embedding $v_{l_i}$ and the $H_c$ market-level centroid embeddings, as expressed in the formula above.
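A sketch of this feature; `market_of` is an assumed lookup from listing id to market, and the centroid uses the mean (cosine similarity is scale-invariant, so this agrees with the sum in the formula above):

```python
import numpy as np
from collections import defaultdict

def cos(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def emb_click_sim(candidate_id, h_c, embeddings, market_of):
    """EmbClickSim: split H_c by market, average each market's embeddings
    into a centroid, return the max cosine similarity to the candidate."""
    by_market = defaultdict(list)
    for lid in h_c:
        by_market[market_of[lid]].append(embeddings[lid])
    if not by_market:
        return 0.0  # assumed default when H_c is empty
    centroids = [np.mean(vecs, axis=0) for vecs in by_market.values()]
    return max(cos(embeddings[candidate_id], c) for c in centroids)
```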
The similarity between the candidate listing and the guest's most recent long click is computed as:

$$EmbLastLongClickSim(l_i, H_{l_c}) = \cos(v_{l_i}, v_{l_{last}})$$
In addition to similarity to all user clicks, we added a feature that measures similarity to the latest long click, $EmbLastLongClickSim$. For a candidate listing $l_i$ it is calculated by finding the cosine similarity between its embedding $v_{l_i}$ and the embedding of the latest long-clicked listing $l_{last}$ from $H_{l_c}$.
The user-type to listing-type similarity feature is:

$$UserTypeListingTypeSim(u_t, l_i) = \cos(v_{u_t}, v_{l_t})$$

where $l_t$ is the listing_type of candidate listing $l_i$.
User-type & Listing-type Embedding Features. We follow a similar procedure to introduce features based on user type and listing type embeddings. We trained embeddings for 500K user types and 500K listing types using 50 million user booking sessions. Embeddings were d = 32 dimensional and were trained using a sliding window of m = 5 over booking sessions. The user type and listing type embeddings were loaded to search machines memory, such that we can compute the type similarities online. To compute the $UserTypeListingTypeSim$ feature for candidate listing $l_i$ we simply look up its current listing type $l_t$ as well as the current user type $u_t$ of the user who is conducting the search and calculate the cosine similarity between their embeddings.
The coverage and feature importance of the newly added embedding features are then measured (Table 7).
All features from Table 6 were logged for 30 days so they could be added to search ranking training set $\mathcal{D}$. The coverage of features, meaning the proportion of $\mathcal{D}$ which had particular feature populated, are reported in Table 7. As expected, it can be observed that features based on user clicks and skips have the highest coverage. Finally, we trained a new GBDT Search Ranking model with embedding features added. Feature importances for embedding features (ranking among 104 features) are shown in Table 7. Top ranking features are similarity to listings user clicked on (EmbClickSim: ranked 5th overall) and similarity to listings user skipped (EmbSkipSim: ranked 8th overall). Five embedding features ranked among the top 20 features. As expected, long-term feature UserTypeListingTypeSim which used all past user bookings ranked better than short-term feature EmbBookSim which takes into account only bookings from last 2 weeks. This also shows that recommendations based on past bookings are better with embeddings that are trained using historical booking sessions instead of click sessions.
To check whether the model learned the new embedding features as intended (Figure 7), partial dependence plots are used: all features but one are held fixed while the examined feature varies.
Left: larger $EmbClickSim$ leads to a higher ranking score.
Middle: larger $EmbSkipSim$ leads to a lower ranking score.
Right: larger $UserTypeListingTypeSim$ leads to a higher ranking score.
To evaluate if the model learned to use the features as we intended, we plot the partial dependency plots for 3 embedding features: EmbClickSim, EmbSkipSim and UserTypeListingTypeSim. These plots show what would happen to listing's ranking score if we fix values of all but a single feature (the one we are examining). On the left subgraph it can be seen that large values of EmbClickSim, which convey that listing is similar to the listings user recently clicked on, lead to a higher model score. The middle subgraph shows that large values of EmbSkipSim, which indicate that listing is similar to the listings user skipped, lead to a lower model score. Finally, the right subgraph shows that large values of UserTypeListingTypeSim, which indicate that user type is similar to listing type, lead to a higher model score, as expected.
Online experiments showed that adding the embedding features yields a significant improvement (Table 8). The paper notes that a back test several months later removed the embedding features and saw a significant drop in bookings, further evidence that the features work well.
Online Experiment Results Summary. We conducted both offline and online experiments (A/B test). First, we compared two search ranking models trained on the same data with and without embedding features. In Table 8 we summarize the results in terms of DCU (Discounted Cumulative Utility) per each utility (impression, click, rejection and booking) and overall NDCU (Normalized Discounted Cumulative Utility). It can be observed that adding embedding features resulted in 2.27% lift in NDCU, where booking DCU increased by 2.58%, meaning that booked listings were ranked higher in the hold-out set, without any hit on rejections (DCU -0.4 was flat), meaning that rejected listings did not rank any higher than in the model without embedding features.
Observations from Table 8, plus the fact that embedding features ranked high in GBDT feature importances (Table 7) and the finding that features behavior matches what we intuitively expected (Figure 7) was enough to make a decision to proceed to an online experiment. In the online experiment we saw a statistically significant booking gain and embedding features were launched to production. Several months later we conducted a back test in which we attempted to remove the embedding features, and it resulted in negative bookings, which was another indicator that the real-time embedding features are effective.
For Airbnb's real-time personalized search ranking, the paper learns low-dimensional listing and user representations from guests' click and booking sessions, with innovations such as hard negatives and the booked listing as global context.
We proposed a novel method for real-time personalization in Search Ranking at Airbnb. The method learns low-dimensional representations of home listings and users based on contextual co-occurrence in user click and booking sessions. To better leverage available search contexts, we incorporate concepts such as global context and explicit negative signals into the training procedure. We evaluated the proposed method in Similar Listing Recommendations and Search Ranking. After successful test on live search traffic both embedding applications were deployed to production.
The hard negative selection and the many practical details in this paper are well worth referencing.