Paper Reading: Alibaba - Deep Interest Network for CTR Prediction

Outline

Problem addressed
Method
Takeaways and open questions

1. Problem Addressed

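Simply put, the paper addresses how to represent a user's interest in different categories of items for product recommendation. For example, a young mother has browsed or bought many kinds of items: shoes, baby coats, handbags, earrings, T-shirts and so on. When she is shopping for earrings, her past behavior on jewelry reflects her taste in earrings far better than her behavior on the other categories.
If the user is represented by the same vector for every candidate item, that representation cannot express how interested the user is in each category; if instead a separate representation is kept for each category, there are too many user-interest vectors and therefore too many parameters. The paper therefore proposes DIN (Deep Interest Network), which represents the user's interest adaptively with respect to the candidate item.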

To address this, the paper makes the following contributions:

DIN, a model that incorporates an attention mechanism
a mini-batch aware regularizer
a data adaptive activation function

2. Method

2.1 Feature Representation

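The paper uses the simplest encodings, one-hot and multi-hot, applied to each feature group separately. Feature group $i$ is encoded as $\mathbf{t}_i \in \mathbb{R}^{K_i}$, where $K_i$ is the number of distinct ids in the group, $t_i[j] \in \{0, 1\}$ and $\sum_{j=1}^{K_i} t_i[j] = k$ ($k = 1$ gives a one-hot vector, $k > 1$ a multi-hot vector). An instance is the concatenation

$$\mathbf{x} = [\mathbf{t}_1^T, \mathbf{t}_2^T, \ldots, \mathbf{t}_M^T]^T, \qquad \sum_{i=1}^{M} K_i = K$$

where $M$ is the number of feature groups and $K$ is the dimensionality of the whole feature space.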
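
A minimal NumPy sketch of this encoding. The three feature groups and their sizes are a hypothetical toy schema chosen only for illustration; real CTR data has millions of ids per group and would be stored sparsely rather than as dense vectors.

```python
import numpy as np

def encode_group(ids, group_size):
    """Encode one feature group as a 0/1 vector t_i of length K_i.

    A single id gives a one-hot vector (k = 1); several ids give a multi-hot vector (k > 1).
    """
    t = np.zeros(group_size, dtype=np.float32)
    t[list(ids)] = 1.0
    return t

def encode_instance(groups, group_sizes):
    """Concatenate the per-group encodings into x = [t_1^T, ..., t_M^T]^T of length K."""
    return np.concatenate([encode_group(ids, k) for ids, k in zip(groups, group_sizes)])

# Hypothetical schema: gender (K_1 = 2), visited categories (K_2 = 5), ad id (K_3 = 4).
group_sizes = [2, 5, 4]
# Gender id 1 (one-hot), categories 0, 2, 3 (multi-hot), ad id 2 (one-hot).
x = encode_instance([[1], [0, 2, 3], [2]], group_sizes)
```

Here `x` has length $K = 2 + 5 + 4 = 11$ and contains five ones: one from each one-hot group and three from the multi-hot group.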

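At the input, each feature group $i$ gets its own randomly initialized embedding dictionary $\mathbf{W}^i = [\mathbf{w}_1^i, \ldots, \mathbf{w}_{K_i}^i] \in \mathbb{R}^{D \times K_i}$, where $\mathbf{w}_j^i \in \mathbb{R}^D$ is the embedding of id $j$ and $D$ is the embedding dimension.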

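For a multi-hot feature group, the embedding lookup returns a list of vectors whose length varies from sample to sample, so a pooling layer is used to obtain a fixed-length vector (sum pooling and average pooling are the most common choices).

From the paper:

It is a common practice to transform the list of embedding vectors via a pooling layer to get a fixed-length vector.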
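
A minimal NumPy sketch of the lookup-then-pool step for one multi-hot group. The sizes `D` and `K_i` and the initialization scale are hypothetical. Note that sum pooling of the looked-up columns is the same as the matrix-vector product $\mathbf{W}^i \mathbf{t}_i$.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 4    # embedding dimension (hypothetical)
K_i = 5  # number of distinct ids in this feature group (hypothetical)
# Embedding dictionary W^i in R^{D x K_i}: column j is w_j^i, randomly initialized.
W_i = rng.normal(scale=0.01, size=(D, K_i))

def embed_and_pool(t_i, W, mode="sum"):
    """Look up the embeddings of the active ids in t_i and pool them into one D-dim vector."""
    active = np.flatnonzero(t_i)   # ids j with t_i[j] = 1
    vectors = W[:, active]         # D x k matrix: the variable-length "list of vectors"
    if mode == "sum":
        return vectors.sum(axis=1)
    if mode == "mean":
        return vectors.mean(axis=1)
    raise ValueError(f"unknown pooling mode: {mode}")

t = np.array([1, 0, 1, 1, 0])      # multi-hot: ids 0, 2 and 3 are active
e_sum = embed_and_pool(t, W_i, "sum")
e_mean = embed_and_pool(t, W_i, "mean")
```

Whatever the number of active ids, the pooled output always has shape `(D,)`, which is what lets it be concatenated with the other groups.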

2.2 DIN

[Figure 1: DIN model architecture]

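As the figure shows, the candidate ad and each historical behavior item are fed into an activation unit that outputs a weight; the embeddings of the historical items are then combined by a weighted sum into a new user representation, which is concatenated and flattened together with the other features and passed to the next layer.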

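DIN simulates this process by paying attention to the representation of locally activated interests w.r.t. given ad.
Activation units are applied on the user behavior features, which performs as a weighted sum pooling to adaptively calculate user representation $\mathbf{v}_U$ given a candidate ad $A$, as shown in:

$$\mathbf{v}_U(A) = f(\mathbf{v}_A, \mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_H) = \sum_{j=1}^{H} a(\mathbf{e}_j, \mathbf{v}_A)\,\mathbf{e}_j = \sum_{j=1}^{H} w_j \mathbf{e}_j$$

where $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_H\}$ is the list of embedding vectors of behaviors of user $U$ with length of $H$, $\mathbf{v}_A$ is the embedding vector of ad $A$, and $a(\cdot)$ is a feed-forward network whose output is the activation weight.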
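
A minimal NumPy sketch of this weighted sum pooling. The activation unit is a hypothetical one-hidden-layer MLP over $[\mathbf{e}_j, \mathbf{v}_A, \mathbf{e}_j \odot \mathbf{v}_A]$: the element-wise product stands in for the "out product" interaction drawn in the paper's figure, ReLU replaces PReLU/Dice, and the weights are random rather than trained. As in the paper, the outputs $w_j$ are not softmax-normalized, so they need not sum to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
D, HIDDEN = 4, 8   # embedding size and hidden width (hypothetical)

# Hypothetical activation-unit parameters for a(e_j, v_A).
W1 = rng.normal(scale=0.1, size=(3 * D, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, 1))
b2 = np.zeros(1)

def activation_unit(e, v_a):
    """Return one weight w_j = a(e_j, v_A) for each behavior embedding (rows of e)."""
    v = np.broadcast_to(v_a, e.shape)           # repeat v_A for every behavior
    z = np.concatenate([e, v, e * v], axis=1)   # H x 3D input features
    h = np.maximum(z @ W1 + b1, 0.0)            # ReLU (the paper uses PReLU/Dice)
    return (h @ W2 + b2).ravel()                # H raw weights, no softmax

def din_user_vector(e, v_a):
    """v_U(A) = sum_j a(e_j, v_A) * e_j, i.e. weighted sum pooling over behaviors."""
    w = activation_unit(e, v_a)
    return w @ e, w

H = 6
behaviors = rng.normal(size=(H, D))   # e_1 .. e_H
ad = rng.normal(size=D)               # v_A
v_u, weights = din_user_vector(behaviors, ad)
```

Because the weights depend on `ad`, calling `din_user_vector` with a different candidate ad yields a different user vector from the same behavior history, which is the point of DIN.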

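The paper briefly notes that modeling the behavior sequence with an LSTM brought no improvement, so the model does not use it. For training, the authors introduce two techniques: mini-batch aware regularization and a data adaptive activation function.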

2.3 mini-batch aware regularization

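This method targets the cost of regularization when there are many features and hence many parameters. Without regularization, SGD only needs to update the embeddings of the sparse features that appear in a mini-batch; with plain L2 regularization, the norm over all parameters has to be computed for every mini-batch, which is far too expensive. Since most features are sparse, the paper first expands the L2 norm of the embedding dictionary over the samples:

$$L_2(\mathbf{W}) = \|\mathbf{W}\|_2^2 = \sum_{j=1}^{K} \|\mathbf{w}_j\|_2^2 = \sum_{(\mathbf{x},y)\in\mathcal{S}} \sum_{j=1}^{K} \frac{I(\mathbf{x}_j \neq 0)}{n_j} \|\mathbf{w}_j\|_2^2$$

where $\mathcal{S}$ is the training set, $\mathbf{w}_j$ is the $j$-th embedding vector, $I(\mathbf{x}_j \neq 0)$ denotes if the instance $\mathbf{x}$ has the feature id $j$, and $n_j$ denotes the number of occurrences of feature id $j$ in all samples.
(I don't understand the derivation of this step. How is it computed when $n_j = 0$?)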
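
A numerical sanity check of this expansion on random toy data (all sizes and the 10% density are arbitrary choices for illustration). Every sample that contains id $j$ contributes $\|\mathbf{w}_j\|_2^2 / n_j$, and exactly $n_j$ samples contain it, so these contributions add up to $\|\mathbf{w}_j\|_2^2$. An id with $n_j = 0$ appears in no sample and therefore has no term on the right-hand side, so the identity only covers ids that occur at least once; the sketch restricts both sides to those ids.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, D = 50, 20, 3                  # samples, feature ids, embedding size (toy values)
X = rng.random((N, K)) < 0.1         # X[s, j] = I(x_j != 0) for sample s (sparse)
W = rng.normal(size=(K, D))          # row j is the embedding w_j
n = X.sum(axis=0)                    # n_j: occurrences of id j over all samples
sq = (W ** 2).sum(axis=1)            # ||w_j||_2^2

seen = n > 0                         # ids that occur at least once
# Left-hand side: sum_j ||w_j||^2 over the ids that occur.
lhs = sq[seen].sum()
# Right-hand side: sum over samples, then ids, of I(x_j != 0) / n_j * ||w_j||^2.
rhs = sum((X[s, seen] / n[seen] * sq[seen]).sum() for s in range(N))
```

`lhs` and `rhs` agree up to floating-point rounding.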

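Grouping the samples by mini-batch, this can be rewritten as:

$$L_2(\mathbf{W}) = \sum_{j=1}^{K} \sum_{m=1}^{B} \sum_{(\mathbf{x},y)\in\mathcal{B}_m} \frac{I(\mathbf{x}_j \neq 0)}{n_j} \|\mathbf{w}_j\|_2^2$$

where $B$ denotes the number of mini-batches and $\mathcal{B}_m$ denotes the $m$-th mini-batch. Let $\alpha_{mj} = \max_{(\mathbf{x},y)\in\mathcal{B}_m} I(\mathbf{x}_j \neq 0)$ denote if there is at least one instance having the feature id $j$ in mini-batch $\mathcal{B}_m$.

The expression can then be approximated as:

$$L_2(\mathbf{W}) \approx \sum_{j=1}^{K} \sum_{m=1}^{B} \frac{\alpha_{mj}}{n_j} \|\mathbf{w}_j\|_2^2$$

In this form, the regularization on mini-batch $\mathcal{B}_m$ only involves the embeddings of feature ids that appear in $\mathcal{B}_m$.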

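My understanding: because the original feature space is sparse, the approximation above is reasonable. Concretely, it replaces the number of instances in $\mathcal{B}_m$ that contain feature id $j$ with the indicator $\alpha_{mj}$; the two coincide whenever the id occurs in at most one instance of the mini-batch, which sparsity makes the common case. (The paper itself gives no concrete derivation or justification of why the approximation is valid.)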
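
A minimal NumPy sketch of the per-mini-batch penalty $\lambda \sum_j \frac{\alpha_{mj}}{n_j}\|\mathbf{w}_j\|_2^2$. The strength `lam` is a hypothetical hyperparameter (the formulas above carry no $\lambda$), and the toy data is built so that no id repeats inside a mini-batch, the case in which the approximation is exact. `grad` is the exact gradient of the penalty as written here and is non-zero only for ids present in the batch.

```python
import numpy as np

def mba_l2_penalty(batch_X, W, n, lam):
    """Mini-batch aware L2 penalty and its gradient for one mini-batch B_m (sketch).

    batch_X: (batch, K) 0/1 matrix, batch_X[s, j] = I(x_j != 0)
    W:       (K, D) embedding table, row j is w_j
    n:       (K,) occurrence counts n_j over the whole training set
    lam:     hypothetical regularization strength
    """
    alpha = batch_X.max(axis=0).astype(bool)     # alpha_mj: id j present in B_m?
    idx = np.flatnonzero(alpha)                  # only these rows are touched
    scale = lam / n[idx]                         # lam * alpha_mj / n_j  (n_j >= 1 here)
    penalty = np.sum(scale * (W[idx] ** 2).sum(axis=1))
    grad = np.zeros_like(W)
    grad[idx] = 2.0 * scale[:, None] * W[idx]    # d/dw_j of lam / n_j * ||w_j||^2
    return penalty, grad

rng = np.random.default_rng(0)
K, D, lam = 6, 3, 0.1
W = rng.normal(size=(K, D))
# Two mini-batches of three samples each; no id repeats inside a batch.
X = np.array([[1, 0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [1, 0, 0, 0, 0, 0],
              [0, 0, 0, 1, 0, 0],
              [0, 0, 0, 0, 1, 0]])
batches = [X[0:3], X[3:6]]
n = X.sum(axis=0)   # id 5 never occurs, so n_5 = 0, but it is never indexed
total = sum(mba_l2_penalty(b, W, n, lam)[0] for b in batches)
exact = lam * (W[n > 0] ** 2).sum()
```

Summed over all mini-batches, `total` equals the full L2 penalty over the ids that occur. If an id appeared twice in one batch, $\alpha_{mj}$ would count it once and the sum would fall below the exact value.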

2.4 data adaptive activation function

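The paper proposes a new adaptive activation function, Dice, mainly to cope with the fact that the distribution of a layer's input differs across layers and mini-batches. PReLU always switches slope at a hard rectified point of 0; Dice instead uses a soft switch whose position follows the distribution of the input:

$$f(s) = p(s)\cdot s + (1 - p(s))\cdot \alpha s, \qquad p(s) = \frac{1}{1 + e^{-\frac{s - E[s]}{\sqrt{Var[s] + \epsilon}}}}$$

During training, $E[s]$ and $Var[s]$ are the mean and variance of the input in each mini-batch; at test time they are moving averages computed over the data. $\epsilon$ is a small constant, set to $10^{-8}$ in the paper.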
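
A minimal NumPy sketch of the Dice forward pass as written above. Assumptions: $\alpha$ is a fixed number here (it is learned in the paper), statistics are taken per dimension over the batch axis, and the moving averages use a hypothetical momentum of 0.99. With $E[s] = 0$ and $Var[s] \to 0$, $p(s)$ becomes a step at 0 and Dice reduces to PReLU.

```python
import numpy as np

class Dice:
    """Sketch of Dice: f(s) = p(s) * s + (1 - p(s)) * alpha * s."""

    def __init__(self, dim, alpha=0.25, eps=1e-8, momentum=0.99):
        self.alpha, self.eps, self.momentum = alpha, eps, momentum
        self.running_mean = np.zeros(dim)   # moving average of E[s], used at test time
        self.running_var = np.ones(dim)     # moving average of Var[s], used at test time

    def __call__(self, s, training=True):
        if training:
            # Mini-batch statistics, per dimension.
            mean, var = s.mean(axis=0), s.var(axis=0)
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mean
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            mean, var = self.running_mean, self.running_var
        # p(s): sigmoid of the standardized input, a soft switch centred on E[s].
        p = 1.0 / (1.0 + np.exp(-(s - mean) / np.sqrt(var + self.eps)))
        return p * s + (1.0 - p) * self.alpha * s

rng = np.random.default_rng(0)
s = rng.normal(loc=2.0, size=(128, 3))   # a batch whose mean is far from 0
dice = Dice(dim=3)
out_train = dice(s)                      # uses batch statistics
out_eval = dice(s, training=False)       # uses moving averages
```

For a positive input, the output always lies between $\alpha s$ and $s$; where it lands depends on how far $s$ is from the batch mean rather than on its sign alone.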

3. Takeaways and Open Questions

3.1 Takeaways

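The experiments mainly compare different models, so I won't reproduce the results here; the offline evaluation metrics used in the paper, however, are worth borrowing.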

AUC

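$$AUC = \frac{\sum_{i=1}^{n} \#impression_i \times AUC_i}{\sum_{i=1}^{n} \#impression_i}$$

where $n$ is the number of users, and $\#impression_i$ and $AUC_i$ are the number of impressions and the AUC corresponding to the $i$-th user. Averaging AUC over users measures how well the model orders items within each user, which the paper reports to be more relevant to online performance.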
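
A minimal NumPy sketch of this impression-weighted per-user AUC. Per-user AUC is computed by pairwise comparison (ties count 1/2), which is fine for a sketch but quadratic in the number of impressions per user. Users whose impressions are all clicks or all non-clicks have no defined AUC; skipping them is an assumption of this sketch, not something the paper specifies. The toy data is made up.

```python
import numpy as np
from collections import defaultdict

def auc(labels, scores):
    """ROC AUC via pairwise comparison of positive and negative scores (ties count 1/2)."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg))

def user_weighted_auc(user_ids, labels, scores):
    """Average of per-user AUC, weighted by each user's number of impressions."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    groups = defaultdict(list)
    for i, u in enumerate(user_ids):
        groups[u].append(i)
    num = den = 0.0
    for idx in groups.values():
        y = labels[idx]
        if y.min() == y.max():   # single-class user: AUC undefined, skipped (assumption)
            continue
        num += len(idx) * auc(y, scores[idx])
        den += len(idx)
    return num / den if den else float("nan")

users = ["a", "a", "a", "a", "b", "b", "c"]
labels = [1, 0, 1, 0, 1, 0, 0]
scores = [0.9, 0.1, 0.8, 0.2, 0.3, 0.7, 0.5]
gauc = user_weighted_auc(users, labels, scores)
```

User `a` is ranked perfectly (AUC 1, 4 impressions), user `b` is ranked backwards (AUC 0, 2 impressions) and user `c` is skipped, so `gauc` is 4/6.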

RelaImpr

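RelaImpr measures a model's relative improvement over a base model:

$$RelaImpr = \left(\frac{AUC(\text{measured model}) - 0.5}{AUC(\text{base model}) - 0.5} - 1\right) \times 100\%$$

Subtracting 0.5 removes the AUC of a random guesser, so only the part of the AUC that the models actually earn is compared.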
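
The formula as a one-line Python helper; the AUC values in the example are hypothetical, not results from the paper.

```python
def rela_impr(auc_measured, auc_base):
    """Relative improvement (in %) over a base model; 0.5 is the AUC of a random guesser."""
    return ((auc_measured - 0.5) / (auc_base - 0.5) - 1.0) * 100.0

# Hypothetical AUCs: the measured model doubles the base model's margin above random.
improvement = rela_impr(0.60, 0.55)
```

A raw AUC gap of 0.05 therefore corresponds to a RelaImpr of about 100% here, which is why the metric is more informative than the raw difference when base AUCs are close to 0.5.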

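3.2 Open Questions

Feature representation

Different kinds of items may have different feature groups, so how do you build a unified feature representation across different feature groups? At an offline tech meetup where an Alibaba engineer happened to be presenting, I asked this question; her answer was that all features are treated as text.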

mini-batch aware regularization

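Regarding the approximation of $L_2(\mathbf{W})$: besides not understanding the derivation, what is the theoretical justification for this approximation?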

References

Deep Interest Network for Click-Through Rate Prediction