Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction
Contents
Main contributions of the paper
Related work
The proposed model
Experiments
Conclusion
(1) Our main contributions are listed as follows:
• Inspired by the success of SENET in the computer vision field, we use the SENET mechanism to learn the weights of features dynamically. [SENET mechanism: dynamically learns feature weights]
• We introduce three types of Bilinear-Interaction layer to learn feature interactions in a fine-grained way. This is in contrast to previous work [6, 9, 10, 19, 20, 23], which calculates feature interactions with the Hadamard product or inner product. [Three ways to learn feature interactions]
• Combining the SENET mechanism with bilinear feature interaction, our shallow model achieves state-of-the-art performance among shallow models such as FFM on the Criteo and Avazu datasets. [SENET mechanism + bilinear feature interaction form the shallow model; compared against FFM]
• For further performance gains, we combine a classical deep neural network (DNN) component with the shallow model to form a deep model. The deep FiBiNET consistently outperforms the other state-of-the-art deep models on the Criteo and Avazu datasets. [Adding a DNN forms the deep model; compared against other deep models]
(2) Shallow model: FM
Factorization machine (FM) [19, 20] and field-aware factorization machine (FFM) [9, 10] are two of the most successful CTR models. FM models all feature interactions between variables using factorized parameters. It has low time complexity and memory cost, and it works well on large sparse data.
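For reference, FM's second-order prediction, as defined in [19, 20], is:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$

where $v_i \in \mathbb{R}^k$ is the latent vector of feature $i$; factorizing the pairwise weights into these vectors is what keeps the memory and time cost low.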
(3) Shallow model: FFM
FFM introduced field-aware latent vectors and won two competitions hosted by Criteo and Avazu [9]. However, FFM is restricted by its large memory requirement and cannot easily be used in Internet companies.
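Concretely, FFM (as defined in [9, 10]) replaces the single latent vector per feature with one per (feature, field) pair:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_{i,f_j}, v_{j,f_i} \rangle\, x_i x_j$$

where $f_j$ is the field of feature $j$. Storing one latent vector per field for every feature is exactly what drives up the memory footprint.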
(4) Deep model: FNN
Factorization-Machine Supported Neural Networks (FNN) [25] is a feed-forward neural network that uses FM to pre-train the embedding layer. However, FNN can capture only high-order feature interactions.
(5) Deep model: Wide & Deep
The Wide & Deep model (WDL) [1] was initially introduced for app recommendation in Google Play. WDL jointly trains wide linear models and deep neural networks to combine the benefits of memorization and generalization for recommender systems. However, expert feature engineering is still needed for the input to the wide part of WDL, which means that the cross-product transformations also have to be designed manually.
(6) Deep model: DeepFM
To alleviate manual effort in feature engineering, DeepFM [4] replaces the wide part of WDL with FM and shares the feature embedding between the FM and deep components. DeepFM is regarded as one of the state-of-the-art models in the CTR estimation field.
(7) DCN
Deep & Cross Network (DCN)[22] efficiently captures feature interactions of bounded degrees in an explicit fashion.
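As a reminder of its form, the cross network of DCN [22] stacks layers of the form

$$x_{l+1} = x_0 x_l^{\top} w_l + b_l + x_l$$

where $x_0$ is the base input vector and $w_l, b_l$ are the layer's parameters, so a stack of depth $l$ yields explicit interactions of bounded degree $l+1$.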
(8) xDeepFM
Similarly, eXtreme Deep Factorization Machine (xDeepFM)[15] also models the low-order and high-order feature interactions in an explicit way by proposing a novel Compressed Interaction Network (CIN) part.
(9) AFM
As [23] mentioned, FM can be hindered by modeling all feature interactions with the same weight, as not all feature interactions are equally useful and predictive. They propose the Attentional Factorization Machines (AFM) [23] model, which uses an attention network to learn the weights of feature interactions.
(10) DIN
Deep Interest Network (DIN)[26] represents users’ diverse interests with an interest distribution and designs an attention-like network structure to locally activate the related interests according to the candidate ad.
(11) SENET Module [key point]
Hu et al. [8] proposed the Squeeze-and-Excitation Network (SENET) to improve the representational power of a network by explicitly modeling the interdependencies between the channels of convolutional features in various image classification tasks. SENET proved successful in image classification tasks and won first place in the ILSVRC 2017 classification task.
Beyond image classification, there are other applications of SENET [12, 21, 24]. [21] introduces three variants of the SE module for the semantic segmentation task. Classifying common thoracic diseases and localizing suspicious lesion regions on chest X-rays [24] is another application field. [16] extends the SENET module with a global-and-local attention (GALA) module to achieve state-of-the-art accuracy on ILSVRC.
The goal is to learn feature importance dynamically and to learn feature interactions in a finer-grained way.
To this end, we propose the Feature Importance and Bilinear feature Interaction NETwork (FiBiNET) for CTR prediction tasks.
Components of the model:
(12) Sparse Input and Embedding Layer
As described above, this part is the same as in DeepFM.
Output: E = [e1, e2, ..., ei, ..., ef], where ei is the embedding vector of the i-th field and f is the number of fields.
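A minimal PyTorch sketch of this step, assuming each field has been label-encoded to an integer id; the names (`EmbeddingLayer`, `field_vocab_sizes`) and sizes are illustrative, not from the paper:

```python
import torch
import torch.nn as nn

class EmbeddingLayer(nn.Module):
    """Maps f sparse categorical fields to dense embeddings e_1 ... e_f."""
    def __init__(self, field_vocab_sizes, embed_dim):
        super().__init__()
        # one embedding table per field (vocab sizes are dataset-dependent)
        self.embeddings = nn.ModuleList(
            nn.Embedding(v, embed_dim) for v in field_vocab_sizes
        )

    def forward(self, x):
        # x: LongTensor (batch, f), one label-encoded id per field
        # returns E of shape (batch, f, k)
        return torch.stack(
            [emb(x[:, i]) for i, emb in enumerate(self.embeddings)], dim=1
        )
```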
(13) SENET Layer
Different features have different importance for the target task.
(14) The SENET layer comprises three steps, sketched in code after step 14.3:
14.1 Squeeze
This step calculates 'summary statistics' of each field embedding, mapping E to a statistic vector Z (E → Z).
14.2 Excitation
This step learns the weight of each field embedding based on the statistic vector Z, using two fully connected layers.
14.3 Re-Weight
Also called re-scale: the original field embeddings are multiplied by the learned weights.
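Continuing the sketch (reusing the imports above), one hedged way to implement the three steps; mean pooling for the squeeze, ReLU activations, and the default reduction ratio are assumptions about the exact variant:

```python
class SENETLayer(nn.Module):
    """Squeeze -> Excitation -> Re-Weight over the field embeddings."""
    def __init__(self, num_fields, reduction_ratio=3):
        super().__init__()
        reduced = max(1, num_fields // reduction_ratio)
        # Excitation: two fully connected layers, f -> f/r -> f
        self.fc1 = nn.Linear(num_fields, reduced, bias=False)
        self.fc2 = nn.Linear(reduced, num_fields, bias=False)

    def forward(self, E):
        # E: (batch, f, k)
        Z = E.mean(dim=2)                                  # 14.1 Squeeze: E -> Z, (batch, f)
        A = torch.relu(self.fc2(torch.relu(self.fc1(Z))))  # 14.2 Excitation: weights, (batch, f)
        return E * A.unsqueeze(2)                          # 14.3 Re-Weight: (batch, f, k)
```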
(15) Bilinear-Interaction Layer
This layer calculates second-order feature interactions.
· denotes the regular inner product, and ⊙ denotes the Hadamard product.
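Concretely, for vectors $a, b \in \mathbb{R}^k$:

$$a \cdot b = \sum_{t=1}^{k} a_t b_t, \qquad a \odot b = (a_1 b_1, a_2 b_2, \cdots, a_k b_k)$$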
However, the inner product and Hadamard product used in the interaction layer are too simple to effectively model the feature interactions in sparse datasets.
(16) The paper proposes three types of Bilinear-Interaction layer (Field-All, Field-Each and Field-Interaction) to carry out feature interaction; a code sketch follows below.
The original embedding E is mapped to interaction vectors p (E → p), and the SENET-reweighted embedding V is mapped to q (V → q).
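A sketch of the Field-All variant, where a single matrix W is shared across all field pairs (Field-Each would keep one W per field, Field-Interaction one per field pair); the initialization and looping style are illustrative:

```python
import itertools

class BilinearInteraction(nn.Module):
    """Field-All bilinear interaction: p_ij = (v_i · W) ⊙ v_j."""
    def __init__(self, embed_dim):
        super().__init__()
        # single k x k matrix shared by all field pairs (Field-All)
        self.W = nn.Parameter(torch.randn(embed_dim, embed_dim) * 0.01)

    def forward(self, V):
        # V: (batch, f, k); output: (batch, f*(f-1)/2, k)
        f = V.size(1)
        pairs = [
            (V[:, i] @ self.W) * V[:, j]
            for i, j in itertools.combinations(range(f), 2)
        ]
        return torch.stack(pairs, dim=1)
```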
(17) Combination Layer
p and q are concatenated and fed into the standard neural network layers of FiBiNET (see the skeleton below).
(18) Deep Network
Composed of several fully connected layers, it captures high-order feature interactions.
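Putting the components together, a hedged skeleton of the deep FiBiNET forward pass; the hidden sizes are illustrative, and the linear term shown in the output layer below is simplified away here:

```python
class FiBiNET(nn.Module):
    """Skeleton: embedding -> SENET + bilinear -> combination -> DNN -> logit."""
    def __init__(self, field_vocab_sizes, embed_dim=16, hidden=(400, 400, 400)):
        super().__init__()
        f = len(field_vocab_sizes)
        self.embed = EmbeddingLayer(field_vocab_sizes, embed_dim)
        self.senet = SENETLayer(f)
        self.bilinear_p = BilinearInteraction(embed_dim)  # acts on E
        self.bilinear_q = BilinearInteraction(embed_dim)  # acts on V
        # p and q each hold f*(f-1)/2 vectors of size k
        in_dim = f * (f - 1) * embed_dim
        layers, prev = [], in_dim
        for h in hidden:
            layers += [nn.Linear(prev, h), nn.ReLU()]
            prev = h
        layers.append(nn.Linear(prev, 1))
        self.dnn = nn.Sequential(*layers)

    def forward(self, x):
        E = self.embed(x)                     # (batch, f, k)
        V = self.senet(E)                     # SENET-reweighted embeddings
        p = self.bilinear_p(E)                # E -> p
        q = self.bilinear_q(V)                # V -> q
        c = torch.cat([p, q], dim=1)          # (17) Combination layer
        return self.dnn(c.flatten(1)).squeeze(1)  # (18) deep part -> logit
```

Training would pair this logit with `torch.nn.BCEWithLogitsLoss`, matching the objective in (20).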
(19) Output Layer
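In a sketch consistent with the rest of the model, the final prediction applies a sigmoid to the sum of a linear term and the deep network output (the exact formulation in the paper may differ slightly):

$$\hat{y} = \sigma\Big(w_0 + \sum_{i} w_i x_i + y_d\Big)$$

where $y_d$ is the output of the deep network and $\sigma$ is the sigmoid function.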
(20) Objective Function
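The model is trained with the standard binary cross-entropy (log loss) over N instances:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\Big( y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i) \Big)$$

where $y_i \in \{0, 1\}$ is the click label and $\hat{y}_i$ the predicted click probability.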
(21) Relationship to FM and FNN
Removing the SENET layer and the Bilinear-Interaction layer from our model yields FNN.
Further removing the DNN part and using a sum instead, the shallow FiBiNET degrades to the traditional FM.
In this section, we conduct extensive experiments to answer the following questions:
(RQ1) How does our model perform compared with the state-of-the-art methods for CTR prediction?
(RQ2) Can different combinations of the bilinear and Hadamard functions in the Bilinear-Interaction layer impact its performance?
(RQ3) Can the different field types (Field-All, Field-Each and Field-Interaction) of the Bilinear-Interaction layer impact its performance?
(RQ4) How do the network settings influence the performance of our model?
(RQ5) Which is the most important component of FiBiNET?
Before answering these questions, we first present the basic experimental settings.
(22) Datasets
1) Criteo
The Criteo dataset is widely used in CTR prediction. It contains 45 million click instances, with 26 anonymous categorical fields and 13 continuous feature fields. We randomly split the data into two parts: 90% for training and the rest for testing.
2) Avazu
The Avazu dataset contains several days of click logs, ordered chronologically. It contains 40 million click instances. Each instance has 24 fields, which describe a single ad impression. We randomly split the data into two parts: 80% for training and the rest for testing.
(23) Evaluation Metrics
We adopt two metrics: AUC and log loss.
AUC: The area under the ROC curve is a widely used metric for evaluating classification problems. Besides, some work validates AUC as a good measurement for CTR prediction [3]. AUC is insensitive to the classification threshold and the positive ratio. Its upper bound is 1, and larger is better.
Log loss: Log loss is a widely used metric in binary classification, measuring the distance between two distributions. Its lower bound is 0, which indicates that the two distributions match perfectly; a smaller value indicates better performance.
(24) Baselines
The experiments are divided into two groups: a shallow group and a deep group.
The baseline models fall into the same two groups: shallow baselines and deep baselines.
Shallow baselines: LR, FM, FFM and AFM.
Deep baselines: FNN, DCN, DeepFM and xDeepFM.
Note that in CTR prediction, even a 1‰ improvement in AUC is considered significant, because it can bring a large increase in revenue for a company with a huge user base.
(25) Implementation Details
(26) Conclusion
Motivated by the drawbacks of the existing state-of-the-art models, we propose a new model, FiBiNET (Feature Importance and Bilinear feature Interaction NETwork), which aims to dynamically learn feature importance and capture feature interactions in a finer way. The proposed FiBiNET makes contributions in the following respects:
1) For CTR prediction tasks, the SENET module can dynamically learn the importance of features. It boosts the weights of important features and suppresses the weights of uninformative ones.
2) We introduce three types of Bilinear-Interaction layer to learn feature interactions, instead of calculating them with the Hadamard product or inner product.
3) Our shallow model, which combines the SENET mechanism with bilinear feature interaction, outperforms other shallow models such as FM and FFM.
4) For further performance gains, we combine the shallow model with a DNN to form a deep model. The deep FiBiNET consistently outperforms other state-of-the-art deep models such as DeepFM and xDeepFM.
Related material:
https://shenweichen.blog.csdn.net/article/details/95234555