FiBiNET Summary

Combining Feature Importance and Bilinear feature Interaction for Click-Through Rate Prediction

Contents

Main Contributions

Related Work

The Proposed Model

Experiments


Main Contributions

(1) Our main contributions are listed as follows:

• Inspired by the success of SENET in the computer vision field, we use the SENET mechanism to learn the weights of features dynamically.【SENET mechanism: dynamically learns feature weights】

• We introduce three types of Bilinear-Interaction layer to learn feature interactions in a fine-grained way. This is in contrast to previous work[6, 9, 10, 19, 20, 23], which calculates the feature interactions with the Hadamard product or the inner product.【three types of Bilinear-Interaction layer for learning feature interactions】

• Combining the SENET mechanism with bilinear feature interaction, our shallow model achieves state-of-the-art performance among shallow models such as FFM on the Criteo and Avazu datasets.【SENET mechanism plus bilinear feature interaction forms the shallow model; compared against FFM】

• For further performance gains, we combine a classical deep neural network (DNN) component with the shallow model to form a deep model. The deep FiBiNET consistently outperforms the other state-of-the-art deep models on the Criteo and Avazu datasets.【adding a DNN yields the deep model; compared against other deep models】

Related Work

(2) Shallow model: FM

Factorization machines (FM)[19, 20] and field-aware factorization machines (FFM)[9, 10] are two of the most successful CTR models. FM models all feature interactions between variables using factorized parameters. It has low time complexity and memory cost, and works well on large sparse data.

(3) Shallow model: FFM

FFM introduced field-aware latent vectors and won two competitions hosted by Criteo and Avazu[9]. However, FFM is restricted by its large memory requirement and cannot easily be deployed in Internet companies.

(4) Deep model: FNN

Factorization-Machine Supported Neural Networks (FNN)[25] is a feed-forward neural network that uses FM to pre-train the embedding layer. However, FNN can capture only high-order feature interactions.

(5) Deep model: Wide & Deep

The Wide & Deep model (WDL)[1] was initially introduced for app recommendation in Google Play. WDL jointly trains wide linear models and deep neural networks to combine the benefits of memorization and generalization for recommender systems. However, expert feature engineering is still needed for the input to the wide part of WDL, which means the cross-product transformations must still be manually designed.

(6) Deep model: DeepFM

To alleviate manual efforts in feature engineering, DeepFM[4] replaces the wide part of WDL with FM and shares the feature embedding between the FM and deep components. DeepFM is regarded as one of the state-of-the-art models in the CTR estimation field.

(7) DCN

Deep & Cross Network (DCN)[22] efficiently captures feature interactions of bounded degrees in an explicit fashion.

(8) xDeepFM

Similarly, eXtreme Deep Factorization Machine (xDeepFM)[15] also models the low-order and high-order feature interactions in an explicit way by proposing a novel Compressed Interaction Network (CIN) part.

(9) AFM

As [23] mentioned, FM can be hindered by modeling all feature interactions with the same weight, as not all feature interactions are equally useful and predictive. The authors propose the Attentional Factorization Machines (AFM)[23] model, which uses an attention network to learn the weights of feature interactions.

(10) DIN

Deep Interest Network (DIN)[26] represents users’ diverse interests with an interest distribution and designs an attention-like network structure to locally activate the related interests according to the candidate ad.

(11) SENET Module 【key point】

Hu [8] proposed the “Squeeze-and-Excitation Network” (SENET) to improve the representational power of a network by explicitly modeling the interdependencies between the channels of convolutional features. SENET has proved successful in various image classification tasks and won first place in the ILSVRC 2017 classification task.

SENET has also been applied beyond image classification[12, 21, 24]. [21] introduces three variants of the SE module for the semantic segmentation task. Classifying common thoracic diseases and localizing suspicious lesion regions on chest X-rays[24] is another application field. [16] extends the SENET module with a global-and-local attention (GALA) module to achieve state-of-the-art accuracy on ILSVRC.

The Proposed Model

The goals are to learn feature importance dynamically and to learn feature interactions in a finer-grained way.

To this end, we propose the Feature Importance and Bilinear feature Interaction NETwork (FiBiNET) for CTR prediction tasks.

The model consists of the following components:

  1. sparse input layer, 【same as DeepFM: adopts a sparse representation for the input features】
  2. embedding layer, 【same as DeepFM: embeds the raw feature input into a dense vector】
  3. SENET layer, 【converts the embedding layer into SENET-like embedding features, which helps to boost feature discriminability】
  4. Bilinear-Interaction layer, 【models second-order feature interactions on the original embedding and the SENET-like embedding respectively】
  5. combination layer, 【merges the outputs of the Bilinear-Interaction layer and concatenates the cross features】
  6. multiple hidden layers, 【feed the cross features into a deep neural network】
  7. output layer.

(12) Sparse Input and Embedding Layer

As described above, these layers are the same as in DeepFM.

Output: E = [e1, e2, ···, ei, ···, ef]

(13) SENET Layer

Different features have different degrees of importance for the target task.

  1. Using the feature embeddings as input, the SENET layer produces a weight vector A = {a1, ···, ai, ···, af} for the field embeddings,
  2. then rescales the original embedding E with the vector A to get a new embedding (the SENET-like embedding) V = [v1, ···, vi, ···, vf], where ai ∈ R is a scalar denoting the weight of the i-th field embedding, vi ∈ R^k denotes the SENET-like embedding of the i-th field, i ∈ [1, 2, ···, f], V ∈ R^(f×k), k is the embedding size, and f is the number of fields.
  • SENET produces the weight vector A
  • f(A, E) = V

 

(14) The SENET layer consists of three steps:

  1. Squeeze
  2. Excitation
  3. Re-Weight

14.1 Squeeze.

This step calculates 'summary statistics' of each field embedding.

E → Z
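The formula image that stood here did not survive extraction. As a hedged reconstruction from the FiBiNET paper (my recollection, not the post's original content), the squeeze step mean-pools each field embedding e_i ∈ R^k into a scalar summary statistic z_i:

$$z_i = F_{sq}(e_i) = \frac{1}{k}\sum_{t=1}^{k} e_i^{(t)}, \qquad Z = [z_1, z_2, \dots, z_f] \in \mathbb{R}^f$$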


14.2 Excitation

This step learns the weight of each field embedding based on the statistic vector Z.

It uses two fully connected (FC) layers.
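As a hedged reconstruction of the paper's excitation step (again from memory, so treat the exact notation as an assumption), the two FC layers first shrink the f-dimensional statistic vector by a reduction ratio r and then expand it back, producing the field weight vector A:

$$A = F_{ex}(Z) = \sigma_2\big(W_2\,\sigma_1(W_1 Z)\big), \qquad W_1 \in \mathbb{R}^{\frac{f}{r}\times f},\; W_2 \in \mathbb{R}^{f\times\frac{f}{r}}$$

Here σ1 and σ2 are activation functions (ReLU in this paper, per the implementation details below) and r is the reduction ratio.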

14.3 Re-Weight

Also called re-scale.

  1. It performs field-wise multiplication between the original field embedding E and the field weight vector A,
  2. and outputs the new embedding (the SENET-like embedding) V = {v1, ···, vi, ···, vf}. A sketch of the full layer follows below.
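In formula form (hedged reconstruction from the paper), the re-weight step is a field-wise scaling:

$$V = F_{ReWeight}(A, E) = [a_1 \cdot e_1, \dots, a_f \cdot e_f]$$

Putting the three steps together, below is a minimal NumPy sketch of the whole SENET layer. The function name, shapes, and random initialization are my own illustrative assumptions, not the authors' code:

```python
import numpy as np

def senet_layer(E, W1, W2):
    """Minimal SENET-layer sketch. E: (f, k) field embeddings;
    W1: (f, f//r) and W2: (f//r, f) are the two FC weight matrices."""
    relu = lambda x: np.maximum(x, 0.0)
    # Squeeze: mean-pool each field embedding into a scalar -> Z in R^f
    Z = E.mean(axis=1)
    # Excitation: two FC layers (both ReLU here) produce field weights A in R^f
    A = relu(relu(Z @ W1) @ W2)
    # Re-weight: scale each field embedding by its learned weight
    V = A[:, None] * E
    return V

# Toy usage: f=4 fields, embedding size k=8, reduction ratio r=2
f, k, r = 4, 8, 2
rng = np.random.default_rng(0)
E = rng.normal(size=(f, k))
W1 = rng.normal(size=(f, f // r))
W2 = rng.normal(size=(f // r, f))
V = senet_layer(E, W1, W2)  # same shape as E: (4, 8)
```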

(15)Bilinear-Interaction Layer

This layer calculates the second-order feature interactions.

  1. Traditional interaction layers usually model feature interactions with the inner product or the Hadamard product.
  2. The inner product is widely used in shallow models such as FM and FFM,
  3. while the Hadamard product is commonly used in deep models such as AFM and NFM.

· denotes the regular inner product,

 ⊙ denotes the Hadamard product

The inner product and the Hadamard product used in the interaction layer are too simple to effectively model the feature interactions in sparse datasets.

【too simple to model the feature interactions effectively】
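For reference, these are the standard definitions on k-dimensional embedding vectors (textbook facts, not paper-specific):

$$v_i \cdot v_j = \sum_{t=1}^{k} v_i^{(t)} v_j^{(t)}, \qquad v_i \odot v_j = \big(v_i^{(1)} v_j^{(1)}, \dots, v_i^{(k)} v_j^{(k)}\big)$$

The inner product collapses a field pair to a single scalar, and the Hadamard product keeps only element-wise terms, which is why the paper argues they are too coarse for sparse data.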

(16) The paper proposes three types of Bilinear-Interaction to model feature interactions

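The formula image that stood here was also lost in extraction. As a hedged reconstruction from the paper (treat the exact notation as my assumption), all three types combine the inner product and the Hadamard product through an extra parameter matrix W ∈ R^(k×k), computing the interaction of fields i and j as

$$p_{ij} = (v_i \cdot W) \odot v_j$$

and differ only in how W is shared:

- Field-All: a single W shared across all field pairs;
- Field-Each: one W_i per field i;
- Field-Interaction: one W_ij per field pair (i, j).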

E → p (applying the Bilinear-Interaction layer to the original embedding E yields the interaction vector p)

V → q (applying it to the SENET-like embedding V yields q)
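Below is a minimal NumPy sketch of the three types; the function name and the parameter layout (a shared matrix, a per-field array, or a per-pair dict) are illustrative assumptions:

```python
import numpy as np

def bilinear_interaction(V, W, field_type="all"):
    """Bilinear-Interaction sketch. V: (f, k) embeddings.
    W: (k, k) for 'all', (f, k, k) for 'each',
    or {(i, j): (k, k) matrix} for 'interaction'."""
    f, k = V.shape
    p = []
    for i in range(f):
        for j in range(i + 1, f):
            if field_type == "all":        # one W shared by every pair
                w = W
            elif field_type == "each":     # one W_i per (left) field
                w = W[i]
            else:                          # one W_ij per field pair
                w = W[(i, j)]
            # bilinear term: (v_i W) combined with v_j via Hadamard product
            p.append((V[i] @ w) * V[j])
    return np.stack(p)                     # shape: (f*(f-1)/2, k)

# Toy usage with the Field-All type
rng = np.random.default_rng(0)
V = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 8))
p = bilinear_interaction(V, W, field_type="all")  # (6, 8)
```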

(17) Combination Layer

The interaction vectors p and q are concatenated and fed into the standard neural network layers of FiBiNET.
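As a hedged reconstruction of the paper's combination layer, it simply concatenates the two interaction vectors into one:

$$c = F_{concat}(p, q) = [p_1, \dots, p_n, q_1, \dots, q_n]$$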

(18) Deep Network

It consists of several fully connected layers and captures high-order feature interactions.

(19) Output Layer

(20) Objective Function

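The objective-function image here was lost. FiBiNET, like most CTR models, is trained with the standard binary cross-entropy (log loss), so the objective should read:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\Big(y_i \log \hat{y}_i + (1 - y_i)\log(1 - \hat{y}_i)\Big)$$

where y_i ∈ {0, 1} is the ground-truth label, ŷ_i is the predicted CTR, and N is the number of training samples.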

(21) Relationship to FM and FNN

Removing the SENET layer and the Bilinear-Interaction layer from our model yields FNN.

Further removing the DNN part and using a sum instead, the shallow FiBiNET degrades to the traditional FM model.

Experiments

In this section, extensive experiments are conducted to answer the following questions:

(RQ1) How does our model perform compared to the state-of-the-art methods for CTR prediction?

(RQ2) Can the different combinations of bilinear and Hadamard functions in the Bilinear-Interaction layer impact its performance?


(RQ3) Can the different field types (Field-All, Field-Each and Field-Interaction) of the Bilinear-Interaction layer impact its performance?


(RQ4) How do the settings of networks influence the performance of our model?


(RQ5) Which is the most important component in FiBiNET?


These questions will be answered after presenting some basic experimental settings.

(22) Datasets

4.1.1 Datasets

1) Criteo.

Criteo is widely used for evaluating CTR prediction models. It contains 45 million instances of click data, with 26 anonymous categorical fields and 13 continuous feature fields. We randomly split the data into two parts: 90% for training and the rest for testing.

2) Avazu

The Avazu dataset contains several days of click-through data, ordered chronologically. It contains 40 million instances of click logs. Each click record has 24 fields indicating the elements of a single ad impression. We randomly split the data into two parts: 80% for training and the rest for testing.

(23) Evaluation Metrics

We adopt two metrics: AUC and Log loss.

AUC: Area under the ROC curve is a widely used metric for evaluating classification problems. Moreover, some work validates AUC as a good measurement in CTR prediction[3]. AUC is insensitive to the classification threshold and the positive ratio. Its upper bound is 1, and larger is better.

Log loss: Log loss is a widely used metric in binary classification, measuring the distance between two distributions. Its lower bound is 0, indicating that the two distributions perfectly match; a smaller value indicates better performance.
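To make the two metrics concrete, here is a toy computation with scikit-learn (illustrative values, not from the paper):

```python
from sklearn.metrics import log_loss, roc_auc_score

y_true = [1, 0, 1, 1, 0]                      # ground-truth clicks
y_pred = [0.9, 0.2, 0.7, 0.6, 0.4]            # predicted CTRs in [0, 1]
print("AUC:", roc_auc_score(y_true, y_pred))  # higher is better, upper bound 1
print("Log loss:", log_loss(y_true, y_pred))  # lower is better, lower bound 0
```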

(24) Baseline Methods

The experiments are divided into two groups: a shallow group and a deep group.

The baseline models likewise fall into two groups: shallow baseline models and deep baseline models.

The shallow baselines include LR, FM, FFM, and AFM.

The deep baselines include FNN, DCN, DeepFM, and xDeepFM.

Note that in CTR prediction, even a 1‰ improvement in AUC is regarded as significant, because it can bring a large increase in revenue if the company has a very large user base.

(25) Implementation Details

  1. Implemented with TensorFlow.
  2. The embedding dimension is 10 for Criteo and 50 for Avazu.
  3. For optimization we use Adam,
  4.   --- with a mini-batch size of 100 for Criteo and 500 for Avazu,
  5.   --- and a learning rate of 0.0001.
  6. For all deep models, the depth of layers is set to 3 and all activation functions are ReLU,
  7.   --- with 400 neurons per layer for Criteo and 2000 for Avazu,
  8.   --- and a dropout rate of 0.5.
  9. In the SENET part, the activation function of the two FC layers is ReLU and the reduction ratio is 3. (See the configuration sketch after this list.)
  10. Experiments were run on two Tesla K40 GPUs.
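As a sanity check of these settings, here is a minimal tf.keras sketch of the DNN component with the Criteo hyperparameters (3 hidden layers of 400 ReLU units, dropout 0.5, Adam with learning rate 0.0001). The builder function and input handling are illustrative assumptions, not the authors' code:

```python
import tensorflow as tf

def build_dnn(input_dim: int) -> tf.keras.Model:
    """DNN part only: the FiBiNET-specific layers would feed into this."""
    inputs = tf.keras.Input(shape=(input_dim,))
    x = inputs
    for _ in range(3):                      # depth of layers = 3
        x = tf.keras.layers.Dense(400, activation="relu")(x)
        x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="binary_crossentropy",         # the log-loss objective above
        metrics=[tf.keras.metrics.AUC(name="auc")],
    )
    return model
```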

(26) Conclusion

To address the shortcomings of the existing state-of-the-art models, we propose a new model, FiBiNET (Feature Importance and Bilinear feature Interaction NETwork), which aims to dynamically learn feature importance and better capture feature interactions. The proposed FiBiNET makes the following contributions:

1) For CTR prediction tasks, the SENET module can dynamically learn the importance of features. It boosts the weights of important features and suppresses the weights of unimportant ones.

2) We introduce three types of Bilinear-Interaction layer to learn feature interactions, instead of computing feature interactions with the Hadamard product or the inner product.

3) Our shallow model, which combines the SENET mechanism with bilinear feature interaction, outperforms other shallow models such as FM and FFM.

4) For further performance gains, we combine the shallow model with a DNN to form a deep model. The deep FiBiNET consistently outperforms other state-of-the-art deep models such as DeepFM and xDeepFM.

 

 

Related resources:

https://shenweichen.blog.csdn.net/article/details/95234555
