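Focusing on the bottom-level feature representation, the paper proposes MARIA, a Multi-scenario ranking framework with adaptive feature learning.
It introduces Feature Scaling (FS), which scales features according to the scenario, amplifying important features and shrinking unimportant ones (similar to SENet).
It introduces Feature Refinement (FR): each feature field gets its own set of feature refiners (Refiner), and a scenario-aware gate network (Selector) chooses among them (similar to MoE).
It introduces Feature Correlation Modeling (FCM), which explicitly crosses the feature fields with one another (similar to PNN).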
$$\mathbf Q = [\mathbf h_b || \mathbf u || \mathbf x_i || \mathbf t || \mathbf c]$$
Here the user feature $\mathbf u$ is obtained by concatenating the user ID embedding with the embeddings of the user's other attributes.
User attribute set: $\mathcal{A}_u = \{a_u^1, ..., a_u^L\}$
$$\mathbf u = [\mathbf e_u || \mathbf a_u^1 || ... || \mathbf a_u^L]$$
Here the item feature $\mathbf i$ is obtained by concatenating the item ID embedding with the embeddings of the item's other attributes.
Item attribute set: $\mathcal{A}_i = \{a_i^1, ..., a_i^P\}$
$$\mathbf i = [\mathbf e_i || \mathbf a_i^1 || ... || \mathbf a_i^P]$$
Context feature set: $\mathcal{A}_c = \{a_c^1, ..., a_c^{N_c}\}$
Trigger set: $\mathcal{A}_t = \{a_t^1, ..., a_t^O\}$
$$\mathbf t = [\mathbf a_t^1 || ... || \mathbf a_t^O]$$
Embedding matrix of the sequence features: $\mathbf B_u = \{\mathbf{seq}_1, \mathbf{seq}_2, ..., \mathbf{seq}_m\}$
After target attention, we obtain
$$\mathbf H_u = \{\mathbf h_1, \mathbf h_2, ..., \mathbf h_m\}$$
$$\mathbf h = \sum_{j=1}^m a_j \mathbf h_j$$
The attention score $a_i$ is computed as follows:
$$a_i = \frac{\exp(\mathrm{sim}(\mathbf t, \mathbf h_i))}{\sum_{j=1}^m \exp(\mathrm{sim}(\mathbf t, \mathbf h_j))}$$
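The pooling step above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the notes do not define $\mathrm{sim}$, so the dot product is assumed here, and `target_attention`, `trigger`, and `history` are hypothetical names.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def target_attention(trigger, history):
    """Pool h_1..h_m into one vector h using softmax(sim(t, h_j)) weights."""
    scores = [dot(trigger, h) for h in history]   # sim(t, h_j); dot product is an assumption
    top = max(scores)                             # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]           # a_1..a_m, which sum to 1
    dim = len(history[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, history)) for d in range(dim)]
    return weights, pooled


# Toy input: the first history item matches the trigger best, the last one worst.
trigger = [1.0, 0.0]
history = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
weights, pooled = target_attention(trigger, history)
print(weights, pooled)
```

With a dot-product similarity the trigger and the history vectors must share a dimension; a bilinear or MLP-based similarity would lift that restriction.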
$$\mathbf Q = [\mathbf h || \mathbf u || \mathbf i || \mathbf t || \mathbf c]$$
$$\boldsymbol\alpha = \lambda \cdot \mathrm{Sigmoid}(\mathrm{FCN}(\mathrm{freeze}(\mathbf Q) || \mathbf e_u || \mathbf e_i || \mathbf e_s))$$
$$\mathbf Q_S = [Q_1 \alpha_1, ..., Q_{N_Q} \alpha_{N_Q}] = [\mathbf{\hat h}, \mathbf{\hat u}, \mathbf{\hat i}, \mathbf{\hat t}, \mathbf{\hat c}]$$
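A rough sketch of the scaling step follows, under several assumptions that the notes do not confirm: $\boldsymbol\alpha$ holds one scalar per feature field, the FCN is a single linear layer, and $\lambda = 2$. All function and variable names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def linear(x, weight, bias):
    """One fully connected layer; weight is a list of rows, one per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def feature_scaling(fields, e_u, e_i, e_s, weight, bias, lam=2.0):
    """Return (alpha, scaled_fields), with one scaling weight per field (assumed)."""
    # Q is the concatenation of all field embeddings. In a training framework this
    # copy would be wrapped in a stop-gradient, which is what freeze(.) denotes.
    q = [v for field in fields for v in field]
    gate_input = q + e_u + e_i + e_s
    alpha = [lam * sigmoid(z) for z in linear(gate_input, weight, bias)]
    scaled = [[a * v for v in field] for a, field in zip(alpha, fields)]
    return alpha, scaled


rng = random.Random(0)
fields = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(5)]  # h, u, i, t, c
e_u, e_i, e_s = [0.1, 0.2], [0.3, -0.1], [0.5, 0.5]
in_dim = 5 * 2 + 3 * 2
weight = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(5)]
bias = [0.0] * 5
alpha, scaled = feature_scaling(fields, e_u, e_i, e_s, weight, bias)
print(alpha)
```

With $\lambda = 2$ each weight lies in $(0, 2)$, so a field can be either amplified or shrunk, and a gate output of exactly 0.5 leaves the field unchanged.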
Take the sequence feature field as an example: its embedding is $\mathbf{\hat h}$, and the embedding of the scenario ID is $\mathbf e_s$.
$$\boldsymbol\beta = \mathrm{Softmax}(\mathrm{Sigmoid}(\mathrm{FCN}([\mathbf{\hat h} || \mathbf e_s])))$$
Suppose the sequence feature field has $k$ Refiners; then after FR the sequence feature becomes
$$\mathbf{\widetilde h} = [\beta_1 \mathrm{FCN}_1(\mathbf{\hat h}) || ... || \beta_k \mathrm{FCN}_k(\mathbf{\hat h})]$$
$$\mathbf Q_R = [\mathbf{\widetilde h} || \mathbf{\widetilde u} || \mathbf{\widetilde i} || \mathbf{\widetilde t} || \mathbf{\widetilde c}]$$
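The selector-and-refiner step for a single field might look like the sketch below. It assumes that every FCN is a single linear layer with randomly initialized weights; `refine_field`, `random_layer`, and the chosen dimensions are hypothetical and only illustrate the shapes involved.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weight, bias):
    """One fully connected layer; weight is a list of rows, one per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def refine_field(h_hat, e_s, selector, refiners):
    """Weight each refiner's output by the scenario-aware selector and concatenate."""
    sel_weight, sel_bias = selector
    gate_logits = linear(h_hat + e_s, sel_weight, sel_bias)   # FCN([h_hat || e_s])
    beta = softmax([sigmoid(z) for z in gate_logits])         # one weight per refiner
    refined = []
    for b, (weight, bias) in zip(beta, refiners):
        refined.extend(b * v for v in linear(h_hat, weight, bias))  # beta_j * FCN_j(h_hat)
    return beta, refined


rng = random.Random(0)


def random_layer(in_dim, out_dim):
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


h_hat, e_s = [0.4, -0.2, 0.7], [0.5, 0.5]
k, out_dim = 3, 2
selector = random_layer(len(h_hat) + len(e_s), k)
refiners = [random_layer(len(h_hat), out_dim) for _ in range(k)]
beta, h_tilde = refine_field(h_hat, e_s, selector, refiners)
print(beta, h_tilde)
```

Because the softmax is applied to sigmoid outputs in $(0, 1)$, the ratio between any two selector weights stays below $e$, so no refiner is ever switched off completely.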
FCM handles the crossing between feature fields. It first maps $\mathbf{\widetilde h}$, $\mathbf{\widetilde u}$, $\mathbf{\widetilde i}$, $\mathbf{\widetilde t}$, and $\mathbf{\widetilde c}$ to the same dimension,
yielding $\mathbf{\overline h}$, $\mathbf{\overline u}$, $\mathbf{\overline i}$, $\mathbf{\overline t}$, and $\mathbf{\overline c}$. The pairwise products are then concatenated:
$$\mathbf Q_C = [\mathbf{\overline h} \cdot \mathbf{\overline u} || ... || \mathbf{\overline t} \cdot \mathbf{\overline c}]$$
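The pairwise crossing can be sketched as below. This reading takes $\cdot$ to be the inner product, as in the inner-product variant of PNN; if an element-wise product is intended instead, `dot` would be swapped for an element-wise multiply. The projection layers and all names here are hypothetical.

```python
from itertools import combinations


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear(x, weight, bias):
    """One fully connected layer; weight is a list of rows, one per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def feature_correlation(fields, projections):
    """Project every field to a shared dimension, then cross each pair of fields."""
    projected = [linear(f, w, b) for f, (w, b) in zip(fields, projections)]
    return [dot(a, b) for a, b in combinations(projected, 2)]


# Toy example with three fields; the third has a different input dimension.
identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
drop_last = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0])
fields = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0, 5.0]]
q_c = feature_correlation(fields, [identity, identity, drop_last])
print(q_c)  # [0.0, 1.0, 1.0]
```

With the five fields used in these notes, this produces $\binom{5}{2} = 10$ crossing terms.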
The embeddings produced by Feature Refinement and Feature Correlation Modeling are concatenated to form the final input embedding:
$$\mathbf Q_f = [\mathbf Q_R || \mathbf Q_C]$$
Visual Search (VS)
Similar Search (SS)
Interest Search (IS)
Feature Scaling (FS)
Feature Refinement (FR)
Feature Correlation Modeling module (FCM)
Network Layer (NL)
the shared tower (ST) in the prediction layer
Gumbel Softmax (GS)