Ranking is a prediction task on a list of objects, so the training task of point-wise and pair-wise methods differs from the actual serving scenario; list-wise methods should in principle fit better.
The paper is reference [1].
This stage amounts to pre-training: the task is binary classification of positive vs. negative samples, preparing for the list-wise stage that follows.
Figure: the clicked items and the exposed items are pooled separately, then concatenated with the target-item and user features.
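A minimal sketch of that input construction, assuming mean pooling and plain NumPy vectors (the pooling op and the function names are my assumptions; the figure does not specify them):

```python
import numpy as np

def mean_pool(item_embs):
    # item_embs: (num_items, dim) array of item embeddings; the figure
    # does not name the pooling op, so mean pooling is assumed here
    return np.mean(item_embs, axis=0)

def build_input(clicked, exposed, target, user):
    # pool the clicked and exposed item lists separately, then
    # concatenate with the target-item and user embeddings
    return np.concatenate([mean_pool(clicked), mean_pool(exposed), target, user])
```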
Figure: the formulas here are badly typeset and contain errors. In Eq. (1), the subscript i in the numerator should probably be t and the subscript i in the denominator should be l; in Eq. (2), the i and the right parenthesis belong in the superscript position.
The paper states that "each session with the contained item behaviors is treated as a list-wise training sample", but this is still not entirely clear. The list-wise part of paper [1] borrows from reference [2], a Microsoft paper from ICML 2007 (ListNet).
Define the list-wise loss function:
$$\sum_{i=1}^{m} L(y^{(i)}, z^{(i)}) \tag{1}$$
where $m = |\text{trainset}|$, and $y^{(i)} = (y^{(i)}_1, y^{(i)}_2, \ldots, y^{(i)}_{n^{(i)}})$ is a list of the relevance scores of the $n^{(i)}$ documents associated with query $q^{(i)}$. Similarly, $z^{(i)} = (f(x^{(i)}_1), f(x^{(i)}_2), \ldots, f(x^{(i)}_{n^{(i)}}))$ is the list of relevance scores predicted by the ranking function $f(\cdot)$.
Figure: permutation probability.
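For reference, the permutation probability defined in reference [2] is (with $s$ the score list and $\phi$ an increasing, strictly positive function):

$$P_s(\pi) = \prod_{j=1}^{n} \frac{\phi(s_{\pi(j)})}{\sum_{k=j}^{n} \phi(s_{\pi(k)})}$$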
Taking all permutations of a list of size n yields $n!$ outcomes (n = 10 already gives 3,628,800 permutations), so the computation is unacceptable (NP-hard?). Hence the top-one probability is proposed.
Figure: top one probability.
Figure: Theorem 6, describing the probability that doc j is ranked first.
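Theorem 6 in the ListNet paper states that the top-one probability of document j reduces to:

$$P_s(j) = \frac{\phi(s_j)}{\sum_{k=1}^{n} \phi(s_k)}$$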
Figure: once $\phi(\cdot)$ is defined as the exponential function, Theorem 6 can be rewritten and becomes exactly the soft-max.
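That is, with $\phi(x) = \exp(x)$:

$$P_s(j) = \frac{\exp(s_j)}{\sum_{k=1}^{n} \exp(s_k)} = \mathrm{softmax}(s)_j$$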
Figure: after soft-max yields the top-one probability distributions of the label list and the prediction list, cross entropy between the two distributions serves as the loss function, giving the final loss.
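A minimal NumPy sketch of this final loss, assuming (as in ListNet) that both the label list and the predicted score list are mapped to top-one distributions via soft-max; the epsilon is only for numerical safety:

```python
import numpy as np

def softmax(s):
    # numerically stable soft-max over one score list
    e = np.exp(s - np.max(s))
    return e / e.sum()

def listnet_loss(y, z):
    """Cross entropy between the top-one distributions of a label list y
    and a predicted score list z (both for one query's list)."""
    p_y = softmax(np.asarray(y, dtype=float))  # target distribution
    p_z = softmax(np.asarray(z, dtype=float))  # predicted distribution
    return -np.sum(p_y * np.log(p_z + 1e-12))

# toy usage: one query with three documents
print(listnet_loss(y=[2.0, 1.0, 0.0], z=[1.5, 1.2, 0.1]))

# the total loss of Eq. (1) sums this over the m lists in the training set
```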
In list-wise approaches the loss depends on the full permutation of items. Although these losses consider inter-item dependencies, the ranking function itself is point-wise: at inference time the model still assigns each item a score that does not depend on the scores of other items (i.e., an item's score will not change if it is placed in a different set).