Notes on FM, FFM, and DeepFM (PyTorch)

Reference: a blog series on recommender systems and deep learning.

1. Factorization Machine (FM) (2010)

The FM model equation is:

$$y = w_0 + \sum_{i=1}^{n}{w_i x_i} + \sum_{i=1}^{n}\sum_{j=i+1}^{n}{<v_i,v_j>x_i x_j}$$

where the input is $X=(x_1,x_2,\dots,x_n)^{'},\ X \in R^{n\times 1}$, and the weight vector used to build the cross terms for feature $i$ is $V_i=(v_{i1},v_{i2},\dots,v_{ik})^{'},\ V_i \in R^{k\times 1}$.
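As a concrete illustration (an added sketch, not part of the original post), the equation can be evaluated literally with an O(n²) double loop over feature pairs for a single sample; all sizes and tensors below (n, k, w0, w, v, x) are assumptions made for the example. The derivation further down shows how to avoid this quadratic cost.

import torch

n, k = 8, 4                      # feature dimension and latent dimension (assumed)
w0 = torch.randn(1)              # global bias w_0
w = torch.randn(n)               # first-order weights w_i
v = torch.randn(n, k)            # latent vectors, row i is V_i
x = torch.randn(n)               # a single input sample

y = w0 + torch.dot(w, x)         # linear part: w_0 + sum_i w_i x_i
for i in range(n):
    for j in range(i + 1, n):
        y = y + torch.dot(v[i], v[j]) * x[i] * x[j]   # <v_i, v_j> x_i x_j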

The goal of the problem thus becomes computing the sum of the upper triangle of the matrix $T$.

where:
$$T=\left[ \begin{array}{ccc} V_1^{'}V_1x_1x_1 & \cdots & V_1^{'}V_nx_1x_n \\ \vdots & V_i^{'}V_jx_ix_j & \vdots \\ V_n^{'}V_1x_nx_1 & \cdots & V_n^{'}V_nx_nx_n \end{array} \right]$$

$$V=\left[ \begin{array}{c} V_{1}^{'} \\ V_{2}^{'} \\ \vdots \\ V_{n}^{'} \end{array} \right]_{n\times k} =\left[ \begin{array}{cccc} v_{11} & v_{12} & \cdots & v_{1k} \\ v_{21} & v_{22} & \cdots & v_{2k} \\ \vdots & \vdots & v_{ij} & \vdots \\ v_{n1} & v_{n2} & \cdots & v_{nk} \end{array} \right]_{n\times k}$$

Solution:

$$\begin{aligned}
\sum_{i=1}^{n}\sum_{j=i+1}^{n}<v_i,v_j>x_ix_j
&= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}<v_i,v_j>x_ix_j - \frac{1}{2}\sum_{i=1}^{n}<v_i,v_i>x_ix_i \\
&= \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{K}{v_{ik}v_{jk}x_ix_j} - \frac{1}{2}\sum_{i=1}^{n}\sum_{k=1}^{K}{v_{ik}v_{ik}x_ix_i} \\
&= \frac{1}{2}\sum_{k=1}^{K}\left[\sum_{i=1}^{n}(v_{ik}x_i)\sum_{j=1}^{n}(v_{jk}x_j)\right] - \frac{1}{2}\sum_{k=1}^{K}\sum_{i=1}^{n}x_i^2v_{ik}^2 \\
&= \frac{1}{2}\sum_{k=1}^{K}\left(\sum_{i=1}^{n}v_{ik}x_i\right)^2 - \frac{1}{2}\sum_{k=1}^{K}\left(\sum_{i=1}^{n}v_{ik}^2x_i^2\right) \\
&= \frac{1}{2}\sum_{k=1}^{K}\left[\left(\sum_{i=1}^{n}v_{ik}x_i\right)^2 - \sum_{i=1}^{n}v_{ik}^2x_i^2\right]
\end{aligned}$$
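To make the equivalence tangible, here is a short numerical check (an added sketch with assumed shapes) comparing the naive upper-triangle sum against the reformulated O(kn) expression derived above.

import torch

n, k = 8, 4
v = torch.randn(n, k)            # latent matrix, row i is V_i
x = torch.randn(n)               # one sample

# Naive: sum_{i<j} <v_i, v_j> x_i x_j
naive = sum(torch.dot(v[i], v[j]) * x[i] * x[j]
            for i in range(n) for j in range(i + 1, n))

# Reformulated: 0.5 * sum_k [ (sum_i v_{ik} x_i)^2 - sum_i v_{ik}^2 x_i^2 ]
xv = x @ v                                        # shape (k,)
fast = 0.5 * torch.sum(xv ** 2 - (x ** 2) @ (v ** 2))

assert torch.allclose(naive, fast, atol=1e-5)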
The formulas can be organized as follows. The feature dimension is $n$ (covering the one-hot encodings of the categorical features together with the dense features), and the latent dimension of the cross-term parameter matrix is $k$. Let the model input be $input \in R^{n \times 1}$. Then:
$$fm_1 = (v^{'} \cdot input)^2 \qquad v \in R^{n \times k},\quad fm_1 \in R^{k \times 1}$$
$$fm_2 = (v^{'})^2 \cdot input^2 \qquad fm_2 \in R^{k \times 1}$$
$$out = W^{'} \cdot input + \frac{1}{2} \cdot \mathbf{1}\,(fm_1 - fm_2) \qquad W \in R^{n \times 1}$$
where $\mathbf{1} = (1,1,1,\dots,1) \in R^{1 \times k}$. The parameters learned by the model are $W$ and $v$.
The code is as follows:

import torch
import torch.nn as nn


class FM_model(nn.Module):
    def __init__(self, n, k):
        super(FM_model, self).__init__()
        self.n = n  # feature dimension, e.g. len(items) + len(users)
        self.k = k  # latent dimension of the cross-term parameters
        self.linear = nn.Linear(self.n, 1, bias=True)  # w_0 and W
        self.v = nn.Parameter(torch.randn(self.k, self.n))  # cross-term matrix v

    def fm_layer(self, x):
        # x ∈ R^{batch × n}
        linear_part = self.linear(x)  # (batch, 1): w_0 + W' · x
        # matrix product (batch × n) · (n × k)
        inter_part1 = torch.mm(x, self.v.t())  # (batch, k)
        # matrix product of the element-wise squares: (batch × n) · (n × k)
        inter_part2 = torch.mm(torch.pow(x, 2), torch.pow(self.v, 2).t())  # (batch, k)
        # sum over the latent dimension k, keeping the batch dimension
        pair_interactions = 0.5 * torch.sum(torch.pow(inter_part1, 2) - inter_part2,
                                            dim=1, keepdim=True)  # (batch, 1)
        output = linear_part + pair_interactions
        return output  # (batch, 1)

    def forward(self, x):
        output = self.fm_layer(x)
        return output
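For completeness, a hypothetical usage example: the input is built by concatenating a user one-hot block and an item one-hot block, matching the `len(items) + len(users)` comment in `__init__`. The values of `num_users`, `num_items`, `k`, and the batch size are assumptions for illustration only.

num_users, num_items, k = 100, 50, 10
model = FM_model(n=num_users + num_items, k=k)

batch_size = 4
x = torch.zeros(batch_size, num_users + num_items)
user_ids = torch.randint(0, num_users, (batch_size,))
item_ids = torch.randint(0, num_items, (batch_size,))
x[torch.arange(batch_size), user_ids] = 1.0              # user one-hot block
x[torch.arange(batch_size), num_users + item_ids] = 1.0  # item one-hot block

y_hat = model(x)
print(y_hat.shape)  # torch.Size([4, 1])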

2. Field-aware Factorization Machine (FFM) (2015)

3. DeepFM (2017)
