FM (Factorization Machine) is a supervised machine-learning algorithm that can be used for both classification and regression, and is commonly applied to CTR prediction. The highlight of the FM algorithm is the feature-interaction scheme it introduces.
$$y = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i<j} w_{i,j}\, x_i x_j$$

$$W = V V^T, \qquad V^T = (V_1^T, V_2^T, \cdots, V_n^T)$$
As the formulas above show, the FM algorithm adds feature-interaction terms on top of logistic regression. If every interaction term were given its own independent weight, the number of parameters would grow quadratically in n, which easily overfits. To reduce the parameter count, each feature is assigned a latent factor vector of dimension K, and the coefficient of an interaction term is the dot product of the latent vectors of the two crossed features. In matrix notation, the last term can be written as
$$\sum_{i<j} w_{i,j}\, x_i x_j = \frac{1}{2}\Big(\sum_{i,j} \langle V_i, V_j \rangle\, x_i x_j - \sum_i \|x_i V_i\|^2\Big)$$

$$= \frac{1}{2}\Big\{ X V V^T X^T - \big((X \cdot X)(V \cdot V)\big).\mathrm{sum}(axis=1) \Big\}$$
Let X denote the m×n matrix formed by a dataset of m samples, each with n features. The predicted value of Y can then be expressed by the following formula:
$$Y^P = b + X \vec{w} + \frac{1}{2}\Big\{\big[(X V)^2 - (X \cdot X)(V \cdot V)\big].\mathrm{sum}(axis=1)\Big\}$$

Here $(XV)^2$ squares elementwise, $X \cdot X$ and $V \cdot V$ are elementwise products, and $.\mathrm{sum}(axis=1)$ sums each row.
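The equivalence is easy to sanity-check numerically. Below is a minimal NumPy sketch (toy random data, not from the text) comparing the brute-force pairwise sum against the vectorized form:

import numpy as np

m, n, k = 4, 6, 2
rng = np.random.RandomState(0)
X = rng.rand(m, n)
V = rng.rand(n, k)

# brute force: sum <V_i, V_j> * x_i * x_j over all pairs i < j, per sample
brute = np.zeros(m)
for s in range(m):
    for i in range(n):
        for j in range(i + 1, n):
            brute[s] += V[i].dot(V[j]) * X[s, i] * X[s, j]

# vectorized form from the formula above
vec = 0.5 * ((X.dot(V)) ** 2 - (X * X).dot(V * V)).sum(axis=1)
print(np.allclose(brute, vec))  # True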
This section implements the FM algorithm with TensorFlow. The code is as follows:
import tensorflow as tf
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_svmlight_file

learning_rate = 0.1
training_epochs = 500
batch_size = 100
display_step = 1
# load a libsvm-format dataset; labels are reshaped into a column vector
X_total, Y_total = load_svmlight_file("libsvmfinal_8_2")
shape = X_total.shape
Y_total = Y_total.reshape(-1, 1)
X_train, X_test, y_train, y_test = train_test_split(X_total, Y_total, test_size=0.1, random_state=42)
n_samples = X_train.shape[0]
n_features = shape[1]
k = 2  # dimension of the latent factor vectors
# placeholders: x is a sparse feature matrix, y holds the 0/1 labels
x = tf.sparse_placeholder(tf.float64)
y = tf.placeholder(tf.float64, [None, 1])
# parameters to train
# note: initializing V with zeros would leave it stuck there (the gradient of the
# interaction term vanishes at V = 0), so use a small random initialization
V = tf.Variable(tf.random_normal([n_features, k], stddev=0.01, dtype=tf.float64), name="vk")
b = tf.Variable(tf.zeros([1], dtype=tf.float64), name="b")
w = tf.Variable(tf.random_normal([n_features, 1], dtype=tf.float64), name="w")
# model logic
vx = tf.sparse_tensor_dense_matmul(x, V)   # XV, shape [batch, k]
vx_sq = tf.multiply(vx, vx)                # (XV)^2, elementwise
# square the sparse input by squaring its values (tf.square does not take a SparseTensor)
xx = tf.SparseTensor(x.indices, tf.square(x.values), x.dense_shape)
vsq_xsq = tf.sparse_tensor_dense_matmul(xx, V * V)  # (X.X)(V.V)
biterm = vx_sq - vsq_xsq
# keepdims=True keeps the interaction term shaped [batch, 1] so the sum broadcasts correctly
preds = tf.nn.sigmoid(tf.sparse_tensor_dense_matmul(x, w)
                      + 0.5 * tf.reduce_sum(biterm, axis=1, keepdims=True) + b)
# cross-entropy loss, clipped to avoid log(0)
cost = tf.reduce_mean(-y * tf.log(tf.clip_by_value(preds, 1e-10, 1.0))
                      - (1 - y) * tf.log(tf.clip_by_value(1 - preds, 1e-10, 1.0)))
optimizer = tf.train.AdamOptimizer(learning_rate).minimize(cost)
threshold = tf.constant(0.5, dtype=tf.float64)
plabel = tf.cast(threshold < preds, tf.float64)  # threshold the predictions, not the labels
# tf.metrics.accuracy returns (value, update_op); keep both
accuracy, accuracy_op = tf.metrics.accuracy(labels=y, predictions=plabel)
#auc, auc_op = tf.metrics.auc(y, preds)
#accuracy = tf.reduce_mean(tf.cast(tf.equal(plabel, y), tf.float64))
init = tf.global_variables_initializer()
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(init)
    sess.run(tf.local_variables_initializer())  # required by tf.metrics.accuracy
    # to resume from a previously saved checkpoint instead, uncomment:
    # saver.restore(sess, "my_model3.ckpt198")
    for epoch in range(training_epochs):
        avg_cost = 0
        total_batch = int(n_samples / batch_size)
        batch = 0
        for i in range(total_batch):
            # convert the CSR mini-batch into the COO triplets a SparseTensorValue expects
            xcsr = X_train[i * batch_size : (i + 1) * batch_size]
            coo = xcsr.tocoo()
            indices = np.mat([coo.row, coo.col]).transpose()
            _, c = sess.run([optimizer, cost],
                            feed_dict={x: tf.SparseTensorValue(indices, coo.data, coo.shape),
                                       y: y_train[i * batch_size : (i + 1) * batch_size]})
            avg_cost += c / total_batch
            batch += 1
            if batch % 2000 == 1999:
                print("epoch", epoch, "batch", batch)
                print("b", b.eval())
        if epoch % 100 == 99:
            save_path = saver.save(sess, "./my_model3.ckpt" + str(epoch))
        if (epoch + 1) % display_step == 0:
            print("Epoch:", "%04d" % (epoch + 1), "cost=", avg_cost)
    # evaluate streaming accuracy on the held-out test set
    xcsr = X_test
    coo = xcsr.tocoo()
    indices = np.mat([coo.row, coo.col]).transpose()
    act = sess.run(accuracy_op,
                   feed_dict={x: tf.SparseTensorValue(indices, coo.data, coo.shape), y: y_test})
    print("accuracy", act)
    print("Optimization Finished!")
A TensorFlow script proceeds in two steps. In the first step, the model is defined using TensorFlow operators: inputs are marked with placeholders and trainable parameters are declared as variables. In the second step, a session is opened, the inputs are bound to the placeholders through a feed dict, the operations to run or outputs to fetch are specified, and the graph is executed. Before the first run, the variables must be initialized.
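As a minimal illustration of the two steps (a toy graph, independent of the FM code above):

import tensorflow as tf

# step 1: define the graph -- a placeholder input, a variable, and an op
a = tf.placeholder(tf.float32, name="a")
w0 = tf.Variable(3.0, name="w0")
out = a * w0

# step 2: open a session, initialize variables, feed the placeholder, and run
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(out, feed_dict={a: 2.0}))  # 6.0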
What stays invisible between the two steps is that TensorFlow maintains a default graph. The operators defined in step one are encoded into this graph, and the run calls of step two execute the logic stored in it. TensorFlow can be viewed as a programming language: step one is the compiler front end, parsing the computation into an intermediate representation (indeed, the graph closely resembles an abstract syntax tree); before the graph is run for the first time in step two, some intermediate-code optimizations are applied, such as common subexpression elimination, graph pruning, and constant folding, after which the TensorFlow runtime interprets and executes the graph.
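The default graph can be inspected directly. The following small sketch (TF 1.x API) prints the nodes that step one recorded, one per operator:

import tensorflow as tf

x = tf.placeholder(tf.float32, name="x")
y = 2.0 * x + 1.0

# every operator defined above became a node in the default graph
for op in tf.get_default_graph().get_operations():
    print(op.name, op.type)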