Since Wide & Deep and DeepFM have a lot in common, after this article the author will follow up with a write-up on Wide & Deep as well, again as a paper walkthrough plus an analysis of the difficult points.
Translation of the DeepFM paper: https://blog.csdn.net/a1066196847/article/details/100997968
(This article is from the WeChat public account: AI爱好者社区)
I. In DeepFM, the network diagram already shows two branches: one part processes the features in the FM way, the other part processes them in the DNN way, and the two are combined to form the output.
After reading the paper together with the source code, the author's summary of how the data flows through the network, i.e. how the FM layer and the deep layer cooperate in DeepFM, is as follows:
1: The FM layer contributes two parts and the deep layer contributes one part; the three parts are concatenated, multiplied by a weight matrix, a bias is added via tf.add, and that gives the output value.
2: The two FM parts
part1: each of the F features has a scalar weight; weight * value gives (batch_size, F).
part2: each of the F features has a weight vector of length K; weight vector * value gives (batch_size, F, K); the second-order interaction then sums over the F features (using the 0.5 * (square-of-sum minus sum-of-squares) simplification), giving (batch_size, K).
3: The one DNN part
Every feature shares the embedding weights with part2 of the FM layer, so each feature yields a K-dim vector; flattened, the deep input is (batch_size, F*K), which then passes through the hidden layers, leaving a final width of deep_layers[-1].
4: The three parts are concatenated into concat_input of shape (batch_size, F + K + deep_layers[-1]); it is multiplied by a weight matrix, a bias is added, and tf.add yields the output:
self.out = tf.add(tf.matmul(concat_input, self.weights["concat_projection"]), self.weights["concat_bias"])
5: Then the loss and the optimizer are defined, and the output is obtained (see the shape sketch right after this list).
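To make the shapes above concrete, here is a minimal NumPy sketch (illustration only; F, K, the batch size, and the single hidden-layer width H are made-up values, and the real code uses TensorFlow variables instead):

import numpy as np

batch, F, K, H = 4, 10, 8, 32   # illustration values only

feat_value = np.random.rand(batch, F)   # each feature's value
w = np.random.rand(F)                   # FM part1: one scalar weight per feature
V = np.random.rand(F, K)                # FM part2 / deep: one K-dim vector per feature

# part1: weight * value -> (batch, F)
y_first_order = feat_value * w                                            # (4, 10)

# part2: vector * value -> (batch, F, K), then the 0.5*(sum^2 - sum-of-squares) trick -> (batch, K)
embeddings = feat_value[:, :, None] * V                                   # (4, 10, 8)
y_second_order = 0.5 * (embeddings.sum(axis=1) ** 2
                        - (embeddings ** 2).sum(axis=1))                  # (4, 8)

# deep part: the same embeddings, flattened to (batch, F*K), then one hidden layer -> (batch, H)
W1, b1 = np.random.rand(F * K, H), np.zeros(H)
y_deep = np.maximum(embeddings.reshape(batch, F * K) @ W1 + b1, 0)        # (4, 32)

# concat -> (batch, F + K + H), then a linear projection would give (batch, 1)
concat_input = np.concatenate([y_first_order, y_second_order, y_deep], axis=1)
print(concat_input.shape)   # (4, 50) = (batch, F + K + H)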
II. For the principle behind FM and the derivation of its formula there are plenty of articles online; the single most important piece is the formula simplification below.
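The simplification referred to is the standard rewriting of the FM pairwise-interaction term (reproduced here since the original note images are not included; v_i is feature i's K-dim embedding vector and x_i its value):

$$\sum_{i=1}^{F}\sum_{j=i+1}^{F}\langle \mathbf{v}_i,\mathbf{v}_j\rangle\, x_i x_j
= \frac{1}{2}\sum_{k=1}^{K}\left[\Big(\sum_{i=1}^{F} v_{i,k}\, x_i\Big)^{2} - \sum_{i=1}^{F} v_{i,k}^{2}\, x_i^{2}\right]$$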
So how is this formula reflected in the code?
The code is here: https://github.com/ChenglongChen/tensorflow-DeepFM
Because formulas and some typesetting are involved, I uploaded my own notes as images.
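As a quick numerical sanity check of the identity above (an illustrative NumPy snippet of my own, independent of the repo code):

import numpy as np

F, K = 10, 8
x = np.random.rand(F)        # feature values
V = np.random.rand(F, K)     # one K-dim vector per feature

# left-hand side: explicit sum over all feature pairs i < j
lhs = sum(V[i] @ V[j] * x[i] * x[j] for i in range(F) for j in range(i + 1, F))

# right-hand side: 0.5 * ((sum_i v_i*x_i)^2 - sum_i (v_i*x_i)^2), summed over the K dimensions
vx = V * x[:, None]          # (F, K)
rhs = 0.5 * ((vx.sum(axis=0) ** 2) - (vx ** 2).sum(axis=0)).sum()

print(np.isclose(lhs, rhs))  # True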
III. The network-structure part of the code, analyzed with comments added
def _init_graph(self):
    self.graph = tf.Graph()
    with self.graph.as_default():
        tf.set_random_seed(self.random_seed)
        # Terminology used below: F is the number of feature fields, i.e. how many features there
        # are, and K is the embedding size. feat_index and feat_value hold each feature's unique
        # index and its corresponding value.
        self.feat_index = tf.placeholder(tf.int32, shape=[None, None],
                                         name="feat_index")  # None * F
        self.feat_value = tf.placeholder(tf.float32, shape=[None, None],
                                         name="feat_value")  # None * F
        self.label = tf.placeholder(tf.float32, shape=[None, 1], name="label")  # None * 1
        # The two dropout placeholders below randomly drop units from the FM-processed and
        # DNN-processed feature parts.
        self.dropout_keep_fm = tf.placeholder(tf.float32, shape=[None], name="dropout_keep_fm")
        self.dropout_keep_deep = tf.placeholder(tf.float32, shape=[None], name="dropout_keep_deep")
        self.train_phase = tf.placeholder(tf.bool, name="train_phase")
        # Define the weight matrices of the FM layer and of every DNN layer. The 1-dim weights and
        # K-dim vectors that the FM layer looks up per feature come from here, as do the K-dim
        # vectors used by the deep layers and the weights of the final concat/projection part.
        # The code inside _initialize_weights mirrors the network structure almost one to one.
        self.weights = self._initialize_weights()
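        # (For reference, in the ChenglongChen/tensorflow-DeepFM repo this dict roughly contains:
        #   "feature_embeddings": [total feature count, K] - shared by FM second order and the deep part
        #   "feature_bias":       [total feature count, 1] - FM first-order weights
        #   "layer_i" / "bias_i": weights and biases of each deep layer
        #   "concat_projection" / "concat_bias": the final projection after concatenation.)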
        # model
        self.embeddings = tf.nn.embedding_lookup(self.weights["feature_embeddings"],
                                                 self.feat_index)  # None * F * K
        feat_value = tf.reshape(self.feat_value, shape=[-1, self.field_size, 1])
        self.embeddings = tf.multiply(self.embeddings, feat_value)
        # Each feature has a 1-dim weight; weight * feature value gives the data produced by the
        # first FM part.
        # ---------- first order term ----------
        self.y_first_order = tf.nn.embedding_lookup(self.weights["feature_bias"], self.feat_index)  # None * F * 1
        self.y_first_order = tf.reduce_sum(tf.multiply(self.y_first_order, feat_value), 2)  # None * F
        self.y_first_order = tf.nn.dropout(self.y_first_order, self.dropout_keep_fm[0])  # None * F
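        # This is the linear sum_i(w_i * x_i) part of FM, kept per feature (hence None * F); the
        # final concat projection learns how to combine the F values.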
        # Each feature has a K-dim weight vector; vector * feature value gives the data produced
        # by the second FM part.
        # ---------- second order term ----------
        # sum_square part
        self.summed_features_emb = tf.reduce_sum(self.embeddings, 1)  # None * K
        self.summed_features_emb_square = tf.square(self.summed_features_emb)  # None * K
        # square_sum part
        self.squared_features_emb = tf.square(self.embeddings)
        self.squared_sum_features_emb = tf.reduce_sum(self.squared_features_emb, 1)  # None * K
        # second order
        self.y_second_order = 0.5 * tf.subtract(self.summed_features_emb_square, self.squared_sum_features_emb)  # None * K
        self.y_second_order = tf.nn.dropout(self.y_second_order, self.dropout_keep_fm[1])  # None * K
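        # This is exactly the simplification from Section II, computed per embedding dimension k:
        # 0.5 * ((sum_i v_ik * x_i)^2 - sum_i (v_ik * x_i)^2). No explicit loop over the
        # F*(F-1)/2 feature pairs is needed, and the result has shape None * K.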
        # ---------- Deep component ----------
        self.y_deep = tf.reshape(self.embeddings, shape=[-1, self.field_size * self.embedding_size])  # None * (F*K)
        self.y_deep = tf.nn.dropout(self.y_deep, self.dropout_keep_deep[0])
        for i in range(0, len(self.deep_layers)):
            self.y_deep = tf.add(tf.matmul(self.y_deep, self.weights["layer_%d" % i]), self.weights["bias_%d" % i])  # None * layer[i]
            if self.batch_norm:
                self.y_deep = self.batch_norm_layer(self.y_deep, train_phase=self.train_phase, scope_bn="bn_%d" % i)  # None * layer[i]
            self.y_deep = self.deep_layers_activation(self.y_deep)
            self.y_deep = tf.nn.dropout(self.y_deep, self.dropout_keep_deep[1 + i])  # dropout at each Deep layer
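        # Note: the deep component reuses the same embeddings as the FM second-order part (the
        # shared-embedding idea of DeepFM). Its input width is F*K, but after the hidden layers
        # y_deep has width deep_layers[-1].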
        # ---------- DeepFM ----------
        if self.use_fm and self.use_deep:
            concat_input = tf.concat([self.y_first_order, self.y_second_order, self.y_deep], axis=1)
        elif self.use_fm:
            concat_input = tf.concat([self.y_first_order, self.y_second_order], axis=1)
        elif self.use_deep:
            concat_input = self.y_deep
        self.out = tf.add(tf.matmul(concat_input, self.weights["concat_projection"]), self.weights["concat_bias"])
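        # With both branches enabled, concat_input has width F + K + deep_layers[-1], so
        # "concat_projection" is a [F + K + deep_layers[-1], 1] matrix and self.out is None * 1.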
        # loss
        if self.loss_type == "logloss":
            self.out = tf.nn.sigmoid(self.out)
            self.loss = tf.losses.log_loss(self.label, self.out)
        elif self.loss_type == "mse":
            self.loss = tf.nn.l2_loss(tf.subtract(self.label, self.out))
        # l2 regularization on weights
        if self.l2_reg > 0:
            self.loss += tf.contrib.layers.l2_regularizer(
                self.l2_reg)(self.weights["concat_projection"])
            if self.use_deep:
                for i in range(len(self.deep_layers)):
                    self.loss += tf.contrib.layers.l2_regularizer(
                        self.l2_reg)(self.weights["layer_%d" % i])

        # optimizer
        if self.optimizer_type == "adam":
            self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, beta1=0.9, beta2=0.999,
                                                    epsilon=1e-8).minimize(self.loss)
        elif self.optimizer_type == "adagrad":
            self.optimizer = tf.train.AdagradOptimizer(learning_rate=self.learning_rate,
                                                       initial_accumulator_value=1e-8).minimize(self.loss)
        elif self.optimizer_type == "gd":
            self.optimizer = tf.train.GradientDescentOptimizer(learning_rate=self.learning_rate).minimize(self.loss)
        elif self.optimizer_type == "momentum":
            self.optimizer = tf.train.MomentumOptimizer(learning_rate=self.learning_rate, momentum=0.95).minimize(
                self.loss)
        elif self.optimizer_type == "yellowfin":
            self.optimizer = YFOptimizer(learning_rate=self.learning_rate, momentum=0.0).minimize(
                self.loss)

        # init
        self.saver = tf.train.Saver()
        init = tf.global_variables_initializer()
        self.sess = self._init_session()
        self.sess.run(init)

        # number of params
        total_parameters = 0
        for variable in self.weights.values():
            shape = variable.get_shape()
            variable_parameters = 1
            for dim in shape:
                variable_parameters *= dim.value
            total_parameters += variable_parameters
        if self.verbose > 0:
            print("#params: %d" % total_parameters)
IV. The DeepFM approach can actually be extended further: in the FM layer, after all the single-valued features are multiplied by their weights, you can apply sum/mean-style pooling, and after each feature has been turned into a K-dim vector you can add one more embedding/transformation layer on top. The network structure becomes more complex and harder to understand, but it often delivers better results in practice (a rough sketch of one possible variant follows).
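A rough sketch of what such an extension might look like, in the same TF1 style as the code above (this is my own illustrative variant, not part of the repo: the sum-pooled first-order term and the extra K x K transform on top of the embeddings are assumptions about one possible design):

import tensorflow as tf

F, K = 10, 8  # illustration values: number of fields, embedding size

# stand-ins for the tensors produced by the code above
y_first_order = tf.placeholder(tf.float32, shape=[None, F])   # None * F
embeddings = tf.placeholder(tf.float32, shape=[None, F, K])   # None * F * K

# (a) pool the first-order part with sum (or tf.reduce_mean) instead of keeping all F values
y_first_order_pooled = tf.reduce_sum(y_first_order, axis=1, keepdims=True)   # None * 1

# (b) one extra learned transformation on top of each feature's K-dim embedding
W_extra = tf.get_variable("extra_proj", shape=[K, K],
                          initializer=tf.glorot_uniform_initializer())
embeddings_2 = tf.tensordot(embeddings, W_extra, axes=[[2], [0]])            # None * F * K
# embeddings_2 could then replace embeddings in the second-order and deep branches above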