1. Factorization-machine supported Neural Networks (FNN)
As introduced in earlier posts in this series, the factorization machine (FM) performs second-order feature crossing. On massive, highly sparse user-feedback data, second-order crosses are often not enough: third-, fourth- or even higher-order combinations can further improve the model's expressive power, but explicit high-order crossing makes the complexity grow rapidly.
The idea behind FNN is straightforward: stack several fully connected layers on top of an FM. Letting a DNN cross the features implicitly reduces the feature-engineering effort while keeping the computational cost within a reasonable range. To speed up convergence and make full use of the FM's representational power, FNN is trained in two stages. First, an FM is built for the task and its parameters are learned. Then, the FM parameters are used as the initial values of the bottom layer of the FNN. This two-stage scheme injects the FM as prior knowledge into the model and prevents the parameter drift that the ambiguity of sparse data would otherwise cause.
Its basic model is as follows.
Output layer, i.e. the predicted CTR:
$$\hat{y} = \mathrm{sigmoid}(W_3 l_2 + b_3)$$
where $W_3$ and $b_3$ denote the weights and the bias, respectively.
Second hidden layer:
$$l_2 = \tanh(W_2 l_1 + b_2)$$
First hidden layer:
$$l_1 = \tanh(W_1 z + b_1)$$
where $z = (w_0, z_1, z_2, \dots, z_n)$ denotes the factorization-machine parameter vector, with $z_i = (w_i, v_i^1, v_i^2, \dots, v_i^K)$ initialized from the FM trained in the first stage:
$$z_i = W_0^i \cdot x[\mathrm{start}_i : \mathrm{end}_i] = (w_i, v_i^1, v_i^2, \dots, v_i^K)$$
The loss function can be the cross entropy:
$$L(y, \hat{y}) = -y\log\hat{y} - (1-y)\log(1-\hat{y})$$
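To make the two-stage scheme concrete, here is a minimal sketch (not from the original post) of how the bottom-layer weights can be seeded from a pretrained FM. The array names fm_w and fm_v are hypothetical and stand for the FM's first-order weights and latent vectors.

import numpy as np

def init_fnn_embeddings(fm_w, fm_v):
    # fm_w: (n,) first-order FM weights; fm_v: (n, K) FM latent vectors.
    # Returns an (n, K+1) matrix used to initialize the FNN bottom layer,
    # so that each feature contributes z_i = (w_i, v_i^1, ..., v_i^K).
    return np.concatenate([fm_w.reshape(-1, 1), fm_v], axis=1)

# hypothetical sizes: n = 1000 one-hot features, K = 8 latent dimensions
fm_w = np.random.normal(0, 0.01, size=1000)
fm_v = np.random.normal(0, 0.01, size=(1000, 8))
z_init = init_fnn_embeddings(fm_w, fm_v)   # shape (1000, 9), used as the embedding initializer of the DNN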
2. Product-based Neural Network (PNN)
PNN likewise introduces a neural network to combine low-order features, but unlike FNN it does not rely solely on fully connected layers; instead it designs a product layer that performs a more fine-grained crossing of the features.
Output layer, i.e. the predicted CTR:
$$\hat{y} = \mathrm{sigmoid}(W_3 l_2 + b_3)$$
Second hidden layer:
$$l_2 = \mathrm{relu}(W_2 l_1 + b_2)$$
First hidden layer:
$$l_1 = \mathrm{relu}(l_z + l_p + b_1)$$
where $l_z$ and $l_p$ denote the linear signal and the quadratic signal, respectively; the core of PNN lies in how they are computed.
Define the product operation $\odot$ (element-wise multiplication followed by summation):
$$A \odot B \triangleq \sum_{i,j} A_{i,j} B_{i,j}$$
Then:
$$l_z = (l_z^1, l_z^2, \dots, l_z^{D_1}), \qquad l_z^n = W_z^n \odot z$$
$$l_p = (l_p^1, l_p^2, \dots, l_p^{D_1}), \qquad l_p^n = W_p^n \odot p$$
where:
$$z = (z_1, z_2, \dots, z_N) \triangleq (f_1, f_2, \dots, f_N)$$
$$p = \{p_{i,j}\},\ i,j = 1,\dots,N, \qquad p_{i,j} = g(f_i, f_j)$$
Putting it together:
$$l_1 = \mathrm{relu}\big( (W_z^1 \odot z, \dots, W_z^{D_1} \odot z) + (W_p^1 \odot p, \dots, W_p^{D_1} \odot p) + b_1 \big)$$
Here $f_i \in \mathbb{R}^M$ denotes the feature vector of field $i$ after embedding; the embedding step is the same as in FNN. Note that $l_z$ retains the low-order features, which avoids FNN's drawback of only building higher-order combinations on top of second-order features. $g$ denotes the pairwise feature-interaction function; different choices of $g$ give different PNN structures, the common ones being the inner product (Inner Product-based Neural Network, IPNN) and the outer product (Outer Product-based Neural Network, OPNN). Throughout, $N$ is the number of fields, $M$ the embedding size, and $D_1$ the number of units in the product layer.
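As a concrete illustration, here is a toy numpy sketch (assumed sizes, not the original code) of the product layer for a single output unit n; both signals reduce to element-wise multiplications followed by a sum.

import numpy as np

N, M = 4, 3                       # assumed: 4 fields, embedding size 3
f = np.random.randn(N, M)         # embedded field vectors f_1 .. f_N
Wz = np.random.randn(N, M)        # W_z^n for one output unit n
Wp = np.random.randn(N, N)        # W_p^n for one output unit n (IPNN case)

# linear signal: l_z^n = W_z^n ⊙ z, with z = (f_1, ..., f_N)
lz_n = np.sum(Wz * f)

# quadratic signal: p_{i,j} = <f_i, f_j>, l_p^n = W_p^n ⊙ p
p = f @ f.T                       # N x N matrix of pairwise inner products
lp_n = np.sum(Wp * p)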
2.1 IPNN Analysis
Define $g(f_i, f_j) = \langle f_i, f_j \rangle$, then:
$$l_p^n = W_p^n \odot p = \sum_{i=1}^{N}\sum_{j=1}^{N} (W_p^n)_{i,j}\,\langle f_i, f_j \rangle$$
Space complexity: computing $l_z$ requires $O(D_1 N M)$ space (for $W_z$), and computing $l_p$ requires $O(D_1 N^2)$ space (for $W_p$); the overall space complexity of the product layer is therefore $O(D_1 N (M + N))$.
Time complexity: computing $l_z$ takes $O(D_1 N M)$ time; each $p_{i,j} = \langle f_i, f_j \rangle$ costs $O(M)$, so computing $p$ costs $O(N^2 M)$; given $p$, each $l_p^n$ costs $O(N^2)$, so $l_p$ costs $O(D_1 N^2)$. The overall time complexity of the product layer is therefore $O(N^2 (D_1 + M))$.
Such time and space costs are too high for engineering practice, so the computation must be optimized. Since $l_z$ itself is cheap, the focus is on optimizing $l_p$. Inspired by the parameter-matrix factorization in FM, and because each $W_p^n$ is a symmetric square matrix, a first-order (rank-1) decomposition is used: assume $W_p^n = \theta^n (\theta^n)^{\mathsf T}$ with $\theta^n \in \mathbb{R}^N$. The matrix $W_p^n$ with $N^2$ parameters is thus replaced by the vector $\theta^n$ with $N$ parameters, and:
$$l_p^n = W_p^n \odot p = \sum_{i=1}^{N}\sum_{j=1}^{N} \theta_i^n \theta_j^n \langle f_i, f_j \rangle = \Big\langle \sum_{i=1}^{N} \theta_i^n f_i,\ \sum_{j=1}^{N} \theta_j^n f_j \Big\rangle = \Big\| \sum_{i=1}^{N} \delta_i^n \Big\|^2$$
where $\delta_i^n = \theta_i^n f_i$, so that:
$$l_p = \Big( \big\| \textstyle\sum_i \delta_i^1 \big\|^2,\ \big\| \textstyle\sum_i \delta_i^2 \big\|^2,\ \dots,\ \big\| \textstyle\sum_i \delta_i^{D_1} \big\|^2 \Big)$$
Complexity after the optimization:
Space complexity drops from $O(D_1 N (M + N))$ to $O(D_1 N M)$.
Time complexity drops from $O(N^2 (D_1 + M))$ to $O(D_1 N M)$.
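A quick numerical check of the identity above, as a toy numpy sketch (assumed sizes, not the original code): with a rank-1 $W_p^n$, summing $\theta_i^n \theta_j^n \langle f_i, f_j \rangle$ over all pairs equals the squared norm of $\sum_i \theta_i^n f_i$.

import numpy as np

N, M = 4, 3                              # assumed: 4 fields, embedding size 3
f = np.random.randn(N, M)                # embedded field vectors
theta = np.random.randn(N)               # theta^n from the rank-1 decomposition

# direct form: W_p^n ⊙ p with W_p^n = theta theta^T and p_{i,j} = <f_i, f_j>
Wp = np.outer(theta, theta)
p = f @ f.T
lp_direct = np.sum(Wp * p)

# optimized form: || sum_i theta_i f_i ||^2, computed in O(N M) instead of O(N^2 M)
delta_sum = (theta[:, None] * f).sum(axis=0)
lp_fast = float(delta_sum @ delta_sum)

print(np.allclose(lp_direct, lp_fast))   # True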
2.2 OPNN Analysis
Replacing the inner product with the outer product as the feature-crossing operation yields the other form of PNN, OPNN.
Define $g(f_i, f_j) = f_i f_j^{\mathsf T}$, then:
$$p_{i,j} = f_i f_j^{\mathsf T} \in \mathbb{R}^{M \times M}$$
A similar analysis shows that both the time and the space complexity of OPNN are $O(D_1 N^2 M^2)$.
To optimize the computation, the idea of superposition (sum pooling) is introduced, and $p$ is redefined as:
$$p = \sum_{i=1}^{N}\sum_{j=1}^{N} f_i f_j^{\mathsf T} = f_\Sigma (f_\Sigma)^{\mathsf T}, \qquad f_\Sigma = \sum_{i=1}^{N} f_i$$
so that:
$$l_p^n = W_p^n \odot p, \qquad W_p^n \in \mathbb{R}^{M \times M}$$
Here computing $f_\Sigma$ takes $O(N M)$ time; $p$ has time and space complexity $O(M^2)$; $l_p$ has time and space complexity $O(D_1 M^2)$; and $l_z$ has time and space complexity $O(D_1 N M)$. The overall time and space complexity of OPNN therefore becomes $O(D_1 M (M + N))$.
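Again as a toy numpy sketch (assumed sizes, not the original code), sum pooling collapses the $N^2$ pairwise $M \times M$ outer products into a single $M \times M$ matrix built from $f_\Sigma$:

import numpy as np

N, M = 4, 3                               # assumed: 4 fields, embedding size 3
f = np.random.randn(N, M)
Wp = np.random.randn(M, M)                # W_p^n for one output unit (now M x M)

# naive sum-pooled p: sum over all pairs of outer products f_i f_j^T
p_naive = sum(np.outer(f[i], f[j]) for i in range(N) for j in range(N))

# optimized: p = f_sigma f_sigma^T with f_sigma = sum_i f_i
f_sigma = f.sum(axis=0)
p_fast = np.outer(f_sigma, f_sigma)

print(np.allclose(p_naive, p_fast))       # True
lp_n = np.sum(Wp * p_fast)                # l_p^n = W_p^n ⊙ p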
3. Algorithm Implementation
3.1 Imports and parameter initialization
import numpy as np
import pandas as pd
import tensorflow as tf

class PNN():
    def __init__(self, feature_size, field_size, embedding_size=8, deep_layers=[32, 32], deep_init_size=50,
                 epoch=10, batch_size=256, learning_rate=0.001, random_seed=2020):
        self.feature_size = feature_size      # total number of features after one-hot encoding
        self.field_size = field_size          # number of fields F
        self.embedding_size = embedding_size  # embedding dimension K
        self.deep_layers = deep_layers        # sizes of the hidden layers in the deep part
        self.deep_init_size = deep_init_size  # width D1 of the product layer
        self.epoch = epoch
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.random_seed = random_seed
        self._init_graph()
3.2 Graph initialization and the linear part
    def _init_graph(self):
        self.graph = tf.Graph()
        with self.graph.as_default():
            tf.set_random_seed(self.random_seed)
            self.feat_index = tf.placeholder(tf.int32, shape=[None, None], name='feat_index')
            self.feat_value = tf.placeholder(tf.float32, shape=[None, None], name='feat_value')
            self.label = tf.placeholder(tf.float32, shape=[None, 1], name='label')
            self.train_phase = tf.placeholder(tf.bool, name='train_phase')
            self.weights = self._init_weights()
            # embeddings: look up each field's vector and scale it by the feature value
            self.embeddings = tf.nn.embedding_lookup(self.weights['feature_embeddings'], self.feat_index)  # N * F * K
            feat_value = tf.reshape(self.feat_value, shape=[-1, self.field_size, 1])
            self.embeddings = tf.multiply(self.embeddings, feat_value)
            # linear part: lz_n = W_z^n ⊙ z
            linear_output = []
            for i in range(self.deep_init_size):
                lz_i = tf.reduce_sum(tf.multiply(self.embeddings, self.weights['product-linear'][i]), axis=[1, 2])
                linear_output.append(tf.reshape(lz_i, shape=(-1, 1)))  # N * 1
            self.lz = tf.concat(linear_output, axis=1)  # N * deep_init_size
3.3 Computing the quadratic part (IPNN / OPNN)
            # quadratic part
            # IPNN: lp_n = || sum_i theta_i^n f_i || (the derivation above uses the squared norm;
            # this implementation applies tf.norm without squaring)
            quadratic_output = []
            for i in range(self.deep_init_size):
                weight = tf.reshape(self.weights['product-quadratic-inner'][i], (1, -1, 1))  # 1 * F * 1
                f_sigma = tf.reduce_sum(tf.multiply(self.embeddings, weight), axis=1)  # N * K
                lp_i = tf.reshape(tf.norm(f_sigma, axis=1), shape=(-1, 1))
                quadratic_output.append(lp_i)
            self.lp = tf.concat(quadratic_output, axis=1)  # N * deep_init_size

            # OPNN (alternative, commented out): p = f_sigma f_sigma^T via sum pooling
            # quadratic_output = []
            # self.weights['product-quadratic-outer'] = tf.Variable(
            #     tf.random_normal([self.deep_init_size, self.embedding_size, self.embedding_size], 0.0, 0.01))
            # f_sigma = tf.reduce_sum(self.embeddings, axis=1)  # N * K
            # p = tf.matmul(tf.expand_dims(f_sigma, 2), tf.expand_dims(f_sigma, 1))  # N * K * K = (N * K * 1) * (N * 1 * K)
            # for i in range(self.deep_init_size):
            #     theta = tf.multiply(p, tf.expand_dims(self.weights['product-quadratic-outer'][i], 0))  # N * K * K
            #     lp_i = tf.reshape(tf.reduce_sum(theta, axis=[1, 2]), shape=(-1, 1))  # N * 1
            #     quadratic_output.append(lp_i)
            # (when switching to OPNN, comment out the IPNN loop and keep the self.lp concat after this loop)
3.4 The deep network part
            # deep layers: l1 = relu(lz + lp + b1)
            self.y_deep = tf.nn.relu(tf.add(tf.add(self.lz, self.lp), self.weights['product-bias']))
            self.y_deep = tf.add(tf.matmul(self.y_deep, self.weights['layer_0']), self.weights['bias_0'])
            self.y_deep = tf.nn.relu(self.y_deep)
            self.y_deep = tf.add(tf.matmul(self.y_deep, self.weights['layer_1']), self.weights['bias_1'])
            self.y_deep = tf.nn.relu(self.y_deep)
            # output layer, log loss and Adam optimizer
            self.out = tf.add(tf.matmul(self.y_deep, self.weights['output']), self.weights['output_bias'])
            self.out = tf.nn.sigmoid(self.out)
            self.loss = tf.losses.log_loss(self.label, self.out)
            self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, beta1=0.9, beta2=0.999,
                                                    epsilon=1e-8).minimize(self.loss)
            self.saver = tf.train.Saver()
            init = tf.global_variables_initializer()
            self.sess = tf.Session()
            self.sess.run(init)
            writer = tf.summary.FileWriter("D:/logs/pnn/", tf.get_default_graph())
            writer.close()
3.5 Weight initialization
    def _init_weights(self):
        weights = dict()
        # embeddings
        weights['feature_embeddings'] = tf.Variable(
            tf.random_normal([self.feature_size, self.embedding_size], 0.0, 0.01), name='feature_embeddings')
        weights['feature_bias'] = tf.Variable(tf.random_normal([self.feature_size, 1], 0.0, 1.0), name='feature_bias')
        # product layer: theta^n for IPNN, W_z^n for the linear signal, and the bias b1
        weights['product-quadratic-inner'] = tf.Variable(
            tf.random_normal([self.deep_init_size, self.field_size], 0.0, 0.01))
        weights['product-linear'] = tf.Variable(
            tf.random_normal([self.deep_init_size, self.field_size, self.embedding_size], 0.0, 0.01))
        weights['product-bias'] = tf.Variable(tf.random_normal([self.deep_init_size, ], 0.0, 0.01))
        # deep layers (Glorot/Xavier initialization)
        input_size = self.deep_init_size
        glorot = np.sqrt(2.0 / (input_size + self.deep_layers[0]))
        weights['layer_0'] = tf.Variable(np.random.normal(loc=0, scale=glorot, size=(input_size, self.deep_layers[0])), dtype=np.float32)
        weights['bias_0'] = tf.Variable(np.random.normal(loc=0, scale=glorot, size=(1, self.deep_layers[0])), dtype=np.float32)
        glorot = np.sqrt(2.0 / (self.deep_layers[0] + self.deep_layers[1]))
        weights['layer_1'] = tf.Variable(np.random.normal(loc=0, scale=glorot, size=(self.deep_layers[0], self.deep_layers[1])), dtype=np.float32)
        weights['bias_1'] = tf.Variable(np.random.normal(loc=0, scale=glorot, size=(1, self.deep_layers[1])), dtype=np.float32)
        # output layer (Glorot scale based on the last hidden layer's width)
        glorot = np.sqrt(2.0 / (self.deep_layers[-1] + 1))
        weights['output'] = tf.Variable(np.random.normal(loc=0, scale=glorot, size=(self.deep_layers[-1], 1)), dtype=np.float32)
        weights['output_bias'] = tf.Variable(tf.constant(0.01), dtype=np.float32)
        return weights
3.6 Training function and its helpers
    def fit_on_batch(self, Xi, Xv, y):
        feed_dict = {self.feat_index: Xi,
                     self.feat_value: Xv,
                     self.label: y,
                     self.train_phase: True}
        loss, opt = self.sess.run([self.loss, self.optimizer], feed_dict=feed_dict)
        return loss

    def get_batch(self, Xi, Xv, y, batch_size, index):
        # slice out the index-th mini-batch; labels are wrapped so their shape matches [None, 1]
        start = index * batch_size
        end = (index + 1) * batch_size
        end = end if end < len(y) else len(y)
        return Xi[start:end], Xv[start:end], [[y_] for y_ in y[start:end]]
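The fit method called from main() below is not shown in the original listing; a minimal sketch consistent with the pieces above (an epoch loop over the mini-batches produced by get_batch, each trained with fit_on_batch) could look like the following. The original may additionally shuffle the data or log validation metrics.

    def fit(self, Xi_train, Xv_train, y_train):
        # assumed minimal training loop matching the "epoch: ... loss: ..." output below
        for epoch in range(self.epoch):
            total_batch = int(np.ceil(len(y_train) / self.batch_size))
            for i in range(total_batch):
                Xi_batch, Xv_batch, y_batch = self.get_batch(Xi_train, Xv_train, y_train, self.batch_size, i)
                loss = self.fit_on_batch(Xi_batch, Xv_batch, y_batch)
            print('epoch:', epoch, 'loss:', loss)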
3.7 File loading and column handling
TRAIN_FILE = "Driver_Prediction_Data/train.csv"
TEST_FILE = "Driver_Prediction_Data/test.csv"
NUMERIC_COLS = [
"ps_reg_01", "ps_reg_02", "ps_reg_03",
"ps_car_12", "ps_car_13", "ps_car_14", "ps_car_15"
]
IGNORE_COLS = [
"id", "target",
"ps_calc_01", "ps_calc_02", "ps_calc_03", "ps_calc_04",
"ps_calc_05", "ps_calc_06", "ps_calc_07", "ps_calc_08",
"ps_calc_09", "ps_calc_10", "ps_calc_11", "ps_calc_12",
"ps_calc_13", "ps_calc_14",
"ps_calc_15_bin", "ps_calc_16_bin", "ps_calc_17_bin",
"ps_calc_18_bin", "ps_calc_19_bin", "ps_calc_20_bin"
]
def load_data():
    train_data = pd.read_csv(TRAIN_FILE)
    test_data = pd.read_csv(TEST_FILE)
    data = pd.concat([train_data, test_data])
    cols = [c for c in train_data.columns if c not in ['id', 'target']]
    cols = [c for c in cols if (c not in IGNORE_COLS)]
    X_train = train_data[cols].values
    y_train = train_data['target'].values
    X_test = test_data[cols].values
    ids_test = test_data['id'].values
    return data, train_data, test_data, X_train, y_train, X_test, ids_test
3.8 Preprocessing functions
def split_dimensions(data):
    # build the feature dictionary: a numeric column gets a single index,
    # a categorical column gets one index per distinct value
    feat_dict = {}
    tc = 0
    for col in data.columns:
        if col in IGNORE_COLS:
            continue
        if col in NUMERIC_COLS:
            feat_dict[col] = tc
            tc += 1
        else:
            us = data[col].unique()
            feat_dict[col] = dict(zip(us, range(tc, len(us) + tc)))
            tc += len(us)
    feat_dimension = tc
    return feat_dict, feat_dimension

def data_parse(data, feat_dict, training=True):
    # turn each row into (feature indices, feature values):
    # numeric features keep their raw value, categorical features get the value 1.0
    if training:
        y = data['target'].values.tolist()
        data.drop(['id', 'target'], axis=1, inplace=True)
    else:
        ids = data['id'].values.tolist()
        data.drop(['id'], axis=1, inplace=True)
    index = data.copy()
    for col in data.columns:
        if col in IGNORE_COLS:
            data.drop(col, axis=1, inplace=True)
            index.drop(col, axis=1, inplace=True)
            continue
        if col in NUMERIC_COLS:
            index[col] = feat_dict[col]
        else:
            index[col] = data[col].map(feat_dict[col])
            data[col] = 1.
    xi = index.values.tolist()
    xd = data.values.tolist()
    if training:
        return xi, xd, y
    else:
        return xi, xd, ids
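To illustrate the Xi/Xv format these functions produce, here is a toy example using the functions and imports defined above (the two-row DataFrame is hypothetical, not the competition data): a numeric field contributes its index and raw value, while a categorical field contributes the index of its value and 1.0.

toy = pd.DataFrame({
    'id': [0, 1],
    'target': [1, 0],
    'ps_reg_01': [0.5, 0.9],          # numeric field
    'ps_ind_02_cat': [2, 3],          # categorical field
})
feat_dict, feat_dimension = split_dimensions(toy)
# feat_dict == {'ps_reg_01': 0, 'ps_ind_02_cat': {2: 1, 3: 2}}, feat_dimension == 3
Xi, Xv, y = data_parse(toy, feat_dict, training=True)
# Xi == [[0, 1], [0, 2]]            feature indices per row
# Xv == [[0.5, 1.0], [0.9, 1.0]]    feature values per row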
3.9 Main function
def main():
    data, train_data, test_data, X_train, y_train, X_test, ids_test = load_data()
    feat_dict, feat_dimension = split_dimensions(data)
    Xi_train, Xv_train, y_train = data_parse(train_data, feat_dict, training=True)
    Xi_test, Xv_test, ids_test = data_parse(test_data, feat_dict, training=False)
    pnn_model = PNN(feature_size=feat_dimension,
                    field_size=len(Xi_train[0]),
                    batch_size=128,
                    epoch=100)
    pnn_model.fit(Xi_train, Xv_train, y_train)

if __name__ == '__main__':
    main()
The training loss is shown below (the tiny negative values at the end are a floating-point artifact of the epsilon used inside tf.losses.log_loss once the predictions saturate; log loss itself is non-negative):
epoch: 0 loss: 0.015076467
epoch: 1 loss: 0.00010797631
epoch: 2 loss: 3.4825567e-05
epoch: 3 loss: 1.7326316e-05
epoch: 4 loss: 1.0358356e-05
epoch: 5 loss: 6.848299e-06
···
epoch: 95 loss: -1.1920928e-07
epoch: 96 loss: -1.1920928e-07
epoch: 97 loss: -1.1920928e-07
epoch: 98 loss: -1.1920928e-07
epoch: 99 loss: -1.1920928e-07
References
[1]. https://www.cnblogs.com/yinzm/p/11758595.html
[2]. https://www.cnblogs.com/yinzm/p/11775948.html
[3]. Qu Y, Cai H, Ren K, et al. Product-based neural networks for user response prediction[C]//2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 2016: 1149-1154.
[4]. Zhang W, Du T, Wang J. Deep learning over multi-field categorical data[C]//European conference on information retrieval. Springer, Cham, 2016: 45-57.
"The sun and the moon hurry on, never lingering; spring and autumn alternate in endless succession." — Qu Yuan, Li Sao