Recommender Systems (7): Recommendation Algorithms Based on PNN

1. Factorization-machine supported Neural Networks (FNN)

As introduced in earlier posts in this series, the factorization machine (FM) crosses features at second order. On massive, highly sparse user-feedback data, second-order crosses are often not enough: third-, fourth-, or even higher-order combinations can further improve the model's learning capacity, but explicit high-order feature crossing drives up the complexity dramatically.
The idea behind FNN is simple: stack several fully connected layers directly on top of an FM. Letting a DNN cross the features implicitly reduces the feature-engineering burden while keeping the time complexity within a reasonable range. To speed up convergence and make full use of the FM's representational power, FNN is trained in two stages. First, an FM is built for the task and its parameters are learned. Then the FM parameters are used as the initial values of FNN's bottom layer. This two-stage scheme injects the FM as prior knowledge, preventing parameter drift caused by the ambiguity of sparse data.

(Figure: a four-layer FNN structure)

The model is defined as follows.
Output layer, i.e. the predicted CTR:

\hat{y}=\sigma\left(W_{3} l_{2}+b_{3}\right)

where W_{3} and b_{3} denote the weights and the bias respectively, and \sigma(\cdot) is the sigmoid function.
Second hidden layer:

l_{2}=\tanh \left(W_{2} l_{1}+b_{2}\right)

First hidden layer:

l_{1}=\tanh \left(W_{1} z+b_{1}\right)

where z=\left(w_{0}, z_{1}, z_{2}, \ldots, z_{N}\right) collects the FM parameter vectors of all N fields, z_{i}=\left(w_{i}, v_{i}^{1}, v_{i}^{2}, \ldots, v_{i}^{K}\right), initialized by the FM trained in the first stage:

y_{FM}=\sigma\left(w_{0}+\sum_{i=1}^{N} w_{i} x_{i}+\sum_{i=1}^{N} \sum_{j=i+1}^{N}\left\langle v_{i}, v_{j}\right\rangle x_{i} x_{j}\right)

The loss function can be the cross-entropy:

L(y, \hat{y})=-y \log \hat{y}-(1-y) \log (1-\hat{y})
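
The two-stage initialization can be sketched in a few lines. This is a minimal illustration of the idea rather than the original implementation; fm_w0, fm_w and fm_v are hypothetical names for the bias, linear weights and latent vectors of a pre-trained FM:

import numpy as np

# Minimal sketch of FNN's two-stage initialization (assumed FM parameter shapes):
# fm_w0: scalar bias w_0; fm_w: (N,) linear weights; fm_v: (N, K) latent vectors
def init_fnn_bottom(fm_w0, fm_w, fm_v):
    N, K = fm_v.shape
    z = [np.array([fm_w0])]                             # z_0 = w_0
    for i in range(N):
        # z_i = (w_i, v_i^1, ..., v_i^K): each field's FM parameters, concatenated
        z.append(np.concatenate([[fm_w[i]], fm_v[i]]))
    return np.concatenate(z)                            # bottom-layer init, length 1 + N*(K+1)

# usage: z0 = init_fnn_bottom(0.1, np.zeros(10), np.random.randn(10, 8))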

2. Product-based Neural Network (PNN)

PNN likewise uses a neural network to combine low-order features, but unlike FNN it does not rely on fully connected layers alone: it introduces a product layer that performs more fine-grained crossing of the features.

(Figure: typical PNN structure)

Output layer, i.e. the predicted CTR:

\hat{y}=\sigma\left(W_{3} l_{2}+b_{3}\right)

Second hidden layer:

l_{2}=\operatorname{relu}\left(W_{2} l_{1}+b_{2}\right)

First hidden layer:

l_{1}=\operatorname{relu}\left(l_{z}+l_{p}+b_{1}\right)

where l_z and l_p carry the linear signal and the quadratic signal respectively; the core of PNN lies in computing l_z and l_p.
Define the inner-product operation of two tensors:

A \odot B \triangleq \sum_{i, j} A_{i, j} B_{i, j}

Then:

\begin{array}{ll} {l_{z}=\left(l_{z}^{1}, l_{z}^{2}, \ldots, l_{z}^{n}, \ldots, l_{z}^{D_{1}}\right),} & {l_{z}^{n}=W_{z}^{n} \odot z} \\ {l_{p}=\left(l_{p}^{1}, l_{p}^{2}, \ldots, l_{p}^{n}, \ldots, l_{p}^{D_{1}}\right),} & {l_{p}^{n}=W_{p}^{n} \odot p} \end{array}

where:

z=\left(z_{1}, z_{2}, \ldots, z_{N}\right) \triangleq\left(f_{1}, f_{2}, \ldots, f_{N}\right), \quad p=\left\{p_{i, j}\right\}, \quad p_{i, j}=g\left(f_{i}, f_{j}\right), \quad i, j=1 \ldots N

Putting it together, the first hidden layer is:

l_{1}=\operatorname{relu}\left(l_{z}+l_{p}+b_{1}\right)

Here f_i denotes the embedding vector of field i; the embedding step is identical to FNN's. Note that PNN retains the low-order features through l_z, avoiding FNN's drawback of only building higher-order combinations on top of second-order features. g denotes the pairwise feature-interaction function; different choices of g yield different PNN structures, the common ones being the inner product (Inner Product-based Neural Network, IPNN) and the outer product (Outer Product-based Neural Network, OPNN); a toy sketch of these definitions follows.
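
As a concrete illustration, here is a minimal numpy sketch of the product layer (the sizes N, M, D1 are arbitrary illustrative values, and the inner product is used for g):

import numpy as np

N, M, D1 = 4, 3, 5                       # fields, embedding size, product-layer width (toy sizes)
f = np.random.randn(N, M)                # f_i: embedding vector of field i
z = f                                    # z = (f_1, ..., f_N)
p = f @ f.T                              # IPNN choice: p[i, j] = <f_i, f_j>

Wz = np.random.randn(D1, N, M)           # W_z^n, one per product-layer node
Wp = np.random.randn(D1, N, N)           # W_p^n

lz = np.array([np.sum(Wz[n] * z) for n in range(D1)])  # l_z^n = W_z^n ⊙ z
lp = np.array([np.sum(Wp[n] * p) for n in range(D1)])  # l_p^n = W_p^n ⊙ p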

2.1 IPNN Analysis

Define p_{i, j}=g\left(f_{i}, f_{j}\right)=\left\langle f_{i}, f_{j}\right\rangle, so that p \in \mathbb{R}^{N \times N} holds all pairwise inner products; then:

l_{p}^{n}=W_{p}^{n} \odot p=\sum_{i=1}^{N} \sum_{j=1}^{N}\left(W_{p}^{n}\right)_{i, j}\left\langle f_{i}, f_{j}\right\rangle

Space complexity: computing l_z needs O(D_1 N M) space for the weights W_z; computing l_p needs O(N^2) space for p plus O(D_1 N^2) for the weights W_p. The product layer as a whole therefore has space complexity O(D_1 N (M + N)).
Time complexity: computing l_z costs O(D_1 N M); each p_{i,j} = ⟨f_i, f_j⟩ costs O(M), so computing p costs O(N^2 M); each l_p^n = W_p^n ⊙ p costs O(N^2), so computing l_p costs O(D_1 N^2). The product layer as a whole therefore has time complexity O(N^2 (D_1 + M)).

These time and space costs are too high for production use, so the computation must be optimized. Since l_z itself is cheap, the focus is on optimizing l_p. Inspired by FM's factorization of its parameter matrix, and noting that each W_p^n is a symmetric square matrix, a rank-one factorization is applied: assume W_{p}^{n}=\theta^{n}\left(\theta^{n}\right)^{T} with \theta^{n} \in \mathbb{R}^{N}. This replaces the matrix W_p^n, with its N^2 parameters, by the vector \theta^{n} with only N parameters. Then:

\begin{aligned} l_{p}^{n} &=W_{p}^{n} \odot p=\sum_{i=1}^{N} \sum_{j=1}^{N}\left(W_{p}^{n}\right)_{i, j}\left\langle f_{i}, f_{j}\right\rangle \\ &=\sum_{i=1}^{N} \sum_{j=1}^{N} \theta_{i}^{n} \theta_{j}^{n}\left\langle f_{i}, f_{j}\right\rangle \\ &=\sum_{i=1}^{N} \sum_{j=1}^{N}\left\langle\theta_{i}^{n} f_{i}, \theta_{j}^{n} f_{j}\right\rangle \\ &=\left\langle\sum_{i=1}^{N} \delta_{i}^{n}, \sum_{j=1}^{N} \delta_{j}^{n}\right\rangle \\ &=\left\|\sum_{i=1}^{N} \delta_{i}^{n}\right\|^{2} \end{aligned}

where \delta_{i}^{n}=\theta_{i}^{n} f_{i} \in \mathbb{R}^{M}, so:

l_{p}=\left(\left\|\sum_{i=1}^{N} \delta_{i}^{1}\right\|^{2}, \ldots,\left\|\sum_{i=1}^{N} \delta_{i}^{n}\right\|^{2}, \ldots,\left\|\sum_{i=1}^{N} \delta_{i}^{D_{1}}\right\|^{2}\right)

Complexity after the optimization:
space complexity drops from O(D_1 N (M + N)) to O(D_1 M N);
time complexity drops from O(N^2 (D_1 + M)) to O(D_1 M N).
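
The identity behind this optimization is easy to verify numerically; a small sketch (all names illustrative):

import numpy as np

N, M = 4, 3
f = np.random.randn(N, M)                # field embeddings
theta = np.random.randn(N)               # theta^n

# direct form: W_p^n = theta theta^T, p[i, j] = <f_i, f_j>
Wp = np.outer(theta, theta)
p = f @ f.T
lp_direct = np.sum(Wp * p)               # W_p^n ⊙ p, touches all N^2 pairs

# optimized form: delta_i^n = theta_i^n f_i, l_p^n = || sum_i delta_i^n ||^2
delta = theta[:, None] * f               # (N, M)
lp_fast = np.sum(delta.sum(axis=0) ** 2) # O(N*M)

assert np.allclose(lp_direct, lp_fast)   # identical results at lower cost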

2.2 OPNN Analysis

Changing the pairwise interaction from an inner product to an outer product yields the other PNN variant, OPNN.
Define p_{i, j}=g\left(f_{i}, f_{j}\right)=f_{i} f_{j}^{T}, so each p_{i,j} is an M x M matrix; then:

l_{p}^{n}=W_{p}^{n} \odot p=\sum_{i=1}^{N} \sum_{j=1}^{N}\left(W_{p}^{n}\right)_{i, j} \odot\left(f_{i} f_{j}^{T}\right)

By a similar analysis, both the time and the space complexity of OPNN are O(D_1 M^2 N^2).
To optimize the computation, sum pooling is introduced: the formula for p is redefined as

p=\sum_{i=1}^{N} \sum_{j=1}^{N} f_{i} f_{j}^{T}=f_{\Sigma}\left(f_{\Sigma}\right)^{T}, \quad f_{\Sigma}=\sum_{i=1}^{N} f_{i}

so that:

l_{p}^{n}=W_{p}^{n} \odot p=\sum_{a=1}^{M} \sum_{b=1}^{M}\left(W_{p}^{n}\right)_{a, b} p_{a, b}

Here computing f_Σ takes O(N M) time; p = f_Σ (f_Σ)^T takes O(M^2) time and space; each l_p^n = W_p^n ⊙ p takes O(M^2) time and space, so l_p takes O(D_1 M^2); l_z still takes O(D_1 N M). The overall time and space complexity of OPNN therefore becomes O(D_1 M (M + N)).
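
The sum-pooling trick can also be sketched in numpy (toy sizes, same illustrative names as above):

import numpy as np

N, M, D1 = 4, 3, 5
f = np.random.randn(N, M)

# sum pooling: p = f_sigma f_sigma^T, one M x M matrix instead of N^2 outer products
f_sigma = f.sum(axis=0)                                # O(N*M)
p = np.outer(f_sigma, f_sigma)                         # O(M^2)

Wp = np.random.randn(D1, M, M)                         # one M x M weight matrix per node
lp = np.array([np.sum(Wp[n] * p) for n in range(D1)])  # l_p^n = W_p^n ⊙ p, O(D1*M^2)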

3. Implementation

3.1 Imports and parameter initialization
import numpy as np 
import pandas as pd
import tensorflow as tf

class PNN():
    def __init__(self, feature_size, field_size, embedding_size=8, deep_layers=[32,32], deep_init_size=50, epoch=10, batch_size=256, learning_rate=0.001, random_seed=2020):
        self.feature_size = feature_size      # total number of distinct feature values after encoding
        self.field_size = field_size          # number of fields (columns), N
        self.embedding_size = embedding_size  # embedding dimension per field, M
        self.deep_layers = deep_layers        # widths of the fully connected layers
        self.deep_init_size = deep_init_size  # width of the product layer, D_1
        self.epoch = epoch
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.random_seed = random_seed
        
        self._init_graph()
3.2 Graph initialization and the linear part
    def _init_graph(self):
        self.graph = tf.Graph()
        with self.graph.as_default():
            tf.set_random_seed(self.random_seed)
            
            self.feat_index = tf.placeholder(tf.int32, shape=[None,None], name='feat_index')
            self.feat_value = tf.placeholder(tf.float32, shape=[None,None], name='feat_value')
            
            self.label = tf.placeholder(tf.float32, shape=[None,1], name='label')
            self.train_phase = tf.placeholder(tf.bool, name='train_phase')
            
            self.weights = self._init_weights()
            
            # embeddings
            self.embeddings = tf.nn.embedding_lookup(self.weights['feature_embeddings'], self.feat_index) # N * F * K
            feat_value = tf.reshape(self.feat_value, shape=[-1, self.field_size, 1])
            self.embeddings = tf.multiply(self.embeddings, feat_value)
            
            # linear part
            linear_output = []
            for i in range(self.deep_init_size):
                lz_i = tf.reduce_sum(tf.multiply(self.embeddings, self.weights['product-linear'][i]),axis=[1,2])
                linear_output.append(tf.reshape(lz_i, shape=(-1,1)))    # N * 1
            self.lz = tf.concat(linear_output, axis=1)  # N * init_deep_size
3.3 The quadratic part (IPNN / OPNN)
            # quadratic part
            # IPNN: with the rank-one factorization, l_p^n = || sum_i theta_i^n f_i ||^2
            quadratic_output = []
            for i in range(self.deep_init_size):
                weight = tf.reshape(self.weights['product-quadratic-inner'][i], (1,-1,1))  # 1 * F * 1
                f_sigma = tf.reduce_sum(tf.multiply(self.embeddings, weight), axis=1) # N * K
                lp_i = tf.reshape(tf.square(tf.norm(f_sigma, axis=1)), shape=(-1,1)) # N * 1; squared norm, per the derivation above
                quadratic_output.append(lp_i)
                
            self.lp = tf.concat(quadratic_output, axis=1)  # N * init_deep_size
            
            # OPNN (alternative quadratic part; uncomment to use instead of the IPNN block above)
#            self.weights['product-quadratic-outer'] = tf.Variable(
#                tf.random_normal([self.deep_init_size, self.embedding_size, self.embedding_size], 0.0, 0.01))
#            f_sigma = tf.reduce_sum(self.embeddings, axis=1)                       # N * K, sum pooling over fields
#            p = tf.matmul(tf.expand_dims(f_sigma, 2), tf.expand_dims(f_sigma, 1))  # N * K * K = (N * K * 1) * (N * 1 * K)
#            quadratic_output = []
#            for i in range(self.deep_init_size):
#                theta = tf.multiply(p, tf.expand_dims(self.weights['product-quadratic-outer'][i], 0))  # N * K * K
#                lp_i = tf.reshape(tf.reduce_sum(theta, axis=[1,2]), shape=(-1,1))  # N * 1
#                quadratic_output.append(lp_i)
#            self.lp = tf.concat(quadratic_output, axis=1)  # N * init_deep_size
3.4 The deep network
            # deep layer
            self.y_deep = tf.nn.relu(tf.add(tf.add(self.lz, self.lp), self.weights['product-bias']))
            
            self.y_deep = tf.add(tf.matmul(self.y_deep, self.weights['layer_0']), self.weights['bias_0'])
            self.y_deep = tf.nn.relu(self.y_deep)
            
            self.y_deep = tf.add(tf.matmul(self.y_deep, self.weights['layer_1']), self.weights['bias_1'])
            self.y_deep = tf.nn.relu(self.y_deep)
            
            self.out = tf.add(tf.matmul(self.y_deep, self.weights['output']), self.weights['output_bias'])
            self.out = tf.nn.sigmoid(self.out)
            
            self.loss = tf.losses.log_loss(self.label, self.out)
            self.optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate, beta1=0.9, beta2=0.999, epsilon=1e-8).minimize(self.loss)
            
            self.saver = tf.train.Saver()
            init = tf.global_variables_initializer()
            self.sess = tf.Session()
            self.sess.run(init)
            writer = tf.summary.FileWriter("D:/logs/pnn/", tf.get_default_graph())
            writer.close()
3.5 Weight initialization
    def _init_weights(self):
        weights = dict()
        
        # embeddings
        weights['feature_embeddings'] = tf.Variable(tf.random_normal([self.feature_size,self.embedding_size], 0.0, 0.01),name='feature_embeddings')
        weights['feature_bias'] = tf.Variable(tf.random_normal([self.feature_size,1],0.0,1.0), name='feature_bias')
        weights['product-quadratic-inner'] = tf.Variable(tf.random_normal([self.deep_init_size, self.field_size], 0.0, 0.01))
        
        weights['product-linear'] = tf.Variable(tf.random_normal([self.deep_init_size, self.field_size, self.embedding_size], 0.0, 0.01))
        weights['product-bias'] = tf.Variable(tf.random_normal([self.deep_init_size,], 0.0, 0.01))
        
        # deep layer
        input_size = self.deep_init_size
        glorot = np.sqrt(2.0 / (input_size + self.deep_layers[0]))
        weights['layer_0'] = tf.Variable(np.random.normal(loc=0, scale=glorot, size=(input_size, self.deep_layers[0])),dtype=np.float32)
        weights['bias_0'] = tf.Variable(np.random.normal(loc=0, scale=glorot, size=(1, self.deep_layers[0])), dtype=np.float32)
        
        glorot = np.sqrt(2.0 / (self.deep_layers[0] + self.deep_layers[1]))
        weights['layer_1'] = tf.Variable(np.random.normal(loc=0, scale=glorot, size=(self.deep_layers[0],self.deep_layers[1])),dtype=np.float32)
        weights['bias_1'] = tf.Variable(np.random.normal(loc=0, scale=glorot, size=(1, self.deep_layers[1])),dtype=np.float32)
        
        glorot = np.sqrt(2.0 / (self.deep_layers[-1] + 1))  # output layer takes the last deep layer as input
        weights['output'] = tf.Variable(np.random.normal(loc=0, scale=glorot, size=(self.deep_layers[-1],1)),dtype=np.float32)
        weights['output_bias'] = tf.Variable(tf.constant(0.01), dtype=np.float32)
        
        return weights
3.6 Training functions and helpers
    def fit_on_batch(self, Xi, Xv, y):
        feed_dict = {self.feat_index:Xi,
                     self.feat_value:Xv,
                     self.label:y,
                     self.train_phase:True}
        loss, opt = self.sess.run([self.loss, self.optimizer], feed_dict=feed_dict)
        return loss
    
    def get_batch(self, Xi, Xv, y, batch_size, index):
        start = index * batch_size
        end = (index+1) * batch_size
        end = end if end < len(y) else len(y)
        return Xi[start:end], Xv[start:end], [[y_] for y_ in y[start:end]]
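
main() below calls a fit method that does not survive in this excerpt; the following is a minimal sketch consistent with the helpers above (the epoch/batch loop is an assumption, not the author's code):

    def fit(self, Xi, Xv, y):
        # minimal training loop: iterate over mini-batches each epoch
        n_batches = len(y) // self.batch_size + (1 if len(y) % self.batch_size else 0)
        for epoch in range(self.epoch):
            for i in range(n_batches):
                Xi_batch, Xv_batch, y_batch = self.get_batch(Xi, Xv, y, self.batch_size, i)
                loss = self.fit_on_batch(Xi_batch, Xv_batch, y_batch)
            print('epoch:', epoch, ' loss:', loss)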
(Figures: PNN network structure 1 and 2)
3.7 Data loading
TRAIN_FILE = "Driver_Prediction_Data/train.csv"
TEST_FILE = "Driver_Prediction_Data/test.csv"

NUMERIC_COLS = [
    "ps_reg_01", "ps_reg_02", "ps_reg_03",
    "ps_car_12", "ps_car_13", "ps_car_14", "ps_car_15"
]

IGNORE_COLS = [
    "id", "target",
    "ps_calc_01", "ps_calc_02", "ps_calc_03", "ps_calc_04",
    "ps_calc_05", "ps_calc_06", "ps_calc_07", "ps_calc_08",
    "ps_calc_09", "ps_calc_10", "ps_calc_11", "ps_calc_12",
    "ps_calc_13", "ps_calc_14",
    "ps_calc_15_bin", "ps_calc_16_bin", "ps_calc_17_bin",
    "ps_calc_18_bin", "ps_calc_19_bin", "ps_calc_20_bin"
]    

def load_data():
    train_data = pd.read_csv(TRAIN_FILE)
    test_data = pd.read_csv(TEST_FILE)
    data = pd.concat([train_data, test_data])
    cols = [c for c in train_data.columns if c not in ['id','target']]
    cols = [c for c in cols if (c not in IGNORE_COLS)]
    
    X_train = train_data[cols].values
    y_train = train_data['target'].values
    
    X_test = test_data[cols].values
    ids_test = test_data['id'].values
    
    return data, train_data, test_data, X_train, y_train, X_test, ids_test    
3.8 Preprocessing functions
def split_dimensions(data):
    # Build the feature index: a numeric column occupies a single index,
    # a categorical column occupies one index per distinct value.
    feat_dict = {}
    tc = 0
    for col in data.columns:
        if col in IGNORE_COLS:
            continue
        if col in NUMERIC_COLS:
            feat_dict[col] = tc
            tc += 1
        else:
            us = data[col].unique()
            feat_dict[col] = dict(zip(us,range(tc,len(us)+tc)))
            tc += len(us)
    feat_dimension = tc
    
    return feat_dict, feat_dimension
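
For intuition, a toy run of this indexing scheme (the data and the categorical column name are hypothetical):

import pandas as pd

toy = pd.DataFrame({'ps_reg_01': [0.1, 0.5], 'ps_car_01_cat': ['a', 'b']})
feat_dict, feat_dimension = split_dimensions(toy)
print(feat_dict)       # {'ps_reg_01': 0, 'ps_car_01_cat': {'a': 1, 'b': 2}}
print(feat_dimension)  # 3: one slot for the numeric column, one per category value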

def data_parse(data, feat_dict, training=True):
    if training:
        y = data['target'].values.tolist()
        data.drop(['id','target'], axis=1, inplace=True)
    else:
        ids = data['id'].values.tolist()
        data.drop(['id'], axis=1, inplace=True)
    index = data.copy()
    for col in data.columns:
        if col in IGNORE_COLS:
            data.drop(col, axis=1, inplace=True)
            index.drop(col, axis=1, inplace=True)
            continue
        if col in NUMERIC_COLS:
            index[col] = feat_dict[col]
        else:
            index[col] = data[col].map(feat_dict[col])
            data[col] = 1.
            
    xi = index.values.tolist()
    xd = data.values.tolist()
    
    if training:
        return xi, xd, y
    else:
        return xi, xd, ids
3.9 Main
def main():
    data, train_data, test_data, X_train, y_train, X_test, ids_test = load_data()
    feat_dict, feat_dimension = split_dimensions(data)
    Xi_train, Xv_train, y_train = data_parse(train_data, feat_dict, training=True)
    Xi_test, Xv_test, ids_test = data_parse(test_data, feat_dict, training=False)
    
    pnn_model = PNN(feature_size = feat_dimension,
                     field_size = len(Xi_train[0]),
                     batch_size=128,
                     epoch=100
                     )
    
    pnn_model.fit(Xi_train, Xv_train, y_train)

if __name__ == '__main__':
    main() 

The loss during training is shown below:

epoch: 0  loss: 0.015076467
epoch: 1  loss: 0.00010797631
epoch: 2  loss: 3.4825567e-05
epoch: 3  loss: 1.7326316e-05
epoch: 4  loss: 1.0358356e-05
epoch: 5  loss: 6.848299e-06
···
epoch: 95  loss: -1.1920928e-07
epoch: 96  loss: -1.1920928e-07
epoch: 97  loss: -1.1920928e-07
epoch: 98  loss: -1.1920928e-07
epoch: 99  loss: -1.1920928e-07

References

[1]. https://www.cnblogs.com/yinzm/p/11758595.html
[2]. https://www.cnblogs.com/yinzm/p/11775948.html
[3]. Qu Y, Cai H, Ren K, et al. Product-based neural networks for user response prediction[C]//2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 2016: 1149-1154.
[4]. Zhang W, Du T, Wang J. Deep learning over multi-field categorical data[C]//European conference on information retrieval. Springer, Cham, 2016: 45-57.

The days and months hurry on, never delaying; springs and autumns alternate in endless succession. (Qu Yuan, "Li Sao")
