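This article summarizes the key points I learned while studying the new technologies Alibaba used for Double 11 (Singles' Day) 2016: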
1. Learning capability + decision-making capability = intelligent system
2. Related papers:
Jeff Dean et al., "Large Scale Distributed Deep Networks"
Heng-Tze Cheng et al., "Wide & Deep Learning for Recommender Systems"
H. Brendan McMahan et al., "Ad Click Prediction: a View from the Trenches"
Xinran He et al., "Practical Lessons from Predicting Clicks on Ads at Facebook"
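Streaming FTRL stacking on DeltaGBDT @ Double 11
[Figure: manual feature engineering and ODPS-GBDT training produce a GBDT for each time window on Double 11 (0-3, 4-6 and 7-9 o'clock); real-time training sample data feeds an FTRL model stacked on these GBDTs.]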
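The streaming layer of this setup can be sketched with the per-coordinate FTRL-Proximal update for logistic regression described in McMahan et al.'s "Ad Click Prediction" paper. This is a minimal sketch, not Alibaba's implementation: the class name, the hyperparameter defaults (`alpha`, `beta`, `l1`, `l2`) and the idea of feeding it GBDT leaf IDs as sparse binary features (the stacking scheme from the Facebook paper listed above) are illustrative assumptions.

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Per-coordinate FTRL-Proximal logistic regression (sketch after McMahan et al., 2013).

    Inputs are sparse dicts {feature_id: value}. A feature_id could be a
    hypothetical GBDT leaf ID such as "tree3_leaf17" when stacking on GBDT.
    """

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated (adjusted) gradients per coordinate
        self.n = defaultdict(float)  # accumulated squared gradients per coordinate

    def weight(self, i):
        """Closed-form weight; the L1 term keeps it exactly 0 while |z_i| <= l1."""
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x):
        """Return P(y = 1 | x) under the current weights."""
        s = sum(self.weight(i) * v for i, v in x.items())
        s = max(min(s, 35.0), -35.0)  # clip the logit to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        """Process one streaming example (y in {0, 1}); return the pre-update prediction."""
        p = self.predict(x)
        for i, v in x.items():
            w = self.weight(i)
            g = (p - y) * v  # gradient of the log loss w.r.t. w_i
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w
            self.n[i] += g * g
        return p
```

Because coordinates with `|z_i| <= l1` stay at exactly zero, rarely seen features (for example rarely reached leaves) do not consume model weights, which is one reason FTRL suits high-volume streaming updates.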
3. Wide & Deep Learning for Recommender Systems
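Pairwise sampling: when an impression log arrives, no negative sample is generated immediately. Instead, the system waits until a click arrives, finds the associated impressions, and then caches the resulting positive and negative transaction samples.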
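A minimal sketch of this delayed sampling, assuming impressions and clicks can be joined on a page-view ID. The class, method and field names (`PairwiseSampler`, `pv_id`, `on_click`, ...) are hypothetical, and a production system would also need to expire old impressions and handle purchase events.

```python
from collections import defaultdict


class PairwiseSampler:
    """Delay negative sampling until a click arrives (illustrative sketch)."""

    def __init__(self):
        self.impressions = {}            # pv_id -> items shown, in display order
        self.clicked = defaultdict(set)  # pv_id -> items clicked so far
        self.cache = []                  # buffered (pv_id, positive_item, negative_item)

    def on_impression(self, pv_id, items):
        # Do NOT emit negatives here: an unclicked impression may still get a click later.
        self.impressions[pv_id] = list(items)

    def on_click(self, pv_id, item):
        shown = self.impressions.get(pv_id)
        if shown is None or item not in shown:
            return []  # no matching impression (e.g. its log has not arrived yet)
        self.clicked[pv_id].add(item)
        # Pair the clicked item with every shown item that has not been clicked.
        pairs = [(pv_id, item, other) for other in shown if other not in self.clicked[pv_id]]
        self.cache.extend(pairs)
        return pairs
```

Usage: after `on_impression("pv1", ["i1", "i2", "i3"])`, a click on `"i2"` yields the pairs `("pv1", "i2", "i1")` and `("pv1", "i2", "i3")`.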
Why real-time decision-making is necessary
Big data’s failure
Items, brands, shops, categories
Correlation (statistics, ML, data mining)
Causation
Reinforcement Learning
immediate reward + expected future reward = best strategy
Historical signal == best strategy
Model the problem as a Markov decision process (MDP) and optimize the policy to maximize the discounted future reward.
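In standard RL notation, "immediate reward plus future reward" and the discounted objective correspond to the return and the Bellman optimality equation below. This is the textbook formulation (discount factor $\gamma$, reward $r$, state $s$, action $a$), given only as background; the notes do not say how Alibaba defined the states, actions or rewards.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1}, \qquad 0 \le \gamma < 1

Q^{*}(s, a) = \mathbb{E}\left[ r_{t+1} + \gamma \max_{a'} Q^{*}(s_{t+1}, a') \,\middle|\, s_t = s,\ a_t = a \right]

\pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)
```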
Illustration of the RL process
Real-time search ranking adjustment based on reinforcement learning
Tabular RL -> RL with function approximation
Discrete state + discrete action -> continuous state + discrete action -> continuous state + continuous action
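The move from tabular RL to function approximation can be illustrated with Q-learning: a lookup table works for discrete states, while a linear model $Q(s, a) \approx w \cdot \phi(s, a)$ handles continuous states. This is a generic textbook sketch, not the ranking system described here; the feature map `phi` and the function names are hypothetical.

```python
from collections import defaultdict


def tabular_q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Q-learning on a lookup table: one entry per (discrete state, discrete action)."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def linear_q(w, phi, s, a):
    """Approximate Q(s, a) = w . phi(s, a); s may be continuous."""
    return sum(wi * fi for wi, fi in zip(w, phi(s, a)))


def linear_q_update(w, phi, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Semi-gradient Q-learning step; returns the new weight vector."""
    target = r + gamma * max(linear_q(w, phi, s_next, b) for b in actions)
    td_error = target - linear_q(w, phi, s, a)
    return [wi + alpha * td_error * fi for wi, fi in zip(w, phi(s, a))]


def phi(s, a):
    """Hypothetical features for a 1-D continuous state and two discrete actions (0, 1)."""
    return [s if a == 1 else 0.0, s if a == 0 else 0.0, 1.0]
```

Both updates are off-policy: the `max` in the target learns the greedy policy no matter how the logged actions were chosen. Combining this bootstrapped off-policy target with function approximation can diverge, which is the problem Maei et al.'s work below addresses. For continuous actions the `max` over an enumerable `actions` list is no longer available, so policy-gradient or actor-critic methods are typically used instead.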
H. R. Maei et al., "Toward Off-Policy Learning Control with Function Approximation"
off-policy
Asynchronous SGD updates can make the model unstable.
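A toy model of why stale gradients hurt: minimize $f(w) = w^2/2$ with plain gradient descent, but let every update use the gradient computed on the parameter as it was `staleness` steps earlier, as happens when asynchronous workers push gradients computed against an out-of-date copy. The function name and the constants (learning rate 0.9, staleness 2) are illustrative assumptions; real systems involve many workers and noisy gradients.

```python
def sgd_with_staleness(lr, staleness, steps, w0=1.0):
    """Gradient descent on f(w) = 0.5 * w**2 (gradient = w) where each update
    uses the parameter value from `staleness` steps ago. Returns the trajectory."""
    history = [w0] * (staleness + 1)  # history[-1] is the current parameter
    for _ in range(steps):
        stale_w = history[-1 - staleness]  # what a lagging worker computed on
        history.append(history[-1] - lr * stale_w)
    return history[staleness:]
```

With `lr=0.9`, the synchronous run (`staleness=0`) shrinks `w` by a factor of 10 per step, while `staleness=2` makes the iterates oscillate with growing amplitude, even though the learning rate is unchanged.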
Adith Swaminathan, Thorsten Joachims, "Counterfactual Evaluation and Learning"
Artem Grotov, Maarten de Rijke, "Online Learning to Rank for Information Retrieval"
Katja Hofmann, Lihong Li, Filip Radlinski, "Online Evaluation for Information Retrieval"
David Silver, "Deep Reinforcement Learning"
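The evaluation themes in the references above can be illustrated with two standard techniques: inverse propensity scoring (IPS) for counterfactual (offline) evaluation of a new policy from logged data, and team-draft interleaving for online comparison of two rankers. Both are minimal sketches of the textbook methods, not code from the cited tutorials; the function names and the log layout `(context, action, reward, logging_propensity)` are assumptions. This interleaving variant keeps drawing from the remaining ranking if one runs out.

```python
import random


def ips_estimate(logs, target_prob):
    """Estimate a new policy's average reward from logs of
    (context, action, reward, logging_propensity) collected by an old policy."""
    total = 0.0
    for context, action, reward, p_log in logs:
        total += reward * target_prob(context, action) / p_log  # reweight by pi_new / pi_old
    return total / len(logs)


def team_draft_interleave(ranking_a, ranking_b, length, rng=None):
    """Each turn, the team with fewer picks (coin flip on ties) adds its
    highest-ranked document not yet shown. Returns (list, doc -> team)."""
    if rng is None:
        rng = random.Random()
    interleaved, team = [], {}

    def next_unused(ranking):
        return next((d for d in ranking if d not in team), None)

    n_a = n_b = 0
    while len(interleaved) < length:
        cand_a, cand_b = next_unused(ranking_a), next_unused(ranking_b)
        if cand_a is None and cand_b is None:
            break  # both rankings exhausted
        pick_a = n_a < n_b or (n_a == n_b and rng.random() < 0.5)
        if (pick_a and cand_a is not None) or cand_b is None:
            team[cand_a] = "A"
            n_a += 1
            interleaved.append(cand_a)
        else:
            team[cand_b] = "B"
            n_b += 1
            interleaved.append(cand_b)
    return interleaved, team


def interleaving_winner(team, clicked):
    """Credit each click to the team that contributed the clicked document."""
    a = sum(team.get(d) == "A" for d in clicked)
    b = sum(team.get(d) == "B" for d in clicked)
    return "A" if a > b else "B" if b > a else "tie"
```

IPS is unbiased only if the logging propensities are non-zero wherever the new policy puts probability mass; in practice the importance weights are often clipped, trading a little bias for lower variance.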