
Python Toutiao Recommender System: Recommendation Business Flow and ABTest (4)

5.1 Overview of the Real-Time Recommendation Service

Learning objectives

Goals
- None
Application
- None

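5.1.1 Real-time recommendation logic

Logic flow
- 1. The backend sends a recommendation request, and the real-time recommender system receives the request parameters
  - Connected through gRPC
- 2. Split traffic by user with ABTest
  - The ABTest experiment center handles traffic splitting, which makes it easy to test and tune different models before rollout
- 3. Recommendation center service
  - Reads results from the recall and ranking services according to the algorithm that ABTest assigned to the user
- 4. Return the recommendation results together with the packaged tracking parameters
The real-time recommendation flow

ABTest and recommendation center logic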

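5.2 gRPC Interface Integration

Learning objectives

Goals
- None
Application
- None

5.2.1 Toutiao recommendation interface

Request parameters:
- Feed recommendation: user ID, channel ID, number of articles to recommend, request timestamp
- Similar articles: article ID, number of articles to recommend

Response parameters:

Feed recommendation: exposure parameters, the full set of behavior parameters for each article, and the previous timestamp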

# Tracking parameter reference:
# {
#     "param": '{"action": "exposure", "userId": 1, "articleId": [1,2,3,4], "algorithmCombine": "c1"}',
#     "recommends": [
#         {"article_id": 1, "param": {"click": '{"action": "click", "userId": "1", "articleId": 1, "algorithmCombine": "c1"}', "collect": "", "share": "", "read": ""}},
#         {"article_id": 2, "param": {"click": "", "collect": "", "share": "", "read":""}},
#         {"article_id": 3, "param": {"click": "", "collect": "", "share": "", "read":""}},
#         {"article_id": 4, "param": {"click": "", "collect": "", "share": "", "read":""}}
#     ],
#     "timestamp": 1546391572
# }

Similar articles: list of article IDs

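5.2.2 Introduction

gRPC is a high-performance RPC framework open-sourced by Google.
gRPC supports multiple languages

gRPC has three native implementations, in C, Java, and Go. The C implementation is wrapped to support C++, C#, Node, ObjC, Python, Ruby, PHP, and other languages.
gRPC supports multiple platforms

Supported platforms include Linux, Android, iOS, MacOS, and Windows.
gRPC serializes messages with Protocol Buffers (proto3), Google's own open-source protocol mechanism.
gRPC transports data over the HTTP/2 standard, which supports bidirectional streaming and connection multiplexing.

Usage

Define the service interface in the Protocol Buffers (proto3) IDL (interface definition language), in a text file with the .proto extension.
Use the protobuf compiler to generate the stub code for the server and the client.

proto3 is the recommended version for gRPC.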

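5.2.3 Code structure

Protocol Buffers version

The first non-comment line of a Protocol Buffers file declares the version. If it is omitted, the version defaults to proto2.

syntax = "proto3";
or
syntax = "proto2";

Message types

Protocol Buffers defines message data with message. All data used in Protocol Buffers is wrapped in messages, which hold primitive types or other messages, and each message corresponds to a class in Python.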

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

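Field numbers

Every field in a message definition has a unique number. These numbers identify the fields in the binary message format and should not be changed once the message type is in use. Note that field numbers from 1 to 15 take one byte to encode, including the field number and the field type, while field numbers from 16 to 2047 take two bytes. You should therefore reserve numbers 1 to 15 for very frequently occurring message elements, and remember to leave some room for frequently used elements that may be added later.

The smallest field number is 1 and the largest is 2^29 - 1, or 536,870,911. The numbers 19000 to 19999 cannot be used because the Protobuf implementation reserves them; if you use one of them in a .proto file, the compiler will complain. Likewise, you cannot use any field number that you have previously marked as reserved.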
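The one-byte and two-byte ranges follow from how a field's tag is encoded: the tag is the value (field_number << 3) | wire_type written as a base-128 varint. The sketch below recomputes the tag size in plain Python to illustrate this; the helper names varint_size and tag_size are made up for this example and are not part of the protobuf library.

```python
def varint_size(value: int) -> int:
    """Number of bytes a non-negative integer occupies as a base-128 varint."""
    size = 1
    while value >= 0x80:  # each varint byte carries 7 bits of payload
        value >>= 7
        size += 1
    return size


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Bytes needed for a field tag, i.e. (field_number << 3) | wire_type as a varint."""
    if not 1 <= field_number <= 2 ** 29 - 1:
        raise ValueError("field number out of range")
    return varint_size((field_number << 3) | wire_type)


for number in (1, 15, 16, 2047, 2048):
    print(number, tag_size(number))
```

Field numbers 1 to 15 leave the shifted value below 128, so it fits in a single varint byte; from 16 onward a second byte is needed.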

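Specifying field rules

A message field can be one of the following:

singular: a well-formed message can contain zero or one of this field (but no more than one).
repeated: this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values is preserved. Corresponds to a Python list.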
```
message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}
```
Adding more message types

Multiple message types can be defined in a single .proto file.

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

message SearchResponse {
 ...
}

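Installing the protobuf compiler and the gRPC library

pip install grpcio-tools

Compiling to generate code

python -m grpc_tools.protoc -I. --python_out=.. --grpc_python_out=.. itcast.proto

-I is the directory searched for files imported by the proto file
--python_out is the directory for the generated Python file that contains the data types (messages) of the interface definition
--grpc_python_out is the directory for the generated Python file that contains the service types of the interface definition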

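5.2.4 Proto definition of the Heima Toutiao recommendation interface

Create an abtest directory and put the interface definitions in the user_reco.proto file

Feed refresh interface
- user_recommend(User) returns (Track)
Similar articles ("you may also like") interface
- article_recommend(Article) returns(Similar)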

syntax = "proto3";

message User {

    string user_id = 1;
    int32 channel_id = 2;
    int32 article_num = 3;
    int64 time_stamp = 4;
}
// int32 ---> int64 article_id
message Article {

    int64 article_id = 1;
    int32 article_num = 2;

}

message param2 {
    string click = 1;
    string collect = 2;
    string share = 3;
    string read = 4;
}

message param1 {
    int64 article_id = 1;
    param2 params = 2;
}

message Track {
    string exposure = 1;
    repeated param1 recommends = 2;
    int64 time_stamp = 3;
}

message Similar {
    repeated int64 article_id = 1;
}

service UserRecommend {
    // feed recommend
    rpc user_recommend(User) returns (Track) {}
    rpc article_recommend(Article) returns(Similar) {}
}

Generate the code with the command

python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. user_reco.proto

5.2.5 Writing the Heima Toutiao gRPC server

Create a routing.py file and put the server code in it:

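5.3 ABTest Experiment Center

Learning objectives

Goals
- None
Application
- None

Personalized recommender systems, search engines, and advertising systems are all released and optimized continuously online. After each optimization we need a way to tell whether it helped or hurt. ABTest answers that question: it shows whether a recent idea, algorithm, or logic change moves the metrics in a positive, meaningful direction.

5.3.1 ABTest

It has several important functions

One is the real-time ABTest traffic-splitting service, which splits traffic into A/B groups based on user device information and user information.
The other is real-time effect analysis: after the split, clicks, views, and other behavior are aggregated by hive and hadoop jobs and displayed on a statistics platform.

5.3.2 Traffic splitting

A/B test traffic is split on the Rank Server side. We divide traffic into buckets by user ID; each bucket corresponds to one ranking strategy, and traffic in a bucket is ranked with that strategy. Splitting by ID keeps the experience consistent for each user.

Experiment parameters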

from collections import namedtuple

# ABTest parameters
param = namedtuple('RecommendAlgorithm', ['COMBINE',
                                          'RECALL',
                                          'SORT',
                                          'CHANNEL',
                                          'BYPASS']
                   )

RAParam = param(
    COMBINE={
        'Algo-1': (1, [100, 101, 102, 103, 104], []),  # home-page recommendation: read all recall results + LR ranking
        'Algo-2': (2, [100, 101, 102, 103, 104], [])  # home-page recommendation: read all recall results + ranking
    },
    RECALL={
        100: ('cb_recall', 'als'),  # offline ALS model recall, recall:user:1115629498121 column=als:18
        101: ('cb_recall', 'content'),  # offline word2vec profile content recall 'recall:user:5', 'content:1'
        102: ('cb_recall', 'online'),  # online word2vec profile recall 'recall:user:1', 'online:1'
        103: 'new_article',  # new-article recall, stored in redis    ch:18:new
        104: 'popular_article',  # hot-article recall, stored in redis ch:18:hot
        105: ('article_similar', 'similar')  # similar-article results '1' 'similar:2'
    },
    SORT={
        200: 'LR',
    },
    CHANNEL=25,
    BYPASS=[
            {
                "Bucket": ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd'],
                "Strategy": "Algo-1"
            },
            {
                "Bucket": ['e', 'f'],
                "Strategy": "Algo-2"
            }
        ]
)

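5.3.3 Traffic splitting in the experiment center

Hash bucketing with md5
Recommendation refresh logic (the main branches are distinguished by timestamp)
The ABTest splitting logic is implemented as follows

import hashlib
from setting.default import DefaultConfig, RAParam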

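def feed_recommend(user_id, channel_id, article_num, time_stamp):
    """
    1. Split traffic based on the parameters supplied by the web backend
    2. Once the algorithm combination is known, call the matching recall and ranking services in the recommendation center
    3. Package the tracking parameters
    :param user_id: user id
    :param channel_id: channel id
    :param article_num: number of articles to recommend
    :param time_stamp: timestamp of the request
    :return: track: the tracking parameters, see the tracking parameter reference above
    """

    # Early in the product's life there are few click events, so we handle user cold start + article cold start
    # User cold start: 'recommended' channel: hot-channel recall + recall from the user's real-time
    #                  behavior profile (the online profile is not stored), combination 'C2'
    #                  other channels: hot recall + new-article recall, combination 'C1'
    # Class that carries the request parameters through the pipeline
    class TempParam(object):
        user_id = -10
        channel_id = -10
        article_num = -10
        time_stamp = -10
        algo = ""

    temp = TempParam()
    temp.user_id = user_id
    temp.channel_id = channel_id
    temp.article_num = article_num
    # Timestamp of the request
    temp.time_stamp = time_stamp

    # First read cached data (redis) and the pending results in hbase
    # If there are any, return them with the tracking parameters added
    # and write them to hbase as the user's (logged-in or anonymous) recommendation history for this timestamp

    # An empty user id gets an empty result right away
    if temp.user_id == "":
        temp.algo = ""
        return add_track([], temp)
    # Bucket the user to split traffic and pick the experiment strategy
    bucket = hashlib.md5(user_id.encode()).hexdigest()[:1]
    if bucket in RAParam.BYPASS[0]['Bucket']:
        temp.algo = RAParam.BYPASS[0]['Strategy']
    else:
        temp.algo = RAParam.BYPASS[1]['Strategy']

    # Result from the recommendation center (an empty list here, for testing)
    track = add_track([], temp)

    return track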
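The bucketing step can be tried in isolation. The following is a minimal sketch, assuming a local BYPASS list that mirrors the one in RAParam; the helpers assign_strategy and check_coverage are hypothetical names introduced for this illustration. It shows that the same user id always lands in the same bucket and that roughly 14 of 16 buckets go to Algo-1.

```python
import hashlib
from collections import Counter

# Local copy of the bucket configuration, for illustration only
BYPASS = [
    {"Bucket": list("0123456789abcd"), "Strategy": "Algo-1"},
    {"Bucket": list("ef"), "Strategy": "Algo-2"},
]


def assign_strategy(user_id: str, bypass=BYPASS) -> str:
    """Map a user id to a strategy via the first hex digit of its md5 digest."""
    bucket = hashlib.md5(user_id.encode()).hexdigest()[0]
    for group in bypass:
        if bucket in group["Bucket"]:
            return group["Strategy"]
    raise ValueError("bucket {!r} is not covered by any strategy".format(bucket))


def check_coverage(bypass=BYPASS) -> bool:
    """True if the groups cover all 16 hex digits exactly once."""
    seen = [b for group in bypass for b in group["Bucket"]]
    return sorted(seen) == sorted("0123456789abcdef")


# Simulated user ids; real ids come from the request
counts = Counter(assign_strategy(str(uid)) for uid in range(10000))
print(counts)
```

Because the hash depends only on the user id, a user keeps the same strategy across requests, which is what keeps the experience consistent during an experiment.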

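5.5 Recommendation Center Logic

Learning objectives

Goals
- None
Application
- None

5.5.1 Role of the recommendation center

The recommendation center ties the whole flow together: it reads the recall results and passes them through the ranking model. It is the part that actually produces the recommendation results.

5.5.2 Directory layout

The server directory is created for the whole recommendation center
- recall_service: reads recall data
- reco_center: recommendation center logic
- redis_cache: recommendation result cache

5.5.3 Refresh logic of the recommendation center

Based on the timestamp
- If timestamp T is smaller than the latest history record in HBASE, fetch the history record and return the timestamp T-1 of the record that precedes T
- If timestamp T is larger than the latest history record in HBASE, fetch new recommendations and return the most recent timestamp stored in HBASE
  - If there is a cache, take results from the cache and write them to the history table
  - If there is no cache, read the recall results for the assigned algorithm combination, rank them, and write them to wait_recommend; the articles actually served go into the history table
HBASE table design
- wait_recommend: pending results after multi-channel recall and ranking
  - It is written only when a refresh finds no cache, at which point all recall sets are collected and written together, so multiple versions are not needed
- history_recommend: the list of results actually served to the user on each request
  - 1. Stores each user's recommendation history by channel
  - 2. Multiple versions must be kept, which is why version information is configured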

create 'wait_recommend', 'channel'

put 'wait_recommend', 'reco:1', 'channel:18', [17283, 140357, 14668, 15182, 17999, 13648, 12884, 17302, 13846, 18135]
put 'wait_recommend', 'reco:1', 'channel:0', [17283, 140357, 14668, 15182, 17999, 13648, 12884, 17302, 13846, 18135]

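Create the HBase history table

# TTL is in seconds: 7776000 s = 90 days (86400 s per day)
create 'history_recommend', {NAME=>'channel', TTL=>7776000, VERSIONS=>999999}
# Passing a timestamp on each put produces a new version
put 'history_recommend', 'reco:his:1', 'channel:18', [17283, 140357, 14668, 15182, 17999, 13648, 12884, 17302, 13846, 18135]


# The column family name must be given when altering the table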
hbase(main):084:0> alter 'history_recommend',NAME => 'channel', TTL => '7776000'
Updating all regions with the new schema...
1/1 regions updated.
Done.
Took 2.0578 seconds

alter 'history_recommend',NAME => 'channel', VERSIONS=>999999, TTL=>7776000

History records are stored with their timestamps, so each user's history can later be read back by timestamp

get "history_recommend", 'reco:his:1', {COLUMN=>'channel:18',VERSIONS=>1000, TIMESTAMP=>1546242869000}

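This table serves a different purpose from cb_recall and history_recall used in the recall stage:

history_recommend stores what was actually recommended, so it also filters out hot and new articles that have already been served
history_recall only filters recall results

5.5.4 Feed recommendation center logic

Purpose: for users already split by ABTest, read the recall and ranking results of the assigned algorithm
Steps:
- 1. Decide the recommendation branch from the timestamp
- 2. Read the recall results (no real-time ranking)

Create the recommendation center class: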

import os
import sys

BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, os.path.join(BASE_DIR))
import hashlib
from setting.default import RAParam
from server.utils import HBaseUtils
from server import pool
from server import recall_service
from server import redis_cache  # used by the cache lookup in feed_recommend_logic
from datetime import datetime
import logging
import json

logger = logging.getLogger('recommend')


def add_track(res, temp):
    """
    Package the tracking parameters
    :param res: list of recommended article ids
    :param temp: request parameters from ABTest (user_id, channel_id, algo, time_stamp)
    :return: tracking parameters
        parameters for the article list
        parameters for each article
    """
    # Tracking parameters
    track = {}

    # Exposure parameters
    # Everything is passed as strings so that parsing on the hive side does not fail
    _exposure = {"action": "exposure", "userId": temp.user_id, "articleId": json.dumps(res),
                 "algorithmCombine": temp.algo}

    track['param'] = json.dumps(_exposure)
    track['recommends'] = []

    # Per-article behavior parameters
    for _id in res:
        # Build the dict for one article
        _dic = {}
        _dic['article_id'] = _id
        _dic['param'] = {}

        # click parameters
        _p = {"action": "click", "userId": temp.user_id, "articleId": str(_id),
              "algorithmCombine": temp.algo}

        _dic['param']['click'] = json.dumps(_p)
        # collect parameters
        _p["action"] = 'collect'
        _dic['param']['collect'] = json.dumps(_p)
        # share parameters
        _p["action"] = 'share'
        _dic['param']['share'] = json.dumps(_p)
        # read parameters (reading / dwell time)
        _p["action"] = 'read'
        _dic['param']['read'] = json.dumps(_p)

        track['recommends'].append(_dic)

    track['timestamp'] = temp.time_stamp
    return track


class RecoCenter(object):
    """Recommendation center
    """
    def __init__(self):
        self.hbu = HBaseUtils(pool)
        self.recall_service = recall_service.ReadRecall()

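1. Add the feed_recommend_logic function, which branches on the timestamp

It takes temp, the parameters obtained in ABTest
- Based on the timestamp
  - If timestamp T is smaller than the latest history record in HBASE, fetch the history record and return the timestamp T-1 of the record that precedes T
  - If timestamp T is larger than the latest history record in HBASE, fetch new recommendations and return the most recent timestamp stored in HBASE

Fetch the user's history for this channel

        # Use the request timestamp to decide between reading history and refreshing recommendations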
        try:
            last_stamp = self.hbu.get_table_row('history_recommend', 'reco:his:{}'.format(temp.user_id).encode(),
                                                'channel:{}'.format(temp.channel_id).encode(), include_timestamp=True)[
                1]
            logger.info("{} INFO get user_id:{} channel:{} history last_stamp".format(
                datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))
        except Exception as e:
            logger.warning("{} WARN read history recommend exception:{}".format(
                datetime.now().strftime('%Y-%m-%d %H:%M:%S'), e))
            last_stamp = 0

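This branch applies when the most recent history timestamp is smaller than the request timestamp. HBase timestamps are in milliseconds, the magnitude of time.time() * 1000, and the web backend is expected to send the same unit; if it sends a different one, convert it first.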
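A unit mismatch is easy to miss, since the sample timestamp in the tracking parameter reference (1546391572) is in seconds while the HBase cells shown later carry millisecond timestamps. The sketch below is one possible guard, under the assumption that any value below 10**11 is in seconds; now_millis and to_millis are hypothetical helpers, not part of the project code.

```python
import time


def now_millis() -> int:
    """Current time in milliseconds, the unit used for HBase cell timestamps."""
    return int(time.time() * 1000)


def to_millis(ts: int) -> int:
    """Normalize a timestamp to milliseconds.

    Heuristic (an assumption of this sketch): 10**11 milliseconds falls in
    1973, so any present-day value below 10**11 must be in seconds.
    """
    return ts * 1000 if ts < 10 ** 11 else ts


print(to_millis(1546391572))     # seconds in, milliseconds out
print(to_millis(1558189615378))  # already milliseconds, unchanged
```

Normalizing once at the gRPC boundary would keep the comparison between last_stamp and temp.time_stamp meaningful.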

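Then return the recommendation results together with the timestamp of the previous request
- The client uses it to page back through history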

        if last_stamp < temp.time_stamp:
            # 1. Read from the cache
            # res = redis_cache.get_reco_from_cache(temp, self.hbu)
            #
            # # On a cache miss, run the full recall + ranking path and write the pending results to hbase
            # if not res:
            #     logger.info("{} INFO get user_id:{} channel:{} recall/sort data".
            #                 format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))
            #
            #     res = self.user_reco_list(temp)

            # 2. Fetch the recommendation results directly
            res = self.user_reco_list(temp)

            temp.time_stamp = int(last_stamp)

            track = add_track(res, temp)

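If the latest history timestamp is not smaller than the request timestamp, the user is paging through history. The request timestamp is the timestamp T of a specific history record. HBase cannot fetch that record with T itself; the read needs an upper bound TT > T for the result to include the record at T, and it has to go through get_table_cells.

Consider the following cases
- 1. No history: return timestamp 0 and an empty result list
- 2. Exactly one history record whose timestamp equals the request timestamp: return that record and set the timestamp to 0, meaning there is no more history for later requests (in the app, paging through history stops)
- 3. Several history records: return the most recent one together with the timestamp of the second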

        else:

            logger.info("{} INFO read user_id:{} channel:{} history recommend data".format(
                datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))

            try:
                row = self.hbu.get_table_cells('history_recommend',
                                               'reco:his:{}'.format(temp.user_id).encode(),
                                               'channel:{}'.format(temp.channel_id).encode(),
                                               timestamp=temp.time_stamp + 1,
                                               include_timestamp=True)
            except Exception as e:
                logger.warning("{} WARN read history recommend exception:{}".format(
                    datetime.now().strftime('%Y-%m-%d %H:%M:%S'), e))
                row = []
                res = []

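            # 1. No history: return timestamp 0 and an empty list
            # 2. Exactly one record whose timestamp equals the request timestamp: return it and
            #    set the timestamp to 0
            # 3. Several records: return the most recent one and the timestamp of the next
            if not row:
                temp.time_stamp = 0
                res = []
            elif len(row) == 1 and row[0][1] == temp.time_stamp:
                res = eval(row[0][0])
                temp.time_stamp = 0
            elif len(row) >= 2:
                res = eval(row[0][0])
                temp.time_stamp = int(row[1][1])
            else:
                # A single record with a different timestamp matches none of the cases above;
                # fall back to an empty result so that res is always defined
                res = []
                temp.time_stamp = 0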

            res = list(map(int, res))
            logger.info(
                "{} INFO history:{}, {}".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), res, temp.time_stamp))
            track = add_track(res, temp)
            # History pages carry no exposure parameters
            track['param'] = ''

        return track
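The history branch can be separated from HBase and checked on its own. The following is a simplified sketch, not the project's implementation: page_history is a hypothetical pure function that takes cells in the shape returned by get_table_cells with include_timestamp=True (newest first, each a (value, timestamp) pair). It assumes that a single remaining record is always the last page, and it parses the stored list with ast.literal_eval instead of eval.

```python
import ast


def page_history(cells):
    """Return (article_ids, next_timestamp) for one page of history.

    cells: list of (value_bytes, timestamp) pairs, newest first.
    next_timestamp is 0 when there is nothing older to page to.
    """
    if not cells:
        return [], 0
    newest_value, _ = cells[0]
    # Values are stored as the text of a Python list, e.g. b"[1, 2, 3]"
    articles = [int(a) for a in ast.literal_eval(newest_value.decode())]
    next_ts = int(cells[1][1]) if len(cells) >= 2 else 0
    return articles, next_ts


cells = [(b"[13890, 14915]", 1558189615378), (b"[17966, 17454]", 1558189317342)]
print(page_history(cells))      # newest page plus the timestamp of the older record
print(page_history(cells[1:]))  # last page, next timestamp is 0
```

ast.literal_eval only accepts Python literals, so a malformed or malicious cell value raises an error instead of executing code.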

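Complete code:

    def feed_recommend_logic(self, temp):
        """Feed recommendation business logic
        :param temp: request parameters passed in from ABTest
        """

        # Use the request timestamp to decide between reading history and refreshing recommendations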
        try:
            last_stamp = self.hbu.get_table_row('history_recommend', 'reco:his:{}'.format(temp.user_id).encode(),
                                                'channel:{}'.format(temp.channel_id).encode(), include_timestamp=True)[1]
            logger.info("{} INFO get user_id:{} channel:{} history last_stamp".format(
                datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))
        except Exception as e:
            logger.warning("{} WARN read history recommend exception:{}".format(
                datetime.now().strftime('%Y-%m-%d %H:%M:%S'), e))
            last_stamp = 0

        # If it is smaller, run the normal recommendation path: cache, or recall and ranking
        logger.info("{} INFO history last_stamp:{},temp.time_stamp:{}".
                    format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), last_stamp, temp.time_stamp))
        if last_stamp < temp.time_stamp:

            # Read from the cache
            res = redis_cache.get_reco_from_cache(temp, self.hbu)

            # On a cache miss, run the full recall + ranking path and write the pending results to hbase
            if not res:
                logger.info("{} INFO get user_id:{} channel:{} recall/sort data".
                            format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))

                res = self.user_reco_list(temp)

            temp.time_stamp = int(last_stamp)

            track = add_track(res, temp)

        else:

            logger.info("{} INFO read user_id:{} channel:{} history recommend data".format(
                datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))

            try:
                row = self.hbu.get_table_cells('history_recommend',
                                          'reco:his:{}'.format(temp.user_id).encode(),
                                          'channel:{}'.format(temp.channel_id).encode(),
                                          timestamp=temp.time_stamp + 1,
                                          include_timestamp=True)
            except Exception as e:
                logger.warning("{} WARN read history recommend exception:{}".format(
                    datetime.now().strftime('%Y-%m-%d %H:%M:%S'), e))
                row = []
                res = []

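            # 1. No history: return timestamp 0 and an empty list
            # 2. Exactly one record whose timestamp equals the request timestamp: return it and
            #    set the timestamp to 0
            # 3. Several records: return the most recent one and the timestamp of the next
            if not row:
                temp.time_stamp = 0
                res = []
            elif len(row) == 1 and row[0][1] == temp.time_stamp:
                res = eval(row[0][0])
                temp.time_stamp = 0
            elif len(row) >= 2:
                res = eval(row[0][0])
                temp.time_stamp = int(row[1][1])
            else:
                # A single record with a different timestamp matches none of the cases above;
                # fall back to an empty result so that res is always defined
                res = []
                temp.time_stamp = 0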

            res = list(map(int, res))
            logger.info(
                "{} INFO history:{}, {}".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), res, temp.time_stamp))
            track = add_track(res, temp)
            # History pages carry no exposure parameters
            track['param'] = ''
        return track

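Update the recommendation call in ABTest

from server.reco_center import RecoCenter

# Recommend
track = RecoCenter().feed_recommend_logic(temp)

Read the multi-channel recall results and filter out history

user_reco_list

1. Loop over the recall channels of the algorithm combination and collect their results for filtering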

        reco_set = []
        # 1. Loop over the recall channels of the algorithm combination and collect their results
        for _num in RAParam.COMBINE[temp.algo][1]:
            # Read each recall channel: 100, 101, 102, 103, 104
            if _num == 103:
                # Read new-article recall
                _res = self.recall_service.read_redis_new_article(temp.channel_id)
                reco_set = list(set(reco_set).union(set(_res)))
            elif _num == 104:
                # Read hot-article recall
                _res = self.recall_service.read_redis_hot_article(temp.channel_id)
                reco_set = list(set(reco_set).union(set(_res)))
            else:
                _res = self.recall_service.\
                    read_hbase_recall_data(RAParam.RECALL[_num][0],
                                           'recall:user:{}'.format(temp.user_id).encode(),
                                           '{}:{}'.format(RAParam.RECALL[_num][1], temp.channel_id).encode())
                # Merge the results of this recall channel
                reco_set = list(set(reco_set).union(set(_res)))

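2. Filter out what has already been recommended on the requested channel; for a channel other than 0, also filter out channel 0's history to avoid duplicates
- e.g. the same article recommended in both the Python channel and channel 0

        # reco_set holds the newly recalled candidates; filter them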
        history_list = []
        try:
            data = self.hbu.get_table_cells('history_recommend',
                                            'reco:his:{}'.format(temp.user_id).encode(),
                                            'channel:{}'.format(temp.channel_id).encode())
            for _ in data:
                history_list = list(set(history_list).union(set(eval(_))))

            logger.info("{} INFO filter user_id:{} channel:{} history data".format(
                datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))
        except Exception as e:
            logger.warning(
                "{} WARN filter history article exception:{}".format(datetime.now().
                                                                     strftime('%Y-%m-%d %H:%M:%S'), e))

        # If channel 0 has history, filter that out as well

        try:
            data = self.hbu.get_table_cells('history_recommend',
                                            'reco:his:{}'.format(temp.user_id).encode(),
                                            'channel:{}'.format(0).encode())
            for _ in data:
                history_list = list(set(history_list).union(set(eval(_))))

            logger.info("{} INFO filter user_id:{} channel:{} history data".format(
                datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, 0))
        except Exception as e:
            logger.warning(
                "{} WARN filter history article exception:{}".format(datetime.now().
                                                                     strftime('%Y-%m-%d %H:%M:%S'), e))

        # Remove everything in history_list from reco_set
        reco_set = list(set(reco_set).difference(set(history_list)))

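3. After filtering, serve the requested number of articles, write them to the history table, and write the rest to the pending results

        # Nothing left: return immediately
        if not reco_set:
            return reco_set
        else:

            # Convert the ids to int
            reco_set = list(map(int, reco_set))

            # Compare with the number of articles requested by the backend (article_num)
            # article_num >= len(reco_set): serve everything
            if len(reco_set) <= temp.article_num:
                res = reco_set
            else:
                # Take only the articles to serve now
                res = reco_set[:temp.article_num]
                # Store the rest in wait_recommend so the next refresh can serve them directly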
                self.hbu.get_table_put('wait_recommend',
                                       'reco:{}'.format(temp.user_id).encode(),
                                       'channel:{}'.format(temp.channel_id).encode(),
                                       str(reco_set[temp.article_num:]).encode(),
                                       timestamp=temp.time_stamp)
                logger.info(
                    "{} INFO put user_id:{} channel:{} wait data".format(
                        datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))

            # Write the served articles to the history table
            self.hbu.get_table_put('history_recommend',
                                   'reco:his:{}'.format(temp.user_id).encode(),
                                   'channel:{}'.format(temp.channel_id).encode(),
                                   str(res).encode(),
                                   timestamp=temp.time_stamp)
            # Log the write to the history table
            logger.info(
                "{} INFO store recall/sorted user_id:{} channel:{} history_recommend data".format(
                    datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))

            return res
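The filtering and splitting steps reduce to a few list operations. The sketch below isolates them from HBase and redis; split_recommendations is a hypothetical helper written for this illustration. Unlike the set-based code above, it keeps the candidates in their original order, which matters once a ranking step orders the list.

```python
def split_recommendations(candidates, history, article_num):
    """Drop already-served articles, then split into (served_now, pending).

    candidates:  recalled article ids, possibly with duplicates
    history:     article ids that were already recommended
    article_num: number of articles the backend asked for
    """
    seen = set(history)
    # dict.fromkeys removes duplicates while preserving first-seen order
    fresh = [int(a) for a in dict.fromkeys(candidates) if a not in seen]
    return fresh[:article_num], fresh[article_num:]


served, pending = split_recommendations([1, 2, 2, 3, 4, 5], history=[2], article_num=2)
print(served, pending)
```

In the flow above, served corresponds to what is written to history_recommend and pending to what is written to wait_recommend.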

Update the part that reads the recall data

# 2. Cache disabled
res = self.user_reco_list(temp)
temp.time_stamp = int(last_stamp)
track = add_track(res, temp)

Test results after starting the gRPC service

hbase(main):007:0> get "history_recommend", 'reco:his:1115629498121846784', {COLUMN=>'channel:18',VERSIONS=>1000}
COLUMN                     CELL                                                                        
 channel:18                timestamp=1558189615378, value=[13890, 14915, 13891, 15429, 15944, 44371, 18
                           005, 15196, 13410, 13672]                                                   
 channel:18                timestamp=1558189317342, value=[17966, 17454, 14125, 16174, 14899, 44339, 16
                           437, 18743, 44090, 18238]                                                   
 channel:18                timestamp=1558143073173, value=[19200, 17665, 16151, 16411, 19233, 13090, 15
                           140, 16421, 19494, 14381]

The pending table contains

hbase(main):008:0> scan 'wait_recommend'
ROW                        COLUMN+CELL                                                                 
 reco:1115629498121846784  column=channel:18, timestamp=1558189615378, value=[44137, 18795, 19052, 4465
                           2, 44654, 44657, 14961, 17522, 43894, 44412, 16000, 14208, 44419, 17802, 142
                           23, 18836, 140956, 18335, 13728, 14498, 44451, 44456, 18609, 18353, 44468, 1
                           8103, 135869, 16062, 14015, 13757, 13249, 44483, 17605, 14021, 15309, 18127,
                            43983, 44754, 43986, 19413, 14805, 18904, 44761, 17114, 13272, 14810, 18907
                           , 13022, 14300, 17120, 17632, 14299, 43997, 17889, 17385, 18156, 15085, 1329
                           5, 44020, 14839, 44024, 14585, 18172, 44541]

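Complete code:

    def user_reco_list(self, temp):
        """
        Read the user's recall results and produce recommendations
        :param temp: request parameters from ABTest
        :return: list of article ids to recommend
        """
        reco_set = []
        # 1. Loop over the recall channels of the algorithm combination and collect their results
        for _num in RAParam.COMBINE[temp.algo][1]:
            # Read each recall channel: 100, 101, 102, 103, 104
            if _num == 103:
                # Read new-article recall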
                _res = self.recall_service.read_redis_new_article(temp.channel_id)
                reco_set = list(set(reco_set).union(set(_res)))
            elif _num == 104:
                # Read hot-article recall
                _res = self.recall_service.read_redis_hot_article(temp.channel_id)
                reco_set = list(set(reco_set).union(set(_res)))
            else:
                _res = self.recall_service.\
                    read_hbase_recall_data(RAParam.RECALL[_num][0],
                                           'recall:user:{}'.format(temp.user_id).encode(),
                                           '{}:{}'.format(RAParam.RECALL[_num][1], temp.channel_id).encode())
                # Merge the results of this recall channel
                reco_set = list(set(reco_set).union(set(_res)))

        # reco_set holds the newly recalled candidates; filter them
        history_list = []
        try:
            data = self.hbu.get_table_cells('history_recommend',
                                            'reco:his:{}'.format(temp.user_id).encode(),
                                            'channel:{}'.format(temp.channel_id).encode())
            for _ in data:
                history_list = list(set(history_list).union(set(eval(_))))

            logger.info("{} INFO filter user_id:{} channel:{} history data".format(
                datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))
        except Exception as e:
            logger.warning(
                "{} WARN filter history article exception:{}".format(datetime.now().
                                                                     strftime('%Y-%m-%d %H:%M:%S'), e))

        # If channel 0 has history, filter that out as well

        try:
            data = self.hbu.get_table_cells('history_recommend',
                                            'reco:his:{}'.format(temp.user_id).encode(),
                                            'channel:{}'.format(0).encode())
            for _ in data:
                history_list = list(set(history_list).union(set(eval(_))))

            logger.info("{} INFO filter user_id:{} channel:{} history data".format(
                datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, 0))
        except Exception as e:
            logger.warning(
                "{} WARN filter history article exception:{}".format(datetime.now().
                                                                     strftime('%Y-%m-%d %H:%M:%S'), e))

        # Remove everything in history_list from reco_set
        reco_set = list(set(reco_set).difference(set(history_list)))

        # Ranking logic
        # _sort_num = RAParam.COMBINE[temp.algo][2][0]
        # reco_set = sort_dict[RAParam.SORT[_sort_num]](reco_set, temp, self.hbu)

        # Nothing left: return immediately
        if not reco_set:
            return reco_set
        else:

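            # Convert the ids to int
            reco_set = list(map(int, reco_set))

            # Compare with the number of articles requested by the backend (article_num)
            # article_num >= len(reco_set): serve everything
            if len(reco_set) <= temp.article_num:
                res = reco_set
            else:
                # Take only the articles to serve now
                res = reco_set[:temp.article_num]
                # Store the rest in wait_recommend so the next refresh can serve them directly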
                self.hbu.get_table_put('wait_recommend',
                                       'reco:{}'.format(temp.user_id).encode(),
                                       'channel:{}'.format(temp.channel_id).encode(),
                                       str(reco_set[temp.article_num:]).encode(),
                                       timestamp=temp.time_stamp)
                logger.info(
                    "{} INFO put user_id:{} channel:{} wait data".format(
                        datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))

            # Write the served articles to the history table
            self.hbu.get_table_put('history_recommend',
                                   'reco:his:{}'.format(temp.user_id).encode(),
                                   'channel:{}'.format(temp.channel_id).encode(),
                                   str(res).encode(),
                                   timestamp=temp.time_stamp)
            # Log the write to the history table
            logger.info(
                "{} INFO store recall/sorted user_id:{} channel:{} history_recommend data".format(
                    datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))

            return res

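5.4 Recall Set Reading Service

Learning objectives

Goals
- None
Application
- None

5.4.1 Recall set reading service

Add a server directory
- It will hold the recommendation center, the recall reading service, the model ranking service, and the cache service
- Start with recall_service.py, the service that reads recall results
- utils.py holds our own wrapper for reading from and writing to the hbase database

5.4.1 HBase read/write utility class

Why wrap it?

Code written directly against happybase ends up with a lot of repetition; wrapping it in small helpers reduces the redundancy

Methods
- get_table_row(self, table_name, key_format, column_format=None, include_timestamp=False):
  - Reads the row data for a given table, row key, and column
- get_table_cells(self, table_name, key_format, column_format=None, timestamp=None, include_timestamp=False):
  - Reads multiple versions of a cell from HBase
- get_table_put(self, table_name, key_format, column_format, data, timestamp=None):
  - Writes data to HBase
- get_table_delete(self, table_name, key_format, column_format):
  - Deletes data from HBase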

class HBaseUtils(object):
    """Utility class for reading from and writing to HBase
    """
    def __init__(self, connection):
        self.pool = connection

    def get_table_row(self, table_name, key_format, column_format=None, include_timestamp=False):
        """
        Read a row from HBase
        :param table_name: table name
        :param key_format: row key, e.g. 'user:reco:1', as bytes
        :param column_format: column (family:qualifier), e.g. 'als:18', as bytes
        :param include_timestamp: whether to include the timestamp
        :return: the data read from the database
        """
        if not isinstance(key_format, bytes):
            raise KeyError("key_format or column type error")

        if not isinstance(table_name, str):
            raise KeyError("table_name should str type")

        with self.pool.connection() as conn:
            table = conn.table(table_name)

            if column_format:
                data = table.row(row=key_format, columns=[column_format], include_timestamp=include_timestamp)
            else:
                data = table.row(row=key_format)
            conn.close()

        if column_format:
            return data[column_format]
        else:
            # {b'als:5': (b'[141440]', 1555519429582)}
            # {b'als:5': '[141440]'}
            return data

    def get_table_cells(self, table_name, key_format, column_format=None, timestamp=None, include_timestamp=False):
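        """
        Read multiple versions of a cell from HBase
        :param table_name: table name
        :param key_format: row key, e.g. 'user:reco:1', as bytes
        :param column_format: column (family:qualifier), e.g. 'als:18', as bytes
        :param timestamp: only return versions with a timestamp smaller than this value
        :param include_timestamp: whether to include the timestamp
        :return: the data read from the database
        """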
        if not isinstance(key_format, bytes) or not isinstance(column_format, bytes):
            raise KeyError("key_format or column type error")

        if not isinstance(table_name, str):
            raise KeyError("table_name should str type")

        with self.pool.connection() as conn:
            table = conn.table(table_name)

            data = table.cells(row=key_format, column=column_format, timestamp=timestamp,
                               include_timestamp=include_timestamp)

            conn.close()
        # [(,), ()]
        return data

    def get_table_put(self, table_name, key_format, column_format, data, timestamp=None):
        """
        Write data to HBase
        :param table_name: table name
        :param key_format: row key, e.g. 'user:reco:1', as bytes
        :param column_format: column (family:qualifier), e.g. 'als:18', as bytes
        :param data: the data to insert, as bytes
        :param timestamp: timestamp to store with the inserted data
        :return: None
        """
        if not isinstance(key_format, bytes) or not isinstance(column_format, bytes) or not isinstance(data, bytes):
            raise KeyError("key_format or column or data type error")

        if not isinstance(table_name, str):
            raise KeyError("table_name should str type")

        with self.pool.connection() as conn:
            table = conn.table(table_name)

            table.put(key_format, {column_format: data}, timestamp=timestamp)

            conn.close()
        return None

    def get_table_delete(self, table_name, key_format, column_format):
        """
        Delete the contents of one column in a row.
        :param table_name: table name (str)
        :param key_format: row key as bytes
        :param column_format: column (family:qualifier) as bytes
        :return: None
        """
        if not isinstance(key_format, bytes) or not isinstance(column_format, bytes):
            raise KeyError("key_format or column type error")

        if not isinstance(table_name, str):
            raise KeyError("table_name should str type")
        # The pool's context manager hands the connection back on exit.
        with self.pool.connection() as conn:
            table = conn.table(table_name)
            table.delete(row=key_format, columns=[column_format])
        return None

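5.4.2 Reading multi-channel recall results

Goal: read the recall results kept in offline and online storage
- HBase storage: table cb_recall, with column families als, content, online
Steps:
- 1. Initialize the Redis and HBase helpers
- 2. Read the online-profile recall, offline-profile recall, and offline collaborative-filtering recall data
- 3. Read the new-article and hot-article results from Redis
- 4. Similar-article read interface

Initialize the Redis and HBase helpers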

import os
import sys
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, os.path.join(BASE_DIR))

from server import redis_client
from server import pool
import logging
from datetime import datetime
from server.utils import HBaseUtils

logger = logging.getLogger('recommend')


class ReadRecall(object):
    """Read results from the recall sets.
    """
    def __init__(self):
        self.client = redis_client
        self.hbu = HBaseUtils(pool)

Logging for the fetched results is configured as well:

# Real-time recommendation log
# (same handler pattern as the offline update log)
trace_file_handler = logging.FileHandler(
  os.path.join(logging_file_dir, 'recommend.log')
)
trace_file_handler.setFormatter(logging.Formatter('%(message)s'))
log_trace = logging.getLogger('recommend')
log_trace.addHandler(trace_file_handler)
log_trace.setLevel(logging.INFO)
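
The log calls in this section build the timestamp and the level name by hand inside every message. As an alternative sketch, and not the original project's setup, the standard `logging.Formatter` can add both automatically. The logger name `recommend_demo` and the in-memory stream are assumptions made here so the example runs without a log directory.

```python
import io
import logging


def build_trace_logger(stream, name='recommend_demo'):
    """Return a logger whose formatter adds the timestamp and level itself."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s',
                                           datefmt='%Y-%m-%d %H:%M:%S'))
    logger = logging.getLogger(name)
    logger.handlers = [handler]   # avoid stacking handlers on repeated calls
    logger.propagate = False      # keep the demo output out of the root logger
    logger.setLevel(logging.INFO)
    return logger


buffer = io.StringIO()
log = build_trace_logger(buffer)
log.info('read channel:%s new recommend data', 18)
print(buffer.getvalue().strip())
```

With a formatter like this, each call site only passes the message, and the level in the output always matches the method that was called.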

Add the database client initialization to the package's __init__ file:

import redis
import happybase
from setting.default import DefaultConfig
from pyspark import SparkConf
from pyspark.sql import SparkSession


pool = happybase.ConnectionPool(size=10, host="hadoop-master", port=9090)

redis_client = redis.StrictRedis(host=DefaultConfig.REDIS_HOST,
                                 port=DefaultConfig.REDIS_PORT,
                                 db=10,
                                 decode_responses=True)

# The cache lives in Redis database 8
cache_client = redis.StrictRedis(host=DefaultConfig.REDIS_HOST,
                                 port=DefaultConfig.REDIS_PORT,
                                 db=8,
                                 decode_responses=True)

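2. Reading the online-profile recall, offline-profile recall, and offline collaborative-filtering recall data

Read the user's recall data from the given column family. After reading, the original recall results in 'cb_recall' should be deleted (the delete call is left commented out below for testing).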

    def read_hbase_recall_data(self, table_name, key_format, column_format):
        """
        Read recommendation data from cb_recall.
        The column family to read can be chosen: als, online, content.

        :return: merged list of recalled article IDs
        """
        recall_list = []
        try:
            data = self.hbu.get_table_cells(table_name, key_format, column_format)

            # data holds several versions of the recall result: [[...], [...], ...]
            # Each version is a serialized list; merge them and drop duplicates
            for _ in data:
                recall_list = list(set(recall_list).union(set(eval(_))))

            # self.hbu.get_table_delete(table_name, key_format, column_format)
        except Exception as e:
            logger.warning("{} WARN read {} recall exception:{}".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                                                                     table_name, e))
        return recall_list
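
The version merge above relies on `eval` and on `set`, so it executes whatever text is stored and loses ordering. A minimal sketch of the same merge, assuming each stored version is the bytes form of a Python list literal (as in the `b'[141440]'` example earlier): it parses with `ast.literal_eval` and keeps first-seen order. The helper name `merge_recall_versions` is hypothetical.

```python
import ast


def merge_recall_versions(cells):
    """Merge several stored versions of a recall list, keeping first-seen order."""
    merged = []
    seen = set()
    for raw in cells:
        # HBase returns bytes; decode before parsing the list literal
        text = raw.decode() if isinstance(raw, bytes) else raw
        for article_id in ast.literal_eval(text):
            if article_id not in seen:
                seen.add(article_id)
                merged.append(article_id)
    return merged


versions = [b'[141440, 13890]', b'[13890, 17749]']
print(merge_recall_versions(versions))  # [141440, 13890, 17749]
```

`ast.literal_eval` only accepts literals, so a malformed or malicious cell raises an exception instead of running code.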

Test:

if __name__ == '__main__':
    rr = ReadRecall()
    # Wrapper for reading recall results
    # print(rr.read_hbase_recall_data('cb_recall', b'recall:user:1114864874141253632', b'online:18'))

3. Reading new-article and hot-article results from Redis

    def read_redis_new_article(self, channel_id):
        """
        Read the new-article recall results.
        :param channel_id: channel ID
        :return: list of article IDs
        """
        logger.warning("{} WARN read channel {} redis new article".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                                                                 channel_id))
        _key = "ch:{}:new".format(channel_id)
        try:
            res = self.client.zrevrange(_key, 0, -1)
        except Exception as e:
            logger.warning("{} WARN read new article exception:{}".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), e))
            res = []

        return list(map(int, res))

Reading hot articles: many hot articles are recorded per channel, so only the top K are taken.

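    def read_redis_hot_article(self, channel_id):
        """
        Read the hot-article recall results.
        :param channel_id: channel ID
        :return: list of article IDs, at most self.hot_num long
        """
        logger.warning("{} WARN read channel {} redis hot article".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), channel_id))
        _key = "ch:{}:hot".format(channel_id)
        try:
            res = self.client.zrevrange(_key, 0, -1)

        except Exception as e:
            logger.warning("{} WARN read hot article exception:{}".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), e))
            res = []

        # Each channel accumulates many hot articles (click counts are kept as scores),
        # so keep only the top hot_num.
        # Note: self.hot_num is not set in the __init__ shown above and must be defined
        # there; the complete code below hard-codes 50 instead.
        res = list(map(int, res))
        if len(res) > self.hot_num:
            res = res[:self.hot_num]
        return res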
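
To make the top-K selection concrete without a Redis server, here is a small sketch that models the sorted set as a plain dict of member to score. The function name `top_k_by_score` and the sample scores are assumptions for illustration only.

```python
def top_k_by_score(scores, k):
    """Return the k members with the highest scores, best first.

    Mirrors what ZREVRANGE key 0 k-1 returns for a sorted set
    whose scores are click counts (ties are not modeled here).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [int(member) for member, _ in ranked[:k]]


hot_scores = {'17749': 12.0, '17748': 30.0, '44371': 7.0}
print(top_k_by_score(hot_scores, 2))  # [17748, 17749]
```

With a real client, the same limit can be applied on the server by passing `hot_num - 1` as the end index of `zrevrange` instead of fetching everything with `-1` and slicing afterwards.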

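Test:

print(rr.read_redis_new_article(18))
print(rr.read_redis_hot_article(18))

4. Similar-article read interface

Finally, the code for the similar-article read interface.

One API fetches a fixed number of articles (used by the "Guess you like" endpoint in the Heima Toutiao app).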

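    def read_hbase_article_similar(self, table_name, key_format, article_num):
        """Get the articles most similar to a given article.
        :param table_name: table name, e.g. 'article_similar'
        :param key_format: row key (the article ID) as bytes
        :param article_num: number of articles to return
        :return: list of similar article IDs, most similar first
        """
        # Test data for the first table layout:
        # create 'article_similar', 'similar'
        # put 'article_similar', '1', 'similar:1', 0.2
        # put 'article_similar', '1', 'similar:2', 0.34
        try:
            _dic = self.hbu.get_table_row(table_name, key_format)

            res = []
            # Values come back as bytes; compare them as numbers, not as byte strings
            _srt = sorted(_dic.items(), key=lambda obj: float(obj[1]), reverse=True)
            if len(_srt) > article_num:
                _srt = _srt[:article_num]
            for _ in _srt:
                # Column names look like b'similar:<article_id>'
                res.append(int(_[0].decode().split(':')[1]))
        except Exception as e:
            logger.error(
                "{} ERROR read similar article exception: {}".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), e))
            res = []
        return res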
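
The similarity row is a mapping from `b'similar:<article_id>'` to a score stored as bytes. A short sketch of the ranking step on such a row, assuming that layout; the sample row and the helper name `top_similar` are made up for illustration. It shows why the scores should be converted to `float` before sorting: as raw bytes, `b'1e-05'` would sort above `b'0.9'`.

```python
def top_similar(row, article_num):
    """Rank a similarity row by numeric score and return the top article IDs."""
    ranked = sorted(row.items(), key=lambda kv: float(kv[1].decode()), reverse=True)
    # Column names look like b'similar:<article_id>'
    return [int(col.decode().split(':')[1]) for col, _ in ranked[:article_num]]


row = {
    b'similar:1': b'0.2',
    b'similar:2': b'0.34',
    b'similar:3': b'0.9',
    b'similar:4': b'1e-05',
}
print(top_similar(row, 2))  # [3, 2]
```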

Complete code:

import os
import sys
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.insert(0, os.path.join(BASE_DIR))

import redis  # needed for redis.exceptions.ResponseError below
from server import redis_client
from server import pool
import logging
from datetime import datetime
from abtest.utils import HBaseUtils

logger = logging.getLogger('recommend')


class ReadRecall(object):
    """Read results from the recall sets.
    """
    def __init__(self):
        self.client = redis_client
        self.hbu = HBaseUtils(pool)

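    def read_hbase_recall_data(self, table_name, key_format, column_format):
        """Get a user's recall results for a channel: online-profile recall,
        offline-profile recall, or offline collaborative-filtering recall.
        :return: merged list of recalled article IDs
        """
        # Read the value stored under the given column family.
        # Keys in the database are bytes, so callers must pass encoded keys.
        # Several versions of the recall result are read and merged.
        recall_list = []
        try:

            data = self.hbu.get_table_cells(table_name, key_format, column_format)
            for _ in data:
                recall_list = list(set(recall_list).union(set(eval(_))))

            # After reading every version of this user's online recall, clear the channel's data
            # self.hbu.get_table_delete(table_name, key_format, column_format)
        except Exception as e:
            logger.warning(
                "{} WARN read recall data exception:{}".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), e))
        return recall_list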

    def read_redis_new_data(self, channel_id):
        """Get the new-article results from Redis.
        :param channel_id: channel ID
        :return: list of article IDs
        """
        # Build the key for this channel
        logger.info("{} INFO read channel:{} new recommend data".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), channel_id))
        _key = "ch:{}:new".format(channel_id)
        try:
            res = self.client.zrevrange(_key, 0, -1)
        except redis.exceptions.ResponseError as e:
            logger.warning("{} WARN read new article exception:{}".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), e))
            res = []
        return list(map(int, res))

    def read_redis_hot_data(self, channel_id):
        """Get the hot-article results from Redis.
        :param channel_id: channel ID
        :return: list of at most 50 article IDs
        """
        # Build the key for this channel
        logger.info("{} INFO read channel:{} hot recommend data".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), channel_id))
        _key = "ch:{}:hot".format(channel_id)
        try:
            _res = self.client.zrevrange(_key, 0, -1)
        except redis.exceptions.ResponseError as e:
            logger.warning("{} WARN read hot article exception:{}".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), e))
            _res = []
        # Return only the top 50 hot articles each time
        res = list(map(int, _res))
        if len(res) > 50:
            res = res[:50]
        return res

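    def read_hbase_article_similar(self, table_name, key_format, article_num):
        """Get the articles most similar to a given article.
        :param table_name: table name, e.g. 'article_similar'
        :param key_format: row key (the article ID) as bytes
        :param article_num: number of articles to return
        :return: list of similar article IDs, most similar first
        """
        # Test data for the first table layout:
        # create 'article_similar', 'similar'
        # put 'article_similar', '1', 'similar:1', 0.2
        # put 'article_similar', '1', 'similar:2', 0.34
        try:
            _dic = self.hbu.get_table_row(table_name, key_format)

            res = []
            # Values come back as bytes; compare them as numbers, not as byte strings
            _srt = sorted(_dic.items(), key=lambda obj: float(obj[1]), reverse=True)
            if len(_srt) > article_num:
                _srt = _srt[:article_num]
            for _ in _srt:
                # Column names look like b'similar:<article_id>'
                res.append(int(_[0].decode().split(':')[1]))
        except Exception as e:
            logger.error("{} ERROR read similar article exception: {}".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), e))
            res = []
        return res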


if __name__ == '__main__':

    rr = ReadRecall()
    print(rr.read_hbase_article_similar('article_similar', b'13342', 10))
    print(rr.read_hbase_recall_data('cb_recall', b'recall:user:1115629498121846784', b'als:18'))

    # rr = ReadRecall()
    # print(rr.read_redis_new_data(18))

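5.6 Recommendation cache service

Learning objectives

Objectives
- None
Application
- None

5.6.1 Redis cache for pending recommendation results

Goal: add a second cache level for pending recommendation results; multi-level caching reduces read pressure on the database
Steps:
- 1. Read the Redis result and branch on it
  - If Redis has data, take the requested number of articles, return them, delete them from Redis, and record them in the recommendation history
  - If Redis has no data, read from wait_recommend
    - If wait_recommend is also empty, return immediately
    - If wait_recommend has data, take all of it, store a fixed number (for example 100 articles) in Redis, and write the rest back to wait_recommend; if there are fewer than 100, store them all in Redis and clear wait_recommend
    - Take the articles to recommend from Redis and record them in the recommendation history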
Add a cache database:

# The cache lives in Redis database 8
cache_client = redis.StrictRedis(host=DefaultConfig.REDIS_HOST,
                                 port=DefaultConfig.REDIS_PORT,
                                 db=8,
                                 decode_responses=True)

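1. Reading from Redis database 8

# 1. Look up the key in Redis directly and check whether it is empty
    # Build the Redis key
    key = 'reco:{}:{}:art'.format(temp.user_id, temp.channel_id)
    # Read, delete, return the result
    pl = cache_client.pipeline()

    # Read the Redis data. Scores are list positions (see the zadd calls in step 2),
    # so zrange returns the best-ranked articles first, matching the read in step 2.
    res = cache_client.zrange(key, 0, temp.article_num - 1)
    if res:
        # Queue removal of the cached entries that were just read
        pl.zrem(key, *res)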

2. Redis has no data: read from wait_recommend and fill Redis

else:

        # No cached data in Redis
        # Delete the key
        cache_client.delete(key)
        try:
            # 1. Read wait_recommend first; if it has nothing, return empty so the normal recall flow runs
            wait_cache = eval(hbu.get_table_row('wait_recommend',
                                                'reco:{}'.format(temp.user_id).encode(),
                                                'channel:{}'.format(temp.channel_id).encode()))
        except Exception as e:
            logger.warning("{} WARN read user_id:{} wait_recommend exception:{} not exist".format(
                datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, e))
            wait_cache = []

        if not wait_cache:
            return wait_cache
        # 2. wait_recommend has data: move up to 100 articles (a configurable size) into Redis and
        #    write any remainder back to wait_recommend; with 100 or fewer, move them all and clear wait_recommend
        # - Then take the articles to recommend and record them in the history
        # Assume the Redis cache holds 100 items

        if len(wait_cache) > 100:
            logger.info(
                "{} INFO reduce user_id:{} channel:{} wait_recommend data".format(
                    datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))

            cache_redis = wait_cache[:100]
            # Put the first 100 items into Redis, scored by their position
            pl.zadd(key, dict(zip(cache_redis, range(len(cache_redis)))))

            # Write everything after the first 100 back to wait_recommend
            hbu.get_table_put('wait_recommend',
                              'reco:{}'.format(temp.user_id).encode(),
                              'channel:{}'.format(temp.channel_id).encode(),
                              str(wait_cache[100:]).encode())

        else:
            logger.info(
                "{} INFO delete user_id:{} channel:{} wait_recommend data".format(
                    datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))
            # Clear the wait_recommend data
            hbu.get_table_put('wait_recommend',
                              'reco:{}'.format(temp.user_id).encode(),
                              'channel:{}'.format(temp.channel_id).encode(),
                               str([]).encode())

            # 100 or fewer items: put them all into Redis
            pl.zadd(key, dict(zip(wait_cache, range(len(wait_cache)))))

        # Pipeline commands are only buffered until execute(); flush the queued ZADD
        # first, otherwise the read below would still see an empty key.
        pl.execute()
        res = cache_client.zrange(key, 0, temp.article_num - 1)
        if res:
            # Queue removal of the articles about to be recommended
            pl.zrem(key, *res)

3. Recording the recommended results in the history

 # Whether or not Redis had data initially, run the queued commands
    pl.execute()

    # Convert the IDs to int
    res = list(map(int, res))

    logger.info("{} INFO store user_id:{} channel:{} cache history data".format(
        datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))
    # These articles are being recommended, so store them in the recommendation history
    hbu.get_table_put('history_recommend',
                      'reco:his:{}'.format(temp.user_id).encode(),
                      'channel:{}'.format(temp.channel_id).encode(),
                      str(res).encode(),
                      timestamp=temp.time_stamp
                      )
    return res

Complete logic:

from server import cache_client
import logging
from datetime import datetime

logger = logging.getLogger('recommend')


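def get_reco_from_cache(temp, hbu):
    """Read the cached recommendations.
    Redis: stored in database 8
    """
    # 1. Look up the key in Redis directly and check whether it is empty
    # Build the Redis key
    key = 'reco:{}:{}:art'.format(temp.user_id, temp.channel_id)
    # Read, delete, return the result
    pl = cache_client.pipeline()

    # Read the Redis data. Scores are list positions (see the zadd calls below),
    # so zrange returns the best-ranked articles first, matching the later read.
    res = cache_client.zrange(key, 0, temp.article_num - 1)
    if res:
        # Queue removal of the cached entries that were just read
        pl.zrem(key, *res)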
    else:

        # No cached data in Redis
        # Delete the key
        cache_client.delete(key)
        try:
            # 1. Read wait_recommend first; if it has nothing, return empty so the normal recall flow runs
            wait_cache = eval(hbu.get_table_row('wait_recommend',
                                                'reco:{}'.format(temp.user_id).encode(),
                                                'channel:{}'.format(temp.channel_id).encode()))
        except Exception as e:
            logger.warning("{} WARN read user_id:{} wait_recommend exception:{} not exist".format(
                datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, e))
            wait_cache = []

        if not wait_cache:
            return wait_cache
        # 2. wait_recommend has data: move up to 100 articles (a configurable size) into Redis and
        #    write any remainder back to wait_recommend; with 100 or fewer, move them all and clear wait_recommend
        # - Then take the articles to recommend and record them in the history
        # Assume the Redis cache holds 100 items

        if len(wait_cache) > 100:
            logger.info(
                "{} INFO reduce user_id:{} channel:{} wait_recommend data".format(
                    datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))

            cache_redis = wait_cache[:100]
            # Put the first 100 items into Redis, scored by their position
            pl.zadd(key, dict(zip(cache_redis, range(len(cache_redis)))))

            # Write everything after the first 100 back to wait_recommend
            hbu.get_table_put('wait_recommend',
                              'reco:{}'.format(temp.user_id).encode(),
                              'channel:{}'.format(temp.channel_id).encode(),
                              str(wait_cache[100:]).encode())

        else:
            logger.info(
                "{} INFO delete user_id:{} channel:{} wait_recommend data".format(
                    datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))
            # Clear the wait_recommend data
            hbu.get_table_put('wait_recommend',
                              'reco:{}'.format(temp.user_id).encode(),
                              'channel:{}'.format(temp.channel_id).encode(),
                               str([]).encode())

            # 100 or fewer items: put them all into Redis
            pl.zadd(key, dict(zip(wait_cache, range(len(wait_cache)))))

        # Pipeline commands are only buffered until execute(); flush the queued ZADD
        # first, otherwise the read below would still see an empty key.
        pl.execute()
        res = cache_client.zrange(key, 0, temp.article_num - 1)
        if res:
            # Queue removal of the articles about to be recommended
            pl.zrem(key, *res)

    # Whether or not Redis had data initially, run the queued commands
    pl.execute()

    # Convert the IDs to int
    res = list(map(int, res))

    logger.info("{} INFO store user_id:{} channel:{} cache history data".format(
        datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))
    # These articles are being recommended, so store them in the recommendation history
    hbu.get_table_put('history_recommend',
                      'reco:his:{}'.format(temp.user_id).encode(),
                      'channel:{}'.format(temp.channel_id).encode(),
                      str(res).encode(),
                      timestamp=temp.time_stamp
                      )
    return res

5.6.2 Adding the cache logic to the recommendation center

from server import redis_cache

# 1. Try the cache first
res = redis_cache.get_reco_from_cache(temp, self.hbu)

# If the cache is empty, run the full algorithm (recall + sort) and write the result to the HBase pending-recommendation list
if not res:
    logger.info("{} INFO get user_id:{} channel:{} recall/sort data".
                format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))

    res = self.user_reco_list(temp)
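
The cache-then-fallback flow can be sketched as a small function that takes the two stages as callables. This is an illustration under assumptions: `feed_recommend`, `cache_lookup`, and `recall_and_sort` are hypothetical names, and storing the unused tail of the ranked list in `wait_recommend` is inferred from the comment above rather than taken from the source of `user_reco_list`.

```python
def feed_recommend(key, article_num, cache_lookup, recall_and_sort, wait_recommend):
    """Serve from the cache when possible, otherwise run recall + sort."""
    res = cache_lookup(key, article_num)
    if res:
        return res
    # Cache miss: compute a fresh ranked list
    ranked = recall_and_sort(key)
    # Keep the articles not served now for later requests (assumed behaviour)
    wait_recommend[key] = ranked[article_num:]
    return ranked[:article_num]


pending = {}
always_miss = lambda key, n: []
ranked_stub = lambda key: [44371, 17749, 17748, 44368]
print(feed_recommend('1:18', 2, always_miss, ranked_stub, pending))  # [44371, 17749]
print(pending)  # {'1:18': [17748, 44368]}
```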

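5.7 Online prediction with the ranking model

Learning objectives

Objectives
- None
Application
- Implemented with Spark

5.7.1 Ranking model service

Provide ranking logic for several different models
- Spark LR / TensorFlow

5.7.2 Online prediction with the ranking model

The articles returned by recall are ranked.
Steps:
- 1. Read the user features from the feature center
- 2. Read the article features from the feature center and merge user and article features into prediction samples
- 3. Predict, then sort and filter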

import os
import sys
# When running this file directly for testing, extend the path first to avoid import errors below
BASE_DIR = os.path.dirname(os.getcwd())
sys.path.insert(0, os.path.join(BASE_DIR))

PYSPARK_PYTHON = "/miniconda2/envs/reco_sys/bin/python"
os.environ["PYSPARK_PYTHON"] = PYSPARK_PYTHON
os.environ["PYSPARK_DRIVER_PYTHON"] = PYSPARK_PYTHON
from pyspark import SparkConf
from pyspark.sql import SparkSession
from server.utils import HBaseUtils
from server import pool
from pyspark.ml.linalg import DenseVector
from pyspark.ml.classification import LogisticRegressionModel
import pandas as pd


conf = SparkConf()
config = (
    ("spark.app.name", "sort"),
    ("spark.executor.memory", "2g"),    # memory per executor (default 1g)
    ("spark.master", 'yarn'),
    ("spark.executor.cores", "2"),   # CPU cores per Spark executor
)

conf.setAll(config)
spark = SparkSession.builder.config(conf=conf).getOrCreate()

1. Read the user features from the feature center

hbu = HBaseUtils(pool)
# Ranking
# 1. Read the user features from the feature center
try:
    user_feature = eval(hbu.get_table_row('ctr_feature_user',
                                               '{}'.format(1115629498121846784).encode(),
                                               'channel:{}'.format(18).encode()))
except Exception as e:
    user_feature = []

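2. Read the article features from the feature center and merge them with the user features to build samples for the candidate articles

Merged feature vector: channel_id (1) + article vector (100) + user feature weights (10) + article keyword weights (10) = 121 features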

if user_feature:
    # 2. Read the article features from the feature center
    result = []

    for article_id in [17749, 17748, 44371, 44368]:
        try:
            article_feature = eval(hbu.get_table_row('ctr_feature_article',
                                                     '{}'.format(article_id).encode(),
                                                     'article:{}'.format(article_id).encode()))
        except Exception as e:
            # Fall back to an all-zero feature list when the article has no stored features
            article_feature = [0.0] * 111
        f = []
        # First: channel_id
        f.extend([article_feature[0]])
        # Second: article vector (100 values)
        f.extend(article_feature[11:])
        # Third: user weight features (10 values)
        f.extend(user_feature)
        # Fourth: article keyword weight features (10 values)
        f.extend(article_feature[1:11])
        vector = DenseVector(f)

        result.append([1115629498121846784, article_id, vector])
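
The reordering is easy to get wrong, so here is a list-only sketch of the same merge with length checks. It assumes the stored article feature is laid out as channel_id (1), keyword weights (10), article vector (100), as the slices above imply; `build_sample` is a hypothetical helper name and the sample values are made up.

```python
def build_sample(article_feature, user_feature):
    """Merge features in the order: channel_id, article vector, user weights, keyword weights."""
    if len(article_feature) != 111:
        raise ValueError("expected 111 article features")
    if len(user_feature) != 10:
        raise ValueError("expected 10 user features")
    sample = [article_feature[0]]          # channel_id
    sample.extend(article_feature[11:])    # 100-dimensional article vector
    sample.extend(user_feature)            # 10 user weights
    sample.extend(article_feature[1:11])   # 10 keyword weights
    return sample


article = [18.0] + [0.1] * 10 + [0.5] * 100
user = [0.9] * 10
sample = build_sample(article, user)
print(len(sample))  # 121
```

The order must match the one used when the model was trained; otherwise the learned weights are applied to the wrong inputs.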

Order in which the article feature center stores the columns:

+----------+----------+--------------------+--------------------+--------------------+
|article_id|channel_id|             weights|       articlevector|            features|
+----------+----------+--------------------+--------------------+--------------------+
|        26|        17|[0.19827163395829...|[0.02069368539384...|[17.0,0.198271633...|
|        29|        17|[0.26031398249056...|[-0.1446092289546...|[17.0,0.260313982...|

Final result:

3. Prepare the sample format, load the model, and predict

# 3. Predict, then sort and filter
df = pd.DataFrame(result, columns=["user_id", "article_id", "features"])
test = spark.createDataFrame(df)

# Load the logistic regression model
model = LogisticRegressionModel.load("hdfs://hadoop-master:9000/headlines/models/LR.obj")
predict = model.transform(test)

Filter the prediction results

def vector_to_double(row):
    # probability[1] is the predicted probability of the positive (click) class
    return float(row.article_id), float(row.probability[1])
res = predict.select(['article_id', 'probability']).rdd.map(vector_to_double).toDF(['article_id', 'probability']).sort('probability', ascending=False)

Take the top N articles after sorting

article_list = [i.article_id for i in res.collect()]
if len(article_list) > 100:
    article_list = article_list[:100]
reco_set = list(map(int, article_list))
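
For intuition about what the ranking step computes, here is a pure-Python sketch of logistic-regression scoring followed by a top-N cut. It is not how Spark evaluates the model internally; the weights, bias, and candidate features are made-up values, and `lr_probability` and `rank_by_ctr` are hypothetical helper names.

```python
import math


def lr_probability(weights, bias, features):
    """Positive-class probability of a logistic regression: sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def rank_by_ctr(candidates, weights, bias, top_n=100):
    """Score (article_id, features) pairs and return the top_n IDs, best first."""
    scored = [(article_id, lr_probability(weights, bias, features))
              for article_id, features in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [article_id for article_id, _ in scored[:top_n]]


candidates = [(1, [0.0, 1.0]), (2, [1.0, 0.0]), (3, [0.0, 0.0])]
print(rank_by_ctr(candidates, weights=[1.0, -1.0], bias=0.0))  # [2, 3, 1]
```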

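5.7.3 Adding model prediction for real-time ranking

Add the Spark configuration

When the gRPC service starts, it initializes the Spark session:

from pyspark import SparkConf
from pyspark.sql import SparkSession
# Spark configuration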
conf = SparkConf()
conf.setAll(DefaultConfig.SPARK_GRPC_CONFIG)

SORT_SPARK = SparkSession.builder.config(conf=conf).getOrCreate()



# Spark configuration for the gRPC service
SPARK_GRPC_CONFIG = (
  ("spark.app.name", "grpcSort"),  # Spark application name; a random name is generated if none is given
  ("spark.master", "yarn"),
  ("spark.executor.instances", 4)
)

Add the model service prediction function

from server import SORT_SPARK
from pyspark.ml.linalg import DenseVector
from pyspark.ml.classification import LogisticRegressionModel
import pandas as pd
import numpy as np
from datetime import datetime
import logging

logger = logging.getLogger("recommend")

Prediction function

def lr_sort_service(reco_set, temp, hbu):
    """
    Rank the candidates and return the recommended articles.
    :param reco_set: recall results after merging and filtering
    :param temp: request parameters
    :param hbu: HBase helper
    :return: list of article IDs, ranked when user features are available
    """
    # Ranking
    # 1. Read the user features from the feature center
    try:
        user_feature = eval(hbu.get_table_row('ctr_feature_user',
                                              '{}'.format(temp.user_id).encode(),
                                              'channel:{}'.format(temp.channel_id).encode()))
        logger.info("{} INFO get user user_id:{} channel:{} profile data".format(
            datetime.now().strftime('%Y-%m-%d %H:%M:%S'), temp.user_id, temp.channel_id))
    except Exception as e:
        user_feature = []

    if user_feature:
        # 2. Read the article features from the feature center
        result = []

        for article_id in reco_set:
            try:
                article_feature = eval(hbu.get_table_row('ctr_feature_article',
                                                         '{}'.format(article_id).encode(),
                                                         'article:{}'.format(article_id).encode()))
            except Exception as e:
                # Fall back to an all-zero feature list when the article has no stored features
                article_feature = [0.0] * 111
            f = []
            # First: channel_id
            f.extend([article_feature[0]])
            # Second: article vector (100 values)
            f.extend(article_feature[11:])
            # Third: user weight features (10 values)
            f.extend(user_feature)
            # Fourth: article keyword weight features (10 values)
            f.extend(article_feature[1:11])
            vector = DenseVector(f)
            result.append([temp.user_id, article_id, vector])

        # 3. Predict, then sort and filter
        df = pd.DataFrame(result, columns=["user_id", "article_id", "features"])
        test = SORT_SPARK.createDataFrame(df)

        # Load the logistic regression model
        model = LogisticRegressionModel.load("hdfs://hadoop-master:9000/headlines/models/LR.obj")
        predict = model.transform(test)

        def vector_to_double(row):
            # probability[1] is the predicted probability of the positive (click) class
            return float(row.article_id), float(row.probability[1])

        res = predict.select(['article_id', 'probability']).rdd.map(vector_to_double).toDF(
            ['article_id', 'probability']).sort('probability', ascending=False)
        article_list = [i.article_id for i in res.collect()]
        logger.info("{} INFO sorting user_id:{} recommend article".format(datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
                                                                          temp.user_id))
        # After sorting, return only the top 100 article IDs for recommendation
        if len(article_list) > 100:
            article_list = article_list[:100]
        reco_set = list(map(int, article_list))

    return reco_set

Add ranking to the recommendation center

# default configuration
RAParam = param(
    COMBINE={
        'Algo-1': (1, [100, 101, 102, 103, 104], [200]),  # home feed: read all recall results + LR ranking
        'Algo-2': (2, [100, 101, 102, 103, 104], [200])  # home feed: read all recall results + ranking
    },

# reco_center
from server.sort_service import lr_sort_service
sort_dict = {
    'LR': lr_sort_service,
}

# Ranking logic
_sort_num = RAParam.COMBINE[temp.algo][2][0]
reco_set = sort_dict[RAParam.SORT[_sort_num]](reco_set, temp, self.hbu)
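
The lookup chain above (algorithm name to sort number to sort function) can be sketched end to end. The excerpt does not show the `SORT` mapping or the definition of `param`, so the `namedtuple` below, the `{200: 'LR'}` entry, and the stub sort function are assumptions made only to keep the example runnable.

```python
from collections import namedtuple

# Hypothetical stand-in for the project's param type
param = namedtuple('RecommendAlgorithm', ['COMBINE', 'SORT'])

RAParam = param(
    COMBINE={'Algo-1': (1, [100, 101, 102, 103, 104], [200])},
    SORT={200: 'LR'},  # assumed mapping from sort number to model name
)


def lr_sort_stub(reco_set, temp, hbu):
    """Placeholder for lr_sort_service: sorts IDs instead of calling a model."""
    return sorted(reco_set)


sort_dict = {'LR': lr_sort_stub}


def apply_sort(algo, reco_set, temp=None, hbu=None):
    """Resolve the sort function configured for an algorithm and apply it."""
    sort_num = RAParam.COMBINE[algo][2][0]
    sort_fn = sort_dict.get(RAParam.SORT[sort_num])
    if sort_fn is None:
        return reco_set  # no ranking model registered: keep recall order
    return sort_fn(reco_set, temp, hbu)


print(apply_sort('Algo-1', [3, 1, 2]))  # [1, 2, 3]
```

Keeping the model choice in configuration lets an ABTest bucket switch ranking models by changing the sort number, without touching the recommendation center code.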

5.7.4 Adding the gRPC real-time recommendation program to supervisor

[program:online]
environment=JAVA_HOME=/root/bigdata/jdk,SPARK_HOME=/root/bigdata/spark,HADOOP_HOME=/root/bigdata/hadoop,PYSPARK_PYTHON=/miniconda2/envs/reco_sys/bin/python,PYSPARK_DRIVER_PYTHON=/miniconda2/envs/reco_sys/bin/python
command=/miniconda2/envs/reco_sys/bin/python /root/toutiao_project/reco_sys/abtest/routing.py
directory=/root/toutiao_project/reco_sys/abtest
user=root
autorestart=true
redirect_stderr=true
stdout_logfile=/root/logs/recommendsuper.log
loglevel=info
stopsignal=KILL
stopasgroup=true
killasgroup=true

你可能感兴趣的:(Python_头条推荐系统_推荐业务流实现与ABTest（4）)

如何通过API用Python获取北向资金流向数据？量化问财量化软件 QMT 量化交易 Python 量化炒股 PTrade QMT 量化交易量化软件 deepseek
推荐阅读：《【最全攻略】免费的量化软件有哪些？券商的交易接口怎么获取？》如何通过API用Python获取北向资金流向数据？北向资金指的是通过沪港通和深港通渠道，从香港市场流入A股市场的资金。对于投资者来说，了解北向资金流向对于把握市场趋势和投资决策具有重要意义。本文将介绍如何通过API用Python获取北向资金流向数据。理解北向资金流向数据北向资金流向数据主要包括以下几个方面：资金流入量：指通过沪
letcode hot 100 第5题 int main* letcode热题100 leetcode 数据结构 c++算法
letcodehot100第5题题目盛最多水的容器给定一个长度为n的整数数组height。有n条垂线，第i条线的两个端点是(i,0)和(i,height[i])。找出其中的两条线，使得它们与x轴共同构成的容器可以容纳最多的水。返回容器可以储存的最大水量。说明：你不能倾斜容器。示例1：输入：[1,8,6,2,5,4,8,3,7]输出：49解释：图中垂直线代表输入数组[1,8,6,2,5,4,8,3,
likeadmin 安装与使用指南强和毓Hadley
likeadmin安装与使用指南项目地址:https://gitcode.com/gh_mirrors/li/likeadmin目录结构及介绍在克隆或下载likeadmin项目后，你会看到以下主要目录：admin:存放所有后端管理相关的代码。controller:控制器目录，负责处理HTTP请求。model:数据模型目录，用于数据库操作。service:服务层目录，提供业务逻辑。frontend:
ESP32 小智 AI 机器人入门教程从原理到实现（自己云端部署）与光同尘大道至简人工智能机器人 python 人机交互 github visual studio 单片机
此博客为一篇针对初学者的详细教程，涵盖小智AI机器人的原理、硬件准备、软件环境搭建、代码实现、云端部署以及优化扩展。文章结合了现有的网络资源，取长补短，确保内容易于理解和操作。简介：本教程将指导初学者使用ESP32微控制器开发一个简单的语音对话机器人“小智”。我们将介绍所需的基础原理、硬件准备、软件环境搭建，以及如何编写代码实现语音唤醒和与云端大模型的对接。通过本教程，即使没有深厚的AI或嵌入式经
我与DeepSeek读《大型网站技术架构》（5）下诺亚凹凸曼架构
万无一失：网站的高可用架构4.高可用的数据保证数据存储高可用的手段主要是数据备份和失效转移机制。缓存服务的高可用争论1.缓存服务需要实现高可用核心论据：避免雪崩效应：缓存宕机导致数据库瞬时压力骤增，可能引发级联故障。提升用户体验：缓存直接支撑高频访问，其不可用会导致延迟飙升、功能异常。数据热备价值：部分缓存数据（如会话信息）可能无持久化备份，丢失后影响业务连续性。适用场景：高并发实时系统：如电商秒
我与DeepSeek读《大型网站技术架构》（3）诺亚凹凸曼架构
大型网站架构的核心要素《大型网站技术架构：核心原理与案例分析》第三章聚焦于大型网站架构的核心要素，从技术维度剖析了构建高可用、高性能、可扩展系统的关键设计方向。1.五大核心架构要素(1)性能（Performance）目标：快速响应用户请求，优化用户体验。关键策略：前端优化：CDN加速静态资源、合并压缩JS/CSS、浏览器缓存。服务端优化：缓存（Redis/Memcached）、异步处理（消息队列）
WordPress建站给外贸人带来的负担小机出海建站常谈服务器 ssl https
WordPress是全球最大的开源建站平台，有着丰富的主题与插件，尽管功能非常强大，但也给想要建站的外贸人带来了一些负担。一、技术门槛与学习成本1、由于WordPress发展了几十年，里面的功能应有尽有，但往往这些复杂的功能导致建站新手对它的学习成本变得很高，需要理解各个模块与功能点，增加了上手的复杂度。2、WordPress的建站服务商他不会告诉你，你可能需要知道一些代码知识（HTML、CSS、
算法与数据结构（回文数） a_j58 数据结构
题目思路对于这个我的第一想法就是转换为字符串然后判断字符串是否为回文，它会消耗额外的地址空间。还有一种想法就是将数字反转并判断是否为回文，但可能需要处理数字溢出的问题。若要避免出现数字溢出的问题，我们可以只反转它的一半，若前半部分和后半部分相同，则说明它是一个回文数。如123321，我们将它的后半部分反转，得到123，它与前半部分相同，说明它是一个回文数。算法首先，我们可以先考虑到它的一些临界情况
Manus联创澄清：我们并未使用MCP技术耶耶Norsea 网络杂烩人工智能
摘要近日，Manus联创针对外界关于其产品可能涉及“沙盒越狱”的疑问进行了正式回应。公司明确表示并未使用Anthropic的MCP（模型上下文协议）技术，并强调MCP是一个旨在标准化应用程序与大型语言模型（LLM）之间上下文交互的开放标准。此外，Manus联创宣布了开源计划，以增强透明度和社区参与。季逸超也确认他们没有采用MCP技术，进一步澄清了相关质疑。关键词沙盒越狱,MCP技术,开源计划,透明
Shodan的概述与安装耶耶Norsea Shodan 安全 web安全 python
一、Shodan简述Shodan是一个独特的网络搜索引擎，它专门针对互联网上的设备进行不间断扫描，并将扫描结果存储起来，供用户检索。这使得Shodan能够快速搜索到网络中的各种设备和服务，例如Web服务器、路由器、摄像头、物联网设备等，甚至包括某些已知漏洞的暴露设备。Shodan的主要用途：设备搜索：通过Shodan，你可以搜索到全球范围内连接到互联网的各种设备，如企业服务器、摄像头、智能家居设备
手把手教你学Simulink实例：基于Simulink的三相桥式全控整流电路设计与仿真实例小蘑菇二号手把手教你学 MATLAB 专栏手把手教你学 Simulink 单片机嵌入式硬件 matlab simulink
目录手把手教你学Simulink实例：基于Simulink的三相桥式全控整流电路设计与仿真实例一、背景介绍二、所需工具和环境三、步骤详解步骤1：创建Simulink模型步骤1.1：打开Simulink并新建模型步骤2：添加电源模块步骤2.1：添加三相交流电源步骤3：设计三相桥式全控整流电路步骤3.1：添加可控硅模块步骤3.2：连接三相桥式全控整流电路步骤4：添加负载模块步骤4.1：添加电阻性负载步
斐波拉契数列 RichardK. c++学习
题目描述给定正整数n，求斐波那契数列的第n项F(n)。令F(n)表示斐波那契数列的第n项，它的定义是：当n=1时，F(n)=1；当n=2时，F(n)=1；当n>2时，F(n)=F(n−1)+F(n−2)。大数据版：斐波拉契数列-大数据版输入描述一个正整数n（1≤n≤104）。输出描述斐波那契数列的第n项F(n)。由于结果可能很大，因此将结果对10007取模后输出。样例1输入1输出1解释边界定义：F
JavaScript模块化开发的演进历程 IronKee JavaScript javascript 前端
写在前面的话js模块化历程记录了js模块化思想的诞生与变迁历史不是过去，历史正在上演，一切终究都会成为历史拥抱变化，面向未来延伸阅读-JavaScript诞生（这也解释了JS为何一开始没有模块化）JavaScript因为互联网而生，紧随着浏览器的出现而问世1990年底，欧洲核能研究组织（CERN）科学家Tim，发明了万维网（WorldWideWeb），最早的网页只能在操作系统的终端里浏览，非常不方
30KPA42CA双向二极管：精准电压控制，卓越性能 GR6692 二极管物联网数据库管理员 python eclipse
30KPA42CA双向TVS瞬态抑制二极管二极管产品已经跟我们的生活有着密不可分的联系了，TVS瞬态抑制二极管，是一种高效能保护二极管，产品体积小、功率大、响应快等诸多优点，产品应用广泛。TVS瞬态抑制二极管30KPA42CA，是一种二极管形式的高效能被动保护器件贴片TVS瞬态抑制二极管详情简介TVS瞬态抑制二极管30KPA42CA极性(单双向)：双向VRWM(V)电压：42V最大箝位电压@IPP
垃圾收集算法与收集器 HBryce24 JVM jvm
在JVM中，垃圾收集（GarbageCollection,GC）算法的核心目标是自动回收无用对象的内存，同时尽量减少对应用性能的影响。以下是JVM中主要垃圾收集算法的原理、流程及实际应用场景的详细介绍：一、标记-清除算法（Mark-Sweep）原理标记阶段：从GCRoots（如栈引用、静态变量）出发，遍历对象图，标记所有存活对象。清除阶段：扫描堆内存，回收未被标记的对象所占用的内存（直接释放，不整
DSP28335 ADC模块SOC触发机制详解（附完整代码） DOMINICHZL dsp 单片机嵌入式硬件
[摘要]本文基于TITMS320F28335芯片，深入讲解其ADC模块的SOC（Start-of-Conversion）触发机制，涵盖软件触发、ePWM硬件触发等模式，并提供完整的配置代码与实验验证方法。1.ADC模块与SOC概述DSP28335的ADC模块为12位精度、16通道模数转换器，支持8个独立的SOC（Start-of-Conversion）配置。每个SOC可独立配置以下参数：触发源（软
储能变流器硬件工程师能力提升路径 DOMINICHZL 硬件能源硬件工程
储能变流器（PCS，PowerConversionSystem）作为储能系统的核心部件，其硬件设计涉及电力电子、控制理论、热管理、电磁兼容（EMC）等多领域技术。以下是储能变流器行业硬件工程师需要具备的核心能力，以及技术提升的路径建议：一、储能变流器硬件工程师的核心能力电力电子基础能力拓扑设计与分析：熟悉Boost/Buck、双向DC-DC、三相逆变器、LLC谐振变换器等拓扑结构，并能根据效率、成
STM32应用(六)一阶卡尔曼滤波代码和简单应用 2401_87557129 stm32 嵌入式硬件单片机
STM32应用(五)基于输入捕获的超声波HC-SR04模块使用1.一阶卡尔曼滤波代码实现1.1Kalman滤波代码1.1.1Kalman.c文件#include"Kalman.h"voidKalman\_Init(){kfp.Last_P=1;kfp.Now_P=0;kfp.out=0;kfp.Kg=0;kfp.Q=0;kfp.R=0.01;}/\*\*\*卡尔曼滤波器\*@paramKalman
提升空间卫生，稀土抗菌剂让铺地材料更健康金士镧新材料有限公司全文检索科技生活安全
一、稀土元素的抗菌特性稀土元素包括镧系元素及其他一些具有特定化学性质的元素（如钪、钇等），这些元素具有较强的催化性和化学活性，能有效抑制细菌的生长和繁殖。稀土元素尤其是铈、钕、钬、钇等，因其在抗菌方面的特殊作用，能够有效杀灭多种常见的细菌和真菌，并能防止细菌的耐药性产生。稀土抗菌剂的抗菌抑菌机理有四个层面:1.稀土化合物与细菌表面静电结合，造成直接的杀灭；2.基于稀土的光催化半导体特性，通过光生氧
Redis缓存穿透、雪崩、击穿的解决方案 Fanxt_Ja 缓存 redis 数据库 java spring cloud intellij-idea
在大型业务系统或用户活跃量较大的环境中，用户往往对某些数据的访问量会非常大，为了保护数据库而引入了缓存Redis，但是其也会出现一些问题，而导致严重的后果，比如缓存穿透、缓存雪崩、缓存击穿，下面我将针对这几个问题给出解决方案。1.缓存穿透缓存穿透发生的原因就是“用户”访问一个缓存中不存在，数据库中也不存在的数据。当这种请求量非常大时，就会对数据库造成非常大的压力。为了解决这个问题，通常有两种解决方
针对AF调试过程中PD多窗机制是如何打分的爱写BUG的长歌人工智能计算机视觉算法