secretflow-serving (https://github.com/secretflow/serving) is an ABY3 inference service provided by SecretFlow (隐语). Its codebase is only about one percent the size of ClickHouse's (under 10,000 lines), yet despite its size it is complete: it covers the whole model-loading and inference pipeline, and it integrates Prometheus for monitoring.
secretflow-serving is written in C++17 and the code is clear and easy to follow. This article walks through its source code along the lines of its architecture. Since I am not a machine-learning professional, corrections are very welcome.
The overall architecture splits into two phases: startup and the prediction service. Startup mainly reads the model and brings up several brpc rpc services; see https://brpc.apache.org/zh/docs/server/serve-grpc/ for the latter.
The entry function is at https://github.com/secretflow/serving/blob/475bb3356e3a246f444fc25f62f9619874870680/secretflow_serving/server/main.cc#L46
Input arguments are handled here with absl and gflags.
OpFactory is a singleton; secretflow implements a standard Meyers' Singleton here (https://github.com/secretflow/serving/blob/475bb3356e3a246f444fc25f62f9619874870680/secretflow_serving/core/singleton.h#L20):
{
auto op_def_list =
secretflow::serving::op::OpFactory::GetInstance()->GetAllOps();
std::vector<std::string> op_names;
std::for_each(
op_def_list.begin(), op_def_list.end(),
[&](const std::shared_ptr<const secretflow::serving::op::OpDef>& o) {
op_names.emplace_back(o->name());
});
SPDLOG_INFO("op list: {}",
fmt::join(op_names.begin(), op_names.end(), ", "));
}
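The linked Singleton is the textbook Meyers' form. A minimal sketch of the idea (not the exact serving code):
template <typename T>
class Singleton {
 public:
  static T* GetInstance() {
    // C++11 guarantees thread-safe, lazy initialization of
    // function-local statics; that guarantee is the whole trick.
    static T instance;
    return &instance;
  }
  Singleton(const Singleton&) = delete;
  Singleton& operator=(const Singleton&) = delete;

 protected:
  Singleton() = default;
};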
OpFactory lives at https://github.com/secretflow/serving/blob/475bb3356e3a246f444fc25f62f9619874870680/secretflow_serving/ops/op_factory.h.
class OpFactory final : public Singleton<OpFactory> {
public:
void Register(const std::shared_ptr<OpDef>& op_def) {
std::lock_guard<std::mutex> lock(mutex_);
SERVING_ENFORCE(op_defs_.emplace(op_def->name(), op_def).second,
errors::ErrorCode::LOGIC_ERROR,
"duplicated op_def registered for {}", op_def->name());
}
const std::shared_ptr<OpDef> Get(const std::string& name) {
std::lock_guard<std::mutex> lock(mutex_);
auto iter = op_defs_.find(name);
SERVING_ENFORCE(iter != op_defs_.end(), errors::ErrorCode::UNEXPECTED_ERROR,
"no op_def registered for {}", name);
return iter->second;
}
std::vector<std::shared_ptr<const OpDef>> GetAllOps() {
std::vector<std::shared_ptr<const OpDef>> result;
std::lock_guard<std::mutex> lock(mutex_);
for (const auto& pair : op_defs_) {
result.emplace_back(pair.second);
}
return result;
}
private:
std::unordered_map<std::string, std::shared_ptr<OpDef>> op_defs_;
std::mutex mutex_;
};
As we can see, ops are registered statically via REGISTER_OP, and they are looked up when execution-graph nodes are constructed (see https://github.com/secretflow/serving/blob/475bb3356e3a246f444fc25f62f9619874870680/secretflow_serving/ops/node.cc#L23):
#define REGISTER_OP(op_name, version, desc) \
static OpRegister const regist_op_##op_name = \
OpRegister{} << internal::OpDefBuilderWrapper(#op_name, version, desc)
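The macro relies on the usual static-registration idiom: a namespace-scope object whose constructor runs before main and hands the finished OpDef to the factory. Roughly like this (a sketch; the Build() call is an assumption about the builder's interface):
struct OpRegister {
  // operator<< finalizes the builder and registers the resulting OpDef,
  // so registration happens during static initialization, before main.
  OpRegister& operator<<(internal::OpDefBuilderWrapper builder) {
    OpFactory::GetInstance()->Register(builder.Build());  // hypothetical Build()
    return *this;
  }
};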
After that, the server's various parameters are read and the startup flow begins.
The full code:
// @hint entry function
int main(int argc, char* argv[]) {
// Initialize the symbolizer to get a human-readable stack trace
// handle input arguments with absl and gflags
absl::InitializeSymbolizer(argv[0]);
gflags::SetVersionString(SERVING_VERSION_STRING);
gflags::AllowCommandLineReparsing();
gflags::ParseCommandLineFlags(&argc, &argv, true);
try {
// init logger
secretflow::serving::LoggingConfig log_config;
if (!FLAGS_logging_config_file.empty()) {
secretflow::serving::LoadPbFromJsonFile(FLAGS_logging_config_file,
&log_config);
}
secretflow::serving::SetupLogging(log_config);
SPDLOG_INFO("version: {}", SERVING_VERSION_STRING);
{
// OpFactory is a Meyers' Singleton
// ops are registered statically via REGISTER_OP
auto op_def_list =
secretflow::serving::op::OpFactory::GetInstance()->GetAllOps();
std::vector<std::string> op_names;
std::for_each(
op_def_list.begin(), op_def_list.end(),
[&](const std::shared_ptr<const secretflow::serving::op::OpDef>& o) {
op_names.emplace_back(o->name());
});
SPDLOG_INFO("op list: {}",
fmt::join(op_names.begin(), op_names.end(), ", "));
}
STRING_EMPTY_VALIDATOR(FLAGS_serving_config_file);
// init server options
secretflow::serving::Server::Options server_opts;
if (FLAGS_config_mode == "kuscia") {
secretflow::serving::kuscia::KusciaConfigParser config_parser(
FLAGS_serving_config_file);
server_opts.server_config = config_parser.server_config();
server_opts.cluster_config = config_parser.cluster_config();
server_opts.model_config = config_parser.model_config();
server_opts.feature_source_config = config_parser.feature_config();
server_opts.service_id = config_parser.service_id();
} else {
secretflow::serving::ServingConfig serving_conf;
LoadPbFromJsonFile(FLAGS_serving_config_file, &serving_conf);
server_opts.server_config = serving_conf.server_conf();
server_opts.cluster_config = serving_conf.cluster_conf();
server_opts.model_config = serving_conf.model_conf();
if (serving_conf.has_feature_source_conf()) {
server_opts.feature_source_config = serving_conf.feature_source_conf();
}
server_opts.service_id = serving_conf.id();
}
// start the server
secretflow::serving::Server server(std::move(server_opts));
server.Start();
// run until the brpc server exits
server.WaitForEnd();
} catch (const secretflow::serving::Exception& e) {
// TODO: custom status sink
SPDLOG_ERROR("server startup failed, code: {}, msg: {}, stack: {}",
e.code(), e.what(), e.stack_trace());
return -1;
} catch (const std::exception& e) {
// TODO: custom status sink
SPDLOG_ERROR("server startup failed, msg:{}", e.what());
return -1;
}
return 0;
}
Before walking through the startup flow, let's first look at how a model used for inference is defined in secretflow:
https://github.com/secretflow/serving/blob/475bb3356e3a246f444fc25f62f9619874870680/secretflow_serving/protos/bundle.proto#L37
ModelBundle is a proto message containing the complete model information.
GraphDef defines the execution graph: a set of data-carrying node definitions (NodeDef) and a set of execution descriptions for the graph (ExecutionDef).
// Represents an exported secertflow model. It consists of a GraphDef and extra
// metadata required for serving.
message ModelBundle {
string name = 1;
string desc = 2;
GraphDef graph = 3;
}
// The definition of a Graph. A graph consists of a set of nodes carrying data
// and a set of executions that describes the scheduling of the graph.
message GraphDef {
// Version of the graph
string version = 1;
repeated NodeDef node_list = 2;
repeated ExecutionDef execution_list = 3;
}
Let's continue with NodeDef and ExecutionDef:
// The definition of a node.
message NodeDef {
// Must be unique among all nodes of the graph.
string name = 1;
// The operator name.
string op = 2;
// The parent node names of the node. The order of the parent nodes should
// match the order of the inputs of the node.
repeated string parents = 3;
// attributes of the node's op
// The attribute values configed in the node. Note that this should include
// all attrs defined in the corresponding OpDef.
map<string, AttrValue> attr_values = 4;
// The operator version.
string op_version = 5;
}
// The value of an attribute
message AttrValue {
oneof value {
// INT
int32 i32 = 1;
int64 i64 = 2;
// FLOAT
float f = 3;
double d = 4;
// STRING
string s = 5;
// BOOL
bool b = 6;
// BYTES
bytes by = 7;
// Lists
// INTS
Int32List i32s = 11;
Int64List i64s = 12;
// FLOATS
FloatList fs = 13;
DoubleList ds = 14;
// STRINGS
StringList ss = 15;
// BOOLS
BoolList bs = 16;
// BYTESS
BytesList bys = 17;
}
}
// The definition of a execution. A execution represents a subgraph within a
// graph that can be scheduled for execution in a specified pattern.
message ExecutionDef {
// contains the runtime config and the nodes
// Represents the nodes contained in this execution. Note that
// these node names should be findable and unique within the node
// definitions. One node can only exist in one execution and must exist in
// one.
repeated string nodes = 1;
// The runtime config of the execution.
RuntimeConfig config = 2;
}
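To make these structures concrete, here is how a minimal two-node graph could be assembled through the generated proto API (illustrative only; real bundles are exported by the training side, and the op names simply mirror the kernels discussed later):
secretflow::serving::GraphDef graph_def;
graph_def.set_version("0.0.1");

// entry node: computes the local partial score
auto* dot = graph_def.add_node_list();
dot->set_name("dot_product_1");
dot->set_op("DOT_PRODUCT");

// exit node: merges partial scores; its input order follows `parents`
auto* merge = graph_def.add_node_list();
merge->set_name("merge_y_1");
merge->set_op("MERGE_Y");
merge->add_parents("dot_product_1");

// one execution (schedulable subgraph) per stage
graph_def.add_execution_list()->add_nodes("dot_product_1");
graph_def.add_execution_list()->add_nodes("merge_y_1");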
The startup code is at
https://github.com/secretflow/serving/blob/475bb3356e3a246f444fc25f62f9619874870680/secretflow_serving/server/server.cc#L58
SourceFactory is also a singleton; after initialization, it pulls the model from a file:
/* SourceFactory initialization */
// get model package
auto source = SourceFactory::GetInstance()->Create(opts_.model_config,
opts_.service_id);
// @hint pull the model; channel initialization follows
// this step reads from the local file system
auto package_path = source->PullModel();
/* PullModel looks like this */
std::string Source::PullModel() {
auto dst_dir = std::filesystem::path(data_dir_).append(config_.model_id());
if (!std::filesystem::exists(dst_dir)) {
std::filesystem::create_directories(dst_dir);
}
auto dst_file_path = dst_dir.append(kModelFileName);
const auto& source_sha256 = config_.source_sha256();
if (std::filesystem::exists(dst_file_path)) {
if (!source_sha256.empty()) {
if (SysUtil::CheckSHA256(dst_file_path.string(), source_sha256)) {
return dst_file_path;
}
}
SPDLOG_INFO("remove tmp model file:{}", dst_file_path.string());
std::filesystem::remove(dst_file_path);
}
// OnPullModel pulls the model, e.g. from OSS
OnPullModel(dst_file_path);
if (!source_sha256.empty()) {
SERVING_ENFORCE(SysUtil::CheckSHA256(dst_file_path.string(), source_sha256),
errors::ErrorCode::IO_ERROR,
"model({}) sha256 check failed", config_.source_path());
}
return dst_file_path;
}
Then the rpc channels are initialized from the participant information:
// build channels
std::string self_address;
std::vector<std::string> cluster_ids;
// communicate with each party through channels
auto channels = std::make_shared<PartyChannelMap>();
for (const auto& party : opts_.cluster_config.parties()) {
cluster_ids.emplace_back(party.id());
if (party.id() == self_party_id) {
self_address = party.listen_address().empty() ? party.address()
: party.listen_address();
continue;
}
channels->emplace(
party.id(),
CreateBrpcChannel(
party.address(), opts_.cluster_config.channel_desc().protocol(),
FLAGS_enable_peers_load_balancer,
opts_.cluster_config.channel_desc().rpc_timeout_ms() > 0
? opts_.cluster_config.channel_desc().rpc_timeout_ms()
: kPeerRpcTimeoutMs,
opts_.cluster_config.channel_desc().connect_timeout_ms() > 0
? opts_.cluster_config.channel_desc().connect_timeout_ms()
: kPeerConnectTimeoutMs,
opts_.cluster_config.channel_desc().has_tls_config()
? &opts_.cluster_config.channel_desc().tls_config()
: nullptr));
}
Then the package pulled from OSS is extracted and the proto files inside are read. Here we focus on the model-loading process and the Graph constructor:
// load model package
auto loader = std::make_unique<ModelLoader>();
loader->Load(package_path);
const auto& model_bundle = loader->GetModelBundle();
Graph graph(model_bundle->graph());
// model_bundle here is the ModelBundle proto shown above, which
// carries the complete model information.
/* the Load method */
void ModelLoader::Load(const std::string& file_path) {
SPDLOG_INFO("begin load file: {}", file_path);
auto model_dir =
std::filesystem::path(file_path).parent_path().append("data");
if (std::filesystem::exists(model_dir)) {
// remove tmp model dir
SPDLOG_WARN("remove tmp model dir: {}", model_dir.string());
std::filesystem::remove_all(model_dir);
}
// unzip package file
try {
SysUtil::ExtractGzippedArchive(file_path, model_dir);
} catch (const std::exception& e) {
std::filesystem::remove_all(file_path);
SERVING_THROW(errors::ErrorCode::IO_ERROR,
"failed to extract model package {}, detail: {}", file_path,
e.what());
}
auto manifest_path =
std::filesystem::path(model_dir).append(kManifestFileName);
SERVING_ENFORCE(
std::filesystem::exists(manifest_path), errors::ErrorCode::IO_ERROR,
"can not find manifest file {}, model package file is corrupted",
manifest_path.string());
// load manifest
ModelManifest manifest;
// deserialize the pb file
LoadPbFromJsonFile(manifest_path.string(), &manifest);
auto model_file_path = model_dir.append(manifest.bundle_path());
auto model_bundle = std::make_shared<ModelBundle>();
if (manifest.bundle_format() == FileFormatType::FF_PB) {
LoadPbFromBinaryFile(model_file_path.string(), model_bundle.get());
} else if (manifest.bundle_format() == FileFormatType::FF_JSON) {
LoadPbFromJsonFile(model_file_path.string(), model_bundle.get());
} else {
SERVING_THROW(errors::ErrorCode::UNEXPECTED_ERROR,
"found unknown bundle_format:{}",
FileFormatType_Name(manifest.bundle_format()));
}
model_bundle_ = std::move(model_bundle);
SPDLOG_INFO("end load model bundle, name: {}, desc: {}, graph version: {}",
model_bundle_->name(), model_bundle_->desc(),
model_bundle_->graph().version());
}
/* the Graph constructor */
Graph::Graph(GraphDef graph_def) : def_(std::move(graph_def)) {
// TODO: check version
// TODO: consider not storing def_ to avoiding multiple copies of node_defs
// and execution_defs
graph_view_.set_version(def_.version());
for (auto& node : def_.node_list()) {
NodeView view;
*(view.mutable_name()) = node.name();
*(view.mutable_op()) = node.op();
*(view.mutable_op_version()) = node.op_version();
*(view.mutable_parents()) = node.parents();
graph_view_.mutable_node_list()->Add(std::move(view));
}
*(graph_view_.mutable_execution_list()) = def_.execution_list();
// create nodes
// build the nodes; nodes_ is a std::unordered_map<std::string, std::shared_ptr<Node>>
for (int i = 0; i < def_.node_list_size(); ++i) {
const auto node_name = def_.node_list(i).name();
auto node = std::make_shared<Node>(def_.node_list(i));
SERVING_ENFORCE(nodes_.emplace(node_name, node).second,
errors::ErrorCode::LOGIC_ERROR, "found duplicate node:{}",
node_name);
}
// create edges
// build the edges; edges_ is just a flat vector in insertion order
for (const auto& [name, node] : nodes_) {
const auto& input_nodes = node->GetInputNodeNames();
if (input_nodes.empty()) {
SERVING_ENFORCE(node->GetOpDef()->inputs_size() == 1,
errors::ErrorCode::LOGIC_ERROR,
"the entry op should only have one input to accept "
"the features, node:{}, op:{}",
name, node->node_def().op());
entry_nodes_.emplace_back(node);
}
for (size_t i = 0; i < input_nodes.size(); ++i) {
auto n_iter = nodes_.find(input_nodes[i]);
SERVING_ENFORCE(n_iter != nodes_.end(), errors::ErrorCode::LOGIC_ERROR,
"can not found input node:{} for node:{}", input_nodes[i],
name);
auto edge = std::make_shared<Edge>(n_iter->first, name, i);
n_iter->second->AddOutEdge(edge);
node->AddInEdge(edge);
edges_.emplace_back(edge);
}
}
// find exit node
// an execution graph has exactly one exit node
size_t exit_node_count = 0;
for (const auto& pair : nodes_) {
if (pair.second->out_edges().empty()) {
exit_node_ = pair.second;
++exit_node_count;
}
}
SERVING_ENFORCE(!entry_nodes_.empty(), errors::ErrorCode::LOGIC_ERROR,
"can not found any entry node, please check graph def.");
SERVING_ENFORCE(exit_node_count == 1, errors::ErrorCode::LOGIC_ERROR,
"found {} exit nodes, expect only 1 in graph",
exit_node_count);
SERVING_ENFORCE(exit_node_->GetOpDef()->tag().returnable(),
errors::ErrorCode::LOGIC_ERROR,
"exit node({}) op({}) must returnable", exit_node_->GetName(),
exit_node_->GetOpDef()->name());
CheckNodesReachability();
CheckEdgeValidate();
// the Executions are built here
BuildExecution();
CheckExecutionValidate();
}
Once the model's execution graph is initialized, the Executors and the Executable are built and used to initialize the ExecutionCore. Executable is a container class for Executors, so here we mainly look at how an Executor is built.
An Executor is the smallest unit that Execute runs; Executors are what give serving's prediction its dynamic scheduling ability.
// build the Executors
std::vector<std::shared_ptr<Executor>> executors;
for (const auto& execution : graph.GetExecutions()) {
executors.emplace_back(std::make_shared<Executor>(execution));
}
ExecutionCore::Options exec_opts;
exec_opts.id = opts_.service_id;
exec_opts.party_id = self_party_id;
exec_opts.executable = std::make_shared<Executable>(std::move(executors));
if (opts_.server_config.op_exec_worker_num() > 0) {
exec_opts.op_exec_workers_num = opts_.server_config.op_exec_worker_num();
}
if (!opts_.server_config.feature_mapping().empty()) {
exec_opts.feature_mapping = {opts_.server_config.feature_mapping().begin(),
opts_.server_config.feature_mapping().end()};
}
exec_opts.feature_source_config = opts_.feature_source_config;
// build the ExecutionCore
auto execution_core = std::make_shared<ExecutionCore>(std::move(exec_opts));
What happens while an Executor is built? Let's continue with its constructor.
First, an op_kernel is created for every node. Like ops, op_kernels have their own factory class and are registered statically; registration places a creator function into creators_ for later callback. At execution time, the Executor dispatches into the op_kernel.
https://github.com/secretflow/serving/blob/475bb3356e3a246f444fc25f62f9619874870680/secretflow_serving/ops/op_kernel_factory.h#L22
Create invokes a different creator depending on opts.op_def->name(). **OpKernel has three subclasses: ArrowProcessing, MergeY, and DotProduct; the first two work on Arrow-represented data, the last one on Eigen-represented data.** Each constructs its own schemas during construction.
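The factory keeps a name-to-creator map, and kernels register themselves into creators_ through static objects, just like ops. A condensed sketch of the shape (simplified; details differ from op_kernel_factory.h):
class OpKernelFactory final : public Singleton<OpKernelFactory> {
 public:
  using CreateFn = std::function<std::shared_ptr<OpKernel>(OpKernelOptions)>;

  // called from each kernel's static registration object
  void Register(const std::string& op_name, CreateFn creator) {
    std::lock_guard<std::mutex> lock(mutex_);
    creators_.emplace(op_name, std::move(creator));
  }

  // dispatches to the creator registered under opts.op_def->name()
  // (lookup error handling omitted)
  std::shared_ptr<OpKernel> Create(OpKernelOptions opts) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto iter = creators_.find(opts.op_def->name());
    return iter->second(std::move(opts));
  }

 private:
  std::unordered_map<std::string, CreateFn> creators_;
  std::mutex mutex_;
};
Back to the Executor constructor: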
Executor::Executor(const std::shared_ptr<Execution>& execution)
: execution_(execution) {
// create op_kernel
auto nodes = execution_->nodes();
node_items_ = std::make_shared<
std::unordered_map<std::string, std::shared_ptr<NodeItem>>>();
for (const auto& [node_name, node] : nodes) {
op::OpKernelOptions ctx{node->node_def(), node->GetOpDef()};
auto item = std::make_shared<NodeItem>();
item->node = node;
item->op_kernel =
op::OpKernelFactory::GetInstance()->Create(std::move(ctx));
node_items_->emplace(node_name, item);
}
Let's take ArrowProcessing as an example and look at its construction flow. First it calls the base-class constructor:
// the base-class constructor
explicit OpKernel(OpKernelOptions opts) : opts_(std::move(opts)) {
num_inputs_ = opts_.op_def->inputs_size();
if (opts_.op_def->tag().variable_inputs()) {
// The actual number of inputs for op with variable parameters
// depends on node's parents.
num_inputs_ = opts_.node_def.parents_size();
}
}
Then BuildInputSchema() and BuildOutputSchema() are executed. BuildInputSchema calls GetNodeBytesAttr, an inline function used to sidestep ODR problems; this is a common idiom, and the usual advice is to prefer inline over static.
// invoked as GetNodeBytesAttr(opts_.node_def, "input_schema_bytes")
inline std::string GetNodeBytesAttr(const NodeDef& node_def,
const std::string& attr_name) {
std::string value;
if (!GetNodeBytesAttr(node_def, attr_name, &value)) {
SERVING_THROW(errors::ErrorCode::UNEXPECTED_ERROR,
"can not get attr:{} from node:{}, op:{}", attr_name,
node_def.name(), node_def.op());
}
return value;
}
bool GetNodeBytesAttr(const NodeDef& node_def, const std::string& attr_name,
std::vector<std::string>* value) {
AttrValue attr_value;
if (!GetAttrValue(node_def, attr_name, &attr_value)) {
return false;
}
SERVING_ENFORCE(
attr_value.has_by(), errors::ErrorCode::LOGIC_ERROR,
"attr_value({}) does not have expected type(bytes) value, node: {}",
attr_name, node_def.name());
SERVING_ENFORCE(!attr_value.bys().data().empty(),
errors::ErrorCode::INVALID_ARGUMENT,
"attr_value({}) type(BytesList) has empty value, node: {}",
attr_name, node_def.name());
value->reserve(attr_value.bys().data().size());
for (const auto& v : attr_value.bys().data()) {
value->emplace_back(v);
}
return true;
}
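Why inline rather than static for a function defined in a header? A quick illustration of the difference:
// util.h, included from multiple .cc files
//
// `inline` tells the linker that all definitions are one entity, so the
// one-definition rule (ODR) is satisfied and the binary holds a single
// copy. `static` would give every translation unit its own private
// copy: duplicated code, and any function-local static state would be
// duplicated too.
inline int Answer() { return 42; }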
From this we can also see that **input_schema_bytes is stored in the node's attr_values map (the map<string, AttrValue> in NodeDef)**.
Deserializing it with Arrow then yields the input schema. BuildOutputSchema() follows the same flow, so I won't repeat it.
std::shared_ptr<arrow::Schema> DeserializeSchema(const std::string& buf) {
std::shared_ptr<arrow::Schema> result;
std::shared_ptr<arrow::io::RandomAccessFile> buffer_reader =
std::make_shared<arrow::io::BufferReader>(buf);
arrow::ipc::DictionaryMemo tmp_memo;
SERVING_GET_ARROW_RESULT(
arrow::ipc::ReadSchema(
std::static_pointer_cast<arrow::io::InputStream>(buffer_reader).get(),
&tmp_memo),
result);
return result;
}
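For completeness, the forward direction (how a schema becomes bytes in the first place) is plain Arrow IPC as well. A minimal sketch with a recent Arrow:
#include <arrow/api.h>
#include <arrow/ipc/api.h>
#include <stdexcept>

// Serialize a schema into an IPC buffer; the bytes can then be stored
// in a NodeDef bytes attr and restored later with arrow::ipc::ReadSchema
// as shown above.
std::shared_ptr<arrow::Buffer> SerializeSchemaOrThrow(
    const arrow::Schema& schema) {
  auto result = arrow::ipc::SerializeSchema(schema);
  if (!result.ok()) {
    throw std::runtime_error(result.status().ToString());
  }
  return *result;
}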
The code after this registers the function calls in arrow_processing; I'll come back to it in the prediction phase.
https://github.com/secretflow/serving/blob/475bb3356e3a246f444fc25f62f9619874870680/secretflow_serving/ops/arrow_processing.cc#L143
Continuing with the Executor's construction flow: the remaining part builds a mapping from node_name to input_schema, finds the schema of the input features, and validates all the input_schemas across the nodes.
// get input schema
const auto& entry_nodes = execution_->GetEntryNodes();
for (const auto& node : entry_nodes) {
const auto& node_name = node->node_def().name();
auto iter = node_items_->find(node_name);
SERVING_ENFORCE(iter != node_items_->end(), errors::ErrorCode::LOGIC_ERROR);
const auto& input_schema = iter->second->op_kernel->GetAllInputSchema();
entry_node_names_.emplace_back(node_name);
input_schema_map_.emplace(node_name, input_schema);
}
if (execution_->IsEntry()) {
// build feature schema from entry execution
auto iter = input_schema_map_.begin();
const auto& first_input_schema_list = iter->second;
SERVING_ENFORCE(first_input_schema_list.size() == 1,
errors::ErrorCode::LOGIC_ERROR);
const auto& target_schema = first_input_schema_list.front();
++iter;
for (; iter != input_schema_map_.end(); ++iter) {
SERVING_ENFORCE_EQ(iter->second.size(), 1U,
"entry nodes should have only one input table");
const auto& schema = iter->second.front();
SERVING_ENFORCE_EQ(
target_schema->num_fields(), schema->num_fields(),
"entry nodes should have same shape inputs, expect: {}, found: {}",
target_schema->num_fields(), schema->num_fields());
CheckReferenceFields(schema, target_schema,
fmt::format("entry nodes should have same input "
"schema, found node:{} mismatch",
iter->first));
}
input_feature_schema_ = target_schema;
}
}
ExecutionCore is the core module at prediction time. Let's look at this part of the code:
ExecutionCore::ExecutionCore(Options opts)
: opts_(std::move(opts)),
stats_({{"handler", "ExecutionCore"}, {"party_id", opts_.party_id}}) {
SERVING_ENFORCE(!opts_.id.empty(), errors::ErrorCode::INVALID_ARGUMENT);
SERVING_ENFORCE(!opts_.party_id.empty(), errors::ErrorCode::INVALID_ARGUMENT);
SERVING_ENFORCE(opts_.executable, errors::ErrorCode::INVALID_ARGUMENT);
SERVING_ENFORCE(opts_.op_exec_workers_num > 0,
errors::ErrorCode::INVALID_ARGUMENT);
// thread pool initialization
ThreadPool::GetInstance()->Start(opts_.op_exec_workers_num);
// key: model input feature name
// value: source or predefined feature name
// predefined model features can be mapped in here; configurable in the init flow above
std::unordered_map<std::string, std::string> model_feature_mapping;
valid_feature_mapping_flag_ = false;
if (opts_.feature_mapping.has_value()) {
for (const auto& pair : opts_.feature_mapping.value()) {
if (pair.first != pair.second) {
valid_feature_mapping_flag_ = true;
}
SERVING_ENFORCE(
model_feature_mapping.emplace(pair.second, pair.first).second,
errors::ErrorCode::INVALID_ARGUMENT,
"found duplicate feature mapping value:{}", pair.second);
}
}
// read the Executable's input feature schema
const auto& model_input_schema = opts_.executable->GetInputFeatureSchema();
// feed it into source_schema_
if (model_feature_mapping.empty()) {
source_schema_ = model_input_schema;
} else {
arrow::SchemaBuilder builder;
int num_fields = model_input_schema->num_fields();
for (int i = 0; i < num_fields; ++i) {
const auto& f = model_input_schema->field(i);
auto iter = model_feature_mapping.find(f->name());
SERVING_ENFORCE(iter != model_feature_mapping.end(),
errors::ErrorCode::INVALID_ARGUMENT,
"can not found {} in feature mapping rule", f->name());
SERVING_CHECK_ARROW_STATUS(
builder.AddField(arrow::field(iter->second, f->type())));
}
SERVING_GET_ARROW_RESULT(builder.Finish(), source_schema_);
}
// initialize feature_adapter_
if (opts_.feature_source_config.has_value()) {
SPDLOG_INFO("create feature adapter, type:{}",
static_cast<int>(opts_.feature_source_config->options_case()));
feature_adapter_ = feature::FeatureAdapterFactory::GetInstance()->Create(
*opts_.feature_source_config, opts_.id, opts_.party_id, source_schema_);
}
}
FeatureSourceConfig is also a proto message:
// Config for a feature source
message FeatureSourceConfig {
oneof options {
MockOptions mock_opts = 1;
HttpOptions http_opts = 2;
CsvOptions csv_opts = 3;
}
}
As we can see, three source types are adapted, and requests are handled through the OnFetchFeature method, which we'll cover in the prediction phase.
Two brpc servers are instantiated in total. The first is the Prometheus metrics service; Prometheus is probably the most widely used monitoring framework today.
// start metrics server
if (opts_.server_config.metrics_exposer_port() > 0) {
std::vector<std::string> strs = absl::StrSplit(self_address, ':');
SERVING_ENFORCE(strs.size() == 2, errors::ErrorCode::LOGIC_ERROR,
"invalid self address.");
auto metrics_listen_address = fmt::format(
"{}:{}", strs[0], opts_.server_config.metrics_exposer_port());
brpc::ServerOptions metrics_server_options;
if (opts_.server_config.has_tls_config()) {
auto* ssl_opts = metrics_server_options.mutable_ssl_options();
ssl_opts->default_cert.certificate =
opts_.server_config.tls_config().certificate_path();
ssl_opts->default_cert.private_key =
opts_.server_config.tls_config().private_key_path();
ssl_opts->verify.verify_depth = 1;
ssl_opts->verify.ca_file_path =
opts_.server_config.tls_config().ca_file_path();
}
// @hint register the Prometheus metrics service
auto* metrics_service = new metrics::MetricsService();
metrics_service->RegisterCollectable(metrics::GetDefaultRegistry());
metrics_server_.set_version(SERVING_VERSION_STRING);
if (metrics_server_.AddService(metrics_service,
brpc::SERVER_OWNS_SERVICE) != 0) {
SERVING_THROW(errors::ErrorCode::UNEXPECTED_ERROR,
"fail to add metrics service into brpc server.");
}
if (metrics_server_.Start(metrics_listen_address.c_str(),
&metrics_server_options) != 0) {
SERVING_THROW(errors::ErrorCode::UNEXPECTED_ERROR,
"fail to start metrics server at {}", self_address);
}
SPDLOG_INFO("begin metrics service listen at {}, ", metrics_listen_address);
}
The model server comprises three services. The first is the model service; there is no dynamic model registration yet, it can only report status information about the current model and the like.
// build model_info_collector
// @hint model info collection
ModelInfoCollector::Options m_c_opts;
m_c_opts.model_bundle = model_bundle;
m_c_opts.service_id = opts_.service_id;
m_c_opts.self_party_id = self_party_id;
m_c_opts.remote_channel_map = channels;
ModelInfoCollector model_info_collector(std::move(m_c_opts));
{
auto max_retry_cnt =
opts_.cluster_config.channel_desc().handshake_max_retry_cnt();
if (max_retry_cnt != 0) {
model_info_collector.SetRetryCounts(max_retry_cnt);
}
auto retry_interval_ms =
opts_.cluster_config.channel_desc().handshake_retry_interval_ms();
if (retry_interval_ms != 0) {
model_info_collector.SetRetryIntervalMs(retry_interval_ms);
}
}
// add services
auto* model_service = new ModelServiceImpl(
{{opts_.service_id, model_info_collector.GetSelfModelInfo()}},
self_party_id);
// @hint register services with brpc
// model service; no dynamic model registration
if (service_server_.AddService(model_service, brpc::SERVER_OWNS_SERVICE) !=
0) {
SERVING_THROW(errors::ErrorCode::UNEXPECTED_ERROR,
"fail to add model service into brpc server.");
}
The interface looks like this:
// model service entry point
class ModelServiceImpl : public apis::ModelService {
public:
explicit ModelServiceImpl(std::map<std::string, ModelInfo> model_infos,
const std::string& self_party_id);
void GetModelInfo(::google::protobuf::RpcController* controller,
const apis::GetModelInfoRequest* request,
apis::GetModelInfoResponse* response,
::google::protobuf::Closure* done) override;
private:
struct Stats {
// for request api
::prometheus::Family<::prometheus::Counter>& api_request_counter_family;
::prometheus::Family<::prometheus::Summary>&
api_request_duration_summary_family;
explicit Stats(std::map<std::string, std::string> labels,
const std::shared_ptr<::prometheus::Registry>& registry =
metrics::GetDefaultRegistry());
};
void RecordMetrics(const apis::GetModelInfoRequest& request,
const apis::GetModelInfoResponse& response,
double duration_ms, const std::string& action);
The execution service is initialized with the execution_core; it has status, execution, and metrics interfaces:
auto* execution_service = new ExecutionServiceImpl(execution_core);
if (service_server_.AddService(execution_service,
brpc::SERVER_OWNS_SERVICE) != 0) {
SERVING_THROW(errors::ErrorCode::UNEXPECTED_ERROR,
"fail to add execution service into brpc server.");
}
class ExecutionServiceImpl : public apis::ExecutionService {
public:
explicit ExecutionServiceImpl(
const std::shared_ptr<ExecutionCore>& execution_core);
void Execute(::google::protobuf::RpcController* controller,
const apis::ExecuteRequest* request,
apis::ExecuteResponse* response,
::google::protobuf::Closure* done) override;
private:
void RecordMetrics(const apis::ExecuteRequest& request,
const apis::ExecuteResponse& response, double duration_ms,
const std::string& action);
struct Stats {
// for service interface
::prometheus::Family<::prometheus::Counter>& api_request_counter_family;
::prometheus::Family<::prometheus::Summary>&
api_request_duration_summary_family;
Stats(std::map<std::string, std::string> labels,
const std::shared_ptr<::prometheus::Registry>& registry =
metrics::GetDefaultRegistry());
};
The prediction service adds some preprocessing logic on top of the execution service, and in the end it also calls into execution_core.
After instantiating it, some server options are set up:
auto* prediction_service = new PredictionServiceImpl(self_party_id);
if (service_server_.AddService(prediction_service,
brpc::SERVER_OWNS_SERVICE) != 0) {
SERVING_THROW(errors::ErrorCode::UNEXPECTED_ERROR,
"fail to add prediction service into brpc server.");
}
// build services server opts
brpc::ServerOptions server_options;
server_options.max_concurrency = opts_.server_config.max_concurrency();
if (opts_.server_config.worker_num() > 0) {
server_options.num_threads = opts_.server_config.worker_num();
}
if (opts_.server_config.brpc_builtin_service_port() > 0) {
server_options.has_builtin_services = true;
server_options.internal_port =
opts_.server_config.brpc_builtin_service_port();
SPDLOG_INFO("internal port: {}", server_options.internal_port);
}
if (opts_.server_config.has_tls_config()) {
auto* ssl_opts = server_options.mutable_ssl_options();
ssl_opts->default_cert.certificate =
opts_.server_config.tls_config().certificate_path();
ssl_opts->default_cert.private_key =
opts_.server_config.tls_config().private_key_path();
ssl_opts->verify.verify_depth = 1;
ssl_opts->verify.ca_file_path =
opts_.server_config.tls_config().ca_file_path();
}
health::ServingHealthReporter hr;
server_options.health_reporter = &hr;
// start services server
service_server_.set_version(SERVING_VERSION_STRING);
if (service_server_.Start(self_address.c_str(), &server_options) != 0) {
SERVING_THROW(errors::ErrorCode::UNEXPECTED_ERROR,
"fail to start brpc server at {}", self_address);
}
After that, the prediction_core is set up; it only records some information. From this we can see that execution is single-party, while predict is not only multi-party but also has a leader/follower structure; prediction_core coordinates the inference flow across the parties.
PredictionCore::Options prediction_core_opts;
prediction_core_opts.service_id = opts_.service_id;
prediction_core_opts.party_id = self_party_id;
prediction_core_opts.cluster_ids = std::move(cluster_ids);
prediction_core_opts.predictor = predictor;
auto prediction_core =
std::make_shared<PredictionCore>(std::move(prediction_core_opts));
prediction_service->Init(prediction_core);
That wraps up the startup flow; control returns to main, which runs until the brpc server exits.
Now for serving requests. Let's first look at the proto definition of the execute request:
// Execute request containing one or more requests.
message ExecuteRequest {
// Custom data. The header will be passed to the downstream system which
// implement the feature service spi.
Header header = 1;
// Represents the id of the requesting party
string requester_id = 2;
// Model service specification.
// during execute this only needs to match the id defined in the ExecutionCore.
ServiceSpec service_spec = 3;
// Represents the session of this execute.
string session_id = 4;
FeatureSource feature_source = 5;
ExecutionTask task = 6;
}
FeatureSource defines the feature-fetching strategy:
// Support feature source type
enum FeatureSourceType {
UNKNOWN_FS_TYPE = 0;
// No need features.
FS_NONE = 1;
// Fetch features from feature service.
FS_SERVICE = 2;
// The feature is defined in the request.
FS_PREDEFINED = 3;
}
// Descriptive feature source
message FeatureSource {
// Identifies the source type of the features
FeatureSourceType type = 1;
// Custom parameter for fetch features from feature service or other systems.
// Valid when `type==FeatureSourceType::FS_SERVICE`
FeatureParam fs_param = 2;
// Defined features.
// Valid when `type==FeatureSourceType::FS_PREDEFINED`
repeated Feature predefineds = 3;
}
ExecutionTask specifies the execution id and the execution's inputs; the inputs are passed in serialized as NodeIo.
// Execute request task.
message ExecutionTask {
// Specified the execution id.
int32 execution_id = 1;
repeated NodeIo nodes = 2;
}
// The serialized data of the node input/output.
message IoData {
repeated bytes datas = 1;
}
// Represents the node input/output data.
message NodeIo {
// Node name.
string name = 1;
repeated IoData ios = 2;
}
Since prediction also calls into Execute, let's look at the Execution service first.
There are two ways to fetch features: remotely, or locally from the request.
std::shared_ptr<arrow::RecordBatch> features;
if (request->feature_source().type() ==
apis::FeatureSourceType::FS_SERVICE) {
SERVING_ENFORCE(
!request->feature_source().fs_param().query_datas().empty(),
errors::ErrorCode::INVALID_ARGUMENT,
"get empty feature service query datas.");
SERVING_ENFORCE(request->task().nodes().empty(),
errors::ErrorCode::LOGIC_ERROR);
features = BatchFetchFeatures(request, response);
} else if (request->feature_source().type() ==
apis::FeatureSourceType::FS_PREDEFINED) {
SERVING_ENFORCE(!request->feature_source().predefineds().empty(),
errors::ErrorCode::INVALID_ARGUMENT,
"get empty predefined features.");
SERVING_ENFORCE(request->task().nodes().empty(),
errors::ErrorCode::LOGIC_ERROR);
features = FeaturesToTable(request->feature_source().predefineds(),
source_schema_);
}
Remote fetching first:
std::shared_ptr<arrow::RecordBatch> ExecutionCore::BatchFetchFeatures(
const apis::ExecuteRequest* request,
apis::ExecuteResponse* response) const {
SERVING_ENFORCE(feature_adapter_, errors::ErrorCode::INVALID_ARGUMENT,
"feature source is not set, please check config.");
yacl::ElapsedTimer timer;
try {
feature::FeatureAdapter::Request fa_request;
fa_request.header = &request->header();
fa_request.fs_param = &request->feature_source().fs_param();
feature::FeatureAdapter::Response fa_response;
fa_response.header = response->mutable_header();
// fetch the features
feature_adapter_->FetchFeature(fa_request, &fa_response);
RecordBatchFeatureMetrics(request->service_spec().id(),
request->requester_id(), errors::ErrorCode::OK,
timer.CountMs());
return fa_response.features;
} catch (Exception& e) {
RecordBatchFeatureMetrics(request->service_spec().id(),
request->requester_id(), e.code(),
timer.CountMs());
throw e;
}
}
This is where feature_adapter_ performs the feature fetch. FetchFeature calls the OnFetchFeature implemented by the subclass. Taking HttpFeatureAdapter as the example: nothing special here, it simply performs an HTTP fetch via brpc:
void FeatureAdapter::FetchFeature(const Request& request, Response* response) {
OnFetchFeature(request, response);
CheckFeatureValid(request, response->features);
}
/** HttpFeatureAdapter's OnFetchFeature **/
void HttpFeatureAdapter::OnFetchFeature(const Request& request,
Response* response) {
auto request_body = SerializeRequest(request);
yacl::ElapsedTimer timer;
brpc::Controller cntl;
cntl.http_request().uri() = spec_.http_opts().endpoint();
cntl.http_request().set_method(brpc::HTTP_METHOD_POST);
cntl.http_request().set_content_type("application/json");
cntl.request_attachment().append(request_body);
channel_->CallMethod(NULL, &cntl, NULL, NULL, NULL);
SERVING_ENFORCE(!cntl.Failed(), errors::ErrorCode::NETWORK_ERROR,
"http request failed, endpoint:{}, detail:{}",
spec_.http_opts().endpoint(), cntl.ErrorText());
DeserializeResponse(cntl.response_attachment().to_string(), response);
}
The other kind is predefined features, where the request itself carries the features; they are read into Arrow. This is straightforward, so I won't dwell on it:
std::shared_ptr<arrow::RecordBatch> FeaturesToTable(
const ::google::protobuf::RepeatedPtrField<Feature>& features,
const std::shared_ptr<const arrow::Schema>& target_schema) {
arrow::SchemaBuilder schema_builder;
std::vector<std::shared_ptr<arrow::Array>> arrays;
int num_rows = -1;
for (const auto& field : target_schema->fields()) {
bool found = false;
for (const auto& f : features) {
if (f.field().name() == field->name()) {
FeatureToArrayVisitor visitor{.target_field = field, .array = {}};
FeatureVisit(visitor, f);
if (num_rows >= 0) {
SERVING_ENFORCE_EQ(
num_rows, visitor.array->length(),
"features must have same length value. {}:{}, others:{}",
f.field().name(), visitor.array->length(), num_rows);
}
num_rows = visitor.array->length();
arrays.emplace_back(visitor.array);
found = true;
break;
}
}
SERVING_ENFORCE(found, errors::ErrorCode::UNEXPECTED_ERROR,
"can not found feature:{} in response", field->name());
}
return MakeRecordBatch(target_schema, num_rows, std::move(arrays));
}
ApplyFeatureMappingRule converts between online features and model features.
It looks up each field's schema in the configured feature_mapping, then converts the features via arrow::RecordBatch::Make:
std::shared_ptr<arrow::RecordBatch> ExecutionCore::ApplyFeatureMappingRule(
const std::shared_ptr<arrow::RecordBatch>& features) {
if (features == nullptr || !valid_feature_mapping_flag_) {
// no need mapping
return features;
}
const auto& feature_mapping = opts_.feature_mapping.value();
int num_cols = features->num_columns();
const auto& old_schema = features->schema();
arrow::SchemaBuilder builder;
for (int i = 0; i < num_cols; ++i) {
auto field = old_schema->field(i);
auto iter = feature_mapping.find(field->name());
if (iter != feature_mapping.end()) {
field = arrow::field(iter->second, field->type());
}
SERVING_CHECK_ARROW_STATUS(builder.AddField(field));
}
std::shared_ptr<arrow::Schema> schema;
SERVING_GET_ARROW_RESULT(builder.Finish(), schema);
return MakeRecordBatch(schema, features->num_rows(), features->columns());
// MakeRecordBatch calls arrow::RecordBatch::Make(schema, num_rows, std::move(columns));
}
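MakeRecordBatch is a thin wrapper over that Arrow call, and the mapping is a metadata-only rebuild. Something like this sketch:
// Rebuild a RecordBatch with the same columns under a renamed schema.
// Columns are shared_ptr<arrow::Array>, so no data is copied; only the
// schema (the field names) changes.
std::shared_ptr<arrow::RecordBatch> RenameColumns(
    const std::shared_ptr<arrow::RecordBatch>& batch,
    const std::shared_ptr<arrow::Schema>& new_schema) {
  return arrow::RecordBatch::Make(new_schema, batch->num_rows(),
                                  batch->columns());
}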
First, a Task is instantiated from the request parameters; here the NodeIo arguments are converted into op::OpComputeInputs.
// executable run
Executable::Task task;
task.id = request->task().execution_id();
task.features = features;
task.node_inputs = std::make_shared<std::unordered_map<
std::string, std::shared_ptr<op::OpComputeInputs>>>();
for (const auto& n : request->task().nodes()) {
auto compute_inputs = std::make_shared<op::OpComputeInputs>();
for (const auto& io : n.ios()) {
std::vector<std::shared_ptr<arrow::RecordBatch>> inputs;
for (const auto& d : io.datas()) {
inputs.emplace_back(DeserializeRecordBatch(d));
}
compute_inputs->emplace_back(std::move(inputs));
}
task.node_inputs->emplace(n.name(), std::move(compute_inputs));
}
opts_.executable->Run(task);
Then execution is invoked: if features are present, prediction runs on them; otherwise node_inputs is used. Under the hood the features get converted into a std::unordered_map<std::string, std::shared_ptr<op::OpComputeInputs>> keyed by entry-node name:
void Executable::Run(Task& task) {
SERVING_ENFORCE(task.id < executors_.size(), errors::ErrorCode::LOGIC_ERROR);
auto executor = executors_[task.id];
if (task.features) {
task.outputs = executor->Run(task.features);
} else {
SERVING_ENFORCE(!task.node_inputs->empty(), errors::ErrorCode::LOGIC_ERROR);
task.outputs = executor->Run(*(task.node_inputs));
}
SPDLOG_DEBUG("Executable::Run end, task.outputs.size:{}",
task.outputs->size());
}
Let's look at Executor's Run methods first:
std::shared_ptr<std::vector<NodeOutput>> Executor::Run(
std::shared_ptr<arrow::RecordBatch>& features) {
SERVING_ENFORCE(execution_->IsEntry(), errors::ErrorCode::LOGIC_ERROR);
auto inputs =
std::unordered_map<std::string, std::shared_ptr<op::OpComputeInputs>>();
for (size_t i = 0; i < entry_node_names_.size(); ++i) {
auto op_inputs = std::make_shared<op::OpComputeInputs>();
std::vector<std::shared_ptr<arrow::RecordBatch>> record_list = {features};
op_inputs->emplace_back(std::move(record_list));
inputs.emplace(entry_node_names_[i], std::move(op_inputs));
}
return Run(inputs);
}
// 然后调用下面的run方法
std::shared_ptr<std::vector<NodeOutput>> Executor::Run(
std::unordered_map<std::string, std::shared_ptr<op::OpComputeInputs>>&
inputs) {
// entry nodes
std::vector<std::shared_ptr<op::OpComputeInputs>> entry_node_inputs;
for (const auto& node : execution_->GetEntryNodes()) {
auto iter = inputs.find(node->node_def().name());
SERVING_ENFORCE(iter != inputs.end(), errors::ErrorCode::INVALID_ARGUMENT,
"can not found inputs for node:{}",
node->node_def().name());
entry_node_inputs.emplace_back(iter->second);
}
// instantiate a scheduler
// every Executor holds an execution_, passed in at construction
auto sched = std::make_shared<ExecuteScheduler>(
node_items_, execution_->GetExitNodeNum(), ThreadPool::GetInstance(),
execution_);
// feed the entry nodes into the scheduler
const auto& entry_nodes = execution_->GetEntryNodes();
for (size_t i = 0; i != execution_->GetEntryNodeNum(); ++i) {
sched->AddEntryNode(entry_nodes[i], entry_node_inputs[i]);
}
sched->Schedule();
auto task_exception = sched->GetTaskException();
if (task_exception) {
SPDLOG_ERROR("Execution {} run with exception.", execution_->id());
std::rethrow_exception(task_exception);
}
SERVING_ENFORCE_EQ(sched->GetSchedCount(), execution_->nodes().size());
return std::make_shared<std::vector<NodeOutput>>(sched->GetResults());
}
The key piece here is the scheduler, whose constructor takes the node items, the expected result count, the thread pool, and the execution.
Let's read the scheduler's code next.
First, there is something interesting in the constructor:
ExecuteScheduler(
std::shared_ptr<
std::unordered_map<std::string, std::shared_ptr<NodeItem>>>
node_items,
uint64_t res_cnt, const std::shared_ptr<ThreadPool>& thread_pool,
std::shared_ptr<Execution> execution)
: node_items_(std::move(node_items)),
context_(res_cnt),
thread_pool_(thread_pool),
execution_(std::move(execution)),
propagator_(execution_->nodes()),
sched_count_(0) {}
Did you notice propagator_? Looking at its definition and implementation, it appears to hold the state information and the inputs/outputs of every node.
Something like a per-node bookkeeping structure for managing the graph:
struct ComputeContext {
// TODO: Session
OpComputeInputs inputs;
std::shared_ptr<arrow::RecordBatch> output;
};
struct FrameState {
std::atomic<int> pending_count;
op::ComputeContext compute_ctx;
};
class Propagator {
public:
explicit Propagator(
const std::unordered_map<std::string, std::shared_ptr<Node>>& nodes);
FrameState* GetFrame(const std::string& node_name);
private:
std::unordered_map<std::string, FrameState*> node_frame_map_;
std::vector<FrameState> frame_pool_;
};
Propagator::Propagator(
const std::unordered_map<std::string, std::shared_ptr<Node>>& nodes) {
frame_pool_ = std::vector<FrameState>(nodes.size());
size_t idx = 0;
for (auto& [node_name, node] : nodes) {
auto frame = &frame_pool_[idx++];
frame->pending_count = node->GetInputNum();
frame->compute_ctx.inputs.resize(frame->pending_count);
SERVING_ENFORCE(node_frame_map_.emplace(node_name, std::move(frame)).second,
errors::ErrorCode::LOGIC_ERROR);
}
}
FrameState* Propagator::GetFrame(const std::string& node_name) {
auto iter = node_frame_map_.find(node_name);
SERVING_ENFORCE(iter != node_frame_map_.end(), errors::ErrorCode::LOGIC_ERROR,
"can not found frame for node: {}", node_name);
return iter->second;
}
Keeping that question in mind, let's continue with the execution code.
Here is the scheduler's scheduling loop. The TODO comment mentions wanting to use bthread primitives so that a waiting worker could switch to other work, which is interesting; I've written about brpc before, see https://www.yuque.com/treblez/qksu6c/owbw5sm9xzmqv2qa?singleDoc# 《brpc:优秀代码鉴赏》 if you're curious.
There is no swap-out mechanism here, just a simple wait in place. ready_nodes_ is a ThreadSafeQueue; it's no lock-free queue, just plain mutex synchronization, so I won't paste its code.
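Its shape is the textbook mutex plus condition_variable queue (a sketch, not the serving code, which differs in details such as timed waits and stop support):
#include <condition_variable>
#include <mutex>
#include <queue>

template <typename T>
class ThreadSafeQueue {
 public:
  void Push(T v) {
    {
      std::lock_guard<std::mutex> lk(mu_);
      q_.push(std::move(v));
    }
    cv_.notify_one();
  }

  // Blocks until an element is available.
  void WaitPop(T& out) {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return !q_.empty(); });
    out = std::move(q_.front());
    q_.pop();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<T> q_;
};
Back to the scheduling loop: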
void Schedule() {
while (!stop_flag_.load() && !context_.IsFinish()) {
// TODO: consider use bthread::Mutex and bthread::ConditionVariable
// to make this worker can switch to others
std::shared_ptr<NodeItem> node_item;
// calls ready_nodes_.WaitPop(node_item);
// ready_nodes_ is a ThreadSafeQueue
if (!context_.GetReadyNode(node_item)) {
continue;
}
SubmitExecuteOpTask(node_item);
}
}
/** which calls into **/
void SubmitExecuteOpTask(std::shared_ptr<NodeItem>& node_item) {
if (stop_flag_.load()) {
return;
}
thread_pool_->SubmitTask(
std::make_unique<ExecuteOpTask>(node_item, shared_from_this()));
}
/** the following task class is submitted to the thread pool **/
class ExecuteOpTask : public ThreadPool::Task {
public:
const char* Name() override { return "ExecuteOpTask"; }
ExecuteOpTask(std::shared_ptr<NodeItem> node_item,
std::shared_ptr<ExecuteScheduler> sched)
: node_item_(std::move(node_item)), sched_(std::move(sched)) {}
// calls back into ExecuteOp
void Exec() override { sched_->ExecuteOp(node_item_); }
void OnException(std::exception_ptr e) noexcept override {
sched_->SetTaskException(e);
}
private:
std::shared_ptr<NodeItem> node_item_;
std::shared_ptr<ExecuteScheduler> sched_;
};
/** ExecuteOp **/
void ExecuteOp(const std::shared_ptr<NodeItem>& node_item) {
if (stop_flag_.load()) {
return;
}
auto* frame = propagator_.GetFrame(node_item->node->node_def().name());
// invoke the OpKernel's compute logic here
node_item->op_kernel->Compute(&(frame->compute_ctx));
sched_count_++;
if (execution_->IsExitNode(node_item->node->node_def().name())) {
context_.AddResult(node_item->node->node_def().name(),
frame->compute_ctx.output);
}
const auto& edges = node_item->node->out_edges();
for (const auto& edge : edges) {
CompleteOutEdge(edge, frame->compute_ctx.output);
}
}
As introduced earlier, OpKernel has three subclasses: ArrowProcessing, MergeY, and DotProduct.
Here we go through each one's runtime compute method.
Let's start with the most important, ArrowProcessing.
ComputeTrace is defined in proto and describes the functions to be executed.
message FunctionTrace {
// The Function name.
string name = 1;
// The serialized function options.
bytes option_bytes = 2;
// Inputs of this function.
repeated FunctionInput inputs = 3;
// Output of this function.
FunctionOutput output = 4;
}
message ComputeTrace {
// The name of this Compute.
string name = 1;
repeated FunctionTrace func_traces = 2;
}
// Function names are defined as follows:
enum ExtendFunctionName {
// Placeholder for proto3 default value, do not use it
UNKOWN_EX_FUNCTION_NAME = 0;
// Get colunm from table(record_batch).
// see
// https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4NK5arrow11RecordBatch6columnEi
EFN_TB_COLUMN = 1;
// Add colum to table(record_batch).
// see
// https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4NK5arrow11RecordBatch9AddColumnEiNSt6stringERKNSt10shared_ptrI5ArrayEE
EFN_TB_ADD_COLUMN = 2;
// Remove colunm from table(record_batch).
// see
// https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4NK5arrow11RecordBatch12RemoveColumnEi
EFN_TB_REMOVE_COLUMN = 3;
// Set colunm to table(record_batch).
// see
// https://arrow.apache.org/docs/cpp/api/table.html#_CPPv4NK5arrow11RecordBatch9SetColumnEiRKNSt10shared_ptrI5FieldEERKNSt10shared_ptrI5ArrayEE
EFN_TB_SET_COLUMN = 4;
}
// FunctionInput is defined as follows:
message FunctionInput {
oneof value {
// '0' means root input data
int32 data_id = 1;
Scalar custom_scalar = 2;
}
}
As we can see, these functions are just CRUD operations on Arrow record batches.
The construction logic is shown below: every function defined in compute_trace_ is placed into the execution list func_list_, which is later invoked in order.
switch (ex_func_name) {
case compute::ExtendFunctionName::EFN_TB_COLUMN: {
// look up a column
func_list_.emplace_back([](arrow::Datum& result_datum,
std::vector<arrow::Datum>& func_inputs) {
result_datum = func_inputs[0].record_batch()->column(
std::static_pointer_cast<arrow::Int64Scalar>(
func_inputs[1].scalar())
->value);
});
break;
}
case compute::ExtendFunctionName::EFN_TB_ADD_COLUMN: {
// add a column
func_list_.emplace_back([](arrow::Datum& result_datum,
std::vector<arrow::Datum>& func_inputs) {
int64_t index = std::static_pointer_cast<arrow::Int64Scalar>(
func_inputs[1].scalar())
->value;
std::string field_name(
std::static_pointer_cast<arrow::StringScalar>(
func_inputs[2].scalar())
->view());
std::shared_ptr<arrow::RecordBatch> new_batch;
SERVING_GET_ARROW_RESULT(
func_inputs[0].record_batch()->AddColumn(
index, std::move(field_name), func_inputs[3].make_array()),
new_batch);
result_datum = new_batch;
});
break;
}
case compute::ExtendFunctionName::EFN_TB_REMOVE_COLUMN: {
// remove a column
func_list_.emplace_back([](arrow::Datum& result_datum,
std::vector<arrow::Datum>& func_inputs) {
std::shared_ptr<arrow::RecordBatch> new_batch;
SERVING_GET_ARROW_RESULT(
func_inputs[0].record_batch()->RemoveColumn(
std::static_pointer_cast<arrow::Int64Scalar>(
func_inputs[1].scalar())
->value),
new_batch);
result_datum = new_batch;
});
break;
}
case compute::ExtendFunctionName::EFN_TB_SET_COLUMN: {
// set (replace) a column
func_list_.emplace_back([](arrow::Datum& result_datum,
std::vector<arrow::Datum>& func_inputs) {
int64_t index = std::static_pointer_cast<arrow::Int64Scalar>(
func_inputs[1].scalar())
->value;
std::string field_name(
std::static_pointer_cast<arrow::StringScalar>(
func_inputs[2].scalar())
->view());
std::shared_ptr<arrow::Array> array = func_inputs[3].make_array();
std::shared_ptr<arrow::RecordBatch> new_batch;
SERVING_GET_ARROW_RESULT(
func_inputs[0].record_batch()->SetColumn(
index, arrow::field(std::move(field_name), array->type()),
array),
new_batch);
result_datum = new_batch;
});
break;
}
default:
SERVING_THROW(errors::ErrorCode::UNEXPECTED_ERROR,
"invalid ext func name enum: {}",
static_cast<int>(ex_func_name));
}
Let's continue with the runtime logic, DoCompute. Somewhat oddly, it is split into two functions; together they just replay the calls in order, so there isn't much more to say.
void ArrowProcessing::DoCompute(ComputeContext* ctx) {
// sanity check
SERVING_ENFORCE(ctx->inputs.size() == 1, errors::ErrorCode::LOGIC_ERROR);
SERVING_ENFORCE(ctx->inputs.front().size() == 1,
errors::ErrorCode::LOGIC_ERROR);
if (dummy_flag_) {
ctx->output = ctx->inputs.front().front();
return;
}
SPDLOG_INFO("replay compute: {}", compute_trace_.name());
ctx->output = ReplayCompute(ctx->inputs.front().front());
}
std::shared_ptr<arrow::RecordBatch> ArrowProcessing::ReplayCompute(
const std::shared_ptr<arrow::RecordBatch>& input) {
std::map<int32_t, arrow::Datum> datas = {{0, input}};
arrow::Datum result_datum;
for (int i = 0; i < compute_trace_.func_traces_size(); ++i) {
const auto& func = compute_trace_.func_traces(i);
SPDLOG_DEBUG("replay func: {}", func.ShortDebugString());
auto func_inputs = BuildInputDatums(func.inputs(), datas);
func_list_[i](result_datum, func_inputs);
SERVING_ENFORCE(
datas.emplace(func.output().data_id(), std::move(result_datum)).second,
errors::ErrorCode::LOGIC_ERROR);
}
return datas[result_id_].record_batch();
}
True to its name, this OpKernel uses Eigen to compute a dot product; Arrow does not provide one:
void DotProduct::DoCompute(ComputeContext* ctx) {
SERVING_ENFORCE(ctx->inputs.size() == 1, errors::ErrorCode::LOGIC_ERROR);
SERVING_ENFORCE(ctx->inputs.front().size() == 1,
errors::ErrorCode::LOGIC_ERROR);
auto features = TableToMatrix(ctx->inputs.front().front());
Double::ColVec score_vec = features * weights_;
score_vec.array() += intercept_;
std::shared_ptr<arrow::Array> array;
arrow::DoubleBuilder builder;
for (int i = 0; i < score_vec.rows(); ++i) {
auto row = score_vec.row(i);
SERVING_CHECK_ARROW_STATUS(builder.AppendValues(row.data(), 1));
}
SERVING_CHECK_ARROW_STATUS(builder.Finish(&array));
ctx->output = MakeRecordBatch(output_schema_, score_vec.rows(), {array});
}
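An aside: since Double::ColVec is a dense Eigen column vector with contiguous storage, the per-row append loop could presumably be collapsed into one bulk append (an assumption about the layout, not a change the project makes):
// equivalent bulk append, relying on Eigen column vectors being
// contiguous in memory
SERVING_CHECK_ARROW_STATUS(
    builder.AppendValues(score_vec.data(), score_vec.rows()));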
MergeY merges the partial results of the parties:
void MergeY::DoCompute(ComputeContext* ctx) {
// sanity check
SERVING_ENFORCE(ctx->inputs.size() == 1, errors::ErrorCode::LOGIC_ERROR);
SERVING_ENFORCE(ctx->inputs.front().size() >= 1,
errors::ErrorCode::LOGIC_ERROR);
// merge partial_y
arrow::Datum incremented_datum(ctx->inputs.front()[0]->column(0));
for (size_t i = 1; i < ctx->inputs.front().size(); ++i) {
auto cur_array = ctx->inputs.front()[i]->column(0);
SERVING_GET_ARROW_RESULT(arrow::compute::Add(incremented_datum, cur_array),
incremented_datum);
}
auto merged_array = std::static_pointer_cast<arrow::DoubleArray>(
std::move(incremented_datum).make_array());
// apply link func
arrow::DoubleBuilder builder;
SERVING_CHECK_ARROW_STATUS(builder.Resize(merged_array->length()));
for (int64_t i = 0; i < merged_array->length(); ++i) {
auto score =
ApplyLinkFunc(merged_array->Value(i), link_function_) * yhat_scale_;
SERVING_CHECK_ARROW_STATUS(builder.Append(score));
}
std::shared_ptr<arrow::Array> res_array;
SERVING_CHECK_ARROW_STATUS(builder.Finish(&res_array));
ctx->output =
MakeRecordBatch(output_schema_, res_array->length(), {res_array});
}
Back in the scheduler: after a node's kernel finishes, ExecuteOp completes the node's out edges. CompleteOutEdge writes the output into the downstream node's frame and, once that node's pending_count drops to zero, pushes it onto the ready queue. This answers the earlier question about propagator_:
void CompleteOutEdge(const std::shared_ptr<Edge>& edge,
std::shared_ptr<arrow::RecordBatch> output) {
std::shared_ptr<Node> dst_node;
if (!execution_->TryGetNode(edge->dst_node(), &dst_node)) {
return;
}
auto* child_frame = propagator_.GetFrame(dst_node->GetName());
child_frame->compute_ctx.inputs[edge->dst_input_id()].emplace_back(
std::move(output));
if (child_frame->pending_count.fetch_sub(1) == 1) {
context_.AddReadyNode(
node_items_->find(dst_node->node_def().name())->second);
}
}
Finally, the prediction service. These are the feature-related proto definitions used by PredictRequest:
// The value of a feature
message FeatureValue {
// int list
repeated int32 i32s = 1;
repeated int64 i64s = 2;
// float list
repeated float fs = 3;
repeated double ds = 4;
// string list
repeated string ss = 5;
// bool list
repeated bool bs = 6;
}
// The definition of a feature field.
message FeatureField {
// Unique name of the feature
string name = 1;
// Field type of the feature
FieldType type = 2;
}
// The definition of a feature
message Feature {
FeatureField field = 1;
FeatureValue value = 2;
}
message PredictRequest {
// Custom data. The header will be passed to the downstream system which
// implement the feature service spi.
Header header = 1;
// Model service specification.
ServiceSpec service_spec = 2;
// The params for fetch features. Note that this should include all the
// parties involved in the prediction.
// Key: party's id.
// Value: params for fetch features.
map<string, FeatureParam> fs_params = 3;
// Optional.
// If defined, the request party will no longer query for the feature but will
// use defined fetures in `predefined_features` for the prediction.
repeated Feature predefined_features = 4;
}
Prediction is just Execute plus a round of multi-party communication.
It uses the rpc channels initialized during startup to kick off the peers' execute, runs its own part, and finally waits for the other parties' execute to finish.
void Predictor::Predict(const apis::PredictRequest* request,
apis::PredictResponse* response) {
std::unordered_map<std::string, std::shared_ptr<apis::NodeIo>>
prev_node_io_map;
std::vector<std::shared_ptr<RemoteExecute>> async_running_execs;
async_running_execs.reserve(opts_.channels->size());
auto execute_locally =
[&](const std::shared_ptr<Execution>& execution,
std::unordered_map<std::string, std::shared_ptr<apis::NodeIo>>&
prev_io_map,
std::unordered_map<std::string, std::shared_ptr<apis::NodeIo>>&
cur_io_map) {
// exec locally
auto local_exec = BuildLocalExecute(request, response, execution);
local_exec->SetInputs(std::move(prev_io_map));
local_exec->Run();
local_exec->GetOutputs(&cur_io_map);
};
for (const auto& e : opts_.executions) {
async_running_execs.clear();
std::unordered_map<std::string, std::shared_ptr<apis::NodeIo>>
new_node_io_map;
if (e->GetDispatchType() == DispatchType::DP_ALL) {
for (const auto& [party_id, channel] : *opts_.channels) {
auto ctx = BuildRemoteExecute(request, response, e, party_id, channel);
ctx->SetInputs(prev_node_io_map);
ctx->Run();
async_running_execs.emplace_back(ctx);
}
// exec locally
if (execution_core_) {
execute_locally(e, prev_node_io_map, new_node_io_map);
for (auto& exec : async_running_execs) {
exec->WaitToFinish();
exec->GetOutputs(&new_node_io_map);
}
} else {
// TODO: support no execution core scene
SERVING_THROW(errors::ErrorCode::NOT_IMPLEMENTED, "not implemented");
}
} else if (e->GetDispatchType() == DispatchType::DP_ANYONE) {
// exec locally
if (execution_core_) {
execute_locally(e, prev_node_io_map, new_node_io_map);
} else {
// TODO: support no execution core scene
SERVING_THROW(errors::ErrorCode::NOT_IMPLEMENTED, "not implemented");
}
} else if (e->GetDispatchType() == DispatchType::DP_SPECIFIED) {
if (e->SpecificToThis()) {
SERVING_ENFORCE(execution_core_, errors::ErrorCode::UNEXPECTED_ERROR);
execute_locally(e, prev_node_io_map, new_node_io_map);
} else {
auto iter = opts_.specific_party_map.find(e->id());
SERVING_ENFORCE(iter != opts_.specific_party_map.end(),
serving::errors::LOGIC_ERROR,
"{} execution assign to no party", e->id());
auto ctx = BuildRemoteExecute(request, response, e, iter->second,
opts_.channels->at(iter->second));
ctx->SetInputs(prev_node_io_map);
ctx->Run();
ctx->WaitToFinish();
ctx->GetOutputs(&new_node_io_map);
}
} else {
SERVING_THROW(errors::ErrorCode::UNEXPECTED_ERROR,
"unsupported dispatch type: {}",
DispatchType_Name(e->GetDispatchType()));
}
prev_node_io_map.swap(new_node_io_map);
}
DealFinalResult(prev_node_io_map, response);
}
Calls on a RemoteExecute eventually land in the code below, which simply invokes Execute on the remote party.
void ExecuteContext::Execute(
std::shared_ptr<::google::protobuf::RpcChannel> channel,
brpc::Controller* cntl) {
apis::ExecutionService_Stub stub(channel.get());
stub.Execute(cntl, &exec_req_, &exec_res_, brpc::DoNothing());
}