weixin_30367169

XGBoost Source Code Study

A walkthrough of the official code structure, starting from README.md

For regression, XGBoost uses the squared error loss.

For classification, it uses the negative log-likelihood (logistic) loss.
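
For reference, the two losses and the derivatives that the objective code computes can be written out as below. The symbols ($y$ for the label, $\hat{y}$ for the raw prediction, $p$ for the sigmoid of $\hat{y}$, $g$ and $h$ for the first and second derivatives with respect to $\hat{y}$) are notation assumed here for illustration, and the factor $\tfrac12$ is a convention chosen so that $g$ matches the `predt - label` returned by `FirstOrderGradient`.

```latex
% squared error (regression)
l(y,\hat{y}) = \tfrac{1}{2}(\hat{y}-y)^2, \qquad
g = \hat{y} - y, \qquad h = 1

% negative log-likelihood (logistic), with p = 1 / (1 + e^{-\hat{y}})
l(y,\hat{y}) = -\bigl[\,y \ln p + (1-y)\ln(1-p)\,\bigr], \qquad
g = p - y, \qquad h = p\,(1-p)
```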

Coding Guide
======
This file is intended to be notes about code structure in xgboost

Project Logical Layout // dependency order: io -> learner (computes gradients and passes them to gbm) -> gbm (gradient boosting) -> tree (tree construction algorithms)
=======
* Dependency order: io->learner->gbm->tree
  - All modules depend on data.h
* tree contains implementations of tree construction algorithms.
* gbm is the gradient boosting interface, which takes trees and other base learners to do boosting.
  - gbm only takes gradients as sufficient statistics; it does not compute the gradients.
* learner is the learning module that computes gradients for a specific objective and passes them to GBM

File Naming Convention // .h files define data structures and interfaces, .hpp files implement them
======= 
* .h files contain the data structures and interfaces that are needed to use functions in that layer.
* -inl.hpp files are implementations of the interfaces, like the cpp files in most projects.
  - You only need to understand the interface file to understand the usage of that layer
* In each folder, there can be a .cpp file that compiles the module of that layer

How to Hack the Code // defining and modifying objective functions
======
* Add objective function: add to learner/objective-inl.hpp and register it in learner/objective.h ```CreateObjFunction``` 
  - You can also directly do it in python
* Add new evaluation metric: add to learner/evaluation-inl.hpp and register it in learner/evaluation.h ```CreateEvaluator``` 
* Add wrapper for a new language, most likely you can do it by taking the functions in python/xgboost_wrapper.h, which is purely C based, and call these C functions to use xgboost
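
As a rough illustration of what "adding an objective" amounts to, the sketch below computes the gradient pair for a squared log error loss. It is a stand-alone example under stated assumptions, not code from xgboost: `GradPair` is a hypothetical stand-in for `bst_gpair`, `SquaredLogErrorGradient` is a made-up name, and labels are assumed to be non-negative and equal in number to the predictions.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for xgboost's gradient pair type (bst_gpair).
struct GradPair {
  float grad;  // first-order gradient
  float hess;  // second-order gradient
};

// Squared log error: l = 0.5 * (log1p(pred) - log1p(label))^2.
// Illustrative objective only; it does not appear in the listing in this post.
inline void SquaredLogErrorGradient(const std::vector<float> &preds,
                                    const std::vector<float> &labels,
                                    std::vector<GradPair> *out_gpair) {
  std::vector<GradPair> &gpair = *out_gpair;
  gpair.resize(preds.size());
  for (std::size_t i = 0; i < preds.size(); ++i) {
    // keep the prediction inside the domain of log1p
    const float p = std::max(preds[i], -1.0f + 1e-6f);
    const float diff = std::log1p(p) - std::log1p(labels[i]);
    // dl/dp = diff / (1 + p)
    gpair[i].grad = diff / (p + 1.0f);
    // d2l/dp2 = (1 - diff) / (1 + p)^2, floored so it stays positive
    gpair[i].hess = std::max((1.0f - diff) / ((p + 1.0f) * (p + 1.0f)), 1e-6f);
  }
}
```

In the real code base the same computation would live in a class derived from `IObjFunction` and be registered by name in `CreateObjFunction`, as the bullet above describes.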

XGBoost: eXtreme Gradient Boosting

An optimized general purpose gradient boosting library. The library is parallelized, and also provides an optimized distributed version.
It implements machine learning algorithms under the gradient boosting framework, including the generalized linear model and gradient boosted regression trees (GBDT). XGBoost can also be distributed and scales to terascale data.

 

  The UpdateOneIter flow consists of the following main steps:

  1. LazyInitDMatrix(train); 
  2. PredictRaw(train, &preds_); 
  3. obj_->GetGradient(preds_, train->info(), iter, &gpair_); 
  4. gbm_->DoBoost(train, &gpair_, obj_.get());
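
The four steps above can be sketched with a toy learner. This is a minimal illustration under loud assumptions, not xgboost's implementation: each "tree" is a single constant leaf, the objective is squared error, `GradPair` and `ToyLearner` are hypothetical names, the leaf weight uses the usual `-G / (H + lambda)` formula with no learning rate, and step 1 (`LazyInitDMatrix`) is omitted because the labels are already in memory.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for xgboost's gradient pair (bst_gpair).
struct GradPair {
  float grad;
  float hess;
};

// Toy learner whose "trees" are single leaves (one constant each).
struct ToyLearner {
  std::vector<float> leaves;  // one leaf weight per boosting round

  // Step 2: the raw prediction is the sum of all leaf weights so far.
  std::vector<float> PredictRaw(std::size_t nrow) const {
    float sum = 0.0f;
    for (std::size_t t = 0; t < leaves.size(); ++t) sum += leaves[t];
    return std::vector<float>(nrow, sum);
  }

  // Step 3: squared-error objective, g = pred - label and h = 1.
  static std::vector<GradPair> GetGradient(const std::vector<float> &preds,
                                           const std::vector<float> &labels) {
    std::vector<GradPair> gpair(preds.size());
    for (std::size_t i = 0; i < preds.size(); ++i) {
      gpair[i].grad = preds[i] - labels[i];
      gpair[i].hess = 1.0f;
    }
    return gpair;
  }

  // Step 4: the booster sees only gradient statistics, never the labels.
  // Leaf weight w = -G / (H + lambda); lambda is an assumed L2 penalty.
  void DoBoost(const std::vector<GradPair> &gpair, float lambda) {
    float G = 0.0f, H = 0.0f;
    for (std::size_t i = 0; i < gpair.size(); ++i) {
      G += gpair[i].grad;
      H += gpair[i].hess;
    }
    leaves.push_back(-G / (H + lambda));
  }

  // One boosting round; labels are assumed to be non-empty.
  void UpdateOneIter(const std::vector<float> &labels) {
    const std::vector<float> preds = PredictRaw(labels.size());
    const std::vector<GradPair> gpair = GetGradient(preds, labels);
    DoBoost(gpair, 0.0f);
  }
};
```

With labels 1, 2, 3 the first round adds a leaf equal to their mean, and a second round adds a zero leaf because the gradients then sum to zero. The split of responsibilities mirrors the README: the objective produces gradients, and the booster consumes them.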

The objective.h file

#ifndef XGBOOST_LEARNER_OBJECTIVE_H_
#define XGBOOST_LEARNER_OBJECTIVE_H_
/*!
 * \file objective.h
 * \brief interface of objective function used for gradient boosting
 * \author Tianqi Chen, Kailong Chen
 */
#include "dmatrix.h"

namespace xgboost {
namespace learner {
/*! \brief interface of objective function */
class IObjFunction{/// base class definition for all objective functions
 public:
  /*! \brief virtual destructor */
  virtual ~IObjFunction(void){} /// virtual destructor, releases resources
  /*!
   * \brief set parameters from outside
   * \param name name of the parameter
   * \param val value of the parameter
   */
  virtual void SetParam(const char *name, const char *val) = 0;  /// parameter name and parameter value
  /*!
   * \brief get gradient over each of predictions, given existing information
   * \param preds prediction of current round
   * \param info information about labels, weights, groups in rank
   * \param iter current iteration number
   * \param out_gpair output of get gradient, saves gradient and second order gradient in
   */
  virtual void GetGradient(const std::vector<float> &preds,
                           const MetaInfo &info,
                           int iter,
                           std::vector<bst_gpair> *out_gpair) = 0; /// compute the gradients
  /*! \return the default evaluation metric for the objective */
  virtual const char* DefaultEvalMetric(void) const = 0; /// default evaluation metric
  // the following functions are optional, most of time default implementation is good enough
  /*!
   * \brief transform prediction values, this is only called when Prediction is called
   * \param io_preds prediction values, saves to this vector as well
   */
  virtual void PredTransform(std::vector<float> *io_preds){}
  /*!
   * \brief transform prediction values, this is only called when Eval is called, 
   *  usually it redirect to PredTransform
   * \param io_preds prediction values, saves to this vector as well
   */
  virtual void EvalTransform(std::vector<float> *io_preds) {
    this->PredTransform(io_preds);
  }
  /*!
   * \brief transform probability value back to margin
   * this is used to transform user-set base_score back to margin 
   * used by gradient boosting
   * \return transformed value
   */
  virtual float ProbToMargin(float base_score) const {
    return base_score;
  }
};
}  // namespace learner
}  // namespace xgboost

// these are implementations of objective functions  /// the objective functions are implemented in the .hpp file
#include "objective-inl.hpp"
// factory function
namespace xgboost {
namespace learner {
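/*! \brief factory function to create objective function by name */
inline IObjFunction* CreateObjFunction(const char *name) { /// selects which objective implementation to instantiate from the name passed in
  using namespace std;
  /// implemented by the RegLossObj class; each constructor argument selects a different loss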
  if (!strcmp("reg:linear", name)) return new RegLossObj(LossType::kLinearSquare);
  if (!strcmp("reg:logistic", name)) return new RegLossObj(LossType::kLogisticNeglik);
  if (!strcmp("binary:logistic", name)) return new RegLossObj(LossType::kLogisticClassify);
  if (!strcmp("binary:logitraw", name)) return new RegLossObj(LossType::kLogisticRaw);

  /// implemented by the PoissonRegression class
  if (!strcmp("count:poisson", name)) return new PoissonRegression();

  /// implemented by the SoftmaxMultiClassObj class
  if (!strcmp("multi:softmax", name)) return new SoftmaxMultiClassObj(0);
  if (!strcmp("multi:softprob", name)) return new SoftmaxMultiClassObj(1);

  /// implemented by PairwiseRankObj, LambdaRankObjNDCG and LambdaRankObjMAP respectively (all derived from LambdaRankObj)
  if (!strcmp("rank:pairwise", name )) return new PairwiseRankObj();
  if (!strcmp("rank:ndcg", name)) return new LambdaRankObjNDCG();
  if (!strcmp("rank:map", name)) return new LambdaRankObjMAP();  
  utils::Error("unknown objective function type: %s", name);
  return NULL;
}
}  // namespace learner
}  // namespace xgboost
#endif  // XGBOOST_LEARNER_OBJECTIVE_H_
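
The interface above distinguishes raw margins (what the booster works with) from transformed predictions (what the user sees). For the logistic objectives the two directions are the sigmoid and its inverse, the logit. The sketch below shows the round trip; `MarginToProb` is a hypothetical name chosen here, while `ProbToMargin` mirrors the method of the same name and assumes its input lies strictly inside (0, 1).

```cpp
#include <cmath>

// Sigmoid: raw margin -> probability.
inline float MarginToProb(float margin) {
  return 1.0f / (1.0f + std::exp(-margin));
}

// Logit: probability in (0, 1) -> raw margin.
// Used to turn a user-supplied base_score into the margin the booster starts from.
inline float ProbToMargin(float prob) {
  return -std::log(1.0f / prob - 1.0f);
}
```

A base_score of 0.5 therefore corresponds to a starting margin of 0, and applying `MarginToProb` to the result of `ProbToMargin` recovers the original probability up to rounding.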


/// .h files define data structures and interfaces; .hpp files implement the interfaces

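/*
/// Eight objective definitions; each objective produces a different kind of output
“reg:linear”: linear regression.
“reg:logistic”: logistic regression.
“binary:logistic”: logistic regression for binary classification; the output is a probability.
“binary:logitraw”: logistic regression for binary classification; the output is the raw score wTx before the logistic transformation.
“count:poisson”: Poisson regression for count data; the output is the mean of the Poisson distribution. In Poisson regression the default value of max_delta_step is 0.7 (used to safeguard optimization). Note that in the version listed here the constructor initializes max_delta_step to 0 and GetGradient requires it to be set explicitly.
“multi:softmax”: makes XGBoost use the softmax objective for multiclass classification; the parameter num_class (number of classes) must also be set.
“multi:softprob”: same as softmax, but the output is a vector of ndata * nclass values, which can be reshaped into a matrix with ndata rows and nclass columns. Each row holds the probabilities of that sample belonging to each class.
“rank:pairwise”: set XGBoost to do ranking task by minimizing the pairwise loss; metrics such as AUC are pairwise in this sense.
*/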
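
To make the “multi:softprob” layout concrete, the sketch below reads a flat vector of ndata * nclass probabilities as rows of nclass values and picks the most probable class per row. It assumes the row-major layout `preds[i * nclass + k]` used in the listing; `ArgMaxPerRow` is a hypothetical helper name, not an xgboost function.

```cpp
#include <cstddef>
#include <vector>

// Index of the most probable class for each row of a flat, row-major
// ndata * nclass probability vector (nclass is assumed to be positive).
inline std::vector<int> ArgMaxPerRow(const std::vector<float> &flat, int nclass) {
  const std::size_t ndata = flat.size() / nclass;
  std::vector<int> out;
  out.reserve(ndata);
  for (std::size_t i = 0; i < ndata; ++i) {
    int best = 0;
    for (int k = 1; k < nclass; ++k) {
      if (flat[i * nclass + k] > flat[i * nclass + best]) best = k;
    }
    out.push_back(best);
  }
  return out;
}
```

Applied to softprob output, this yields the same kind of per-sample class index that “multi:softmax” returns directly.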


The objective-inl.hpp file:

#ifndef XGBOOST_LEARNER_OBJECTIVE_INL_HPP_
#define XGBOOST_LEARNER_OBJECTIVE_INL_HPP_
/*!
 * \file objective-inl.hpp
 * \brief objective function implementations
 * \author Tianqi Chen, Kailong Chen
 */


/// For how the objective function is solved, see: https://www.cnblogs.com/harvey888/p/7203256.html
/// Algorithm principles: http://wepon.me/files/gbdt.pdf
/// Derivation and analysis of the objective function: https://blog.csdn.net/yuxeaotao/article/details/90378782
/// https://blog.csdn.net/a819825294/article/details/51206410

/// Source code flow: https://blog.csdn.net/matrix_zzl/article/details/78699605
/// Main functions in the source code: https://blog.csdn.net/weixin_39750084/article/details/83244191


#include <vector>
#include <algorithm>
#include <utility>
#include <cmath>
#include <functional>
#include "../data.h"
#include "./objective.h"
#include "./helper_utils.h"
#include "../utils/random.h"
#include "../utils/omp.h"

namespace xgboost {
namespace learner {/// implements some commonly used computations, defined as inline
/*! \brief defines functions to calculate some commonly used functions */
struct LossType {
  /*! \brief indicate which type we are using */
  int loss_type;
  // list of constants
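  static const int kLinearSquare = 0; /// linear regression
  static const int kLogisticNeglik = 1; /// logistic regression, outputs a probability
  static const int kLogisticClassify = 2; /// binary classification, outputs a probability
  static const int kLogisticRaw = 3; /// outputs the raw score; applying the sigmoid to it gives the same probability as the two types above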
  /*!
   * \brief transform the linear sum to prediction
   * \param x linear sum of boosting ensemble
   * \return transformed prediction
   */
  inline float PredTransform(float x) const {/// types 0 and 3 return x unchanged; types 1 and 2 apply the sigmoid
    switch (loss_type) {
      case kLogisticRaw:
      case kLinearSquare: return x;
      case kLogisticClassify:
      case kLogisticNeglik: return 1.0f / (1.0f + std::exp(-x));
      default: utils::Error("unknown loss_type"); return 0.0f;
    }
  }
  /*!
   * \brief check if label range is valid
   */
  inline bool CheckLabel(float x) const {/// check whether the label is valid
    if (loss_type != kLinearSquare) {
      return x >= 0.0f && x <= 1.0f;
    }
    return true;
  }
  /*!
   * \brief error message displayed when check label fail
   */
  inline const char * CheckLabelErrorMsg(void) const {
    if (loss_type != kLinearSquare) {
      return "label must be in [0,1] for logistic regression";
    } else {
      return "";
    }
  }
  /*!
   * \brief calculate first order gradient of loss, given transformed prediction
   * \param predt transformed prediction
   * \param label true label
   * \return first order gradient
   */
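  inline float FirstOrderGradient(float predt, float label) const {/// first derivative for each loss type; kLogisticClassify and kLogisticNeglik return the same value, and kLogisticRaw applies the sigmoid first and then intentionally falls through to them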
    switch (loss_type) {
      case kLinearSquare: return predt - label;
      case kLogisticRaw: predt = 1.0f / (1.0f + std::exp(-predt));
      case kLogisticClassify:
      case kLogisticNeglik: return predt - label;
      default: utils::Error("unknown loss_type"); return 0.0f;
    }
  }
  /*!
   * \brief calculate second order gradient of loss, given transformed prediction
   * \param predt transformed prediction
   * \param label true label
   * \return second order gradient
   */
  inline float SecondOrderGradient(float predt, float label) const {/// compute the second derivative
    // cap second order gradient to positive value
    const float eps = 1e-16f;
    switch (loss_type) {
      case kLinearSquare: return 1.0f;
      case kLogisticRaw: predt = 1.0f / (1.0f + std::exp(-predt));
      case kLogisticClassify:
      case kLogisticNeglik: return std::max(predt * (1.0f - predt), eps); /// floor the second derivative at eps so it stays positive
      default: utils::Error("unknown loss_type"); return 0.0f;
    }
  }
  /*!
   * \brief transform probability value back to margin
   */
  inline float ProbToMargin(float base_score) const {/// convert a probability into a margin with the logit (inverse sigmoid)
    if (loss_type == kLogisticRaw ||
        loss_type == kLogisticClassify ||
        loss_type == kLogisticNeglik ) {
      utils::Check(base_score > 0.0f && base_score < 1.0f,
                   "base_score must be in (0,1) for logistic loss");
      base_score = -std::log(1.0f / base_score - 1.0f);
    }
    return base_score;
  }
  /*! \brief get default evaluation metric for the objective */
  inline const char *DefaultEvalMetric(void) const {/// default evaluation metric
    if (loss_type == kLogisticClassify) return "error";
    if (loss_type == kLogisticRaw) return "auc";
    return "rmse";
  }
};

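/*! \brief objective function that only need to */  /// regression-style losses (linear and logistic)
class RegLossObj : public IObjFunction {/// IObjFunction is declared in objective.h
 public:
  explicit RegLossObj(int loss_type) {/// explicit prevents implicit conversion from int; as a rule, single-argument constructors should be marked explicit to avoid such mistakes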
    loss.loss_type = loss_type;
    scale_pos_weight = 1.0f;
  }
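  virtual ~RegLossObj(void) {}/// virtual destructor, so that deleting through a base-class pointer also runs the derived destructor
  virtual void SetParam(const char *name, const char *val) {/// virtual function, enabling polymorphism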
    using namespace std;
    if (!strcmp("scale_pos_weight", name)) {
      scale_pos_weight = static_cast<float>(atof(val));
    }
  }
  virtual void GetGradient(const std::vector<float> &preds,
                           const MetaInfo &info,
                           int iter,
                           std::vector<bst_gpair> *out_gpair) {
    utils::Check(info.labels.size() != 0, "label set cannot be empty");
    utils::Check(preds.size() % info.labels.size() == 0,
                 "labels are not correctly provided");
    std::vector<bst_gpair> &gpair = *out_gpair;
    gpair.resize(preds.size());
    // check if label in range
    bool label_correct = true;
    // start calculating gradient
    const unsigned nstep = static_cast<unsigned>(info.labels.size());
    const bst_omp_uint ndata = static_cast<bst_omp_uint>(preds.size());
    #pragma omp parallel for schedule(static) /// run the loop below on multiple threads with static scheduling
    for (bst_omp_uint i = 0; i < ndata; ++i) {
      const unsigned j = i % nstep;
      float p = loss.PredTransform(preds[i]);
      float w = info.GetWeight(j);
      if (info.labels[j] == 1.0f) w *= scale_pos_weight;
      if (!loss.CheckLabel(info.labels[j])) label_correct = false;
      gpair[i] = bst_gpair(loss.FirstOrderGradient(p, info.labels[j]) * w,
                           loss.SecondOrderGradient(p, info.labels[j]) * w);
    }
    utils::Check(label_correct, loss.CheckLabelErrorMsg());
  }
  virtual const char* DefaultEvalMetric(void) const {
    return loss.DefaultEvalMetric();
  }
  virtual void PredTransform(std::vector<float> *io_preds) {
    std::vector<float> &preds = *io_preds;
    const bst_omp_uint ndata = static_cast<bst_omp_uint>(preds.size());
    #pragma omp parallel for schedule(static)
    for (bst_omp_uint j = 0; j < ndata; ++j) {
      preds[j] = loss.PredTransform(preds[j]);
    }
  }
  virtual float ProbToMargin(float base_score) const {
    return loss.ProbToMargin(base_score);
  }
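/// protected members can be accessed by this class's member functions, by derived classes and by friends, but not through an object of the class from outside code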
 protected:
  float scale_pos_weight;
  LossType loss;
};

// poisson regression for count   /// Poisson regression
class PoissonRegression : public IObjFunction {
 public:
  explicit PoissonRegression(void) {
    max_delta_step = 0.0f;
  }
  virtual ~PoissonRegression(void) {}
  
  virtual void SetParam(const char *name, const char *val) {
    using namespace std;
    if (!strcmp( "max_delta_step", name )) {
      max_delta_step = static_cast<float>(atof(val));
    }
  }
  virtual void GetGradient(const std::vector<float> &preds,
                           const MetaInfo &info,
                           int iter,
                           std::vector<bst_gpair> *out_gpair) {
    utils::Check(max_delta_step != 0.0f,
                 "PoissonRegression: need to set max_delta_step");
    utils::Check(info.labels.size() != 0, "label set cannot be empty");
    utils::Check(preds.size() == info.labels.size(),
                 "labels are not correctly provided");
    std::vector<bst_gpair> &gpair = *out_gpair;
    gpair.resize(preds.size());
    // check if label in range
    bool label_correct = true;
    // start calculating gradient
    const long ndata = static_cast<long>(preds.size());
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < ndata; ++i) {
      float p = preds[i];
      float w = info.GetWeight(i);
      float y = info.labels[i];
      if (y >= 0.0f) {
        gpair[i] = bst_gpair((std::exp(p) - y) * w,
                             std::exp(p + max_delta_step) * w);
      } else {
        label_correct = false;
      }
    }
    utils::Check(label_correct,
                 "PoissonRegression: label must be nonnegative");
  }
  virtual void PredTransform(std::vector<float> *io_preds) {
    std::vector<float> &preds = *io_preds;
    const long ndata = static_cast<long>(preds.size());
    #pragma omp parallel for schedule(static)
    for (long j = 0; j < ndata; ++j) {
      preds[j] = std::exp(preds[j]);
    }
  }
  virtual void EvalTransform(std::vector<float> *io_preds) {
    PredTransform(io_preds);
  }
  virtual float ProbToMargin(float base_score) const {
    return std::log(base_score);
  }
  virtual const char* DefaultEvalMetric(void) const {
    return "poisson-nloglik";
  }
  
 private: /// private members can be accessed only by this class's member functions and friends, not by derived classes or outside code
  float max_delta_step;
};

// softmax multi-class classification   /// multiclass classification
class SoftmaxMultiClassObj : public IObjFunction {
 public:
  explicit SoftmaxMultiClassObj(int output_prob)
      : output_prob(output_prob) {
    nclass = 0;
  }
  virtual ~SoftmaxMultiClassObj(void) {}
  virtual void SetParam(const char *name, const char *val) {
    using namespace std;
    if (!strcmp( "num_class", name )) nclass = atoi(val);
  }
  virtual void GetGradient(const std::vector<float> &preds,
                           const MetaInfo &info,
                           int iter,
                           std::vector<bst_gpair> *out_gpair) {
    utils::Check(nclass != 0, "must set num_class to use softmax");
    utils::Check(info.labels.size() != 0, "label set cannot be empty");
    utils::Check(preds.size() % (static_cast<size_t>(nclass) * info.labels.size()) == 0,
                 "SoftmaxMultiClassObj: label size and pred size does not match");
    std::vector<bst_gpair> &gpair = *out_gpair;
    gpair.resize(preds.size());
    const unsigned nstep = static_cast<unsigned>(info.labels.size() * nclass);
    const bst_omp_uint ndata = static_cast<bst_omp_uint>(preds.size() / nclass);
    int label_error = 0;
    #pragma omp parallel
    {
      std::vector<float> rec(nclass);
      #pragma omp for schedule(static)
      for (bst_omp_uint i = 0; i < ndata; ++i) {
        for (int k = 0; k < nclass; ++k) {
          rec[k] = preds[i * nclass + k];
        }
        Softmax(&rec);
        const unsigned j = i % nstep;
        int label = static_cast<int>(info.labels[j]);
        if (label < 0 || label >= nclass)  {
          label_error = label; label = 0;
        }
        const float wt = info.GetWeight(j);
        for (int k = 0; k < nclass; ++k) {
          float p = rec[k];
          const float h = 2.0f * p * (1.0f - p) * wt;
          if (label == k) {
            gpair[i * nclass + k] = bst_gpair((p - 1.0f) * wt, h);
          } else {
            gpair[i * nclass + k] = bst_gpair(p* wt, h);
          }
        }
      }
    }
    utils::Check(label_error >= 0 && label_error < nclass,
                 "SoftmaxMultiClassObj: label must be in [0, num_class),"\
                 " num_class=%d but found %d in label", nclass, label_error);
  }
  virtual void PredTransform(std::vector<float> *io_preds) {
    this->Transform(io_preds, output_prob);
  }
  virtual void EvalTransform(std::vector<float> *io_preds) {
    this->Transform(io_preds, 1);
  }
  virtual const char* DefaultEvalMetric(void) const {
    return "merror";
  }

 private:
  inline void Transform(std::vector<float> *io_preds, int prob) {
    utils::Check(nclass != 0, "must set num_class to use softmax");
    std::vector<float> &preds = *io_preds;
    std::vector<float> tmp;
    const bst_omp_uint ndata = static_cast<bst_omp_uint>(preds.size()/nclass);
    if (prob == 0) tmp.resize(ndata);
    #pragma omp parallel
    {
      std::vector<float> rec(nclass);
      #pragma omp for schedule(static)
      for (bst_omp_uint j = 0; j < ndata; ++j) {
        for (int k = 0; k < nclass; ++k) {
          rec[k] = preds[j * nclass + k];
        }
        if (prob == 0) {
          tmp[j] = static_cast<float>(FindMaxIndex(rec));
        } else {
          Softmax(&rec);
          for (int k = 0; k < nclass; ++k) {
            preds[j * nclass + k] = rec[k];
          }
        }
      }
    }
    if (prob == 0) preds = tmp;
  }
  // data field
  int nclass;
  int output_prob;
};

/*! \brief objective for lambda rank */   /// LambdaRankObj: base class of the ranking objectives
class LambdaRankObj : public IObjFunction {
 public:
  LambdaRankObj(void) {
    loss.loss_type = LossType::kLogisticRaw;
    fix_list_weight = 0.0f;
    num_pairsample = 1;
  }
  virtual ~LambdaRankObj(void) {}
  virtual void SetParam(const char *name, const char *val) {
    using namespace std;
    if (!strcmp( "loss_type", name )) loss.loss_type = atoi(val);
    if (!strcmp( "fix_list_weight", name)) fix_list_weight = static_cast<float>(atof(val));
    if (!strcmp( "num_pairsample", name)) num_pairsample = atoi(val);
  }
  virtual void GetGradient(const std::vector<float> &preds,
                           const MetaInfo &info,
                           int iter,
                           std::vector<bst_gpair> *out_gpair) {
    utils::Check(preds.size() == info.labels.size(), "label size predict size not match");
    std::vector<bst_gpair> &gpair = *out_gpair;
    gpair.resize(preds.size());
    // quick consistency when group is not available
    std::vector<unsigned> tgptr(2, 0); tgptr[1] = static_cast<unsigned>(info.labels.size());
    const std::vector<unsigned> &gptr = info.group_ptr.size() == 0 ? tgptr : info.group_ptr;
    utils::Check(gptr.size() != 0 && gptr.back() == info.labels.size(),
                 "group structure not consistent with #rows");
    const bst_omp_uint ngroup = static_cast<bst_omp_uint>(gptr.size() - 1);
    #pragma omp parallel
    {
      // parallel construct: declare the random number generator here, so that each
      // thread uses its own random number generator, seeded by thread id and current iteration
      random::Random rnd; rnd.Seed(iter* 1111 + omp_get_thread_num());
      std::vector<LambdaPair> pairs;
      std::vector<ListEntry>  lst;
      std::vector< std::pair<float, unsigned> > rec;
      #pragma omp for schedule(static)
      for (bst_omp_uint k = 0; k < ngroup; ++k) {
        lst.clear(); pairs.clear();
        for (unsigned j = gptr[k]; j < gptr[k+1]; ++j) {
          lst.push_back(ListEntry(preds[j], info.labels[j], j));
          gpair[j] = bst_gpair(0.0f, 0.0f);
        }
        std::sort(lst.begin(), lst.end(), ListEntry::CmpPred);
        rec.resize(lst.size());
        for (unsigned i = 0; i < lst.size(); ++i) {
          rec[i] = std::make_pair(lst[i].label, i);
        }
        std::sort(rec.begin(), rec.end(), CmpFirst);
        // enumerate buckets with same label, for each item in the lst, grab another sample randomly
        for (unsigned i = 0; i < rec.size(); ) {
          unsigned j = i + 1;
          while (j < rec.size() && rec[j].first == rec[i].first) ++j;
          // bucket in [i,j), get a sample outside bucket
          unsigned nleft = i, nright = static_cast<unsigned>(rec.size() - j);
          if (nleft + nright != 0) {
            int nsample = num_pairsample;
            while (nsample --) {
              for (unsigned pid = i; pid < j; ++pid) {
                unsigned ridx = static_cast<unsigned>(rnd.RandDouble() * (nleft+nright));
                if (ridx < nleft) {
                  pairs.push_back(LambdaPair(rec[ridx].second, rec[pid].second));
                } else {
                  pairs.push_back(LambdaPair(rec[pid].second, rec[ridx+j-i].second));
                }
              }
            }
          }
          i = j;
        }
        // get lambda weight for the pairs
        this->GetLambdaWeight(lst, &pairs);
        // rescale each gradient and hessian so that each list has a constant total weight
        float scale = 1.0f / num_pairsample;
        if (fix_list_weight != 0.0f) {
          scale *= fix_list_weight / (gptr[k+1] - gptr[k]);
        }
        for (size_t i = 0; i < pairs.size(); ++i) {
          const ListEntry &pos = lst[pairs[i].pos_index];
          const ListEntry &neg = lst[pairs[i].neg_index];
          const float w = pairs[i].weight * scale;
          float p = loss.PredTransform(pos.pred - neg.pred);
          float g = loss.FirstOrderGradient(p, 1.0f);
          float h = loss.SecondOrderGradient(p, 1.0f);
          // accumulate gradient and hessian in both pid, and nid
          gpair[pos.rindex].grad += g * w;
          gpair[pos.rindex].hess += 2.0f * w * h;
          gpair[neg.rindex].grad -= g * w;
          gpair[neg.rindex].hess += 2.0f * w * h;
        }
      }
    }
  }
  virtual const char* DefaultEvalMetric(void) const {
    return "map";
  }

 protected:
  /*! \brief helper information in a list */
  struct ListEntry {
    /*! \brief the predicted score of the entry */
    float pred;
    /*! \brief the actual label of the entry */
    float label;
    /*! \brief row index in the data matrix */
    unsigned rindex;
    // constructor
    ListEntry(float pred, float label, unsigned rindex)
        : pred(pred), label(label), rindex(rindex) {}
    // comparator by prediction
    inline static bool CmpPred(const ListEntry &a, const ListEntry &b) {
      return a.pred > b.pred;
    }
    // comparator by label
    inline static bool CmpLabel(const ListEntry &a, const ListEntry &b) {
      return a.label > b.label;
    }
  };
  /*! \brief a pair in the lambda rank */
  struct LambdaPair {
    /*! \brief positive index: this is a position in the list */
    unsigned pos_index;
    /*! \brief negative index: this is a position in the list */
    unsigned neg_index;
    /*! \brief weight to be filled in */
    float weight;
    // constructor
    LambdaPair(unsigned pos_index, unsigned neg_index)
        : pos_index(pos_index), neg_index(neg_index), weight(1.0f) {}
  };
  /*!
   * \brief get lambda weight for existing pairs 
   * \param list a list that is sorted by pred score
   * \param io_pairs record of pairs, containing the pairs to fill in weights
   */
  virtual void GetLambdaWeight(const std::vector<ListEntry> &sorted_list,
                               std::vector<LambdaPair> *io_pairs) = 0;

 private:
  // loss function
  LossType loss;
  // number of samples performed for each instance
  int num_pairsample;
  // fix weight of each elements in list
  float fix_list_weight;
};

class PairwiseRankObj: public LambdaRankObj{
 public:
  virtual ~PairwiseRankObj(void) {}

 protected:
  virtual void GetLambdaWeight(const std::vector<ListEntry> &sorted_list,
                               std::vector<LambdaPair> *io_pairs) {}
};

// beta version: NDCG lambda rank
class LambdaRankObjNDCG : public LambdaRankObj {
 public:
  virtual ~LambdaRankObjNDCG(void) {}

 protected:
  virtual void GetLambdaWeight(const std::vector<ListEntry> &sorted_list,
                               std::vector<LambdaPair> *io_pairs) {
    std::vector<LambdaPair> &pairs = *io_pairs;
    float IDCG;
    {
      std::vector<float> labels(sorted_list.size());
      for (size_t i = 0; i < sorted_list.size(); ++i) {
        labels[i] = sorted_list[i].label;
      }
      std::sort(labels.begin(), labels.end(), std::greater<float>());
      IDCG = CalcDCG(labels);
    }
    if (IDCG == 0.0) {
      for (size_t i = 0; i < pairs.size(); ++i) {
        pairs[i].weight = 0.0f;
      }
    } else {
      IDCG = 1.0f / IDCG;
      for (size_t i = 0; i < pairs.size(); ++i) {
        unsigned pos_idx = pairs[i].pos_index;
        unsigned neg_idx = pairs[i].neg_index;
        float pos_loginv = 1.0f / std::log(pos_idx + 2.0f);
        float neg_loginv = 1.0f / std::log(neg_idx + 2.0f);
        int pos_label = static_cast<int>(sorted_list[pos_idx].label);
        int neg_label = static_cast<int>(sorted_list[neg_idx].label);
        float original =
            ((1 << pos_label) - 1) * pos_loginv + ((1 << neg_label) - 1) * neg_loginv;
        float changed  =
            ((1 << neg_label) - 1) * pos_loginv + ((1 << pos_label) - 1) * neg_loginv;
        float delta = (original - changed) * IDCG;
        if (delta < 0.0f) delta = - delta;
        pairs[i].weight = delta;
      }
    }
  }
  inline static float CalcDCG(const std::vector<float> &labels) {
    double sumdcg = 0.0;
    for (size_t i = 0; i < labels.size(); ++i) {
      const unsigned rel = static_cast<unsigned>(labels[i]);
      if (rel != 0) {
        sumdcg += ((1 << rel) - 1) / std::log(static_cast<float>(i + 2));
      }
    }
    return static_cast<float>(sumdcg);
  }
};

// map LambdaRank
class LambdaRankObjMAP : public LambdaRankObj {
 public:
  virtual ~LambdaRankObjMAP(void) {}

 protected:
  struct MAPStats {
    /*! \brief the accumulated precision */
    float ap_acc;
    /*!
     * \brief the accumulated precision,
     *   assuming a positive instance is missing 
     */
    float ap_acc_miss;
    /*! 
     * \brief the accumulated precision,
     * assuming that one more positive instance is inserted ahead
     */
    float ap_acc_add;
    /* \brief the accumulated positive instance count */
    float hits;
    MAPStats(void) {}
    MAPStats(float ap_acc, float ap_acc_miss, float ap_acc_add, float hits)
        : ap_acc(ap_acc), ap_acc_miss(ap_acc_miss), ap_acc_add(ap_acc_add), hits(hits) {}
  };
  /*!
   * \brief Obtain the delta MAP if trying to switch the positions of instances in index1 or index2
   *        in sorted triples
   * \param sorted_list the list containing entry information
   * \param index1,index2 the instances switched
   * \param map_stats a vector containing the accumulated precisions for each position in a list
   */
  inline float GetLambdaMAP(const std::vector<ListEntry> &sorted_list,
                            int index1, int index2,
                            std::vector<MAPStats> *p_map_stats) {
    std::vector<MAPStats> &map_stats = *p_map_stats;
    if (index1 == index2 || map_stats[map_stats.size() - 1].hits == 0) {
      return 0.0f;
    }
    if (index1 > index2) std::swap(index1, index2);
    float original = map_stats[index2].ap_acc;
    if (index1 != 0) original -= map_stats[index1 - 1].ap_acc;
    float changed = 0;
    float label1 = sorted_list[index1].label > 0.0f ? 1.0f : 0.0f;
    float label2 = sorted_list[index2].label > 0.0f ? 1.0f : 0.0f;
    if (label1 == label2) {
      return 0.0;
    } else if (label1 < label2) {
      changed += map_stats[index2 - 1].ap_acc_add - map_stats[index1].ap_acc_add;
      changed += (map_stats[index1].hits + 1.0f) / (index1 + 1);
    } else {
      changed += map_stats[index2 - 1].ap_acc_miss - map_stats[index1].ap_acc_miss;
      changed += map_stats[index2].hits / (index2 + 1);
    }
    float ans = (changed - original) / (map_stats[map_stats.size() - 1].hits);
    if (ans < 0) ans = -ans;
    return ans;
  }
  /*
   * \brief obtain preprocessing results for calculating delta MAP
   * \param sorted_list the list containing entry information
   * \param map_stats a vector containing the accumulated precisions for each position in a list
   */
  inline void GetMAPStats(const std::vector<ListEntry> &sorted_list,
                          std::vector<MAPStats> *p_map_acc) {
    std::vector<MAPStats> &map_acc = *p_map_acc;
    map_acc.resize(sorted_list.size());
    float hit = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    for (size_t i = 1; i <= sorted_list.size(); ++i) {
      if (sorted_list[i - 1].label > 0.0f) {
        hit++;
        acc1 += hit / i;
        acc2 += (hit - 1) / i;
        acc3 += (hit + 1) / i;
      }
      map_acc[i - 1] = MAPStats(acc1, acc2, acc3, hit);
    }
  }
  virtual void GetLambdaWeight(const std::vector<ListEntry> &sorted_list,
                               std::vector<LambdaPair> *io_pairs) {
    std::vector<LambdaPair> &pairs = *io_pairs;
    std::vector<MAPStats> map_stats;
    GetMAPStats(sorted_list, &map_stats);
    for (size_t i = 0; i < pairs.size(); ++i) {
      pairs[i].weight =
          GetLambdaMAP(sorted_list, pairs[i].pos_index,
                       pairs[i].neg_index, &map_stats);
    }
  }
};

}  // namespace learner
}  // namespace xgboost
#endif  // XGBOOST_LEARNER_OBJECTIVE_INL_HPP_
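
The pair weight used by `LambdaRankObjNDCG` is the absolute change in NDCG caused by swapping two items. The sketch below computes the same quantity in a simpler but slower way, by recomputing the whole DCG before and after the swap instead of only the two affected terms. It follows the gain `2^rel - 1` and the natural-log discount `1 / ln(position + 2)` from the listing; `DeltaNDCG` is a hypothetical name, and labels are assumed to be small non-negative integers.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// DCG with gain 2^rel - 1 and discount 1 / ln(position + 2).
inline double CalcDCG(const std::vector<int> &labels) {
  double dcg = 0.0;
  for (std::size_t i = 0; i < labels.size(); ++i) {
    dcg += ((1 << labels[i]) - 1) / std::log(static_cast<double>(i + 2));
  }
  return dcg;
}

// |delta NDCG| from swapping positions a and b in a list that is
// already sorted by predicted score (highest prediction first).
inline double DeltaNDCG(const std::vector<int> &labels_by_pred,
                        std::size_t a, std::size_t b) {
  // ideal ordering: labels sorted from most to least relevant
  std::vector<int> ideal(labels_by_pred);
  std::sort(ideal.begin(), ideal.end(), std::greater<int>());
  const double idcg = CalcDCG(ideal);
  if (idcg == 0.0) return 0.0;  // no relevant item, every pair gets weight 0
  std::vector<int> swapped(labels_by_pred);
  std::swap(swapped[a], swapped[b]);
  return std::fabs(CalcDCG(labels_by_pred) - CalcDCG(swapped)) / idcg;
}
```

Pairs whose swap would change NDCG the most receive the largest weight, which is what steers the pairwise gradient toward the ranking metric.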

Reposted from: https://www.cnblogs.com/Allen-rg/p/11377562.html
简易Python：xlrd 和 openpyxl 库读取Excel单元格数据几种方式 PythonKaiser python windows excel
xlrd库是比较经典的一个库了，经典到vscode都没有代码提示，也没有高亮显示，堪称古典。xlrd也是很轻量的库，用起来不难。初步了解面向对象编码后，也可以尝试阅读源码学习代码组织方式。以下进入正题。首先当然是下载安装xlrd库了，然后import该库。在链式调用的各个函数中填入相应参数：文件路径和工作表序号（或名称），以上都是读取同一个单元格的数据，可以看出，几种读取方式的代码数量是一样的。而
Spark-第三周 fightingD&W Spark spark 大数据分布式
1.sparkcontext初始化源码分析Spark源码（7）-SparkContext初始化源码分析_太与旅spark源码-CSDN博客Spark源码学习(一)：SparkContext初始化源码分析_sparkinitialize-CSDN博客2.任务调度源码分析job提交spark提交job运行流程_请详述spark核心执行流程,如何使用sparksubmit在客户端提交job后如何通过st
Java基础——System系统类风之彼端 Java学习 java 开发语言
System系统类（在职的人不去看）跟着源码学习，不看api，一般是给学习者看常用方法：//学习数组的时候，自己写过数组拷贝的代码，工具类publicstaticnativevoidarraycopy(Objectsrc,intsrcPos（开始的索引）,Objectdest,intdestPos（开始拷贝的数组索引位置）,intlength（要拷贝多长）);//查询当前系统时间System.cu
R语言使用caret包构建xgboost模型（xgbLinear算法）构建回归模型实战、通过method参数指定算法名称、通过trainControl函数控制训练过程 statistics.insight R语言入门课算法 r语言回归机器学习数据挖掘
R语言使用caret包构建xgboost模型（xgbLinear算法）构建回归模型实战、通过method参数指定算法名称、通过trainControl函数控制训练过程目录R语言使用caret包构建xgboost模型（xgbLinear算法）构建回归模型、通过method参数指定算法名称、通过trainControl函数控制训练过程#导入包和库#仿真数据#R语言使用caret包构建xgboost模型
LTE Network Quality Analysis Method Based on MR Data and XGBoost Algorithm YZRuin 网络机器学习人工智能
原文链接：LTENetworkQualityAnalysisMethodBasedonMRDataandXGBoostAlgorithm|IEEEConferencePublication|IEEEXploreBasicInformation:Title:LTENetworkQualityAnalysisMethodBasedonMRDataandXGBoostAlgorithm(基于MR数据和X
XGB-12:在 Kubernetes 上进行分布式 XGBoost 训练 uncle_ll #XGBoost kubernetes 分布式 xgb xgboost Python
通过KubeflowXGBoostTrainingOperator支持在Kubernetes上进行分布式XGBoost训练和批量预测。操作步骤为在Kubernetes集群上运行XGBoost作业，执行以下步骤：在Kubernetes集群上安装XGBoostOperator。XGBoostOperator旨在管理XGBoost作业的调度和监控。按照安装指南安装XGBoostOperator。编写由X
Gin 框架源码学习（一） -- 服务启动前 gogin框架
官方简介GinisawebframeworkwritteninGo(Golang).Itfeaturesamartini-likeAPIwithperformancethatisupto40timesfasterthankstohttprouter.Ifyouneedperformanceandgoodproductivity,youwillloveGin.一些核心的结构*Enginegin实例结
121 Linux C++ 通讯架构实战 nginx源码学习目的，学习源码前期准备 hunandede linux 架构 nginx
零nginx源码学习的目的把nginx中最要的，有用的，代码提取出来作为我们自己知识库的一部分，以备将来使用一，nginx源码在windows上也可以下载下来。我们下载下来，注意下载的是nginx的linux源码，只是我们存放在windows下。然后解压就好，winrar就可以解压二，nginx源码查看工具。visualstudiocode解压后，我们发现源码文件不少，用什么工具比较好呢？这里我们
探索XGBoost：深度集成与迁移学习 Echo_Wish Python 笔记 Python算法迁移学习机器学习人工智能
导言深度集成与迁移学习是机器学习领域中的两个重要概念，它们可以帮助提高模型的性能和泛化能力。本教程将详细介绍如何在Python中使用XGBoost进行深度集成与迁移学习，包括模型集成、迁移学习的概念和实践等，并提供相应的代码示例。模型集成模型集成是一种通过组合多个模型来提高性能的技术。XGBoost提供了集成多个弱学习器的功能，可以通过设置booster参数来选择集成模型。以下是一个简单的示例：i
基于LightGBM的回归任务案例 python收藏家机器学习数据挖掘人工智能机器学习
在本文中，我们将学习先进的机器学习模型之一：Lightgbm。在对XGB模型进行了越来越多的改进以获得更好的性能之后，XGBoost是一种极限梯度提升机器，但通过lightgbm，我们可以在没有太多计算的情况下实现类似或更好的结果，并在更短的时间内在更大的数据集上训练我们的模型。让我们看看什么是LightGBM以及如何使用LightGBM执行回归。什么是LightGBM？LightGBM或“Lig
Task 11 XGBoost 算法分析与案例调参实例沫2021
1.XGBoost算法XGBoost是陈天奇等人开发的一个开源机器学习项目，高效地实现了GBDT算法并进行了算法和工程上的许多改进，被广泛应用在Kaggle竞赛及其他许多机器学习竞赛中并取得了不错的成绩。XGBoost是一个优化的分布式梯度增强库，旨在实现高效，灵活和便携。它在GradientBoosting框架下实现机器学习算法。XGBoost提供了并行树提升（也称为GBDT，GBM），可以快速
ApacheCN 交流社区热点汇总 2019.3 布客飞龙
听说B站可以睡小姐姐？可是。。那个小姐姐就是我鸭！【每日一问】卷积、卷积核、卷积神经网络怎么理解？如果你没有经验怎么办？来ApacheCN免费实习把！出国留学-微信讨论组自然语言处理（NLP）学习路线【每日一问】ID3、C4.5、C5.0和CART有什么联系、区别和优劣？【每日一问】假设模型准确率接近的情况下，模型融合越多越好吗？【每日一问】1000W数据量，喂给xgboost的特征大概是多少维度
新思路：TCN-RVM模型，你见过吗？机器学习预测全家桶新增模型，MATLAB代码今天吃饺子机器学习 matlab 人工智能开发语言
截止到本期，一共发了13篇关于机器学习预测全家桶MATLAB代码的文章。参考文章如下：1.五花八门的机器学习预测？一篇搞定不行吗？2.机器学习预测全家桶，多步预测之BiGRU、BiLSTM、GRU、LSTM，LSSVM、TCN、CNN，光伏发电数据为例3.机器学习预测全家桶，多步预测之组合预测模型，光伏发电数据为例4.机器学习预测全家桶之Xgboost，交通流量数据预测为例，MATLAB代码5.机
controller-manager学习三部曲之三：deployment的controller启动分析程序员欣宸 client-go kubernetes实战 kubernetes client-go
欢迎访问我的GitHub这里分类和汇总了欣宸的全部原创(含配套源码)：https://github.com/zq2599/blog_demos《controller-manager学习三部曲》完整链接通过脚本文件寻找程序入口源码学习deployment的controller启动分析本篇概览本文是《controller-manager学习三部曲》的终篇，前面咱们从启动到运行已经分析了controll
学习笔记 2019-04-30 段勇_bf97
HousePrices-bagging_xgboost+lasso+ridgeKaggle入門級賽題：房價預測FFMPEG视音频编解码零基础学习方法35岁程序员的独家面试经历公司名称公司介绍薪水车辆工程专业33岁简历有些传感器方面的东西20k-35k非渣硕是如何获得百度、京东双SP一些面试经验20k-40k吴以均的简历一个大牛的简历北京航空航天大学毕业生的简历厦门大学软件学院毕业生的简历名称介绍H
XGboost集成学习亦旧sea 集成学习机器学习人工智能
XGBoost集成学习是一种基于决策树的集成方法，用于解决分类和回归问题。它是一种GradientBoosting（梯度提升）的改进版，通过使用一系列弱学习器（例如决策树）的集合来构建一个更强大的模型。XGBoost通过迭代的方式逐步优化模型的预测结果。在每一轮迭代中，它先计算模型的负梯度（残差），然后用一个新的弱学习器来拟合这个残差。接着，它将当前模型的预测结果与新学习器的预测结果相加，得到一个
GBDT算法的升级--XGBoost与LightGBM算法 CquptDJ 数据挖掘机器学习机器学习算法数据挖掘人工智能大数据
本文同样不涉及公式推导及代码，对于GBDT算法的学习可以参考前面的文章GBDT算法原理，这里不再讲述GBDT，只讲述XGBoost与LightGBM算法原理下面推荐两篇写得最权威最官方(没有之一)的文档参考文档：XGBoost官方文档(全英文)LightGBM官方文档(全英文)关于GBDT算法，优点非常多，可以算是将boosting的思想发挥到了极致，处理许多数据效果都是非常好，但是正所谓人无完人
关于旗正规则引擎下载页面需要弹窗保存到本地目录的问题何必如此 jsp 超链接文件下载窗口
生成下载页面是需要选择“录入提交页面”，生成之后默认的下载页面<a>标签超链接为：<a href="<%=root_stimage%>stimage/image.jsp?filename=<%=strfile234%>&attachname=<%=java.net.URLEncoder.encode(file234filesourc
【Spark九十八】Standalone Cluster Mode下的资源调度源代码分析 bit1129 cluster
在分析源代码之前，首先对Standalone Cluster Mode的资源调度有一个基本的认识：首先，运行一个Application需要Driver进程和一组Executor进程。在Standalone Cluster Mode下，Driver和Executor都是在Master的监护下给Worker发消息创建(Driver进程和Executor进程都需要分配内存和CPU，这就需要Maste
linux上独立安装部署spark daizj linux 安装 spark 1.4 部署
下面讲一下linux上安装spark，以 Standalone Mode 安装 1）首先安装JDK 下载JDK：jdk-7u79-linux-x64.tar.gz ，版本是1.7以上都行，解压 tar -zxvf jdk-7u79-linux-x64.tar.gz 然后配置 ~/.bashrc&nb
Java 字节码之解析一周凡杨 java 字节码 javap
一： Java 字节代码的组织形式类文件 { OxCAFEBABE ，小版本号，大版本号，常量池大小，常量池数组，访问控制标记，当前类信息，父类信息，实现的接口个数，实现的接口信息数组，域个数，域信息数组，方法个数，方法信息数组，属性个数，属性信息数组 } &nbs
java各种小工具代码 g21121 java
1.数组转换成List import java.util.Arrays; Arrays.asList(Object[] obj); 2.判断一个String型是否有值 import org.springframework.util.StringUtils; if (StringUtils.hasText(str)) 3.判断一个List是否有值 import org.spring
加快FineReport报表设计的几个心得体会老A不折腾 finereport
一、从远程服务器大批量取数进行表样设计时，最好按“列顺序”取一个“空的SQL语句”，这样可提高设计速度。否则每次设计时模板均要从远程读取数据，速度相当慢！！二、找一个富文本编辑软件（如NOTEPAD+）编辑SQL语句，这样会很好地检查语法。有时候带参数较多检查语法复杂时，结合FineReport中生成的日志，再找一个第三方数据库访问软件（如PL/SQL）进行数据检索，可以很快定位语法错误。
mysql linux启动与停止墙头上一根草
如何启动/停止/重启MySQL一、启动方式1、使用 service 启动：service mysqld start2、使用 mysqld 脚本启动：/etc/inint.d/mysqld start3、使用 safe_mysqld 启动：safe_mysqld&二、停止1、使用 service 启动：service mysqld stop2、使用 mysqld 脚本启动：/etc/inin
Spring中事务管理浅谈 aijuans spring 事务管理
Spring中事务管理浅谈 By Tony Jiang@2012-1-20 Spring中对事务的声明式管理拿一个XML举例 [html] view plain copy print ? <?xml version="1.0" encoding="UTF-8"?>&nb
php中隐形字符65279（utf-8的BOM头）问题 alxw4616
php中隐形字符65279（utf-8的BOM头）问题今天遇到一个问题. php输出JSON 前端在解析时发生问题:parsererror. 调试: 1.仔细对比字符串发现字符串拼写正确.怀疑是非打印字符的问题. 2.逐一将字符串还原为unicode编码. 发现在字符串头的位置出现了一个 65279的非打印字符.
调用对象是否需要传递对象(初学者一定要注意这个问题) 百合不是茶对象的传递与调用技巧
类和对象的简单的复习,在做项目的过程中有时候不知道怎样来调用类创建的对象,简单的几个类可以看清楚,一般在项目中创建十几个类往往就不知道怎么来看为了以后能够看清楚,现在来回顾一下类和对象的创建,对象的调用和传递(前面写过一篇) 类和对象的基础概念: JAVA中万事万物都是类类有字段(属性),方法,嵌套类和嵌套接
JDK1.5 AtomicLong实例 bijian1013 java thread java多线程 AtomicLong
JDK1.5 AtomicLong实例类 AtomicLong 可以用原子方式更新的 long 值。有关原子变量属性的描述，请参阅 java.util.concurrent.atomic 包规范。AtomicLong 可用在应用程序中（如以原子方式增加的序列号），并且不能用于替换 Long。但是，此类确实扩展了 Number，允许那些处理基于数字类的工具和实用工具进行统一访问。
自定义的RPC的Java实现 bijian1013 java rpc
网上看到纯java实现的RPC，很不错。 RPC的全名Remote Process Call，即远程过程调用。使用RPC，可以像使用本地的程序一样使用远程服务器上的程序。下面是一个简单的RPC 调用实例，从中可以看到RPC如何
【RPC框架Hessian一】Hessian RPC Hello World bit1129 Hello world
什么是Hessian The Hessian binary web service protocol makes web services usable without requiring a large framework, and without learning yet another alphabet soup of protocols. Because it is a binary p
【Spark九十五】Spark Shell操作Spark SQL bit1129 shell
在Spark Shell上，通过创建HiveContext可以直接进行Hive操作 1. 操作Hive中已存在的表 [hadoop@hadoop bin]$ ./spark-shell Spark assembly has been built with Hive, including Datanucleus jars on classpath Welcom
F5　往header加入客户端的ip ronin47
when HTTP_RESPONSE {if {[HTTP::is_redirect]}{ HTTP::header replace Location [string map {:port/ /} [HTTP::header value Location]]HTTP::header replace Lo
java-61-在数组中，数字减去它右边(注意是右边)的数字得到一个数对之差. 求所有数对之差的最大值。例如在数组{2, 4, 1, 16, 7, 5, bylijinnan java
思路来自： http://zhedahht.blog.163.com/blog/static/2541117420116135376632/ 写了个java版的 public class GreatestLeftRightDiff { /** * Q61.在数组中，数字减去它右边(注意是右边)的数字得到一个数对之差。 * 求所有数对之差的最大值。例如在数组
mongoDB 索引开窍的石头 mongoDB索引
在这一节中我们讲讲在mongo中如何创建索引得到当前查询的索引信息 db.user.find(_id:12).explain(); cursor: basicCoursor 指的是没有索引 &
[硬件和系统]迎峰度夏 comsci 系统
从这几天的气温来看，今年夏天的高温天气可能会维持在一个比较长的时间内所以，从现在开始准备渡过炎热的夏天。。。。每间房屋要有一个落地电风扇，一个空调(空调的功率和房间的面积有密切的关系) 坐的，躺的地方要有凉垫，床上要有凉席电脑的机箱
基于ThinkPHP开发的公司官网 cuiyadll 行业系统
后端基于ThinkPHP，前端基于jQuery和BootstrapCo.MZ 企业系统轻量级企业网站管理系统运行环境:PHP5.3+, MySQL5.0 系统预览系统下载：http://www.tecmz.com 预览地址：http://co.tecmz.com 各种设备自适应响应式的网站设计能够对用户产生友好度，并且对于
Transaction and redelivery in JMS (JMS的事务和失败消息重发机制) darrenzhu jms 事务承认 MQ acknowledge
JMS Message Delivery Reliability and Acknowledgement Patterns http://wso2.com/library/articles/2013/01/jms-message-delivery-reliability-acknowledgement-patterns/ Transaction and redelivery in
Centos添加硬盘完全教程 dcj3sjt126com linux centos hardware
Linux的硬盘识别: sda 表示第1块SCSI硬盘 hda 表示第1块IDE硬盘 scd0 表示第1个USB光驱一般使用“fdisk -l”命
yii2 restful web服务路由 dcj3sjt126com PHP yii2
路由随着资源和控制器类准备，您可以使用URL如 http://localhost/index.php?r=user/create访问资源，类似于你可以用正常的Web应用程序做法。在实践中，你通常要用美观的URL并采取有优势的HTTP动词。例如，请求POST /users意味着访问user/create动作。这可以很容易地通过配置urlManager应用程序组件来完成如下所示
MongoDB查询(4)——游标和分页[八] eksliang mongodb MongoDB游标 MongoDB深分页
转载请出自出处：http://eksliang.iteye.com/blog/2177567 一、游标数据库使用游标返回find的执行结果。客户端对游标的实现通常能够对最终结果进行有效控制，从shell中定义一个游标非常简单，就是将查询结果分配给一个变量（用var声明的变量就是局部变量），便创建了一个游标，如下所示： > var
Activity的四种启动模式和onNewIntent() gundumw100 android
Android中Activity启动模式详解　　在Android中每个界面都是一个Activity，切换界面操作其实是多个不同Activity之间的实例化操作。在Android中Activity的启动模式决定了Activity的启动运行方式。　　Android总Activity的启动模式分为四种： Activity启动模式设置： <acti
攻城狮送女友的CSS3生日蛋糕 ini html Web html5 css css3
在线预览：http://keleyi.com/keleyi/phtml/html5/29.htm 代码如下： <!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <title>攻城狮送女友的CSS3生日蛋糕-柯乐义<
读源码学Servlet（1）GenericServlet 源码分析 jzinfo tomcat Web servlet 网络应用网络协议
Servlet API的核心就是javax.servlet.Servlet接口，所有的Servlet 类（抽象的或者自己写的）都必须实现这个接口。在Servlet接口中定义了5个方法，其中有3个方法是由Servlet 容器在Servlet的生命周期的不同阶段来调用的特定方法。先看javax.servlet.servlet接口源码： package
JAVA进阶：VO(DTO)与PO(DAO)之间的转换 snoopy7713 java VO Hibernate po
PO即 Persistence Object　　VO即 Value Object 　VO和PO的主要区别在于：　　VO是独立的Java Object。　　PO是由Hibernate纳入其实体容器（Entity Map）的对象，它代表了与数据库中某条记录对应的Hibernate实体，PO的变化在事务提交时将反应到实际数据库中。　实际上，这个VO被用作Data Transfer
mongodb group by date 聚合查询日期统计每天数据（信息量） qiaolevip 每天进步一点点学习永无止境 mongodb 纵观千象
/* 1 */ { "_id" : ObjectId("557ac1e2153c43c320393d9d"), "msgType" : "text", "sendTime" : ISODate("2015-06-12T11:26:26.000Z")
java之18天常用的类(一) Luob. Math Date System Runtime Rundom
System类 import java.util.Properties; /** * System: * out:标准输出,默认是控制台 * in:标准输入,默认是键盘 * * 描述系统的一些信息 * 获取系统的属性信息:Properties getProperties(); * * * */ public class Sy
maven wuai maven
1、安装maven：解压缩、添加M2_HOME、添加环境变量path 2、创建maven_home文件夹，创建项目mvn_ch01,在其下面建立src、pom.xml，在src下面简历main、test、main下面建立java文件夹 3、编写类，在java文件夹下面依照类的包逐层创建文件夹，将此类放入最后一级文件夹 4、进入mvn_ch01 4.1、mvn compile ,执行后会在