GBDT Binary Classification in Practice

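Key points:
1. Residual: the residual is approximated by the negative gradient of the loss function, residual = 2 * y / (1 + exp(2 * y * f(i))), where f(i) is initialized to 0.
2. Leaf value estimation (node.predict_value): each leaf's value is computed from the residuals of all instances that fall into that leaf's region.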


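Optimizing with a single Newton step gives the following formula, where r_i is the residual of instance i and R_jm is the region of leaf j in tree m:
gamma_jm = sum_{x_i in R_jm}(r_i) / sum_{x_i in R_jm}(|r_i| * (2 - |r_i|))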

Code implementation: sum(residual(i)) / sum(|residual(i)| * (2 - |residual(i)|))
3. Prediction update: the prediction of every instance is updated, f(i) += learn_rate * node.predict_value
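4. Iteratively minimize the loss function, using the negative binomial log-likelihood loss:
loss = L(y, F) = log(1 + exp(−2yF)), y ∈ {−1, 1}
F(x) = 1/2 * log[Pr(y=1|x) / Pr(y=−1|x)]
F is half the log-odds (the raw prediction, from which the predicted probability is recovered); y is the true label.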
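The formulas in the list above can be checked with a few standalone functions. This is a minimal sketch, not the referenced repository's code: the function names `binomial_loss`, `residual`, `leaf_value`, and `probability` are chosen here for illustration, and labels are assumed to be in {−1, 1}.

```python
import math

def binomial_loss(y, f):
    """Loss for one instance: log(1 + exp(-2*y*f)), with y in {-1, +1}."""
    return math.log(1.0 + math.exp(-2.0 * y * f))

def residual(y, f):
    """Negative gradient of the loss with respect to f."""
    return 2.0 * y / (1.0 + math.exp(2.0 * y * f))

def leaf_value(residuals):
    """One Newton step for a leaf: sum(r) / sum(|r| * (2 - |r|))."""
    numerator = sum(residuals)
    if numerator == 0:
        return 0.0
    denominator = sum(abs(r) * (2.0 - abs(r)) for r in residuals)
    return numerator / denominator

def probability(f):
    """Pr(y = 1 | x) recovered from F = 1/2 * log(p / (1 - p))."""
    return 1.0 / (1.0 + math.exp(-2.0 * f))

# With f = 0 the residual equals the label and the probability is 0.5.
print(residual(1, 0.0), residual(-1, 0.0), probability(0.0))

# Finite-difference check that residual() is the negative gradient of the loss.
eps = 1e-6
numeric = -(binomial_loss(1, 0.3 + eps) - binomial_loss(1, 0.3 - eps)) / (2 * eps)
print(abs(numeric - residual(1, 0.3)) < 1e-6)
```

Since |r| always lies strictly between 0 and 2 for this loss, the denominator in `leaf_value` is positive whenever the numerator is nonzero.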

    from random import sample  # used for row subsampling

    def fit(self, dataset, train_data):
        self.loss = BinomialDeviance(n_classes=dataset.get_label_size())
        # 1. Initialize the predictions
        f = dict()  # holds F_{m-1}, the current prediction for each instance
        self.loss.initialize(f, dataset)
        for iteration in range(1, self.max_iter + 1):
            subset = train_data
            if 0 < self.sample_rate < 1:
                # Randomly draw a fraction sample_rate of the training instances
                subset = sample(subset, int(len(subset) * self.sample_rate))
            # The negative gradient of the loss approximates the residual
            # that the regression tree is fitted to.
            # 2. Residual: residual = 2*y / (1 + exp(2*y*f(i)))
            residual = self.loss.compute_residual(dataset, subset, f)
            leaf_nodes = []
            targets = residual
            # 3. Build the decision tree and estimate each leaf value from the
            #    residuals r: value = sum(r) / sum(|r| * (2 - |r|))
            tree = construct_decision_tree(dataset, subset, targets, 0, leaf_nodes, self.max_depth, self.loss,
                                           self.split_points)
            self.trees[iteration] = tree
            # Example hyperparameters:
            # max_iter=20, sample_rate=0.8, learn_rate=0.5, max_depth=7, loss_type='binary-classification'
            # Update the predictions: f(i) += learn_rate * leaf value
            self.loss.update_f_value(f, tree, leaf_nodes, subset, dataset, self.learn_rate)

            # loss(y, f) = log(1 + e^(-2yf))
            # f(i) = 1/2 * log(p(y=1) / p(y=-1)), half the log-odds
            # train_loss aggregates the loss over all instances
            train_loss = self.compute_loss(dataset, f)
            print("iter%d : train loss=%f" % (iteration, train_loss))
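To see the whole loop run without the repository's classes, here is a self-contained toy version. It is only a sketch under several assumptions: a single 1-D feature, depth-1 trees (stumps) chosen by squared error on the residuals, no subsampling, and f initialized to 0. The helpers `sse`, `fit_stump`, and `mean_loss` are hypothetical names, not part of the referenced code.

```python
import math

def residual(y, f):
    """Negative gradient of log(1 + exp(-2*y*f)) with respect to f."""
    return 2.0 * y / (1.0 + math.exp(2.0 * y * f))

def leaf_value(rs):
    """Newton-step leaf estimate: sum(r) / sum(|r| * (2 - |r|))."""
    den = sum(abs(r) * (2.0 - abs(r)) for r in rs)
    return sum(rs) / den if den > 0 else 0.0

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_stump(xs, rs):
    """Pick the threshold that best splits the residuals by squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, t, leaf_value(left), leaf_value(right))
    return best[1], best[2], best[3]

def mean_loss(ys, f):
    """Average of log(1 + exp(-2*y*f)) over all instances."""
    return sum(math.log(1.0 + math.exp(-2.0 * y * v)) for y, v in zip(ys, f)) / len(ys)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [-1, -1, 1, -1, 1, 1]
learn_rate = 0.5

f = [0.0] * len(xs)            # initial predictions
losses = [mean_loss(ys, f)]    # log(2) before any tree is added
for m in range(1, 4):
    rs = [residual(y, v) for y, v in zip(ys, f)]
    threshold, left_value, right_value = fit_stump(xs, rs)
    f = [v + learn_rate * (left_value if x <= threshold else right_value)
         for x, v in zip(xs, f)]
    losses.append(mean_loss(ys, f))
    print("iter%d : train loss=%f" % (m, losses[-1]))
```

On this toy data the first tree lowers the mean loss from log 2 (about 0.693) to about 0.504.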

Formula reference:
https://nbviewer.jupyter.org/github/liudragonfly/GBDT/blob/master/GBDT.ipynb
