LeastSquaresGradient

LeastSquaresGradient computes the gradient and loss for each sample.

Purpose: for each individual sample, compute the gradient and the loss of the least-squares loss function used in linear regression.

An instance of this class is created at line 87 of LinearRegression.scala.

The class has two compute methods.

1.

override def compute(data: Vector, label: Double, weights: Vector)
 This method returns two values:
 1) the gradient: x * (h(x) - y)
 2) the loss: loss = 1/2 * (h(x) - y)^2
 where
 x = data, the feature vector of the sample
 h(x) = data . weights, the prediction under the current weights
 y = label, the true value of the sample
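
To make the formula concrete, here is a minimal pure-Scala sketch (no MLlib involved; the object name is made up) that reproduces the gradient and loss for the sample used in the test further below:

object HandGradient {
  def main(args: Array[String]): Unit = {
    val x = Array(1.0, 2.0, 3.0, 4.0) // data, the feature vector
    val w = Array(2.0, 3.0, 4.0, 5.0) // weights
    val y = 43.0                      // label

    val hx = x.indices.map(i => x(i) * w(i)).sum // h(x) = data . weights = 40
    val diff = hx - y                            // 40 - 43 = -3
    val gradient = x.map(_ * diff)               // x * diff
    val loss = diff * diff / 2.0                 // 1/2 * diff^2

    println(gradient.mkString(",")) // -3.0,-6.0,-9.0,-12.0
    println(loss)                   // 4.5
  }
}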


2.

override def compute(data: Vector, label: Double, weights: Vector,
      cumGradient: Vector)
 This method returns only one value:
 1) the loss
 Compared with the three-argument compute above, the gradient is not returned;
 instead it is accumulated in place into the cumGradient argument:
 cumGradient += (h(x) - y) * x
 The loss is computed exactly as before: loss = 1/2 * (h(x) - y)^2
 (Note that in the test below, loss2 is still 4.5 even though cumGradient is
 non-zero, which confirms that cumGradient does not enter the loss.)
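
The cumGradient variant exists so that a caller can sum gradients over many samples without allocating a fresh vector per sample; this is essentially how MLlib's GradientDescent aggregates gradients over a mini-batch. Below is a minimal sketch of that usage pattern, with made-up sample data:

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.LeastSquaresGradient

object CumGradientSketch {
  def main(args: Array[String]): Unit = {
    val gradient = new LeastSquaresGradient
    val weights = Vectors.dense(2.0, 3.0)
    // hypothetical (features, label) pairs
    val samples = Seq(
      (Vectors.dense(1.0, 1.0), 5.0), // prediction 5, diff 0
      (Vectors.dense(2.0, 0.0), 3.0)  // prediction 4, diff 1
    )
    val cum = Vectors.dense(0.0, 0.0) // accumulator, updated in place
    var totalLoss = 0.0
    for ((x, y) <- samples) {
      totalLoss += gradient.compute(x, y, weights, cum)
    }
    println(cum.toArray.mkString(",")) // 2.0,0.0 -- sum of per-sample gradients
    println(totalLoss)                 // 0.5    -- sum of per-sample losses
  }
}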



The source code of org.apache.spark.mllib.optimization.LeastSquaresGradient follows:

import org.apache.spark.annotation.DeveloperApi
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.BLAS.{axpy, dot, scal}

/**
 * :: DeveloperApi ::
 * Compute gradient and loss for a Least-squared loss function, as used in linear regression.
 * This is correct for the averaged least squares loss function (mean squared error)
 *              L = 1/2n ||A weights-y||^2
 * See also the documentation for the precise formulation.
 */
@DeveloperApi
class LeastSquaresGradient extends Gradient {
  override def compute(data: Vector, label: Double, weights: Vector): (Vector, Double) = {
    val diff = dot(data, weights) - label
    val loss = diff * diff / 2.0
    val gradient = data.copy
    scal(diff, gradient)
    (gradient, loss)
  }

  override def compute(
      data: Vector,
      label: Double,
      weights: Vector,
      cumGradient: Vector): Double = {
    val diff = dot(data, weights) - label
    axpy(diff, data, cumGradient)
    diff * diff / 2.0
  }
}
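
The helpers dot, scal, and axpy come from org.apache.spark.mllib.linalg.BLAS, a thin wrapper over netlib BLAS routines that is private to MLlib (which is why the import above only compiles inside Spark's own source tree). Their effect can be sketched in plain Scala as follows (illustrative equivalents, not the actual implementation):

object BlasSketch {
  // dot(x, y): the inner product x . y
  def dot(x: Array[Double], y: Array[Double]): Double =
    x.indices.map(i => x(i) * y(i)).sum

  // scal(a, x): x := a * x, in place
  def scal(a: Double, x: Array[Double]): Unit =
    for (i <- x.indices) x(i) *= a

  // axpy(a, x, y): y := y + a * x, in place
  def axpy(a: Double, x: Array[Double], y: Array[Double]): Unit =
    for (i <- x.indices) y(i) += a * x(i)
}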



The following Scala code tests the class:

package com.mllib.component
import org.apache.spark.mllib.optimization.LeastSquaresGradient
import org.apache.spark.mllib.linalg.Vectors
/**
 * def compute(data: Vector, label: Double, weights: Vector)
 * returns two values:
 * 1) the gradient: x * (h(x) - y)
 * 2) the loss: loss = 1/2 * (h(x) - y)^2
 * where
 * x = data, the feature vector of the sample
 * h(x) = data . weights, the prediction under the current weights
 * y = label, the true value of the sample
 *
 * override def compute(data: Vector, label: Double, weights: Vector,
 *     cumGradient: Vector)
 * returns only the loss and accumulates the gradient in place:
 * cumGradient += (h(x) - y) * x
 * The loss is still loss = 1/2 * (h(x) - y)^2
 */
object LeastSquaresGradientTest {
    def main(args: Array[String]): Unit = {
        val gradient = new LeastSquaresGradient
        val dataArr =  Array[Double](1,2,3,4)
        val data =  Vectors.dense(dataArr)
        
        val weightsArr =  Array[Double](2,3,4,5)
        val weights =  Vectors.dense(weightsArr)
        val label = 43.0d
        val (da, loss) = gradient.compute(data, label, weights) // gradient and loss via least squares
        println(da.toArray.mkString(",")) //-3.0,-6.0,-9.0,-12.0
        println(loss) //4.5
        
        val cumGradientArr = Array[Double](6,7,8,9) 
        val cumGradient = Vectors.dense(cumGradientArr)
        val loss2 = gradient.compute(data, label, weights, cumGradient)
        println(loss2) //4.5 -- same loss; cumGradient does not affect it
        println(cumGradient.toArray.mkString(",")) //3.0,1.0,-1.0,-3.0 -- gradient added in place
        
    }
  
}


An informal walk-through of the calculation:

dot(data, weights) = (1,2,3,4) . (2,3,4,5) = 1*2 + 2*3 + 3*4 + 4*5 = 40

diff = dot(data, weights) - label = 40 - 43 = -3

loss = diff^2 / 2 = 4.5
gradient = diff * data = (-3.0, -6.0, -9.0, -12.0)

The method returns (gradient, loss).
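
For the four-argument call in the test, the same diff is folded into cumGradient in place; checking the numbers:

cumGradient = (6,7,8,9) + diff * data = (6,7,8,9) + (-3)*(1,2,3,4) = (3,1,-1,-3)
loss2 = diff^2 / 2 = 4.5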


When reposting, please credit http://blog.csdn.net/wguangliang

