A Simple Linear Regression Implementation with Spark MLlib

  • A Simple Linear Regression Implementation with Spark MLlib
    • 1. Training Data
    • 2. Hands-On Code
    • 3. Linear Regression Predictions and Prediction Error

1. Training Data

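The input is plain labeled data, one sample per line, in the format "label,feature1 feature2 feature3 ...": the label is separated from the features by a comma, and the features are separated from each other by spaces.
The first lines of the training file lpsa.data look like this: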

-0.4307829,-1.63735562648104 -2.00621178480549 -1.86242597251066 -1.02470580167082 -0.522940888712441 -0.863171185425945 -1.04215728919298 -0.864466507337306
-0.1625189,-1.98898046126935 -0.722008756122123 -0.787896192088153 -1.02470580167082 -0.522940888712441 -0.863171185425945 -1.04215728919298 -0.864466507337306
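
As a quick illustration of this format, the sketch below parses one such line in plain Scala, without Spark. The object name `ParseLpsaLine` and the helper `parseLine` are hypothetical and exist only for this example; the Spark code in the next section performs the same split inside an RDD `map`.

```scala
object ParseLpsaLine {
  // Hypothetical helper: splits "label,f1 f2 f3 ..." into (label, features).
  def parseLine(line: String): (Double, Array[Double]) = {
    val parts = line.split(",")
    val label = parts(0).trim.toDouble
    // Features are separated by whitespace.
    val features = parts(1).trim.split("\\s+").map(_.toDouble)
    (label, features)
  }

  def main(args: Array[String]): Unit = {
    // Shortened to three features for readability; real lines carry eight.
    val line = "-0.4307829,-1.63735562648104 -2.00621178480549 -1.86242597251066"
    val (label, features) = parseLine(line)
    println(s"label = $label, features = ${features.mkString("[", ", ", "]")}")
  }
}
```

A malformed line (for example one without a comma) would make this helper throw, so a real job may want to filter or validate lines first.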

2. Hands-On Code

import org.apache.log4j.{Level, Logger}
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionModel, LinearRegressionWithSGD}

object LinearRession {
  def main(args: Array[String]): Unit = {
    // 1. Build the Spark context
    val conf: SparkConf = new SparkConf().setAppName("LinearRessionWithSGD").setMaster("local[2]")
    val sc = new SparkContext(conf)
    Logger.getRootLogger.setLevel(Level.WARN)

    // 2. Read the sample data and parse each line into a LabeledPoint
    val data_path = "hdfs://node-1:9000/spark_data/lpsa.data"
    val data: RDD[String] = sc.textFile(data_path)
    val examples = data.map { line =>
      val parts: Array[String] = line.split(",")
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts(1).split(' ').map(_.toDouble)))
    }.cache()
    val numExamples: Long = examples.count()

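    // 3. Create the linear regression model and set the training parameters
    val numIterations = 100
    val stepSize = 1.0
    val miniBatchFraction = 1.0
    val model: LinearRegressionModel = LinearRegressionWithSGD.train(examples, numIterations, stepSize, miniBatchFraction)
    // Print the learned weights and intercept
    println(s"weights = ${model.weights}")
    println(s"intercept = ${model.intercept}")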

    // 4. Predict on the samples and pair each prediction with its label
    val prediction: RDD[Double] = model.predict(examples.map(_.features))
    val predictionAndLabel: RDD[(Double, Double)] = prediction.zip(examples.map(_.label))
    val print_predict: Array[(Double, Double)] = predictionAndLabel.take(50)
    println("prediction" + "\t" + "label")
    for (i <- 0 to print_predict.length - 1) {
      println(print_predict(i)._1 + "\t" + print_predict(i)._2)
    }

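    // 5. Compute the error (measured on the training samples, since no separate test set is held out)
    val loss: Double = predictionAndLabel.map {
      x => (x._1 - x._2) * (x._1 - x._2) // squared difference between prediction and label
    }.reduce(_ + _)

    val rmse: Double = math.sqrt(loss / numExamples)
    println(s"Test RMSE = $rmse")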

    // 6. Save the model
    val mode_path = "D:\\idea\\SparkLinearRegressionTestT\\LinearRessionModel"
    model.save(sc, mode_path)
    // 7. Load the model back from the same path
    val sameModel: LinearRegressionModel = LinearRegressionModel.load(sc, mode_path)

    sc.stop()
  }
}
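
To give a feel for what `numIterations` and `stepSize` control in step 3, here is a minimal single-machine sketch of gradient descent for least squares, written in plain Scala. It is not MLlib's implementation: the object `GradientDescentSketch`, its `train` signature, and the toy data are all hypothetical, the model has no intercept term, and the `stepSize / sqrt(t)` decay is an assumption about how the step shrinks over iterations. With `miniBatchFraction = 1.0`, as in the listing above, every iteration uses all samples, which is what this sketch does.

```scala
object GradientDescentSketch {
  // Dot product of two equally sized vectors.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Full-batch gradient descent for least squares without an intercept.
  // Illustrative only: this is not how MLlib is implemented internally.
  def train(xs: Seq[Array[Double]], ys: Seq[Double],
            numIterations: Int, stepSize: Double): Array[Double] = {
    val n = xs.length
    val dim = xs.head.length
    var w = Array.fill(dim)(0.0)
    for (t <- 1 to numIterations) {
      // Average gradient of 0.5 * (w.x - y)^2 over all examples.
      val grad = Array.fill(dim)(0.0)
      for ((x, y) <- xs.zip(ys)) {
        val err = dot(w, x) - y
        for (j <- 0 until dim) grad(j) += err * x(j) / n
      }
      // Assumed step decay: the step shrinks as stepSize / sqrt(t).
      val step = stepSize / math.sqrt(t.toDouble)
      w = w.zip(grad).map { case (wj, gj) => wj - step * gj }
    }
    w
  }

  def main(args: Array[String]): Unit = {
    // Toy data generated from y = 2 * x, so the weight should approach 2.
    val xs = Seq(Array(1.0), Array(2.0), Array(3.0))
    val ys = Seq(2.0, 4.0, 6.0)
    val w = train(xs, ys, numIterations = 100, stepSize = 0.1)
    println(s"learned weight = ${w.mkString(", ")}")
  }
}
```

In this sketch a step size that is too large for the scale of the features makes the early iterations overshoot, so it is worth printing the weights and the error after training and trying a smaller `stepSize` if the results look unreasonable.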

3. Linear Regression Predictions and Prediction Error

[Figure 1: image from the original post, not reproduced here]

[Figure 2: image from the original post, not reproduced here]
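
The error reported by the program is the root mean squared error, RMSE = sqrt((1/n) * Σ (prediction_i - label_i)^2). The following plain-Scala sketch computes the same quantity on a local collection so the formula can be checked by hand. The object `RmseSketch` and the sample pairs are made up for illustration and are not output from the model above.

```scala
object RmseSketch {
  // Root mean squared error over (prediction, label) pairs.
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one (prediction, label) pair")
    val sumSq = pairs.map { case (p, l) => (p - l) * (p - l) }.sum
    math.sqrt(sumSq / pairs.length)
  }

  def main(args: Array[String]): Unit = {
    // Made-up values: squared errors are 1, 0 and 4, so RMSE = sqrt(5 / 3).
    val pairs = Seq((1.0, 2.0), (3.0, 3.0), (5.0, 3.0))
    println(s"RMSE = ${rmse(pairs)}")
  }
}
```

Because the program evaluates the model on the same samples it was trained on, the printed value is a training error; a held-out subset would be needed to estimate the error on unseen data.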
