Evaluating a binary classification model with Scala

1. Suppose we have a trained binary classification model, tvsFitted. Let's look at which metrics we can get from it to judge how good the model is.

    //fit
    val tvsFitted = tvs.fit(trainData)
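
The snippets in this post use `tvs`, `lrStage`, and `evaluator` without showing where they come from. As a rough sketch only, assuming `tvs` is a `TrainValidationSplit` wrapped around a two-stage pipeline, they might be set up as below. The column names (`f1`, `f2`, `label`), the grid values, and the train ratio are hypothetical placeholders, not taken from the original code.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit}

// Hypothetical feature columns; replace with the real ones.
val assembler = new VectorAssembler()
  .setInputCols(Array("f1", "f2"))
  .setOutputCol("features")

val lr = new LogisticRegression()
  .setLabelCol("label")
  .setFeaturesCol("features")

// The logistic regression is the second stage, so its index is 1 (assumption).
val pipeline = new Pipeline().setStages(Array(assembler, lr))
val lrStage = 1

// AUC is used to pick the best parameter combination.
val evaluator = new BinaryClassificationEvaluator()
  .setLabelCol("label")
  .setRawPredictionCol("rawPrediction")
  .setMetricName("areaUnderROC")

// Hypothetical search grid over the regularization strength.
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .build()

val tvs = new TrainValidationSplit()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setTrainRatio(0.8)
```

With a setup like this, `tvs.fit(trainData)` returns a model whose `bestModel` is a `PipelineModel`, which is what the casts in step 2 rely on.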

2. After training finishes, use the model summary to get evaluation metrics

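    // Summary of the best model: metrics computed on the training data
    import org.apache.spark.ml.PipelineModel
    import org.apache.spark.ml.classification.LogisticRegressionModel

    // Extract the best model; lrStage is the index of the logistic regression stage in the pipeline
    val trainedPipeline = tvsFitted.bestModel.asInstanceOf[PipelineModel]
    val TrainedLR = trainedPipeline.stages(lrStage).asInstanceOf[LogisticRegressionModel]

    // Binary classification summary of the best model
    val summaryLR = TrainedLR.binarySummary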

    // Inspect how the objective (loss) value changed across training iterations
    println("bestModel objective history is: " + summaryLR.objectiveHistory.mkString(","))
    println("bestModel number of recorded objective values: " + summaryLR.objectiveHistory.length)

    // Inspect the regression coefficients and the intercept
    println("bestModel coefficients is:"+ TrainedLR.coefficients)
    println("bestModel intercept is:"+ TrainedLR.intercept)

    // Inspect the hyperparameters
    println("bestModel regParam:"+TrainedLR.getRegParam)
    println("bestModel Threshold:"+TrainedLR.getThreshold)

    // Get the ROC (receiver operating characteristic) curve as a DataFrame
    val roc = summaryLR.roc
    println("receiver-operating characteristic DataFrame is :")
    roc.show()
    
    // Get AUC
    val auc1 = summaryLR.areaUnderROC
    println(s"bestModel summary areaUnderROC i.e. AUC: $auc1")

    // Get accuracy and the label-weighted FPR, TPR, F-measure, precision, and recall
    val accuracy = summaryLR.accuracy
    val falsePositiveRate = summaryLR.weightedFalsePositiveRate
    val truePositiveRate = summaryLR.weightedTruePositiveRate
    val fMeasure = summaryLR.weightedFMeasure
    val precision = summaryLR.weightedPrecision
    val recall = summaryLR.weightedRecall
    println(s"bestModel Accuracy: $accuracy\nFPR: $falsePositiveRate\nTPR: $truePositiveRate\n" +
      s"F-measure: $fMeasure\nPrecision: $precision\nRecall: $recall")
    
    // Find the threshold that maximizes F-measure and set it on the best model
    import org.apache.spark.sql.functions.{col, max}

    val fMeasureDF = summaryLR.fMeasureByThreshold
    val maxFMeasure = fMeasureDF.select(max("F-Measure")).head().getDouble(0)
    val bestThreshold = fMeasureDF.where(col("F-Measure") === maxFMeasure)
      .select("threshold").head().getDouble(0)
    TrainedLR.setThreshold(bestThreshold)
    println(s"bestModel best threshold is: $bestThreshold, maxFMeasure is $maxFMeasure")
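
To make the threshold selection less of a black box, here is a minimal sketch in plain Scala (no Spark) of the same idea: compute F1 from precision and recall at each candidate threshold, then keep the threshold with the largest F1. The `ThresholdSketch` object and the `PrPoint` type are made up for illustration and are not part of Spark.

```scala
object ThresholdSketch {
  // One point on the precision/recall curve at a given decision threshold.
  final case class PrPoint(threshold: Double, precision: Double, recall: Double)

  // F1 is the harmonic mean of precision and recall; treated as 0 when both are 0.
  def f1(precision: Double, recall: Double): Double =
    if (precision + recall == 0.0) 0.0
    else 2.0 * precision * recall / (precision + recall)

  // Return the (threshold, F1) pair with the highest F1.
  // Assumes a non-empty input; on ties the first point wins.
  def bestThreshold(points: Seq[PrPoint]): (Double, Double) =
    points
      .map(p => (p.threshold, f1(p.precision, p.recall)))
      .reduceLeft((a, b) => if (b._2 > a._2) b else a)
}
```

For example, given points at thresholds 0.2, 0.5, and 0.8 with (precision, recall) of (0.5, 1.0), (0.8, 0.8), and (1.0, 0.4), the balanced middle point wins with an F1 of about 0.8.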

3. Getting evaluation metrics after the model predicts on a test set

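    // Evaluate on the test set
    val tvsPredict = tvsFitted.transform(testData)
    tvsPredict.cache()
    tvsPredict.show()

    // evaluator is assumed to be the BinaryClassificationEvaluator (metric: areaUnderROC) used during tuning
    val auc = evaluator.evaluate(tvsPredict)
    println(s"Model evaluate testData areaUnderROC i.e. AUC: $auc")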
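
AUC is simply the area under the ROC curve. One common way to compute it from a list of (FPR, TPR) points is the trapezoidal rule. The sketch below is plain Scala with a hypothetical `AucSketch` helper, and it assumes the points are already sorted by increasing FPR; it is meant to illustrate the number being reported, not to reproduce Spark's internals.

```scala
object AucSketch {
  // Area under the piecewise-linear curve through the given (FPR, TPR) points,
  // using the trapezoidal rule. Points must be sorted by increasing FPR.
  def areaUnderCurve(points: Seq[(Double, Double)]): Double =
    points.zip(points.drop(1)).map { case ((x0, y0), (x1, y1)) =>
      // Width of the segment times the average of its two heights.
      (x1 - x0) * (y0 + y1) / 2.0
    }.sum
}
```

A classifier no better than random guessing gives the diagonal from (0, 0) to (1, 1), which has an area of 0.5; a perfect classifier passes through (0, 1) and has an area of 1.0.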
