MNIST Handwritten Digit Recognition with the Random Forest Algorithm (RandomForest)

1. Preparation:

Third-party libraries: scikit-learn (sklearn). The data-loading code below also relies on TensorFlow 1.x, whose bundled tutorial module supplies the MNIST reader.

2. Code:

# -*- coding: utf-8 -*-
# @Time    : 2018/8/21 9:35
# @Author  : Barry
# @File    : mnist.py
# @Software: PyCharm Community Edition

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
# Note: this tutorial module ships with TensorFlow 1.x only; see the alternative loader below.
import tensorflow.examples.tutorials.mnist.input_data as input_data

data_dir = 'MNIST_data/'
# Load MNIST with integer labels (one_hot=False), which is what sklearn expects.
mnist = input_data.read_data_sets(data_dir, one_hot=False)
batch_size = 50000                      # use 50,000 of the 55,000 training images
batch_x, batch_y = mnist.train.next_batch(batch_size)
test_x = mnist.test.images[:10000]      # full 10,000-image test set
test_y = mnist.test.labels[:10000]

print("start random forest")
# Sweep the number of trees from 10 to 190 and report test accuracy for each forest.
for i in range(10, 200, 10):
    clf_rf = RandomForestClassifier(n_estimators=i)
    clf_rf.fit(batch_x, batch_y)

    y_pred_rf = clf_rf.predict(test_x)
    acc_rf = accuracy_score(test_y, y_pred_rf)
    print("n_estimators = %d, random forest accuracy:%f" % (i, acc_rf))

Output:

start random forest
n_estimators = 10, random forest accuracy:0.947300
n_estimators = 20, random forest accuracy:0.957600
n_estimators = 30, random forest accuracy:0.962900
n_estimators = 40, random forest accuracy:0.965400
n_estimators = 50, random forest accuracy:0.966500
n_estimators = 60, random forest accuracy:0.965500
n_estimators = 70, random forest accuracy:0.968000
n_estimators = 80, random forest accuracy:0.967900
n_estimators = 90, random forest accuracy:0.967300
n_estimators = 100, random forest accuracy:0.968900
n_estimators = 110, random forest accuracy:0.968600
n_estimators = 120, random forest accuracy:0.969800
n_estimators = 130, random forest accuracy:0.967100
n_estimators = 140, random forest accuracy:0.968200
n_estimators = 150, random forest accuracy:0.969200
n_estimators = 160, random forest accuracy:0.969200
n_estimators = 170, random forest accuracy:0.969300
n_estimators = 180, random forest accuracy:0.969300
n_estimators = 190, random forest accuracy:0.968500

3. A Brief Look at the Algorithm:

Random forest, proposed in 2001 (by Leo Breiman), is an ensemble algorithm that supports both classification and regression. Before looking at random forests themselves, consider the decision tree algorithm (Decision Tree): a decision tree reaches a prediction by filtering an input through a sequence of branching conditions until it arrives at a decision.

A simple example: you receive a job offer and must decide whether to accept it. The chain of conditions you weigh forms the internal nodes of a tree, and the final accept/reject decision is a leaf node. If you make the call entirely on your own from those conditions, that is a decision tree. If you are still unsure, you might ask a few friends chosen more or less at random; each of them reasons through their own conditions with the information they have, like an independently grown decision tree, and you combine their answers into the final decision. Deciding alone corresponds to a single decision tree; letting a collection of independently built trees vote on the outcome is a random forest, as the short example below illustrates.
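To make the analogy concrete, here is a minimal sketch comparing "deciding alone" (one decision tree) with "asking many friends" (a random forest). It uses sklearn's small built-in 8x8 digits set rather than full MNIST purely so it runs in seconds; the numbers it prints are illustrative, not the results above.

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)                      # one tree decides alone
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)  # many trees vote

print("decision tree accuracy:%f" % accuracy_score(y_test, tree.predict(X_test)))
print("random forest accuracy:%f" % accuracy_score(y_test, forest.predict(X_test)))

On this small dataset the forest typically beats the single tree by a clear margin, mirroring the gap between a lone decision and a vote.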
When the trees are grown, some branching conditions contribute almost nothing to the decision. For instance, when deciding on the offer you probably would not care whether the company employs "programmer motivators". Branches that small are essentially noise, so they are removed by pruning when each decision tree is built, and the pruned trees together make up the random forest. Accuracy generally improves as the forest grows (more trees), but with diminishing returns: in the run above it levels off around 97% once there are roughly 100 trees. Random forest algorithms are applied in financial risk control, stock-trading data analysis, e-commerce, and many other areas. The sketch below shows how these ideas appear as scikit-learn parameters.
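As a rough sketch, n_estimators sets the forest size, while max_depth, min_samples_leaf, and ccp_alpha limit or prune away the tiny, noise-like branches. The values below are illustrative, not tuned for MNIST.

from sklearn.ensemble import RandomForestClassifier

clf_rf = RandomForestClassifier(
    n_estimators=100,      # forest size: more trees usually help, with diminishing returns
    max_depth=20,          # cap tree depth so tiny, noisy branches are never grown
    min_samples_leaf=2,    # ignore splits that would isolate almost no samples
    ccp_alpha=0.0,         # > 0 enables cost-complexity pruning of weak branches
    n_jobs=-1,             # train the trees in parallel
)
# clf_rf.fit(batch_x, batch_y) would train it on the same data as in section 2.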
