In cmd, run: pip install h2o
python
import h2o
h2o.init()
Once H2O has started successfully, open localhost:54321 in a browser.
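e. Select Random Forest.
f. Select the features and the response column; adjust the other parameters as needed.
g. Once the parameters are filled in, build the model.
h. View the Job.
i. Relationship between the number of trees in the random forest and the training logloss.
j. Importance of each attribute.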
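To make the logloss in step i concrete, here is a minimal sketch of how multi-class logloss can be computed by hand: the mean negative log of the probability the model assigned to the true class. The function name `logloss`, the `eps` clipping value, and the toy labels and probabilities are assumptions for illustration; they are not taken from H2O or from the data used in this post.

```python
import math


def logloss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    true_labels: list of class labels, one per row
    predicted_probs: list of dicts mapping class label -> predicted probability
    """
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip the probability so that log(0) can never occur
        p = min(max(probs[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(true_labels)


# Hypothetical toy data, not from train.csv
labels = ["a", "b", "a"]
probs = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.2, "b": 0.8},
    {"a": 0.6, "b": 0.4},
]
print(logloss(labels, probs))
```

A lower value means the model puts more probability on the correct class, so a curve that flattens as trees are added suggests that extra trees are no longer improving the fit on the training data.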
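a. Import the test set.
b. The next few steps are the same as when building the model, so they are not repeated here.
c. Predict.
The result has three more columns than the original test set.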
# coding: utf-8
# In[1]:
import h2o
h2o.init()
# In[75]:
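# Load the training set; [2:] keeps the columns from index 2 onward
trainFrame = h2o.import_file("C:\\Users\\gpwner\\Desktop\\train.csv")[2:]
# Use every column except the last one as a feature
names = trainFrame.col_names[:-1]
response_column = 'Catrgory'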
# In[37]:
from h2o.estimators import H2ORandomForestEstimator
# Define the model: 50 trees, maximum depth 20, 10-fold cross-validation
model = H2ORandomForestEstimator(ntrees=50, max_depth=20, nfolds=10)
model.train(x=names, y=response_column, training_frame=trainFrame)
# In[84]:
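# Load the test set with the same column selection as the training set
testdata = h2o.import_file("C:\\Users\\gpwner\\Desktop\\test.csv")[2:]
# Predict on the test set
pre_tag = model.predict(testdata)
pre_tag['predict']
# Append the predicted label to the test data and save the result
resultdata = testdata.cbind(pre_tag['predict'])
resultdata
h2o.download_csv(resultdata, "C:\\Users\\gpwner\\Desktop\\predict.csv")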
# In[82]:
# Accuracy: fraction of test rows whose predicted label matches the true label
correct = resultdata[resultdata['Catrgory'] == resultdata['predict']]
print(float(correct.nrow) / resultdata.nrow)
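The tree-count versus logloss curve and the attribute importances viewed in the browser can also be read from the trained model in Python. The sketch below assumes `model` is a trained H2O model such as the `H2ORandomForestEstimator` above and relies on its `scoring_history()` and `varimp()` methods; the helper name `summarize_model` and the `top_n` parameter are hypothetical and not part of the h2o package.

```python
def summarize_model(model, top_n=5):
    """Collect the scoring history and the first top_n variable importances.

    `model` is expected to be a trained H2O model; `summarize_model`
    and `top_n` are illustrative names, not part of the h2o package.
    """
    # Metrics recorded while the trees were being built
    history = model.scoring_history()
    # With use_pandas=False, varimp() returns a list of tuples:
    # (variable, relative_importance, scaled_importance, percentage)
    importances = model.varimp(use_pandas=False)
    return history, importances[:top_n]
```

With the model trained above, `history, top = summarize_model(model)` should give a scoring table that can be printed or plotted, and `top` can be looped over to print each variable name next to its importance percentage.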