H2O Setup

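First, install R and Python, and make sure Java is available, since H2O runs on the JVM. Then, in R, install the package with
install.packages("h2o"), load it with library(h2o), and start the H2O platform with h2o.init().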

You can also install h2o in Python. Run the first line in a shell and the next two in Python:
pip install -U h2o
import h2o
h2o.init()
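Because h2o.init() starts a local Java process, it can help to confirm the prerequisites before the first call. The following is a minimal sketch using only the Python standard library; `check_prerequisites` is a hypothetical helper name chosen for illustration, not part of the h2o package.

```python
import importlib.util
import shutil


def check_prerequisites():
    """Report whether Java is on PATH and whether the h2o package is installed."""
    return {
        # shutil.which returns None when no executable named "java" is on PATH
        "java": shutil.which("java") is not None,
        # find_spec returns None when the top-level package cannot be found
        "h2o": importlib.util.find_spec("h2o") is not None,
    }


status = check_prerequisites()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If either entry is reported as missing, install it before calling h2o.init().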

A quick start

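library(h2o)
library(dplyr)  # provides %>% and filter()

h2o.init()

# Drop setosa so that two species remain (a binary classification problem)
irish2o <- as.h2o(iris %>% filter(Species != 'setosa'))
y <- 'Species'                         # response column
x <- setdiff(names(irish2o), y)        # every other column is a predictor
parts <- h2o.splitFrame(irish2o, 0.8)  # roughly 80% train, 20% test

train <- parts[[1]]
test <- parts[[2]]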


----------------------------------------------------------------------

Your next step is to start H2O:
    > h2o.init()

For H2O package documentation, ask for help:
    > ??h2o

After starting H2O, you can use the Web UI at http://localhost:54321
For more information visit http://docs.h2o.ai

----------------------------------------------------------------------


Attaching package: ‘h2o’

The following objects are masked from ‘package:stats’:

    cor, sd, var

The following objects are masked from ‘package:base’:

    &&, %*%, %in%, ||, apply, as.factor, as.numeric, colnames,
    colnames<-, ifelse, is.character, is.factor, is.numeric,
    log, log10, log1p, log2, round, signif, trunc

> h2o.init()

H2O is not running yet, starting it now...

Note:  In case of errors look at the following log files:
    /var/folders/jz/qf7zhsc97f71slzzf59mvs2w0000gn/T//RtmpujsoRp/h2o_milin_started_from_r.out
    /var/folders/jz/qf7zhsc97f71slzzf59mvs2w0000gn/T//RtmpujsoRp/h2o_milin_started_from_r.err

java version "10.0.1" 2018-04-17
Java(TM) SE Runtime Environment 18.3 (build 10.0.1+10)
Java HotSpot(TM) 64-Bit Server VM 18.3 (build 10.0.1+10, mixed mode)

Starting H2O JVM and connecting: ... Connection successful!

R is connected to the H2O cluster: 
    H2O cluster uptime:         3 seconds 560 milliseconds 
    H2O cluster timezone:       Asia/Shanghai 
    H2O data parsing timezone:  UTC 
    H2O cluster version:        3.20.0.8 
    H2O cluster version age:    1 month and 20 days  
    H2O cluster name:           H2O_started_from_R_milin_jhc047 
    H2O cluster total nodes:    1 
    H2O cluster total memory:   2.00 GB 
    H2O cluster total cores:    4 
    H2O cluster allowed cores:  4 
    H2O cluster healthy:        TRUE 
    H2O Connection ip:          localhost 
    H2O Connection port:        54321 
    H2O Connection proxy:       NA 
    H2O Internal Security:      FALSE 
    H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
    R Version:                  R version 3.4.3 (2017-11-30) 

> m <- h2o.randomForest(x = x, y = y, training_frame = train)
  |=============================================================| 100%
> m
Model Details:
==============

H2OBinomialModel: drf
Model ID:  DRF_model_R_1541858573921_1 
Model Summary: 
  number_of_trees number_of_internal_trees model_size_in_bytes
1              50                       50                6827
  min_depth max_depth mean_depth min_leaves max_leaves mean_leaves
1         2         5    3.34000          3         10     5.88000


H2OBinomialMetrics: drf
** Reported on training data. **
** Metrics reported on Out-Of-Bag training samples **

MSE:  0.05615946
RMSE:  0.2369799
LogLoss:  0.2136178
Mean Per-Class Error:  0.05441176
AUC:  0.9779412
Gini:  0.9558824

Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
           versicolor virginica    Error   Rate
versicolor         38         2 0.050000  =2/40
virginica           2        32 0.058824  =2/34
Totals             40        34 0.054054  =4/74

Maximum Metrics: Maximum metrics at their respective thresholds
                        metric threshold    value idx
1                       max f1  0.476190 0.941176  30
2                       max f2  0.260952 0.953757  33
3                 max f0point5  0.937500 0.966667  25
4                 max accuracy  0.476190 0.945946  30
5                max precision  1.000000 1.000000   0
6                   max recall  0.004662 1.000000  49
7              max specificity  1.000000 1.000000   0
8             max absolute_mcc  0.476190 0.891176  30
9   max min_per_class_accuracy  0.476190 0.941176  30
10 max mean_per_class_accuracy  0.476190 0.945588  30

Gains/Lift Table: Extract with `h2o.gainsLift(<model>, <data>)` or `h2o.gainsLift(<model>, valid=<T/F>, xval=<T/F>)`


> p <- h2o.predict(m,test)
  |=============================================================| 100%
> p
     predict versicolor   virginica
1 versicolor  0.9679487 0.032051282
2 versicolor  0.8779487 0.122051282
3 versicolor  0.9979487 0.002051282
4 versicolor  0.9679487 0.032051282
5 versicolor  0.9979487 0.002051282
6 versicolor  0.9979487 0.002051282

[26 rows x 3 columns] 
> 
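The summary figures in the model printout above can be checked by hand from the confusion matrix and the AUC. The sketch below simply redoes that arithmetic in Python with the numbers copied from the output; the variable names are illustrative and nothing here calls H2O.

```python
# Counts copied from the binomial confusion matrix printed above.
errors = {"versicolor": 2, "virginica": 2}    # misclassified rows per actual class
totals = {"versicolor": 40, "virginica": 34}  # rows per actual class

# Error rate of each class, then their unweighted average.
per_class_error = {c: errors[c] / totals[c] for c in totals}
mean_per_class_error = sum(per_class_error.values()) / len(per_class_error)

# Overall error rate: all mistakes divided by all rows (the "Totals" line).
overall_error = sum(errors.values()) / sum(totals.values())

# The Gini coefficient is a rescaling of AUC.
auc = 0.9779412
gini = 2 * auc - 1

print(round(mean_per_class_error, 8))
print(round(overall_error, 6))
print(round(gini, 7))
```

The three values agree with the Mean Per-Class Error, the error in the Totals row, and the Gini reported above. The mean per-class error differs slightly from the overall error because the two classes have different sizes.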

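Performance versus predictions

h2o.performance() scores a model on a data frame and reports aggregate metrics, whereas h2o.predict() above returns one prediction per row. The sample output below lists setosa and 28 test rows, so it appears to come from a separate run on the full three-class iris data rather than from the two-class model above.

> h2o.performance(m,test)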
H2OMultinomialMetrics: drf

Test Set Metrics: 
=====================

MSE: (Extract with `h2o.mse`) 0.08837984
RMSE: (Extract with `h2o.rmse`) 0.2972875
Logloss: (Extract with `h2o.logloss`) 0.2452472
Mean Per-Class Error: 0.1623932
Confusion Matrix: Extract with `h2o.confusionMatrix(<model>, <data>)`)
=========================================================================
Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
           setosa versicolor virginica  Error     Rate
setosa          6          0         0 0.0000 =  0 / 6
versicolor      0         11         2 0.1538 = 2 / 13
virginica       0          3         6 0.3333 =  3 / 9
Totals          6         14         8 0.1786 = 5 / 28

Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>, <data>)`
=======================================================================
Top-3 Hit Ratios: 
  k hit_ratio
1 1  0.821429
2 2  1.000000
3 3  1.000000

> 
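The top-1 hit ratio in the table above is the fraction of rows whose predicted class equals the actual class, so it can be recomputed from the diagonal of the confusion matrix. A small sketch of that calculation, with the counts copied from the output above and illustrative variable names:

```python
# Rows are actual classes, columns are predicted classes (copied from the table above).
labels = ["setosa", "versicolor", "virginica"]
matrix = [
    [6, 0, 0],
    [0, 11, 2],
    [0, 3, 6],
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(labels)))  # diagonal = correct predictions

# Top-1 hit ratio: share of rows where the most probable class is the true class.
top1_hit_ratio = correct / total

# Mean per-class error: unweighted average of each row's error rate.
per_class_error = [1 - matrix[i][i] / sum(matrix[i]) for i in range(len(labels))]
mean_per_class_error = sum(per_class_error) / len(labels)

print(round(top1_hit_ratio, 6))
print(round(mean_per_class_error, 7))
```

The hit ratios for k = 2 and k = 3 cannot be derived from the confusion matrix alone, because they depend on the ranking of class probabilities for each row.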

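H2O Flow

H2O Flow is H2O's web interface. From it you can upload or download data, browse every model you have built, create models directly, and run predictions.

There are a couple of ways to open H2O Flow. The first is to initialize h2o from R or Python and then open http://127.0.0.1:54321 in your browser.
The second is to deploy H2O on a server and open Flow from there: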

1.Download H2O. This is a zip file that contains everything you need to get started.
2. From a terminal, unzip it and start H2O:
cd ~/Downloads
unzip h2o-3.22.0.1.zip
cd h2o-3.22.0.1
java -jar h2o.jar

3. Point your browser to http://<your-host-address>:54321
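Before opening the browser, it can be useful to confirm that something is actually listening on the Flow port. The sketch below uses only the Python standard library; `flow_url` and `is_listening` are hypothetical helper names, and the default port 54321 is the one shown in the steps above.

```python
import socket


def flow_url(host="localhost", port=54321):
    """Build the address of the Flow web UI for a given host and port."""
    return f"http://{host}:{port}"


def is_listening(host="localhost", port=54321, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or unresolvable host
        return False


print(flow_url(), "reachable:", is_listening())
```

An open port only shows that some service is accepting connections there; it does not prove that the service is H2O. If the check fails against a remote server, a firewall rule blocking the port is one possible cause.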

For how to use H2O Flow, see my earlier article:
https://www.jianshu.com/p/74d12c682af7
