PredictionIO 0.12.1 Installation Guide
Reference: http://predictionio.apache.org/install/
Environment:
System:
Ubuntu 14.04
Software versions:
The following versions were used during this test installation:
Required:
Java: 64-bit "1.8.0_171"
Hadoop: 2.7.6
Scala: 2.12.6
Spark: 2.1.1 (the spark-2.1.1-bin-hadoop2.7 build, which supports Hadoop 2.7)
Storage backend (choose one of the following three; this test uses option 3):
1: PostgreSQL 9.1
2: MySQL 5.1
3: Apache HBase 1.2.6
   Elasticsearch 5.5.2
Supported version ranges:
Scala 2.10.x, 2.11.x
Spark 1.6.x, 2.0.x, 2.1.x
Hadoop 2.4.x to 2.7.x
Elasticsearch 1.7.x, 5.x (use 5.x with PredictionIO 0.11.0 and later)
Installing Java
Reference: http://www.runoob.com/java/java-environment-setup.html
Steps:
1: Move the jdk-8-64.tar.gz package to the /usr/local/java directory.
2: Extract jdk-8-64.tar.gz into the current directory:
tar zxvf jdk-8-64.tar.gz
Install path:
/usr/local/java/jdk1.8.0_171
Configuration:
Append the following to /root/.bashrc:
export JAVA_HOME=/usr/local/java/jdk1.8.0_171
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export PATH=${JAVA_HOME}/bin:$PATH
Verify the installation:
Run "java -version"; if it prints the version information, the installation succeeded.
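As a fuller sanity check, a minimal shell sketch (assuming the paths above) that reloads the environment and confirms the java on PATH is the one under JAVA_HOME:

# Reload the environment so the new exports take effect in this shell.
source /root/.bashrc

java -version          # expect: java version "1.8.0_171"
echo "$JAVA_HOME"      # expect: /usr/local/java/jdk1.8.0_171
command -v java        # expect: /usr/local/java/jdk1.8.0_171/bin/java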
Installing Scala
Reference: http://www.runoob.com/scala/scala-install.html
Steps:
1: Move the scala-2.12.6.tgz package to the /usr/local/scala directory.
2: Extract scala-2.12.6.tgz into the current directory.
Install path:
/usr/local/scala/scala-2.12.6
Configuration:
Append the following to /root/.bashrc:
export SCALA_PATH=/usr/local/scala/scala-2.12.6
export PATH=${JAVA_HOME}/bin:$SCALA_PATH/bin:$PATH
Verify the installation:
Run "scalac -version"; if it prints the version information, the installation succeeded.
Installing Hadoop
References:
https://blog.csdn.net/wee_mita/article/details/52750112
https://www.cnblogs.com/xzjf/p/7231519.html
http://hadoop.apache.org/releases.html
Notes:
This test installs Hadoop in single-node mode; it can later be expanded into a distributed cluster.
Steps:
1: Move the hadoop-2.7.6.tar.gz package to the /usr/local/hadoop directory.
2: Extract hadoop-2.7.6.tar.gz into the current directory.
Install path:
/usr/local/hadoop/hadoop-2.7.6
Configuration:
1: Append the following to /root/.bashrc:
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.6
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"
export PATH=${JAVA_HOME}/bin:$SCALA_PATH/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
2: Add the following to /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/core-site.xml:
<configuration>
  <!-- added: start -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://{$HOST_NAME}:9000</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hdfs</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/hadoop/hadoop-2.7.6/hdfs/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>fs.checkpoint.period</name>
    <value>3600</value>
  </property>
  <property>
    <name>fs.checkpoint.size</name>
    <value>67108864</value>
  </property>
  <!-- added: end -->
</configuration>
3: Add the following to /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/hdfs-site.xml:
<configuration>
  <!-- added: start -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/hadoop/hadoop-2.7.6/hdfs/namenode</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>file:/usr/local/hadoop/hadoop-2.7.6/hdfs/secondarynamenode</value>
  </property>
  <property>
    <name>fs.checkpoint.edits.dir</name>
    <value>file:/usr/local/hadoop/hadoop-2.7.6/hdfs/secondarynamenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/hadoop/hadoop-2.7.6/hdfs/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>{$HOST_NAME}:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>{$HOST_NAME}:50090</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <!-- added: end -->
</configuration>
4: Add the following to /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/mapred-site.xml:
<configuration>
  <!-- added: start -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>{$HOST_NAME}:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>{$HOST_NAME}:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/mapreduce</value>
  </property>
  <!-- added: end -->
</configuration>
5: Add the following to /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/yarn-site.xml:
<configuration>
  <!-- added: start -->
  <property>
    <name>yarn.web-proxy.address</name>
    <value>yarn_proxy:YARN_PROXY_PORT</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>{$HOST_NAME}:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>{$HOST_NAME}:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>{$HOST_NAME}:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>{$HOST_NAME}:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>{$HOST_NAME}:8080</value>
  </property>
  <property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
  </property>
  <!-- added: end -->
</configuration>
6: Open /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/slaves and add the host name of each slave node, one per line:
{$HOST_NAME}
7: Open ${HADOOP_HOME}/etc/hadoop/masters and add the host name of the secondary namenode, one per line.
8: Edit /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/hadoop-env.sh and set:
HADOOP_HEAPSIZE=500
HADOOP_NAMENODE_INIT_HEAPSIZE="500"
9: Edit /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/mapred-env.sh and set:
HADOOP_JOB_HISTORYSERVER_HEAPSIZE=250
10: Edit /usr/local/hadoop/hadoop-2.7.6/etc/hadoop/yarn-env.sh and set:
JAVA_HEAP_MAX=-Xmx500m
YARN_HEAPSIZE=500
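The configuration files above all use {$HOST_NAME} as a literal placeholder (a convention of this guide, not a Hadoop feature). A minimal shell sketch, assuming a single-node setup where the placeholder should become this machine's host name, that substitutes it in one pass:

# Replace the {$HOST_NAME} placeholder with the local host name in every
# Hadoop config file that uses it.
HOST=$(hostname)
cd /usr/local/hadoop/hadoop-2.7.6/etc/hadoop
sed -i "s/{\$HOST_NAME}/${HOST}/g" core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slaves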
Disable the firewall:
sudo ufw disable
service iptables stop / start
service iptables status
Before running Hadoop for the first time, the namenode must be formatted:
cd /usr/local/hadoop/hadoop-2.7.6
bin/hdfs namenode -format
Start and stop the Hadoop services:
cd /usr/local/hadoop/hadoop-2.7.6/sbin
Start: "./start-all.sh"
Stop: "./stop-all.sh"
Verify that Hadoop started successfully:
Run "jps -l" and confirm the Hadoop processes (NameNode, DataNode, SecondaryNameNode) are listed.
Web UI:
http://{$HOST_NAME}:50070/
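A quick smoke test of the fresh HDFS install, sketched under the assumption that the services above are running and the host name resolves locally:

# Round-trip a file through HDFS to confirm the namenode and datanode talk.
hdfs dfs -mkdir -p /smoke-test
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -put /tmp/hello.txt /smoke-test/
hdfs dfs -cat /smoke-test/hello.txt    # expect: hello hdfs
hdfs dfs -rm -r /smoke-test

# The namenode web UI should answer over HTTP as well.
curl -s -o /dev/null -w "%{http_code}\n" http://$(hostname):50070/    # expect: 200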
Installing PredictionIO 0.12.1
Download:
https://www.apache.org/dyn/closer.cgi/predictionio/0.12.1/apache-predictionio-0.12.1.tar.gz
Install path:
/home/PredictionIo
Steps:
1: Move the PredictionIO-0.12.1.tar.gz package to the /home/PredictionIo directory.
2: Extract PredictionIO-0.12.1.tar.gz into the current directory.
3: Enter the PredictionIo directory and run the build script (this takes a while; be patient):
cd /home/PredictionIo
./make-distribution.sh
4: On success, the build creates the following directories and files:
PredictionIo/sbt/sbt
PredictionIo/conf/
PredictionIo/conf/pio-env.sh
5: Create a vendors folder under the install directory /home/PredictionIo:
$ mkdir /home/PredictionIo/vendors
Configuration:
Append the following to /root/.bashrc:
export PIO_HOME=/home/PredictionIo
export PATH=${JAVA_HOME}/bin:$PIO_HOME/bin:$SCALA_PATH/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Installing Spark
Install path:
/home/PredictionIo/vendors/spark-2.1.1-bin-hadoop2.7
Steps:
1: Download the package into the /home/PredictionIo/vendors install directory:
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-2.1.1-bin-hadoop2.7.tgz
2: Extract the package into the install directory:
$ tar zxvf spark-2.1.1-bin-hadoop2.7.tgz -C /home/PredictionIo/vendors
Configuration:
1: Append the following to /root/.bashrc:
export SPARK_HOME=/home/PredictionIo/vendors/spark-2.1.1-bin-hadoop2.7
export PATH=${JAVA_HOME}/bin:$PIO_HOME/bin:$SCALA_PATH/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SPARK_HOME/bin:$PATH
2: Edit the Spark configuration file $SPARK_HOME/conf/spark-env.sh and add the following (replace the {$...} placeholders with your actual paths and host name):
export HADOOP_CONF_DIR={$HADOOP_PATH}/etc/hadoop
export HADOOP_HOME={$HADOOP_PATH}
export JAVA_HOME={$JAVA_PATH}
export SCALA_HOME={$SCALA_PATH}
export SPARK_WORKER_MEMORY=3g
export SPARK_MASTER_HOST={$HOST_NAME}
export SPARK_MASTER_IP={$HOST_NAME}
export MASTER=spark://{$HOST_NAME}:7077
export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=4"
Web UI:
http://{$HOST_NAME}:8080/
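A fresh extract ships only conf/spark-env.sh.template, so the file usually has to be created first. A minimal sketch, assuming the standalone deployment implied by the MASTER setting above, that creates the file and brings up a local master and worker:

cd /home/PredictionIo/vendors/spark-2.1.1-bin-hadoop2.7

# spark-env.sh does not exist yet; copy the template, then add the exports above.
cp conf/spark-env.sh.template conf/spark-env.sh

# Start a standalone master and attach one worker to it.
sbin/start-master.sh
sbin/start-slave.sh spark://$(hostname):7077

# The master web UI on port 8080 should now list one ALIVE worker.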
Installing Elasticsearch
Install path:
/home/PredictionIo/vendors/elasticsearch-5.5.2
Steps:
1: Download the package into the /home/PredictionIo/vendors install directory:
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.2.tar.gz
2: Extract the package into the install directory:
$ tar zxvf elasticsearch-5.5.2.tar.gz -C /home/PredictionIo/vendors
Verify that Elasticsearch works:
1: Note: Elasticsearch refuses to start as "root". Create a user "elastic" with password "elastic" and grant it sudo rights (see the sketch after the log below).
2: Switch to the "elastic" user.
3: Enter the Elasticsearch bin directory and run "./elasticsearch"; output like the following means the installation succeeded:
[2018-08-03T15:43:05,783][INFO ][o.e.p.PluginsService ] [i8eVyaH] loaded module [aggs-matrix-stats]
[2018-08-03T15:43:05,783][INFO ][o.e.p.PluginsService ] [i8eVyaH] loaded module [ingest-common]
[2018-08-03T15:43:05,784][INFO ][o.e.p.PluginsService ] [i8eVyaH] loaded module [lang-expression]
[2018-08-03T15:43:05,789][INFO ][o.e.p.PluginsService ] [i8eVyaH] loaded module [lang-groovy]
[2018-08-03T15:43:05,789][INFO ][o.e.p.PluginsService ] [i8eVyaH] loaded module [lang-mustache]
[2018-08-03T15:43:05,789][INFO ][o.e.p.PluginsService ] [i8eVyaH] loaded module [lang-painless]
[2018-08-03T15:43:05,789][INFO ][o.e.p.PluginsService ] [i8eVyaH] loaded module [parent-join]
[2018-08-03T15:43:05,789][INFO ][o.e.p.PluginsService ] [i8eVyaH] loaded module [percolator]
[2018-08-03T15:43:05,790][INFO ][o.e.p.PluginsService ] [i8eVyaH] loaded module [reindex]
[2018-08-03T15:43:05,790][INFO ][o.e.p.PluginsService ] [i8eVyaH] loaded module [transport-netty3]
[2018-08-03T15:43:05,790][INFO ][o.e.p.PluginsService ] [i8eVyaH] loaded module [transport-netty4]
[2018-08-03T15:43:05,790][INFO ][o.e.p.PluginsService ] [i8eVyaH] no plugins loaded
[2018-08-03T15:43:11,281][INFO ][o.e.d.DiscoveryModule ] [i8eVyaH] using discovery type [zen]
[2018-08-03T15:43:12,013][INFO ][o.e.n.Node ] initialized
[2018-08-03T15:43:12,013][INFO ][o.e.n.Node ] [i8eVyaH] starting ...
[2018-08-03T15:43:12,261][INFO ][o.e.t.TransportService ] [i8eVyaH] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}
[2018-08-03T15:43:12,273][WARN ][o.e.b.BootstrapChecks ] [i8eVyaH] max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2018-08-03T15:43:15,366][INFO ][o.e.c.s.ClusterService ] [i8eVyaH] new_master {i8eVyaH}{i8eVyaHsQwKynitriABD1Q}{dz63krojSnivRerG3RROZQ}{127.0.0.1}{127.0.0.1:9300}, reason: zen-disco-elected-as-master ([0] nodes joined)
[2018-08-03T15:43:15,436][INFO ][o.e.h.n.Netty4HttpServerTransport] [i8eVyaH] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2018-08-03T15:43:15,436][INFO ][o.e.n.Node ] [i8eVyaH] started
[2018-08-03T15:43:15,792][INFO ][o.e.g.GatewayService ] [i8eVyaH] recovered [1] indices into cluster_state
[2018-08-03T15:43:16,354][INFO ][o.e.c.r.a.AllocationService] [i8eVyaH] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[pio_meta][4]] ...]).
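A hedged sketch of the user setup from step 1, plus an HTTP check (the user name, password, and ownership layout are this guide's choices, not Elasticsearch requirements). The BootstrapChecks warning about max file descriptors in the log above can be addressed at the same time:

# As root: create the elastic user and hand it the Elasticsearch tree.
useradd -m elastic
echo "elastic:elastic" | chpasswd
usermod -aG sudo elastic
chown -R elastic:elastic /home/PredictionIo/vendors/elasticsearch-5.5.2

# Raise the open-file limit flagged in the log (takes effect on next login).
cat >> /etc/security/limits.conf <<'EOF'
elastic soft nofile 65536
elastic hard nofile 65536
EOF

# As the elastic user, once the node is up, confirm it answers on port 9200.
curl http://127.0.0.1:9200/    # expect a JSON reply with cluster name and version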
Installing HBase
Download:
http://www.apache.org/dyn/closer.cgi/hbase/1.2.6/hbase-1.2.6-bin.tar.gz
Install path:
/home/PredictionIo/vendors/hbase-1.2.6
Steps:
1: Download the package into the /home/PredictionIo/vendors install directory:
$ wget http://archive.apache.org/dist/hbase/1.2.6/hbase-1.2.6-bin.tar.gz
2: Extract the package into the install directory:
$ tar zxvf hbase-1.2.6-bin.tar.gz -C /home/PredictionIo/vendors
Configuration:
1: Edit the HBase configuration file /home/PredictionIo/vendors/hbase-1.2.6/conf/hbase-site.xml and add the following:
<configuration>
  <!-- added: start -->
  <property>
    <name>hbase.rootdir</name>
    <value>file:///home/PredictionIo/vendors/hbase-1.2.6/data</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/PredictionIo/vendors/hbase-1.2.6/zookeeper</value>
  </property>
  <!-- added: end -->
</configuration>
2: Edit the HBase configuration file /home/PredictionIo/vendors/hbase-1.2.6/conf/hbase-env.sh and add:
export JAVA_HOME={$JAVA_PATH}
Web UI:
http://{$HOST_NAME}:16010/ (HBase 1.x; older 0.x releases served the UI on port 60010)
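A minimal sketch (assuming the paths above) to start the standalone HBase configured here and confirm it responds; "status" is a standard hbase shell command:

cd /home/PredictionIo/vendors/hbase-1.2.6

# Start HBase in standalone mode (uses its built-in ZooKeeper).
bin/start-hbase.sh

# Query cluster status; expect something like "1 active master, ... 1 servers".
echo "status" | bin/hbase shell

# Stop it again later with: bin/stop-hbase.sh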
Configuring PredictionIO
1: Enter the PredictionIO install directory /home/PredictionIo.
2: Edit the file conf/pio-env.sh as follows:
SPARK_HOME=$PIO_HOME/vendors/spark-2.1.1-bin-hadoop2.7
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=127.0.0.1
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9200
PIO_STORAGE_SOURCES_ELASTICSEARCH_SCHEMES=http
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-5.5.2
PIO_STORAGE_SOURCES_ELASTICSEARCH_USERNAME=elastic
PIO_STORAGE_SOURCES_ELASTICSEARCH_PASSWORD=elastic
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.2.6
PIO_STORAGE_SOURCES_HBASE_HOSTS=127.0.0.1
PIO_STORAGE_SOURCES_HBASE_PORTS=7070
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
If you installed MySQL instead, configure as follows (see the database-setup sketch after this block):
MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.44-bin.jar
SPARK_HOME=$PIO_HOME/vendors/spark-2.1.1-bin-hadoop2.7
HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=MYSQL
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=MYSQL
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=MYSQL
PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://{$MYSQL_URL}:3306/{$MYSQL_DBNAME}
PIO_STORAGE_SOURCES_MYSQL_USERNAME={$MYSQL_USERNAME}
PIO_STORAGE_SOURCES_MYSQL_PASSWORD={$MYSQL_PASSWORD}
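PredictionIO expects the connector jar referenced by MYSQL_JDBC_DRIVER to be in $PIO_HOME/lib and the database and account from the JDBC URL to already exist. A hedged sketch; "pio" is a placeholder name, so substitute your {$MYSQL_DBNAME}, {$MYSQL_USERNAME}, and {$MYSQL_PASSWORD}, and adjust MYSQL_JDBC_DRIVER if the downloaded jar name differs:

# Fetch the MySQL JDBC connector into PredictionIO's lib directory.
cd $PIO_HOME/lib
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.44/mysql-connector-java-5.1.44.jar

# Create the database and account the JDBC URL points at.
mysql -u root -p -e "
  CREATE DATABASE pio;
  GRANT ALL PRIVILEGES ON pio.* TO 'pio'@'localhost' IDENTIFIED BY 'pio';
  FLUSH PRIVILEGES;"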
Starting the PredictionIO services
1: If you installed Elasticsearch and HBase:
Start: pio-start-all
Stop: pio-stop-all
2: If you installed MySQL:
Start: pio eventserver &
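Either way, the event server listens on port 7070 once it is up; a quick check (a small JSON status reply is expected, assuming the default port):

curl -i -X GET http://localhost:7070
# expect: HTTP/1.1 200 OK with a body like {"status":"alive"}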
Verify the PredictionIO installation:
Run "pio status"; output like the following means the installation succeeded:
[INFO] [Management$] Inspecting PredictionIO...
[INFO] [Management$] PredictionIO 0.12.1 is installed at /home/PredictionIo
[INFO] [Management$] Inspecting Apache Spark...
[INFO] [Management$] Apache Spark is installed at /home/PredictionIo/vendors/spark-2.1.1-bin-hadoop2.7
[INFO] [Management$] Apache Spark 2.1.1 detected (meets minimum requirement of 1.3.0)
[INFO] [Management$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: LOCALFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [HBLEvents] The table pio_event:events_0 doesn't exist yet. Creating now...
[INFO] [HBLEvents] Removing table pio_event:events_0...
[INFO] [Management$] Your system is all ready to go.
With the Hadoop, HBase, Elasticsearch, Spark, and PredictionIO services all running:
Run "jps -l"; the output should look like this:
4512 org.apache.predictionio.tools.console.Console
6705 org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode
9586 org.apache.spark.deploy.SparkSubmit
10948 org.elasticsearch.bootstrap.Elasticsearch
3780 org.apache.spark.deploy.worker.Worker
10870 org.apache.predictionio.tools.console.Console
9673 org.apache.spark.executor.CoarseGrainedExecutorBackend
6282 org.apache.hadoop.hdfs.server.namenode.NameNode
9515 org.apache.predictionio.tools.console.Console
3627 org.apache.spark.deploy.master.Master
6443 org.apache.hadoop.hdfs.server.datanode.DataNode
12894 sun.tools.jps.Jps