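1. Spark installation directory issue
The Spark installed by Ambari reports a version number in a different format from the community release, so the regular-expression check that the semver.sh script runs to validate the Spark version fails.
Fix: install Spark again at a separate, explicitly specified path (see item 3).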
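The failure can be illustrated with a minimal Python sketch. It assumes the check accepts only a plain MAJOR.MINOR.PATCH string; the real pattern inside semver.sh may differ, and the vendor-style version string below is a hypothetical example of the format, not a value read from this cluster.

```python
import re

# Assumption: a strict MAJOR.MINOR.PATCH pattern, standing in for whatever
# regular expression semver.sh actually applies.
STRICT_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def is_plain_semver(version: str) -> bool:
    """Return True only for versions shaped like 2.3.3."""
    return STRICT_SEMVER.match(version) is not None


if __name__ == "__main__":
    # Community release: exactly three numeric components.
    print("2.3.3 ->", is_plain_semver("2.3.3"))
    # Hypothetical vendor-style string with extra build components.
    print("2.2.0.2.6.4.0-91 ->", is_plain_semver("2.2.0.2.6.4.0-91"))
```

A version string with extra components does not match a strict pattern of this kind, which is consistent with the symptom described above.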
2. The database connection must include the port
3. Spark path
/data/Software/spark-2.3.3
4. PostgreSQL database:
Database address:
Database name: pio
User name: pio
Password:
Connect:
psql -U pio -h ip -p 5432 -d pio
List all databases:
select * from pg_database;
Switch to a database:
\c pio
List all tables in the current database:
\dt
Quit:
\q
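Since item 2 requires the port in the connection, a small helper can make it impossible to forget. The following is a sketch only: the function names jdbc_url and psql_args are hypothetical, the host is a placeholder, and 5432 is used as the default because it is the port shown in the psql command above.

```python
def jdbc_url(host: str, database: str, port: int = 5432) -> str:
    """Build a PostgreSQL JDBC URL that always carries an explicit port."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"jdbc:postgresql://{host}:{port}/{database}"


def psql_args(host: str, database: str, user: str, port: int = 5432) -> list:
    """Build the argument list for the psql command used in these notes."""
    return ["psql", "-U", user, "-h", host, "-p", str(port), "-d", database]


if __name__ == "__main__":
    # "db.example.internal" is a placeholder host, not the real server.
    print(jdbc_url("db.example.internal", "pio"))
    print(" ".join(psql_args("db.example.internal", "pio", "pio")))
```

The JDBC form is the one that goes into the PGSQL storage URL in pio-env.sh.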
5. PredictionIO installation directory:
/usr/local/PredictionIO/PredictionIO-0.14.0
unset -f pathmunge
export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export HADOOP_HOME=/usr/hdp/2.6.4.0-91/hadoop
export SPARK_HOME=/usr/hdp/2.6.4.0-91/spark2
export PGD_HOME=/usr/local/postgresql
export PIO_HOME=/usr/local/PredictionIO/PredictionIO-0.14.0
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$PGD_HOME/bin:$PIO_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export SPARK_MAJOR_VERSION=2
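Before starting any pio command it can help to confirm that the variables exported above are actually visible to the process. This is a minimal sketch; the helper name missing_variables is hypothetical, and the list of required names is simply taken from the exports above.

```python
import os

# Names taken from the export lines above.
REQUIRED = ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME", "PGD_HOME", "PIO_HOME")


def missing_variables(env, required=REQUIRED):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```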
Configuration file:
pio-env.sh
JAR package (PostgreSQL JDBC driver):
postgresql-42.2.6.jar
pio eventserver &    # start the event server in the background
jps -l    # check that the process is running
6. Workflow
Project directory: /data/online/PredictionIO/MyECommerceRecommendation
git clone https://github.com/apache/predictionio-template-ecom-recommender.git MyECommerceRecommendation
pio app new myEcoRe    # create the app
[INFO] [App$] Initialized Event Store for this app ID: 1.
[INFO] [Pio$] Created a new app:
[INFO] [Pio$] Name: myEcoRe
[INFO] [Pio$] ID: 1
[INFO] [Pio$] Access Key: CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq
pio app list    # list the existing apps
[INFO] [Pio$] Name | ID | Access Key | Allowed Event(s)
[INFO] [Pio$] myEcoRe | 1 | CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq | (all)
[INFO] [Pio$] Finished listing 1 app(s).
curl -i -X GET http://localhost:7070    # verify that the event server started successfully
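The same check can be scripted. The sketch below assumes the event server answers GET / with a JSON body of the form {"status":"alive"}, as the PredictionIO documentation describes; the function names are hypothetical.

```python
import json
import urllib.request


def is_event_server_alive(body: str) -> bool:
    """Return True if the response body reports status "alive"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "alive"


def fetch_body(url: str, timeout: float = 5.0) -> str:
    """Fetch the response body of the given URL as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Canned body for illustration; against a live server use
    # fetch_body("http://localhost:7070") instead.
    print(is_event_server_alive('{"status":"alive"}'))
```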
7. Add an event:
curl -i -X POST "http://localhost:7070/events.json?accessKey=CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq" \
-H "Content-Type: application/json" \
-d '{
"event" : "view",
"entityType" : "user",
"entityId" : "u0",
"targetEntityType" : "item",
"targetEntityId" : "i0",
"eventTime" : "2019-06-10T12:34:56.123-08:00"
}'
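The curl call above can also be expressed in Python with only the standard library. This is a sketch, not the PredictionIO SDK: build_event_request is a hypothetical helper, and "YOUR_ACCESS_KEY" is a placeholder for the key printed by pio app new.

```python
import json
import urllib.parse
import urllib.request


def build_event_request(base_url, access_key, event):
    """Build (but do not send) a POST request for the /events.json endpoint."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    return urllib.request.Request(
        url=f"{base_url}/events.json?{query}",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    view_event = {
        "event": "view",
        "entityType": "user",
        "entityId": "u0",
        "targetEntityType": "item",
        "targetEntityId": "i0",
        "eventTime": "2019-06-10T12:34:56.123-08:00",
    }
    request = build_event_request("http://localhost:7070", "YOUR_ACCESS_KEY", view_event)
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the payload easy to inspect before anything reaches the event server.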
8. Run the batch data-import script
/data/online/PredictionIO/MyECommerceRecommendation/shell/add_event_batch.sh
Namespace(access_key='CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq', file='/data/online/PredictionIO/MyECommerceRecommendation/py_import_data/sample_movielens_data.txt', url='http://localhost:7070')
Importing data...
1501 events are imported.
Batch import:
python data/import_eventserver.py --access_key
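As an alternative to sending events one by one, the event server also offers a batch endpoint (/batch/events.json). The sketch below is not what add_event_batch.sh or import_eventserver.py does; it only shows how events could be grouped, assuming a limit of 50 events per batch request. The helper names and the generated user/item IDs are hypothetical.

```python
import json

# Assumption: /batch/events.json accepts at most 50 events per request.
BATCH_LIMIT = 50


def chunk_events(events, size=BATCH_LIMIT):
    """Split a list of events into consecutive chunks of at most size items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [events[i:i + size] for i in range(0, len(events), size)]


def make_view_event(user_id, item_id):
    """Build a view event in the same shape as the single-event example."""
    return {
        "event": "view",
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
    }


if __name__ == "__main__":
    # Synthetic data for illustration only.
    events = [make_view_event(f"u{i % 10}", f"i{i % 50}") for i in range(120)]
    batches = chunk_events(events)
    print(len(batches), "batch requests needed")
    # Each batch body is a JSON array of events.
    print(len(json.dumps(batches[0])), "bytes in the first request body")
```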
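9. Build the engine before training. Note: this must be run from the project directory
pio build --verbose    # build the engine; --verbose prints detailed output
Error output:
[INFO] [Engine$] If the path above is incorrect, this process will fail.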
[INFO] [Engine$] Uber JAR disabled. Making sure lib/pio-assembly-0.11.0-incubating.jar is absent.
[INFO] [Engine$] Going to run: /Users/jiazhaopu/program/apache-predictionio-0.11.0-incubating/PredictionIO/sbt/sbt package assemblyPackageDependency in /Users/jiazhaopu/workspace/incubator-predictionio-template-recommender
[ERROR] [Engine$] Downloading sbt launcher for 0.13.15:
[ERROR] [Engine$] From http://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.13.15/sbt-launch.jar
[ERROR] [Engine$] To /Users/jiazhaopu/.sbt/launchers/0.13.15/sbt-launch.jar
Fix: download sbt-launch.jar manually (the log above shows the source URL and the target path).
Expected output on success:
[keyboard@dn-hadoop-keyboard-awsuse1b-02 MyECommerceRecommendation]$ pio build --verbose
[INFO] [Engine$] Using command '/usr/local/PredictionIO/PredictionIO-0.14.0/sbt/sbt' at /data/online/PredictionIO/MyECommerceRecommendation to build.
[INFO] [Engine$] If the path above is incorrect, this process will fail.
[INFO] [Engine$] Uber JAR disabled. Making sure lib/pio-assembly-0.14.0.jar is absent.
[INFO] [Engine$] Going to run: /usr/local/PredictionIO/PredictionIO-0.14.0/sbt/sbt package assemblyPackageDependency in /data/online/PredictionIO/MyECommerceRecommendation
[INFO] [Engine$] [info] Loading settings for project myecommercerecommendation-build from assembly.sbt ...
[INFO] [Engine$] [info] Loading project definition from /data/online/PredictionIO/MyECommerceRecommendation/project
[INFO] [Engine$] [info] Loading settings for project myecommercerecommendation from build.sbt ...
[INFO] [Engine$] [info] Set current project to template-scala-parallel-ecommercerecommendation (in build file:/data/online/PredictionIO/MyECommerceRecommendation/)
[INFO] [Engine$] [success] Total time: 1 s, completed Jun 27, 2019 11:23:29 AM
[INFO] [Engine$] [info] Strategy 'discard' was applied to a file (Run the task at debug level to see details)
[INFO] [Engine$] [info] Assembly up to date: /data/online/PredictionIO/MyECommerceRecommendation/target/scala-2.11/template-scala-parallel-ecommercerecommendation-assembly-0.1.0-SNAPSHOT-deps.jar
[INFO] [Engine$] [success] Total time: 1 s, completed Jun 27, 2019 11:23:30 AM
[INFO] [Engine$] Compilation finished successfully.
[INFO] [Engine$] Looking for an engine...
[INFO] [Engine$] Found template-scala-parallel-ecommercerecommendation_2.11-0.1.0-SNAPSHOT.jar
[INFO] [Engine$] Found template-scala-parallel-ecommercerecommendation-assembly-0.1.0-SNAPSHOT-deps.jar
[INFO] [Engine$] Build finished successfully.
[INFO] [Pio$] Your engine is ready for training.
10. Train the engine
pio train    # start training
Event API endpoint:
http://localhost:7070/events.json?accessKey=CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq
Query events (20 are returned by default):
curl -i -X GET "http://localhost:7070/events.json?accessKey=CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq"
Query the event with a given ID:
curl -i -X GET "http://localhost:7070/events/c705e522cddc4ca5b35dd33f6f35672a.json?accessKey=CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq"
Delete the event with a given ID:
curl -i -X DELETE "http://localhost:7070/events/c705e522cddc4ca5b35dd33f6f35672a.json?accessKey=CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq"
pio app data-delete myEcoRe    # delete all events of the myEcoRe app
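Because the event query returns only 20 events by default, it is often useful to add filters to the URL. The sketch below builds such URLs, assuming the Event API query parameters limit (with -1 meaning no limit), entityType, entityId and event behave as described in the PredictionIO documentation. The helper name events_query_url and the key "YOUR_ACCESS_KEY" are placeholders.

```python
import urllib.parse


def events_query_url(base_url, access_key, **filters):
    """Build a GET URL for /events.json with optional query filters."""
    params = {"accessKey": access_key}
    # Skip filters that were passed as None so they do not appear in the URL.
    params.update({k: v for k, v in filters.items() if v is not None})
    return f"{base_url}/events.json?{urllib.parse.urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:7070"
    # All events instead of the default 20 (assumed meaning of limit=-1).
    print(events_query_url(base, "YOUR_ACCESS_KEY", limit=-1))
    # Only view events of user u0.
    print(events_query_url(base, "YOUR_ACCESS_KEY",
                           entityType="user", entityId="u0", event="view"))
```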
Output of pio train:
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@191a0351{/metrics/json,null,AVAILABLE,@Spark}
[INFO] [Engine$] EngineWorkflow.train
[INFO] [Engine$] DataSource: org.example.ecommercerecommendation.DataSource@22c8ee48
[INFO] [Engine$] Preparator: org.example.ecommercerecommendation.Preparator@2321e482
[INFO] [Engine$] AlgorithmList: List(org.example.ecommercerecommendation.ECommAlgorithm@4276ad40)
[INFO] [Engine$] Data sanity check is on.
[INFO] [Engine$] org.example.ecommercerecommendation.TrainingData does not support data sanity check. Skipping check.
[INFO] [Engine$] org.example.ecommercerecommendation.PreparedData does not support data sanity check. Skipping check.
[WARN] [BLAS] Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
[WARN] [BLAS] Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
[WARN] [LAPACK] Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
[WARN] [LAPACK] Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
[INFO] [Engine$] org.example.ecommercerecommendation.ECommModel does not support data sanity check. Skipping check.
[INFO] [Engine$] EngineWorkflow.train completed
[INFO] [Engine] engineInstanceId=9c612dc0-1c95-4ff7-aa4d-33fdd2c850e4
[INFO] [CoreWorkflow$] Inserting persistent model
[INFO] [CoreWorkflow$] Updating engine instance
[INFO] [CoreWorkflow$] Training completed successfully.
[INFO] [AbstractConnector] Stopped Spark@23c767e6{HTTP/1.1,[http/1.1]}{0.0.0.0:4042}
11. Deploy the engine
pio deploy
Change the port (and optionally the bind IP):
pio deploy --port 8123
pio deploy --port 8123 --ip 1.2.3.4
pio batchpredict
[INFO] [MasterActor] Undeploying any existing engine instance at http://10.1.100.23:8000
[WARN] [MasterActor] Nothing at http://10.1.100.23:8000
[WARN] [WorkflowUtils$] Non-empty parameters supplied to org.example.ecommercerecommendation.Preparator, but its constructor does not accept any arguments. Stubbing with empty parameters.
[WARN] [WorkflowUtils$] Non-empty parameters supplied to org.example.ecommercerecommendation.Serving, but its constructor does not accept any arguments. Stubbing with empty parameters.
[INFO] [Runner$] Submission command:
/data/Software/spark-2.3.3/bin/spark-submit
--class org.apache.predictionio.workflow.CreateServer
--jars file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/postgresql-42.2.6.jar,
file:/data/online/PredictionIO/MyECommerceRecommendation/target/scala-2.11/template-scala-parallel-ecommercerecommendation_2.11-0.1.0-SNAPSHOT.jar,
file:/data/online/PredictionIO/MyECommerceRecommendation/target/scala-2.11/template-scala-parallel-ecommercerecommendation-assembly-0.1.0-SNAPSHOT-deps.jar,
file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/spark/pio-data-localfs-assembly-0.14.0.jar,
file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/spark/pio-data-hdfs-assembly-0.14.0.jar,
file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/spark/pio-data-jdbc-assembly-0.14.0.jar,
file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/spark/pio-data-elasticsearch-assembly-0.14.0.jar,
file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/spark/pio-data-hbase-assembly-0.14.0.jar,
file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/spark/pio-data-s3-assembly-0.14.0.jar
--files file:/usr/local/PredictionIO/PredictionIO-0.14.0/conf/log4j.properties
--driver-class-path /usr/local/PredictionIO/PredictionIO-0.14.0/conf:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/postgresql-42.2.6.jar
--driver-java-options -Dpio.log.dir=/home/keyboard file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/pio-assembly-0.14.0.jar
--engineInstanceId 9c612dc0-1c95-4ff7-aa4d-33fdd2c850e4
--engine-variant file:/data/online/PredictionIO/MyECommerceRecommendation/engine.json
--ip 0.0.0.0
--port 8000
--event-server-ip 0.0.0.0
--event-server-port 7070
--json-extractor Both --env PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/keyboard/.pio_store,PIO_HOME=/usr/local/PredictionIO/PredictionIO-0.14.0,PIO_FS_ENGINESDIR=/home/keyboard/.pio_store/engines,PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://10.1.100.65:5432/pio,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio@2019,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_FS_TMPDIR=/home/keyboard/.pio_store/tmp,PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL,PIO_CONF_DIR=/usr/local/PredictionIO/PredictionIO-0.14.0/conf
[INFO] [MasterActor] Undeploying any existing engine instance at http://0.0.0.0:8000
[WARN] [MasterActor] Nothing at http://0.0.0.0:8000
Possible cause of the deployment problem:
Version issue: 0.14.0
https://groups.google.com/forum/#!topic/actionml-user/H795t022nuk
https://www.mail-archive.com/[email protected]/
http://archive.apache.org/dist/predictionio/0.13.0/
After rebuilding the environment and switching to 0.13, the problem was resolved.
Re-run the steps above.
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@d809ab4{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7a435f5a{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4c8bc212{/metrics/json,null,AVAILABLE,@Spark}
[INFO] [Engine] Using persisted model
[INFO] [Engine] Loaded model org.example.ecommercerecommendation.ECommModel for algorithm org.example.ecommercerecommendation.ECommAlgorithm
[INFO] [AbstractConnector] Stopped Spark@1f5162ff{HTTP/1.1,[http/1.1]}{0.0.0.0:4042}
[INFO] [MasterActor] Undeploying any existing engine instance at http://0.0.0.0:8000
[WARN] [MasterActor] Nothing at http://0.0.0.0:8000
[INFO] [HttpListener] Bound to /0.0.0.0:8000
[INFO] [MasterActor] Engine is deployed and running. Engine API is live at http://0.0.0.0:8000.
http://34.224.110.105:8000/ is now reachable.
curl -H "Content-Type: application/json" -d '{ "user": "u1", "num": 4 }' http://34.224.110.105:8000/queries.json
nohup pio deploy &
[keyboard@dn-hadoop-keyboard-awsuse1b-02 MyECommerceRecommendation]$ curl -H "Content-Type: application/json" -d '{ "user": "u1", "num": 4 }' http://localhost:8000/queries.json
{"itemScores":[{"item":"i14","score":0.007132239883003155},{"item":"i33","score":0.006370474517503999},{"item":"i3","score":0.005327996354803466},{"item":"i7","score":0.004598313880743443}]}
Formatted:
{
"itemScores": [{
"item": "i14",
"score": 0.007132239883003155
}, {
"item": "i33",
"score": 0.006370474517503999
}, {
"item": "i3",
"score": 0.005327996354803466
}, {
"item": "i7",
"score": 0.004598313880743443
}]
}
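A client usually needs only the ranked item IDs from this response. The following sketch parses a response of the shape shown above; top_items and its min_score parameter are hypothetical helpers, not part of PredictionIO.

```python
import json


def top_items(response_body, min_score=0.0):
    """Return (item, score) pairs sorted by descending score."""
    payload = json.loads(response_body)
    scores = payload.get("itemScores", [])
    ranked = sorted(scores, key=lambda entry: entry["score"], reverse=True)
    return [(entry["item"], entry["score"])
            for entry in ranked if entry["score"] >= min_score]


if __name__ == "__main__":
    # Response body copied from the query result above.
    body = ('{"itemScores":[{"item":"i14","score":0.007132239883003155},'
            '{"item":"i33","score":0.006370474517503999},'
            '{"item":"i3","score":0.005327996354803466},'
            '{"item":"i7","score":0.004598313880743443}]}')
    for item, score in top_items(body):
        print(item, score)
```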