Pitfalls encountered while building a recommender system with PredictionIO

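1. Spark installation directory problem
    The Spark installed by Ambari reports a version number that differs from the community release, so the regex in the semver.sh script that validates the Spark version fails.
    Fix: reinstall Spark from the community download and point PredictionIO at that path.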
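
To illustrate why such a check can fail, here is a minimal sketch that assumes the validation uses a strict three-part version pattern. The regex and the vendor-style version string are hypothetical stand-ins; they are not copied from semver.sh or from the actual cluster.

```python
import re

# Hypothetical strict pattern: exactly MAJOR.MINOR.PATCH and nothing else.
STRICT_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def parse_version(text):
    """Return (major, minor, patch), or None if the string is not plain x.y.z."""
    match = STRICT_VERSION.match(text.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


community = parse_version("2.3.3")           # community release style
vendor = parse_version("2.3.0.2.6.4.0-91")   # assumed vendor-suffixed style
print(community, vendor)
```

A vendor build that appends its own build number to the version no longer matches a strict pattern, which is consistent with the failure described above.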

2. The database connection URL must include the port
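
A small sketch of what "include the port" looks like for the PostgreSQL JDBC URL. The helper name pgsql_jdbc_url is hypothetical; the host, port and database are the ones that appear in the deploy log further down (PIO_STORAGE_SOURCES_PGSQL_URL).

```python
def pgsql_jdbc_url(host, database, port=5432):
    """Build a PostgreSQL JDBC URL that always carries an explicit port."""
    port = int(port)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % (port,))
    return "jdbc:postgresql://{}:{}/{}".format(host, port, database)


# Value intended for PIO_STORAGE_SOURCES_PGSQL_URL in pio-env.sh
print(pgsql_jdbc_url("10.1.100.65", "pio"))
```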

3. Spark path
    /data/Software/spark-2.3.3
    
4. PostgreSQL database:
    Host:
    Database: pio
    User: pio
    Password:
    
    Connect:
    psql -U pio -h ip -p 5432 -d pio
    List all databases:
    select * from pg_database;
    Switch to a database:
    \c pio
    List all tables in the current database:
    \dt
    Quit:
    \q
    
    
5. PredictionIO installation directory:
    /usr/local/PredictionIO/PredictionIO-0.14.0
    unset -f pathmunge
    export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
    export HADOOP_HOME=/usr/hdp/2.6.4.0-91/hadoop
    export SPARK_HOME=/usr/hdp/2.6.4.0-91/spark2
    export PGD_HOME=/usr/local/postgresql
    export PIO_HOME=/usr/local/PredictionIO/PredictionIO-0.14.0
    export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$SPARK_HOME/bin:$PGD_HOME/bin:$PIO_HOME/bin:$PATH
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

    export SPARK_MAJOR_VERSION=2
    Configuration file:
    pio-env.sh
    
    JDBC driver jar:
    postgresql-42.2.6.jar

pio eventserver &   start the event server

jps -l  check that the process is running

    
6. Workflow
    Project directory    /data/online/PredictionIO/MyECommerceRecommendation
    
    git clone https://github.com/apache/predictionio-template-ecom-recommender.git MyECommerceRecommendation
    
    pio app new myEcoRe        create the app
        [INFO] [App$] Initialized Event Store for this app ID: 1.
        [INFO] [Pio$] Created a new app:
        [INFO] [Pio$]       Name: myEcoRe
        [INFO] [Pio$]         ID: 1
        [INFO] [Pio$] Access Key: CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq
        
    pio app list    list the apps
        [INFO] [Pio$]                 Name |   ID |                                                       Access Key | Allowed Event(s)
        [INFO] [Pio$]              myEcoRe |    1 | CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq | (all)
        [INFO] [Pio$] Finished listing 1 app(s).
    
    
    curl -i -X GET http://localhost:7070   verify that the event server started successfully
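
The same check can be scripted. This is a sketch under the assumption that a healthy event server answers the root URL with a small JSON body of the form {"status": "alive"}; confirm this against your own server's response. The function names are hypothetical.

```python
import json
import urllib.request


def is_alive(body):
    """True if the body is a JSON object whose "status" is "alive" (assumed format)."""
    try:
        return json.loads(body).get("status") == "alive"
    except (ValueError, AttributeError):
        return False


def check_event_server(url="http://localhost:7070", timeout=5):
    """GET the root URL and report whether the server looks alive."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200 and is_alive(resp.read().decode("utf-8"))
    except OSError:  # connection refused, timeout, HTTP errors
        return False
```

Calling check_event_server() right after `pio eventserver &` gives a True/False answer that can be used in a startup script.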
    
    
7. Add an event:
    
curl -i -X POST http://localhost:7070/events.json?accessKey=CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq \
-H "Content-Type: application/json" \
-d '{
  "event" : "view",
  "entityType" : "user",
  "entityId" : "u0",
  "targetEntityType" : "item",
  "targetEntityId" : "i0",
  "eventTime" : "2019-06-10T12:34:56.123-08:00"
}'
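
For reference, a Python sketch of the same request using only the standard library. The helper names build_event and post_event are hypothetical, and YOUR_ACCESS_KEY stands for the key printed by `pio app new`.

```python
import json
import urllib.parse
import urllib.request


def build_event(event, user_id, item_id, event_time=None):
    """Build the JSON payload for a user-to-item event such as "view"."""
    payload = {
        "event": event,
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",
        "targetEntityId": item_id,
    }
    if event_time is not None:
        payload["eventTime"] = event_time  # ISO 8601 timestamp with offset
    return payload


def post_event(payload, access_key, base_url="http://localhost:7070"):
    """POST one event to /events.json and return the parsed JSON response."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    request = urllib.request.Request(
        base_url + "/events.json?" + query,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


event = build_event("view", "u0", "i0", "2019-06-10T12:34:56.123-08:00")
print(json.dumps(event, indent=2))
# post_event(event, "YOUR_ACCESS_KEY")  # requires a running event server
```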

8. Run the batch import script
    /data/online/PredictionIO/MyECommerceRecommendation/shell/add_event_batch.sh    

    Namespace(access_key='CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq', file='/data/online/PredictionIO/MyECommerceRecommendation/py_import_data/sample_movielens_data.txt', url='http://localhost:7070')
    Importing data...
    1501 events are imported.
    
    Batch import:
    python data/import_eventserver.py --access_key    
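
If you write your own importer instead of using the script, one option is the event server's batch endpoint. The sketch below assumes the endpoint is /batch/events.json and that it accepts at most 50 events per request; both should be checked against the Event API documentation for your version. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BATCH_LIMIT = 50  # assumed per-request cap of the batch endpoint


def chunked(items, size=BATCH_LIMIT):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_batch(events, access_key, base_url="http://localhost:7070"):
    """POST one batch (a JSON array of events) and return the parsed response."""
    query = urllib.parse.urlencode({"accessKey": access_key})
    request = urllib.request.Request(
        base_url + "/batch/events.json?" + query,
        data=json.dumps(events).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.loads(resp.read().decode("utf-8"))


def import_all(events, access_key):
    """Send every event in batches and return how many were submitted."""
    sent = 0
    for batch in chunked(events):
        post_batch(batch, access_key)
        sent += len(batch)
    return sent


# 1501 events, as in the import log above, would need this many requests:
print(len(list(chunked(list(range(1501))))))
```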
        

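9. Build the engine    Note: this must be run from the project directory
pio build --verbose     builds the engine so that it is ready for training; --verbose prints detailed output
    Error: [INFO] [Engine$] If the path above is incorrect, this process will fail.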
            [INFO] [Engine$] Uber JAR disabled. Making sure lib/pio-assembly-0.11.0-incubating.jar is absent.
            [INFO] [Engine$] Going to run: /Users/jiazhaopu/program/apache-predictionio-0.11.0-incubating/PredictionIO/sbt/sbt  package assemblyPackageDependency in /Users/jiazhaopu/workspace/incubator-predictionio-template-recommender
            [ERROR] [Engine$] Downloading sbt launcher for 0.13.15:
            [ERROR] [Engine$]   From  http://repo.typesafe.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/0.13.15/sbt-launch.jar
            [ERROR] [Engine$]     To  /Users/jiazhaopu/.sbt/launchers/0.13.15/sbt-launch.jar
    
    Fix: download sbt-launch.jar manually.
    Output of a successful build:
        [keyboard@dn-hadoop-keyboard-awsuse1b-02 MyECommerceRecommendation]$ pio build --verbose
        [INFO] [Engine$] Using command '/usr/local/PredictionIO/PredictionIO-0.14.0/sbt/sbt' at /data/online/PredictionIO/MyECommerceRecommendation to build.
        [INFO] [Engine$] If the path above is incorrect, this process will fail.
        [INFO] [Engine$] Uber JAR disabled. Making sure lib/pio-assembly-0.14.0.jar is absent.
        [INFO] [Engine$] Going to run: /usr/local/PredictionIO/PredictionIO-0.14.0/sbt/sbt  package assemblyPackageDependency in /data/online/PredictionIO/MyECommerceRecommendation
        [INFO] [Engine$] [info] Loading settings for project myecommercerecommendation-build from assembly.sbt ...
        [INFO] [Engine$] [info] Loading project definition from /data/online/PredictionIO/MyECommerceRecommendation/project
        [INFO] [Engine$] [info] Loading settings for project myecommercerecommendation from build.sbt ...
        [INFO] [Engine$] [info] Set current project to template-scala-parallel-ecommercerecommendation (in build file:/data/online/PredictionIO/MyECommerceRecommendation/)
        [INFO] [Engine$] [success] Total time: 1 s, completed Jun 27, 2019 11:23:29 AM
        [INFO] [Engine$] [info] Strategy 'discard' was applied to a file (Run the task at debug level to see details)
        [INFO] [Engine$] [info] Assembly up to date: /data/online/PredictionIO/MyECommerceRecommendation/target/scala-2.11/template-scala-parallel-ecommercerecommendation-assembly-0.1.0-SNAPSHOT-deps.jar
        [INFO] [Engine$] [success] Total time: 1 s, completed Jun 27, 2019 11:23:30 AM
        [INFO] [Engine$] Compilation finished successfully.
        [INFO] [Engine$] Looking for an engine...
        [INFO] [Engine$] Found template-scala-parallel-ecommercerecommendation_2.11-0.1.0-SNAPSHOT.jar
        [INFO] [Engine$] Found template-scala-parallel-ecommercerecommendation-assembly-0.1.0-SNAPSHOT-deps.jar
        [INFO] [Engine$] Build finished successfully.
        [INFO] [Pio$] Your engine is ready for training.
    
10. Train the model
pio train     start training


http://localhost:7070/events.json?accessKey=CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq


Query events (returns 20 by default):
curl -i -X GET http://localhost:7070/events.json?accessKey=CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq
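
To get more than the default 20 events or to filter them, query parameters can be appended to the URL. The sketch below assumes the Event API accepts `limit`, `entityType` and `entityId`; verify the parameter names against the documentation for your version. The helper name is hypothetical.

```python
from urllib.parse import urlencode


def events_query_url(access_key, base_url="http://localhost:7070", **filters):
    """Build an /events.json query URL; filters set to None are skipped."""
    params = {"accessKey": access_key}
    params.update({key: value for key, value in filters.items() if value is not None})
    return base_url + "/events.json?" + urlencode(params)


print(events_query_url("YOUR_ACCESS_KEY", limit=50, entityType="user", entityId="u0"))
```

When passing such a URL to curl, wrap it in quotes so that the shell does not interpret `&` and `?`.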

Get an event by ID:
curl -i -X GET http://localhost:7070/events/c705e522cddc4ca5b35dd33f6f35672a.json?accessKey=CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq

Delete an event by ID:
curl -i -X DELETE http://localhost:7070/events/c705e522cddc4ca5b35dd33f6f35672a.json?accessKey=CDLjmOcO3iRD82dxkMKHsRIUgffc9sLUuENUjp36-tE5bqV_aCK22eBa1-bfUAIq

pio app data-delete myEcoRe    delete all events of the myEcoRe app

        [INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@191a0351{/metrics/json,null,AVAILABLE,@Spark}
        [INFO] [Engine$] EngineWorkflow.train
        [INFO] [Engine$] DataSource: org.example.ecommercerecommendation.DataSource@22c8ee48
        [INFO] [Engine$] Preparator: org.example.ecommercerecommendation.Preparator@2321e482
        [INFO] [Engine$] AlgorithmList: List(org.example.ecommercerecommendation.ECommAlgorithm@4276ad40)
        [INFO] [Engine$] Data sanity check is on.
        [INFO] [Engine$] org.example.ecommercerecommendation.TrainingData does not support data sanity check. Skipping check.
        [INFO] [Engine$] org.example.ecommercerecommendation.PreparedData does not support data sanity check. Skipping check.
        [WARN] [BLAS] Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
        [WARN] [BLAS] Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
        [WARN] [LAPACK] Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
        [WARN] [LAPACK] Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
        [INFO] [Engine$] org.example.ecommercerecommendation.ECommModel does not support data sanity check. Skipping check.
        [INFO] [Engine$] EngineWorkflow.train completed
        [INFO] [Engine] engineInstanceId=9c612dc0-1c95-4ff7-aa4d-33fdd2c850e4
        [INFO] [CoreWorkflow$] Inserting persistent model
        [INFO] [CoreWorkflow$] Updating engine instance
        [INFO] [CoreWorkflow$] Training completed successfully.
        [INFO] [AbstractConnector] Stopped Spark@23c767e6{HTTP/1.1,[http/1.1]}{0.0.0.0:4042}

11. Deploy the engine
    pio deploy
    Change the port (and bind address):
        pio deploy --port 8123
        pio deploy --port 8123 --ip 1.2.3.4
        
        
        
    pio batchpredict


[INFO] [MasterActor] Undeploying any existing engine instance at http://10.1.100.23:8000
[WARN] [MasterActor] Nothing at http://10.1.100.23:8000
[WARN] [WorkflowUtils$] Non-empty parameters supplied to org.example.ecommercerecommendation.Preparator, but its constructor does not accept any arguments. Stubbing with empty parameters.
[WARN] [WorkflowUtils$] Non-empty parameters supplied to org.example.ecommercerecommendation.Serving, but its constructor does not accept any arguments. Stubbing with empty parameters.

[INFO] [Runner$] Submission command: 
/data/Software/spark-2.3.3/bin/spark-submit 
--class org.apache.predictionio.workflow.CreateServer 
--jars file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/postgresql-42.2.6.jar,
file:/data/online/PredictionIO/MyECommerceRecommendation/target/scala-2.11/template-scala-parallel-ecommercerecommendation_2.11-0.1.0-SNAPSHOT.jar,
file:/data/online/PredictionIO/MyECommerceRecommendation/target/scala-2.11/template-scala-parallel-ecommercerecommendation-assembly-0.1.0-SNAPSHOT-deps.jar,
file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/spark/pio-data-localfs-assembly-0.14.0.jar,
file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/spark/pio-data-hdfs-assembly-0.14.0.jar,
file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/spark/pio-data-jdbc-assembly-0.14.0.jar,
file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/spark/pio-data-elasticsearch-assembly-0.14.0.jar,
file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/spark/pio-data-hbase-assembly-0.14.0.jar,
file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/spark/pio-data-s3-assembly-0.14.0.jar 
--files file:/usr/local/PredictionIO/PredictionIO-0.14.0/conf/log4j.properties 
--driver-class-path /usr/local/PredictionIO/PredictionIO-0.14.0/conf:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/postgresql-42.2.6.jar 
--driver-java-options -Dpio.log.dir=/home/keyboard file:/usr/local/PredictionIO/PredictionIO-0.14.0/lib/pio-assembly-0.14.0.jar 
--engineInstanceId 9c612dc0-1c95-4ff7-aa4d-33fdd2c850e4 
--engine-variant file:/data/online/PredictionIO/MyECommerceRecommendation/engine.json 
--ip 0.0.0.0 
--port 8000 
--event-server-ip 0.0.0.0 
--event-server-port 7070 
--json-extractor Both --env PIO_ENV_LOADED=1,PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta,PIO_FS_BASEDIR=/home/keyboard/.pio_store,PIO_HOME=/usr/local/PredictionIO/PredictionIO-0.14.0,PIO_FS_ENGINESDIR=/home/keyboard/.pio_store/engines,PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://10.1.100.65:5432/pio,PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=PGSQL,PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event,PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio@2019,PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc,PIO_FS_TMPDIR=/home/keyboard/.pio_store/tmp,PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio,PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model,PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=PGSQL,PIO_CONF_DIR=/usr/local/PredictionIO/PredictionIO-0.14.0/conf


[INFO] [MasterActor] Undeploying any existing engine instance at http://0.0.0.0:8000
[WARN] [MasterActor] Nothing at http://0.0.0.0:8000

Possible cause of the deployment problem:
Version issue with 0.14.0:
https://groups.google.com/forum/#!topic/actionml-user/H795t022nuk
https://www.mail-archive.com/[email protected]/
http://archive.apache.org/dist/predictionio/0.13.0/


Rebuilt the environment and switched to 0.13; the problem was resolved.


Re-run the steps above.


[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@d809ab4{/jobs/job/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@7a435f5a{/stages/stage/kill,null,AVAILABLE,@Spark}
[INFO] [ContextHandler] Started o.s.j.s.ServletContextHandler@4c8bc212{/metrics/json,null,AVAILABLE,@Spark}
[INFO] [Engine] Using persisted model
[INFO] [Engine] Loaded model org.example.ecommercerecommendation.ECommModel for algorithm org.example.ecommercerecommendation.ECommAlgorithm
[INFO] [AbstractConnector] Stopped Spark@1f5162ff{HTTP/1.1,[http/1.1]}{0.0.0.0:4042}
[INFO] [MasterActor] Undeploying any existing engine instance at http://0.0.0.0:8000
[WARN] [MasterActor] Nothing at http://0.0.0.0:8000
[INFO] [HttpListener] Bound to /0.0.0.0:8000
[INFO] [MasterActor] Engine is deployed and running. Engine API is live at http://0.0.0.0:8000.

http://34.224.110.105:8000/     is reachable


curl -H "Content-Type: application/json" -d '{ "user": "u1", "num": 4 }' http://34.224.110.105:8000/queries.json


nohup pio deploy &


[keyboard@dn-hadoop-keyboard-awsuse1b-02 MyECommerceRecommendation]$ curl -H "Content-Type: application/json" -d '{ "user": "u1", "num": 4 }' http://localhost:8000/queries.json
{"itemScores":[{"item":"i14","score":0.007132239883003155},{"item":"i33","score":0.006370474517503999},{"item":"i3","score":0.005327996354803466},{"item":"i7","score":0.004598313880743443}]}

{
    "itemScores": [{
        "item": "i14",
        "score": 0.007132239883003155
    }, {
        "item": "i33",
        "score": 0.006370474517503999
    }, {
        "item": "i3",
        "score": 0.005327996354803466
    }, {
        "item": "i7",
        "score": 0.004598313880743443
    }]
}
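
A sketch of a small client for the deployed engine, using only the standard library. The function names are hypothetical; the query fields ("user", "num") and the response shape ("itemScores") are taken from the curl call and output above.

```python
import json
import urllib.request


def top_items(response_text):
    """Return [(item, score), ...] from a queries.json response, best first."""
    scores = json.loads(response_text).get("itemScores", [])
    ranked = sorted(scores, key=lambda entry: entry["score"], reverse=True)
    return [(entry["item"], entry["score"]) for entry in ranked]


def query_engine(user, num, base_url="http://localhost:8000"):
    """POST a query to the deployed engine and return ranked (item, score) pairs."""
    request = urllib.request.Request(
        base_url + "/queries.json",
        data=json.dumps({"user": user, "num": num}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return top_items(resp.read().decode("utf-8"))


sample = '{"itemScores":[{"item":"i14","score":0.007132239883003155},{"item":"i33","score":0.006370474517503999}]}'
for item, score in top_items(sample):
    print(item, score)
```

query_engine("u1", 4) performs the same request as the curl command, provided the engine is deployed on port 8000.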

