Livy is Cloudera's solution for connecting to and managing Spark over REST. This post records some problems I ran into while using Livy.
1. Downloading Livy
Installation needs little explanation: you can build Livy yourself from GitHub, or download the tar package directly from livy.io.
Download address: http://livy.io/quickstart.html
After downloading, just extract it and it is ready to run.
2. After extracting Livy, add the following to livy-env.sh:
export SPARK_HOME=/opt/module/spark-2.1.1-bin-hadoop2.7.2
export HADOOP_CONF_DIR=/opt/module/hadoop/etc/conf
3. Some settings you can adjust in livy.conf:
# use HiveContext by default
livy.repl.enableHiveContext = true
# enable user impersonation (proxy users)
livy.impersonation.enabled = true
# idle session timeout
livy.server.session.timeout = 1h
# session factory: local (local mode) or yarn (YARN mode)
livy.server.session.factory = yarn
4. Starting the Livy server
[root@node1 livy]$ ./bin/livy-server
5. Using Livy sessions to run spark-shell code
With Livy, you can execute spark-shell style code over REST:
# create a session (Scala interpreter, impersonating user caoaoxiang)
curl -X POST --data '{"kind": "scala","proxyUser": "caoaoxiang"}' -H "Content-Type: application/json" localhost:8998/sessions
# list all sessions
curl localhost:8998/sessions
# submit a code statement to a session
curl localhost:8998/sessions/{{sessionId}}/statements -X POST -H 'Content-Type: application/json' -d '{"code":"var a = 1;var b=a+1"}'
# check the result of a statement
curl localhost:8998/sessions/{{sessionId}}/statements/{{statId}}
# delete the session
curl -X DELETE localhost:8998/sessions/{{sessionId}}
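The curl workflow above can be wrapped in a small client. Below is a minimal sketch in Python; the server address and the helper names are my own, while the REST paths and JSON fields are exactly those used in the curl examples:

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumed Livy server address


def _post(path, payload):
    """POST a JSON payload to the Livy REST API and return the parsed reply."""
    req = urllib.request.Request(
        LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def session_payload(kind="scala", proxy_user=None):
    """Build the body for POST /sessions (same fields as the curl example)."""
    body = {"kind": kind}
    if proxy_user:
        body["proxyUser"] = proxy_user
    return body


def statement_payload(code):
    """Build the body for POST /sessions/{id}/statements."""
    return {"code": code}
```

With a Livy server running, `_post("/sessions", session_payload(proxy_user="caoaoxiang"))` creates a session and returns its JSON description (including the `id` used in the statement URLs above).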
One error I hit here: if the session's underlying Spark process has died (for example after a timeout or a kill), further requests to it can fail with:
java.lang.IllegalStateException: RPC channel is closed
6. Livy batches and spark-submit
The /batches endpoint is Livy's REST counterpart of spark-submit:
curl -X POST -H "Content-Type: application/json" localhost:8998/batches --data '{ "conf": {"spark.master":"yarn-cluster"}, "file": "/user/hdfs/spark-examples-1.6.1-hadoop2.6.0.jar", "className": "org.apache.spark.examples.SparkPi", "name": "Scala Livy Pi Example", "executorCores":1, "executorMemory":"512m", "driverCores":1, "driverMemory":"512m", "queue":"default", "args":["100"]}'
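The JSON body maps one-to-one onto spark-submit options, so it can be built programmatically. A sketch in Python that reproduces the payload of the curl call above (the function name and defaults are my own):

```python
import json


def batch_payload(jar, class_name, name=None, args=None,
                  executor_cores=1, executor_memory="512m",
                  driver_cores=1, driver_memory="512m",
                  queue="default", conf=None):
    """Build the body for POST /batches, mirroring spark-submit options.

    `jar` must be a path readable where the job runs (e.g. an hdfs://
    URI when submitting to YARN in cluster mode).
    """
    body = {
        "file": jar,
        "className": class_name,
        "executorCores": executor_cores,
        "executorMemory": executor_memory,
        "driverCores": driver_cores,
        "driverMemory": driver_memory,
        "queue": queue,
    }
    if name:
        body["name"] = name
    if args:
        body["args"] = args
    if conf:
        body["conf"] = conf
    return body


# The payload for the SparkPi example above:
payload = batch_payload(
    jar="/user/hdfs/spark-examples-1.6.1-hadoop2.6.0.jar",
    class_name="org.apache.spark.examples.SparkPi",
    name="Scala Livy Pi Example",
    args=["100"],
    conf={"spark.master": "yarn-cluster"},
)
print(json.dumps(payload))
```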
7. Problems with batches
Error: Cannot load main class from JAR hdfs://datanode32:8020/user/caoaoxiang/spark-examples-1.6.1-hadoop2.6.0.jar with URI hdfs. Please specify a class through --class.
In my experience this error appears when the job ends up being submitted in local or client mode, where spark-submit cannot load the main class from an hdfs:// jar. Make sure the batch actually runs on YARN in cluster mode (e.g. "spark.master": "yarn-cluster" as in the request above), or point "file" at a jar on the Livy server's local filesystem.