Official examples: http://livy.incubator.apache.org/examples/
REST API docs: http://livy.incubator.apache.org/docs/latest/rest-api.html
Use the REST API to list sessions; it returns the active interactive sessions.
In Postman, issue:
GET 192.168.26.131:8998/sessions
{
  "from": 0,
  "total": 0,
  "sessions": []
}
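The same GET can be issued from Python with only the standard library. A minimal sketch, assuming the Livy server address used above; the request is only constructed here (not sent), and the parsing helper is exercised on the empty response shown:

```python
import json
import urllib.request

LIVY = "http://192.168.26.131:8998"  # assumed Livy server address from above

def list_sessions_request():
    """Build the GET /sessions request (constructed only, not sent)."""
    return urllib.request.Request(LIVY + "/sessions", method="GET")

def summarize_sessions(body: str):
    """Parse a /sessions response into (total, list of session states)."""
    doc = json.loads(body)
    return doc["total"], [s["state"] for s in doc["sessions"]]

# The empty response returned by the fresh server above:
total, states = summarize_sessions('{"from": 0, "total": 0, "sessions": []}')
```

`urllib.request.urlopen(list_sessions_request())` would perform the actual call.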
POST /sessions creates a new interactive Scala, Python, or R shell in the cluster.
Trying it in Postman first (this attempt failed):
POST 192.168.26.131:8998/sessions
Create the session with curl instead:
[hadoop@hadoop001 conf]$ curl -X POST --data '{"kind":"spark"}' -H "Content-Type:application/json" 192.168.26.131:8998/sessions
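The curl command above can be reproduced with stdlib `urllib`; a sketch in which the POST request is only built, not sent:

```python
import json
import urllib.request

def create_session_request(host="192.168.26.131:8998", kind="spark"):
    """Build the POST /sessions request equivalent to the curl command above."""
    payload = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        "http://%s/sessions" % host,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = create_session_request()
# urllib.request.urlopen(req) would actually create the session
```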
The session failed to start, with the following log:
18/06/09 06:35:27 WARN SparkEntries: SparkSession is not supported
java.net.ConnectException: Call From hadoop001/192.168.26.131 to hadoop001:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Fix: start HDFS, then retry the POST. This time the session starts successfully, logging:
18/06/09 06:42:56 INFO LineBufferedStream: stdout: 18/06/09 06:42:56 INFO SparkEntries: Spark context finished initialization in 6153ms
18/06/09 06:42:56 INFO LineBufferedStream: stdout: 18/06/09 06:42:56 INFO SparkEntries: Created Spark session.
The successful POST returns the following JSON:
{"id":0,"appId":null,"owner":null,"proxyUser":null,"state":"starting","kind":"spark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":["stdout: ","\nstderr: "]}
After the session is created, query the active sessions again. This can be done either way (Postman or curl, as above); the returned JSON:
{
  "from": 0,
  "total": 1,
  "sessions": [
    {
      "id": 0,
      "appId": null,
      "owner": null,
      "proxyUser": null,
      "state": "idle",
      "kind": "spark",
      "appInfo": {
        "driverLogUrl": null,
        "sparkUiUrl": null
      },
      "log": [
        "18/06/09 06:42:53 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37616.",
        "18/06/09 06:42:53 INFO NettyBlockTransferService: Server created on 192.168.26.131:37616",
        "18/06/09 06:42:53 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy",
        "18/06/09 06:42:53 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.26.131, 37616, None)",
        "18/06/09 06:42:53 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.26.131:37616 with 413.9 MB RAM, BlockManagerId(driver, 192.168.26.131, 37616, None)",
        "18/06/09 06:42:53 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.26.131, 37616, None)",
        "18/06/09 06:42:53 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.26.131, 37616, None)",
        "18/06/09 06:42:56 INFO EventLoggingListener: Logging events to hdfs://192.168.26.131:8020/spark_log/local-1528497773147",
        "18/06/09 06:42:56 INFO SparkEntries: Spark context finished initialization in 6153ms",
        "18/06/09 06:42:56 INFO SparkEntries: Created Spark session."
      ]
    }
  ]
}
Query the session's state:
In Postman, issue:
GET 192.168.26.131:8998/sessions/0/state
{
  "id": 0,
  "state": "idle"
}
Run a code snippet, a simple addition (method 1, via curl):
[hadoop@hadoop001 livy-0.5.0-incubating-bin]$ curl -X POST 192.168.26.131:8998/sessions/0/statements -H "Content-Type:application/json" -d '{"code":"1+1"}'
{"id":0,"code":"1+1","state":"waiting","output":null,"progress":0.0}
Check whether the snippet finished successfully:
In Postman, issue:
GET 192.168.26.131:8998/sessions/0/statements/0
{
  "id": 0,
  "code": "1+1",
  "state": "available",
  "output": {
    "status": "ok",
    "execution_count": 0,
    "data": {
      "text/plain": "res0: Int = 2\n"
    }
  },
  "progress": 1
}
Run a code snippet, a simple addition (method 2, via Postman), POSTing this body to the same statements endpoint:
{
  "kind": "spark",
  "code": "1+2"
}
The response:
{
  "id": 1,
  "code": "1+2",
  "state": "waiting",
  "output": null,
  "progress": 0
}
Check whether the snippet finished successfully:
In Postman, issue:
GET 192.168.26.131:8998/sessions/0/statements/1
{
  "id": 1,
  "code": "1+2",
  "state": "available",
  "output": {
    "status": "ok",
    "execution_count": 1,
    "data": {
      "text/plain": "res1: Int = 3\n"
    }
  },
  "progress": 1
}
To run something larger, such as a word count, simply put that code in the "code" field.
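For a multi-line snippet such as a word count, let `json.dumps` handle the quoting and embedded newlines rather than escaping them by hand. A sketch; the Scala snippet and the HDFS path are illustrative placeholders, not from this post:

```python
import json

# Illustrative multi-line Scala word count; the input path is a placeholder.
wordcount = """\
val lines = sc.textFile("hdfs:///tmp/input.txt")
lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).collect()
"""

# This string is what you would POST to /sessions/<id>/statements:
body = json.dumps({"code": wordcount})
```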
Viewing snippet results in the web UI:
Both the code snippets and their execution results are shown in the figure.
Delete the session:
In Postman, issue:
DELETE 192.168.26.131:8998/sessions/0
{
  "msg": "deleted"
}
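The DELETE can likewise be issued from stdlib Python; a sketch in which the request is built but not sent:

```python
import urllib.request

def delete_session_request(session_id: int, host="192.168.26.131:8998"):
    """Build the DELETE /sessions/<id> request; the server replies {"msg": "deleted"}."""
    return urllib.request.Request(
        "http://%s/sessions/%d" % (host, session_id), method="DELETE"
    )

req = delete_session_request(0)
# urllib.request.urlopen(req) would actually delete the session
```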
Architecture: a client talks to a Livy server, behind which sit Spark interactive sessions and Spark batch sessions (each of these wraps a SparkContext underneath).
The client sends an HTTP/REST request to the Livy server, which then creates an interactive or batch session as appropriate. The Livy server can talk to the Spark cluster to create sessions in two ways, HTTP or RPC; RPC is the more commonly used one today.
The Livy server itself is just a REST service: it receives client requests and connects to the Spark cluster, so a client only ever needs to send requests to the server.
This yields three layers: client, Livy server, and Spark cluster.
One advantage is that the load on the machine that used to submit jobs directly is greatly reduced; we only need to guarantee HA for the Livy server, which is achievable.
By comparison:
When submitting jobs, much of the monitoring can be handled through the UI, and the corresponding interfaces are available, so you can develop a customized monitoring front end of your own.
Fetch those interfaces and their data via the REST API and you can assemble a custom view quite conveniently.
The overall execution flow:
Multi-user support:
The above describes a single user's interaction; when a second or third user comes along, they can work like this:
For example, the blue clients share one session and the black clients share another; with some identifier, each group can recognize its own session.
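The sharing just described can be sketched as a small registry that maps a client group's tag to a shared Livy session id. The names here are illustrative, not part of Livy's API:

```python
# Hypothetical registry sketch: each client group shares one Livy session id.
class SessionRegistry:
    def __init__(self):
        self._by_group = {}      # group tag -> session id

    def session_for(self, group: str, create) -> int:
        """Return the group's shared session id, creating one on first use."""
        if group not in self._by_group:
            self._by_group[group] = create()
        return self._by_group[group]

# Stand-in for POST /sessions handing out new session ids:
counter = iter(range(100))
reg = SessionRegistry()
blue = reg.session_for("blue", lambda: next(counter))
blue_again = reg.session_for("blue", lambda: next(counter))   # reuses blue's session
black = reg.session_for("black", lambda: next(counter))       # gets its own session
```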
An idea for extending security: the Livy server in the middle is a natural place to plug in a security framework.
You can also use the programmatic API: http://livy.incubator.apache.org/docs/latest/programmatic-api.html
Examples on GitHub: https://github.com/apache/incubator-livy/tree/master/examples
Finally, for more on using Livy, see Saisai Shao's blog post: http://jerryshao.me/2018/01/05/livy-spark-based-rest-service/