Environment Setup
$ docker pull nerdammer/hbase-phoenix
$ docker run -d -p 2181:2181 -p 60000:60000 -p 60010:60010 -p 60020:60020 -p 60030:60030 nerdammer/hbase-phoenix
$ docker exec -i -t d90 bash
# cd /opt/phoenix/bin/
# ./sqlline.py 127.0.0.1:2181
> CREATE TABLE INPUT_TABLE (id BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 INTEGER);
> CREATE TABLE OUTPUT_TABLE (id BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 INTEGER);
- Note: it is best to publish the ports explicitly with -p, and also map the container id (the container's hostname) to the Docker host's IP in /etc/hosts; otherwise the host:port that the client obtains from ZooKeeper may point at an unreachable address.
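The hosts-file mapping mentioned above can be sketched as follows. This is an illustration, not part of the original setup: `<container-id>` is a placeholder for the id shown by `docker ps`, and 192.168.99.100 is the Docker host IP used in the zkUrl later in this document (the default docker-machine address).

```shell
# Sketch: make the container's hostname (its container id) resolvable from
# the client machine, so the address handed out by ZooKeeper works.
# Replace <container-id> with the real id from `docker ps`.
$ echo "192.168.99.100  <container-id>" | sudo tee -a /etc/hosts
```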
Development Environment
Dependencies
libraryDependencies += "org.apache.phoenix" % "phoenix-spark" % "4.4.0-HBase-1.1"
Code snippet:
import org.apache.spark.sql.{SQLContext, SaveMode}

streams.foreachRDD { rdd =>
  println(s"rdd count: ${rdd.count()}")
  // Reuse a single SQLContext per SparkContext across batches
  val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
  import sqlContext.implicits._
  val dataFrame = rdd.toDF()
  dataFrame.show()
  // phoenix-spark upserts rows into the Phoenix table; it requires SaveMode.Overwrite
  dataFrame.write
    .format("org.apache.phoenix.spark")
    .mode(SaveMode.Overwrite)
    .options(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "192.168.99.100:2181"))
    .save()
}
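To check that the write succeeded, the table can be read back through the same phoenix-spark data source. This is a sketch, not part of the original code: it assumes the same `sqlContext` and ZooKeeper address as the write above.

```scala
// Sketch: load OUTPUT_TABLE back as a DataFrame via the phoenix-spark data source.
// Assumes a live Phoenix/HBase instance at the zkUrl used for the write.
val readBack = sqlContext.read
  .format("org.apache.phoenix.spark")
  .options(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "192.168.99.100:2181"))
  .load()
readBack.show()
```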
Complete example: http://git.oschina.net/wangpeibin/codes/wonvmskrhcd9apfyetjgl60
References
- https://phoenix.apache.org/phoenix_spark.html