hbase-spark: A Simple Walkthrough

Environment Setup


$ docker pull nerdammer/hbase-phoenix
$ docker run -d -p 2181:2181 -p 60000:60000 -p 60010:60010 -p 60020:60020 -p 60030:60030 nerdammer/hbase-phoenix

$ docker exec -i -t d90 bash
  # cd /opt/phoenix/bin/
  # ./sqlline.py 127.0.0.1:2181
    > CREATE TABLE INPUT_TABLE (id BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 INTEGER);
    > CREATE TABLE OUTPUT_TABLE (id BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 INTEGER);
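    > -- Optional: seed INPUT_TABLE with a couple of rows so there is data to work with later.
    > -- (These sample values are illustrative, not from the original write-up.)
    > UPSERT INTO INPUT_TABLE (id, col1, col2) VALUES (1, 'foo', 10);
    > UPSERT INTO INPUT_TABLE (id, col1, col2) VALUES (2, 'bar', 20);
    > SELECT * FROM INPUT_TABLE;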
    
  • PS: it is best to publish the ports explicitly with -p (as above) and to map the container id (the container's hostname) in the client machine's hosts file; otherwise the host:port that the client obtains from ZooKeeper afterwards will not be reachable. See the sketch below.
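A minimal sketch of that hosts-file mapping (the container id prefix d90 comes from the exec command above; the docker host IP 192.168.99.100 matches the zkUrl used later and is an assumption of this particular setup):

    # Look up the container's hostname (by default, the full container id).
    $ docker inspect --format '{{.Config.Hostname}}' d90
    # Map that hostname to the docker host's IP on the client machine,
    # so the address handed back by ZooKeeper resolves correctly.
    $ echo '192.168.99.100 <container-hostname>' | sudo tee -a /etc/hosts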

Development Environment


  • Dependencies

      libraryDependencies += "org.apache.phoenix" % "phoenix-spark" % "4.4.0-HBase-1.1"
    
  • Code snippet:

import org.apache.spark.sql.{SQLContext, SaveMode}

streams.foreachRDD { rdd =>
  println(s"rdd count: ${rdd.count()}")
  val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
  import sqlContext.implicits._
  // Convert the micro-batch RDD to a DataFrame (requires an RDD of case class instances).
  val dataFrame = rdd.toDF()
  dataFrame.show()
  // Write the batch to the Phoenix table through the phoenix-spark connector.
  // phoenix-spark only supports SaveMode.Overwrite, which maps to Phoenix UPSERT semantics.
  dataFrame.write
    .format("org.apache.phoenix.spark")
    .mode(SaveMode.Overwrite)
    .options(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "192.168.99.100:2181"))
    .save()
}
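For context, the snippet assumes a DStream named streams whose elements are case class instances matching the table columns. A minimal driver that would feed it for local testing might look like this (the Record case class, the queue-backed test stream, and all names here are illustrative assumptions, not part of the original code):

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.collection.mutable

// Row type matching OUTPUT_TABLE (id BIGINT, col1 VARCHAR, col2 INTEGER).
case class Record(id: Long, col1: String, col2: Int)

object PhoenixStreamingDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("hbase-spark-demo")
    val ssc = new StreamingContext(conf, Seconds(5))

    // A queue-backed DStream is a simple way to inject test batches locally.
    val queue = mutable.Queue[RDD[Record]]()
    val streams = ssc.queueStream(queue)
    queue += ssc.sparkContext.parallelize(Seq(Record(1L, "foo", 10), Record(2L, "bar", 20)))

    // ... the foreachRDD block above goes here ...

    ssc.start()
    ssc.awaitTermination()
  }
}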

Full example code: http://git.oschina.net/wangpeibin/codes/wonvmskrhcd9apfyetjgl60
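After the job has run, the write can be verified from Spark by reading the table back through the same connector. This is a sketch using the standard data source read path, which should be equivalent to the sqlContext.load form documented by phoenix-spark; the zkUrl matches the one above:

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local[2]", "phoenix-read-check")
val sqlContext = new SQLContext(sc)

// Load OUTPUT_TABLE back as a DataFrame via the phoenix-spark data source.
val df = sqlContext.read
  .format("org.apache.phoenix.spark")
  .options(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "192.168.99.100:2181"))
  .load()

df.show()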

References


  • https://phoenix.apache.org/phoenix_spark.html
