Connecting Spark 3.3.1 to CDH 6.3.2's bundled HBase via hbase-connectors

Supplement: building hbase-connectors

Download the hbase-connectors source and build it against Spark 3.3.1 / Scala 2.12 / HBase 2.4.9 / Hadoop 3.2.0:

git clone https://github.com/apache/hbase-connectors.git
cd hbase-connectors
mvn --settings /Users/admin/Documents/softwares/repository-zi/settings-aliyun.xml  -Dspark.version=3.3.1 -Dscala.version=2.12.10 -Dscala.binary.version=2.12 -Dhbase.version=2.4.9 -Dhadoop-three.version=3.2.0 -DskipTests clean package

Add the following repository configuration to pom.xml so Maven can resolve dependencies from Maven Central:

<repositories>
  <repository>
    <id>central</id>
    <name>Maven Repository</name>
    <url>https://repo.maven.apache.org/maven2</url>
    <releases>
      <enabled>true</enabled>
    </releases>
    <snapshots>
      <enabled>false</enabled>
    </snapshots>
  </repository>
</repositories>

1. Copy the required jars into the Spark 3 jars directory

cp /opt/cloudera/parcels/CDH/lib/hbase/hbase-shaded-netty-2.2.1.jar /opt/cloudera/parcels/CDH/lib/spark3/jars/

cp /opt/cloudera/parcels/CDH/lib/hbase/hbase-shaded-protobuf-2.2.1.jar /opt/cloudera/parcels/CDH/lib/spark3/jars/

cp /opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-shaded-2.1.0-cdh6.3.2.jar /opt/cloudera/parcels/CDH/lib/spark3/jars/

cp /opt/cloudera/parcels/CDH/lib/hbase/hbase-shaded-miscellaneous-2.2.1.jar /opt/cloudera/parcels/CDH/lib/spark3/jars/

# the connector jar built above (typically under spark/hbase-spark/target/ in the source tree)
cp hbase-spark-1.0.1-SNAPSHOT.jar /opt/cloudera/parcels/CDH/lib/spark3/jars/
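Before running the tests, it is worth confirming that spark-shell actually picks these jars up. A minimal sanity check from the Scala REPL (a sketch: the hbase-spark class is the connector's entry point, and the shaded protobuf class name assumes the standard hbase-thirdparty relocation):

// Run inside spark-shell: each call throws ClassNotFoundException
// if the corresponding jar is missing from spark3/jars
Class.forName("org.apache.hadoop.hbase.spark.HBaseContext")              // hbase-spark jar
Class.forName("org.apache.hbase.thirdparty.com.google.protobuf.Message") // hbase-shaded-protobuf jar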

2. Test with a Scala script in spark-shell

spark-shell -c spark.ui.port=11111
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.HBaseConfiguration

// Register an HBaseContext so the data source can reach the cluster
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "hadoop103:2181")
new HBaseContext(spark.sparkContext, conf)

// ":key" maps to the row key; "info:ename" / "info:job" are family:qualifier pairs
val hbaseDF = (spark.read.format("org.apache.hadoop.hbase.spark")
  .option("hbase.columns.mapping",
    "empno STRING :key, ename STRING info:ename, job STRING info:job")
  .option("hbase.table", "hive_hbase_emp_table")
  ).load()
hbaseDF.show()
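The same data source can also write a DataFrame back to HBase. A minimal sketch, assuming a hypothetical target table hive_hbase_emp_table_copy that already exists with an info column family, run in the same spark-shell session so the registered HBaseContext is reused:

// Hypothetical target table; it must already exist with column family "info"
(hbaseDF.write.format("org.apache.hadoop.hbase.spark")
  .option("hbase.columns.mapping",
    "empno STRING :key, ename STRING info:ename, job STRING info:job")
  .option("hbase.table", "hive_hbase_emp_table_copy")
  ).save()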

3. Test hbase-connectors with a PySpark script

import findspark
findspark.init(spark_home='/opt/cloudera/parcels/CDH/lib/spark3', python_path='/opt/cloudera/anaconda3/bin/python')
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Demo_spark_conn").getOrCreate()

# hbase.spark.use.hbasecontext=False: build the HBase connection from the
# options below instead of a pre-registered HBaseContext
df = (spark.read.format("org.apache.hadoop.hbase.spark")
      .option("hbase.zookeeper.quorum", "hadoop103:2181")
      .option("hbase.columns.mapping",
              "empno STRING :key, ename STRING info:ename, job STRING info:job")
      .option("hbase.table", "hive_hbase_emp_table")
      .option("hbase.spark.use.hbasecontext", False)
      .load())

df.show(10)

