Spark SQL + Hive Integration, Two Approaches (Spark 2.0 + Hive 1.2.2)

First, be clear that when Spark SQL hooks into Hive there is more than one way to interact:

Spark SQL directly, and Thrift (HiveServer2 + beeline)



【First: Thrift】

Set the following parameters in hive-site.xml under $SPARK_HOME/conf:


<property>
  <name>hive.metastore.uris</name>
  <value>thrift://mach40:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

<property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>5</value>
  <description>Minimum number of Thrift worker threads</description>
</property>

<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>500</value>
  <description>Maximum number of Thrift worker threads</description>
</property>

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
  <description>Port number of HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>

<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>mach40</value>
  <description>Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
</property>

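Note that hive.metastore.uris above points at a remote metastore on mach40:9083, so that metastore service must itself be running before HiveServer2 (or any other client) can connect to it. A minimal way to start it, assuming the hive command is on the PATH on mach40:

[root@mach40 ~]# hive --service metastore &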

Start HiveServer2 from Hive:

[root@slaver3 conf]#   hive --service hiveserver2 &

[root@slaver3 conf]#  jobs   

[1]+  Running               hive --service hiveserver2 &

[root@slaver3 conf]# jps

6064 RunJar
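The jps output only proves that a RunJar process is alive; before connecting from a client, it is worth confirming that HiveServer2 is actually listening on the configured port (10000 here). A quick check, assuming netstat is installed:

[root@slaver3 conf]# netstat -anp | grep 10000

You should see a line in LISTEN state owned by that RunJar java process.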


Start the client on the Spark side:

[root@slaver3 conf]#    beeline -u jdbc:hive2://slaver3:10000/default -n root -p root

The following appears:

Connecting to jdbc:hive2://slaver3:10000/default
18/04/24 09:35:50 INFO jdbc.Utils: Supplied authorities: slaver3:10000
18/04/24 09:35:50 INFO jdbc.Utils: Resolved authority: slaver3:10000
18/04/24 09:35:50 INFO jdbc.HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://slaver3:10000/default
Connected to: Apache Hive (version 2.0.1)
Driver: Spark Project Core (version 1.6.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.6.0 by Apache Hive

0: jdbc:hive2://slaver3:10000/default> 
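From this prompt you can run ordinary HiveQL. A quick smoke test that exercises the metastore without touching any table data:

0: jdbc:hive2://slaver3:10000/default> show databases;
0: jdbc:hive2://slaver3:10000/default> show tables;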


【Approach 2: Spark SQL】

First, a few things to know:

Spark talks to the Hive metastore, i.e. Hive's metadata, in order to run its operations; however, Spark's built-in Hive support currently only goes up to Hive 1.2, so if your Hive version is newer than that, the two cannot be linked out of the box.
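That said, Spark 2.x does expose spark.sql.hive.metastore.version and spark.sql.hive.metastore.jars for talking to other metastore versions; which versions are accepted depends on your Spark release, so check the Spark SQL docs for your version. A sketch in conf/spark-defaults.conf, assuming a Hive 2.0.1 install under /home/hive2.01 (the path used later in this post):

spark.sql.hive.metastore.version   2.0.1
spark.sql.hive.metastore.jars      /home/hive2.01/lib/*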


1. Installation steps for both are omitted:

(see other posts on this blog)

2.hive-site.xml  

scp hive-site.xml $SPARK_HOME/conf

Contents of hive-site.xml:

(the stock file is annoyingly long, so clear it out and keep just the following)



<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://mach40:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>

<property>
  <name>hive.querylog.location</name>
  <value>/home/hive1.22/iotmp</value>
</property>

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
</property>

<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/hive1.22/iotmp</value>
  <description>Local scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/home/hive1.22/iotmp</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>

// Note: the following block is added only in spark/conf/hive-site.xml on the client machines that need access; the server side must NOT have this property configured.

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://mach40:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

<property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>5</value>
  <description>Minimum number of Thrift worker threads</description>
</property>

<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>500</value>
  <description>Maximum number of Thrift worker threads</description>
</property>

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
  <description>Port number of HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_PORT</description>
</property>

<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>mach40</value>
  <description>Bind host on which to run the HiveServer2 Thrift interface. Can be overridden by setting $HIVE_SERVER2_THRIFT_BIND_HOST</description>
</property>
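With hive.metastore.uris on the client pointing at thrift://mach40:9083, the standalone metastore service must be running on mach40 (started the same way as in the Thrift section above). From the client, a quick reachability check before blaming Spark, assuming nc is installed:

nc -z mach40 9083 && echo "metastore port reachable"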

Modify Hadoop's etc/hadoop/core-site.xml:


<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
  <description>Allow the superuser root to impersonate members of any group</description>
</property>

<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
  <description>The superuser root can connect from any host to impersonate a user</description>
</property>

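These proxyuser changes only take effect once Hadoop rereads its configuration. Instead of a full cluster restart, the proxyuser settings can usually be refreshed in place with the stock admin commands:

hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration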

3. The JDBC driver jar:

scp mysql-connector-java-5.1.41-bin.jar /home/spark2.10/jars/
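With the driver jar in place, a quick way to confirm that Spark can now see the Hive metastore is the spark-sql shell that ships with Spark 2.x; the databases it lists should match what the hive CLI shows:

[root@slaver3 ~]# spark-sql
spark-sql> show databases;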


If you are on a different Spark version, put the driver jar in lib/, or add it to the classpath in spark-env.sh; you may also need the following jars from Hive (a classpath sketch follows the list):

/home/hive2.01/lib/datanucleus-api-jdo-4.2.1.jar

/home/hive2.01/lib/datanucleus-core-4.1.6.jar

/home/hive2.01/lib/datanucleus-rdbms-4.1.7.jar
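For those older Spark releases, one way to get the jars onto the classpath is via conf/spark-env.sh. A sketch, assuming the jar paths above and the legacy Spark 1.x SPARK_CLASSPATH mechanism (deprecated in later releases; the driver jar location /path/to is illustrative, substitute your own):

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/path/to/mysql-connector-java-5.1.41-bin.jar:/home/hive2.01/lib/datanucleus-api-jdo-4.2.1.jar:/home/hive2.01/lib/datanucleus-core-4.1.6.jar:/home/hive2.01/lib/datanucleus-rdbms-4.1.7.jar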


About an exception you may hit:

In Spark 2.0+ the jars live under $SPARK_HOME/jars, so starting Hive may report an error:

No such file: spark2.0/lib/spark-assembly......jar


Fix:

vim bin/hive

change spark2.0/lib/spark-assembly......jar to spark2.0/jars/*.jar
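Concretely, in the Hive versions I have seen, the offending line in bin/hive looks something like the first line below, and the fix is to glob the new jars directory instead (exact wording varies by Hive version, so treat this as a sketch):

sparkAssemblyPath=`ls ${SPARK_HOME}/lib/spark-assembly-*.jar`

becomes

sparkAssemblyPath=`ls ${SPARK_HOME}/jars/*.jar`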

