Connecting Spark SQL in a local IDEA project to Hive on a remote server

1. Copy hive-site.xml
Copy hive-site.xml from $SPARK_HOME/conf on the server into the resources directory of the IDEA project (src/main/resources in a standard Maven layout).
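
To confirm the file actually ends up on the classpath, a quick check from the driver code can help (a minimal sketch, assuming the standard Maven layout where src/main/resources is on the classpath):

// prints a file:/... URL when hive-site.xml is visible on the classpath, or null when it is not
println(getClass.getClassLoader.getResource("hive-site.xml"))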

2. Use the new entry point
Spark 2.x merges SparkContext and HiveContext into a single unified entry point, SparkSession, so the connection is set up with code like the following:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
conf.setAppName("WordCount").setMaster("local")
// enableHiveSupport() is what makes the session read hive-site.xml
val hive = SparkSession.builder().enableHiveSupport().config(conf).getOrCreate()
hive.sql("show tables").show()

Be sure to call enableHiveSupport(); otherwise the session falls back to the default configuration and never reads hive-site.xml.
On Spark 1.x, Hive is accessed through HiveContext instead:

import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(conf)
val hive = new HiveContext(sc)
hive.sql("show tables").show()
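
Note that enableHiveSupport() additionally requires the spark-hive module on the classpath; without it the builder fails because the Hive classes cannot be found. A sketch of the Maven dependency (the Spark and Scala versions here are assumptions; match them to your cluster):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.11</artifactId>
    <version>2.1.0</version>
</dependency>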

3. Modify hive-site.xml

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://pc3:9083</value>
</property>

<property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>300</value>
</property>

<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>hdfs://pc2:8020/user/hive/warehouse</value>
</property>

<!-- ...... -->

<property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.251.7:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
</property>

<property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
</property>
The javax.jdo.option.* connection properties at the end are the modified (added) part. Also note that the & characters inside the JDBC URL must be escaped as &amp; in the XML.
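
In this setup Spark ends up opening the metastore database directly, so the MySQL JDBC driver named above must also be on the local classpath (the version here is an assumption; pick one matching your MySQL server):

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.38</version>
</dependency>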

Console output before the modification:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/D:/codingsofeware/maven/mavenrerepository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/D:/codingsofeware/maven/mavenrerepository/ch/qos/logback/logback-classic/1.1.3/logback-classic-1.1.3.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/02/19 11:11:54 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
19/02/19 11:11:54 INFO ObjectStore: ObjectStore, initialize called
19/02/19 11:11:54 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
19/02/19 11:11:54 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
19/02/19 11:11:56 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
19/02/19 11:11:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/02/19 11:11:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/02/19 11:11:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
19/02/19 11:11:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
19/02/19 11:11:57 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
19/02/19 11:11:57 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
19/02/19 11:11:57 INFO ObjectStore: Initialized ObjectStore
19/02/19 11:11:57 INFO HiveMetaStore: Added admin role in metastore
19/02/19 11:11:57 INFO HiveMetaStore: Added public role in metastore
19/02/19 11:11:57 INFO HiveMetaStore: No user is added in admin role, since config is empty
19/02/19 11:11:57 INFO HiveMetaStore: 0: get_all_databases
19/02/19 11:11:57 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_all_databases    
19/02/19 11:11:57 INFO HiveMetaStore: 0: get_functions: db=default pat=*
19/02/19 11:11:57 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_functions: db=default pat=*    
19/02/19 11:11:57 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
19/02/19 11:11:57 INFO SessionState: Created local directory: C:/Users/zcah/AppData/Local/Temp/c2adb2df-c591-4150-b82c-51a4153c0d35_resources
19/02/19 11:11:57 INFO SessionState: Created HDFS directory: /tmp/hive/zcah/c2adb2df-c591-4150-b82c-51a4153c0d35
19/02/19 11:11:57 INFO SessionState: Created local directory: C:/Users/zcah/AppData/Local/Temp/zcah/c2adb2df-c591-4150-b82c-51a4153c0d35
19/02/19 11:11:57 INFO SessionState: Created HDFS directory: /tmp/hive/zcah/c2adb2df-c591-4150-b82c-51a4153c0d35/_tmp_space.db
19/02/19 11:11:57 INFO HiveMetaStore: 0: get_database: default
19/02/19 11:11:57 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_database: default    
19/02/19 11:11:57 INFO HiveMetaStore: 0: get_database: global_temp
19/02/19 11:11:57 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_database: global_temp    
19/02/19 11:11:57 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
19/02/19 11:11:58 INFO SessionState: Created local directory: C:/Users/zcah/AppData/Local/Temp/21b57641-9da3-466b-b436-a265be48de67_resources
19/02/19 11:11:58 INFO SessionState: Created HDFS directory: /tmp/hive/zcah/21b57641-9da3-466b-b436-a265be48de67
19/02/19 11:11:58 INFO SessionState: Created local directory: C:/Users/zcah/AppData/Local/Temp/zcah/21b57641-9da3-466b-b436-a265be48de67
19/02/19 11:11:58 INFO SessionState: Created HDFS directory: /tmp/hive/zcah/21b57641-9da3-466b-b436-a265be48de67/_tmp_space.db
19/02/19 11:11:59 INFO HiveMetaStore: 0: get_database: default
19/02/19 11:11:59 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_database: default    
19/02/19 11:11:59 INFO HiveMetaStore: 0: get_database: default
19/02/19 11:11:59 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_database: default    
19/02/19 11:11:59 INFO HiveMetaStore: 0: get_tables: db=default pat=*
19/02/19 11:11:59 INFO audit: ugi=zcah    ip=unknown-ip-addr    cmd=get_tables: db=default pat=*    
+--------+--------------------+-----------+
|database|           tableName|isTemporary|
+--------+--------------------+-----------+
| default|clusters_tweet_user2|      false|
+--------+--------------------+-----------+

Root-cause analysis: the key log line above is "MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY", which means the wrong metastore database was being used. The Hive on the server is backed by MySQL, so this connection was going to a local embedded Derby database instead. The code threw no errors, so the problem was in the configuration file rather than the code. HDFS itself was reachable, but to be safe I still changed hive.metastore.warehouse.dir. The hive-site.xml downloaded from the server was missing the javax.jdo.* section shown above, so I added it, and that solved the problem.
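
A quick way to verify the fix is to list databases and tables again; after the change the session should show what exists on the server's Hive rather than only the default database of a fresh local Derby metastore:

// run against the SparkSession created above
hive.sql("show databases").show()
hive.sql("show tables").show()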

Reference:
https://blog.csdn.net/agent_x/article/details/79952229
