When writing data from Spark into Hive, Spark complains that a database which has already been created cannot be found:
org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'traffic' not found;
Cause of the problem in my case:
Spark had not been given Hive's configuration, so it could not locate the Hive metastore. As the log below shows, it fell back to an embedded Derby metastore ("Using direct SQL, underlying DB is DERBY") with a local warehouse at file:/opt/spark-2.3.1/bin/spark-warehouse, which only contains the default database.
Solution:
Copy the Hive client's hive-site.xml into Spark's conf directory. Spark picks the file up automatically and uses it to locate the Hive metastore.
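For example, assuming the Hive client configuration lives under /opt/hive/conf and Spark under /opt/spark-2.3.1 (adjust the paths to your installation):

[root@node4 ~]# cp /opt/hive/conf/hive-site.xml /opt/spark-2.3.1/conf/

If hive-site.xml points at a remote metastore via hive.metastore.uris, make sure the metastore service is running before resubmitting. A quick sanity check is to list the databases Spark can now see, e.g. ./spark-sql -e "show databases;", which should include traffic.

For reference, the full output of the failing run, before the fix, is below: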
[root@node4 bin]# ./spark-submit --master spark://node1:7077 --class com.producedate2hive.Data2Hive /root/TrafficProject-1.0-SNAPSHOT-jar-with-dependencies.jar
2019-07-15 09:58:43 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2019-07-15 09:58:44 INFO SparkContext:54 - Running Spark version 2.3.1
2019-07-15 09:58:44 INFO SparkContext:54 - Submitted application: traffic2hive
2019-07-15 09:58:44 INFO SecurityManager:54 - Changing view acls to: root
2019-07-15 09:58:44 INFO SecurityManager:54 - Changing modify acls to: root
2019-07-15 09:58:44 INFO SecurityManager:54 - Changing view acls groups to:
2019-07-15 09:58:44 INFO SecurityManager:54 - Changing modify acls groups to:
2019-07-15 09:58:44 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
2019-07-15 09:58:44 INFO Utils:54 - Successfully started service 'sparkDriver' on port 53087.
2019-07-15 09:58:44 INFO SparkEnv:54 - Registering MapOutputTracker
2019-07-15 09:58:44 INFO SparkEnv:54 - Registering BlockManagerMaster
2019-07-15 09:58:44 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
2019-07-15 09:58:44 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up
2019-07-15 09:58:44 INFO DiskBlockManager:54 - Created local directory at /tmp/blockmgr-af28f89f-6caa-45b3-93e2-e1531b36d850
2019-07-15 09:58:44 INFO MemoryStore:54 - MemoryStore started with capacity 366.3 MB
2019-07-15 09:58:44 INFO SparkEnv:54 - Registering OutputCommitCoordinator
2019-07-15 09:58:44 INFO log:192 - Logging initialized @1921ms
2019-07-15 09:58:44 INFO Server:346 - jetty-9.3.z-SNAPSHOT
2019-07-15 09:58:44 INFO Server:414 - Started @2006ms
2019-07-15 09:58:44 INFO AbstractConnector:278 - Started ServerConnector@53f83f19{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-07-15 09:58:44 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040.
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@71b1a49c{/jobs,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@28f8e165{/jobs/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@545f80bf{/jobs/job,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@22fa55b2{/jobs/job/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4d666b41{/stages,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6594402a{/stages/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@30f4b1a6{/stages/stage,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@79c3f01f{/stages/stage/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@6c2f1700{/stages/pool,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@350b3a17{/stages/pool/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@38600b{/storage,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@669d2b1b{/storage/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@721eb7df{/storage/rdd,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1ea9f009{/storage/rdd/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5d52e3ef{/environment,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5298dead{/environment/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@553f3b6e{/executors,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4c7a078{/executors/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4e406694{/executors/threadDump,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5ab9b447{/executors/threadDump/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@76f10035{/static,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@50b1f030{/,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@4163f1cd{/api,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@23aae55{/jobs/job/kill,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5f574cc2{/stages/stage/kill,null,AVAILABLE,@Spark}
2019-07-15 09:58:44 INFO SparkUI:54 - Bound SparkUI to 0.0.0.0, and started at http://node4:4040
2019-07-15 09:58:45 INFO SparkContext:54 - Added JAR file:/root/TrafficProject-1.0-SNAPSHOT-jar-with-dependencies.jar at spark://node4:53087/jars/TrafficProject-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1563155925020
2019-07-15 09:58:45 INFO StandaloneAppClient$ClientEndpoint:54 - Connecting to master spark://node1:7077...
2019-07-15 09:58:45 INFO TransportClientFactory:267 - Successfully created connection to node1/172.16.220.151:7077 after 32 ms (0 ms spent in bootstraps)
2019-07-15 09:58:45 INFO StandaloneSchedulerBackend:54 - Connected to Spark cluster with app ID app-20190715095845-0001
2019-07-15 09:58:45 INFO Utils:54 - Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35754.
2019-07-15 09:58:45 INFO NettyBlockTransferService:54 - Server created on node4:35754
2019-07-15 09:58:45 INFO BlockManager:54 - Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2019-07-15 09:58:45 INFO StandaloneAppClient$ClientEndpoint:54 - Executor added: app-20190715095845-0001/0 on worker-20190715095545-172.16.220.153-47385 (172.16.220.153:47385) with 2 core(s)
2019-07-15 09:58:45 INFO StandaloneSchedulerBackend:54 - Granted executor ID app-20190715095845-0001/0 on hostPort 172.16.220.153:47385 with 2 core(s), 1024.0 MB RAM
2019-07-15 09:58:45 INFO StandaloneAppClient$ClientEndpoint:54 - Executor added: app-20190715095845-0001/1 on worker-20190715095545-172.16.220.152-42429 (172.16.220.152:42429) with 2 core(s)
2019-07-15 09:58:45 INFO StandaloneSchedulerBackend:54 - Granted executor ID app-20190715095845-0001/1 on hostPort 172.16.220.152:42429 with 2 core(s), 1024.0 MB RAM
2019-07-15 09:58:45 INFO BlockManagerMaster:54 - Registering BlockManager BlockManagerId(driver, node4, 35754, None)
2019-07-15 09:58:45 INFO BlockManagerMasterEndpoint:54 - Registering block manager node4:35754 with 366.3 MB RAM, BlockManagerId(driver, node4, 35754, None)
2019-07-15 09:58:45 INFO BlockManagerMaster:54 - Registered BlockManager BlockManagerId(driver, node4, 35754, None)
2019-07-15 09:58:45 INFO BlockManager:54 - Initialized BlockManager: BlockManagerId(driver, node4, 35754, None)
2019-07-15 09:58:45 INFO StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190715095845-0001/0 is now RUNNING
2019-07-15 09:58:45 INFO StandaloneAppClient$ClientEndpoint:54 - Executor updated: app-20190715095845-0001/1 is now RUNNING
2019-07-15 09:58:45 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@5eb97ced{/metrics/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:46 INFO EventLoggingListener:54 - Logging events to hdfs://mycluster/spark/log/app-20190715095845-0001.lz4
2019-07-15 09:58:46 INFO StandaloneSchedulerBackend:54 - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
2019-07-15 09:58:46 INFO SharedState:54 - Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/opt/spark-2.3.1/bin/spark-warehouse').
2019-07-15 09:58:46 INFO SharedState:54 - Warehouse path is 'file:/opt/spark-2.3.1/bin/spark-warehouse'.
2019-07-15 09:58:46 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@715b886f{/SQL,null,AVAILABLE,@Spark}
2019-07-15 09:58:46 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7fb29ca9{/SQL/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:46 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@73844119{/SQL/execution,null,AVAILABLE,@Spark}
2019-07-15 09:58:46 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@44f24a20{/SQL/execution/json,null,AVAILABLE,@Spark}
2019-07-15 09:58:46 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1687eb01{/static/sql,null,AVAILABLE,@Spark}
2019-07-15 09:58:47 INFO StateStoreCoordinatorRef:54 - Registered StateStoreCoordinator endpoint
2019-07-15 09:58:47 INFO HiveUtils:54 - Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
2019-07-15 09:58:48 INFO HiveMetaStore:589 - 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
2019-07-15 09:58:48 INFO ObjectStore:289 - ObjectStore, initialize called
2019-07-15 09:58:48 INFO Persistence:77 - Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
2019-07-15 09:58:48 INFO Persistence:77 - Property datanucleus.cache.level2 unknown - will be ignored
2019-07-15 09:58:49 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.16.220.153:52220) with ID 0
2019-07-15 09:58:49 INFO ObjectStore:370 - Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
2019-07-15 09:58:50 INFO BlockManagerMasterEndpoint:54 - Registering block manager 172.16.220.153:60391 with 366.3 MB RAM, BlockManagerId(0, 172.16.220.153, 60391, None)
2019-07-15 09:58:50 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Registered executor NettyRpcEndpointRef(spark-client://Executor) (172.16.220.152:46055) with ID 1
2019-07-15 09:58:50 INFO BlockManagerMasterEndpoint:54 - Registering block manager 172.16.220.152:53876 with 366.3 MB RAM, BlockManagerId(1, 172.16.220.152, 53876, None)
2019-07-15 09:58:51 INFO Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2019-07-15 09:58:51 INFO Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2019-07-15 09:58:51 INFO Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
2019-07-15 09:58:51 INFO Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
2019-07-15 09:58:52 INFO Query:77 - Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
2019-07-15 09:58:52 INFO MetaStoreDirectSql:139 - Using direct SQL, underlying DB is DERBY
2019-07-15 09:58:52 INFO ObjectStore:272 - Initialized ObjectStore
2019-07-15 09:58:52 INFO HiveMetaStore:663 - Added admin role in metastore
2019-07-15 09:58:52 INFO HiveMetaStore:672 - Added public role in metastore
2019-07-15 09:58:52 INFO HiveMetaStore:712 - No user is added in admin role, since config is empty
2019-07-15 09:58:52 INFO HiveMetaStore:746 - 0: get_all_databases
2019-07-15 09:58:52 INFO audit:371 - ugi=root ip=unknown-ip-addr cmd=get_all_databases
2019-07-15 09:58:52 INFO HiveMetaStore:746 - 0: get_functions: db=default pat=*
2019-07-15 09:58:52 INFO audit:371 - ugi=root ip=unknown-ip-addr cmd=get_functions: db=default pat=*
2019-07-15 09:58:52 INFO Datastore:77 - The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
2019-07-15 09:58:52 INFO SessionState:641 - Created local directory: /tmp/8fd07625-00e6-4c7e-ab25-7f6ce58ff886_resources
2019-07-15 09:58:52 INFO SessionState:641 - Created HDFS directory: /tmp/hive/root/8fd07625-00e6-4c7e-ab25-7f6ce58ff886
2019-07-15 09:58:52 INFO SessionState:641 - Created local directory: /tmp/root/8fd07625-00e6-4c7e-ab25-7f6ce58ff886
2019-07-15 09:58:52 INFO SessionState:641 - Created HDFS directory: /tmp/hive/root/8fd07625-00e6-4c7e-ab25-7f6ce58ff886/_tmp_space.db
2019-07-15 09:58:52 INFO HiveClientImpl:54 - Warehouse location for Hive client (version 1.2.2) is file:/opt/spark-2.3.1/bin/spark-warehouse
2019-07-15 09:58:52 INFO HiveMetaStore:746 - 0: get_database: default
2019-07-15 09:58:52 INFO audit:371 - ugi=root ip=unknown-ip-addr cmd=get_database: default
2019-07-15 09:58:52 INFO HiveMetaStore:746 - 0: get_database: global_temp
2019-07-15 09:58:52 INFO audit:371 - ugi=root ip=unknown-ip-addr cmd=get_database: global_temp
2019-07-15 09:58:52 WARN ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
2019-07-15 09:58:53 INFO HiveMetaStore:746 - 0: get_database: traffic
2019-07-15 09:58:53 INFO audit:371 - ugi=root ip=unknown-ip-addr cmd=get_database: traffic
2019-07-15 09:58:53 WARN ObjectStore:568 - Failed to get database traffic, returning NoSuchObjectException
Exception in thread "main" org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'traffic' not found;
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.org$apache$spark$sql$catalyst$catalog$SessionCatalog$$requireDbExists(SessionCatalog.scala:174)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.setCurrentDatabase(SessionCatalog.scala:256)
at org.apache.spark.sql.execution.command.SetDatabaseCommand.run(databases.scala:59)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:190)
at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3254)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3253)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:190)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:75)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:641)
at com.producedate2hive.Data2Hive.main(Data2Hive.java:13)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:894)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2019-07-15 09:58:53 INFO SparkContext:54 - Invoking stop() from shutdown hook
2019-07-15 09:58:53 INFO AbstractConnector:318 - Stopped Spark@53f83f19{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
2019-07-15 09:58:53 INFO SparkUI:54 - Stopped Spark web UI at http://node4:4040
2019-07-15 09:58:54 INFO StandaloneSchedulerBackend:54 - Shutting down all executors
2019-07-15 09:58:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - Asking each executor to shut down
2019-07-15 09:58:54 INFO MapOutputTrackerMasterEndpoint:54 - MapOutputTrackerMasterEndpoint stopped!
2019-07-15 09:58:54 INFO MemoryStore:54 - MemoryStore cleared
2019-07-15 09:58:54 INFO BlockManager:54 - BlockManager stopped
2019-07-15 09:58:54 INFO BlockManagerMaster:54 - BlockManagerMaster stopped
2019-07-15 09:58:54 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:54 - OutputCommitCoordinator stopped!
2019-07-15 09:58:54 INFO SparkContext:54 - Successfully stopped SparkContext
2019-07-15 09:58:54 INFO ShutdownHookManager:54 - Shutdown hook called
2019-07-15 09:58:54 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-1634ff5f-c1ba-4ce5-9f04-c2eb82e5aac0
2019-07-15 09:58:54 INFO ShutdownHookManager:54 - Deleting directory /tmp/spark-632c2c0e-2443-438f-8166-4558633eb6de
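For context, the stack trace points at Data2Hive.java:13, a SparkSession.sql(...) call executing a USE statement (SetDatabaseCommand.run). A minimal sketch of what such a driver looks like, reconstructed from the log above (the app name traffic2hive, the class name, and the database traffic come from the output; everything else is an assumption):

package com.producedate2hive;

import org.apache.spark.sql.SparkSession;

public class Data2Hive {
    public static void main(String[] args) {
        // enableHiveSupport() wires the session to the Hive metastore
        // described by conf/hive-site.xml; without that file, Spark falls
        // back to a local Derby metastore that only knows 'default'.
        SparkSession spark = SparkSession.builder()
                .appName("traffic2hive")
                .enableHiveSupport()
                .getOrCreate();

        // This is the kind of call that throws NoSuchDatabaseException
        // when the metastore Spark is talking to has no 'traffic' database.
        spark.sql("USE traffic");

        // ... create tables / insert data here ...

        spark.stop();
    }
}

Note that enableHiveSupport() alone is not enough: it only takes effect when hive-site.xml is on Spark's classpath, which is exactly what copying the file into the conf directory achieves.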