场景 : 在前端页面提交一个spark-sql,然后通过spark-thrift-server 调用来执行. sql逻辑很简单, 就是使用join关联两表(一个大表90G,一个小表3G)查询,前台界面执行出错.后台抛出java.lang.OutOfMemoryError: GC overhead limit exceeded 异常;
直接使用./spark-sql --master yarn --executor-memory 4G --num-executors 19 启动spark-sql 在后台执行成功,说明sql语句没有问题.
查看后台spark日志,发现抛出java.lang.OutOfMemoryError: GC overhead limit exceeded 异常.详细异常如下:
Exception in thread "HiveServer2-Handler-Pool: Thread-199" 16/10/12 17:52:27 WARN NioEventLoop: Unexpected exception in the selector loop.
16/10/12 17:52:27 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 15
Exception in thread "HiveServer2-Handler-Pool: Thread-167" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
at java.lang.StringCoding.decode(StringCoding.java:193)
at java.lang.String.(String.java:416)
at java.lang.String.(String.java:481)
at org.apache.thrift.protocol.TBinaryProtocol.readStringBody(TBinaryProtocol.java:381)
at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:374)
at org.apache.hive.service.cli.thrift.TGetTablesReq$TGetTablesReqStandardScheme.read(TGetTablesReq.java:697)
at org.apache.hive.service.cli.thrift.TGetTablesReq$TGetTablesReqStandardScheme.read(TGetTablesReq.java:666)
at org.apache.hive.service.cli.thrift.TGetTablesReq.read(TGetTablesReq.java:569)
at org.apache.hive.service.cli.thrift.TCLIService$GetTables_args$GetTables_argsStandardScheme.read(TCLIService.java:7000)
at org.apache.hive.service.cli.thrift.TCLIService$GetTables_args$GetTables_argsStandardScheme.read(TCLIService.java:6985)
at org.apache.hive.service.cli.thrift.TCLIService$GetTables_args.read(TCLIService.java:6932)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:25)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "HiveServer2-Handler-Pool: Thread-178" 16/10/12 17:52:57 WARN SingleThreadEventExecutor: Unexpected exception from an event executor:
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.thrift.protocol.TBinaryProtocol.readFieldBegin(TBinaryProtocol.java:245)
at org.apache.hive.service.cli.thrift.THandleIdentifier$THandleIdentifierStandardScheme.read(THandleIdentifier.java:430)
at org.apache.hive.service.cli.thrift.THandleIdentifier$THandleIdentifierStandardScheme.read(THandleIdentifier.java:423)
at org.apache.hive.service.cli.thrift.THandleIdentifier.read(THandleIdentifier.java:357)
at org.apache.hive.service.cli.thrift.TSessionHandle$TSessionHandleStandardScheme.read(TSessionHandle.java:336)
at org.apache.hive.service.cli.thrift.TSessionHandle$TSessionHandleStandardScheme.read(TSessionHandle.java:321)
at org.apache.hive.service.cli.thrift.TSessionHandle.read(TSessionHandle.java:264)
at org.apache.hive.service.cli.thrift.TGetTablesReq$TGetTablesReqStandardScheme.read(TGetTablesReq.java:681)
at org.apache.hive.service.cli.thrift.TGetTablesReq$TGetTablesReqStandardScheme.read(TGetTablesReq.java:666)
at org.apache.hive.service.cli.thrift.TGetTablesReq.read(TGetTablesReq.java:569)
at org.apache.hive.service.cli.thrift.TCLIService$GetTables_args$GetTables_argsStandardScheme.read(TCLIService.java:7000)
at org.apache.hive.service.cli.thrift.TCLIService$GetTables_args$GetTables_argsStandardScheme.read(TCLIService.java:6985)
at org.apache.hive.service.cli.thrift.TCLIService$GetTables_args.read(TCLIService.java:6932)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:25)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/10/12 17:53:22 INFO HiveMetaStore: 26: get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 8: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 15: get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 18: get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 28: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 WARN HeartbeatReceiver: Removing executor 15 with no recent heartbeats: 148218 ms exceeds timeout 120000 ms
16/10/12 17:53:22 ERROR YarnScheduler: Lost executor 15 on node18.it.leap.com: Executor heartbeat timed out after 148218 ms
16/10/12 17:53:22 INFO DAGScheduler: Executor lost: 15 (epoch 4)
16/10/12 17:53:22 INFO BlockManagerMasterEndpoint: Trying to remove executor 15 from BlockManagerMaster.
16/10/12 17:53:22 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(15, node18.it.leap.com, 62536)
16/10/12 17:53:22 WARN HeartbeatReceiver: Removing executor 16 with no recent heartbeats: 147212 ms exceeds timeout 120000 ms
16/10/12 17:53:22 ERROR YarnScheduler: Lost executor 16 on node18.it.leap.com: Executor heartbeat timed out after 147212 ms
java.lang.OutOfMemoryError: GC overhead limit exceeded
16/10/12 17:53:22 INFO HiveMetaStore: 4: get_databases: *
16/10/12 17:53:22 WARN NioEventLoop: Unexpected exception in the selector loop.
java.lang.OutOfMemoryError: GC overhead limit exceeded
16/10/12 17:53:22 INFO HiveMetaStore: 16: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 3: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 29: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 29: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/10/12 17:53:22 INFO ObjectStore: ObjectStore, initialize called
16/10/12 17:53:22 INFO BlockManagerMaster: Removed 15 successfully in removeExecutor
16/10/12 17:53:22 INFO DAGScheduler: Executor lost: 16 (epoch 4)
16/10/12 17:53:22 INFO BlockManagerMasterEndpoint: Trying to remove executor 16 from BlockManagerMaster.
16/10/12 17:53:22 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(16, node18.it.leap.com, 41766)
16/10/12 17:53:22 INFO BlockManagerMaster: Removed 16 successfully in removeExecutor
16/10/12 17:53:22 INFO HiveMetaStore: 1: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 10: get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 17: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 21: get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 14: get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 19: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 27: get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 20: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO DAGScheduler: Host added was in lost list earlier: node18.it.leap.com
Exception in thread "broadcast-exchange-0" 16/10/12 17:53:22 ERROR TThreadPoolServer: ExecutorService threw error: java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
16/10/12 17:53:22 ERROR ThriftCLIService: Error starting HiveServer2: could not start ThriftBinaryCLIService
java.lang.OutOfMemoryError: GC overhead limit exceeded
Exception in thread "HiveServer2-Handler-Pool: Thread-159" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.StringBuilder.toString(StringBuilder.java:405)
at javax.security.sasl.Sasl.createSaslServer(Sasl.java:499)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:140)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Exception in thread "HiveServer2-Handler-Pool: Thread-149" java.lang.OutOfMemoryError: GC overhead limit exceeded
16/10/12 17:53:22 INFO HiveMetaStore: 26: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 26: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO HiveServer2: Shutting down HiveServer2
16/10/12 17:53:22 INFO HiveMetaStore: 20: get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 10: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 10: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO ServerConnector: Stopped ServerConnector@624d190f{HTTP/1.1}{0.0.0.0:4040}
16/10/12 17:53:22 INFO HiveMetaStore: 15: get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 4: get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 16: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 18: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO BlockManagerMasterEndpoint: Registering block manager node18.it.leap.com:62536 with 408.9 MB RAM, BlockManagerId(15, node18.it.leap.com, 62536)
16/10/12 17:53:22 INFO HiveMetaStore: 4: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO HiveMetaStore: 16: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO HiveMetaStore: 18: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5ae5d5ca{/stages/stage/kill,null,UNAVAILABLE}
16/10/12 17:53:22 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@cb37961{/api,null,UNAVAILABLE}
16/10/12 17:53:22 INFO HiveMetaStore: 4: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO HiveMetaStore: 16: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO HiveMetaStore: 18: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO BlockManagerMasterEndpoint: Registering block manager node18.it.leap.com:41766 with 408.9 MB RAM, BlockManagerId(16, node18.it.leap.com, 41766)
16/10/12 17:53:22 INFO HiveMetaStore: 28: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 21: get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 3: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO ThriftCLIService: Thrift server has stopped
16/10/12 17:53:22 INFO AbstractService: Service:ThriftBinaryCLIService is stopped.
16/10/12 17:53:22 INFO HiveMetaStore: 17: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 26: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO HiveMetaStore: 27: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO HiveMetaStore: 20: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO HiveMetaStore: 1: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO HiveMetaStore: 14: get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 19: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO HiveMetaStore: 17: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO AbstractService: Service:OperationManager is stopped.
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:62536 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:22 INFO HiveMetaStore: 10: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO Query: Reading in results for query "org.datanucleus.store.rdbms.query.SQLQuery@0" since the connection used is closing
16/10/12 17:53:22 INFO HiveMetaStore: 3: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO HiveMetaStore: 21: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO HiveMetaStore: 28: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@61ad39{/,null,UNAVAILABLE}
16/10/12 17:53:22 INFO HiveMetaStore: 8: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/10/12 17:53:22 INFO HiveMetaStore: 17: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO HiveMetaStore: 21: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO HiveMetaStore: 15: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO HiveMetaStore: 28: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:41766 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO HiveMetaStore: 20: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO AbstractService: Service:SessionManager is stopped.
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO HiveMetaStore: 19: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO HiveMetaStore: 14: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO HiveMetaStore: 15: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO HiveMetaStore: 3: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO HiveMetaStore: 19: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO HiveMetaStore: 14: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO HiveMetaStore: 27: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO ObjectStore: Initialized ObjectStore
16/10/12 17:53:22 INFO HiveMetaStore: 1: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO HiveMetaStore: 1: get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:22 INFO HiveMetaStore: 29: get_databases: *
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_databases: *
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO HiveMetaStore: 29: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@644b20f8{/static,null,UNAVAILABLE}
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO HiveMetaStore: 8: get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:62536 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:22 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:41766 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:22 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:23 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_tables: db=default pat=PROBABLYNOT
16/10/12 17:53:22 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@1355c642{/executors/threadDump/json,null,UNAVAILABLE}
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@7aa3bb2d{/executors/threadDump,null,UNAVAILABLE}
16/10/12 17:53:23 INFO ExecutorAllocationManager: Removing executor 15 because it has been idle for 60 seconds (new desired total will be 1)
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@34659b02{/executors/json,null,UNAVAILABLE}
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@22c11d39{/executors,null,UNAVAILABLE}
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@358aaa85{/environment/json,null,UNAVAILABLE}
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6f0aea07{/environment,null,UNAVAILABLE}
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5163cc8f{/storage/rdd/json,null,UNAVAILABLE}
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@7c8c17a2{/storage/rdd,null,UNAVAILABLE}
16/10/12 17:53:23 INFO HiveMetaStore: 29: get_multi_table : db=default tbls=
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@14bed4e9{/storage/json,null,UNAVAILABLE}
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@753e481{/storage,null,UNAVAILABLE}
16/10/12 17:53:23 INFO HiveMetaStore: 8: get_multi_table : db=default tbls=
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@738e5c64{/stages/pool/json,null,UNAVAILABLE}
16/10/12 17:53:23 INFO HiveMetaStore: 27: get_multi_table : db=default tbls=
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@7b29c61f{/stages/pool,null,UNAVAILABLE}
16/10/12 17:53:23 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 15
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@61ae508a{/stages/stage/json,null,UNAVAILABLE}
16/10/12 17:53:23 INFO YarnClientSchedulerBackend: Requesting to kill executor(s) 16
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@4371c2d5{/stages/stage,null,UNAVAILABLE}
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@2a2a5dc3{/stages/json,null,UNAVAILABLE}
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@2943ebbf{/stages,null,UNAVAILABLE}
16/10/12 17:53:23 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@1b7b0401{/jobs/job/json,null,UNAVAILABLE}
16/10/12 17:53:23 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@44c076b{/jobs/job,null,UNAVAILABLE}
16/10/12 17:53:23 INFO audit: ugi=hive ip=unknown-ip-addr cmd=get_multi_table : db=default tbls=
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@760dcdf2{/jobs/json,null,UNAVAILABLE}
16/10/12 17:53:23 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@725819eb{/jobs,null,UNAVAILABLE}
16/10/12 17:53:23 INFO SparkUI: Stopped Spark web UI at http://10.120.193.4:4040
16/10/12 17:53:23 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:23 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:23 INFO ThriftCLIService: Session disconnected without closing properly, close it now
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:62536 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:41766 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003023,BlockManagerId(15, node18.it.leap.com, 62536),428762726)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003026,BlockManagerId(16, node18.it.leap.com, 41766),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:62536 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(15, node18.it.leap.com, 62536),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:41766 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(16, node18.it.leap.com, 41766),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003032,BlockManagerId(15, node18.it.leap.com, 62536),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:62536 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(15, node18.it.leap.com, 62536),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003035,BlockManagerId(16, node18.it.leap.com, 41766),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:41766 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(16, node18.it.leap.com, 41766),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003040,BlockManagerId(15, node18.it.leap.com, 62536),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:62536 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(15, node18.it.leap.com, 62536),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003044,BlockManagerId(16, node18.it.leap.com, 41766),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:41766 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(16, node18.it.leap.com, 41766),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003048,BlockManagerId(15, node18.it.leap.com, 62536),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:62536 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(15, node18.it.leap.com, 62536),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003052,BlockManagerId(16, node18.it.leap.com, 41766),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:41766 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(16, node18.it.leap.com, 41766),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003056,BlockManagerId(15, node18.it.leap.com, 62536),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:62536 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(15, node18.it.leap.com, 62536),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003060,BlockManagerId(16, node18.it.leap.com, 41766),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:41766 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(16, node18.it.leap.com, 41766),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003064,BlockManagerId(15, node18.it.leap.com, 62536),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:62536 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(15, node18.it.leap.com, 62536),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003068,BlockManagerId(16, node18.it.leap.com, 41766),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:41766 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(16, node18.it.leap.com, 41766),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003072,BlockManagerId(15, node18.it.leap.com, 62536),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:62536 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(15, node18.it.leap.com, 62536),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockManagerAdded(1476266003077,BlockManagerId(16, node18.it.leap.com, 41766),428762726)
16/10/12 17:53:23 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on node18.it.leap.com:41766 (size: 27.0 KB, free: 408.9 MB)
16/10/12 17:53:23 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(16, node18.it.leap.com, 41766),broadcast_10_piece0,StorageLevel(memory, 1 replicas),27695,0))
16/10/12 17:53:23 ERROR SparkExecuteStatementOperation: Error executing query, currentState CLOSED,
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- *LocalLimit 1000
+- *HashAggregate(keys=[PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654], functions=[], output=[PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654])
+- Exchange hashpartitioning(PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654, 200)
+- *HashAggregate(keys=[PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654], functions=[], output=[PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654])
+- *Project [CASE WHEN (substring(ZMAT#982, 1, 10) = 0000000000) THEN substring(ZMAT#982, 11, 8) WHEN (substring(ZMAT#982, 1, 9) = 000000000) THEN substring(ZMAT#982, 10, 9) WHEN (substring(ZMAT#982, 1, 5) = 00000) THEN substring(ZMAT#982, 6, 7) ELSE ZMAT#982 END AS PART_NUMBER#649, ZPRODH_D#1050 AS PRODUCT_LOB#650, ZBRAND_D#1040 AS PRODUCT_BRAND#651, ZSERIES_D#1044 AS PRODUCT_FAMILY#652, ZSUBSER_D#1046 AS PRODUCT_SUB_FAMILY#653, ZMAT#982 AS PRODUCT_CD#654]
+- *BroadcastHashJoin [ZMAT#982], [MATERIAL#1063], Inner, BuildLeft
:- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
: +- *Project [zmat#982, zbrand_d#1040, zseries_d#1044, zsubser_d#1046, zprodh_d#1050]
: +- *Filter isnotnull(ZMAT#982)
: +- *BatchedScan parquet idl_bw.zoh_mds32_idl_p[zmat#982,zbrand_d#1040,zseries_d#1044,zsubser_d#1046,zprodh_d#1050] Format: ParquetFormat, InputPaths: hdfs://node4.it.leap.com:8020/apps/hive/warehouse/idl_bw.db/zoh_mds32_idl_p, PushedFilters: [IsNotNull(zmat)], ReadSchema: struct
+- *Project [material#1063]
+- *Filter (SALESORG#1064 IN (AR10,BR10,CA10,CO10,MX10,US10) && isnotnull(MATERIAL#1063))
+- *BatchedScan parquet idl_bw.zoh_mms05_idl_p[material#1063,salesorg#1064] Format: ParquetFormat, InputPaths: hdfs://node4.it.leap.com:8020/apps/hive/warehouse/idl_bw.db/zoh_mms05_idl_p, PushedFilters: [In(salesorg, [AR10,BR10,CA10,CO10,MX10,US10], IsNotNull(material)], ReadSchema: struct
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50)
at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:113)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:233)
at org.apache.spark.sql.execution.BaseLimitExec$class.inputRDDs(limit.scala:63)
at org.apache.spark.sql.execution.GlobalLimitExec.inputRDDs(limit.scala:103)
at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:36)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:361)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:237)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:142)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:313)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:94)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86)
at org.apache.spark.sql.Dataset.(Dataset.scala:186)
at org.apache.spark.sql.Dataset.(Dataset.scala:167)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:582)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:682)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:213)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:157)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:167)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange hashpartitioning(PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654, 200)
+- *HashAggregate(keys=[PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654], functions=[], output=[PART_NUMBER#649, PRODUCT_LOB#650, PRODUCT_BRAND#651, PRODUCT_FAMILY#652, PRODUCT_SUB_FAMILY#653, PRODUCT_CD#654])
+- *Project [CASE WHEN (substring(ZMAT#982, 1, 10) = 0000000000) THEN substring(ZMAT#982, 11, 8) WHEN (substring(ZMAT#982, 1, 9) = 000000000) THEN substring(ZMAT#982, 10, 9) WHEN (substring(ZMAT#982, 1, 5) = 00000) THEN substring(ZMAT#982, 6, 7) ELSE ZMAT#982 END AS PART_NUMBER#649, ZPRODH_D#1050 AS PRODUCT_LOB#650, ZBRAND_D#1040 AS PRODUCT_BRAND#651, ZSERIES_D#1044 AS PRODUCT_FAMILY#652, ZSUBSER_D#1046 AS PRODUCT_SUB_FAMILY#653, ZMAT#982 AS PRODUCT_CD#654]
+- *BroadcastHashJoin [ZMAT#982], [MATERIAL#1063], Inner, BuildLeft
:- BroadcastExchange HashedRelationBroadcastMode(List(input[0, string, true]))
: +- *Project [zmat#982, zbrand_d#1040, zseries_d#1044, zsubser_d#1046, zprodh_d#1050]
: +- *Filter isnotnull(ZMAT#982)
: +- *BatchedScan parquet idl_bw.zoh_mds32_idl_p[zmat#982,zbrand_d#1040,zseries_d#1044,zsubser_d#1046,zprodh_d#1050] Format: ParquetFormat, InputPaths: hdfs://node4.it.leap.com:8020/apps/hive/warehouse/idl_bw.db/zoh_mds32_idl_p, PushedFilters: [IsNotNull(zmat)], ReadSchema: struct
+- *Project [material#1063]
+- *Filter (SALESORG#1064 IN (AR10,BR10,CA10,CO10,MX10,US10) && isnotnull(MATERIAL#1063))
+- *BatchedScan parquet idl_bw.zoh_mms05_idl_p[material#1063,salesorg#1064] Format: ParquetFormat, InputPaths: hdfs://node4.it.leap.com:8020/apps/hive/warehouse/idl_bw.db/zoh_mms05_idl_p, PushedFilters: [In(salesorg, [AR10,BR10,CA10,CO10,MX10,US10], IsNotNull(material)], ReadSchema: struct
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50)
at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:113)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:233)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.inputRDDs(HashAggregateExec.scala:138)
at org.apache.spark.sql.execution.BaseLimitExec$class.inputRDDs(limit.scala:63)
at org.apache.spark.sql.execution.LocalLimitExec.inputRDDs(limit.scala:96)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:361)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:86)
at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:122)
at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:113)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
... 58 more
Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1038)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)
at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:120)
at org.apache.spark.sql.execution.InputAdapter.doExecuteBroadcast(WholeStageCodegenExec.scala:229)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeBroadcast$1.apply(SparkPlan.scala:125)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.executeBroadcast(SparkPlan.scala:124)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.prepareBroadcast(BroadcastHashJoinExec.scala:98)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.codegenInner(BroadcastHashJoinExec.scala:197)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doConsume(BroadcastHashJoinExec.scala:82)
at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:30)
at org.apache.spark.sql.execution.ProjectExec.doConsume(basicPhysicalOperators.scala:62)
at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at org.apache.spark.sql.execution.FilterExec.consume(basicPhysicalOperators.scala:79)
at org.apache.spark.sql.execution.FilterExec.doConsume(basicPhysicalOperators.scala:194)
at org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:153)
at org.apache.spark.sql.execution.BatchedDataSourceScanExec.consume(ExistingRDD.scala:225)
at org.apache.spark.sql.execution.BatchedDataSourceScanExec.doProduce(ExistingRDD.scala:328)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.BatchedDataSourceScanExec.produce(ExistingRDD.scala:225)
at org.apache.spark.sql.execution.FilterExec.doProduce(basicPhysicalOperators.scala:113)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.FilterExec.produce(basicPhysicalOperators.scala:79)
at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:30)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.doProduce(BroadcastHashJoinExec.scala:77)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.joins.BroadcastHashJoinExec.produce(BroadcastHashJoinExec.scala:38)
at org.apache.spark.sql.execution.ProjectExec.doProduce(basicPhysicalOperators.scala:40)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.ProjectExec.produce(basicPhysicalOperators.scala:30)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:526)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:145)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:83)
at org.apache.spark.sql.execution.CodegenSupport$$anonfun$produce$1.apply(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.CodegenSupport$class.produce(WholeStageCodegenExec.scala:78)
at org.apache.spark.sql.execution.aggregate.HashAggregateExec.produce(HashAggregateExec.scala:37)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:309)
at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:347)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at org.apache.spark.sql.execution.exchange.ShuffleExchange.prepareShuffleDependency(ShuffleExchange.scala:86)
at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:122)
at org.apache.spark.sql.execution.exchange.ShuffleExchange$$anonfun$doExecute$1.apply(ShuffleExchange.scala:113)
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:49)
... 80 more
16/10/12 17:53:23 ERROR SparkExecuteStatementOperation: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Illegal Operation state transition from CLOSED to ERROR
at org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:92)
at org.apache.hive.service.cli.OperationState.validateTransition(OperationState.java:98)
at org.apache.hive.service.cli.operation.Operation.setState(Operation.java:126)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:245)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:157)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1$$anon$2.run(SparkExecuteStatementOperation.scala:154)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$1.run(SparkExecuteStatementOperation.scala:167)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
16/10/12 17:53:23 INFO AbstractService: Service:CLIService is stopped.
16/10/12 17:53:23 INFO AbstractService: Service:HiveServer2 is stopped.
首先去查看spark-env.sh 文件修改下列参数:
SPARK_EXECUTOR_CORES="4"
SPARK_EXECUTOR_MEMORY="4G"
SPARK_DRIVER_MEMORY="20G"
执行测试sql,发现还是一样的异常.
查看spark-thrift-sparkconf.conf 文件发现executor是动态的.如下配置
spark.dynamicAllocation.enabled true
spark.dynamicAllocation.initialExecutors 0
spark.dynamicAllocation.maxExecutors 200
spark.dynamicAllocation.minExecutors 0
将动态的关闭,使用静态的配置测试, 以下配置是静态的
首先将上述动态配置删除,加上下列静态配置
spark.executor.memory 10G
spark.executor.instances 20
spark.executor.cores 2
spark.shuffle.service.enabled false
然后重新启动spark-thrift-server , /sbin/start-thriftserver.sh --properties-file ../conf/spark-thrift-sparkconf.conf
使用beeline -u "数据库连接串" 连接thriftserver ,测试sql , 结果还是失败,同样的异常
最后通过跟踪表数据操作流向,发现此sql语句在做join的时候,有很多笛卡尔积运算,并且在运算时会把小表broadcast到每台worker上,造成OOM GC异常。
解决方法: 在spark-thrift-sparkconf.conf 文件中增加下述配置:
spark.sql.autoBroadcastJoinThreshold =-1
重启服务,测试通过.