Lesson 69: Hands-On Spark SQL over a Hive Data Source — Study Notes
Topics covered:
1. How Spark SQL operates on Hive
2. Hands-on: Spark SQL operating on Hive
Data sources: the two files /home/richard/slq/spark/people.txt and /home/richard/slq/spark/peoplescores.txt.
Contents of people.txt:
Michael 29
Andy 30
Justin 19
Contents of peoplescores.txt:
Michael 99
Andy 97
Justin 68
Note: the fields in people.txt and peoplescores.txt are separated by a Tab character.
Code:
package com.dt.spark.SparkApps.sql

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

// Hands-on DataFrame operations against Hive, written in Scala
object SparkSQL2Hive {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()            // create the SparkConf object
    conf.setAppName("SparkSQL2Hive")      // set the application name
    conf.setMaster("spark://slq1:7077")   // set the cluster Master
    val sc = new SparkContext(conf)       // create the SparkContext from this conf

    // In enterprise Spark development today, Hive is used as the data warehouse in the vast majority of cases.
    // Spark ships with Hive support: through a HiveContext we can operate directly on data stored in Hive.
    // A HiveContext gives us three main ways to work with Hive:
    // 1. Issue SQL/HQL statements against Hive: create tables, drop tables, load data into tables,
    //    and run CRUD-style queries over table data.
    // 2. Save a DataFrame straight into the Hive warehouse via saveAsTable.
    // 3. Load a Hive table into a DataFrame via HiveContext.table
    //    (see the DataFrame API sketch after this listing).
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("use hive")

    hiveContext.sql("DROP TABLE IF EXISTS people")
    hiveContext.sql("CREATE TABLE IF NOT EXISTS people(name STRING, age INT) "
      + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'")
    hiveContext.sql("LOAD DATA LOCAL INPATH '/home/richard/slq/spark/people.txt' INTO TABLE people")
    // Loading local data into Hive copies the data into the warehouse.
    // LOAD DATA INPATH (without LOCAL) can instead pull data already on HDFS into Hive,
    // in which case the data is moved rather than copied.

    hiveContext.sql("DROP TABLE IF EXISTS peoplescores")
    hiveContext.sql("CREATE TABLE IF NOT EXISTS peoplescores(name STRING, score INT) "
      + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'")
    hiveContext.sql("LOAD DATA LOCAL INPATH '/home/richard/slq/spark/peoplescores.txt' INTO TABLE peoplescores")

    // Join the two Hive tables through the HiveContext to get the name, age, and score
    // of everyone who scored above 90.
    val resultDF = hiveContext.sql("SELECT pi.name, pi.age, ps.score "
      + "FROM people pi JOIN peoplescores ps ON pi.name = ps.name WHERE ps.score > 90")

    // saveAsTable creates a Hive managed table: both the data location and the metadata
    // are managed by Hive, so dropping the table also deletes its data on disk.
    hiveContext.sql("DROP TABLE IF EXISTS peopleinformationresult")
    resultDF.saveAsTable("peopleinformationresult")

    // HiveContext.table reads a Hive table directly into a DataFrame,
    // which can then feed machine learning, graph computation, complex ETL, and so on.
    val dataFrameHive = hiveContext.table("peopleinformationresult")
    dataFrameHive.show()
  }
}
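For comparison, the same "score above 90" join can also be expressed with the DataFrame API instead of an SQL string, using the HiveContext.table approach mentioned in the comments above. A minimal sketch, assuming the people and peoplescores tables created above already exist in the hive database and hiveContext is the same context as in the listing:

    // Read the two Hive tables as DataFrames and join them with the DataFrame API
    val peopleDF = hiveContext.table("people")
    val scoresDF = hiveContext.table("peoplescores")
    val resultByApi = peopleDF
      .join(scoresDF, peopleDF("name") === scoresDF("name"))
      .filter(scoresDF("score") > 90)
      .select(peopleDF("name"), peopleDF("age"), scoresDF("score"))
    resultByApi.show()

Both forms produce the same rows; the SQL string reads like what a Hive user would write, while the DataFrame API composes more naturally with the surrounding Scala code.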
Create a shell script (SparkAppsScala.sh) to submit the job:
/home/richard/spark-1.6.0/bin/spark-submit --class com.dt.spark.SparkApps.sql.SparkSQL2Hive --master spark://slq1:7077 /home/richard/slq/spark/160327/SparkSQL2Hive.jar
Run it with ./SparkAppsScala.sh. The log from the run is shown below:
[richard@slq1 160327]$ ./SparkAppsScala.sh
16/04/09 23:44:11 INFO spark.SparkContext: Running Spark version 1.6.0
16/04/09 23:44:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/09 23:44:17 INFO spark.SecurityManager: Changing view acls to: richard
16/04/09 23:44:17 INFO spark.SecurityManager: Changing modify acls to: richard
16/04/09 23:44:17 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(richard); users with modify permissions: Set(richard)
16/04/09 23:44:25 INFO util.Utils: Successfully started service 'sparkDriver' on port 35849.
16/04/09 23:44:30 INFO slf4j.Slf4jLogger: Slf4jLogger started
16/04/09 23:44:31 INFO Remoting: Starting remoting
16/04/09 23:44:34 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:48550]
16/04/09 23:44:34 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 48550.
16/04/09 23:44:35 INFO spark.SparkEnv: Registering MapOutputTracker
16/04/09 23:44:35 INFO spark.SparkEnv: Registering BlockManagerMaster
16/04/09 23:44:35 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-a5fed17f-9ca9-4356-8433-94184add459c
16/04/09 23:44:35 INFO storage.MemoryStore: MemoryStore started with capacity 517.4 MB
16/04/09 23:44:37 INFO spark.SparkEnv: Registering OutputCommitCoordinator
16/04/09 23:44:40 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/04/09 23:44:40 INFO server.AbstractConnector: Started [email protected]:4040
16/04/09 23:44:40 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
16/04/09 23:44:40 INFO ui.SparkUI: Started SparkUI at http://192.168.1.121:4040
16/04/09 23:44:41 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-03bc88ca-b3b3-410e-b8f7-10f85be695e3/httpd-ea4628a0-346c-4cf9-a951-092ab9f72f2a
16/04/09 23:44:41 INFO spark.HttpServer: Starting HTTP Server
16/04/09 23:44:41 INFO server.Server: jetty-8.y.z-SNAPSHOT
16/04/09 23:44:41 INFO server.AbstractConnector: Started [email protected]:60023
16/04/09 23:44:41 INFO util.Utils: Successfully started service 'HTTP file server' on port 60023.
16/04/09 23:44:41 INFO spark.SparkContext: Added JAR file:/home/richard/slq/spark/160327/SparkSQL2Hive.jar at http://192.168.1.121:60023/jars/SparkSQL2Hive.jar with timestamp 1460216681926
16/04/09 23:44:42 INFO client.AppClient$ClientEndpoint: Connecting to master spark://slq1:7077...
16/04/09 23:44:45 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160409234445-0002
16/04/09 23:44:45 INFO client.AppClient$ClientEndpoint: Executor added: app-20160409234445-0002/0 on worker-20160409171905-192.168.1.122-40769 (192.168.1.122:40769) with 1 cores
16/04/09 23:44:45 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160409234445-0002/0 on hostPort 192.168.1.122:40769 with 1 cores, 1024.0 MB RAM
16/04/09 23:44:45 INFO client.AppClient$ClientEndpoint: Executor added: app-20160409234445-0002/1 on worker-20160409171920-192.168.1.121-58385 (192.168.1.121:58385) with 1 cores
16/04/09 23:44:45 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160409234445-0002/1 on hostPort 192.168.1.121:58385 with 1 cores, 1024.0 MB RAM
16/04/09 23:44:45 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40241.
16/04/09 23:44:45 INFO netty.NettyBlockTransferService: Server created on 40241
16/04/09 23:44:45 INFO client.AppClient$ClientEndpoint: Executor added: app-20160409234445-0002/2 on worker-20160409171905-192.168.1.123-55110 (192.168.1.123:55110) with 1 cores
16/04/09 23:44:45 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20160409234445-0002/2 on hostPort 192.168.1.123:55110 with 1 cores, 1024.0 MB RAM
16/04/09 23:44:46 INFO storage.BlockManagerMaster: Trying to register BlockManager
16/04/09 23:44:46 INFO storage.BlockManagerMasterEndpoint: Registering block manager 192.168.1.121:40241 with 517.4 MB RAM, BlockManagerId(driver, 192.168.1.121, 40241)
16/04/09 23:44:46 INFO storage.BlockManagerMaster: Registered BlockManager
16/04/09 23:44:47 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160409234445-0002/2 is now RUNNING
16/04/09 23:44:47 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160409234445-0002/0 is now RUNNING
16/04/09 23:44:47 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160409234445-0002/1 is now RUNNING
16/04/09 23:44:52 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/04/09 23:45:16 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slq3:44752) with ID 2
16/04/09 23:45:16 INFO storage.BlockManagerMasterEndpoint: Registering block manager slq3:43259 with 517.4 MB RAM, BlockManagerId(2, slq3, 43259)
16/04/09 23:45:19 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slq2:34148) with ID 0
16/04/09 23:45:19 INFO storage.BlockManagerMasterEndpoint: Registering block manager slq2:42107 with 517.4 MB RAM, BlockManagerId(0, slq2, 42107)
16/04/09 23:45:24 INFO hive.HiveContext: Initializing execution hive, version 1.2.1
16/04/09 23:45:25 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
16/04/09 23:45:25 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
16/04/09 23:45:33 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
16/04/09 23:45:34 INFO metastore.ObjectStore: ObjectStore, initialize called
16/04/09 23:45:38 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
16/04/09 23:45:38 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
16/04/09 23:45:41 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/04/09 23:45:49 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
16/04/09 23:45:57 INFO cluster.SparkDeploySchedulerBackend: Registered executor NettyRpcEndpointRef(null) (slq1:42766) with ID 1
16/04/09 23:45:58 INFO storage.BlockManagerMasterEndpoint: Registering block manager slq1:55585 with 517.4 MB RAM, BlockManagerId(1, slq1, 55585)
16/04/09 23:46:15 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/04/09 23:46:29 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/04/09 23:46:29 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/04/09 23:46:44 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/04/09 23:46:44 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/04/09 23:46:47 INFO metastore.MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY
16/04/09 23:46:47 INFO metastore.ObjectStore: Initialized ObjectStore
16/04/09 23:46:49 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/04/09 23:46:51 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
16/04/09 23:46:54 INFO metastore.HiveMetaStore: Added admin role in metastore
16/04/09 23:46:55 INFO metastore.HiveMetaStore: Added public role in metastore
16/04/09 23:46:56 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
16/04/09 23:46:58 INFO metastore.HiveMetaStore: 0: get_all_databases
16/04/09 23:46:58 INFO HiveMetaStore.audit: ugi=richard ip=unknown-ip-addr cmd=get_all_databases
16/04/09 23:46:58 INFO metastore.HiveMetaStore: 0: get_functions: db=default pat=*
16/04/09 23:46:58 INFO HiveMetaStore.audit: ugi=richard ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/04/09 23:46:58 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
16/04/09 23:47:08 INFO session.SessionState: Created local directory: /tmp/e3ba4cf4-de8c-4270-81b1-ce3ea721f144_resources
16/04/09 23:47:08 INFO session.SessionState: Created HDFS directory: /tmp/hive/richard/e3ba4cf4-de8c-4270-81b1-ce3ea721f144
16/04/09 23:47:08 INFO session.SessionState: Created local directory: /tmp/richard/e3ba4cf4-de8c-4270-81b1-ce3ea721f144
16/04/09 23:47:08 INFO session.SessionState: Created HDFS directory: /tmp/hive/richard/e3ba4cf4-de8c-4270-81b1-ce3ea721f144/_tmp_space.db
16/04/09 23:47:10 INFO hive.HiveContext: default warehouse location is /user/hive/warehouse
16/04/09 23:47:10 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
16/04/09 23:47:11 INFO client.ClientWrapper: Inspected Hadoop version: 2.6.0
16/04/09 23:47:11 INFO client.ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.6.0
16/04/09 23:47:18 INFO hive.metastore: Trying to connect to metastore with URI thrift://slq1:9083
16/04/09 23:47:19 INFO hive.metastore: Connected to metastore.
16/04/09 23:47:22 INFO session.SessionState: Created local directory: /tmp/4329319b-2c25-4345-862d-7b35b5dc5d6f_resources
16/04/09 23:47:22 INFO session.SessionState: Created HDFS directory: /tmp/hive/richard/4329319b-2c25-4345-862d-7b35b5dc5d6f
16/04/09 23:47:22 INFO session.SessionState: Created local directory: /tmp/richard/4329319b-2c25-4345-862d-7b35b5dc5d6f
16/04/09 23:47:22 INFO session.SessionState: Created HDFS directory: /tmp/hive/richard/4329319b-2c25-4345-862d-7b35b5dc5d6f/_tmp_space.db
16/04/09 23:47:25 INFO parse.ParseDriver: Parsing command: use hive
16/04/09 23:47:35 INFO parse.ParseDriver: Parse Completed
16/04/09 23:47:42 INFO log.PerfLogger:
16/04/09 23:47:42 INFO log.PerfLogger:
16/04/09 23:47:42 INFO log.PerfLogger:
16/04/09 23:47:43 INFO log.PerfLogger:
16/04/09 23:47:43 INFO parse.ParseDriver: Parsing command: use hive
16/04/09 23:47:51 INFO parse.ParseDriver: Parse Completed
16/04/09 23:47:51 INFO log.PerfLogger:
16/04/09 23:47:51 INFO log.PerfLogger:
16/04/09 23:47:52 INFO ql.Driver: Semantic Analysis Completed
16/04/09 23:47:52 INFO log.PerfLogger:
16/04/09 23:47:53 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
16/04/09 23:47:53 INFO log.PerfLogger:
16/04/09 23:47:53 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
16/04/09 23:47:53 INFO log.PerfLogger:
16/04/09 23:47:53 INFO ql.Driver: Starting command(queryId=richard_20160409234743_dd603e3c-c274-45d7-b647-95dceb1eb8fe): use hive
16/04/09 23:47:53 INFO log.PerfLogger:
16/04/09 23:47:53 INFO log.PerfLogger:
16/04/09 23:47:53 INFO log.PerfLogger:
16/04/09 23:47:53 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
16/04/09 23:47:53 INFO log.PerfLogger:
16/04/09 23:47:53 INFO log.PerfLogger:
16/04/09 23:47:54 INFO ql.Driver: OK
16/04/09 23:47:54 INFO log.PerfLogger:
16/04/09 23:47:54 INFO log.PerfLogger:
16/04/09 23:47:54 INFO log.PerfLogger:
16/04/09 23:47:55 INFO parse.ParseDriver: Parsing command: DROP TABLE IF EXISTS people
16/04/09 23:47:55 INFO parse.ParseDriver: Parse Completed
16/04/09 23:47:57 INFO log.PerfLogger:
16/04/09 23:47:57 INFO log.PerfLogger:
16/04/09 23:47:57 INFO log.PerfLogger:
16/04/09 23:47:57 INFO log.PerfLogger:
16/04/09 23:47:57 INFO parse.ParseDriver: Parsing command: DROP TABLE IF EXISTS people
16/04/09 23:47:57 INFO parse.ParseDriver: Parse Completed
16/04/09 23:47:57 INFO log.PerfLogger:
16/04/09 23:47:57 INFO log.PerfLogger:
16/04/09 23:47:58 INFO ql.Driver: Semantic Analysis Completed
16/04/09 23:47:58 INFO log.PerfLogger:
16/04/09 23:47:58 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
16/04/09 23:47:58 INFO log.PerfLogger:
16/04/09 23:47:58 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
16/04/09 23:47:58 INFO log.PerfLogger:
16/04/09 23:47:58 INFO ql.Driver: Starting command(queryId=richard_20160409234757_4297553f-8ad7-4df2-8109-cfb6166a140b): DROP TABLE IF EXISTS people
16/04/09 23:47:58 INFO log.PerfLogger:
16/04/09 23:47:58 INFO log.PerfLogger:
16/04/09 23:47:58 INFO log.PerfLogger:
16/04/09 23:47:58 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
16/04/09 23:48:00 INFO log.PerfLogger:
16/04/09 23:48:00 INFO metadata.Hive: Dumping metastore api call timing information for : execution phase
16/04/09 23:48:00 INFO metadata.Hive: Total time spent in this metastore function was greater than 1000ms : dropTable_(String, String, boolean, boolean, boolean, )=1325
16/04/09 23:48:00 INFO log.PerfLogger:
16/04/09 23:48:00 INFO ql.Driver: OK
16/04/09 23:48:00 INFO log.PerfLogger:
16/04/09 23:48:00 INFO log.PerfLogger:
16/04/09 23:48:00 INFO log.PerfLogger:
16/04/09 23:48:01 INFO parse.ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS people(name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
16/04/09 23:48:01 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:01 INFO log.PerfLogger:
16/04/09 23:48:01 INFO log.PerfLogger:
16/04/09 23:48:01 INFO log.PerfLogger:
16/04/09 23:48:01 INFO log.PerfLogger:
16/04/09 23:48:01 INFO parse.ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS people(name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
16/04/09 23:48:01 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:01 INFO log.PerfLogger:
16/04/09 23:48:01 INFO log.PerfLogger:
16/04/09 23:48:01 INFO parse.CalcitePlanner: Starting Semantic Analysis
16/04/09 23:48:02 INFO parse.CalcitePlanner: Creating table hive.people position=27
16/04/09 23:48:02 INFO ql.Driver: Semantic Analysis Completed
16/04/09 23:48:02 INFO log.PerfLogger:
16/04/09 23:48:02 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
16/04/09 23:48:02 INFO log.PerfLogger:
16/04/09 23:48:02 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
16/04/09 23:48:02 INFO log.PerfLogger:
16/04/09 23:48:02 INFO ql.Driver: Starting command(queryId=richard_20160409234801_1b05e701-b5b9-428d-89bf-9d587b9476f9): CREATE TABLE IF NOT EXISTS people(name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
16/04/09 23:48:02 INFO log.PerfLogger:
16/04/09 23:48:02 INFO log.PerfLogger:
16/04/09 23:48:02 INFO log.PerfLogger:
16/04/09 23:48:02 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
16/04/09 23:48:05 INFO log.PerfLogger:
16/04/09 23:48:05 INFO log.PerfLogger:
16/04/09 23:48:05 INFO ql.Driver: OK
16/04/09 23:48:05 INFO log.PerfLogger:
16/04/09 23:48:05 INFO log.PerfLogger:
16/04/09 23:48:05 INFO log.PerfLogger:
16/04/09 23:48:05 INFO parse.ParseDriver: Parsing command: LOAD DATA LOCAL INPATH '/home/richard/slq/spark/people.txt' INTO TABLE people
16/04/09 23:48:05 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:05 INFO log.PerfLogger:
16/04/09 23:48:05 INFO log.PerfLogger:
16/04/09 23:48:05 INFO log.PerfLogger:
16/04/09 23:48:05 INFO log.PerfLogger:
16/04/09 23:48:05 INFO parse.ParseDriver: Parsing command: LOAD DATA LOCAL INPATH '/home/richard/slq/spark/people.txt' INTO TABLE people
16/04/09 23:48:05 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:05 INFO log.PerfLogger:
16/04/09 23:48:05 INFO log.PerfLogger:
16/04/09 23:48:06 INFO ql.Driver: Semantic Analysis Completed
16/04/09 23:48:06 INFO log.PerfLogger:
16/04/09 23:48:06 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
16/04/09 23:48:06 INFO log.PerfLogger:
16/04/09 23:48:06 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
16/04/09 23:48:06 INFO log.PerfLogger:
16/04/09 23:48:06 INFO ql.Driver: Starting command(queryId=richard_20160409234805_a604d20e-a275-4000-b35e-6944cece8724): LOAD DATA LOCAL INPATH '/home/richard/slq/spark/people.txt' INTO TABLE people
16/04/09 23:48:06 INFO log.PerfLogger:
16/04/09 23:48:06 INFO log.PerfLogger:
16/04/09 23:48:06 INFO log.PerfLogger:
16/04/09 23:48:06 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
16/04/09 23:48:06 INFO exec.Task: Loading data to table hive.people from file:/home/richard/slq/spark/people.txt
16/04/09 23:48:10 INFO metadata.Hive: Renaming src: file:/home/richard/slq/spark/people.txt, dest: hdfs://slq1:9000/user/hive/warehouse/hive.db/people/people.txt, Status:true
16/04/09 23:48:11 INFO log.PerfLogger:
16/04/09 23:48:11 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
16/04/09 23:48:11 INFO exec.StatsTask: Executing stats task
16/04/09 23:48:12 INFO exec.Task: Table hive.people stats: [numFiles=1, totalSize=29]
16/04/09 23:48:12 INFO log.PerfLogger:
16/04/09 23:48:12 INFO metadata.Hive: Dumping metastore api call timing information for : execution phase
16/04/09 23:48:12 INFO metadata.Hive: Total time spent in this metastore function was greater than 1000ms : alter_table_(String, String, Table, boolean, )=1048
16/04/09 23:48:12 INFO log.PerfLogger:
16/04/09 23:48:12 INFO ql.Driver: OK
16/04/09 23:48:12 INFO log.PerfLogger:
16/04/09 23:48:12 INFO log.PerfLogger:
16/04/09 23:48:12 INFO log.PerfLogger:
16/04/09 23:48:12 INFO parse.ParseDriver: Parsing command: DROP TABLE IF EXISTS peoplescores
16/04/09 23:48:12 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:13 INFO log.PerfLogger:
16/04/09 23:48:13 INFO log.PerfLogger:
16/04/09 23:48:13 INFO log.PerfLogger:
16/04/09 23:48:13 INFO log.PerfLogger:
16/04/09 23:48:13 INFO parse.ParseDriver: Parsing command: DROP TABLE IF EXISTS peoplescores
16/04/09 23:48:13 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:13 INFO log.PerfLogger:
16/04/09 23:48:13 INFO log.PerfLogger:
16/04/09 23:48:13 INFO ql.Driver: Semantic Analysis Completed
16/04/09 23:48:13 INFO log.PerfLogger:
16/04/09 23:48:13 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
16/04/09 23:48:13 INFO log.PerfLogger:
16/04/09 23:48:13 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
16/04/09 23:48:13 INFO log.PerfLogger:
16/04/09 23:48:13 INFO ql.Driver: Starting command(queryId=richard_20160409234813_a4049719-c4a9-4b5b-8cee-e0256c5dc075): DROP TABLE IF EXISTS peoplescores
16/04/09 23:48:13 INFO log.PerfLogger:
16/04/09 23:48:13 INFO log.PerfLogger:
16/04/09 23:48:13 INFO log.PerfLogger:
16/04/09 23:48:13 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
16/04/09 23:48:14 INFO log.PerfLogger:
16/04/09 23:48:14 INFO log.PerfLogger:
16/04/09 23:48:14 INFO ql.Driver: OK
16/04/09 23:48:14 INFO log.PerfLogger:
16/04/09 23:48:14 INFO log.PerfLogger:
16/04/09 23:48:14 INFO log.PerfLogger:
16/04/09 23:48:14 INFO parse.ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS peoplescores(name STRING, score INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
16/04/09 23:48:14 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:14 INFO log.PerfLogger:
16/04/09 23:48:14 INFO log.PerfLogger:
16/04/09 23:48:14 INFO log.PerfLogger:
16/04/09 23:48:14 INFO log.PerfLogger:
16/04/09 23:48:15 INFO parse.ParseDriver: Parsing command: CREATE TABLE IF NOT EXISTS peoplescores(name STRING, score INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
16/04/09 23:48:15 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:15 INFO log.PerfLogger:
16/04/09 23:48:15 INFO log.PerfLogger:
16/04/09 23:48:15 INFO parse.CalcitePlanner: Starting Semantic Analysis
16/04/09 23:48:15 INFO parse.CalcitePlanner: Creating table hive.peoplescores position=27
16/04/09 23:48:15 INFO ql.Driver: Semantic Analysis Completed
16/04/09 23:48:15 INFO log.PerfLogger:
16/04/09 23:48:15 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
16/04/09 23:48:15 INFO log.PerfLogger:
16/04/09 23:48:15 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
16/04/09 23:48:15 INFO log.PerfLogger:
16/04/09 23:48:15 INFO ql.Driver: Starting command(queryId=richard_20160409234814_fb368874-3ec7-4a99-9c79-edcea30096ed): CREATE TABLE IF NOT EXISTS peoplescores(name STRING, score INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
16/04/09 23:48:15 INFO log.PerfLogger:
16/04/09 23:48:15 INFO log.PerfLogger:
16/04/09 23:48:15 INFO log.PerfLogger:
16/04/09 23:48:15 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO ql.Driver: OK
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO parse.ParseDriver: Parsing command: LOAD DATA LOCAL INPATH '/home/richard/slq/spark/peoplescores.txt' INTO TABLE peoplescores
16/04/09 23:48:16 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO parse.ParseDriver: Parsing command: LOAD DATA LOCAL INPATH '/home/richard/slq/spark/peoplescores.txt' INTO TABLE peoplescores
16/04/09 23:48:16 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO ql.Driver: Semantic Analysis Completed
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO ql.Driver: Starting command(queryId=richard_20160409234816_1d279f16-8c1d-4f1e-811a-621729bb4d62): LOAD DATA LOCAL INPATH '/home/richard/slq/spark/peoplescores.txt' INTO TABLE peoplescores
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO log.PerfLogger:
16/04/09 23:48:16 INFO ql.Driver: Starting task [Stage-0:MOVE] in serial mode
16/04/09 23:48:16 INFO exec.Task: Loading data to table hive.peoplescores from file:/home/richard/slq/spark/peoplescores.txt
16/04/09 23:48:17 INFO metadata.Hive: Renaming src: file:/home/richard/slq/spark/peoplescores.txt, dest: hdfs://slq1:9000/user/hive/warehouse/hive.db/peoplescores/peoplescores.txt, Status:true
16/04/09 23:48:18 INFO log.PerfLogger:
16/04/09 23:48:18 INFO ql.Driver: Starting task [Stage-1:STATS] in serial mode
16/04/09 23:48:18 INFO exec.StatsTask: Executing stats task
16/04/09 23:48:19 INFO exec.Task: Table hive.peoplescores stats: [numFiles=1, totalSize=29]
16/04/09 23:48:19 INFO log.PerfLogger:
16/04/09 23:48:19 INFO metadata.Hive: Dumping metastore api call timing information for : execution phase
16/04/09 23:48:19 INFO metadata.Hive: Total time spent in this metastore function was greater than 1000ms : alter_table_(String, String, Table, boolean, )=1202
16/04/09 23:48:19 INFO log.PerfLogger:
16/04/09 23:48:19 INFO ql.Driver: OK
16/04/09 23:48:19 INFO log.PerfLogger:
16/04/09 23:48:19 INFO log.PerfLogger:
16/04/09 23:48:19 INFO log.PerfLogger:
16/04/09 23:48:19 INFO parse.ParseDriver: Parsing command: SELECT pi.name, pi.age, ps.score FROM people pi JOIN peoplescores ps ON pi.name=ps.name WHERE ps.score>90
16/04/09 23:48:19 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:21 INFO parse.ParseDriver: Parsing command: DROP TABLE IF EXISTS peopleinformationresult
16/04/09 23:48:21 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:23 INFO parquet.ParquetRelation: Listing hdfs://slq1:9000/user/hive/warehouse/hive.db/peopleinformationresult on driver
16/04/09 23:48:24 INFO log.PerfLogger:
16/04/09 23:48:24 INFO log.PerfLogger:
16/04/09 23:48:24 INFO log.PerfLogger:
16/04/09 23:48:24 INFO log.PerfLogger:
16/04/09 23:48:24 INFO parse.ParseDriver: Parsing command: DROP TABLE IF EXISTS peopleinformationresult
16/04/09 23:48:24 INFO parse.ParseDriver: Parse Completed
16/04/09 23:48:24 INFO log.PerfLogger:
16/04/09 23:48:24 INFO log.PerfLogger:
16/04/09 23:48:24 INFO ql.Driver: Semantic Analysis Completed
16/04/09 23:48:24 INFO log.PerfLogger:
16/04/09 23:48:24 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
16/04/09 23:48:24 INFO log.PerfLogger:
16/04/09 23:48:24 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
16/04/09 23:48:24 INFO log.PerfLogger:
16/04/09 23:48:24 INFO ql.Driver: Starting command(queryId=richard_20160409234824_664b629d-ca06-4498-911a-611f9034c090): DROP TABLE IF EXISTS peopleinformationresult
16/04/09 23:48:24 INFO log.PerfLogger:
16/04/09 23:48:24 INFO log.PerfLogger:
16/04/09 23:48:24 INFO log.PerfLogger:
16/04/09 23:48:24 INFO ql.Driver: Starting task [Stage-0:DDL] in serial mode
16/04/09 23:48:25 INFO log.PerfLogger:
16/04/09 23:48:25 INFO log.PerfLogger:
16/04/09 23:48:25 INFO ql.Driver: OK
16/04/09 23:48:25 INFO log.PerfLogger:
16/04/09 23:48:25 INFO log.PerfLogger:
16/04/09 23:48:25 INFO log.PerfLogger:
16/04/09 23:48:28 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
16/04/09 23:48:30 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 520.3 KB, free 520.3 KB)
16/04/09 23:48:31 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 41.9 KB, free 562.3 KB)
16/04/09 23:48:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.1.121:40241 (size: 41.9 KB, free: 517.4 MB)
16/04/09 23:48:31 INFO spark.SparkContext: Created broadcast 0 from saveAsTable at SparkSQL2Hive.scala:50
16/04/09 23:48:32 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 520.3 KB, free 1082.6 KB)
16/04/09 23:48:32 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 41.9 KB, free 1124.5 KB)
16/04/09 23:48:32 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.1.121:40241 (size: 41.9 KB, free: 517.3 MB)
16/04/09 23:48:32 INFO spark.SparkContext: Created broadcast 1 from saveAsTable at SparkSQL2Hive.scala:50
16/04/09 23:48:35 INFO mapred.FileInputFormat: Total input paths to process : 1
16/04/09 23:48:36 INFO spark.SparkContext: Starting job: run at ThreadPoolExecutor.java:1142
16/04/09 23:48:36 INFO scheduler.DAGScheduler: Got job 0 (run at ThreadPoolExecutor.java:1142) with 2 output partitions
16/04/09 23:48:36 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (run at ThreadPoolExecutor.java:1142)
16/04/09 23:48:36 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/04/09 23:48:36 INFO scheduler.DAGScheduler: Missing parents: List()
16/04/09 23:48:37 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[13] at run at ThreadPoolExecutor.java:1142), which has no missing parents
16/04/09 23:48:37 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 11.3 KB, free 1135.8 KB)
16/04/09 23:48:37 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 5.6 KB, free 1141.4 KB)
16/04/09 23:48:37 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.1.121:40241 (size: 5.6 KB, free: 517.3 MB)
16/04/09 23:48:37 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
16/04/09 23:48:37 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[13] at run at ThreadPoolExecutor.java:1142)
16/04/09 23:48:37 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/04/09 23:48:38 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, slq1, partition 0,NODE_LOCAL, 2235 bytes)
16/04/09 23:48:38 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, slq2, partition 1,NODE_LOCAL, 2235 bytes)
16/04/09 23:48:43 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on slq2:42107 (size: 5.6 KB, free: 517.4 MB)
16/04/09 23:48:44 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on slq1:55585 (size: 5.6 KB, free: 517.4 MB)
16/04/09 23:48:52 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on slq2:42107 (size: 41.9 KB, free: 517.4 MB)
16/04/09 23:49:03 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on slq1:55585 (size: 41.9 KB, free: 517.4 MB)
16/04/09 23:49:06 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 28108 ms on slq2 (1/2)
16/04/09 23:49:23 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 45654 ms on slq1 (2/2)
16/04/09 23:49:23 INFO scheduler.DAGScheduler: ResultStage 0 (run at ThreadPoolExecutor.java:1142) finished in 45.689 s
16/04/09 23:49:23 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/04/09 23:49:23 INFO scheduler.DAGScheduler: Job 0 finished: run at ThreadPoolExecutor.java:1142, took 47.639028 s
16/04/09 23:49:27 INFO codegen.GenerateUnsafeProjection: Code generated in 3005.850072 ms
16/04/09 23:49:27 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 600.0 B, free 1142.0 KB)
16/04/09 23:49:27 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 196.0 B, free 1142.2 KB)
16/04/09 23:49:27 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 192.168.1.121:40241 (size: 196.0 B, free: 517.3 MB)
16/04/09 23:49:27 INFO spark.SparkContext: Created broadcast 3 from run at ThreadPoolExecutor.java:1142
16/04/09 23:49:29 INFO Configuration.deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/04/09 23:49:29 INFO Configuration.deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/04/09 23:49:29 INFO Configuration.deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/04/09 23:49:29 INFO Configuration.deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/04/09 23:49:29 INFO Configuration.deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/04/09 23:49:29 INFO parquet.ParquetRelation: Using default output committer for Parquet: org.apache.parquet.hadoop.ParquetOutputCommitter
16/04/09 23:49:29 INFO datasources.DefaultWriterContainer: Using user defined output committer class org.apache.parquet.hadoop.ParquetOutputCommitter
16/04/09 23:49:30 INFO mapred.FileInputFormat: Total input paths to process : 1
16/04/09 23:49:30 INFO spark.SparkContext: Starting job: saveAsTable at SparkSQL2Hive.scala:50
16/04/09 23:49:30 INFO scheduler.DAGScheduler: Got job 1 (saveAsTable at SparkSQL2Hive.scala:50) with 2 output partitions
16/04/09 23:49:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (saveAsTable at SparkSQL2Hive.scala:50)
16/04/09 23:49:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/04/09 23:49:30 INFO scheduler.DAGScheduler: Missing parents: List()
16/04/09 23:49:30 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[19] at saveAsTable at SparkSQL2Hive.scala:50), which has no missing parents
16/04/09 23:49:30 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 76.3 KB, free 1218.5 KB)
16/04/09 23:49:30 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 29.0 KB, free 1247.4 KB)
16/04/09 23:49:30 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 192.168.1.121:40241 (size: 29.0 KB, free: 517.3 MB)
16/04/09 23:49:30 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006
16/04/09 23:49:30 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[19] at saveAsTable at SparkSQL2Hive.scala:50)
16/04/09 23:49:30 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
16/04/09 23:49:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, slq1, partition 0,NODE_LOCAL, 2223 bytes)
16/04/09 23:49:30 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, slq2, partition 1,NODE_LOCAL, 2223 bytes)
16/04/09 23:49:31 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on slq2:42107 (size: 29.0 KB, free: 517.3 MB)
16/04/09 23:49:31 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on slq1:55585 (size: 29.0 KB, free: 517.3 MB)
16/04/09 23:49:31 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slq2:42107 (size: 41.9 KB, free: 517.3 MB)
16/04/09 23:49:32 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on slq2:42107 (size: 196.0 B, free: 517.3 MB)
16/04/09 23:49:32 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on slq1:55585 (size: 41.9 KB, free: 517.3 MB)
16/04/09 23:49:34 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on slq1:55585 (size: 196.0 B, free: 517.3 MB)
16/04/09 23:50:00 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 29926 ms on slq1 (1/2)
16/04/09 23:50:04 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 33310 ms on slq2 (2/2)
16/04/09 23:50:04 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
16/04/09 23:50:04 INFO scheduler.DAGScheduler: ResultStage 1 (saveAsTable at SparkSQL2Hive.scala:50) finished in 33.317 s
16/04/09 23:50:04 INFO scheduler.DAGScheduler: Job 1 finished: saveAsTable at SparkSQL2Hive.scala:50, took 33.704851 s
16/04/09 23:50:05 INFO hadoop.ParquetFileReader: Initiating action with parallelism: 5
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
16/04/09 23:50:10 INFO datasources.DefaultWriterContainer: Job job_201604092349_0000 committed.
16/04/09 23:50:10 INFO parquet.ParquetRelation: Listing hdfs://slq1:9000/user/hive/warehouse/hive.db/peopleinformationresult on driver
16/04/09 23:50:10 INFO parquet.ParquetRelation: Listing hdfs://slq1:9000/user/hive/warehouse/hive.db/peopleinformationresult on driver
16/04/09 23:50:10 INFO parquet.ParquetRelation: Listing hdfs://slq1:9000/user/hive/warehouse/hive.db/peopleinformationresult on driver
16/04/09 23:50:10 INFO hive.HiveContext$$anon$2: Persisting data source relation `peopleinformationresult` with a single input path into Hive metastore in Hive compatible format. Input path: hdfs://slq1:9000/user/hive/warehouse/hive.db/peopleinformationresult.
16/04/09 23:50:13 INFO parquet.ParquetRelation: Listing hdfs://slq1:9000/user/hive/warehouse/hive.db/peopleinformationresult on driver
16/04/09 23:50:13 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 62.4 KB, free 1309.9 KB)
16/04/09 23:50:14 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 19.7 KB, free 1329.6 KB)
16/04/09 23:50:14 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on 192.168.1.121:40241 (size: 19.7 KB, free: 517.3 MB)
16/04/09 23:50:14 INFO spark.SparkContext: Created broadcast 5 from show at SparkSQL2Hive.scala:57
16/04/09 23:50:15 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/04/09 23:50:15 INFO parquet.ParquetRelation: Reading Parquet file(s) from hdfs://slq1:9000/user/hive/warehouse/hive.db/peopleinformationresult/part-r-00000-3ac34634-5b4a-46b3-badf-903fed7a8e1f.gz.parquet, hdfs://slq1:9000/user/hive/warehouse/hive.db/peopleinformationresult/part-r-00001-3ac34634-5b4a-46b3-badf-903fed7a8e1f.gz.parquet
16/04/09 23:50:15 INFO spark.SparkContext: Starting job: show at SparkSQL2Hive.scala:57
16/04/09 23:50:15 INFO scheduler.DAGScheduler: Got job 2 (show at SparkSQL2Hive.scala:57) with 1 output partitions
16/04/09 23:50:15 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (show at SparkSQL2Hive.scala:57)
16/04/09 23:50:15 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/04/09 23:50:15 INFO scheduler.DAGScheduler: Missing parents: List()
16/04/09 23:50:15 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[24] at show at SparkSQL2Hive.scala:57), which has no missing parents
16/04/09 23:50:16 INFO storage.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 5.7 KB, free 1335.2 KB)
16/04/09 23:50:16 INFO storage.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 3.2 KB, free 1338.4 KB)
16/04/09 23:50:16 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on 192.168.1.121:40241 (size: 3.2 KB, free: 517.3 MB)
16/04/09 23:50:16 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1006
16/04/09 23:50:16 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[24] at show at SparkSQL2Hive.scala:57)
16/04/09 23:50:16 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
16/04/09 23:50:16 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, slq3, partition 0,NODE_LOCAL, 2316 bytes)
16/04/09 23:50:19 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on slq3:43259 (size: 3.2 KB, free: 517.4 MB)
16/04/09 23:50:27 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on slq3:43259 (size: 19.7 KB, free: 517.4 MB)
16/04/09 23:50:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 25922 ms on slq3 (1/1)
16/04/09 23:50:42 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
16/04/09 23:50:42 INFO scheduler.DAGScheduler: ResultStage 2 (show at SparkSQL2Hive.scala:57) finished in 25.928 s
16/04/09 23:50:42 INFO scheduler.DAGScheduler: Job 2 finished: show at SparkSQL2Hive.scala:57, took 26.103248 s
16/04/09 23:50:42 INFO spark.SparkContext: Starting job: show at SparkSQL2Hive.scala:57
16/04/09 23:50:42 INFO scheduler.DAGScheduler: Got job 3 (show at SparkSQL2Hive.scala:57) with 1 output partitions
16/04/09 23:50:42 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (show at SparkSQL2Hive.scala:57)
16/04/09 23:50:42 INFO scheduler.DAGScheduler: Parents of final stage: List()
16/04/09 23:50:42 INFO scheduler.DAGScheduler: Missing parents: List()
16/04/09 23:50:42 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[24] at show at SparkSQL2Hive.scala:57), which has no missing parents
16/04/09 23:50:42 INFO storage.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 5.7 KB, free 1344.1 KB)
16/04/09 23:50:42 INFO storage.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 3.2 KB, free 1347.3 KB)
16/04/09 23:50:42 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on 192.168.1.121:40241 (size: 3.2 KB, free: 517.3 MB)
16/04/09 23:50:42 INFO spark.SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1006
16/04/09 23:50:42 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[24] at show at SparkSQL2Hive.scala:57)
16/04/09 23:50:42 INFO scheduler.TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
16/04/09 23:50:42 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 5, slq3, partition 1,NODE_LOCAL, 2314 bytes)
16/04/09 23:50:42 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on slq3:43259 (size: 3.2 KB, free: 517.4 MB)
16/04/09 23:50:42 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 5) in 512 ms on slq3 (1/1)
16/04/09 23:50:42 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
16/04/09 23:50:42 INFO scheduler.DAGScheduler: ResultStage 3 (show at SparkSQL2Hive.scala:57) finished in 0.515 s
16/04/09 23:50:42 INFO scheduler.DAGScheduler: Job 3 finished: show at SparkSQL2Hive.scala:57, took 0.728242 s
+-------+---+-----+
| name|age|score|
+-------+---+-----+
|Michael| 29| 99|
| Andy| 30| 97|
+-------+---+-----+
16/04/09 23:50:43 INFO spark.SparkContext: Invoking stop() from shutdown hook
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static/sql,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/execution/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/execution,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/SQL,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/job,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs/json,null}
16/04/09 23:50:43 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/jobs,null}
16/04/09 23:50:43 INFO ui.SparkUI: Stopped Spark web UI at http://192.168.1.121:4040
16/04/09 23:50:43 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
16/04/09 23:50:43 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
16/04/09 23:50:44 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/04/09 23:50:44 INFO storage.MemoryStore: MemoryStore cleared
16/04/09 23:50:44 INFO storage.BlockManager: BlockManager stopped
16/04/09 23:50:44 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
16/04/09 23:50:45 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/04/09 23:50:45 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/04/09 23:50:45 INFO spark.SparkContext: Successfully stopped SparkContext
16/04/09 23:50:45 INFO util.ShutdownHookManager: Shutdown hook called
16/04/09 23:50:45 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-03bc88ca-b3b3-410e-b8f7-10f85be695e3/httpd-ea4628a0-346c-4cf9-a951-092ab9f72f2a
16/04/09 23:50:45 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-c6f6f72f-b64c-4c83-b97f-aab038b6cefb
16/04/09 23:50:45 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/04/09 23:50:45 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-03bc88ca-b3b3-410e-b8f7-10f85be695e3
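As a quick check outside the application, the persisted table can also be read back from spark-shell. A minimal sketch, assuming a Spark 1.6 build with Hive support (so that the shell's sqlContext is a HiveContext) and the same hive-site.xml pointing at the thrift://slq1:9083 metastore seen in the log:

    sqlContext.sql("use hive")
    sqlContext.table("peopleinformationresult").show()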
The above are my study notes for Lesson 69 of the "IMF Legendary Action" course at Mr. Wang Jialin's DT Big Data Dream Factory.
Mr. Wang Jialin is a China evangelist for Spark, Flink, Docker, and Android technologies; president and chief expert of the Spark Asia-Pacific Research Institute; founder of DT Big Data Dream Factory; a source-code-level expert in Android hardware/software integration; an English pronunciation magician; and a fitness enthusiast.
WeChat public account: DT_Spark
Phone: 18610086859
QQ: 1740415547
WeChat: 18610086859
Sina Weibo: ilovepains