Testing Spark SQL 1.2.0


1: Start the Spark shell

MASTER=spark://feng02:7077 ./bin/spark-shell
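
Note: the environment variable has to be uppercase (MASTER); passing --master spark://feng02:7077 to spark-shell is equivalent. The transcript below was captured with the lowercase master= spelling, which spark-shell ignores, so the shell actually fell back to local mode: notice the BlockManager registering on localhost further down, rather than on the cluster workers.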

[jifeng@feng02 spark-1.2.0-bin-2.4.1]$ master=spark://feng02:7077 ./bin/spark-shell
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/02/27 16:33:58 INFO SecurityManager: Changing view acls to: jifeng
15/02/27 16:33:58 INFO SecurityManager: Changing modify acls to: jifeng
15/02/27 16:33:58 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/02/27 16:33:58 INFO HttpServer: Starting HTTP Server
15/02/27 16:33:58 INFO Utils: Successfully started service 'HTTP class server' on port 34677.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.2.0
      /_/


Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
15/02/27 16:34:05 INFO SecurityManager: Changing view acls to: jifeng
15/02/27 16:34:05 INFO SecurityManager: Changing modify acls to: jifeng
15/02/27 16:34:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/02/27 16:34:06 INFO Slf4jLogger: Slf4jLogger started
15/02/27 16:34:06 INFO Remoting: Starting remoting
15/02/27 16:34:06 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@feng02:50101]
15/02/27 16:34:06 INFO Utils: Successfully started service 'sparkDriver' on port 50101.
15/02/27 16:34:06 INFO SparkEnv: Registering MapOutputTracker
15/02/27 16:34:06 INFO SparkEnv: Registering BlockManagerMaster
15/02/27 16:34:06 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150227163406-af47
15/02/27 16:34:06 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/02/27 16:34:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/27 16:34:07 INFO HttpFileServer: HTTP File server directory is /tmp/spark-87b83dea-f916-4c1d-bd4f-f77cceca4b56
15/02/27 16:34:07 INFO HttpServer: Starting HTTP Server
15/02/27 16:34:07 INFO Utils: Successfully started service 'HTTP file server' on port 45673.
15/02/27 16:34:08 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/02/27 16:34:08 INFO SparkUI: Started SparkUI at http://feng02:4040
15/02/27 16:34:08 INFO Executor: Using REPL class URI: http://10.6.3.201:34677
15/02/27 16:34:08 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@feng02:50101/user/HeartbeatReceiver
15/02/27 16:34:09 INFO NettyBlockTransferService: Server created on 54438
15/02/27 16:34:09 INFO BlockManagerMaster: Trying to register BlockManager
15/02/27 16:34:09 INFO BlockManagerMasterActor: Registering block manager localhost:54438 with 267.3 MB RAM, BlockManagerId(<driver>, localhost, 54438)
15/02/27 16:34:09 INFO BlockManagerMaster: Registered BlockManager
15/02/27 16:34:09 INFO SparkILoop: Created spark context..
Spark context available as sc.


scala>
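
(Note the line "Spark context available as sc." above: the REPL pre-creates a SparkContext and binds it to sc, and the example in the next section uses that sc directly.)
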
2: RDD test with people.txt

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// createSchemaRDD is used to implicitly convert an RDD to a SchemaRDD.
import sqlContext.createSchemaRDD

// Define the schema using a case class.
// Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit,
// you can use custom classes that implement the Product interface.
case class Person(name: String, age: Int)

// Create an RDD of Person objects and register it as a table.
val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
people.registerTempTable("people")

// SQL statements can be run by using the sql methods provided by sqlContext.
val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")

// The results of SQL queries are SchemaRDDs and support all the normal RDD operations.
// The columns of a row in the result can be accessed by ordinal.
teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
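
For reference, the people.txt bundled under examples/src/main/resources/ holds three comma-separated (name, age) records, 32 bytes in total (matching the input split logged below), which explains the query results that follow:

Michael, 29
Andy, 30
Justin, 19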

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@30aceb27

scala> import sqlContext.createSchemaRDD
import sqlContext.createSchemaRDD

scala> case class Person(name: String, age: Int)
defined class Person

scala> val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
15/02/27 16:41:19 INFO MemoryStore: ensureFreeSpace(163705) called with curMem=0, maxMem=280248975
15/02/27 16:41:19 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 159.9 KB, free 267.1 MB)
15/02/27 16:41:19 INFO MemoryStore: ensureFreeSpace(22736) called with curMem=163705, maxMem=280248975
15/02/27 16:41:19 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.2 KB, free 267.1 MB)
15/02/27 16:41:19 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:54438 (size: 22.2 KB, free: 267.2 MB)
15/02/27 16:41:19 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/02/27 16:41:19 INFO SparkContext: Created broadcast 0 from textFile at <console>:17
people: org.apache.spark.rdd.RDD[Person] = MappedRDD[3] at map at <console>:17

scala> people.registerTempTable("people")

scala> val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
teenagers: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[6] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
Project [name#0]
 Filter ((age#1 >= 13) && (age#1 <= 19))
  PhysicalRDD [name#0,age#1], MapPartitionsRDD[4] at mapPartitions at ExistingRDD.scala:36
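
Reading the physical plan bottom-up: PhysicalRDD scans the rows derived from the Person RDD, Filter applies the two age predicates, and Project keeps only the name column.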

scala> teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
15/02/27 16:42:24 INFO FileInputFormat: Total input paths to process : 1
15/02/27 16:42:24 INFO SparkContext: Starting job: collect at <console>:18
15/02/27 16:42:24 INFO DAGScheduler: Got job 0 (collect at <console>:18) with 1 output partitions (allowLocal=false)
15/02/27 16:42:24 INFO DAGScheduler: Final stage: Stage 0(collect at <console>:18)
15/02/27 16:42:24 INFO DAGScheduler: Parents of final stage: List()
15/02/27 16:42:24 INFO DAGScheduler: Missing parents: List()
15/02/27 16:42:24 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[7] at map at <console>:18), which has no missing parents
15/02/27 16:42:24 INFO MemoryStore: ensureFreeSpace(6416) called with curMem=186441, maxMem=280248975
15/02/27 16:42:24 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 6.3 KB, free 267.1 MB)
15/02/27 16:42:24 INFO MemoryStore: ensureFreeSpace(4290) called with curMem=192857, maxMem=280248975
15/02/27 16:42:24 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 4.2 KB, free 267.1 MB)
15/02/27 16:42:24 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:54438 (size: 4.2 KB, free: 267.2 MB)
15/02/27 16:42:24 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/02/27 16:42:24 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:838
15/02/27 16:42:24 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[7] at map at <console>:18)
15/02/27 16:42:24 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/02/27 16:42:24 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1349 bytes)
15/02/27 16:42:24 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/02/27 16:42:24 INFO HadoopRDD: Input split: file:/home/jifeng/hadoop/spark-1.2.0-bin-2.4.1/examples/src/main/resources/people.txt:0+32
15/02/27 16:42:24 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/02/27 16:42:24 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/02/27 16:42:24 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/02/27 16:42:24 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/02/27 16:42:24 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/02/27 16:42:24 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1734 bytes result sent to driver
15/02/27 16:42:24 INFO DAGScheduler: Stage 0 (collect at <console>:18) finished in 0.248 s
15/02/27 16:42:24 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 237 ms on localhost (1/1)
15/02/27 16:42:24 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/02/27 16:42:24 INFO DAGScheduler: Job 0 finished: collect at <console>:18, took 0.348078 s
Name: Justin
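
Only one row comes back because Justin (19) is the only record whose age falls in the 13-19 range. Re-running the query without the age filter, as below, returns all three names.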

scala> val teenagers = sqlContext.sql("SELECT name FROM people ")
15/02/27 17:06:45 INFO BlockManager: Removing broadcast 1
15/02/27 17:06:45 INFO BlockManager: Removing block broadcast_1_piece0
15/02/27 17:06:45 INFO MemoryStore: Block broadcast_1_piece0 of size 4290 dropped from memory (free 280056118)
15/02/27 17:06:45 INFO BlockManagerInfo: Removed broadcast_1_piece0 on localhost:54438 in memory (size: 4.2 KB, free: 267.2 MB)
15/02/27 17:06:45 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/02/27 17:06:45 INFO BlockManager: Removing block broadcast_1
15/02/27 17:06:45 INFO MemoryStore: Block broadcast_1 of size 6416 dropped from memory (free 280062534)
15/02/27 17:06:45 INFO ContextCleaner: Cleaned broadcast 1
teenagers: org.apache.spark.sql.SchemaRDD = 
SchemaRDD[10] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
Project [name#0]
 PhysicalRDD [name#0,age#1], MapPartitionsRDD[4] at mapPartitions at ExistingRDD.scala:36

scala> teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
15/02/27 17:06:50 INFO SparkContext: Starting job: collect at <console>:18
15/02/27 17:06:50 INFO DAGScheduler: Got job 1 (collect at <console>:18) with 1 output partitions (allowLocal=false)
15/02/27 17:06:50 INFO DAGScheduler: Final stage: Stage 1(collect at <console>:18)
15/02/27 17:06:50 INFO DAGScheduler: Parents of final stage: List()
15/02/27 17:06:50 INFO DAGScheduler: Missing parents: List()
15/02/27 17:06:50 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[11] at map at <console>:18), which has no missing parents
15/02/27 17:06:50 INFO MemoryStore: ensureFreeSpace(5512) called with curMem=186441, maxMem=280248975
15/02/27 17:06:50 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 5.4 KB, free 267.1 MB)
15/02/27 17:06:50 INFO MemoryStore: ensureFreeSpace(3790) called with curMem=191953, maxMem=280248975
15/02/27 17:06:50 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 3.7 KB, free 267.1 MB)
15/02/27 17:06:50 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:54438 (size: 3.7 KB, free: 267.2 MB)
15/02/27 17:06:50 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/02/27 17:06:50 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/02/27 17:06:50 INFO DAGScheduler: Submitting 1 missing tasks from Stage 1 (MappedRDD[11] at map at <console>:18)
15/02/27 17:06:50 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/02/27 17:06:50 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1349 bytes)
15/02/27 17:06:50 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/02/27 17:06:50 INFO HadoopRDD: Input split: file:/home/jifeng/hadoop/spark-1.2.0-bin-2.4.1/examples/src/main/resources/people.txt:0+32
15/02/27 17:06:50 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1763 bytes result sent to driver
15/02/27 17:06:50 INFO DAGScheduler: Stage 1 (collect at <console>:18) finished in 0.018 s
15/02/27 17:06:50 INFO DAGScheduler: Job 1 finished: collect at <console>:18, took 0.032411 s
Name: Michael
Name: Andy
Name: Justin

scala> 15/02/27 17:06:50 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 19 ms on localhost (1/1)
15/02/27 17:06:50 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/02/27 17:36:19 INFO BlockManager: Removing broadcast 2
15/02/27 17:36:19 INFO BlockManager: Removing block broadcast_2
15/02/27 17:36:19 INFO MemoryStore: Block broadcast_2 of size 5512 dropped from memory (free 280058744)
15/02/27 17:36:19 INFO BlockManager: Removing block broadcast_2_piece0
15/02/27 17:36:19 INFO MemoryStore: Block broadcast_2_piece0 of size 3790 dropped from memory (free 280062534)
15/02/27 17:36:19 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:54438 in memory (size: 3.7 KB, free: 267.2 MB)
15/02/27 17:36:19 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/02/27 17:36:19 INFO ContextCleaner: Cleaned broadcast 2
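
As an aside, Spark 1.2 also ships an experimental language-integrated query DSL on SchemaRDD, so the teenager query can be written without an SQL string. A minimal sketch, assuming the same people RDD from above and that the SQLContext implicits are in scope via import sqlContext._ (teenagers2 is just an illustrative name):

// Bring the SQLContext implicits into scope, including the experimental
// Symbol-based expression DSL (in addition to createSchemaRDD).
import sqlContext._

// Equivalent to "SELECT name FROM people WHERE age >= 13 AND age <= 19":
// 'age and 'name are Scala Symbols resolved against the schema inferred
// from the Person case class.
val teenagers2 = people.where('age >= 13).where('age <= 19).select('name)
teenagers2.map(t => "Name: " + t(0)).collect().foreach(println)  // prints: Name: Justin

This DSL was superseded by the DataFrame API in Spark 1.3, so treat it as specific to the 1.x SchemaRDD era.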

