HBase in Action (6): Operating on an HBase 1.2.0 Database Directly with Spark 2.2.1
Work already verified on the HBase system in earlier posts:
Setting up the HBase distributed cluster: (link)
This big data lab session covers:
5. Operating on HBase 1.2.0 data directly with Spark 2.2.1.
The test code:
package HbaseTest.sparkconnectHbase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;

/**
 * Connect directly to an HBase 1.2.0 database from Spark 2.2.1.
 */
public class SparkConnectHbaseTest {
    public static void main(String[] args) {
        // HBase client configuration: ZooKeeper quorum, HMaster address and the table to read.
        Configuration confhbase = HBaseConfiguration.create();
        confhbase.set("hbase.zookeeper.property.clientPort", "2181");
        confhbase.set("hbase.zookeeper.quorum", "192.168.189.1,192.168.189.2,192.168.189.3");
        confhbase.set("hbase.master", "192.168.189.1:60000");
        confhbase.set(TableInputFormat.INPUT_TABLE, "db_res:wtb_ow_operation");

        SparkConf conf = new SparkConf().setAppName("Spark_Connect_Hbase_Test");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Read the HBase table as an RDD of (row key, Result) pairs via TableInputFormat.
        JavaPairRDD<ImmutableBytesWritable, Result> resultRDD =
                sc.newAPIHadoopRDD(confhbase, TableInputFormat.class,
                        ImmutableBytesWritable.class, Result.class);

        long count = resultRDD.count();
        System.out.print("************SPARK from hbase count *************** " + count + " ");

        // Print each record's row key and the info:operate_begin_time column on the executors.
        resultRDD.foreach(new VoidFunction<Tuple2<ImmutableBytesWritable, Result>>() {
            @Override
            public void call(Tuple2<ImmutableBytesWritable, Result> v1) throws Exception {
                String key = Bytes.toString(v1._2().getRow());
                String operate_begin_time = Bytes.toString(
                        v1._2().getValue(Bytes.toBytes("info"), Bytes.toBytes("operate_begin_time")));
                System.out.print("==================spark from hbase record=========== : "
                        + key + " " + operate_begin_time);
            }
        });

        // Keep the driver alive so the Spark web UI can still be inspected.
        while (true) {
        }
    }
}
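The foreach above only prints each record on the executor side. To keep working with the values in Spark (filtering, joining, or collecting a sample back to the driver), the (row key, Result) pairs can first be mapped to plain strings. The snippet below is a minimal sketch, not part of the original test; it assumes the same resultRDD and info:operate_begin_time column as in the class above and additionally imports org.apache.spark.api.java.function.PairFunction:

// Sketch: map the (row key, Result) pairs to (row key, operate_begin_time) strings.
JavaPairRDD<String, String> operateTimes = resultRDD.mapToPair(
        new PairFunction<Tuple2<ImmutableBytesWritable, Result>, String, String>() {
            @Override
            public Tuple2<String, String> call(Tuple2<ImmutableBytesWritable, Result> t) throws Exception {
                String rowKey = Bytes.toString(t._2().getRow());
                String operateBeginTime = Bytes.toString(
                        t._2().getValue(Bytes.toBytes("info"), Bytes.toBytes("operate_begin_time")));
                return new Tuple2<String, String>(rowKey, operateBeginTime);
            }
        });
// Bring a few records back to the driver for inspection.
for (Tuple2<String, String> record : operateTimes.take(10)) {
    System.out.println(record._1() + " -> " + record._2());
}

When the table grows, it also helps to narrow the scan before the RDD is created, so that only the column (and, if useful, the row-key range) actually read is shipped to the executors. TableInputFormat exposes this through configuration keys. A sketch of the extra settings, to be placed before the newAPIHadoopRDD call above; the row-key bounds are purely illustrative:

// Scan only the single column the job reads (a space-separated "family:qualifier" list).
confhbase.set(TableInputFormat.SCAN_COLUMNS, "info:operate_begin_time");
// Optionally bound the row-key range of the scan (these example keys are hypothetical).
confhbase.set(TableInputFormat.SCAN_ROW_START, "00000000");
confhbase.set(TableInputFormat.SCAN_ROW_STOP, "99999999");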
The pom.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>noc_hbase_test</groupId>
    <artifactId>noc_hbase_test</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version>2.11.8</scala.version>
        <spark.version>2.2.1</spark.version>
        <log4j.version>1.2.14</log4j.version>
        <jetty.version>9.2.5.v20141112</jetty.version>
        <java.version>1.8</java.version>
        <hbase.version>1.2.0</hbase.version>
    </properties>

    <repositories>
        <repository>
            <id>scala-tools.org</id>
            <name>Scala-Tools Maven2 Repository</name>
            <url>http://scala-tools.org/repo-releases</url>
        </repository>
    </repositories>

    <pluginRepositories>
        <pluginRepository>
            <id>scala-tools.org</id>
            <name>Scala-Tools Maven2 Repository</name>
            <url>http://scala-tools.org/repo-releases</url>
        </pluginRepository>
    </pluginRepositories>

    <dependencies>
        <dependency><groupId>org.scala-lang</groupId><artifactId>scala-library</artifactId><version>${scala.version}</version></dependency>
        <dependency><groupId>org.scala-lang</groupId><artifactId>scala-compiler</artifactId><version>${scala.version}</version></dependency>
        <dependency><groupId>org.scala-lang</groupId><artifactId>scala-reflect</artifactId><version>${scala.version}</version></dependency>
        <dependency><groupId>org.scala-lang</groupId><artifactId>scalap</artifactId><version>${scala.version}</version></dependency>

        <dependency><groupId>org.apache.spark</groupId><artifactId>spark-core_2.10</artifactId><version>${spark.version}</version></dependency>
        <dependency><groupId>org.apache.spark</groupId><artifactId>spark-launcher_2.10</artifactId><version>${spark.version}</version></dependency>
        <dependency><groupId>org.apache.spark</groupId><artifactId>spark-network-shuffle_2.10</artifactId><version>${spark.version}</version></dependency>
        <dependency><groupId>org.apache.spark</groupId><artifactId>spark-sql_2.10</artifactId><version>${spark.version}</version></dependency>
        <dependency><groupId>org.apache.spark</groupId><artifactId>spark-hive_2.10</artifactId><version>${spark.version}</version></dependency>
        <dependency><groupId>org.apache.spark</groupId><artifactId>spark-catalyst_2.10</artifactId><version>${spark.version}</version></dependency>
        <dependency><groupId>org.apache.spark</groupId><artifactId>spark-repl_2.10</artifactId><version>${spark.version}</version></dependency>

        <dependency><groupId>org.apache.hive</groupId><artifactId>hive-jdbc</artifactId><version>1.2.1</version></dependency>

        <dependency>
            <groupId>org.apache.hbase</groupId><artifactId>hbase-client</artifactId><version>${hbase.version}</version>
            <exclusions>
                <exclusion><groupId>org.slf4j</groupId><artifactId>slf4j-log4j12</artifactId></exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId><artifactId>hbase-common</artifactId><version>${hbase.version}</version>
            <exclusions>
                <exclusion><groupId>org.slf4j</groupId><artifactId>slf4j-log4j12</artifactId></exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId><artifactId>hbase-server</artifactId><version>${hbase.version}</version>
            <exclusions>
                <exclusion><groupId>org.slf4j</groupId><artifactId>slf4j-log4j12</artifactId></exclusion>
            </exclusions>
        </dependency>

        <dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-common</artifactId><version>2.6.0</version></dependency>
        <dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-client</artifactId><version>2.6.0</version></dependency>
        <dependency><groupId>org.apache.hadoop</groupId><artifactId>hadoop-hdfs</artifactId><version>2.6.0</version></dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <finalName>dist</finalName>
                    <appendAssemblyId>true</appendAssemblyId>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals><goal>single</goal></goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>3.2.2</version>
                <executions>
                    <execution>
                        <id>scala-compile-first</id>
                        <phase>process-resources</phase>
                        <goals><goal>compile</goal></goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>${scala.version}</scalaVersion>
                    <recompileMode>incremental</recompileMode>
                    <useZincServer>true</useZincServer>
                    <args>
                        <arg>-unchecked</arg>
                        <arg>-deprecation</arg>
                        <arg>-feature</arg>
                    </args>
                    <jvmArgs>
                        <jvmArg>-Xms1024m</jvmArg>
                        <jvmArg>-Xmx1024m</jvmArg>
                    </jvmArgs>
                    <javacArgs>
                        <javacArg>-source</javacArg>
                        <javacArg>${java.version}</javacArg>
                        <javacArg>-target</javacArg>
                        <javacArg>${java.version}</javacArg>
                        <javacArg>-Xlint:all,-serial,-path</javacArg>
                    </javacArgs>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.antlr</groupId>
                <artifactId>antlr4-maven-plugin</artifactId>
                <version>4.3</version>
                <executions>
                    <execution>
                        <id>antlr</id>
                        <goals><goal>antlr4</goal></goals>
                        <phase>none</phase>
                    </execution>
                </executions>
                <configuration>
                    <sourceDirectory>src/test/java</sourceDirectory>
                    <visitor>true</visitor>
                    <listener>true</listener>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>
Submit the job to the Spark cluster:
root@master:~# spark-submit --name noc_hbase_test --class HbaseTest.sparkconnectHbase.SparkConnectHbaseTest --master spark://master:7077 --jars /usr/local/apache-hive-1.2.1/lib/mysql-connector-java-5.1.13-bin.jar,/usr/local/apache-hive-1.2.1/lib/hive-hbase-handler-1.2.1.jar,/usr/local/hbase-1.2.0/lib/hbase-client-1.2.0.jar,/usr/local/hbase-1.2.0/lib/hbase-common-1.2.0.jar,/usr/local/hbase-1.2.0/lib/hbase-protocol-1.2.0.jar,/usr/local/hbase-1.2.0/lib/hbase-server-1.2.0.jar,/usr/local/hbase-1.2.0/lib/htrace-core-3.1.0-incubating.jar,/usr/local/hbase-1.2.0/lib/metrics-core-2.2.0.jar,/usr/local/hbase-1.2.0/lib/hbase-hadoop2-compat-1.2.0.jar,/usr/local/hbase-1.2.0/lib/guava-12.0.1.jar,/usr/local/hbase-1.2.0/lib/protobuf-java-2.5.0.jar --executor-memory 512m --total-executor-cores 2 /usr/local/setup_tools/noc_hbase_test.jar
The Spark job runs successfully; the output is as follows:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/alluxio-1.7.0-hadoop-2.6/client/alluxio-1.7.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/spark-2.2.1-bin-hadoop2.6/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/06/15 14:52:57 INFO spark.SparkContext: Running Spark version 2.2.1
18/06/15 14:52:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/06/15 14:52:58 INFO spark.SparkContext: Submitted application: Spark_Connect_Hbase_Test
18/06/15 14:52:58 INFO spark.SecurityManager: Changing view acls to: root
18/06/15 14:52:58 INFO spark.SecurityManager: Changing modify acls to: root
18/06/15 14:52:58 INFO spark.SecurityManager: Changing view acls groups to:
18/06/15 14:52:58 INFO spark.SecurityManager: Changing modify acls groups to:
18/06/15 14:52:58 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
18/06/15 14:52:58 INFO util.Utils: Successfully started service 'sparkDriver' on port 46964.
18/06/15 14:52:58 INFO spark.SparkEnv: Registering MapOutputTracker
18/06/15 14:52:58 INFO spark.SparkEnv: Registering BlockManagerMaster
18/06/15 14:52:58 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/06/15 14:52:58 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/06/15 14:52:58 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-4ac96a51-bf1d-4c35-b9d7-53e481274c63
18/06/15 14:52:58 INFO memory.MemoryStore: MemoryStore started with capacity 413.9 MB
18/06/15 14:52:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
18/06/15 14:52:59 INFO util.log: Logging initialized @2617ms
18/06/15 14:52:59 INFO server.Server: jetty-9.3.z-SNAPSHOT
18/06/15 14:52:59 INFO server.Server: Started @2799ms
18/06/15 14:52:59 INFO server.AbstractConnector: Started ServerConnector@2ca308df{HTTP/1.1,[http/1.1]}{0.0.0.0:4040}
18/06/15 14:52:59 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@70e0accd{/jobs,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@65f87a2c{/jobs/json,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6ce1f601{/jobs/job,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@d816dde{/jobs/job/json,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6c451c9c{/stages,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@372b0d86{/stages/json,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3113a37{/stages/stage,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@20312893{/stages/stage/json,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@c41709a{/stages/pool,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@54ec8cc9{/stages/pool/json,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5528a42c{/storage,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1a6f5124{/storage/json,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@ec2bf82{/storage/rdd,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6cc0bcf6{/storage/rdd/json,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@32f61a31{/environment,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@669253b7{/environment/json,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@51a06cbe{/executors,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@49a64d82{/executors/json,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@66d23e4a{/executors/threadDump,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4d9d1b69{/executors/threadDump/json,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@251f7d26{/static,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37d3d232{/,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@581d969c{/api,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5851bd4f{/jobs/job/kill,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f40a43{/stages/stage/kill,null,AVAILABLE,@Spark}
18/06/15 14:52:59 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://master:4040
18/06/15 14:52:59 INFO spark.SparkContext: Added JAR file:/usr/local/apache-hive-1.2.1/lib/mysql-connector-java-5.1.13-bin.jar at spark://master:46964/jars/mysql-connector-java-5.1.13-bin.jar with timestamp 1529045579564
18/06/15 14:52:59 INFO spark.SparkContext: Added JAR file:/usr/local/apache-hive-1.2.1/lib/hive-hbase-handler-1.2.1.jar at spark://master:46964/jars/hive-hbase-handler-1.2.1.jar with timestamp 1529045579571
18/06/15 14:52:59 INFO spark.SparkContext: Added JAR file:/usr/local/hbase-1.2.0/lib/hbase-client-1.2.0.jar at spark://master:46964/jars/hbase-client-1.2.0.jar with timestamp 1529045579572
18/06/15 14:52:59 INFO spark.SparkContext: Added JAR file:/usr/local/hbase-1.2.0/lib/hbase-common-1.2.0.jar at spark://master:46964/jars/hbase-common-1.2.0.jar with timestamp 1529045579574
18/06/15 14:52:59 INFO spark.SparkContext: Added JAR file:/usr/local/hbase-1.2.0/lib/hbase-protocol-1.2.0.jar at spark://master:46964/jars/hbase-protocol-1.2.0.jar with timestamp 1529045579575
18/06/15 14:52:59 INFO spark.SparkContext: Added JAR file:/usr/local/hbase-1.2.0/lib/hbase-server-1.2.0.jar at spark://master:46964/jars/hbase-server-1.2.0.jar with timestamp 1529045579577
18/06/15 14:52:59 INFO spark.SparkContext: Added JAR file:/usr/local/hbase-1.2.0/lib/htrace-core-3.1.0-incubating.jar at spark://master:46964/jars/htrace-core-3.1.0-incubating.jar with timestamp 1529045579578
18/06/15 14:52:59 INFO spark.SparkContext: Added JAR file:/usr/local/hbase-1.2.0/lib/metrics-core-2.2.0.jar at spark://master:46964/jars/metrics-core-2.2.0.jar with timestamp 1529045579579
18/06/15 14:52:59 INFO spark.SparkContext: Added JAR file:/usr/local/hbase-1.2.0/lib/hbase-hadoop2-compat-1.2.0.jar at spark://master:46964/jars/hbase-hadoop2-compat-1.2.0.jar with timestamp 1529045579581
18/06/15 14:52:59 INFO spark.SparkContext: Added JAR file:/usr/local/hbase-1.2.0/lib/guava-12.0.1.jar at spark://master:46964/jars/guava-12.0.1.jar with timestamp 1529045579583
18/06/15 14:52:59 INFO spark.SparkContext: Added JAR file:/usr/local/hbase-1.2.0/lib/protobuf-java-2.5.0.jar at spark://master:46964/jars/protobuf-java-2.5.0.jar with timestamp 1529045579584
18/06/15 14:52:59 INFO spark.SparkContext: Added JAR file:/usr/local/setup_tools/noc_hbase_test.jar at spark://master:46964/jars/noc_hbase_test.jar with timestamp 1529045579585
18/06/15 14:52:59 INFO client.StandaloneAppClient$ClientEndpoint: Connecting to master spark://master:7077...
18/06/15 14:52:59 INFO client.TransportClientFactory: Successfully created connection to master/192.168.189.1:7077 after 40 ms (0 ms spent in bootstraps)
18/06/15 14:53:00 INFO cluster.StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20180615145300-0004
18/06/15 14:53:00 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20180615145300-0004/0 on worker-20180615140035-worker1-39457 (worker1:39457) with 1 cores
18/06/15 14:53:00 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20180615145300-0004/0 on hostPort worker1:39457 with 1 cores, 512.0 MB RAM
18/06/15 14:53:00 INFO client.StandaloneAppClient$ClientEndpoint: Executor added: app-20180615145300-0004/1 on worker-20180615140043-worker3-56574 (worker3:56574) with 1 cores
18/06/15 14:53:00 INFO cluster.StandaloneSchedulerBackend: Granted executor ID app-20180615145300-0004/1 on hostPort worker3:56574 with 1 cores, 512.0 MB RAM
18/06/15 14:53:00 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43464.
18/06/15 14:53:00 INFO netty.NettyBlockTransferService: Server created on master:43464
18/06/15 14:53:00 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/06/15 14:53:00 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, master, 43464, None)
18/06/15 14:53:00 INFO storage.BlockManagerMasterEndpoint: Registering block manager master:43464 with 413.9 MB RAM, BlockManagerId(driver, master, 43464, None)
18/06/15 14:53:00 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, master, 43464, None)
18/06/15 14:53:00 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, master, 43464, None)
18/06/15 14:53:00 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20180615145300-0004/0 is now RUNNING
18/06/15 14:53:00 INFO client.StandaloneAppClient$ClientEndpoint: Executor updated: app-20180615145300-0004/1 is now RUNNING
18/06/15 14:53:00 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@d02f8d{/metrics/json,null,AVAILABLE,@Spark}
18/06/15 14:53:00 INFO cluster.StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
18/06/15 14:53:01 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 300.0 KB, free 413.6 MB)
18/06/15 14:53:01 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.5 KB, free 413.6 MB)
18/06/15 14:53:01 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on master:43464 (size: 26.5 KB, free: 413.9 MB)
18/06/15 14:53:01 INFO spark.SparkContext: Created broadcast 0 from newAPIHadoopRDD at SparkConnectHbaseTest.java:35
18/06/15 14:53:01 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x895416d connecting to ZooKeeper ensemble=192.168.189.1:2181,192.168.189.2:2181,192.168.189.3:2181
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:host.name=master
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_60
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/local/jdk1.8.0_60/jre
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/local/alluxio-1.7.0-hadoop-2.6/client/alluxio-1.7.0-client.jar:/usr/local/spark-2.2.1-bin-hadoop2.6/conf/:。。。。。。。。。
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:java.compiler=
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:os.version=3.16.0-30-generic
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:user.name=root
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Client environment:user.dir=/root
18/06/15 14:53:02 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=192.168.189.1:2181,192.168.189.2:2181,192.168.189.3:2181 sessionTimeout=90000 watcher=hconnection-0x895416d0x0, quorum=192.168.189.1:2181,192.168.189.2:2181,192.168.189.3:2181, baseZNode=/hbase
18/06/15 14:53:02 INFO zookeeper.ClientCnxn: Opening socket connection to server 192.168.189.3/192.168.189.3:2181. Will not attempt to authenticate using SASL (unknown error)
18/06/15 14:53:02 INFO zookeeper.ClientCnxn: Socket connection established to 192.168.189.3/192.168.189.3:2181, initiating session
18/06/15 14:53:02 INFO zookeeper.ClientCnxn: Session establishment complete on server 192.168.189.3/192.168.189.3:2181, sessionid = 0x3640207247f0009, negotiated timeout = 40000
18/06/15 14:53:04 INFO util.RegionSizeCalculator: Calculating region sizes for table "db_res:wtb_ow_operation".
18/06/15 14:53:05 INFO client.ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
18/06/15 14:53:05 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x3640207247f0009
18/06/15 14:53:05 INFO zookeeper.ClientCnxn: EventThread shut down
18/06/15 14:53:05 INFO zookeeper.ZooKeeper: Session: 0x3640207247f0009 closed
18/06/15 14:53:05 INFO spark.SparkContext: Starting job: count at SparkConnectHbaseTest.java:37
18/06/15 14:53:05 INFO scheduler.DAGScheduler: Got job 0 (count at SparkConnectHbaseTest.java:37) with 1 output partitions
18/06/15 14:53:05 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (count at SparkConnectHbaseTest.java:37)
18/06/15 14:53:05 INFO scheduler.DAGScheduler: Parents of final stage: List()
18/06/15 14:53:05 INFO scheduler.DAGScheduler: Missing parents: List()
18/06/15 14:53:05 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (NewHadoopRDD[0] at newAPIHadoopRDD at SparkConnectHbaseTest.java:35), which has no missing parents
18/06/15 14:53:05 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2040.0 B, free 413.6 MB)
18/06/15 14:53:05 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1278.0 B, free 413.6 MB)
18/06/15 14:53:05 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on master:43464 (size: 1278.0 B, free: 413.9 MB)
18/06/15 14:53:05 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/06/15 14:53:05 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (NewHadoopRDD[0] at newAPIHadoopRDD at SparkConnectHbaseTest.java:35) (first 15 tasks are for partitions Vector(0))
18/06/15 14:53:05 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
18/06/15 14:53:20 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
18/06/15 14:53:33 INFO cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.189.2:36455) with ID 0
18/06/15 14:53:33 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, worker1, executor 0, partition 0, NODE_LOCAL, 4879 bytes)
18/06/15 14:53:34 INFO storage.BlockManagerMasterEndpoint: Registering block manager worker1:56820 with 117.0 MB RAM, BlockManagerId(0, worker1, 56820, None)
18/06/15 14:53:34 INFO cluster.CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (192.168.189.4:45624) with ID 1
18/06/15 14:53:35 INFO storage.BlockManagerMasterEndpoint: Registering block manager worker3:38924 with 117.0 MB RAM, BlockManagerId(1, worker3, 38924, None)
18/06/15 14:53:42 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on worker1:56820 (size: 1278.0 B, free: 117.0 MB)
18/06/15 14:53:43 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on worker1:56820 (size: 26.5 KB, free: 116.9 MB)
18/06/15 14:53:57 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 23730 ms on worker1 (executor 0) (1/1)
18/06/15 14:53:57 INFO scheduler.DAGScheduler: ResultStage 0 (count at SparkConnectHbaseTest.java:37) finished in 51.886 s
18/06/15 14:53:57 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/06/15 14:53:58 INFO scheduler.DAGScheduler: Job 0 finished: count at SparkConnectHbaseTest.java:37, took 52.827846 s
************SPARK from hbase count *************** 1 18/06/15 14:53:58 INFO spark.SparkContext: Starting job: foreach at SparkConnectHbaseTest.java:40
18/06/15 14:53:58 INFO scheduler.DAGScheduler: Got job 1 (foreach at SparkConnectHbaseTest.java:40) with 1 output partitions
18/06/15 14:53:58 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (foreach at SparkConnectHbaseTest.java:40)
18/06/15 14:53:58 INFO scheduler.DAGScheduler: Parents of final stage: List()
18/06/15 14:53:58 INFO scheduler.DAGScheduler: Missing parents: List()
18/06/15 14:53:58 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (NewHadoopRDD[0] at newAPIHadoopRDD at SparkConnectHbaseTest.java:35), which has no missing parents
18/06/15 14:53:58 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.2 KB, free 413.6 MB)
18/06/15 14:53:58 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1430.0 B, free 413.6 MB)
18/06/15 14:53:58 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on master:43464 (size: 1430.0 B, free: 413.9 MB)
18/06/15 14:53:58 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006
18/06/15 14:53:58 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (NewHadoopRDD[0] at newAPIHadoopRDD at SparkConnectHbaseTest.java:35) (first 15 tasks are for partitions Vector(0))
18/06/15 14:53:58 INFO scheduler.TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
18/06/15 14:53:58 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, worker1, executor 0, partition 0, NODE_LOCAL, 4879 bytes)
18/06/15 14:53:58 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on worker1:56820 (size: 1430.0 B, free: 116.9 MB)
18/06/15 14:53:59 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 507 ms on worker1 (executor 0) (1/1)
18/06/15 14:53:59 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
18/06/15 14:53:59 INFO scheduler.DAGScheduler: ResultStage 1 (foreach at SparkConnectHbaseTest.java:40) finished in 0.508 s
18/06/15 14:53:59 INFO scheduler.DAGScheduler: Job 1 finished: foreach at SparkConnectHbaseTest.java:40, took 0.533378 s
18/06/15 14:58:02 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on master:43464 in memory (size: 1430.0 B, free: 413.9 MB)
18/06/15 14:58:02 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on worker1:56820 in memory (size: 1430.0 B, free: 116.9 MB)
Console screenshot (see the original post):
Spark Web UI screenshot (see the original post):