Lesson 114 (Scala version): Spark Streaming + Kafka + Spark SQL + TopN + MySQL, an end-to-end e-commerce ad-click case study

The Scala version of Lesson 114 starts here.

[Figure 1]

[Figure 2]

[Figure 3]

[Figure 4]

The Lesson 114 Scala rewrite still reads rather like Java; please bear with me.

1. Rewrote the Lesson 114 code in Scala.

2. Downloaded and installed Scala IDE on the master virtual machine.

3. Imported the Spark, Kafka, and Spark Streaming jars.

4. Started with the first functional module: blacklist detection.

Problems encountered and how they were resolved:

1. The Kafka jar could not be found; re-adding it to the build fixed this.

2. ClassNotFoundException at runtime; fixed by calling setJars in the code to ship all of the Spark and Kafka jars (a sketch appears after the ClassNotFoundException trace below).

3. A serialization error at runtime: the Scala code still had Java syntax mixed in. In Scala the last expression of a function is its result, so yield true or false as a bare expression, without the return keyword; see the sketch below.
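
To make point 3 concrete, here is a minimal, self-contained sketch (the names are illustrative, not the course code). Inside a closure, a Java-style return compiles to a NonLocalReturnControl throw, which blows up once Spark serializes the closure and runs it on an executor; letting the boolean expression stand as the last expression is the Scala way:

object ReturnStyleDemo {
  def main(args: Array[String]): Unit = {
    val blacklist = Set("5984", "7308")
    val clicks = Seq(("5984", "ad-70"), ("1234", "ad-96"))

    // Wrong (Java habit): `return !blacklist.contains(userID)` inside the
    // closure. Right: let the boolean expression be the function's result.
    val valid = clicks.filter { case (userID, _) =>
      !blacklist.contains(userID) // last expression is the result
    }
    println(valid) // List((1234,ad-96))
  }
}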

 


 

Run results:


[Figure 5]

[Figure 6]

[Figure 7]


During the run, MySQL began refusing new connections:

root@master:/usr/local/spark-1.6.1-bin-hadoop2.6/sbin# service mysql stop
mysql stop/waiting
root@master:/usr/local/spark-1.6.1-bin-hadoop2.6/sbin# service mysql start
mysql start/running, process 12025
root@master:/usr/local/spark-1.6.1-bin-hadoop2.6/sbin# mysql -uroot -proot
ERROR 1040 (HY000): Too many connections
root@master:/usr/local/spark-1.6.1-bin-hadoop2.6/sbin# mysql -uroot -proot
ERROR 1040 (HY000): Too many connections
root@master:/usr/local/spark-1.6.1-bin-hadoop2.6/sbin# mysqladmin -uroot -proot processlist
mysqladmin: connect to server at 'localhost' failed
error: 'Too many connections'
root@master:/usr/local/spark-1.6.1-bin-hadoop2.6/sbin# mysqladmin -uroot -proot processlist
mysqladmin: connect to server at 'localhost' failed
error: 'Too many connections'
root@master:/usr/local/spark-1.6.1-bin-hadoop2.6/sbin# mysqladmin -uroot -proot processlist
mysqladmin: connect to server at 'localhost' failed
error: 'Too many connections'
root@master:/usr/local/spark-1.6.1-bin-hadoop2.6/sbin# 


Fix: a singleton connection pool. Each executor JVM opens a small, fixed set of MySQL connections once and reuses them, instead of opening a new connection for every task.
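
A minimal sketch of the singleton idea, assuming MySQL runs on master with the root/root login used above (the course wraps this in a JDBCWrapper object; the pool size and JDBC URL here are assumptions):

import java.sql.{Connection, DriverManager}
import java.util.concurrent.ConcurrentLinkedQueue

object JDBCWrapper {
  Class.forName("com.mysql.jdbc.Driver")
  private val pool = new ConcurrentLinkedQueue[Connection]()

  // Open a small, fixed number of connections once per executor JVM, so the
  // server never sees more than poolSize x executors connections in total.
  for (_ <- 1 to 5)
    pool.offer(DriverManager.getConnection(
      "jdbc:mysql://master:3306/sparkstreaming", "root", "root")) // assumed URL

  def getConnection(): Connection = {
    var conn = pool.poll()
    while (conn == null) { Thread.sleep(20); conn = pool.poll() } // wait for a free one
    conn
  }

  def returnConnection(conn: Connection): Unit =
    if (conn != null) pool.offer(conn)
}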


[Figure 8]

[Figure 9]


Problem: Kafka could not find a leader for the topic partition.

16/08/31 20:32:40 ERROR JobScheduler: Error generating jobs for time 1472646760000 ms
org.apache.spark.SparkException: ArrayBuffer(org.apache.spark.SparkException: Couldn't find leaders for Set([AdClicked,0]))
at org.apache.spark.streaming.kafka.DirectKafkaInputDStream.latestLeaderOffsets(DirectKafkaInputDStream.scala:123)
at org.apache.spark.streaming.kafka.DirectKafkaInputDStream.compute(DirectKafkaInputDStream.scala:145)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:351)


[Figure 10]


Fix: restart Kafka and recreate the topic. The data generator publishes each click with:

producer.send(new KeyedMessage("ScalaAdClicked", clickedAd));
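
For context, a sketch of the generator's producer setup (Kafka 0.8 producer API; the broker list and the tab-separated record layout are assumptions inferred from the cluster and the adclicked table further down):

import java.util.Properties
import kafka.javaapi.producer.Producer
import kafka.producer.{KeyedMessage, ProducerConfig}

object GeneratorSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("metadata.broker.list", "master:9092,worker1:9092,worker2:9092") // assumed brokers
    props.put("serializer.class", "kafka.serializer.StringEncoder")
    val producer = new Producer[Integer, String](new ProducerConfig(props))

    // timestamp \t ip \t userID \t adID \t province \t city (assumed layout)
    val clickedAd =
      s"${System.currentTimeMillis()}\t192.168.112.240\t1234\t70\tGuangdong\tShenzhen"
    producer.send(new KeyedMessage[Integer, String]("ScalaAdClicked", clickedAd))
    producer.close()
  }
}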

/usr/local/kafka_2.10-0.8.2.1/bin/kafka-topics.sh --create --zookeeper master:2181,worker1:2181,worker2:2181 --replication-factor 1 --partitions 1 --topic ScalaAdClicked 

/usr/local/kafka_2.10-0.8.2.1/bin/kafka-topics.sh --list  --zookeeper master:2181,worker1:2181,worker2:2181  

root@worker1:/usr/local/kafka_2.10-0.8.2.1/bin# kafka-topics.sh --delete --zookeeper master:2181,worker1:2181,worker2:2181 --topic  ScalaAdClicked
Topic ScalaAdClicked is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.
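
As the note in the output says, topic deletion only takes effect when every broker is started with delete.topic.enable=true; otherwise the topic just stays marked for deletion, which is why a fresh topic name (IMFScalaAdClicked) was used below. The relevant setting, in config/server.properties on each broker (restart required):

delete.topic.enable=true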


ScalaAdClicked    (the topic still showed up in --list output after the delete request)

root@worker1:/usr/local/kafka_2.10-0.8.2.1/bin# kafka-topics.sh --create --zookeeper master:2181,worker1:2181,worker2:2181 --replication-factor 1 --partitions 1 --topic IMFScalaAdClicked
Created topic "IMFScalaAdClicked".



kill -9 

root@worker1:/usr/local/setup_scripts# kafka-console-producer.sh --broker-list master:9092,worker1:9092,worker2:9092 --topic IMFScalaAdClicked
[2016-08-31 21:34:04,467] WARN Property topic is not valid (kafka.utils.VerifiableProperties)


dfg
dfg
gfh
fgj
ghj
ghk
fggfjghjgfjgjfg




root@worker2:~# /usr/local/kafka_2.10-0.8.2.1/bin/kafka-console-consumer.sh --zookeeper master:2181,worker1:2181,worker2:2181 --from-beginning --topic  IMFScalaAdClicked


dfg
dfg
gfh
fgj
ghj
ghk
fggfjghjgfjgjfg





[Figure 11]

A later cluster run failed with java.lang.ClassNotFoundException on the executors: the application's anonymous closure classes were never shipped to the workers.
16/09/03 15:31:52 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
16/09/03 15:31:52 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/09/03 15:31:52 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/09/03 15:32:28 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on worker2:50202 (size: 2014.0 B, free: 517.4 MB)
16/09/03 15:32:29 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on worker3:42333 (size: 1150.0 B, free: 517.4 MB)
16/09/03 15:32:30 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, worker2): java.lang.ClassNotFoundException: com.dt.spark.streaming114.AdClickedStreamingStats$$anonfun$6$$anonfun$7
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:64)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
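
The fix for this (item 2 in the list above) is to hand the application jar and its dependencies to SparkConf.setJars so the executors can load these classes. A sketch, with every path an assumption about this cluster's layout:

import org.apache.spark.SparkConf

// Ship the app jar plus the Kafka/streaming dependencies to the executors.
val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("AdClickedStreamingStats")
  .setJars(Seq(
    "/usr/local/IMF_testdata/AdClickedStreamingStats.jar", // app jar (assumed path)
    "/usr/local/spark-1.6.1-bin-hadoop2.6/lib/spark-streaming-kafka_2.10-1.6.1.jar",
    "/usr/local/kafka_2.10-0.8.2.1/libs/kafka_2.10-0.8.2.1.jar"))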

The next failures came from the parameter-binding code in JDBCWrapper.doBatch:

java.lang.IndexOutOfBoundsException: 1
        at scala.collection.mutable.ListBuffer.apply(ListBuffer.scala:120)
        at com.dt.spark.streaming114.AdClickedStreamingStats$JDBCWrapper$$anonfun$doBatch$1$$anonfun$apply$2.apply$mcVI$sp(AdClickedStreamingStats.scala:607)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at com.dt.spark.streaming114.AdClickedStreamingStats$JDBCWrapper$$anonfun$doBatch$1.apply(AdClickedStreamingStats.scala:605)
        at com.dt.spark.streaming114.AdClickedStreamingStats$JDBCWrapper$$anonfun$doBatch$1.apply(AdClickedStreamingStats.scala:604)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
        at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
        at com.dt.spark.streaming114.AdClickedStreamingStats$JDBCWrapper.doBatch(AdClickedStreamingStats.scala:604)
        at com.dt.spark.streaming114.AdClickedStreamingStats$$anonfun$main$1$$anonfun$apply$5.apply(AdClickedStreamingStats.scala:229)
        at com.dt.spark.streaming114.AdClickedStreamingStats$$anonfun$main$1$$anonfun$apply$5.apply(AdClickedStreamingStats.scala:159)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
        at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

The identical doBatch stack trace also appeared as java.lang.IndexOutOfBoundsException: 4.
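
Both traces point at the parameter-binding loop in doBatch (AdClickedStreamingStats.scala:607). A hypothetical reconstruction of the bug: JDBC parameter positions are 1-based while Scala collections are 0-based, so the same loop index cannot serve both roles unchanged.

import java.sql.PreparedStatement
import scala.collection.mutable.ListBuffer

// Hypothetical shape of the doBatch binding loop for one row of parameters.
def bind(ps: PreparedStatement, params: ListBuffer[Object]): Unit = {
  // Correct: iterate 0 until length, shift by one only on the JDBC side.
  for (j <- 0 until params.length)
    ps.setObject(j + 1, params(j)) // JDBC position = collection index + 1
  // Buggy variant: `for (j <- 1 to params.length) ps.setObject(j, params(j))`
  // reads one element past the end and throws IndexOutOfBoundsException.
}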



The same "Too many connections" problem as it surfaced inside the streaming job itself:

16/08/27 21:04:00 INFO streaming.CheckpointWriter: Submitted checkpoint of time 1472303040000 ms writer queue
16/08/27 21:04:08 INFO streaming.CheckpointWriter: Deleting hdfs://master:9000/usr/local/IMF_testdata/IMFcheckpoint114/checkpoint-1471954880000
16/08/27 21:04:08 INFO streaming.CheckpointWriter: Checkpoint for time 1472303020000 ms saved to file 'hdfs://master:9000/usr/local/IMF_testdata/IMFcheckpoint114/checkpoint-1472303020000', took 4452 bytes and 14715 ms
16/08/27 21:04:08 INFO streaming.CheckpointWriter: Saving checkpoint for time 1472303030000 ms to file 'hdfs://master:9000/usr/local/IMF_testdata/IMFcheckpoint114/checkpoint-1472303030000'
com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Data source rejected establishment of connection,  message from server: "Too many connections"
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:409)
        at com.mysql.jdbc.Util.getInstance(Util.java:384)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1015)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:989)
        at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:984)
        at com.mysql.jdbc.MysqlIO.doHandshake(MysqlIO.java:1104)
        at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2312)
        at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2122)
        at com.mysql.jdbc.ConnectionImpl.(ConnectionImpl.java:774)
        at com.mysql.jdbc.JDBC4Connection.(JDBC4Connection.java:49)
        at sun.reflect.GeneratedConstructorAccessor17.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:409)
        at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:375)
        at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:289)
        at java.sql.DriverManager.getConnection(DriverManager.java:664)
        at java.sql.DriverManager.getConnection(DriverManager.java:247)
        at com.dt.spark.streaming114.AdClickedStreamingStats$JDBCWrapper$$anonfun$1.apply$mcVI$sp(AdClickedStreamingStats.scala:159)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
        at com.dt.spark.streaming114.AdClickedStreamingStats$JDBCWrapper.(AdClickedStreamingStats.scala:157)
        at com.dt.spark.streaming114.AdClickedStreamingStats$$anonfun$2.apply(AdClickedStreamingStats.scala:53)
        at com.dt.spark.streaming114.AdClickedStreamingStats$$anonfun$2.apply(AdClickedStreamingStats.scala:46)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$transform$1$$anonfun$apply$21.apply(DStream.scala:700)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$transform$1$$anonfun$apply$21.apply(DStream.scala:700)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$transform$2$$anonfun$5.apply(DStream.scala:714)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$transform$2$$anonfun$5.apply(DStream.scala:712)
        at org.apache.spark.streaming.dstream.TransformedDStream.compute(TransformedDStream.scala:46)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:352)

A related off-by-one surfaced on the query side, in JDBCWrapper.doQuery:

16/08/29 20:55:14 INFO Executor: Finished task 4.0 in stage 39.0 (TID 51). 1165 bytes result sent to driver
16/08/29 20:55:14 INFO TaskSetManager: Finished task 4.0 in stage 39.0 (TID 51) in 489 ms on localhost (2/5)
java.lang.ArrayIndexOutOfBoundsException: 3
	at scala.runtime.ScalaRunTime$.array_apply(ScalaRunTime.scala:71)
	at com.dt.spark.streaming114.AdClickedStreamingStats$JDBCWrapper$$anonfun$doQuery$1.apply$mcVI$sp(AdClickedStreamingStats.scala:641)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
	at com.dt.spark.streaming114.AdClickedStreamingStats$JDBCWrapper.doQuery(AdClickedStreamingStats.scala:640)
	at com.dt.spark.streaming114.AdClickedStreamingStats$$anonfun$main$1$$anonfun$apply$4$$anonfun$apply$5.apply(AdClickedStreamingStats.scala:196)
	at com.dt.spark.streaming114.AdClickedStreamingStats$$anonfun$main$1$$anonfun$apply$4$$anonfun$apply$5.apply(AdClickedStreamingStats.scala:182)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
	at scala.collection.mutable.ListBuffer.foreach(ListBuffer.scala:45)
	at com.dt.spark.streaming114.AdClickedStreamingStats$$anonfun$main$1$$anonfun$apply$4.apply(AdClickedStreamingStats.scala:182)
	at com.dt.spark.streaming114.AdClickedStreamingStats$$anonfun$main$1$$anonfun$apply$4.apply(AdClickedStreamingStats.scala:159)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
	at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
The same ArrayIndexOutOfBoundsException: 3 trace repeated in the log.
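
doQuery (AdClickedStreamingStats.scala:641) trips over the same 1-based versus 0-based mismatch on the read side: ResultSet columns are numbered from 1, the row array from 0. A hypothetical reconstruction of the corrected loop:

import java.sql.ResultSet
import scala.collection.mutable.ListBuffer

// Hypothetical shape of the doQuery row-extraction loop.
def collectRows(rs: ResultSet): ListBuffer[Array[Object]] = {
  val rows = ListBuffer[Array[Object]]()
  val columnCount = rs.getMetaData.getColumnCount
  while (rs.next()) {
    val row = new Array[Object](columnCount)
    for (i <- 0 until columnCount)
      row(i) = rs.getObject(i + 1) // JDBC column = array index + 1
    rows += row
  }
  rows
}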


Separately, one run died because a worker ran out of disk space:

16/09/03 20:13:27 INFO cluster.SparkDeploySchedulerBackend: Shutting down all executors
16/09/03 20:13:27 INFO cluster.SparkDeploySchedulerBackend: Asking each executor to shut down
16/09/03 20:13:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(1,WrappedArray())
16/09/03 20:13:28 INFO client.AppClient$ClientEndpoint: Executor updated: app-20160903201241-0004/7 is now FAILED (java.io.IOException: No space left on device)
16/09/03 20:13:28 INFO cluster.SparkDeploySchedulerBackend: Executor app-20160903201241-0004/7 removed: java.io.IOException: No space left on device
16/09/03 20:13:28 WARN netty.NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(7,java.io.IOException: No space left on device)] in 1 attempts
org.apache.spark.SparkException: Could not find CoarseGrainedScheduler or it has been stopped.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:161)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:126)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:227)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:511)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:100)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:359)
        at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.executorRemoved(SparkDeploySchedulerBackend.scala:144)
        at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:186)
        at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

16/09/03 20:13:12 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/09/03 20:13:23 INFO utils.VerifiableProperties: Verifying properties
16/09/03 20:13:23 INFO utils.VerifiableProperties: Property group.id is overridden to 
16/09/03 20:13:23 INFO utils.VerifiableProperties: Property zookeeper.connect is overridden to 
16/09/03 20:13:24 INFO consumer.SimpleConsumer: Reconnect due to socket error: java.nio.channels.ClosedChannelException
Exception in thread "main" org.apache.spark.SparkException: org.apache.spark.SparkException: Couldn't find leaders for Set([IMFScalaAdClicked,0])
        at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
        at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
        at scala.util.Either.fold(Either.scala:97)
        at org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:365)
        at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:222)
        at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
        at com.dt.spark.streaming114.AdClickedStreamingStats$.main(AdClickedStreamingStats.scala:48)
        at com.dt.spark.streaming114.AdClickedStreamingStats.main(AdClickedStreamingStats.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
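
The direct API resolves partition leaders eagerly inside KafkaUtils.createDirectStream, which is why a dead broker or a half-deleted topic fails at exactly this line. For reference, a sketch of the consumer-side setup (Spark 1.6 with Kafka 0.8; the broker list and batch interval are assumptions):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("AdClickedStreamingStats")
val ssc = new StreamingContext(conf, Seconds(10)) // assumed batch interval
val kafkaParams = Map("metadata.broker.list" ->
  "master:9092,worker1:9092,worker2:9092")
val adClickedStream = KafkaUtils
  .createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, Set("IMFScalaAdClicked")) // leader lookup happens here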
With the fixes in place, the job ran end to end and the results landed in MySQL:

mysql> select *  from adprovincetopn     limit 10   ;
+------------+------+----------+--------------+
| timestamp  | adID | province | clickedCount |
+------------+------+----------+--------------+
| 2016-09-03 | 93   | Jiangsu  |            5 |
| 2016-09-03 | 65   | Jiangsu  |            4 |
| 2016-09-03 | 89   | Jiangsu  |            4 |
| 2016-09-03 | 61   | Jiangsu  |            4 |
| 2016-09-03 | 4    | Jiangsu  |            3 |
| 2016-09-03 | 18   | Fujian   |            5 |
| 2016-09-03 | 0    | Fujian   |            4 |
| 2016-09-03 | 28   | Fujian   |            4 |
| 2016-09-03 | 41   | Fujian   |            4 |
| 2016-09-03 | 7    | Fujian   |            4 |
+------------+------+----------+--------------+
10 rows in set (0.49 sec)
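
The adprovincetopn rows come from the per-province Top-N step. The standard window-function formulation, sketched here (the temp-table name is hypothetical, and in Spark 1.6 ROW_NUMBER requires a HiveContext; sc is the existing SparkContext):

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
// "tmp_ad_clicked_count" is a hypothetical registered temp table holding
// (timestamp, adID, province, clickedCount) aggregates.
val topN = hiveContext.sql(
  """SELECT `timestamp`, adID, province, clickedCount FROM (
    |  SELECT `timestamp`, adID, province, clickedCount,
    |    ROW_NUMBER() OVER (PARTITION BY province ORDER BY clickedCount DESC) AS rank
    |  FROM tmp_ad_clicked_count) t
    |WHERE rank <= 5""".stripMargin)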

mysql> select *  from adclickedtrend   limit 10   ;
+---------------+---------------+---------------+------+--------------+
| date          | hour          | minute        | adID | clickedCount |
+---------------+---------------+---------------+------+--------------+
| 1472875756481 | 1472875756481 | 1472875756481 | 81   |            1 |
| 1472875794559 | 1472875794559 | 1472875794559 | 71   |            1 |
| 1472875692339 | 1472875692339 | 1472875692339 | 46   |            1 |
| 1472875742455 | 1472875742455 | 1472875742455 | 2    |            1 |
| 1472875744457 | 1472875744457 | 1472875744457 | 5    |            1 |
| 1472875720391 | 1472875720391 | 1472875720391 | 82   |            1 |
| 1472875696344 | 1472875696344 | 1472875696344 | 47   |            1 |
| 1472875746465 | 1472875746465 | 1472875746465 | 7    |            1 |
| 1472875772517 | 1472875772517 | 1472875772517 | 18   |            1 |
| 1472875802576 | 1472875802576 | 1472875802576 | 84   |            1 |
+---------------+---------------+---------------+------+--------------+
10 rows in set (0.00 sec)

(Note: the date, hour, and minute columns above all still hold the raw millisecond timestamp; the breakdown into calendar fields has not been applied yet.)

mysql> select *  from adclickedcount  limit 10   ;
+---------------+------+-----------+-----------+--------------+
| timestamp     | adID | province  | city      | clickedCount |
+---------------+------+-----------+-----------+--------------+
| 1472868077139 | 25   | Jiangsu   | WuXi      |            1 |
| 1472868075138 | 34   | Zhejiang  | Hangzhou  |            1 |
| 1472868079140 | 4    | Guangdong | Shenzhen  |            1 |
| 1472868113158 | 96   | Guangdong | DongGuan  |            1 |
| 1472868109156 | 71   | Zhejiang  | Wenzhou   |            1 |
| 1472868089146 | 81   | Fujian    | Fuzhou    |            1 |
| 1472868121163 | 0    | Guangdong | Guangzhou |            1 |
| 1472868081141 | 63   | Fujian    | Sanming   |            1 |
| 1472868091147 | 55   | Fujian    | Fuzhou    |            1 |
| 1472868103153 | 87   | Fujian    | Ximen     |            1 |
+---------------+------+-----------+-----------+--------------+
10 rows in set (0.02 sec)

mysql> select *  from adclicked limit 10   ;
+---------------+-----------------+--------+------+-----------+----------+--------------+
| timestamp     | ip              | userID | adID | province  | city     | clickedCount |
+---------------+-----------------+--------+------+-----------+----------+--------------+
| 1472867838979 | 192.168.112.254 | 6233   | 70   | Guangdong | Shenzhen |            0 |
| 1472867842981 | 192.168.112.248 | 657    | 96   | Jiangsu   | Suzhou   |            0 |
| 1472867846983 | 192.168.112.251 | 6864   | 64   | Jiangsu   | WuXi     |            0 |
| 1472867840980 | 192.168.112.239 | 1742   | 9    | Guangdong | Shenzhen |            0 |
| 1472867848984 | 192.168.112.252 | 333    | 47   | Fujian    | Sanming  |            0 |
| 1472867844982 | 192.168.112.254 | 7528   | 73   | Fujian    | Fuzhou   |            0 |
| 1472867852986 | 192.168.112.254 | 2162   | 70   | Jiangsu   | Suzhou   |            0 |
| 1472867854987 | 192.168.112.254 | 2976   | 29   | Jiangsu   | Suzhou   |            0 |
| 1472867858990 | 192.168.112.250 | 7361   | 57   | Zhejiang  | Wenzhou  |            0 |
| 1472867850985 | 192.168.112.251 | 4063   | 85   | Fujian    | Sanming  |            0 |
+---------------+-----------------+--------+------+-----------+----------+--------------+
10 rows in set (0.00 sec)

mysql> select *  from blacklisttable    limit 10   ;
+------+
| name |
+------+
| 7308 |
| 5984 |
| 4354 |
| 8894 |
| 4708 |
| 4430 |
| 6663 |
| 6424 |
| 1303 |
| 7128 |
+------+
10 rows in set (0.00 sec)
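
The name values in blacklisttable are the flagged userIDs. The generation rule is of the form "more than N clicks in one day"; sketched here on plain collections with an assumed threshold of 50 (the course's exact value may differ). Note that 7308 from the query above would be flagged:

object BlacklistRuleSketch {
  def main(args: Array[String]): Unit = {
    // ("date_userID_adID", clicksToday); data and threshold are illustrative.
    val clickCounts = Seq(("2016-09-03_7308_70", 63L), ("2016-09-03_1234_96", 2L))
    val blacklist = clickCounts
      .filter { case (_, count) => count > 50 }   // abnormally heavy clickers
      .map { case (key, _) => key.split("_")(1) } // keep just the userID
      .distinct
    println(blacklist) // List(7308)
  }
}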


[Figure 12]

[Figure 13]

[Figure 14]

[Figure 15]

[Figure 16]

[Figure 17]

[Figure 18]

[Figure 19]