Summary of Spark Exceptions
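1. Output directory already exists
The client-side output below reports only exit code 15; the underlying error (the output directory already exists) is in the container logs, which can be fetched with: yarn logs -applicationId application_1444384383185_2518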
diagnostics: Application application_1444384383185_2518 failed 2 times due to AM Container for appattempt_1444384383185_2518_000002 exited with exitCode: 15 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:279)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
main : command provided 1
main : user is yule
main : requested yarn user is yule
Container exited with a non-zero exit code 15
.Failing this attempt.. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.default
start time: 1447662442842
final status: FAILED
tracking URL: a01.master.spark.hadoop.qingdao.youku:8088/cluster/app/application_1444384383185_2518
user: yule
Exception in thread "main" org.apache.spark.SparkException: Application finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:622)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:647)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Fix: delete the output directory (on newer Hadoop releases -rmr is deprecated in favor of -rm -r):
hadoop fs -rmr /workspace/yule/test/sparkoutput1
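
The same cleanup can also be done from the driver before the job writes its output. The following is a minimal sketch, assuming the Hadoop client libraries are on the classpath (as they normally are for a Spark on YARN job). OutputDirCleaner and deleteIfExists are hypothetical names used for illustration, not part of Spark or Hadoop.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Hypothetical helper, not part of Spark or Hadoop.
object OutputDirCleaner {
  // Recursively deletes `dir` if it exists.
  // Returns true only when something was actually deleted.
  def deleteIfExists(dir: String, conf: Configuration = new Configuration()): Boolean = {
    val path = new Path(dir)
    // Resolves to HDFS or the local file system, depending on the
    // path's scheme and the default file system in the configuration.
    val fs = path.getFileSystem(conf)
    if (fs.exists(path)) fs.delete(path, true) else false
  }

  def main(args: Array[String]): Unit = {
    // Path taken from the command above; run this before the job saves its output.
    val removed = deleteIfExists("/workspace/yule/test/sparkoutput1")
    println(s"removed = $removed")
  }
}
```

Inside a Spark job the configuration would normally be sc.hadoopConfiguration rather than a fresh Configuration. Deleting unconditionally throws away the previous run's results, so this only suits output that is safe to overwrite.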
Problems encountered with Spark SQL:
http://www.cnblogs.com/shishanyuan/p/4723604.html?utm_source=tuicool
http://www.aboutyun.com/forum.php?mod=viewthread&tid=12358&page=1
http://dataunion.org/13433.html
http://www.csdn.net/article/2015-07-10/2825184
https://spark.apache.org/docs/latest/sql-programming-guide.html
[01] Simple Spark examples
http://my.oschina.net/scipio/blog/284957
Notes on understanding Spark SQL
http://www.360doc.com/content/14/0722/22/2459_396377522.shtml
Key references:
Using Spark SQL
http://blog.csdn.net/escaflone/article/details/43272477
Using Spark Streaming
http://blog.csdn.net/escaflone/article/details/43341275
Spark MLlib
http://blog.csdn.net/escaflone/article/details/43371505
Scala self-study notes
http://blog.csdn.net/escaflone/article/details/43485345
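Integrating Spark 1.3.1 with Hive for query analysis
In big-data scenarios, anyone who has used Hive for query and statistical analysis knows that its latency is very high: a very complex analysis may run for more than an hour. Even so, it is far faster than running the same analysis on a relational database such as MySQL. Queries written in HiveQL, a SQL-like language, are translated by Hive's query parser into MapReduce programs that run on Hadoop, and the latency follows from how the MapReduce engine works: Map intermediate results are written to files. A very complex HiveQL statement is translated into multiple MapReduce jobs, so a great deal of Map output is written to files as intermediate results, with essentially no data sharing between jobs.
On Spark, computation is based on the RDD dataset model, which reduces the cost of writing intermediate results to files: Spark keeps data in memory so that later operations can share it, cutting the latency caused by disk I/O. In addition, with the Spark on YARN deployment mode, Spark can take full advantage of data locality on the DataNodes of the Hadoop cluster, reducing the communication cost of moving data.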
http://shiyanjun.cn/archives/1113.html
Complete walkthrough of the examples bundled with Spark
http://www.ibm.com/developerworks/cn/opensource/os-cn-spark-code-samples/index.html
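Spark SQL: No TypeTag available for xxxx
The case class must be defined above the object, at the top level of the file.
If the case class is placed inside the object, the error in the heading above is reported.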
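
The layout described above can be sketched without Spark. SQLContext.createDataFrame takes an implicit TypeTag for the row type, and the hypothetical describe method below stands in for it by demanding the same implicit. Person, TypeTagDemo and describe are illustrative names only, and the sketch assumes scala-reflect is available (Scala 2.x).

```scala
import scala.reflect.runtime.universe.{TypeTag, typeTag}

// Hypothetical record type, declared at the top level of the file,
// outside (and above) the object that uses it.
case class Person(name: String, age: Int)

object TypeTagDemo {
  // Stand-in for a Spark SQL API that requires an implicit TypeTag
  // for its row type; this method is not part of Spark.
  def describe[T: TypeTag](rows: Seq[T]): String =
    s"${typeTag[T].tpe} x ${rows.size}"

  def main(args: Array[String]): Unit = {
    // Compiles because the compiler can materialize a TypeTag for Person.
    println(describe(Seq(Person("Ann", 30), Person("Bob", 25))))
  }
}
```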
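Tips (quoted from the article linked below): The sections above covered the basic use of Spark SQL. Spark SQL is still developing rapidly and has quite a few shortcomings, for example:
Scala 2.10.4 itself limits a case class to 22 fields, which is inconvenient when using an RDD as the data source; sqlContext cannot join three tables at once, so two must be joined first and the result joined with the third; sqlContext cannot insert data directly with VALUES; and so on. Overall, hiveContext is satisfactory, while sqlContext is only barely adequate. Also, when writing a sqlContext application, the case class must be defined outside the object.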
http://www.it165.net/database/html/201409/8106.html
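
One possible way around the 22-field limit mentioned above is to split a wide record into nested case classes, which the Spark SQL programming guide says schema inference supports; the other route it describes is building the schema programmatically with StructType. The sketch below shows only the nesting idea in plain Scala. All type names, the six-column layout, and the tab-separated input format are assumptions made for illustration.

```scala
// Hypothetical column groups; each case class stays well under 22 fields.
case class UserInfo(id: Long, name: String, city: String)
case class Stats(plays: Int, likes: Int, shares: Int)
case class WideRecord(user: UserInfo, stats: Stats)

object WideRecordParser {
  // Parses one tab-separated line with exactly six columns into the nested record.
  def parse(line: String): WideRecord = {
    // The -1 limit keeps trailing empty columns instead of dropping them.
    val f = line.split("\t", -1)
    require(f.length == 6, s"expected 6 columns, got ${f.length}")
    WideRecord(
      UserInfo(f(0).toLong, f(1), f(2)),
      Stats(f(3).toInt, f(4).toInt, f(5).toInt)
    )
  }

  def main(args: Array[String]): Unit =
    println(parse("1\tann\tQingdao\t10\t2\t1"))
}
```

With real data each group would hold up to 22 columns, and the nested fields would show up as struct columns in the resulting table.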