Common Shark Operations Issues

Common Shark Deployment Issues
1. readObject can't find class org.apache.hadoop.hive.conf.HiveConf
org.apache.spark.SparkException: Job aborted: Task 0.0:3 failed 4 times (most recent failure: Exception failure: java.lang.RuntimeException: readObject can't find class org.apache.hadoop.hive.conf.HiveConf)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
Solution:
Check whether the classpath contains jars for the Hive version Shark expects.
Under shark/lib_managed/jars/edu.berkeley.cs.shark/, verify that the compiled AMPLab Hive 0.11 jars are present and that their version numbers match.
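For example, a quick way to list the Hive jars bundled with Shark (the path follows the layout used elsewhere in this post; adjust it to your installation):
find /home/hadoop/shengli/shark/lib_managed/jars/edu.berkeley.cs.shark -name 'hive-*.jar'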


2. Caused by: java.lang.ClassNotFoundException: org.apache.hive.builtins.BuiltinUtils
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:190)
        at org.apache.hadoop.hive.ql.exec.Utilities.getBuiltinUtilsClass(Utilities.java:2318)
        at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:197)
        ... 3 more

Solution: the hive-builtins jar from Hive's lib directory is missing from Shark's managed jars. Under /home/hadoop/shengli/shark/lib_managed/jars/edu.berkeley.cs.shark/, create a folder named hive-builtins and copy the jar into it:
cd /home/hadoop/shengli/shark/lib_managed/jars/edu.berkeley.cs.shark/
mkdir hive-builtins && cd hive-builtins
cp /home/hadoop/shengli/hive/lib/hive-builtins-0.11.0-shark-0.9.1.jar ./
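As an optional sanity check (assuming the JDK's jar tool is on the PATH), confirm the copied jar really contains the missing class:
jar tf hive-builtins-0.11.0-shark-0.9.1.jar | grep BuiltinUtils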


3. ERROR hive.log: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:401)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:238)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
        at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:136)
        at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:151)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getDefaultDatabasePath(HiveMetaStore.java:475)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:353)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:371)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:278)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:248)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:114)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1009)
        at shark.memstore2.TableRecovery$.reloadRdds(TableRecovery.scala:46)
        at shark.SharkCliDriver.<init>(SharkCliDriver.scala:269)
        at shark.SharkCliDriver$.main(SharkCliDriver.scala:161)
        at shark.SharkCliDriver.main(SharkCliDriver.scala)


Solution:
1. Go to /home/hadoop/shengli/shark/lib_managed/jars/org.apache.hadoop, which contains:
drwxr-xr-x 2 hadoop games 4096 06-11 17:43 hadoop-client
drwxr-xr-x 2 hadoop games 4096 06-11 17:41 hadoop-core
Replace the Hadoop jars inside these folders with the versions used by your current cluster. To locate them:
find . -name '*hadoop*jar'
2. Since Shark is built on top of Spark, this kind of error may actually be a Spark problem.
Start spark-shell and try reading a file from HDFS (as sketched below); if that fails, rebuild Spark against your Hadoop version.
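A minimal connectivity check; the SPARK_HOME variable, namenode address, and file path below are placeholders to replace with your own:
$SPARK_HOME/bin/spark-shell
# then, at the scala> prompt:
#   sc.textFile("hdfs://namenode:8020/tmp/test.txt").count()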


4. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/thirdparty/guava/common/collect/LinkedListMultimap
        at org.apache.hadoop.hdfs.SocketCache.<init>(SocketCache.java:48)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:253)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:220)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1611)
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:68)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1645)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1627)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:238)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183)
        at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104)
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:136)
        at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:151)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getDefaultDatabasePath(HiveMetaStore.ja

Solution: again under /home/hadoop/shengli/shark/lib_managed/jars/org.apache.hadoop, create a new thirdparty folder and copy hadoop_home/lib/xxx_guava_xxx.jar into it.
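A sketch of the copy, assuming HADOOP_HOME points at your Hadoop installation and locating the repackaged guava jar by wildcard (the exact jar name varies by distribution):
cd /home/hadoop/shengli/shark/lib_managed/jars/org.apache.hadoop
mkdir -p thirdparty
cp "$HADOOP_HOME"/lib/*guava*.jar thirdparty/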



5. Failed with exception java.io.IOException:java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
14/06/11 17:48:27 ERROR CliDriver: Failed with exception java.io.IOException:java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
java.io.IOException: java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
        at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:341)


Solution:
Copy the LZO jar into the path above:
cp /home/hadoop/src/hadoop/lib/hadoop-lzo-0.4.15.jar /home/hadoop/shengli/shark/lib_managed/jars/org.apache.hadoop/lzo
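If the lzo target directory does not exist yet, create it first before running the copy (the path is assumed from the layout above):
mkdir -p /home/hadoop/shengli/shark/lib_managed/jars/org.apache.hadoop/lzo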


6. Cannot connect to the Master
14/06/11 17:58:53 ERROR client.Client$ClientActor: All masters are unresponsive! Giving up.
14/06/11 17:58:53 ERROR cluster.SparkDeploySchedulerBackend: Spark cluster looks dead, giving up.
14/06/11 17:58:53 ERROR cluster.ClusterScheduler: Exiting due to error from cluster scheduler: Spark cluster looks down
1)WARN cluster.ClusterScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

     ERROR client.Client$ClientActor: All masters are unresponsive! Giving up.

Solution:
1. Edit spark-env.sh and add export SPARK_MASTER_IP=xxx.xxx.xxx.xxx, where the IP is the master node's address.
2. If that still fails, it may be a version mismatch: in my case the Spark jar bundled under Shark was 0.8.1 (a quick way to check the bundled version is sketched below). Replace it with the assembly that matches the cluster:
cp spark-assembly-0.9.1-hadoop0.20.2-cdh3u5.jar /home/hadoop/shengli/shark/lib_managed/jars/org.apache.spark/


Better still, rebuild Shark: edit project/ScalaBuild.scala so the version numbers match, then run:
sbt/sbt assembly
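To confirm which Spark version Shark is currently carrying, list the bundled Spark jars (path taken from this post):
find /home/hadoop/shengli/shark/lib_managed/jars/org.apache.spark -name 'spark-*.jar'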


7. akka.remote.EndpointAssociationException: Association failed
Solution:
spark://116.211.20.207:7077
This connection failed because SPARK_MASTER_IP was set to an IP address in one place and to a hostname in another, so the master URL the client used did not match the address the master was bound to. Check the Spark master URL shown on the Web UI and connect with exactly that address.
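A quick way to see what the master actually advertises, assuming a standalone master with its Web UI on the default port 8080:
MASTER_HOST=116.211.20.207   # the address shown above; substitute your own master
curl -s http://$MASTER_HOST:8080 | grep -o 'spark://[^<"]*'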


8. app-20140224234441-0003/1 is now FAILED (Command exited with code 1)
Solution:
This is likely because /etc/hosts is not configured, so inter-node communication fails. /etc/hosts must map the hostname and IP of every node in the cluster, and every machine needs the full set, for example:
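A hypothetical example; the IPs and hostnames below are placeholders for your own nodes (run as root on every machine):
cat >> /etc/hosts <<'EOF'
192.168.1.100  spark-master
192.168.1.101  spark-worker1
192.168.1.102  spark-worker2
EOF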


9. CLASS NOT FOUND
Shark cannot find some jar at runtime. One workaround is to collect all of Shark's jars into a single directory and put that directory on the Java classpath on every machine in the cluster (a classpath sketch follows the script below):
#!/bin/bash
# Collect every Shark jar into one shared directory.
DEST=/home/hadoop/shengli/sharklib
mkdir -p "$DEST"
for dir in lib lib_managed/jars lib_managed/bundles; do
  find /home/hadoop/shengli/shark/"$dir" -name '*.jar' -exec cp {} "$DEST"/ \;
done
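One way to then expose the collected jars to the JVM on each node (a sketch; SPARK_CLASSPATH here is an assumption about how your launch scripts pass extra jars, e.g. via conf/shark-env.sh):
export SPARK_CLASSPATH=$(find /home/hadoop/shengli/sharklib -name '*.jar' | tr '\n' ':')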

10. shark.execution.HadoopTableReader not found
Another missing class, so again a classpath problem.
The source file does exist in the Shark code base, so the compiled Shark jar itself must not be on the classpath.

Check the CLASSPATH printed in the executor stdout on the Web UI to see whether the jar was picked up; then copy the compiled Shark jar into sharklib and restart the Spark cluster.
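A small sanity check (assuming the JDK's jar tool is on the PATH and the compiled Shark jar was copied into the sharklib directory from issue 9): confirm the class is really inside one of the collected jars.
for j in /home/hadoop/shengli/sharklib/shark*.jar; do
  jar tf "$j" | grep -q 'shark/execution/HadoopTableReader' && echo "found in $j"
done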


Original article; when reposting, please credit http://blog.csdn.net/oopsoom/article/details/32152585

-EOF-
