Problems Encountered While Installing Shark

Versions used:

Hadoop: Apache 2.2.0

Spark: 0.9.1

Shark: 0.9.1

Hive: 0.12.0


Shark website: http://shark.cs.berkeley.edu/

Shark-on-a-cluster documentation: https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster

After configuring everything according to the documentation and starting Shark, the following error appeared:

Exception in thread "main" org.apache.spark.SparkException: YARN mode not available ?
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1275)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:201)
        at shark.SharkContext.<init>(SharkContext.scala:42)
        at shark.SharkContext.<init>(SharkContext.scala:61)
        at shark.SharkEnv$.initWithSharkContext(SharkEnv.scala:78)
        at shark.SharkEnv$.init(SharkEnv.scala:38)
        at shark.SharkCliDriver.<init>(SharkCliDriver.scala:278)
        at shark.SharkCliDriver$.main(SharkCliDriver.scala:162)
        at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.scheduler.cluster.YarnClientClusterScheduler
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:190)
        at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:1269)
        ... 8 more

My guess was that SPARK_ASSEMBLY_JAR was never put on the classpath; tracing through the code confirmed that this was indeed the problem.

Add the following code to the $SHARK_HOME/run script:

if [ -f "$SPARK_JAR" ] ; then
    SPARK_CLASSPATH+=":$SPARK_JAR"
    echo "SPARK CLASSPATH : "$SPARK_CLASSPATH
fi
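
For this fix to take effect, $SPARK_JAR must point at a Spark assembly jar built with YARN support. A sketch of how it might be set (the exact path and file name depend on how your Spark 0.9.1 assembly was built; adjust to the jar actually present under assembly/target):

    # Assumption: Spark 0.9.1 built against Hadoop 2.2.0 with YARN support;
    # the jar name below is illustrative, use your actual build output.
    export SPARK_JAR=$SPARK_HOME/assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.2.0.jar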

But then the following problem appeared:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.getClient(RpcClientFactoryPBImpl.java:79)
        at org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC.getProxy(HadoopYarnProtoRPC.java:48)
        at org.apache.hadoop.yarn.client.RMProxy$1.run(RMProxy.java:134)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:356)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1528)
        at org.apache.hadoop.yarn.client.RMProxy.getProxy(RMProxy.java:130)
        at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
        at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:70)
        at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceStart(YarnClientImpl.java:114)
        at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
        at org.apache.spark.deploy.yarn.Client.runApp(Client.scala:76)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:78)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:126)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
        at shark.SharkContext.<init>(SharkContext.scala:42)
        at shark.SharkContext.<init>(SharkContext.scala:61)
        at shark.SharkEnv$.initWithSharkContext(SharkEnv.scala:78)
        at shark.SharkEnv$.init(SharkEnv.scala:38)
        at shark.SharkCliDriver.<init>(SharkCliDriver.scala:278)
        at shark.SharkCliDriver$.main(SharkCliDriver.scala:162)
        at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.yarn.factories.impl.pb.RpcClientFactoryPBImpl.getClient(RpcClientFactoryPBImpl.java:76)
        ... 21 more
Caused by: java.lang.VerifyError: class org.apache.hadoop.security.proto.SecurityProtos$GetDelegationTokenRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; (this is exactly the error I hit; it cost me a whole day)
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:792)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2521)
        at java.lang.Class.privateGetPublicMethods(Class.java:2641)
        at java.lang.Class.privateGetPublicMethods(Class.java:2651)
        at java.lang.Class.getMethods(Class.java:1457)
        at sun.misc.ProxyGenerator.generateClassFile(ProxyGenerator.java:426)
        at sun.misc.ProxyGenerator.generateProxyClass(ProxyGenerator.java:323)
        at java.lang.reflect.Proxy.getProxyClass0(Proxy.java:636)
        at java.lang.reflect.Proxy.newProxyInstance(Proxy.java:722)
        at org.apache.hadoop.ipc.ProtobufRpcEngine.getProxy(ProtobufRpcEngine.java:92)
        at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:537)
        at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:482)
        at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:447)
        at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:600)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:557)
        at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.<init>(ApplicationClientProtocolPBClientImpl.java:111)
        ... 26 more

This is caused by a protobuf version conflict. First, I checked which protobuf jars live under the Shark directory:

find . -name "proto*.jar"

Only one was found: ./lib_managed/bundles/com.google.protobuf/protobuf-java/protobuf-java-2.5.0.jar

I was using shark-0.9.1; I had previously used 0.9.0, which ships protobuf-java-2.4.1-shaded.jar.

After a lot of digging, I finally found the fix:

Locate the jar ./lib_managed/jars/edu.berkeley.cs.shark/hive-exec/hive-exec-0.11.0-shark-0.9.1.jar, extract it with jar xf hive-exec-0.11.0-shark-0.9.1.jar (note: jar xf extracts; jar tf only lists the contents), delete all the class files under the com/google/protobuf directory, and repackage the jar.
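
The whole jar surgery can be scripted roughly like this (a sketch; tmp-hive-exec is just a scratch directory name I made up):

    cd $SHARK_HOME/lib_managed/jars/edu.berkeley.cs.shark/hive-exec

    # Unpack the jar into a scratch directory
    mkdir tmp-hive-exec && cd tmp-hive-exec
    jar xf ../hive-exec-0.11.0-shark-0.9.1.jar

    # Drop the bundled protobuf classes that conflict with protobuf-java-2.5.0
    rm -rf com/google/protobuf

    # Repackage over the original jar, reusing its manifest, then clean up
    jar cmf META-INF/MANIFEST.MF ../hive-exec-0.11.0-shark-0.9.1.jar .
    cd .. && rm -rf tmp-hive-exec

    # Alternatively, zip -d removes the entries in place without repacking:
    # zip -d hive-exec-0.11.0-shark-0.9.1.jar 'com/google/protobuf/*'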

OK, continue and run bin/shark-withinfo.

Another problem came up:

14/04/16 16:01:44 INFO yarn.Client: Setting up the launch environment
Exception in thread "main" java.lang.NullPointerException
        at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
        at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32)
        at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
        at org.apache.spark.deploy.yarn.Client$.populateHadoopClasspath(Client.scala:498)
        at org.apache.spark.deploy.yarn.Client$.populateClasspath(Client.scala:519)
        at org.apache.spark.deploy.yarn.Client.setupLaunchEnv(Client.scala:333)
        at org.apache.spark.deploy.yarn.Client.runApp(Client.scala:94)
        at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:78)
        at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:126)
        at org.apache.spark.SparkContext.<init>(SparkContext.scala:202)
        at shark.SharkContext.<init>(SharkContext.scala:42)
        at shark.SharkContext.<init>(SharkContext.scala:61)
        at shark.SharkEnv$.initWithSharkContext(SharkEnv.scala:78)
        at shark.SharkEnv$.init(SharkEnv.scala:38)
        at shark.SharkCliDriver.<init>(SharkCliDriver.scala:278)
        at shark.SharkCliDriver$.main(SharkCliDriver.scala:162)
        at shark.SharkCliDriver.main(SharkCliDriver.scala)

It turns out this is a small bug in Shark: http://mail-archives.apache.org/mod_mbox/spark-user/201404.mbox/%3CCAM9h1cfrSmwczCMobHZxvVPLoP-syrvVCAsF9ohokRdwhUwrBQ@mail.gmail.com%3E

The workaround is to explicitly set yarn.application.classpath to its default value in yarn-site.xml.
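
For reference, something like this (a sketch based on the Hadoop 2.2.0 defaults in yarn-default.xml; verify the value shipped with your own distribution):

    <property>
      <name>yarn.application.classpath</name>
      <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/share/hadoop/common/*,
        $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
        $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
        $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
        $HADOOP_YARN_HOME/share/hadoop/yarn/*,
        $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
      </value>
    </property>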

That fixed this problem.

But then the next problem appeared:

Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1072)
        at shark.memstore2.TableRecovery$.reloadRdds(TableRecovery.scala:49)
        at shark.SharkCliDriver.<init>(SharkCliDriver.scala:283)
        at shark.SharkCliDriver$.main(SharkCliDriver.scala:162)
        at shark.SharkCliDriver.main(SharkCliDriver.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1139)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:51)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:61)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2288)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2299)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1070)
        ... 4 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1137)
        ... 9 more
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
        at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:781)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:326)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at javax.jdo.JDOHelper$16.run(JDOHelper.java:1958)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.jdo.JDOHelper.invoke(JDOHelper.java:1953)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:270)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:299)
        at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:229)
        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:204)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.hive.metastore.RetryingRawStore.<init>(RetryingRawStore.java:62)
        at org.apache.hadoop.hive.metastore.RetryingRawStore.getProxy(RetryingRawStore.java:71)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:413)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:401)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:439)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:325)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:285)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4102)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:121)
        ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
        at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
        at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:281)
        at org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:239)
        at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:292)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
        at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
        at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1069)
        at org.datanucleus.NucleusContext.initialise(NucleusContext.java:359)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:768)
        ... 43 more
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "DBCP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
        at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:237)
        at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:110)
        at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:82)
        ... 61 more
Caused by: org.datanucleus.store.rdbms.datasource.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
        at org.datanucleus.store.rdbms.datasource.AbstractDataSourceFactory.loadDriver(AbstractDataSourceFactory.java:58)
        at org.datanucleus.store.rdbms.datasource.DBCPDataSourceFactory.makePooledDataSource(DBCPDataSourceFactory.java:55)
        at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:217)
        ... 63 more

Just add the mysql-connector.jar file to the $SHARK_HOME/lib_managed/jars/edu.berkeley.cs.shark/hive-jdbc directory.
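
For example (the source path and connector version below are placeholders; use wherever your MySQL JDBC driver jar actually lives):

    cp /path/to/mysql-connector-java-5.1.x-bin.jar \
       $SHARK_HOME/lib_managed/jars/edu.berkeley.cs.shark/hive-jdbc/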

You may also run into the following problem:

14/04/16 17:03:44 ERROR DataNucleus.Datastore: Error thrown executing CREATE TABLE `SERDE_PARAMS`
(
    `SERDE_ID` BIGINT NOT NULL,
    `PARAM_KEY` VARCHAR(256) BINARY NOT NULL,
    `PARAM_VALUE` VARCHAR(4000) BINARY NULL,
    CONSTRAINT `SERDE_PARAMS_PK` PRIMARY KEY (`SERDE_ID`,`PARAM_KEY`)
) ENGINE=INNODB : Specified key was too long; max key length is 767 bytes
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was too long; max key length is 767 bytes
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

This is caused by the character set of the Hive metastore database; setting the metastore database's character set to latin1 fixes it.

http://hao3721.iteye.com/blog/1522392
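
If the metastore database already exists, its character set can be changed with something like this (assuming the metastore database is named hive; substitute your own database name):

    mysql -u root -p -e "ALTER DATABASE hive CHARACTER SET latin1;"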


Hive, Spark, and Shark need to be installed and configured on every slave node; otherwise you will run into errors like the following:

org.apache.spark.SparkException: Job aborted: Task 1.0:0 failed 4 times (most recent failure: Exception failure: java.lang.RuntimeException: readObject can't find class org.apache.hadoop.hive.conf.HiveConf)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

This is exactly what happens when Hive is not installed on the slave nodes.


A quick test (in Shark, a table whose name ends with _cached is, by default, stored in Spark's in-memory cache):

>CREATE TABLE src(key INT, value STRING);

>LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src;

>SELECT COUNT(1) FROM src;   

>CREATE TABLE src_cached AS SELECT * FROM src;

>SELECT COUNT(1) FROM src_cached;

Shark user guide: https://github.com/amplab/shark/wiki/Shark-User-Guide


When Shark starts up, you will notice that the settings of the application it submits to YARN look just like the parameters you would pass when submitting a Spark job by hand, e.g. --worker-memory. This matters because we want to control how many resources Spark on YARN consumes. So how do we set these parameters?

After some searching and code tracing, I found the answer: http://spark.apache.org/docs/latest/configuration.html#environment-variables

Just put these settings into shark-env.sh.
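
A sketch of what $SHARK_HOME/conf/shark-env.sh could contain (these are the Spark 0.9.x YARN-mode environment variables as documented on the page above; the values are placeholders to adapt to your cluster):

    # Resources Shark's SparkContext requests from YARN
    export SPARK_WORKER_INSTANCES=2   # number of workers (YARN containers)
    export SPARK_WORKER_CORES=2       # cores per worker
    export SPARK_WORKER_MEMORY=2g     # memory per worker (the --worker-memory seen above)
    export SPARK_MASTER_MEMORY=1g     # memory for the YARN application master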
