Running Shark Locally and Problems You May Run Into

Installing Shark locally

1. Download Scala
wget  http://www.scala-lang.org/files/archive/scala-2.9.3.tgz  
(A newer 2.10.2 tarball also exists; this guide uses 2.9.3.)
tar xvfz scala-2.9.3.tgz 

2. Download the Shark and Hive bundle
wget  http://spark-project.org/download/shark-0.7.0-hadoop1-bin.tgz  (cdh3) 
tar xvfz shark-0.7.0-*-bin.tgz 

3. Configure environment variables
cd shark-0.7.0/conf 
cp shark-env.sh.template shark-env.sh 
vi shark-env.sh 
export HIVE_HOME=/path/to/hive-0.9.0-bin 
export SCALA_HOME=/path/to/scala-2.9.3 
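Before starting Shark it can help to confirm that both paths actually exist. Below is a minimal POSIX sh sketch. check_shark_env is a hypothetical helper, not part of Shark. It sources the file you pass it, so it assumes shark-env.sh has no side effects beyond setting variables.

```shell
#!/bin/sh
# Hypothetical helper (not part of Shark): check that the HIVE_HOME and
# SCALA_HOME set in shark-env.sh point at existing directories.
# The body runs in a subshell ( ... ) so sourcing the file does not leak variables.
check_shark_env() (
  # Forget any values inherited from the caller; only the file should count.
  unset HIVE_HOME SCALA_HOME
  . "$1"
  status=0
  for pair in "HIVE_HOME=$HIVE_HOME" "SCALA_HOME=$SCALA_HOME"; do
    name=${pair%%=*}   # text before the first '='
    dir=${pair#*=}     # text after the first '='
    if [ -z "$dir" ]; then
      echo "$name is not set in $1" >&2
      status=1
    elif [ ! -d "$dir" ]; then
      echo "$name=$dir is not a directory" >&2
      status=1
    fi
  done
  exit "$status"
)
```

From the conf directory, run check_shark_env ./shark-env.sh && echo ok. Pass a path that contains a slash, because `.` otherwise searches PATH for the file.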

4. Test with sample data
CREATE TABLE src(key INT, value STRING); 
LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src; 
SELECT COUNT(1) FROM src; 
OK 
500 
Time taken: 2.149 seconds 
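No Hive MapReduce job is involved, so this runs much faster than in Hive.
Shark keeps tables whose names end in _cached in memory, so the next count reads from the cached copy:
CREATE TABLE src_cached AS SELECT * FROM SRC;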
SELECT COUNT(1) FROM src_cached; 

Problems that may come up during installation, and how to fix them

1. CREATE TABLE src(key INT, value STRING); fails
FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.ipc.RPC$VersionMismatch Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63))
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask 

ERROR exec.Task: FAILED: Error in metadata: MetaException(message:Got exception: org.apache.hadoop.ipc.RPC$VersionMismatch Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63))
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: org.apache.hadoop.ipc.RPC$VersionMismatch Protocol org.apache.hadoop.hdfs.protocol.ClientProtocol version mismatch. (client = 61, server = 63))
        at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:544) 
        at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3313) 
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:242) 
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134) 
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) 
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1312) 
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1104) 
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937) 
        at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:288) 
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) 
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341) 
        at shark.SharkCliDriver$.main(SharkCliDriver.scala:203) 
        at shark.SharkCliDriver.main(SharkCliDriver.scala) 
Cause: the cluster's Hadoop version does not match the hadoop-core jar bundled with Shark.
Fix: copy ${HADOOP_HOME}/hadoop-core-*.jar into ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/hadoop-core/ and remove the hadoop-core-*.jar that was there originally.
Then exit and relaunch Shark.
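The jar swap can be scripted. This is only a sketch. replace_hadoop_core is a hypothetical name, and it assumes ${HADOOP_HOME} holds exactly one hadoop-core-*.jar; it stops rather than guess if it finds more or none. It deletes Shark's bundled jar before copying instead of after. That way, a cluster jar with the same file name as the bundled one is not removed by the cleanup.

```shell
#!/bin/sh
# Hypothetical helper: replace the hadoop-core jar bundled with Shark by the
# cluster's own hadoop-core jar. Usage: replace_hadoop_core HADOOP_HOME SHARK_HOME
replace_hadoop_core() (
  hadoop_home=$1
  dest="$2/lib_managed/jars/org.apache.hadoop/hadoop-core"
  # Expand the glob into the positional parameters; expect exactly one match.
  set -- "$hadoop_home"/hadoop-core-*.jar
  if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
    echo "expected exactly one hadoop-core-*.jar in $hadoop_home" >&2
    exit 1
  fi
  src=$1
  mkdir -p "$dest" || exit 1
  # Remove the bundled jar(s) first so only one hadoop-core is on the classpath.
  rm -f "$dest"/hadoop-core-*.jar
  cp "$src" "$dest"/ || exit 1
  echo "installed $(basename "$src") into $dest"
)
```

Typical call: replace_hadoop_core "$HADOOP_HOME" "$SHARK_HOME"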

2. java.lang.NoClassDefFoundError
/app/hadoop/shark/shark-0.7.0/lib_managed/jars/org.apache.hadoop/hadoop-core/ 
java.lang.NoClassDefFoundError: org/apache/hadoop/thirdparty/guava/common/collect/LinkedListMultimap 
        at org.apache.hadoop.hdfs.SocketCache.<init>(SocketCache.java:48) 
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:253) 
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:220) 
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89) 
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1611) 
        at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:68) 
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1645) 
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1627) 
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254) 
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123) 
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:238) 
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:183) 
        at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:104) 
        at org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:136) 
        at org.apache.hadoop.hive.metastore.Warehouse.getWhRoot(Warehouse.java:151) 
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getDefaultDatabasePath(HiveMetaStore.java:475)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB_core(HiveMetaStore.java:353)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:371) 
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:278) 
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:248) 
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:114) 
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092) 
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102) 
        at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:538) 
        at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3313) 
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:242) 
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134) 
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) 
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1312) 
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1104) 
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:937) 
        at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:288) 
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) 
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341) 
        at shark.SharkCliDriver$.main(SharkCliDriver.scala:203) 
        at shark.SharkCliDriver.main(SharkCliDriver.scala) 
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.thirdparty.guava.common.collect.LinkedListMultimap 
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202) 
        at java.security.AccessController.doPrivileged(Native Method) 
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190) 
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307) 
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) 
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248) 
        ... 36 more 
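Cause: with the CDH build of Hadoop, Shark's classpath is missing a third-party guava jar (guava-*.jar).
Fix: create the directory ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/thirdparty and copy ${HADOOP_HOME}/lib/guava-r09-jarjar.jar into it.
Then exit and relaunch Shark.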
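Here is a sketch of the same two steps. add_guava_jar is a hypothetical helper. It copies every guava-*.jar it finds in ${HADOOP_HOME}/lib; on the CDH install described here, that is guava-r09-jarjar.jar. Like the manual fix, it relies on Shark picking up jars placed under lib_managed/jars.

```shell
#!/bin/sh
# Hypothetical helper: copy the guava jar(s) from the Hadoop lib directory
# into a new thirdparty directory under Shark's lib_managed tree.
# Usage: add_guava_jar HADOOP_HOME SHARK_HOME
add_guava_jar() (
  hadoop_lib="$1/lib"
  dest="$2/lib_managed/jars/org.apache.hadoop/thirdparty"
  set -- "$hadoop_lib"/guava-*.jar
  # With no match the pattern stays literal, so check that the first entry exists.
  if [ ! -f "$1" ]; then
    echo "no guava-*.jar found in $hadoop_lib" >&2
    exit 1
  fi
  mkdir -p "$dest" || exit 1
  cp "$@" "$dest"/ || exit 1
  echo "copied $# jar(s) into $dest"
)
```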

3. show tables fails
Failed with exception java.io.IOException:java.io.IOException: Cannot create an instance of InputFormat class org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork!
Cause: the hadoop-lzo-*.jar is missing from Shark's classpath.
Fix: create the directory ${SHARK_HOME}/lib_managed/jars/org.apache.hadoop/lib and copy ${HADOOP_HOME}/lib/hadoop-lzo-*.jar into it.
Then exit and relaunch Shark.
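Why would an LZO jar matter for a plain text table? One plausible explanation, which is an assumption because the error message does not say so: the cluster's core-site.xml may list an LZO codec such as com.hadoop.compression.lzo.LzoCodec under io.compression.codecs. If so, TextInputFormat loads every configured codec class when it is set up, and that fails when hadoop-lzo is not on Shark's classpath. The sketch below checks for that combination. check_lzo is hypothetical, and you pass in the config file location yourself (on Hadoop 1 / CDH3 this is typically ${HADOOP_HOME}/conf/core-site.xml).

```shell
#!/bin/sh
# Hypothetical check: if the Hadoop config names an LZO codec, make sure a
# hadoop-lzo jar exists somewhere under Shark's lib_managed directory.
# Usage: check_lzo CORE_SITE_XML SHARK_HOME
check_lzo() (
  conf_file=$1
  lib_dir="$2/lib_managed"
  [ -f "$conf_file" ] || { echo "missing $conf_file" >&2; exit 1; }
  # com.hadoop.compression.lzo.LzoCodec / LzopCodec come from hadoop-lzo.
  if ! grep -q 'com\.hadoop\.compression\.lzo' "$conf_file"; then
    echo "no LZO codec referenced in $conf_file"
    exit 0
  fi
  # find prints matching paths; grep -q . succeeds if there was at least one.
  if find "$lib_dir" -name 'hadoop-lzo-*.jar' 2>/dev/null | grep -q .; then
    echo "LZO codec configured and a hadoop-lzo jar is present"
    exit 0
  fi
  echo "LZO codec configured but no hadoop-lzo-*.jar under $lib_dir" >&2
  exit 1
)
```

Typical call: check_lzo "$HADOOP_HOME/conf/core-site.xml" "$SHARK_HOME"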

4. SELECT count(1) FROM src_cached fails
spark.SparkException: Job failed: ShuffleMapTask(6, 0) failed: ExceptionFailure(java.lang.NoSuchMethodError: sun.misc.Unsafe.copyMemory(Ljava/lang/Object;JLjava/lang/Object;JJ)V)
        at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:642)
        at spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:640) 
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60) 
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) 
        at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640) 
        at spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:601) 
        at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:300) 
        at spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:364) 
        at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:107) 
FAILED: Execution Error, return code -101 from shark.execution.SparkTask 

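Cause: Java 1.6 is too old. The five-argument sun.misc.Unsafe.copyMemory(Object, long, Object, long, long) named in the error only exists from JDK 7 on, so JDK 7 is required.
Fix: install JDK 7 and point JAVA_HOME at it; that resolved the problem.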
tar xvfz jdk-7u25-linux-x64.tar.gz -C /usr/java/ 
export JAVA_HOME=/usr/java/jdk1.7.0_25 
export CLASSPATH=/usr/java/jdk1.7.0_25/lib 
Then exit and relaunch Shark.
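You can confirm that JAVA_HOME really points at JDK 7 or later before relaunching. Here is a small sketch. java_major_version and jdk_at_least_7 are hypothetical helpers that parse the quoted version string printed by java -version.

```shell
#!/bin/sh
# Hypothetical helpers: read the major version of a java binary and check
# that it is at least 7. Usage: jdk_at_least_7 "$JAVA_HOME/bin/java"
java_major_version() (
  # `java -version` prints to stderr, e.g.: java version "1.7.0_25"
  raw=$("${1:-java}" -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  # "1.7.0_25" -> "7"; later JDKs drop the "1." prefix ("17.0.2" -> "17").
  echo "$raw" | sed 's/^1\.//; s/[^0-9].*//'
)

jdk_at_least_7() (
  major=$(java_major_version "$1")
  # An empty result (unparseable output) also counts as a failure.
  [ -n "$major" ] && [ "$major" -ge 7 ]
)
```

Typical call: jdk_at_least_7 "$JAVA_HOME/bin/java" || echo "JAVA_HOME is older than JDK 7"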
