Shark 0.9.1 Installation Notes

I. Installation environment:

Component versions:

    hadoop: 2.3.0

    spark: 0.9.0

    shark: 0.9.1-hadoop2

    hive: 0.11.0

    jdk: Oracle HotSpot 1.7.0_55 (note: JDK 1.6 fails with a version-incompatibility error)

OS: Ubuntu 12.04 x86

II. Installation steps:

1) Installation and configuration of Hadoop, Spark, and Hive are not covered here.

2) Installing Shark:

  •  Download
      Release page: https://github.com/amplab/shark/releases
      The GitHub download was extremely slow, so I uploaded a copy (dated 2014-04-22) to a network drive; check whether it is still the latest release. Extract it as sketched below.
      Mirror: http://pan.baidu.com/s/1sjlS6rN
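      A minimal extraction sketch (the archive and directory names are assumed from the versions above; adjust to the file you actually downloaded; $INSTALL_HOME is the install root used later in shark-env.sh):

      tar -xzf shark-0.9.1-bin-hadoop2.tgz -C "$INSTALL_HOME"
      export SHARK_HOME="$INSTALL_HOME/shark-0.9.1-bin-hadoop2"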
  
  • About the JDK: note that JDK 1.6 fails with a version-incompatibility error; use JDK 1.7.
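    A quick way to confirm which JDK Shark will pick up (any HotSpot 1.7.x build should do):

    $JAVA_HOME/bin/java -version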
    
  • Configuration:
        1. Add the following property to Hadoop's yarn-site.xml:

    <property>
        <name>yarn.application.classpath</name>
        <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,$SHARK_HIVE/hive-hbase-handler/hive-hbase-handler-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-anttasks/hive-anttasks-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-service/hive-service-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-serde/hive-serde-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-metastore/hive-metastore-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-hwi/hive-hwi-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-exec/hive-exec-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-beeline/hive-beeline-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-shims/hive-shims-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-common/hive-common-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-cli/hive-cli-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-jdbc/hive-jdbc-0.11.0-shark-0.9.1.jar</value>
    </property>

           Here $SHARK_HIVE points to the Hive jars bundled with Shark; the remaining entries are copied from yarn-default.xml.
           Add the following to yarn-env.sh:

export SHARK_HIVE=$SHARK_HOME/lib_managed/jars/edu.berkeley.cs.shark
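
           After changing yarn-site.xml and yarn-env.sh, restart YARN so the new application classpath takes effect (a sketch assuming the standard sbin scripts on the ResourceManager node):

    $HADOOP_HOME/sbin/stop-yarn.sh
    $HADOOP_HOME/sbin/start-yarn.sh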
       2. The jar /lib_managed/jars/edu.berkeley.cs.shark/hive-exec/hive-exec-0.11.0-shark-0.9.1.jar shipped with Shark bundles protobuf classes whose version conflicts with the existing protobuf*.jar, so everything under com/google/protobuf inside hive-exec-0.11.0-shark-0.9.1.jar has to be deleted by hand. I did this with a rar tool on Windows; you can also unpack and repack the jar with the jar command, as sketched below.
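
       A sketch of the unpack-and-repack approach (the scratch directory is arbitrary; back up the original jar first):

mkdir /tmp/hive-exec-repack && cd /tmp/hive-exec-repack
jar xf "$SHARK_HOME/lib_managed/jars/edu.berkeley.cs.shark/hive-exec/hive-exec-0.11.0-shark-0.9.1.jar"
rm -rf com/google/protobuf
# repack, keeping the original manifest
jar cfm /tmp/hive-exec-0.11.0-shark-0.9.1.jar META-INF/MANIFEST.MF .
# then copy the repacked jar back over the original under lib_managed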
     
      3. Edit $SHARK_HOME/run and add the spark-assembly jar to the classpath:

if [ -f "$SPARK_JAR" ] ; then
    # append Spark's assembly jar so its classes end up on Shark's classpath
    SPARK_CLASSPATH+=":$SPARK_JAR"
fi

    

 
      4. Edit $SHARK_HOME/conf/shark-env.sh:
        
export SPARK_MEM=1g
export JAVA_HOME=/opt/jdk1.7.0_55   # must point at a JDK 1.7 install; 1.6 fails (see above)
# (Required) Set the master program's memory
export SHARK_MASTER_MEM=1g


# (Optional) Specify the location of Hive's configuration directory. By default,
# Shark run scripts will point it to $SHARK_HOME/conf
export HIVE_CONF_DIR="$INSTALL_HOME/hive-0.11.0-bin/conf"


# For running Shark in distributed mode, set the following:
export HADOOP_HOME="$INSTALL_HOME/hadoop-2.2.0"
export HIVE_HOME="$INSTALL_HOME/hive-0.11.0-bin"
export SPARK_HOME="$INSTALL_HOME/spark-0.9.1-bin-hadoop2"
export MASTER="spark://xxxxx:7077"
# Only required if using Mesos:
#export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so


# Only required if run shark with spark on yarn
export SHARK_EXEC_MODE=yarn
export SPARK_ASSEMBLY_JAR="$INSTALL_HOME/spark-0.9.1-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar"
export SHARK_ASSEMBLY_JAR="$INSTALL_HOME/shark-0.9.1-bin-hadoop2/target/scala-2.10/shark_2.10-0.9.1.jar"


# (Optional) Extra classpath
#export SPARK_LIBRARY_PATH=""


# Java options
# On EC2, change the local.dir to /mnt/tmp
SPARK_JAVA_OPTS=" -Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS


# (Optional) Tachyon Related Configuration
#export TACHYON_MASTER=""                     # e.g. "localhost:19998"
#export TACHYON_WAREHOUSE_PATH=/sharktables   # Could be any valid path name



      5. Distribute the Hive and Shark directories to the same path on every slave node; one way is sketched below.
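
      A sketch using rsync (assumes passwordless ssh and that the slave hostnames are listed one per line in $HADOOP_HOME/etc/hadoop/slaves):

for host in $(cat "$HADOOP_HOME/etc/hadoop/slaves"); do
    rsync -az "$HIVE_HOME"  "$host:$INSTALL_HOME/"
    rsync -az "$SHARK_HOME" "$host:$INSTALL_HOME/"
done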
      

      6. Also, when upgrading Hive I reused the conf files from the old version and got this error:
       ClassNotFoundException: org.apache.hadoop.log.metrics.EventCounter
       Replacing them with the log4j properties shipped with Hive 0.11 (with minor edits) fixed it; a sketch follows.
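
       A sketch of the fix (assumes the stock templates shipped with Hive 0.11 are still present under conf/):

cp "$HIVE_HOME/conf/hive-log4j.properties.template"      "$HIVE_HOME/conf/hive-log4j.properties"
cp "$HIVE_HOME/conf/hive-exec-log4j.properties.template" "$HIVE_HOME/conf/hive-exec-log4j.properties"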

III. Running queries with Shark
  ./bin/shark-withinfo
  ./bin/shark-withdebug
  The two launchers differ only in the Hive log level they set (INFO vs. DEBUG).
 
shark> select count(1) from test.item_basic_info where 1=1;
23.959: [GC 272587K->25855K(1005312K), 0.0207550 secs]
OK
39793
Time taken: 6.307 seconds
shark> select count(1) from test.item_basic_info where 1=1;
OK
39793
Time taken: 1.22 seconds
It seems quite fast.

IV. References:
Release page: https://github.com/amplab/shark/releases
Official cluster-setup guide:
https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster
Troubleshooting:
http://blog.csdn.net/baiyangfu_love/article/details/23769147
 
   


 


