I. Installation environment:
Component versions:
hadoop: 2.3.0
spark: 0.9.1
shark: 0.9.1-hadoop2
hive: 0.11.0
jdk: Oracle HotSpot 1.7.0_55 (note: JDK 1.6 fails with a version-incompatibility error)
OS: Ubuntu 12.04 x86
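Since JDK 1.6 is a known failure mode here, it is worth confirming the JVM before starting:

java -version   # expect 1.7.0_55; on JDK 1.6, Shark aborts with the version-incompatibility error mentioned above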
II. Installation steps:
1) Installing and configuring Hadoop, Spark, and Hive is skipped here.
2) Shark installation:
1. In yarn-site.xml, add the Hadoop and Shark Hive jars to yarn.application.classpath so YARN containers can load them:

<property>
  <name>yarn.application.classpath</name>
  <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*,$SHARK_HIVE/hive-hbase-handler/hive-hbase-handler-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-anttasks/hive-anttasks-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-service/hive-service-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-serde/hive-serde-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-metastore/hive-metastore-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-hwi/hive-hwi-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-exec/hive-exec-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-beeline/hive-beeline-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-shims/hive-shims-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-common/hive-common-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-cli/hive-cli-0.11.0-shark-0.9.1.jar,$SHARK_HIVE/hive-jdbc/hive-jdbc-0.11.0-shark-0.9.1.jar</value>
</property>

where SHARK_HIVE points at the Hive jars bundled with Shark:

export SHARK_HIVE=$SHARK_HOME/lib_managed/jars/edu.berkeley.cs.shark
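Note that $SHARK_HIVE has to resolve when YARN expands the classpath on each node. One way to arrange that (an assumption about your layout, not something spelled out above) is to export it in hadoop-env.sh on every NodeManager:

# Hypothetical: make SHARK_HIVE visible to YARN on each node
# (append to $HADOOP_HOME/etc/hadoop/hadoop-env.sh; adjust the install path to yours)
export SHARK_HOME=/opt/shark-0.9.1-bin-hadoop2
export SHARK_HIVE=$SHARK_HOME/lib_managed/jars/edu.berkeley.cs.shark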
2. The hive-exec-0.11.0-shark-0.9.1.jar shipped in the Shark package (under /lib_managed/jars/edu.berkeley.cs.shark/hive-exec/) bundles its own copy of protobuf, which conflicts with the protobuf*.jar version already on the classpath. Manually delete everything under com/google/protobuf inside hive-exec-0.11.0-shark-0.9.1.jar. I did this on Windows with a RAR tool; you can also unpack the jar with the jar command and repackage it.
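If you would rather stay on the command line, a minimal sketch of the same fix using zip -d (assuming the zip utility is installed; back the jar up first):

cd $SHARK_HOME/lib_managed/jars/edu.berkeley.cs.shark/hive-exec
cp hive-exec-0.11.0-shark-0.9.1.jar hive-exec-0.11.0-shark-0.9.1.jar.bak   # backup
zip -d hive-exec-0.11.0-shark-0.9.1.jar 'com/google/protobuf/*'            # strip the bundled protobuf classes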
3. Configure $SHARK_HOME/conf/shark-env.sh; the settings I used follow:

# Append the Spark assembly jar to the classpath when it exists
if [ -f "$SPARK_JAR" ] ; then
  SPARK_CLASSPATH+=":$SPARK_JAR"
fi
export SPARK_MEM=1g
export JAVA_HOME=/opt/jdk1.7.0_55   # must point at a JDK 7 install; JDK 1.6 fails (see the note in section I)
# (Required) Set the master program's memory
export SHARK_MASTER_MEM=1g
# (Optional) Specify the location of Hive's configuration directory. By default,
# Shark run scripts will point it to $SHARK_HOME/conf
export HIVE_CONF_DIR="$INSTALL_HOME/hive-0.11.0-bin/conf"
# For running Shark in distributed mode, set the following:
export HADOOP_HOME="$INSTALL_HOME/hadoop-2.2.0"
export HIVE_HOME="$INSTALL_HOME/hive-0.11.0-bin"
export SPARK_HOME="$INSTALL_HOME/spark-0.9.1-bin-hadoop2"
export MASTER="spark://xxxxx:7077"
# Only required if using Mesos:
#export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
# Only required if running Shark with Spark on YARN
export SHARK_EXEC_MODE=yarn
export SPARK_ASSEMBLY_JAR="$INSTALL_HOME/spark-0.9.1-bin-hadoop2/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar"
export SHARK_ASSEMBLY_JAR="$INSTALL_HOME/shark-0.9.1-bin-hadoop2/target/scala-2.10/shark_2.10-0.9.1.jar"
# (Optional) Extra classpath
#export SPARK_LIBRARY_PATH=""
# Java options
# On EC2, change the local.dir to /mnt/tmp
SPARK_JAVA_OPTS=" -Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS
# (Optional) Tachyon Related Configuration
#export TACHYON_MASTER="" # e.g. "localhost:19998"
#export TACHYON_WAREHOUSE_PATH=/sharktables # Could be any valid path name
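With shark-env.sh filled in, the CLI can be launched from the Shark directory (a sketch using the stock bin/shark launcher and the paths above):

cd $INSTALL_HOME/shark-0.9.1-bin-hadoop2
./bin/shark    # starts the Shark CLI against the MASTER set above

The [GC ...] line interleaved in the output below is the -verbose:gc logging enabled in SPARK_JAVA_OPTS.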
shark> select count(1) from test.item_basic_info where 1=1;
23.959: [GC 272587K->25855K(1005312K), 0.0207550 secs]
OK
39793
Time taken: 6.307 seconds
shark> select count(1) from test.item_basic_info where 1=1;
OK
39793
Time taken: 1.22 seconds
Seems pretty fast. (The second run is much quicker, presumably because the executors and JIT are already warm.)