Setting up the Spark SQL Thrift Server, and the pitfalls I hit

How to configure

  1. Configure Hadoop and YARN
  2. Set HADOOP_CONF_DIR
  3. Copy hive-site.xml to $SPARK_HOME/conf
  4. Add the path of the MySQL connector jar in spark-env.sh
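Steps 2 to 4 can be sketched as a small shell helper. This is only a sketch: the default locations for HADOOP_CONF_DIR and hive-site.xml and the helper names `spark_env_lines` and `install_hive_site` are assumptions for illustration, not part of Spark.

```shell
#!/usr/bin/env bash
# Assumed default locations; override them to match your cluster.
SPARK_HOME="${SPARK_HOME:-/usr/local/spark-1.6.1-bin-hadoop2.6}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
HIVE_SITE="${HIVE_SITE:-/etc/hive/conf/hive-site.xml}"

# Print the lines to append to $SPARK_HOME/conf/spark-env.sh.
# $1 = path to the MySQL connector jar.
spark_env_lines() {
  printf 'export HADOOP_CONF_DIR=%s\n' "$HADOOP_CONF_DIR"
  printf 'export SPARK_CLASSPATH=%s:${SPARK_CLASSPATH}\n' "$1"
}

# Copy hive-site.xml into Spark's conf directory, failing loudly if it is missing.
install_hive_site() {
  if [ -f "$HIVE_SITE" ]; then
    cp "$HIVE_SITE" "$SPARK_HOME/conf/"
  else
    echo "hive-site.xml not found at $HIVE_SITE" >&2
    return 1
  fi
}
```

A possible use is `install_hive_site` followed by `spark_env_lines "$SPARK_HOME/lib/mysql-connector-java-5.1.38.jar" >> "$SPARK_HOME/conf/spark-env.sh"`.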

How to start

./start-thriftserver.sh \
--name olap.thriftserver \
--master yarn-client \
--queue bi \
--conf spark.driver.memory=3G \
--conf spark.shuffle.service.enabled=true \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.minExecutors=1 \
--conf spark.dynamicAllocation.maxExecutors=30 \
--conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=5s \
--jars /usr/local/spark-1.6.1-bin-hadoop2.6/lib/*
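One thing to watch with the last option: spark-submit documents `--jars` as a comma-separated list, while the shell expands an unquoted `*` into separate space-separated arguments. A minimal sketch of joining the jars explicitly follows; the helper name `join_jars` is made up for illustration.

```shell
# Join every *.jar in directory $1 into one comma-separated string.
join_jars() {
  local dir="$1" out="" jar
  for jar in "$dir"/*.jar; do
    # If nothing matches, the pattern stays unexpanded; skip it.
    [ -e "$jar" ] || continue
    out="${out:+$out,}$jar"
  done
  printf '%s' "$out"
}
```

The start command could then end with `--jars "$(join_jars /usr/local/spark-1.6.1-bin-hadoop2.6/lib)"`.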

How to use

  1. Start beeline from the shell
  2. !connect jdbc:hive2://storage8.test.lan:10000
  3. Enter the username and password configured in hive-site.xml as javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword
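The same connection can also be made non-interactively with beeline's `-u`, `-n`, `-p` and `-e` options, which is handy for a quick smoke test. In this sketch the variables THRIFT_HOST, THRIFT_PORT, HIVE_USER and HIVE_PASSWORD and both function names are hypothetical, and beeline is assumed to be on the PATH.

```shell
# Build the JDBC URL; the port defaults to 10000 as in the step above.
thrift_jdbc_url() {
  printf 'jdbc:hive2://%s:%s' "$1" "${2:-10000}"
}

# Run a single SQL statement ($1) through beeline and exit.
run_sql() {
  beeline -u "$(thrift_jdbc_url "$THRIFT_HOST" "$THRIFT_PORT")" \
          -n "$HIVE_USER" -p "$HIVE_PASSWORD" -e "$1"
}
```

For example, with THRIFT_HOST=storage8.test.lan, `run_sql 'show tables;'` lists the tables visible through the Thrift server.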

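Problem with Hive external tables backed by HBase

Many write-ups leave out one jar, and that cost me dearly. Most of them mention only the following jars: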

  1. hbase-client-0.98.18-hadoop2.jar
  2. hbase-common-0.98.18-hadoop2.jar
  3. hbase-server-0.98.18-hadoop2.jar
  4. hbase-protocol-0.98.18-hadoop2.jar
  5. protobuf-java-2.5.0.jar
  6. guava-12.0.1.jar
  7. hive-hbase-handler-1.2.1.jar

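With only those jars, however, HBase went through 35 retries before finally throwing a class-not-found error, which took a full 7 minutes. Following the error message, I added a metrics jar (com.yammer.metrics.metrics-core-2.2.0.jar in the classpath below), and after that everything ran normally. Here is the resulting setting in my spark-env.sh: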

export SPARK_CLASSPATH=$SPARK_HOME/lib/mysql-connector-java-5.1.38.jar:$SPARK_HOME/lib/hbase-server-0.98.18-hadoop2.jar:$SPARK_HOME/lib/hbase-common-0.98.18-hadoop2.jar:$SPARK_HOME/lib/hbase-client-0.98.18-hadoop2.jar:$SPARK_HOME/lib/hbase-protocol-0.98.18-hadoop2.jar:$SPARK_HOME/lib/htrace-core-2.04.jar:$SPARK_HOME/lib/protobuf-java-2.5.0.jar:$SPARK_HOME/lib/guava-12.0.1.jar:$SPARK_HOME/lib/hive-hbase-handler-1.2.1-xima.jar:$SPARK_HOME/lib/com.yammer.metrics.metrics-core-2.2.0.jar:${SPARK_CLASSPATH}
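Because a missing jar only shows up after a long retry loop, it may be worth checking the classpath before starting the server. The following is a minimal sketch that prints every classpath entry that does not exist on disk; the function name is made up for illustration.

```shell
# Print each entry of the colon-separated classpath $1 that is missing on disk.
missing_classpath_entries() {
  local IFS=':' entry
  for entry in $1; do
    # Skip empty entries, e.g. from a trailing or doubled colon.
    if [ -z "$entry" ]; then
      continue
    fi
    if [ ! -e "$entry" ]; then
      printf '%s\n' "$entry"
    fi
  done
}
```

After sourcing spark-env.sh, `missing_classpath_entries "$SPARK_CLASSPATH"` should print nothing if every jar is in place.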

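Another Hive startup problem

We found that on every startup Hive ended up using the Derby database. Advice found online is to set hive.metastore.schema.verification in hive-site.xml to false, which disables the schema check, but we did not want to change the configuration. After tracing through the source code, it looked as though one class was never initialized, so for now we simply make sure that class is loaded in the current ClassLoader: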

package com.ximalaya.spark.thriftserver

import org.apache.hive.service.server.HiveServer2

/**
  * @author todd.chen at 7/23/16 8:49 PM.
  *         email : todd.chen@ximalaya.com
  */
object XimaThriftServer {
  def main(args: Array[String]): Unit = {
    // Force the Hive metadata class to be loaded and initialized in the
    // current ClassLoader before the server starts.
    Class.forName("org.apache.hadoop.hive.ql.metadata.Hive")
    HiveServer2.main(args)
  }
}
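Class.forName only succeeds if some jar on the classpath actually contains the class. As a rough way to find which jar provides a given class, one could scan a lib directory as in this sketch. It assumes `unzip` is installed, and both function names are hypothetical.

```shell
# Convert a fully qualified class name to its entry path inside a jar.
class_to_entry() {
  printf '%s.class' "$(printf '%s' "$1" | tr '.' '/')"
}

# Print each jar in directory $2 whose listing contains class $1.
jars_containing() {
  local entry jar
  entry="$(class_to_entry "$1")"
  for jar in "$2"/*.jar; do
    [ -e "$jar" ] || continue
    if unzip -l "$jar" 2>/dev/null | grep -qF -- "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}
```

For example, `jars_containing org.apache.hadoop.hive.ql.metadata.Hive "$SPARK_HOME/lib"` lists the jars that bundle the class loaded above.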

Then modify start-thriftserver.sh so that CLASS points at the new entry class:

#!/usr/bin/env bash

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#
# Shell script for starting the Spark SQL Thrift server

# Enter posix mode for bash
set -o posix

if [ -z "${SPARK_HOME}" ]; then
  export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
fi

# NOTE: This exact class name is matched downstream by SparkSubmit.
# Any changes need to be reflected there.
CLASS="com.ximalaya.spark.thriftserver.XimaThriftServer"

function usage {
  echo "Usage: ./sbin/start-thriftserver [options] [thrift server options]"
  pattern="usage"
  pattern+="\|Spark assembly has been built with Hive"
  pattern+="\|NOTE: SPARK_PREPEND_CLASSES is set"
  pattern+="\|Spark Command: "
  pattern+="\|======="
  pattern+="\|--help"

  "${SPARK_HOME}"/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
  echo
  echo "Thrift server options:"
  "${SPARK_HOME}"/bin/spark-class $CLASS --help 2>&1 | grep -v "$pattern" 1>&2
}

if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
  usage
  exit 0
fi

export SUBMIT_USAGE_FUNCTION=usage

exec "${SPARK_HOME}"/sbin/spark-daemon.sh submit $CLASS 1 "$@"

That's it. The Thrift server now works normally.
