Ubuntu 16.04 Fully Distributed Cluster Setup: Installing Hive

As I understand it, any server in a Hadoop cluster can host Hive: Hive is essentially a client tool layered on top of Hadoop (only its metadata lives in a relational database such as MySQL), so any node should work.
In any case, I set mine up on the NameNode machine.

Ugh... I wrote a lot yesterday, then assumed the same submission page could be reused and posted my Flume notes over the Hive article. Of course it got overwritten. I could kick myself!

So here I'll just roughly retrace the Hive setup process and note the pitfalls I hit.

Environment: Hadoop 2.7.7

Introduction

Hive is not like HBase. Hive is a data warehouse tool built on top of Hadoop: it maps structured data files onto database tables, provides full SQL-style querying, and translates SQL statements into MapReduce jobs for execution.

Advantages

You can implement MapReduce-style statistics directly with SQL-like statements, without developing a dedicated MapReduce application; see the one-liner below.
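
A quick illustration (the employee table and its columns are made up): this single statement gets compiled into a MapReduce job behind the scenes.

-- hypothetical table; Hive turns this GROUP BY into a MapReduce job
SELECT dept, COUNT(*) AS headcount
FROM employee
GROUP BY dept;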

Installation

Install MySQL

sudo apt install mysql-server
sudo mysql_secure_installation
sudo mysql -uroot -p    # log in and take a look around

Install the MySQL connector

Look up which MySQL Connector/J version matches your Hive and Hadoop versions. Avoid the one apt installs; I tried it, added the symlink, and it was no use at all.
I used 5.1.47. Note: do not install Connector/J 8.x, it will throw errors.
Then put the connector jar into /usr/app/hive/lib, roughly as below.
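
A rough sketch of fetching the jar (the archive URL and the exact jar name inside the tarball are from memory; double-check against dev.mysql.com if the download fails):

# download Connector/J 5.1.47 and drop the jar into Hive's lib directory
wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-5.1.47.tar.gz
tar -xzf mysql-connector-java-5.1.47.tar.gz
cp mysql-connector-java-5.1.47/mysql-connector-java-5.1.47-bin.jar /usr/app/hive/lib/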

Install Hive

Add the Hive environment variables to your profile. It's optional, but convenient; something like this, with paths matching my layout:
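
# optional: append to /etc/profile (or ~/.bashrc) and re-source it,
# so the hive command is on your PATH
export HIVE_HOME=/usr/app/hive
export PATH=$PATH:$HIVE_HOME/bin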
Next, configure conf/hive-env.sh:

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI etc.) is available via the environment
# variable SERVICE


# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in 
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
#   if [ -z "$DEBUG" ]; then
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
#   else
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
#   fi
# fi

# The heap size of the jvm started by hive shell script can be controlled via:
#
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions. 
# By default hive shell scripts use a heap size of 256 (MB).  Larger heap size would also be 
# appropriate for hive server.


# Set HADOOP_HOME to point to a specific hadoop install directory
# HADOOP_HOME=${bin}/../../hadoop

# Hive Configuration Directory can be controlled by:
# export HIVE_CONF_DIR=

# Folder containing extra libraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=

export JAVA_HOME=/usr/java/jdk1.8.0_221
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/app/hive

Configure conf/hive-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>sl159753</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
    <description>auto-create the metastore schema on startup if it does not exist</description>
  </property>
  <property>
    <name>hive.server2.thrift.sasl.qop</name>
    <value>auth</value>
    <description>SASL QOP level for the HiveServer2 thrift interface</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>disable strict metastore schema version verification</description>
  </property>
</configuration>

Gotchas

The problem I ran into was failing to connect to MySQL. Two things to rule out:

  1. A mismatched Connector/J version (though even after picking the right one, it still wouldn't connect, which led me to point 2).
  2. MySQL by default only accepts connections from localhost. If Hive runs on the same machine, set the address in hive-site.xml to localhost; otherwise open MySQL up to remote connections (see the sketch after this list) and you're fine.
  3. There is no third item.
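
A minimal sketch of option 2, assuming the MySQL 5.7 that Ubuntu 16.04 ships (the config path, user, and password are from my setup; adjust to yours):

# 1. let mysqld listen on all interfaces instead of just 127.0.0.1
sudo sed -i 's/^bind-address.*/bind-address = 0.0.0.0/' /etc/mysql/mysql.conf.d/mysqld.cnf
sudo systemctl restart mysql

# 2. inside `mysql -uroot -p`, grant the metastore user access from other hosts:
#    GRANT ALL PRIVILEGES ON hive.* TO 'root'@'%' IDENTIFIED BY 'sl159753';
#    FLUSH PRIVILEGES;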

Test

$HIVE_HOME/bin/hive

create table test(id int, name string);
show tables;

Only when both statements succeed can you say the MySQL -> connector -> Hive chain is actually wired up.
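
As an extra cross-check, you can peek at the metastore from the MySQL side; TBLS is the table Hive's metastore schema uses to track tables (run inside mysql -uroot -p):

USE hive;
SELECT TBL_NAME FROM TBLS;   -- the test table created above should be listed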

Data skew

This topic is big enough for a post of its own; I'll expand it with a concrete example when I have time. Consider this a placeholder, with one commonly used knob sketched below.
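
A hint rather than a full treatment: for skewed GROUP BY keys, Hive has a built-in two-stage aggregation switch.

SET hive.groupby.skewindata=true;   -- spreads skewed keys across two MapReduce jobs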
