Please credit the source when reposting: https://blog.csdn.net/l1028386804/article/details/88014099
Hive is a data warehouse built on top of Hadoop. It maps structured data files to tables and provides an SQL-like query capability; under the hood, Hive translates SQL statements into MapReduce jobs.
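For example, an aggregation like the following is not executed directly; Hive compiles it into one or more MapReduce jobs (the table name here is hypothetical):
select dept, count(*) from employees group by dept;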
Download Hive 2.3.4 to /home/dc2-user on the master node and extract it:
wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz
tar zxvf apache-hive-2.3.4-bin.tar.gz
Edit the /etc/profile file and add the following lines:
sudo vi /etc/profile
export HIVE_HOME=/home/dc2-user/apache-hive-2.3.4-bin
export PATH=$PATH:$HIVE_HOME/bin
Make the environment variables take effect:
source /etc/profile
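To check that the variables took effect, you can, for example, print them and ask Hive for its version:
echo $HIVE_HOME
hive --version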
Create the configuration files from their templates (the cp commands below make working copies; the templates are left in place):
cd apache-hive-2.3.4-bin/conf/
cp hive-env.sh.template hive-env.sh
cp hive-default.xml.template hive-site.xml
cp hive-log4j2.properties.template hive-log4j2.properties
cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties
Edit hive-env.sh:
export JAVA_HOME=/home/dc2-user/java/jdk1.8.0_191 ## Java path
export HADOOP_HOME=/home/dc2-user/hadoop-2.7.7 ## Hadoop installation path
export HIVE_HOME=/home/dc2-user/apache-hive-2.3.4-bin ## Hive installation path
export HIVE_CONF_DIR=$HIVE_HOME/conf ## Hive configuration directory
Edit hive-site.xml and change the value of the following properties:
vi hive-site.xml
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/&lt;username&gt; is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/tmp/${user.name}</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/tmp/hive/resources</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/tmp/${user.name}</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/tmp/${user.name}/operation_logs</value>
  <description>Top level directory where operation logs are stored if logging functionality is enabled</description>
</property>
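The stock hive-site.xml template also references ${system:java.io.tmpdir} and ${system:user.name} in several other places; if you prefer a bulk edit over changing each property by hand, a sed one-liner such as the following is an option (a sketch; back up the file first):
sed -i 's#${system:java.io.tmpdir}#/tmp#g; s#${system:user.name}#${user.name}#g' hive-site.xml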
The Hive Metastore holds the metadata for Hive tables and partitions; in this example, MariaDB is used to store it.
Place mysql-connector-java-5.1.40-bin.jar under $HIVE_HOME/lib and configure the MySQL connection settings in hive-site.xml:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>
(Note: inside the XML value, & must be escaped as &amp;.)
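If you do not already have the connector jar, it can be fetched from Maven Central, for example (the URL and artifact name below are assumptions; the jar shipped inside the official Connector/J tarball is named mysql-connector-java-5.1.40-bin.jar):
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.40/mysql-connector-java-5.1.40.jar
cp mysql-connector-java-5.1.40.jar $HIVE_HOME/lib/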
Create Hive's working directories on HDFS (start HDFS first if it is not already running):
start-dfs.sh # can be omitted if HDFS was already started when Hadoop was set up
hdfs dfs -mkdir /tmp
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
hdfs dfs -chmod g+w /user/hive/warehouse
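To double-check that the directories exist with the expected permissions:
hdfs dfs -ls -d /tmp /user/hive/warehouse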
This example uses MariaDB; install and start it:
sudo yum install -y mariadb-server
sudo systemctl start mariadb
Log in to MySQL (there is no password initially), create the hive user, and set its password.
mysql -uroot
MariaDB [(none)]> create user 'hive'@'localhost' identified by 'hive';
Query OK, 0 rows affected (0.00 sec)
MariaDB [(none)]> grant all privileges on *.* to hive@localhost identified by 'hive';
Query OK, 0 rows affected (0.00 sec)
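To verify the new account works, you can log in with it and list the visible databases:
mysql -uhive -phive -e 'show databases;'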
HDFS must be running before Hive is started; it can be started with start-dfs.sh. If it was already started during the Hadoop installation, this step can be skipped.
Starting with Hive 2.1, the schematool command must be run to initialize the metastore schema before Hive is started for the first time:
schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/dc2-user/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/dc2-user/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hive
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
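To confirm the initialization, you can look at the metastore database that schematool just populated (credentials as configured above); metastore tables such as DBS and TBLS should now exist:
mysql -uhive -phive -e 'use hive; show tables;'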
Start Hive by entering the hive command:
hive
which: no hbase in (/home/dc2-user/java/jdk1.8.0_191/bin:/home/dc2-user/hadoop-2.7.7/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/local/bin:/home/dc2-user/apache-hive-2.3.4-bin/bin:/home/dc2-user/.local/bin:/home/dc2-user/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/dc2-user/apache-hive-2.3.4-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/dc2-user/hadoop-2.7.7/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in file:/home/dc2-user/apache-hive-2.3.4-bin/conf/hive-log4j2.properties Async: true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
Create a table in Hive:
hive> create table test_hive(id int, name string)
> row format delimited fields terminated by '\t' -- fields are separated by a tab character
> stored as textfile; -- storage format for the table; TEXTFILE is the default. Plain-text data files stored as TEXTFILE can be copied straight to HDFS and read by Hive as-is.
OK
Time taken: 10.857 seconds
hive> show tables;
OK
test_hive
Time taken: 0.396 seconds, Fetched: 1 row(s)
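Before moving on, you can also inspect the table's schema, which should list the id (int) and name (string) columns:
hive> describe test_hive;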
The table was created successfully. Enter quit; to exit Hive, then create the data as a plain-text file:
vi test_db.txt
101 aa
102 bb
103 cc
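Note that the fields must be separated by actual tab characters to match the table's delimiter; one way to guarantee this is to generate the file with printf instead of typing tabs in vi:
printf '101\taa\n102\tbb\n103\tcc\n' > /home/dc2-user/test_db.txt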
Enter Hive again and load the data:
hive> load data local inpath '/home/dc2-user/test_db.txt' into table test_hive;
Loading data to table default.test_hive
OK
Time taken: 6.679 seconds
hive> select * from test_hive;
101 aa
102 bb
103 cc
Time taken: 2.814 seconds, Fetched: 3 row(s)
The data was loaded successfully and can be queried as expected.
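To exercise the SQL-to-MapReduce path described at the beginning, you can run an aggregate query; a count over the table launches a MapReduce job and should return 3:
hive> select count(*) from test_hive;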