Integrating Hive with Hadoop

I. Environment and Software Preparation

Note: servers are referred to by hostname below; substitute IP addresses to suit your setup.

Environment

Server      Components
Master      NameNode, DataNode, NodeManager, ResourceManager, Hive, Hive metastore, HiveServer2, MySQL
Secondary   SecondaryNameNode, DataNode, NodeManager
Datanode    DataNode, NodeManager, Hive beeline client

1. Java 1.8

Download: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
linux$:] cd /soft
linux$:] tar -zxvf  jdk-8u321-linux-x64.tar.gz
linux$:] cp -r  jdk1.8.0_321  /usr/bin/jdk

linux$:] vi /etc/profile

export JAVA_HOME=/usr/bin/jdk    # jdk1.8.0_321 is the extracted directory name
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib

linux$:] source /etc/profile
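As a quick sanity check, the PATH edit above can be rehearsed in a throwaway shell (a minimal sketch, assuming the JDK was copied to /usr/bin/jdk as shown):

```shell
# Sketch: verify that the profile edit puts the new JDK first on PATH.
JAVA_HOME=/usr/bin/jdk            # same value exported in /etc/profile
PATH="$JAVA_HOME/bin:$PATH"
case "$PATH" in
  /usr/bin/jdk/bin:*) echo "PATH starts with the new JDK" ;;
  *)                  echo "PATH not updated" ;;
esac
```

On the real host, `java -version` should then report 1.8.0_321.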

2. rsync (present by default on CentOS)

3. zstd, openssl, autoconf, automake, libtool, ca-certificates

linux$:] yum -y install zstd
linux$:] yum -y install openssl-devel autoconf automake libtool ca-certificates

4. ISA-L

Download: https://github.com/intel/isa-l
linux$:] cd /soft
linux$:] unzip  master.zip
linux$:] cd  isa-l-master    # the GitHub master-branch zip extracts to isa-l-master
linux$:] ./autogen.sh
linux$:] ./configure
linux$:] make && make install && make -f Makefile.unx
Other targets, optional (explained later):
make check : create and run tests
make tests : create additional unit tests
make perfs : create included performance tests
make ex : build examples
make other : build other utilities such as compression file tests
make doc : build API manual

5. nasm and yasm

yasm:
linux$:] curl -O -L http://www.tortall.net/projects/yasm/releases/yasm-1.3.0.tar.gz
linux$:] tar -zxvf yasm-1.3.0.tar.gz
linux$:] cd yasm-1.3.0
linux$:] ./configure;make -j 8;make install
nasm:
linux$:] wget  http://www.nasm.us/pub/nasm/releasebuilds/2.14.02/nasm-2.14.02.tar.xz
linux$:] tar xf nasm-2.14.02.tar.xz
linux$:] cd nasm-2.14.02
linux$:] ./configure;make -j 8;make install

6. SSH

linux$:] ssh-keygen -t rsa
Keys must be exchanged between all hosts, and each host must also authorize itself:
linux$:] ssh-copy-id -i ~/.ssh/id_rsa.pub root@IP
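The per-host copies can be scripted; a sketch using the three hostnames from this guide (the command is only echoed here, as a dry run, so it can be reviewed before executing):

```shell
# Distribute the public key to every host in the cluster (including this one).
# 'echo' makes this a dry run; remove it to execute for real.
for host in Master Secondary Datanode; do
  echo ssh-copy-id -i ~/.ssh/id_rsa.pub root@"$host"
done
```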

7、hadoop

官网地址:https://hadoop.apache.org/
【Getting started】=>【Download】=>【Apache Download Mirrors】=>【HTTP】
linux$:] cd /soft
linux$:] wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1.tar.gz
linux$:] tar -zxvf hadoop-3.3.1.tar.gz
linux$:] mv hadoop-3.3.1 hadoop

8. Linux environment variables

linux$:] vi /etc/hosts
<IP address> Master
<IP address> Secondary
<IP address> Datanode

linux$:] vi /etc/profile
export JAVA_HOME=/usr/bin/jdk
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib
export HADOOP_HOME=/soft/hadoop  # Hadoop installation path
export PATH=$HADOOP_HOME/bin:$PATH  # Hadoop hdfs command path
export PATH=$HADOOP_HOME/sbin:$PATH  # Hadoop cluster script path
export HIVE_HOME=/soft/hive
export PATH=$PATH:$HIVE_HOME/bin
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root

linux$:] source /etc/profile
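After sourcing, the key variables can be listed to confirm they took effect (a minimal POSIX-shell sketch; a variable that did not land prints `unset`):

```shell
# List the environment variables configured above.
for v in JAVA_HOME HADOOP_HOME HIVE_HOME; do
  eval "val=\"\${$v:-unset}\""
  echo "$v=$val"
done
```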

9. Hadoop configuration files

Configuration:
linux$:] vi /soft/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/bin/jdk

Configuration [with workers set, start-all.sh/stop-all.sh can start or stop all of the machines below with one command]:
linux$:] vi /soft/hadoop/etc/hadoop/workers
Master
Secondary
Datanode

Configuration:
linux$:] vi /soft/hadoop/etc/hadoop/core-site.xml

<configuration>
<!-- HDFS access address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
<!-- Hadoop temporary file storage path at runtime -->
    <property>
         <name>hadoop.tmp.dir</name>
         <value>/hadoop/tmp</value>
    </property>
<!-- Hadoop authorization -->
    <property>
         <name>hadoop.security.authorization</name>
         <value>false</value>
    </property>
<!-- Hadoop proxy user; the host user here is root, customizable -->
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
<!-- Hadoop proxy user group; the host group here is root, customizable -->
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>

Configuration:
linux$:] vi /soft/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
<!-- NameNode local storage path -->
   <property>
     <name>dfs.namenode.name.dir</name>
     <value>/hadoop/namenodedata</value>
   </property>
<!-- Block size -->
   <property>
     <name>dfs.blocksize</name>
     <value>256M</value>
   </property>
<!-- Number of NameNode handler threads for requests from DataNodes -->
   <property>
     <name>dfs.namenode.handler.count</name>
     <value>100</value>
   </property>
<!-- DataNode local storage path -->
   <property>
     <name>dfs.datanode.data.dir</name>
     <value>/hadoop/datanodedata</value>
   </property>
<!-- Replication factor -->
   <property>
     <name>dfs.replication</name>
     <value>3</value>
   </property>
<!-- Hosts excluded when HDFS starts -->
   <property>
     <name>dfs.hosts.exclude</name>
     <value>/soft/hadoop/etc/hadoop/workers.exclude</value>
   </property>
<!-- Secondary NameNode host; if unspecified, it defaults to the same host as the NameNode -->
   <property>
     <name>dfs.secondary.http.address</name>
     <value>Secondary:50070</value>
   </property>
<!-- HDFS permission checks -->
   <property>
     <name>dfs.permissions</name>
     <value>false</value>
   </property>
</configuration>

Configuration:
linux$:] vi /soft/hadoop/etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>125</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx512M</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx512M</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>125</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>50</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>Master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>Master:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/hadoop/hislog</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/hadoop/hisloging</value>
  </property>
</configuration>

Configuration:
linux$:] vi /soft/hadoop/etc/hadoop/yarn-site.xml

<configuration>
  <property>
    <name>yarn.acl.enable</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.admin.acl</name>
    <value>*</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>Master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Master:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>4</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>125</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>2.1</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/hadoop/temppackage</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.env-whitelist</name>
    <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/soft/hadoop/etc/hadoop/workers.exclude</value>
  </property>
</configuration>

II. Starting the Hadoop Cluster

$HADOOP_HOME/bin/hdfs namenode -format
start-all.sh
$HADOOP_HOME/bin/yarn --daemon start proxyserver
$HADOOP_HOME/bin/mapred --daemon start historyserver
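Once the commands above have run, `jps` on each host should show the processes from the environment table at the top of this guide. A reference listing (a sketch that assumes the proxy server and history server were started on Master, as shown):

```shell
# Expected daemons per host after start-all.sh plus the two --daemon starts.
expected='Master:    NameNode DataNode ResourceManager NodeManager JobHistoryServer WebAppProxyServer
Secondary: SecondaryNameNode DataNode NodeManager
Datanode:  DataNode NodeManager'
echo "$expected"          # compare against the output of jps on each node
```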
Web UI access:
hdfs:
http://Master:9870/
yarn:
http://Master:8088/

III. Installing Hive

1. Installing MySQL

linux$:] touch /etc/yum.repos.d/mysql.repo
linux$:] cat > /etc/yum.repos.d/mysql.repo <<EOF
[mysql57-community]
name=MySQL 5.7 Community Server
baseurl=https://mirrors.cloud.tencent.com/mysql/yum/mysql-5.7-community-el7-x86_64/
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-mysql
EOF
linux$:] yum clean all
linux$:] yum makecache
linux$:] yum -y install mysql-community-server
linux$:] systemctl start mysqld
linux$:] systemctl enable mysqld
linux$:] grep "temporary password is generated" /var/log/mysqld.log
linux$:] mysql -uroot -p
For MySQL 5.7.6 and later, initialize the account password with:
  SQL>ALTER USER USER() IDENTIFIED BY 'Twcx@2023';
  SQL>FLUSH PRIVILEGES;
linux$:] systemctl restart mysqld
linux$:] systemctl enable mysqld
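The hive-site.xml later in this guide connects as `pyroot` / `Twcx@2023`, an account that does not exist after a fresh MySQL install. A hedged sketch of the statements to create it (the `'%'` host and the grant scope are assumptions; tighten them as needed):

```shell
# Build the SQL for the metastore account used later in hive-site.xml.
# Previewed with echo; feed it to the server with: mysql -uroot -p
SQL="CREATE USER IF NOT EXISTS 'pyroot'@'%' IDENTIFIED BY 'Twcx@2023';
GRANT ALL PRIVILEGES ON hive.* TO 'pyroot'@'%';
FLUSH PRIVILEGES;"
echo "$SQL"
```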

2. Installing Hive

linux$:] cd /soft
linux$:] wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
linux$:] tar -zxvf apache-hive-3.1.3-bin.tar.gz
linux$:] mv apache-hive-3.1.3-bin hive
linux$:] cd /soft/hive/conf
linux$:] mv hive-env.sh.template  hive-env.sh
linux$:] echo '' > hive-env.sh
linux$:] mv hive-default.xml.template  hive-site.xml
linux$:] echo '' > hive-site.xml

Resolve the guava jar conflict between the Hadoop and Hive packages
linux$:] cd /soft/hive/lib
linux$:] rm -rf guava-19.0.jar
linux$:] cp /soft/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar ./

MySQL connector dependency
MySQL driver downloads:
https://dev.mysql.com/downloads/connector/j/
MySQL 8.0 driver:
linux$:] wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-java-8.0.11.tar.gz
linux$:] tar -zxvf mysql-connector-java-8.0.11.tar.gz
linux$:] cd mysql-connector-java-8.0.11
linux$:] cp mysql-connector-java-8.0.11.jar  /soft/hive/lib/

MySQL 5.7 driver [the one used here]:
linux$:] wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/6.0.6/mysql-connector-java-6.0.6.jar
linux$:] cp mysql-connector-java-6.0.6.jar  /soft/hive/lib/

3. Hive Configuration

Configuration:
linux$:] vi /soft/hive/conf/hive-env.sh
export HADOOP_HOME=/soft/hadoop
export HIVE_CONF_DIR=/soft/hive/conf
export HIVE_AUX_JARS_PATH=/soft/hive/lib

Logging configuration; the level can be raised to DEBUG for troubleshooting:
linux$:] cd /soft/hive/conf
linux$:] cp hive-log4j2.properties.template hive-log4j2.properties
linux$:] vi hive-log4j2.properties
property.hive.log.dir = /user/hive/log

Configuration:
Note: even when a character set is specified in the MySQL connection URL, MySQL still initializes the metastore schema as latin1.
linux$:] vi /soft/hive/conf/hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://Master:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>pyroot</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>Twcx@2023</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://Master:9083</value>
  </property>
  <property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>Master</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>beeline.hs2.connection.user</name>
    <value>root</value>
  </property>
  <property>
    <name>beeline.hs2.connection.password</name>
    <value>root</value>
  </property>
</configuration>

4. Starting Hive
Notes on the command-line clients:
bin/hive — the legacy shell client; not recommended.
bin/beeline — strongly recommended. It is a JDBC client, usable both embedded and remotely; it connects to HiveServer2, which talks to the metastore, which in turn reads the Hive metadata from MySQL.
HiveServer2 supports concurrent clients and authentication, and is designed to give API clients such as JDBC and ODBC better support.

Restart HDFS
linux$:] stop-all.sh
linux$:] start-all.sh

Initialize the Hive metadata in MySQL
linux$:] schematool -dbType mysql -initSchema  # initialize the schema

Check that MySQL now contains the hive database and its 74 tables
linux$:] mysql -uroot -p
  SQL> show databases;
  SQL> use hive
  SQL> show tables;

Start the metastore
linux$:] mkdir -p /soft/hive/metastorelog
linux$:] cd /soft/hive/metastorelog
linux$:] nohup hive --service metastore --hiveconf hive.root.logger=DEBUG,console &

Start hiveserver2
linux$:] mkdir -p /soft/hive/hiveserver2log
linux$:] cd /soft/hive/hiveserver2log
linux$:] nohup $HIVE_HOME/bin/hive --service hiveserver2 &

5. Remote testing of metastore and hiveserver2 [a client can be set up on the Datanode host]

Install the Hive software
linux$:] cd /soft
linux$:] wget https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
linux$:] tar -zxvf apache-hive-3.1.3-bin.tar.gz
linux$:] mv apache-hive-3.1.3-bin hive

Resolve the guava jar conflict between the Hadoop and Hive packages
linux$:] cd /soft/hive/lib
linux$:] rm -rf guava-19.0.jar
linux$:] cp /soft/hadoop/share/hadoop/common/lib/guava-27.0-jre.jar ./

Driver deployment; not needed on a remote client
linux$:] wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/6.0.6/mysql-connector-java-6.0.6.jar
linux$:] cp mysql-connector-java-6.0.6.jar  /soft/hive/lib/

Configure Hive
Configuration:
linux$:] vi /soft/hive/conf/hive-env.sh
export HADOOP_HOME=/soft/hadoop
export HIVE_CONF_DIR=/soft/hive/conf
export HIVE_AUX_JARS_PATH=/soft/hive/lib

Configuration:
linux$:] vi /soft/hive/conf/hive-site.xml

  
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://Master:9083</value>
  </property>
</configuration>

Test the metastore; with no host or IP given, beeline goes through the metastore's exposed port, 9083, by default
linux$:] beeline -u jdbc:hive2://
> show databases;

Test hiveserver2; port 10000 is the port exposed by hiveserver2
linux$:] beeline -u jdbc:hive2://Master:10000
> show databases;
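The same check can run non-interactively with `beeline -e`, which is convenient for scripting; a sketch (the `-n`/`-p` values are the beeline.hs2.connection.* properties configured earlier; the command is echoed as a dry run):

```shell
# Non-interactive HiveServer2 smoke test (dry run; drop the echo to execute).
BEELINE_URL="jdbc:hive2://Master:10000"
echo beeline -u "$BEELINE_URL" -n root -p root -e "'show databases;'"
```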

Other tests:
On Windows, download DBeaver and connect through port 10000. The default user is hive; the password is empty or hive.

6. Web UI access

http://Master:10002/
