Hadoop Cluster Setup - Step 4 (Installing and Configuring Hive, Flume, Azkaban, and Sqoop)

Software versions:

Hive: hive-1.1.0-cdh5.14.0.tar.gz

MySQL: mysql-5.1.71-1.el6.x86_64

Flume: flume-ng-1.6.0-cdh5.14.0.tar.gz

Azkaban: azkaban 3.51.0

Sqoop: sqoop-1.4.6-cdh5.14.0.tar.gz

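1. Installing and deploying Hive

Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides SQL-like queries over them.

Option 1: use Hive directly with the embedded Derby metastore

  1. Extract Hive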

    cd /export/softwares
    tar -zxvf hive-1.1.0-cdh5.14.0.tar.gz -C ../servers/
    

  2. Start bin/hive directly

    cd ../servers/
    cd hive-1.1.0-cdh5.14.0/
    bin/hive
    hive> create database mytest;

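Drawback: every Hive installation keeps its own set of metadata, so databases and tables are not shared between installations and it becomes hard to tell where a table lives.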
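
The drawback is visible on disk. With the embedded Derby metastore, Hive creates a metastore_db directory in whichever directory bin/hive was started from. The sketch below lists every such directory under a root; the function name find_derby_metastores is hypothetical and only for illustration. More than one result suggests more than one independent set of metadata.

```shell
# List embedded Derby metastore directories below a root directory.
# Assumption: each directory named metastore_db was created by a separate
# "bin/hive" launch and holds its own, unshared metadata.
find_derby_metastores() {
    local root="${1:-.}"
    find "$root" -type d -name metastore_db 2>/dev/null | sort
}

# Example (not run automatically):
#   find_derby_metastores /export/servers
```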

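Option 2: share the Hive metadata through MySQL

1. Install the MySQL database (from the yum repository)

Installing from standalone rpm packages is not recommended, because some of the packages they depend on may be missing from the system.

Step 1: install the MySQL packages online

yum  install  mysql  mysql-server  mysql-devel

Step 2: start the MySQL service

/etc/init.d/mysqld start

Step 3: run the setup script that ships with MySQL

/usr/bin/mysql_secure_installation

Step 4: open the MySQL client and grant privileges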

 grant all privileges on *.* to 'root'@'%' identified by '123456' with grant option;

 flush privileges;

2. Edit the Hive configuration files

A. Edit hive-env.sh:

Add the Hadoop environment variables

cd  /export/servers/hive-1.1.0-cdh5.14.0/conf
cp hive-env.sh.template hive-env.sh
vim hive-env.sh

Add the following lines:

HADOOP_HOME=/export/servers/hadoop-2.6.0-cdh5.14.0
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/export/servers/hive-1.1.0-cdh5.14.0/conf

B. Edit hive-site.xml

cd /export/servers/hive-1.1.0-cdh5.14.0/conf
vim hive-site.xml

<configuration>
        <property>
                <name>javax.jdo.option.ConnectionURL</name>
                <value>jdbc:mysql://node03.hadoop.com:3306/hive?createDatabaseIfNotExist=true</value>
        </property>

        <property>
                <name>javax.jdo.option.ConnectionDriverName</name>
                <value>com.mysql.jdbc.Driver</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionUserName</name>
                <value>root</value>
        </property>
        <property>
                <name>javax.jdo.option.ConnectionPassword</name>
                <value>123456</value>
        </property>
        <property>
                <name>hive.cli.print.current.db</name>
                <value>true</value>
        </property>
        <property>
                <name>hive.cli.print.header</name>
                <value>true</value>
        </property>
        <property>
                <name>hive.server2.thrift.bind.host</name>
                <value>node03.hadoop.com</value>
        </property>

</configuration>
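
A quick way to confirm that the file holds the intended settings is to read a property back from it. The sketch below is a line-based lookup rather than a real XML parser, and it assumes a well-formed file in which each name and value element sits on its own line; hive_site_value is a hypothetical helper name.

```shell
# Print the <value> that follows a given <name> in a Hadoop-style XML file.
# Assumption: <name>...</name> and <value>...</value> each occupy one line.
# This is a line-based lookup, not a real XML parser.
hive_site_value() {
    local file="$1" key="$2"
    awk -v key="$key" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && match($0, /<value>.*<\/value>/) {
            # Strip the 7-character <value> and the 8-character </value>.
            print substr($0, RSTART + 7, RLENGTH - 15)
            exit
        }
    ' "$file"
}

# Example (not run automatically):
#   hive_site_value /export/servers/hive-1.1.0-cdh5.14.0/conf/hive-site.xml \
#       javax.jdo.option.ConnectionURL
```

If the lookup prints nothing, the property is missing or its tags are malformed.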

3. Upload the MySQL driver jar

Upload the MySQL JDBC driver jar (mysql-connector-java-5.1.38.jar) to Hive's lib directory

cd /export/servers/hive-1.1.0-cdh5.14.0/lib
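
The copy itself can be scripted so that a missing jar is noticed immediately. This is a minimal sketch: install_jdbc_driver is a hypothetical helper, and /export/softwares as the upload location in the example is an assumption, since the text does not say where the jar was uploaded.

```shell
# Copy a JDBC driver jar into a lib directory and confirm it arrived.
# Assumption: the caller supplies the real location of the uploaded jar.
install_jdbc_driver() {
    local jar="$1" lib_dir="$2"
    [ -f "$jar" ] || { echo "driver jar not found: $jar" >&2; return 1; }
    [ -d "$lib_dir" ] || { echo "lib directory not found: $lib_dir" >&2; return 1; }
    cp "$jar" "$lib_dir"/ && ls -l "$lib_dir/$(basename "$jar")"
}

# Example (not run automatically; the source path is an assumed location):
#   install_jdbc_driver /export/softwares/mysql-connector-java-5.1.38.jar \
#       /export/servers/hive-1.1.0-cdh5.14.0/lib
```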

4. The Hive interactive shell

cd /export/servers/hive-1.1.0-cdh5.14.0
bin/hive

A. List all databases

hive (default)> show databases;

B. Create a database

hive (default)> create database myhive;

C. Switch to that database and create a table

hive (default)> use myhive;

hive (myhive)> create table test(id int,name string);

After running these commands, confirm that a database named hive now exists in MySQL.
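
One way to do that check is to query the metastore tables directly. The sketch below only prints the SQL, so it can be piped into the MySQL client by hand. It assumes the metastore schema that Hive creates itself, in which the DBS and TBLS tables are linked by DB_ID; metastore_check_sql is a hypothetical helper name.

```shell
# Emit SQL that lists the databases and tables recorded in the Hive metastore.
# Assumption: the metastore lives in the MySQL database "hive" and uses the
# schema Hive creates itself (tables DBS and TBLS, joined on DB_ID).
metastore_check_sql() {
    cat <<'SQL'
SHOW DATABASES LIKE 'hive';
SELECT d.NAME AS db_name, t.TBL_NAME AS table_name
FROM hive.DBS d
LEFT JOIN hive.TBLS t ON t.DB_ID = d.DB_ID
ORDER BY d.NAME, t.TBL_NAME;
SQL
}

# Example (not run automatically):
#   metastore_check_sql | mysql -uroot -p
```

If the setup worked, the result should include the myhive database and its test table.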

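2. Installing and deploying Flume

Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data (in short, a data mover).

Installing Flume is simple: extracting the archive is enough, provided a Hadoop environment is already in place.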

tar -zxvf flume-ng-1.6.0-cdh5.14.0.tar.gz -C /export/servers/
cd  /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
cp  flume-env.sh.template flume-env.sh
vim flume-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_141
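
To check that the installation works, a small agent can be started that reads lines from a network port and logs them. The sketch below writes such an agent definition to a file. The agent name a1, the file name netcat-logger.conf, and port 44444 are arbitrary example values, not settings from this guide.

```shell
# Write a minimal Flume agent definition:
# netcat source -> memory channel -> logger sink.
# Assumption: agent name a1 and port 44444 are example values only.
write_netcat_logger_conf() {
    local target="$1"
    cat > "$target" <<'CONF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
CONF
}

# Example (run from the Flume home directory, not run automatically):
#   write_netcat_logger_conf conf/netcat-logger.conf
#   bin/flume-ng agent -c conf -f conf/netcat-logger.conf -n a1 \
#       -Dflume.root.logger=INFO,console
```

With the agent running, text sent to localhost port 44444 should appear as events in the console log.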

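3. Installing and deploying Azkaban

Azkaban is a batch workflow job scheduler open-sourced by LinkedIn. It runs a set of jobs and processes in a specified order within a workflow.

Azkaban is distributed only as source code, so it has to be compiled first. It can be installed in single-server mode (suitable for learning) or two-server mode (used in production).

3.1 Compiling Azkaban

A. Compile:

Note: this guide uses Azkaban 3.51.0, which must be compiled with JDK 1.8. If the build server runs JDK 1.7, switch it to JDK 1.8 first. The build here uses jdk8u141.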

cd /export/softwares/
wget https://github.com/azkaban/azkaban/archive/3.51.0.tar.gz
tar -zxvf 3.51.0.tar.gz -C ../servers/
cd /export/servers/azkaban-3.51.0/
yum -y install git
yum -y install gcc-c++
./gradlew build installDist -x test
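
Because the build depends on JDK 1.8, it can help to check the active Java version before starting Gradle. The following is a minimal sketch; is_java8 and current_java_version are hypothetical helper names, and the check relies on JDK 8 reporting its version as 1.8.x.

```shell
# Succeed only when a Java version string denotes JDK 8 (reported as 1.8.x).
is_java8() {
    case "$1" in
        1.8.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Extract the quoted version from "java -version", which prints to stderr.
current_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

# Example (not run automatically):
#   is_java8 "$(current_java_version)" && ./gradlew build installDist -x test
```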

B. Installation files needed after the build, and their directories:

azkaban-exec-server

  /export/servers/azkaban-3.51.0/azkaban-exec-server/build/distributions

azkaban-web-server

  /export/servers/azkaban-3.51.0/azkaban-web-server/build/distributions

azkaban-solo-server

  /export/servers/azkaban-3.51.0/azkaban-solo-server/build/distributions

execute-as-user.c

  /export/servers/azkaban-3.51.0/az-exec-util/src/main/c

Database script files

  /export/servers/azkaban-3.51.0/azkaban-db/build/install/azkaban-db
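
The later steps extract the packages from /export/softwares, so the tarballs first have to be copied out of the build tree. The text does not show that copy, so the sketch below is an assumption about how it could be done; collect_azkaban_tarballs is a hypothetical helper name.

```shell
# Copy every tarball from <module>/build/distributions into one directory.
# Assumption: /export/softwares as the destination is inferred from the
# later extraction steps; the copy step itself is not shown in the guide.
collect_azkaban_tarballs() {
    local src_root="$1" dest="$2"
    mkdir -p "$dest" || return 1
    find "$src_root" -path '*/build/distributions/*.tar.gz' -type f \
        -exec cp {} "$dest"/ \;
}

# Example (not run automatically):
#   collect_azkaban_tarballs /export/servers/azkaban-3.51.0 /export/softwares
```

The SQL script and execute-as-user.c live in the other directories listed above and would be copied separately.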

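3.2 Installing and using Azkaban in single-server mode

Single-node mode needs only the azkaban-solo-server-0.1.0-SNAPSHOT.tar.gz package to start. All data is stored in H2, Azkaban's default database.

A. Extract:

cd /export/softwares
tar -zxvf azkaban-solo-server-0.1.0-SNAPSHOT.tar.gz -C ../servers/

B. Edit the configuration files:

Time zone setting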

cd /export/servers/azkaban-solo-server-0.1.0-SNAPSHOT/conf
vim azkaban.properties

default.timezone.id=Asia/Shanghai

Edit the commonprivate.properties configuration file

cd /export/servers/azkaban-solo-server-0.1.0-SNAPSHOT/plugins/jobtypes
vim commonprivate.properties

C. Start the solo server

cd  /export/servers/azkaban-solo-server-0.1.0-SNAPSHOT
bin/start-solo.sh

D. Open the web page in a browser to check that the server started

http://node03:8081/ (password: azkaban)

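3.3 Installing Azkaban in two-server mode

A. Confirm the required software:

  • Azkaban web server package: azkaban-web-server-0.1.0-SNAPSHOT.tar.gz
  • Azkaban executor server package: azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz
  • SQL script produced by the build: create-all-sql-0.1.0-SNAPSHOT.sql
  • C source file: execute-as-user.c

B. Prepare the database:

  1. Open the MySQL client

    mysql  -uroot -p
    

  2. Run the following statements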

CREATE DATABASE azkaban;
CREATE USER 'azkaban'@'%' IDENTIFIED BY 'azkaban';    
GRANT all privileges ON azkaban.* to 'azkaban'@'%' identified by 'azkaban' WITH GRANT OPTION; 
flush privileges;
use azkaban; 
source /export/softwares/create-all-sql-0.1.0-SNAPSHOT.sql;

C. Extract the installation packages:

  1. Extract azkaban-web-server

    cd /export/softwares
    tar -zxvf azkaban-web-server-0.1.0-SNAPSHOT.tar.gz -C ../servers/
    cd /export/servers
    mv azkaban-web-server-0.1.0-SNAPSHOT/ azkaban-web-server-3.51.0
    
  2. Extract azkaban-exec-server

    cd /export/softwares
    tar -zxvf azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz -C ../servers/
    cd /export/servers
    mv azkaban-exec-server-0.1.0-SNAPSHOT/ azkaban-exec-server-3.51.0
    

D. Set up SSL:

Generate an SSL keystore so that the Azkaban web server can be reached over https

cd /export/servers/azkaban-web-server-3.51.0
keytool -keystore keystore -alias jetty -genkey -keyalg RSA
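
The command above prompts for passwords and certificate details. Whatever password is entered has to match the jetty.password and jetty.keypassword values in azkaban.properties, which this guide sets to azkaban. The sketch below supplies the same information on the command line instead. The distinguished name and the validity period are example values, and make_jetty_keystore is a hypothetical helper name.

```shell
# Create a self-signed keystore without interactive prompts.
# Assumptions: the distinguished name and 365-day validity are examples;
# the password must equal jetty.password and jetty.keypassword in
# azkaban.properties.
make_jetty_keystore() {
    local keystore="$1" password="$2"
    keytool -genkey -keyalg RSA -alias jetty \
        -keystore "$keystore" \
        -storepass "$password" -keypass "$password" \
        -dname "CN=node03, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=CN" \
        -validity 365
}

# Example (not run automatically):
#   cd /export/servers/azkaban-web-server-3.51.0
#   make_jetty_keystore keystore azkaban
```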

E. Installing the Azkaban web server:

Edit the azkaban-web-server configuration file

cd /export/servers/azkaban-web-server-3.51.0/conf
vim azkaban.properties
# Azkaban Personalization Settings
azkaban.name=Azkaban
azkaban.label=My Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=true
jetty.maxThreads=25
jetty.port=8081

jetty.ssl.port=8443
jetty.keystore=/export/servers/azkaban-web-server-3.51.0/keystore
jetty.password=azkaban
jetty.keypassword=azkaban
jetty.truststore=/export/servers/azkaban-web-server-3.51.0/keystore
jetty.trustpassword=azkaban


# Azkaban Executor settings
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=node03
mysql.database=azkaban
mysql.user=azkaban
mysql.password=azkaban
mysql.numconnections=100
#Multiple Executor
azkaban.use.multiple.executors=true
#azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1

azkaban.activeexecutor.refresh.milisecinterval=10000
azkaban.queueprocessing.enabled=true
azkaban.activeexecutor.refresh.flowinterval=10
azkaban.executorinfo.refresh.maxThreads=10

G. Installing the Azkaban executor server:

  1. Edit the azkaban-exec-server configuration file

    cd /export/servers/azkaban-exec-server-3.51.0/conf
    vim azkaban.properties
    
# Azkaban Personalization Settings
azkaban.name=Azkaban
azkaban.label=My Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=true
jetty.maxThreads=25
jetty.port=8081


jetty.keystore=/export/servers/azkaban-web-server-3.51.0/keystore
jetty.password=azkaban
jetty.keypassword=azkaban
jetty.truststore=/export/servers/azkaban-web-server-3.51.0/keystore
jetty.trustpassword=azkaban


# Where the Azkaban web server is located
azkaban.webserver.url=https://node03:8443
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban plugin settings
azkaban.jobtype.plugin.dir=plugins/jobtypes
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=node03
mysql.database=azkaban
mysql.user=azkaban
mysql.password=azkaban
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.flow.threads=30

  2. Add the plugin

Copy the C source file execute-as-user.c from the build tree into /export/servers/azkaban-exec-server-3.51.0/plugins/jobtypes

cp /export/softwares/execute-as-user.c /export/servers/azkaban-exec-server-3.51.0/plugins/jobtypes/

Then run the following commands to build execute-as-user

yum -y install gcc-c++
cd /export/servers/azkaban-exec-server-3.51.0/plugins/jobtypes
gcc execute-as-user.c -o execute-as-user
chown root execute-as-user
chmod 6050 execute-as-user

  3. Edit the configuration file

cd  /export/servers/azkaban-exec-server-3.51.0/plugins/jobtypes
vim commonprivate.properties
execute.as.user=false
memCheck.enabled=false
azkaban.native.lib=/export/servers/azkaban-exec-server-3.51.0/plugins/jobtypes

H. Start the services:

  1. Start the Azkaban executor server

    cd /export/servers/azkaban-exec-server-3.51.0
    bin/start-exec.sh
    
  2. Activate the executor server

    curl -G "node03:$(<./executor.port)/executor?action=activate" && echo
    
  3. Start azkaban-web-server

    cd /export/servers/azkaban-web-server-3.51.0/
    bin/start-web.sh
    
  4. Verify

    Open https://node03:8443
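
The activation step reads the port from the executor.port file in the executor's home directory, so it fails if it runs before the executor has written that file. A small wait loop avoids the race. This is a sketch only; wait_for_file is a hypothetical helper and the 60-second timeout in the example is an arbitrary choice.

```shell
# Wait until a file exists and is non-empty, polling once per second.
# Returns 0 when the file appears, 1 after the timeout (in seconds).
wait_for_file() {
    local file="$1" timeout="${2:-30}" waited=0
    while [ ! -s "$file" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example (run from the executor home directory, not run automatically):
#   bin/start-exec.sh
#   wait_for_file ./executor.port 60 &&
#     curl -G "node03:$(<./executor.port)/executor?action=activate" && echo
```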

4. Installing Sqoop

Sqoop is an Apache tool for transferring data between Hadoop and relational database servers.

4.1 Download and extract

Download locations:

  • Sqoop 1

    http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.14.0.tar.gz

  • Sqoop 2

    http://archive.cloudera.com/cdh5/cdh/5/sqoop2-1.99.5-cdh5.14.0.tar.gz

This guide uses Sqoop 1.

Extract:

cd /export/softwares
tar -zxvf sqoop-1.4.6-cdh5.14.0.tar.gz -C ../servers/

4.2 Edit the configuration file

cd /export/servers/sqoop-1.4.6-cdh5.14.0/conf/
cp sqoop-env-template.sh  sqoop-env.sh
vim sqoop-env.sh
export HADOOP_COMMON_HOME=/export/servers/hadoop-2.6.0-cdh5.14.0
export HADOOP_MAPRED_HOME=/export/servers/hadoop-2.6.0-cdh5.14.0
export HIVE_HOME=/export/servers/hive-1.1.0-cdh5.14.0

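4.3 Add the extra dependency jars

Sqoop needs two additional jars, the MySQL driver and the java-json library; without them it reports errors.

  • mysql-connector-java-5.1.40.jar
  • java-json.jar

Add both jars to Sqoop's lib directory.

4.4 Verify the installation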

cd /export/servers/sqoop-1.4.6-cdh5.14.0
bin/sqoop-version
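
Printing the version only shows that Sqoop starts. A fuller check is to have Sqoop list the databases on the MySQL server, which also exercises the JDBC driver. The sketch below only builds the command line. Host node03, port 3306, and user root are assumptions carried over from the MySQL setup earlier in this guide, and sqoop_list_databases_cmd is a hypothetical helper name. The -P option makes Sqoop prompt for the password.

```shell
# Build a "sqoop list-databases" command line for a MySQL server.
# Assumptions: host node03, port 3306 and user root mirror the MySQL setup
# earlier in this guide; -P makes Sqoop prompt for the password.
sqoop_list_databases_cmd() {
    local host="${1:-node03}" user="${2:-root}"
    printf 'bin/sqoop list-databases --connect jdbc:mysql://%s:3306/ --username %s -P\n' \
        "$host" "$user"
}

# Example (run from the Sqoop home directory, not run automatically):
#   eval "$(sqoop_list_databases_cmd node03 root)"
```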
