软件版本:
Hive:hive-1.1.0-cdh5.14.0.tar.gz
Mysql:mysql-5.1.71-1.el6.x86_64
Flume:flume-ng-1.6.0-cdh5.14.0.tar.gz
Azkaban:azkaban3.51.0
sqoop:sqoop-1.4.6-cdh5.14.0.tar.gz
Hive是基于Hadoop的一个数据仓库工具,可以将结构化的数据文件映射为一张数据库表,提供类SQL查询功能。
方式一:derby版hive直接使用
解压hive
cd /export/softwares
tar -zxvf hive-1.1.0-cdh5.14.0.tar.gz -C ../servers/
cd ../servers/
cd hive-1.1.0-cdh5.14.0/
bin/hive
hive> create database mytest;
缺点:多个地方安装hive后,每一个hive是拥有一套自己的元数据,库、表就不统一,很难找到表的位置
方式二:使用mysql共享hive元数据
1.安装mysql数据库:(使用yum源进行安装)
不建议使用rpm包的方式进行安装,因为有些依赖包系统没有。
第一步:在线安装mysql相关的软件包
yum install mysql mysql-server mysql-devel
第二步:启动mysql的服务
/etc/init.d/mysqld start
第三步:通过mysql安装自带脚本进行设置
/usr/bin/mysql_secure_installation
第四步:进入mysql的客户端然后进行授权
grant all privileges on *.* to 'root'@'%' identified by '123456' with grant option;
flush privileges;
2.修改hive的配置文件
A.修改hive-env.sh:
添加我们的hadoop的环境变量
cd /export/servers/hive-1.1.0-cdh5.14.0/conf
cp hive-env.sh.template hive-env.sh
vim hive-env.sh
添加以下内容:
HADOOP_HOME=/export/servers/hadoop-2.6.0-cdh5.14.0
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/export/servers/hive-1.1.0-cdh5.14.0/conf
B.修改hive-site.xml
cd /export/servers/hive-1.1.0-cdh5.14.0/conf
vim hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURLname>
<value>jdbc:mysql://node03.hadoop.com:3306/hive?createDatabaseIfNotExist=truevalue>
property>
<property>
<name>javax.jdo.option.ConnectionDriverNamename>
<value>com.mysql.jdbc.Drivervalue>
property>
<property>
<name>javax.jdo.option.ConnectionUserNamename>
<value>rootvalue>
property>
<property>
<name>javax.jdo.option.ConnectionPasswordname>
<value>123456value>
property>
<property>
<name>hive.cli.print.current.dbname>
<value>truevalue>
property>
<property>
<name>hive.cli.print.headername>
<value>truevalue>
property>
<property>
<name>hive.server2.thrift.bind.hostname>
<value>node03.hadoop.comvalue>
property>
configuration>
3.上传mysql的lib驱动包
将mysql的lib驱动包(mysql-connector-java-5.1.38.jar)上传到hive的lib目录下
cd /export/servers/hive-1.1.0-cdh5.14.0/lib
4.Hive交互shell
cd /export/servers/hive-1.1.0-cdh5.14.0
bin/hive
A.查看所有的数据库
hive (default)> show databases;
B.创建一个数据库
hive (default)> create database myhive;
C.使用该数据库并创建数据库表
hive (default)> use myhive;
hive (myhive)> create table test(id int,name string);
以上命令操作完成之后,一定要确认mysql里面出来一个数据库hive
Flume是一个分布式、可靠、和高可用的海量日志采集、聚合和传输的系统。(数据的搬运工)
Flume的安装非常简单,只需要解压即可,当然,前提是已有hadoop环境
tar -zxvf flume-ng-1.6.0-cdh5.14.0.tar.gz -C /export/servers/
cd /export/servers/apache-flume-1.6.0-cdh5.14.0-bin/conf
cp flume-env.sh.template flume-env.sh
vim flume-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_141
Azkaban是由Linkedin开源的一个批量工作流任务调度器。用于在一个工作流内以一个特定的顺序运行一组工作和流程。
由于azkaban只有源码包,所以需要进行编译。安装分为单服务模式(适用于学习)和双服务模式(实际生产中)
A.编译:
注意:我们这里选用azkaban3.51.0这个版本,编译需要使用jdk1.8的版本来进行编译,如果编译服务器使用的jdk版本是1.7的,记得切换成jdk1.8,我们这里使用的是jdk8u141这个版本来进行编译
cd /export/softwares/
wget https://github.com/azkaban/azkaban/archive/3.51.0.tar.gz
tar -zxvf 3.51.0.tar.gz -C ../servers/
cd /export/servers/azkaban-3.51.0/
yum -y install git
yum -y install gcc-c++
./gradlew build installDist -x test
B.编译之后需要用到的安装文件及其所在目录:
azkaban-exec-server
/export/servers/azkaban-3.51.0/azkaban-exec-server/build/distributions
azkaban-web-server
/export/servers/azkaban-3.51.0/azkaban-web-server/build/distributions
azkaban-solo-server
/export/servers/azkaban-3.51.0/azkaban-solo-server/build/distributions
execute-as-user.c
/export/servers/azkaban-3.51.0/az-exec-util/src/main/c
数据库脚本文件
/export/servers/azkaban-3.51.0/azkaban-db/build/install/azkaban-db
单节点的模式,只需要一个azkaban-solo-server-0.1.0-SNAPSHOT.tar.gz的安装包即可启动,所有的数据信息都是保存在H2这个azkaban默认的数据当中
A.解压:
cd /export/softwares
tar -zxvf azkaban-solo-server-0.1.0-SNAPSHOT.tar.gz -C ../servers/
B.修改配置文件:
时区配置文件
cd /export/servers/azkaban-solo-server-0.1.0-SNAPSHOT/conf
vim azkaban.properties
default.timezone.id=Asia/Shanghai
修改commonprivate.properties配置文件
cd /export/servers/azkaban-solo-server-0.1.0-SNAPSHOT/plugins/jobtypes
vim commonprivate.properties
cd /export/servers/azkaban-solo-server-0.1.0-SNAPSHOT
bin/start-solo.sh
D.浏览器页面访问(查看是否成功)
http://node03:8081/(密码为:azkaban)
A.确认所需软件:
B.数据库准备:
进入mysql的客户端执行
mysql -uroot -p
2.执行以下命令
CREATE DATABASE azkaban;
CREATE USER 'azkaban'@'%' IDENTIFIED BY 'azkaban';
GRANT all privileges ON azkaban.* to 'azkaban'@'%' identified by 'azkaban' WITH GRANT OPTION;
flush privileges;
use azkaban;
source /export/softwares/create-all-sql-0.1.0-SNAPSHOT.sql;
解压azkaban-web-server
cd /export/softwares
tar -zxvf azkaban-web-server-0.1.0-SNAPSHOT.tar.gz -C ../servers/
cd /export/servers
mv azkaban-web-server-0.1.0-SNAPSHOT/ azkaban-web-server-3.51.0
解压azkaban-exec-server
cd /export/softwares
tar -zxvf azkaban-exec-server-0.1.0-SNAPSHOT.tar.gz -C ../servers/
cd /export/servers
mv azkaban-exec-server-0.1.0-SNAPSHOT/ azkaban-exec-server-3.51.0
D.安装SSL安全认证:
安装ssl安全认证,允许我们使用https的方式访问我们的azkaban的web服务
cd /export/servers/azkaban-web-server-3.51.0
keytool -keystore keystore -alias jetty -genkey -keyalg RSA
E.azkaban web server安装:
修改azkaban-web-server的配置文件
cd /export/servers/azkaban-web-server-3.51.0/conf
vim azkaban.properties
# Azkaban Personalization Settings
azkaban.name=Azkaban
azkaban.label=My Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=true
jetty.maxThreads=25
jetty.port=8081
jetty.ssl.port=8443
jetty.keystore=/export/servers/azkaban-web-server-3.51.0/keystore
jetty.password=azkaban
jetty.keypassword=azkaban
jetty.truststore=/export/servers/azkaban-web-server-3.51.0/keystore
jetty.trustpassword=azkaban
# Azkaban Executor settings
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=node03
mysql.database=azkaban
mysql.user=azkaban
mysql.password=azkaban
mysql.numconnections=100
#Multiple Executor
azkaban.use.multiple.executors=true
#azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
azkaban.activeexecutor.refresh.milisecinterval=10000
azkaban.queueprocessing.enabled=true
azkaban.activeexecutor.refresh.flowinterval=10
azkaban.executorinfo.refresh.maxThreads=10
F.azkaban web server安装:
修改azkaban-web-server的配置文件
cd /export/servers/azkaban-web-server-3.51.0/conf
vim azkaban.properties
# Azkaban Personalization Settings
azkaban.name=Azkaban
azkaban.label=My Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=true
jetty.maxThreads=25
jetty.port=8081
jetty.ssl.port=8443
jetty.keystore=/export/servers/azkaban-web-server-3.51.0/keystore
jetty.password=azkaban
jetty.keypassword=azkaban
jetty.truststore=/export/servers/azkaban-web-server-3.51.0/keystore
jetty.trustpassword=azkaban
# Azkaban Executor settings
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=node03
mysql.database=azkaban
mysql.user=azkaban
mysql.password=azkaban
mysql.numconnections=100
#Multiple Executor
azkaban.use.multiple.executors=true
#azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
azkaban.executorselector.comparator.NumberOfAssignedFlowComparator=1
azkaban.executorselector.comparator.Memory=1
azkaban.executorselector.comparator.LastDispatched=1
azkaban.executorselector.comparator.CpuUsage=1
azkaban.activeexecutor.refresh.milisecinterval=10000
azkaban.queueprocessing.enabled=true
azkaban.activeexecutor.refresh.flowinterval=10
azkaban.executorinfo.refresh.maxThreads=10
G.azkaban executor server 安装:
修改azkaban-exex-server配置文件
cd /export/servers/azkaban-exec-server-3.51.0/conf
vim azkaban.properties
# Azkaban Personalization Settings
azkaban.name=Azkaban
azkaban.label=My Azkaban
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
web.resource.dir=web/
default.timezone.id=Asia/Shanghai
# Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
user.manager.xml.file=conf/azkaban-users.xml
# Loader for projects
executor.global.properties=conf/global.properties
azkaban.project.dir=projects
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.use.ssl=true
jetty.maxThreads=25
jetty.port=8081
jetty.keystore=/export/servers/azkaban-web-server-3.51.0/keystore
jetty.password=azkaban
jetty.keypassword=azkaban
jetty.truststore=/export/servers/azkaban-web-server-3.51.0/keystore
jetty.trustpassword=azkaban
# Where the Azkaban web server is located
azkaban.webserver.url=https://node03:8443
# mail settings
mail.sender=
mail.host=
# User facing web server configurations used to construct the user facing server URLs. They are useful when there is a reverse proxy between Azkaban web servers and users.
# enduser -> myazkabanhost:443 -> proxy -> localhost:8081
# when this parameters set then these parameters are used to generate email links.
# if these parameters are not set then jetty.hostname, and jetty.port(if ssl configured jetty.ssl.port) are used.
# azkaban.webserver.external_hostname=myazkabanhost.com
# azkaban.webserver.external_ssl_port=443
# azkaban.webserver.external_port=8081
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
# JMX stats
jetty.connector.stats=true
executor.connector.stats=true
# Azkaban plugin settings
azkaban.jobtype.plugin.dir=plugins/jobtypes
# Azkaban mysql settings by default. Users should configure their own username and password.
database.type=mysql
mysql.port=3306
mysql.host=node03
mysql.database=azkaban
mysql.user=azkaban
mysql.password=azkaban
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.flow.threads=30
2.添加插件
将我们编译后的C文件execute-as-user.c上传到这个目录来/export/servers/azkaban-exec-server-3.51.0/plugins/jobtypes
cp /export/softwares/execute-as-user.c /export/servers/azkaban-exec-server-3.51.0/plugins/jobtypes/
然后执行以下命令生成execute-as-user
yum -y install gcc-c++
cd /export/servers/azkaban-exec-server-3.51.0/plugins/jobtypes
gcc execute-as-user.c -o execute-as-user
chown root execute-as-user
chmod 6050 execute-as-user
3.修改配置文件
cd /export/servers/azkaban-exec-server-3.47.0/plugins/jobtypes
vim commonprivate.properties
execute.as.user=false
memCheck.enabled=false
azkaban.native.lib=/export/servers/azkaban-exec-server-3.51.0/plugins/jobtypes
H.启动服务:
启动azkaban exec server
cd /export/servers/azkaban-exec-server-3.51.0
bin/start-exec.sh
激活我们的exec-server
curl -G "node03:$(<./executor.port)/executor?action=activate" && echo
启动azkaban-web-server
cd /export/servers/azkaban-web-server-3.51.0/
bin/start-web.sh
检测
访问地址:https://node03:8443
sqoop是apache旗下一款“Hadoop和关系数据库服务器之间传送数据”的工具。
下载地址:
sqoop1版本详细下载地址
http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.14.0.tar.gz
sqoop2版本详细下载地址
http://archive.cloudera.com/cdh5/cdh/5/sqoop2-1.99.5-cdh5.14.0.tar.gz
我们这里使用sqoop1的版本
解压:
cd /export/softwares
tar -zxvf sqoop-1.4.6-cdh5.14.0.tar.gz -C ../servers/
cd /export/servers/sqoop-1.4.6-cdh5.14.0/conf/
cp sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh
export HADOOP_COMMON_HOME=/export/servers/hadoop-2.6.0-cdh5.14.0
export HADOOP_MAPRED_HOME=/export/servers/hadoop-2.6.0-cdh5.14.0
export HIVE_HOME=/export/servers/hive-1.1.0-cdh5.14.0
sqoop的使用需要添加两个额外的依赖包,一个是mysql的驱动包,一个是java-json的的依赖包,不然就会报错。
将这个两个jar包添加到sqoop的lib目录下。
cd /export/servers/sqoop-1.4.6-cdh5.14.0
bin/sqoop-version
cd /export/softwares
tar -zxvf sqoop-1.4.6-cdh5.14.0.tar.gz -C ../servers/