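Apache Oozie Workflow Scheduler for Hadoop
Overview
Oozie is a workflow scheduler system for managing Apache Hadoop jobs.
An Oozie Workflow job is a directed acyclic graph (DAG) of actions.
An Oozie Coordinator job is a recurrent Oozie Workflow job triggered by time (frequency) and data availability.
Oozie is integrated with the rest of the Hadoop stack and supports several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system-specific jobs (such as Java programs and shell scripts).
Oozie is a scalable, reliable and extensible system.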
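System requirements:
Unix box (tested on Mac OS X and Linux)
Java JDK 1.8+
Maven 3.0.1+
Hadoop 2.6.0+
Pig 0.10.1+
The JDK and Maven (mvn) need their environment variables configured (for example JAVA_HOME and PATH).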
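Before starting the build it can help to confirm that both tools are actually reachable. The following is a minimal sketch; the helper name `check_tool` and the paths in the comments are placeholders, not something from the original setup.

```shell
#!/bin/sh
# Example exports (adjust the placeholder paths to your machine):
#   export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
#   export PATH="$PATH:$JAVA_HOME/bin:/opt/maven/bin"

# check_tool NAME: return 0 if NAME is found on PATH, 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
        return 0
    fi
    echo "missing: $1"
    return 1
}
```

Running `check_tool java && check_tool mvn` before `bin/mkdistro.sh` gives an early, readable failure instead of a build error halfway through.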
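Download a source distribution of Oozie from the "Releases" drop-down menu on the Oozie website. (I used version 5.2.0.)
Tip: switch Maven to a nearby mirror first (a mirror inside China in my case), otherwise the dependency download takes painfully long. Search online for the details.
Unpack the source distribution locally; you have to compile it yourself: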
$ bin/mkdistro.sh -Dhadoop.version=2.7.5 -Dhive.version=2.3.6 -Dhbase.version=1.4.13 -Dtez.version=0.9.2 -Dspark.version=2.4.5 -Dspark.scala.binary.version=2.11 -Dmaven.test.skip=true -Puber
On success, the compiled tarball is generated under oozie-5.2.0/distro/target/, and this is the Oozie build used in the rest of this post.
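If the build stalls while downloading dependencies, the mirror tip above could be applied roughly as follows. This is only a sketch: the function name `write_mirror_settings` is made up for illustration, and the Aliyun public repository is used as one example of a mirror, so substitute whichever mirror is closest to you.

```shell
#!/bin/sh
# write_mirror_settings FILE: write a minimal Maven settings.xml that routes
# requests for Maven Central through a single mirror.
# NOTE: this overwrites FILE; back up an existing settings.xml first.
write_mirror_settings() {
    target="$1"
    mkdir -p "$(dirname "$target")"
    cat > "$target" <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>aliyun-public</id>
      <mirrorOf>central</mirrorOf>
      <url>https://maven.aliyun.com/repository/public</url>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

Calling `write_mirror_settings "$HOME/.m2/settings.xml"` puts the file where Maven looks for user settings by default.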
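core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop-master:9000/</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
    <description>Number of minutes between trash checkpoints.
    If zero, the trash feature is disabled.</description>
  </property>
</configuration>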
Dockerfile
FROM ubuntu:18.04
MAINTAINER tianwei<[email protected]>
RUN apt-get update && apt-get install -y openssh-server openjdk-8-jdk
RUN apt-get update && apt-get install -y unzip
RUN apt-get update && apt-get install -y locales && rm -rf /var/lib/apt/lists/* \
&& localedef -i zh_CN -c -f UTF-8 -A /usr/share/locale/locale.alias zh_CN.UTF-8
RUN apt-get update
ENV LANG zh_CN.UTF-8
ADD oozie-5.2.0-distro.tar.gz /usr/local/
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ENV OOZIE_HOME=/usr/local/oozie-5.2.0
ENV PATH=$PATH:$OOZIE_HOME/bin
COPY oozie-site.xml $OOZIE_HOME/conf/
COPY oozie-env.sh $OOZIE_HOME/conf/
COPY oozie-log4j.properties $OOZIE_HOME/conf/
RUN mkdir -p $OOZIE_HOME/libext
COPY ext-2.2.zip $OOZIE_HOME/libext/
# After the container is built, enter it and unzip ext-2.2.zip
COPY mysql-connector-java-8.0.19.jar $OOZIE_HOME/libext/
COPY core-site.xml /usr/local/hadoop/etc/hadoop/
COPY mapred-site.xml /usr/local/hadoop/etc/hadoop/
COPY yarn-site.xml /usr/local/hadoop/etc/hadoop/
WORKDIR $OOZIE_HOME
RUN ln -s $OOZIE_HOME/embedded-oozie-server/webapp/WEB-INF/lib lib
WORKDIR $OOZIE_HOME/embedded-oozie-server/webapp/WEB-INF/lib
RUN mv javax.servlet-3.0.0.v201112011016.jar javax.servlet-3.0.0.v201112011016.jar.bak
RUN mv javax.servlet-api-3.1.0.jar javax.servlet-api-3.1.0.jar.bak
RUN mv jetty-servlet-9.3.27.v20190418.jar jetty-servlet-9.3.27.v20190418.jar.bak
RUN mv jsp-2.1-6.1.14.jar jsp-2.1-6.1.14.jar.bak
RUN mv jsp-api-2.0.jar jsp-api-2.0.jar.bak
RUN mv log4j-1.2-api-2.6.2.jar log4j-1.2-api-2.6.2.jar.bak
RUN mv slf4j-log4j12-1.6.6.jar slf4j-log4j12-1.6.6.jar.bak
WORKDIR /
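The run of `RUN mv ... .bak` lines in the Dockerfile sidelines seven jars by renaming them. The same step could be written as one small helper that skips files that are not present, which may be handy if a different Oozie build ships different jar versions. This is a sketch only; `disable_jars` is a hypothetical name.

```shell
#!/bin/sh
# disable_jars DIR JAR...: rename each listed jar in DIR to JAR.bak.
# Jars that do not exist are reported and skipped instead of failing.
disable_jars() {
    dir="$1"
    shift
    for jar in "$@"; do
        if [ -f "$dir/$jar" ]; then
            mv "$dir/$jar" "$dir/$jar.bak"
            echo "disabled: $jar"
        else
            echo "skipped (not found): $jar"
        fi
    done
}
```

For example, `disable_jars "$OOZIE_HOME/embedded-oozie-server/webapp/WEB-INF/lib" jsp-api-2.0.jar slf4j-log4j12-1.6.6.jar` covers two of the jars listed in the Dockerfile.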
ext-2.2.zip
Official download address: http://archive.cloudera.com/gplextras/misc/ext-2.2.zip
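mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop-master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-master:19888</value>
  </property>
</configuration>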
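mysql-connector-java-8.0.19.jar
The JDBC driver jar for the database connection; search online for the download.
oozie-5.2.0-distro.tar.gz
The Oozie package compiled above.
oozie-env.sh
Reserved for later use.
oozie-log4j.properties (shipped with Oozie)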
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. See accompanying LICENSE file.
#
# If the Java System property 'oozie.log.dir' is not defined at Oozie start up time
# XLogService sets its value to '${oozie.home}/logs'
# The appender that Oozie uses must be named 'oozie' (i.e. log4j.appender.oozie)
# Using the RollingFileAppender with the OozieRollingPolicy will roll the log file every hour and retain up to MaxHistory number of
# log files. If FileNamePattern ends with ".gz" it will create gzip files.
log4j.appender.oozie=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.oozie.RollingPolicy=org.apache.oozie.util.OozieRollingPolicy
log4j.appender.oozie.File=${oozie.log.dir}/oozie.log
log4j.appender.oozie.Append=true
log4j.appender.oozie.layout=org.apache.log4j.PatternLayout
log4j.appender.oozie.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - SERVER[${oozie.instance.id}] %m%n
# The FileNamePattern must end with "-%d{yyyy-MM-dd-HH}.gz" or "-%d{yyyy-MM-dd-HH}" and also start with the
# value of log4j.appender.oozie.File
log4j.appender.oozie.RollingPolicy.FileNamePattern=${log4j.appender.oozie.File}-%d{yyyy-MM-dd-HH}
# The MaxHistory controls how many log files will be retained (720 hours / 24 hours per day = 30 days); -1 to disable
log4j.appender.oozie.RollingPolicy.MaxHistory=720
log4j.appender.oozieError=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.oozieError.RollingPolicy=org.apache.oozie.util.OozieRollingPolicy
log4j.appender.oozieError.File=${oozie.log.dir}/oozie-error.log
log4j.appender.oozieError.Append=true
log4j.appender.oozieError.layout=org.apache.log4j.PatternLayout
log4j.appender.oozieError.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - SERVER[${oozie.instance.id}] %m%n
# The FileNamePattern must end with "-%d{yyyy-MM-dd-HH}.gz" or "-%d{yyyy-MM-dd-HH}" and also start with the
# value of log4j.appender.oozieError.File
log4j.appender.oozieError.RollingPolicy.FileNamePattern=${log4j.appender.oozieError.File}-%d{yyyy-MM-dd-HH}
# The MaxHistory controls how many log files will be retained (720 hours / 24 hours per day = 30 days); -1 to disable
log4j.appender.oozieError.RollingPolicy.MaxHistory=720
log4j.appender.oozieError.filter.1 = org.apache.log4j.varia.LevelMatchFilter
log4j.appender.oozieError.filter.1.levelToMatch = WARN
log4j.appender.oozieError.filter.2 = org.apache.log4j.varia.LevelMatchFilter
log4j.appender.oozieError.filter.2.levelToMatch = ERROR
log4j.appender.oozieError.filter.3 = org.apache.log4j.varia.LevelMatchFilter
log4j.appender.oozieError.filter.3.levelToMatch = FATAL
log4j.appender.oozieError.filter.4 = org.apache.log4j.varia.DenyAllFilter
# Uncomment the below two lines to use the DailyRollingFileAppender instead
# The DatePattern must end with either "dd" or "HH"
#log4j.appender.oozie=org.apache.log4j.DailyRollingFileAppender
#log4j.appender.oozie.DatePattern='.'yyyy-MM-dd-HH
log4j.appender.oozieops=org.apache.log4j.DailyRollingFileAppender
log4j.appender.oozieops.DatePattern='.'yyyy-MM-dd
log4j.appender.oozieops.File=${oozie.log.dir}/oozie-ops.log
log4j.appender.oozieops.Append=true
log4j.appender.oozieops.layout=org.apache.log4j.PatternLayout
log4j.appender.oozieops.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - %m%n
log4j.appender.oozieinstrumentation=org.apache.log4j.DailyRollingFileAppender
log4j.appender.oozieinstrumentation.DatePattern='.'yyyy-MM-dd
log4j.appender.oozieinstrumentation.File=${oozie.log.dir}/oozie-instrumentation.log
log4j.appender.oozieinstrumentation.Append=true
log4j.appender.oozieinstrumentation.layout=org.apache.log4j.PatternLayout
log4j.appender.oozieinstrumentation.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - %m%n
log4j.appender.oozieaudit=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.oozieaudit.RollingPolicy=org.apache.oozie.util.OozieRollingPolicy
log4j.appender.oozieaudit.File=${oozie.log.dir}/oozie-audit.log
log4j.appender.oozieaudit.Append=true
log4j.appender.oozieaudit.layout=org.apache.log4j.PatternLayout
log4j.appender.oozieaudit.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - %m%n
log4j.appender.oozieaudit.RollingPolicy.FileNamePattern=${log4j.appender.oozieaudit.File}.%d{yyyy-MM-dd}
log4j.appender.oozieaudit.RollingPolicy.MaxHistory=30
log4j.appender.openjpa=org.apache.log4j.DailyRollingFileAppender
log4j.appender.openjpa.DatePattern='.'yyyy-MM-dd
log4j.appender.openjpa.File=${oozie.log.dir}/oozie-jpa.log
log4j.appender.openjpa.Append=true
log4j.appender.openjpa.layout=org.apache.log4j.PatternLayout
log4j.appender.openjpa.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - %m%n
log4j.category.openjpa.Tool=INFO
log4j.category.openjpa.Runtime=INFO
log4j.category.openjpa.Remote=WARN
log4j.category.openjpa.DataCache=WARN
log4j.category.openjpa.MetaData=WARN
log4j.category.openjpa.Enhance=WARN
log4j.category.openjpa.Query=WARN
log4j.category.openjpa.jdbc.SQL=WARN
log4j.category.openjpa.jdbc.JDBC=WARN
log4j.category.openjpa.jdbc.Schema=WARN
log4j.appender.jetty=org.apache.log4j.DailyRollingFileAppender
log4j.appender.jetty.DatePattern='.'yyyy-MM-dd
log4j.appender.jetty.File=${oozie.log.dir}/jetty.log
log4j.appender.jetty.Append=true
log4j.appender.jetty.layout=org.apache.log4j.PatternLayout
log4j.appender.jetty.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - %m%n
log4j.appender.none=org.apache.log4j.varia.NullAppender
# Explicitly switch off root logger: anything interesting goes
# already either to jetty or one of the oozie appenders
log4j.rootLogger=NONE, none
log4j.logger.org.eclipse.jetty=INFO, jetty
log4j.logger.openjpa=INFO, openjpa
log4j.logger.oozieops=INFO, oozieops
log4j.logger.oozieinstrumentation=ALL, oozieinstrumentation
log4j.logger.oozieaudit=ALL, oozieaudit
log4j.logger.org.apache.oozie=INFO, oozie, oozieError
log4j.logger.org.apache.hadoop=WARN, oozie
log4j.logger.org.mortbay=WARN, oozie
log4j.logger.org.hsqldb=WARN, oozie
log4j.logger.org.apache.hadoop.security.authentication.server=WARN, oozie
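oozie-site.xml
<configuration>
  <property>
    <name>oozie.service.ProxyUserService.proxyuser.root.hosts</name>
    <value>*</value>
    <description>
      List of hosts the '#USER#' user is allowed to perform 'doAs'
      operations.
      The '#USER#' must be replaced with the username of the user who is
      allowed to perform 'doAs' operations.
      The value can be the '*' wildcard or a list of hostnames.
      For multiple users copy this property and replace the user name
      in the property name.
    </description>
  </property>
  <property>
    <name>oozie.service.ProxyUserService.proxyuser.root.groups</name>
    <value>*</value>
    <description>
      List of groups the '#USER#' user is allowed to impersonate users
      from to perform 'doAs' operations.
      The '#USER#' must be replaced with the username of the user who is
      allowed to perform 'doAs' operations.
      The value can be the '*' wildcard or a list of groups.
      For multiple users copy this property and replace the user name
      in the property name.
    </description>
  </property>
  <property>
    <name>oozie.http.hostname</name>
    <value>hadoop-oozie</value>
  </property>
  <property>
    <name>oozie.db.schema.name</name>
    <value>oozie</value>
  </property>
  <property>
    <name>oozie.service.JPAService.create.db.schema</name>
    <value>false</value>
  </property>
  <property>
    <name>oozie.service.JPAService.validate.db.connection</name>
    <value>false</value>
  </property>
  <property>
    <name>oozie.service.JPAService.jdbc.url</name>
    <value>jdbc:mysql://beimei6-mysql/oozie?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useUnicode=true&amp;useLegacyDatetimeCode=false&amp;serverTimezone=GMT</value>
  </property>
  <property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>root</value>
  </property>
  <property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>password</value>
  </property>
  <property>
    <name>oozie.service.JPAService.jdbc.driver</name>
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>oozie.service.JPAService.pool.max.active.conn</name>
    <value>10</value>
  </property>
  <property>
    <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
    <value>*=/usr/local/hadoop/etc/hadoop/</value>
  </property>
  <property>
    <name>oozie.service.HadoopAccessorService.action.configurations</name>
    <value>*=/usr/local/hadoop/etc/hadoop/</value>
  </property>
</configuration>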
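yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-master</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.preemption</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
    <value>1.0</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://hadoop-master:19888/jobhistory/logs</value>
  </property>
</configuration>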
Write the Oozie scripts in day2 (the parent directory)
docker-compose-oozie.yml
version: "3"
services:
  hadoop-oozie:
    image: beimei6/oozie
    container_name: hadoop-oozie
    networks:
      beimei6-net:
        ipv4_address: 172.11.24.15
    stdin_open: true # -i interactive
    tty: true # -t tty
    entrypoint: ["sh", "-c", "bash"]
install-oozie.sh
#!/bin/bash
cd oozie
docker build -t beimei6/oozie .
cd -
docker-compose -f docker-compose.yml -f docker-compose-oozie.yml up -d hadoop-oozie
Finally, run this in the day2 directory:
bash install-oozie.sh
The Oozie container built with Docker is now ready.
Preparation: start MySQL.
Start Hadoop's DFS, YARN and history server:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
Open a shell in the Oozie container.
Install unzip:
apt-get update
apt-get install unzip
Go to $OOZIE_HOME/libext/ and unzip ext-2.2.zip:
cd $OOZIE_HOME/libext/
unzip ext-2.2.zip
In $OOZIE_HOME, run oozie-setup.sh to upload the new sharelib to HDFS:
cd $OOZIE_HOME
bin/oozie-setup.sh sharelib create -fs hdfs://hadoop-master:9000
After it runs, the sharelib files are created on HDFS.
Create the Oozie DB with the "ooziedb.sh" command-line tool:
bin/ooziedb.sh create -sqlfile oozie.sql -run
Start Oozie in the background:
$ bin/oozied.sh start
Start Oozie in the foreground (makes errors easier to spot):
$ bin/oozied.sh run
Check the Oozie log file logs/oozie.log to make sure Oozie started properly.
Use the Oozie command-line tool to check Oozie's status:
$ bin/oozie admin -oozie http://hadoop-oozie:11000/oozie -status
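Oozie can take a little while to come up after `oozied.sh start`, so the status check may need repeating. A minimal polling sketch follows; `wait_for_normal` is a hypothetical helper, and it assumes only that the status command prints a line containing `NORMAL` once the server is ready.

```shell
#!/bin/sh
# wait_for_normal TRIES DELAY CMD...: run CMD up to TRIES times, sleeping
# DELAY seconds between attempts, until its output contains "NORMAL".
wait_for_normal() {
    tries="$1"
    delay="$2"
    shift 2
    i=0
    while [ "$i" -lt "$tries" ]; do
        # Ignore a failing status command; the server may still be starting.
        out=$("$@" 2>/dev/null) || true
        case "$out" in
            *NORMAL*)
                echo "Oozie is up"
                return 0
                ;;
        esac
        i=$((i + 1))
        sleep "$delay"
    done
    echo "Oozie did not report NORMAL after $tries tries"
    return 1
}
```

A possible call is `wait_for_normal 30 2 bin/oozie admin -oozie http://hadoop-oozie:11000/oozie -status`, which retries for roughly a minute.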
Using a browser, go to the Oozie web console; the Oozie status should be NORMAL.
Enter in the browser:
http://localhost:11000/oozie.html
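The examples are stored in the same directory as the compiled Oozie; unpack them there.
Examples directory: …/oozie-5.2.0-distro/oozie-5.2.0/examples/apps/
Edit job.properties in the map-reduce example:
The namenode and jobtracker settings need to be changed here.
Both are taken from your own Hadoop configuration:
the namenode value is the one configured in Hadoop's core-site.xml (fs.defaultFS),
and the jobtracker value is the yarn.resourcemanager.address property in Hadoop's yarn-site.xml.
I have already set the configuration shown above; you need to add these settings on each of your Hadoop nodes and then restart.
(Inside the Oozie container, the Hadoop configuration lives in /usr/local/hadoop/etc/hadoop/.)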
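The job.properties edit described above can also be scripted. The sketch below rests on assumptions: the key names `nameNode`, `jobTracker` and `resourceManager` are what I would expect in the bundled example files, so check your own job.properties first, and `set_job_endpoints` is a hypothetical helper name. Port 8032 in the usage note is the YARN default for yarn.resourcemanager.address and is not set explicitly in this post.

```shell
#!/bin/sh
# set_job_endpoints FILE NAMENODE RM_ADDR: rewrite the endpoint lines of a
# job.properties file in place. Keys that are absent are left untouched.
set_job_endpoints() {
    file="$1"
    namenode="$2"
    rm_addr="$3"
    tmp="$file.tmp"
    # '|' is used as the sed delimiter because the values contain '/'.
    sed -e "s|^nameNode=.*|nameNode=$namenode|" \
        -e "s|^jobTracker=.*|jobTracker=$rm_addr|" \
        -e "s|^resourceManager=.*|resourceManager=$rm_addr|" \
        "$file" > "$tmp" && mv "$tmp" "$file"
}
```

With the cluster used in this post, a call might look like `set_job_endpoints examples/apps/map-reduce/job.properties hdfs://hadoop-master:9000 hadoop-master:8032`.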
Then upload the examples to HDFS:
The machine you run this on needs an HDFS client configured.
hdfs dfs -put examples examples
Now the job can be run:
cd $OOZIE_HOME
bin/oozie job -oozie http://hadoop-oozie:11000/oozie -config examples/apps/map-reduce/job.properties -run
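Once the job is submitted successfully, you get back a job ID. You can then check the job's status with the following command; the last argument is the job ID you were just given: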
bin/oozie job -oozie http://hadoop-oozie:11000/oozie -info 0000000-200709041917672-oozie-root-W
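Copying the job ID by hand gets tedious when submitting repeatedly. A small sketch for capturing it, assuming the submit command prints a line of the form `job: <id>`; `extract_job_id` is a hypothetical helper name.

```shell
#!/bin/sh
# extract_job_id: read the output of `oozie job ... -run` on stdin and print
# the ID that follows "job: ". Prints nothing if no such line is present.
extract_job_id() {
    sed -n 's/^job: *//p' | head -n 1
}

# Possible usage (not executed here):
#   job_id=$(bin/oozie job -oozie http://hadoop-oozie:11000/oozie \
#       -config examples/apps/map-reduce/job.properties -run | extract_job_id)
#   bin/oozie job -oozie http://hadoop-oozie:11000/oozie -info "$job_id"
```

An empty result means the submission output did not contain a job line, which is usually a sign the submit itself failed.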