The Bumpy Road to Installing Oozie: All Eighty-One Trials


  • Oozie Overview
  • Before Installing Oozie
  • Building Oozie with Docker
    • 1. Compile the distribution
    • 2. Build Oozie with Docker
  • Running Oozie
  • Running a test case: the map-reduce example bundled with Oozie

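Oozie Overview

Apache Oozie Workflow Scheduler for Hadoop
Overview
Oozie is a workflow scheduler system for managing Apache Hadoop jobs.

Oozie Workflow jobs are directed acyclic graphs (DAGs) of actions.

Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.

Oozie is integrated with the rest of the Hadoop stack and supports several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system-specific jobs (such as Java programs and shell scripts).

Oozie is a scalable, reliable and extensible system.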

Before Installing Oozie

System requirements:
Unix box (tested on Mac OS X and Linux)
Java JDK 1.8+
Maven 3.0.1+
Hadoop 2.6.0+
Pig 0.10.1+
The JDK and Maven must be configured in your environment variables.
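
Before starting the build, it can save time to confirm that the JDK and Maven really are reachable from the shell. The snippet below is only a sketch: the helper name missing_tools is my own, and it checks nothing more than whether the commands can be found on PATH (it does not verify versions).

```shell
# Print every command from the argument list that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# The build needs a JDK and Maven; report anything that is missing.
missing=$(missing_tools java mvn)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "java and mvn are both on PATH"
fi
```

If something is reported as missing, fix the environment variables first and rerun the check before moving on.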

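Building Oozie with Docker

1. Compile the distribution

Download the Oozie source release from the "Releases" drop-down menu on the Oozie website. (I used version 5.2.0.)
Unpack the release locally; you have to compile it yourself.
Tip: switch Maven to a nearby mirror first (for readers in China, a domestic one), otherwise the dependency downloads will make you question your life choices. Search online for how to set one up.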

$ bin/mkdistro.sh -Dhadoop.version=2.7.5 -Dhive.version=2.3.6 -Dhbase.version=1.4.13 -Dtez.version=0.9.2 -Dspark.version=2.4.5 -Dspark.scala.binary.version=2.11 -Dmaven.test.skip=true -Puber

On success, the compiled tarball is generated under oozie-5.2.0/distro/target/, and that tarball is the Oozie package used in the rest of this post.
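
To check that the build really produced a tarball, something like the following can help. It is a sketch under assumptions: the helper name find_distro_tarball is made up, and the file name pattern oozie-*-distro.tar.gz is inferred from the oozie-5.2.0-distro.tar.gz file that the Dockerfile later adds.

```shell
# find_distro_tarball SRC_ROOT
# Print the first distro tarball found under SRC_ROOT/distro/target, if any.
find_distro_tarball() {
    for f in "$1"/distro/target/oozie-*-distro.tar.gz; do
        # When nothing matches, the unexpanded pattern is returned,
        # so test that the path is a real file.
        if [ -f "$f" ]; then
            echo "$f"
            return 0
        fi
    done
    return 1
}

# Hypothetical usage: copy the tarball into the Docker build directory
# (assumed here to be ./oozie, next to the Dockerfile).
# tarball=$(find_distro_tarball oozie-5.2.0) && cp "$tarball" oozie/
```

The function returns a non-zero status when no tarball exists, so it can guard the copy step in a script.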

2. Build Oozie with Docker

The files used to build the Oozie image are listed below, one by one.
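core-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop-master:9000/</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>

    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
        <description>Number of minutes between trash checkpoints.
            If zero, the trash feature is disabled.
        </description>
    </property>
</configuration>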

Dockerfile

FROM ubuntu:18.04
MAINTAINER tianwei<[email protected]>

# Install everything in one layer; -y keeps apt-get non-interactive during the build
RUN apt-get update \
    && apt-get install -y openssh-server openjdk-8-jdk unzip locales \
    && rm -rf /var/lib/apt/lists/* \
    && localedef -i zh_CN -c -f UTF-8 -A /usr/share/locale/locale.alias zh_CN.UTF-8

ENV LANG zh_CN.UTF-8

ADD oozie-5.2.0-distro.tar.gz /usr/local/
ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
ENV OOZIE_HOME=/usr/local/oozie-5.2.0
ENV PATH=$PATH:$OOZIE_HOME/bin

COPY oozie-site.xml $OOZIE_HOME/conf/
COPY oozie-env.sh $OOZIE_HOME/conf/
COPY oozie-log4j.properties $OOZIE_HOME/conf/

RUN mkdir -p $OOZIE_HOME/libext

COPY ext-2.2.zip $OOZIE_HOME/libext/
# Unzip ext-2.2.zip inside the container once it has been built
COPY mysql-connector-java-8.0.19.jar $OOZIE_HOME/libext/
COPY core-site.xml /usr/local/hadoop/etc/hadoop/
COPY mapred-site.xml /usr/local/hadoop/etc/hadoop/
COPY yarn-site.xml /usr/local/hadoop/etc/hadoop/
WORKDIR $OOZIE_HOME
RUN ln -s $OOZIE_HOME/embedded-oozie-server/webapp/WEB-INF/lib lib
WORKDIR $OOZIE_HOME/embedded-oozie-server/webapp/WEB-INF/lib
RUN mv javax.servlet-3.0.0.v201112011016.jar javax.servlet-3.0.0.v201112011016.jar.bak
RUN mv javax.servlet-api-3.1.0.jar javax.servlet-api-3.1.0.jar.bak
RUN mv jetty-servlet-9.3.27.v20190418.jar jetty-servlet-9.3.27.v20190418.jar.bak
RUN mv jsp-2.1-6.1.14.jar jsp-2.1-6.1.14.jar.bak
RUN mv jsp-api-2.0.jar jsp-api-2.0.jar.bak
RUN mv log4j-1.2-api-2.6.2.jar log4j-1.2-api-2.6.2.jar.bak
RUN mv slf4j-log4j12-1.6.6.jar slf4j-log4j12-1.6.6.jar.bak
WORKDIR /

ext-2.2

This is the ExtJS library that the Oozie web console needs.
Download: http://archive.cloudera.com/gplextras/misc/ext-2.2.zip

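mapred-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop-master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop-master:19888</value>
    </property>
</configuration>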

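mysql-connector-java-8.0.19.jar

The JDBC driver JAR for the MySQL connection; search for it online and download it yourself.

oozie-5.2.0-distro.tar.gz

The Oozie package compiled in step 1.

oozie-env.sh

Kept as a placeholder for later use.

oozie-log4j.properties (shipped with Oozie)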

#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. See accompanying LICENSE file.
#

# If the Java System property 'oozie.log.dir' is not defined at Oozie start up time
# XLogService sets its value to '${oozie.home}/logs'

# The appender that Oozie uses must be named 'oozie' (i.e. log4j.appender.oozie)

# Using the RollingFileAppender with the OozieRollingPolicy will roll the log file every hour and retain up to MaxHistory number of
# log files. If FileNamePattern ends with ".gz" it will create gzip files.
log4j.appender.oozie=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.oozie.RollingPolicy=org.apache.oozie.util.OozieRollingPolicy
log4j.appender.oozie.File=${oozie.log.dir}/oozie.log
log4j.appender.oozie.Append=true
log4j.appender.oozie.layout=org.apache.log4j.PatternLayout
log4j.appender.oozie.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - SERVER[${oozie.instance.id}] %m%n
# The FileNamePattern must end with "-%d{yyyy-MM-dd-HH}.gz" or "-%d{yyyy-MM-dd-HH}" and also start with the 
# value of log4j.appender.oozie.File
log4j.appender.oozie.RollingPolicy.FileNamePattern=${log4j.appender.oozie.File}-%d{yyyy-MM-dd-HH}
# The MaxHistory controls how many log files will be retained (720 hours / 24 hours per day = 30 days); -1 to disable
log4j.appender.oozie.RollingPolicy.MaxHistory=720



log4j.appender.oozieError=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.oozieError.RollingPolicy=org.apache.oozie.util.OozieRollingPolicy
log4j.appender.oozieError.File=${oozie.log.dir}/oozie-error.log
log4j.appender.oozieError.Append=true
log4j.appender.oozieError.layout=org.apache.log4j.PatternLayout
log4j.appender.oozieError.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - SERVER[${oozie.instance.id}] %m%n
# The FileNamePattern must end with "-%d{yyyy-MM-dd-HH}.gz" or "-%d{yyyy-MM-dd-HH}" and also start with the
# value of log4j.appender.oozieError.File
log4j.appender.oozieError.RollingPolicy.FileNamePattern=${log4j.appender.oozieError.File}-%d{yyyy-MM-dd-HH}
# The MaxHistory controls how many log files will be retained (720 hours / 24 hours per day = 30 days); -1 to disable
log4j.appender.oozieError.RollingPolicy.MaxHistory=720
log4j.appender.oozieError.filter.1 = org.apache.log4j.varia.LevelMatchFilter
log4j.appender.oozieError.filter.1.levelToMatch = WARN
log4j.appender.oozieError.filter.2 = org.apache.log4j.varia.LevelMatchFilter
log4j.appender.oozieError.filter.2.levelToMatch = ERROR
log4j.appender.oozieError.filter.3 = org.apache.log4j.varia.LevelMatchFilter
log4j.appender.oozieError.filter.3.levelToMatch = FATAL
log4j.appender.oozieError.filter.4 = org.apache.log4j.varia.DenyAllFilter



# Uncomment the below two lines to use the DailyRollingFileAppender instead
# The DatePattern must end with either "dd" or "HH"
#log4j.appender.oozie=org.apache.log4j.DailyRollingFileAppender
#log4j.appender.oozie.DatePattern='.'yyyy-MM-dd-HH

log4j.appender.oozieops=org.apache.log4j.DailyRollingFileAppender
log4j.appender.oozieops.DatePattern='.'yyyy-MM-dd
log4j.appender.oozieops.File=${oozie.log.dir}/oozie-ops.log
log4j.appender.oozieops.Append=true
log4j.appender.oozieops.layout=org.apache.log4j.PatternLayout
log4j.appender.oozieops.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - %m%n

log4j.appender.oozieinstrumentation=org.apache.log4j.DailyRollingFileAppender
log4j.appender.oozieinstrumentation.DatePattern='.'yyyy-MM-dd
log4j.appender.oozieinstrumentation.File=${oozie.log.dir}/oozie-instrumentation.log
log4j.appender.oozieinstrumentation.Append=true
log4j.appender.oozieinstrumentation.layout=org.apache.log4j.PatternLayout
log4j.appender.oozieinstrumentation.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - %m%n

log4j.appender.oozieaudit=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.oozieaudit.RollingPolicy=org.apache.oozie.util.OozieRollingPolicy
log4j.appender.oozieaudit.File=${oozie.log.dir}/oozie-audit.log
log4j.appender.oozieaudit.Append=true
log4j.appender.oozieaudit.layout=org.apache.log4j.PatternLayout
log4j.appender.oozieaudit.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - %m%n
log4j.appender.oozieaudit.RollingPolicy.FileNamePattern=${log4j.appender.oozieaudit.File}.%d{yyyy-MM-dd}
log4j.appender.oozieaudit.RollingPolicy.MaxHistory=30


log4j.appender.openjpa=org.apache.log4j.DailyRollingFileAppender
log4j.appender.openjpa.DatePattern='.'yyyy-MM-dd
log4j.appender.openjpa.File=${oozie.log.dir}/oozie-jpa.log
log4j.appender.openjpa.Append=true
log4j.appender.openjpa.layout=org.apache.log4j.PatternLayout
log4j.appender.openjpa.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - %m%n
log4j.category.openjpa.Tool=INFO
log4j.category.openjpa.Runtime=INFO
log4j.category.openjpa.Remote=WARN
log4j.category.openjpa.DataCache=WARN
log4j.category.openjpa.MetaData=WARN
log4j.category.openjpa.Enhance=WARN
log4j.category.openjpa.Query=WARN
log4j.category.openjpa.jdbc.SQL=WARN
log4j.category.openjpa.jdbc.JDBC=WARN
log4j.category.openjpa.jdbc.Schema=WARN

log4j.appender.jetty=org.apache.log4j.DailyRollingFileAppender
log4j.appender.jetty.DatePattern='.'yyyy-MM-dd
log4j.appender.jetty.File=${oozie.log.dir}/jetty.log
log4j.appender.jetty.Append=true
log4j.appender.jetty.layout=org.apache.log4j.PatternLayout
log4j.appender.jetty.layout.ConversionPattern=%d{ISO8601} %5p %c{1}:%L - %m%n

log4j.appender.none=org.apache.log4j.varia.NullAppender

# Explicitly switch off root logger: anything interesting goes
# already either to jetty or one of the oozie appenders
log4j.rootLogger=NONE, none
log4j.logger.org.eclipse.jetty=INFO, jetty
log4j.logger.openjpa=INFO, openjpa
log4j.logger.oozieops=INFO, oozieops
log4j.logger.oozieinstrumentation=ALL, oozieinstrumentation
log4j.logger.oozieaudit=ALL, oozieaudit
log4j.logger.org.apache.oozie=INFO, oozie, oozieError
log4j.logger.org.apache.hadoop=WARN, oozie
log4j.logger.org.mortbay=WARN, oozie
log4j.logger.org.hsqldb=WARN, oozie
log4j.logger.org.apache.hadoop.security.authentication.server=WARN, oozie

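oozie-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>oozie.service.ProxyUserService.proxyuser.root.hosts</name>
        <value>*</value>
        <description>
            List of hosts the '#USER#' user is allowed to perform 'doAs'
            operations.

            The '#USER#' must be replaced with the username of the user who is
            allowed to perform 'doAs' operations.

            The value can be the '*' wildcard or a list of hostnames.

            For multiple users copy this property and replace the user name
            in the property name.
        </description>
    </property>

    <property>
        <name>oozie.service.ProxyUserService.proxyuser.root.groups</name>
        <value>*</value>
        <description>
            List of groups the '#USER#' user is allowed to impersonate users
            from to perform 'doAs' operations.

            The '#USER#' must be replaced with the username of the user who is
            allowed to perform 'doAs' operations.

            The value can be the '*' wildcard or a list of groups.

            For multiple users copy this property and replace the user name
            in the property name.
        </description>
    </property>

    <property>
        <name>oozie.http.hostname</name>
        <value>hadoop-oozie</value>
    </property>
    <property>
        <name>oozie.db.schema.name</name>
        <value>oozie</value>
    </property>

    <property>
        <name>oozie.service.JPAService.create.db.schema</name>
        <value>false</value>
    </property>

    <property>
        <name>oozie.service.JPAService.validate.db.connection</name>
        <value>false</value>
    </property>

    <property>
        <name>oozie.service.JPAService.jdbc.url</name>
        <value>jdbc:mysql://beimei6-mysql/oozie?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useUnicode=true&amp;useLegacyDatetimeCode=false&amp;serverTimezone=GMT</value>
    </property>

    <property>
        <name>oozie.service.JPAService.jdbc.username</name>
        <value>root</value>
    </property>

    <property>
        <name>oozie.service.JPAService.jdbc.password</name>
        <value>password</value>
    </property>

    <property>
        <name>oozie.service.JPAService.jdbc.driver</name>
        <value>com.mysql.cj.jdbc.Driver</value>
    </property>

    <property>
        <name>oozie.service.JPAService.pool.max.active.conn</name>
        <value>10</value>
    </property>

    <property>
        <name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
        <value>*=/usr/local/hadoop/etc/hadoop/</value>
    </property>
    <property>
        <name>oozie.service.HadoopAccessorService.action.configurations</name>
        <value>*=/usr/local/hadoop/etc/hadoop/</value>
    </property>
</configuration>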

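yarn-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop-master</value>
    </property>

    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>

    <property>
        <name>yarn.scheduler.fair.preemption</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.scheduler.fair.preemption.cluster-utilization-threshold</name>
        <value>1.0</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.log.server.url</name>
        <value>http://hadoop-master:19888/jobhistory/logs</value>
    </property>
</configuration>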

Write the Oozie scripts in day2 (the parent directory)

docker-compose-oozie.yml

version: "3"

services:
  hadoop-oozie:
    image: beimei6/oozie
    container_name: hadoop-oozie
    networks:
      beimei6-net:
        ipv4_address: 172.11.24.15
    stdin_open: true # -i interactive
    tty: true # -t tty
    entrypoint: ["sh" ,"-c","bash"]

install-oozie.sh

#!/bin/bash
cd oozie
docker build -t beimei6/oozie .
cd -
docker-compose -f docker-compose.yml  -f docker-compose-oozie.yml  up -d hadoop-oozie

Finally, run this in the day2 directory:

bash install-oozie.sh

The Oozie container built with Docker is now up.

Running Oozie

Preparation: start MySQL.
Then start Hadoop's DFS, YARN and history server:

start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver

Open a shell in the Oozie container.
Install unzip (skip this if it is already present in the image):

apt-get update
apt-get install unzip

Go to $OOZIE_HOME/libext/ and unzip ext-2.2.zip:

cd $OOZIE_HOME/libext/
unzip ext-2.2.zip

From $OOZIE_HOME, run oozie-setup.sh to upload the new sharelib to HDFS:

cd $OOZIE_HOME
bin/oozie-setup.sh sharelib create -fs hdfs://hadoop-master:9000

After it runs, the sharelib files appear on HDFS.

Create the Oozie DB with the "ooziedb.sh" command-line tool:

bin/ooziedb.sh create -sqlfile oozie.sql -run

This creates the oozie database in MySQL, containing 12 tables.

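Start Oozie in the background:

$ bin/oozied.sh start

Or start Oozie in the foreground (makes errors easier to spot):

$ bin/oozied.sh run

Check the Oozie log file logs/oozie.log to make sure Oozie started properly.

Check Oozie's status with the Oozie command-line tool: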

$ bin/oozie admin -oozie http://hadoop-oozie:11000/oozie -status
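
Oozie can take a moment to come up after oozied.sh start, so a small polling loop is handy in scripts. This is a sketch, not part of Oozie: the function names, the retry count and the delay are my own choices, and it relies on the status command printing a line that starts with "System mode: NORMAL" when the server is healthy.

```shell
# Succeed if stdin contains the line `oozie admin -status` prints
# for a healthy server.
is_normal() {
    grep -q '^System mode: NORMAL'
}

# wait_for_oozie TRIES DELAY COMMAND...
# Run COMMAND up to TRIES times, sleeping DELAY seconds between attempts,
# and succeed as soon as its output reports NORMAL.
wait_for_oozie() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@" 2>/dev/null | is_normal; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example call (run from $OOZIE_HOME inside the container):
# wait_for_oozie 12 5 bin/oozie admin -oozie http://hadoop-oozie:11000/oozie -status
```

Passing the status command as arguments keeps the loop independent of the Oozie URL, so the same function works for any host name.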

Open the Oozie web console in a browser; the Oozie status should be NORMAL.
Enter this in the browser:

http://localhost:11000/oozie

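Running a test case: the map-reduce example bundled with Oozie

The examples archive sits in the same directory as the compiled Oozie; extract it there.
Examples directory: …/oozie-5.2.0-distro/oozie-5.2.0/examples/apps/
Edit job.properties in the map-reduce example.
Two settings need to change here: the namenode and the jobtracker.
Both are taken from your own Hadoop configuration:
the namenode value is the one set in Hadoop's core-site.xml,
and the jobtracker value is the yarn.resourcemanager.address property from Hadoop's yarn-site.xml. (If only yarn.resourcemanager.hostname is set, as in the yarn-site.xml above, the address defaults to that host on port 8032.)
I have already put these settings in the files above; you need to add them on each of your Hadoop nodes and then restart.
(Inside the Oozie container, the Hadoop configuration lives in /usr/local/hadoop/etc/hadoop/.)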
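
If you would rather script the edit than open an editor, a helper like the one below can rewrite the two values. Treat it as a sketch with assumptions: set_prop is a made-up name, the keys nameNode and jobTracker follow the wording above (check the key names in your own job.properties), and the port 8032 is the YARN default rather than something configured explicitly in this post.

```shell
# set_prop FILE KEY VALUE
# Replace the value of KEY in a Java-style properties file, or append
# KEY=VALUE when the key is missing. Simplification: KEY is used as a
# regular expression and VALUE must not contain '|', '&' or backslashes.
set_prop() {
    if grep -q "^$2=" "$1"; then
        # '|' is the sed delimiter because the values contain '/'.
        sed "s|^$2=.*|$2=$3|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}

# Demo on a throwaway file; the starting values are made up.
demo=$(mktemp)
printf 'nameNode=hdfs://localhost:8020\njobTracker=localhost:8032\n' > "$demo"
set_prop "$demo" nameNode hdfs://hadoop-master:9000
set_prop "$demo" jobTracker hadoop-master:8032
cat "$demo"
rm -f "$demo"
```

For the real file, point the first argument at examples/apps/map-reduce/job.properties instead of the temporary demo file.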

Then upload the examples to HDFS.
Your machine needs an HDFS client configured for this command to work:

hdfs dfs -put examples examples

Now you can run it:

cd $OOZIE_HOME
bin/oozie job -oozie  http://hadoop-oozie:11000/oozie -config examples/apps/map-reduce/job.properties -run

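Once the job is submitted successfully, you are given a job ID. You can then check the job's status with the command below; the last argument is the job ID you just received:

bin/oozie job -oozie http://hadoop-oozie:11000/oozie -info 0000000-200709041917672-oozie-root-W

You should then see the SUCCEEDED status!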
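
Copying the job ID by hand gets old quickly, so here is a rough sketch for capturing it in a script. The helper names are my own. It assumes the submit command prints the ID as "job: <id>" and that the -info output contains a line of the form "Status : <value>"; check both against what your Oozie version actually prints.

```shell
# Print the job ID from the output of `oozie job ... -run`,
# which reports it as "job: <id>".
extract_job_id() {
    sed -n 's/^job: *//p'
}

# Print the workflow status from `oozie job ... -info` output,
# assuming it contains a line of the form "Status : <value>".
extract_status() {
    sed -n 's/^Status *: *//p' | head -n 1
}

# Example (run from $OOZIE_HOME inside the container):
# job_id=$(bin/oozie job -oozie http://hadoop-oozie:11000/oozie \
#     -config examples/apps/map-reduce/job.properties -run | extract_job_id)
# bin/oozie job -oozie http://hadoop-oozie:11000/oozie -info "$job_id" | extract_status
```

Both helpers only filter text on stdin, so they can be tried on saved command output before wiring them into a real submission.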
