Pitfalls Encountered While Deploying Azkaban (Deployment Part)

1. Downloading the Azkaban source code

Maven-based source, version 3.0.0: https://gitee.com/wenhaijin_830_8756/MyAzkaban

Maven-based source, version 3.35.0: https://gitee.com/wenhaijin_830_8756/azkaban.git

Official download page: https://azkaban.github.io/downloads.html

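2. Installing and deploying Azkaban

The downloaded MyAzkaban project contains a deployment document, "MyAzkaban-3.0.0使用文档.doc". Follow that document to install.

Once installation is complete, open the following URL to access the web UI: https://ip:8443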

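3. Pitfalls you may run into during deployment

I ran into several pitfalls while deploying the project, and they took a long time to resolve. I am sharing them here so that you can avoid the same detours.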

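3.1 The official project is not a Maven project

The officially provided source code is not a Maven project and cannot be compiled or packaged with Maven. If you want to build with Maven, download the source from the first link above.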

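3.2 Pitfall when starting the server after installation

After installation, be sure to start the server from the directory one level above bin:

./bin/start-web.sh

Do not cd into the bin directory and start it from there, because the startup script relies on the current working directory.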

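3.3 Granting execute permission to the startup scripts

Startup scripts uploaded to the server do not have execute permission by default, so you need to grant it:

sudo chmod 755 xxx.sh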

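3.4 Handling whitespace differences between Windows and Linux

Whitespace characters in shell scripts are not compatible between Windows and Linux, so scripts edited on Windows need to be converted before they run on Linux. The conversion is described in this article: https://my.oschina.net/u/2988360/blog/868775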
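One common form of this incompatibility is Windows CRLF line endings, which make a Linux shell fail on an otherwise valid script. The sketch below assumes that this is the problem being described; the class name LineEndingFixer is hypothetical and is not part of Azkaban or of the linked article.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of Azkaban): rewrites scripts with Unix line endings.
final class LineEndingFixer {

  // Replaces Windows CRLF pairs, then any remaining lone CR, with LF.
  static String toUnix(String text) {
    return text.replace("\r\n", "\n").replace("\r", "\n");
  }

  // Rewrites one file in place, assuming it is UTF-8 encoded.
  static void fixFile(Path script) throws IOException {
    String original = new String(Files.readAllBytes(script), StandardCharsets.UTF_8);
    Files.write(script, toUnix(original).getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws IOException {
    // Each argument is the path of a script to convert, e.g. bin/start-web.sh
    for (String arg : args) {
      fixFile(Paths.get(arg));
    }
  }
}
```

After compiling, it could be run once per uploaded script, for example with the path of the startup script as its argument.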

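3.5 Multiple Executor Mode: memory requirements on executor hosts

azkaban.use.multiple.executors=true
# executor host filter configuration
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus

The MinimumFreeMemory filter checks whether an executor host has more than 6 GB of free memory. If it does not, the web server will not dispatch jobs to that host. The relevant source code is: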

private static final int MINIMUM_FREE_MEMORY = 6 * 1024;


  /**
   * function to register the static Minimum Reserved Memory filter.
   * NOTE : this is a static filter which means the filter will be filtering based on the system standard which is not
   *        Coming for the passed flow.
   *        This filter will filter out any executors that has the remaining  memory below 6G
   *
   * */
  private static FactorFilter getMinimumReservedMemoryFilter(){
    return FactorFilter.create(MINIMUMFREEMEMORY_FILTER_NAME, new FactorFilter.Filter() {
      private static final int MINIMUM_FREE_MEMORY = 6 * 1024;
      public boolean filterTarget(Executor filteringTarget, ExecutableFlow referencingObject) {
        if (null == filteringTarget){
          logger.debug(String.format("%s : filtering out the target as it is null.", MINIMUMFREEMEMORY_FILTER_NAME));
          return false;
        }

        ExecutorInfo stats = filteringTarget.getExecutorInfo();
        if (null == stats) {
          logger.debug(String.format("%s : filtering out %s as it's stats is unavailable.",
              MINIMUMFREEMEMORY_FILTER_NAME,
              filteringTarget.toString()));
          return false;
        }
        return stats.getRemainingMemoryInMB() > MINIMUM_FREE_MEMORY ;
      }
    });
  }

The CpuStatus filter checks whether the executor host's CPU usage has reached 95%. If it has, the web server will not dispatch jobs to that host either:

  /**
   *
   * function to register the static Minimum Reserved Memory filter.
   * NOTE :  this is a static filter which means the filter will be filtering based on the system standard which
   *        is not Coming for the passed flow.
   *        This filter will filter out any executors that the current CPU usage exceed 95%
   *
   * */
  private static FactorFilter getCpuStatusFilter(){
    return FactorFilter.create(CPUSTATUS_FILTER_NAME, new FactorFilter.Filter() {
      private static final int MAX_CPU_CURRENT_USAGE = 95;
      public boolean filterTarget(Executor filteringTarget, ExecutableFlow referencingObject) {
        if (null == filteringTarget){
          logger.debug(String.format("%s : filtering out the target as it is null.", CPUSTATUS_FILTER_NAME));
          return false;
        }

        ExecutorInfo stats = filteringTarget.getExecutorInfo();
        if (null == stats) {
          logger.debug(String.format("%s : filtering out %s as it's stats is unavailable.",
              MINIMUMFREEMEMORY_FILTER_NAME,
              filteringTarget.toString()));
          return false;
        }
        return stats.getCpuUsage() < MAX_CPU_CURRENT_USAGE ;
      }
    });
  }
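To see how the two thresholds combine, here is a minimal standalone sketch. It is not Azkaban code: the Stats class is a hypothetical stand-in for ExecutorInfo, the host names are made up, and the StaticRemainingFlowSize filter from the configuration is ignored. Only the two constants and the comparison operators are taken from the source quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone sketch; Stats is a hypothetical stand-in for Azkaban's ExecutorInfo.
final class ExecutorFilterSketch {

  static final int MINIMUM_FREE_MEMORY = 6 * 1024; // MB, as in the quoted source
  static final int MAX_CPU_CURRENT_USAGE = 95;     // percent, as in the quoted source

  static final class Stats {
    final String host;
    final long remainingMemoryInMB;
    final double cpuUsage;

    Stats(String host, long remainingMemoryInMB, double cpuUsage) {
      this.host = host;
      this.remainingMemoryInMB = remainingMemoryInMB;
      this.cpuUsage = cpuUsage;
    }
  }

  // An executor stays eligible only if it passes both checks (strict comparisons).
  static boolean passesBothFilters(Stats stats) {
    if (stats == null) {
      return false; // missing stats are filtered out, as in the quoted filters
    }
    return stats.remainingMemoryInMB > MINIMUM_FREE_MEMORY
        && stats.cpuUsage < MAX_CPU_CURRENT_USAGE;
  }

  static List<String> eligibleHosts(List<Stats> executors) {
    List<String> hosts = new ArrayList<>();
    for (Stats stats : executors) {
      if (passesBothFilters(stats)) {
        hosts.add(stats.host);
      }
    }
    return hosts;
  }

  public static void main(String[] args) {
    List<Stats> executors = Arrays.asList(
        new Stats("exec-1", 8 * 1024, 40.0),   // passes both checks
        new Stats("exec-2", 6 * 1024, 10.0),   // exactly 6 GB free: rejected
        new Stats("exec-3", 16 * 1024, 95.0)); // CPU at 95%: rejected
    System.out.println(eligibleHosts(executors)); // prints [exec-1]
  }
}
```

Because both comparisons are strict, a host with exactly 6 GB free or exactly 95% CPU usage is filtered out. If every executor is filtered out, no host is left to receive the flow.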

3.6 Job cannot obtain memory

If a job fails with an error like the following:

14-09-2017 13:50:01 CST A INFO - Starting job A at 1505368201283
14-09-2017 13:50:01 CST A INFO - azkaban.webserver.url property was not set
14-09-2017 13:50:01 CST A INFO - job JVM args: -Dazkaban.flowid=C -Dazkaban.execid=184 -Dazkaban.jobid=A
14-09-2017 13:50:01 CST A INFO - Building command job executor. 
14-09-2017 13:50:01 CST A ERROR - pluginLoadProps is null
14-09-2017 13:50:01 CST A ERROR - Job run failed!
java.lang.Exception: Cannot request memory (Xms 0 kb, Xmx 0 kb) from system for job A
	at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:86)
	at azkaban.execapp.JobRunner.runJob(JobRunner.java:590)
	at azkaban.execapp.JobRunner.run(JobRunner.java:443)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
14-09-2017 13:50:01 CST A ERROR - Cannot request memory (Xms 0 kb, Xmx 0 kb) from system for job A cause: null
14-09-2017 13:50:01 CST A INFO - Finishing job A attempt: 0 at 1505368201336 with status FAILED

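the most likely cause is that every executor host is short of memory. The Azkaban source requires that at least 3 GB of free memory remain on the executor host, after subtracting the job's Xmx, before a job is allowed to run.

The corresponding Azkaban source code is: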

 private static final long LOW_MEM_THRESHOLD = 3L*1024L*1024L; //3 GB

/**
   * @param xms
   * @param xmx
   * @return System can satisfy the memory request or not
   * 
   * Given Xms/Xmx values (in kb) used by java process, determine if system can
   * satisfy the memory request
   */
  public synchronized static boolean canSystemGrantMemory(long xms, long xmx, long freeMemDecrAmt) {
    if (!memCheckEnabled) {
      return true;
    }

    //too small amount of memory left, reject
    if (freeMemAmount < LOW_MEM_THRESHOLD) {
      logger.info(String.format("Free memory amount (%d kb) is less than low mem threshold (%d kb),  memory request declined.",
              freeMemAmount, LOW_MEM_THRESHOLD));
      return false;
    }

    //let's get newest mem info
    if (freeMemAmount >= LOW_MEM_THRESHOLD && freeMemAmount < 2 * LOW_MEM_THRESHOLD) {
      logger.info(String.format("Free memory amount (%d kb) is less than 2x low mem threshold (%d kb),  re-read /proc/meminfo",
              freeMemAmount, LOW_MEM_THRESHOLD));
      readMemoryInfoFile();
    }

    //too small amount of memory left, reject
    if (freeMemAmount < LOW_MEM_THRESHOLD) {
      logger.info(String.format("Free memory amount (%d kb) is less than low mem threshold (%d kb),  memory request declined.",
              freeMemAmount, LOW_MEM_THRESHOLD));
      return false;
    }

    if (freeMemAmount - xmx < LOW_MEM_THRESHOLD) {
      logger.info(String.format("Free memory amount minus xmx (%d - %d kb) is less than low mem threshold (%d kb),  memory request declined.",
              freeMemAmount, xmx, LOW_MEM_THRESHOLD));
      return false;
    }

    if (freeMemDecrAmt > 0) {
      freeMemAmount -= freeMemDecrAmt;
      logger.info(String.format("Memory (%d kb) granted. Current free memory amount is %d kb", freeMemDecrAmt, freeMemAmount));
    } else {
      freeMemAmount -= xms;
      logger.info(String.format("Memory (%d kb) granted. Current free memory amount is %d kb", xms, freeMemAmount));
    }
    
    return true;
  }
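The arithmetic behind the two rejection rules can be checked with a small pure function. This is a simplified sketch, not the Azkaban method: the class and method names are hypothetical, and it does not re-read /proc/meminfo or update the free-memory counter. All values are in kB, as in the quoted code.

```java
// Simplified sketch of the rejection rules in canSystemGrantMemory; names are hypothetical.
final class MemCheckSketch {

  static final long LOW_MEM_THRESHOLD = 3L * 1024L * 1024L; // 3 GB expressed in kB

  // Returns true when neither rejection rule applies.
  static boolean wouldGrant(long freeMemKb, long xmxKb) {
    if (freeMemKb < LOW_MEM_THRESHOLD) {
      return false; // rule 1: less than 3 GB free
    }
    // rule 2: at least 3 GB must remain after reserving the job's Xmx
    return freeMemKb - xmxKb >= LOW_MEM_THRESHOLD;
  }

  public static void main(String[] args) {
    long gb = 1024L * 1024L; // kB per GB
    System.out.println(wouldGrant(2 * gb, 0));      // false: below the threshold
    System.out.println(wouldGrant(4 * gb, 2 * gb)); // false: only 2 GB would remain
    System.out.println(wouldGrant(8 * gb, 2 * gb)); // true: 6 GB would remain
  }
}
```

In the failure log above the request is "Xms 0 kb, Xmx 0 kb", so the second rule reduces to the first. With the memory check enabled, that request can only be declined when the host has less than 3 GB free.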

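3.7 Multiple Executor Mode does not yet support configuring host/port mappings

A Multiple Executor Mode deployment cannot yet be given the executor host/port mappings through configuration, so you have to insert the rows into the database table manually with SQL:

insert into executors(host,port) values("EXECUTOR_HOST",EXECUTOR_PORT);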

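4. Building the source directly on Windows (a Git client must be installed locally)

1. In the Windows command line, switch to the target directory
2. git clone https://github.com/azkaban/azkaban
3. When the download finishes, run gradlew build -x test to build (skipping the tests)
4. After a successful build, the packages are in the distributions directory under the build directory of the web server and executor modules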

5. Resolving errors in Azkaban 3.35

5.1 Fixing the "Missing required property 'azkaban.native.lib'" error

The error message is as follows:

16-09-2017 19:48:28 CST A INFO - Starting job A at 1505562508575
16-09-2017 19:48:28 CST A INFO - azkaban.webserver.url property was not set
16-09-2017 19:48:28 CST A INFO - job JVM args: -Dazkaban.flowid=C -Dazkaban.execid=1 -Dazkaban.jobid=A
16-09-2017 19:48:28 CST A INFO - Building command job executor. 
16-09-2017 19:48:28 CST A INFO - Memory granted for job A
16-09-2017 19:48:28 CST A INFO - 2 commands to execute.
16-09-2017 19:48:28 CST A INFO - cwd=/app/azkaban/source_buit/azkaban-exec-server-3.35.0/executions/1
16-09-2017 19:48:28 CST A INFO - effective user is: azkaban
16-09-2017 19:48:28 CST A ERROR - Job run failed!
azkaban.utils.UndefinedPropertyException: Missing required property 'azkaban.native.lib'
	at azkaban.utils.Props.getString(Props.java:420)
	at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:234)
	at azkaban.execapp.JobRunner.runJob(JobRunner.java:748)
	at azkaban.execapp.JobRunner.doRun(JobRunner.java:591)
	at azkaban.execapp.JobRunner.run(JobRunner.java:552)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
16-09-2017 19:48:28 CST A ERROR - Missing required property 'azkaban.native.lib' cause: null
16-09-2017 19:48:28 CST A INFO - Finishing job A at 1505562508845 with status FAILED

Solution:

Configure commonprivate.properties (the original post showed the settings in two screenshots).

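5.2 Fixing the UI styling problem

After switching to the latest source (3.35.0) and packaging it, the deployed web UI had styling problems.

The cause was that the web folder I had copied into the web-server directory on the server came from a directory (shown in a screenshot in the original post) that does not contain the azkaban.css stylesheet.

Solution:

Upload the web folder from the install directory produced by the build to the server instead.

After this change and a restart, the UI is displayed correctly.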

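Notes:

Every job in Azkaban is a process, and Azkaban decides whether a job succeeded by whether that process completed successfully. However, when an MR or Spark job hits a code error while running, the task on the cluster stops and nothing is written to the target file, yet the process reported back to Azkaban still counts as successful, so the job node is marked as succeeded. This contradicts the actual outcome of the task.

For example:

While running a jar, a NullPointerException was thrown and the MR job stopped, but the process was still shown as successful, and the node's final result was also success.

To prevent downstream nodes from running when a node they depend on has actually failed, job success has to be judged from a different angle, for example by checking the MR or Spark task logs, or by checking whether the task on the cluster succeeded.

Summary: after execution finishes, query HDFS for the expected output file. If it exists, the job succeeded; if it does not, the job failed.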
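One way to apply this is a follow-up step that verifies the output and turns a missing file into a non-zero exit code, so that the process itself fails. The sketch below illustrates the idea against a local filesystem path only. It assumes the job type treats a non-zero exit code as a failure. For HDFS, the existence check would have to go through a Hadoop client instead, which is not shown. The class name OutputCheck is hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical post-job check: fails the process when an expected output is missing.
// Uses the local filesystem as a stand-in for HDFS.
final class OutputCheck {

  // 0 means the expected output exists, 1 means it is missing.
  static int exitCodeFor(Path expectedOutput) {
    return Files.exists(expectedOutput) ? 0 : 1;
  }

  public static void main(String[] args) {
    int exitCode = 0;
    // Each argument is the path of an output that the job should have produced.
    for (String arg : args) {
      if (exitCodeFor(Paths.get(arg)) != 0) {
        System.err.println("expected output not found: " + arg);
        exitCode = 1;
      }
    }
    System.exit(exitCode);
  }
}
```

Run as the last command of a job, or as a separate job that downstream nodes depend on, so that a missing output stops the flow instead of being reported as a success.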

 

 

Reposted from: https://my.oschina.net/u/2988360/blog/1537561
