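1. Downloading the Azkaban source code
Maven-based version 3.0.0: https://gitee.com/wenhaijin_830_8756/MyAzkaban
Maven-based version 3.35.0: https://gitee.com/wenhaijin_830_8756/azkaban.git
Official download page: https://azkaban.github.io/downloads.html
2. Installing and deploying Azkaban
After downloading the MyAzkaban project you will find a deployment guide inside it, "MyAzkaban-3.0.0使用文档.doc". Follow that document step by step.
Once installation is complete, open the following URL to access the web UI: https://ip:8443
3. Pitfalls you may hit during deployment
I ran into several pitfalls while deploying the project, and some took a long time to resolve. I am sharing them here so that your own deployment goes more smoothly.
3.1 The official project is not a Maven project
The source code provided by the official site is not a Maven project and cannot be compiled or packaged with Maven. If you want to build with Maven, download the source from the first link above.
3.2 Startup pitfall after installation
After installation, always start the server from the parent directory of bin: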
./bin/start-web.sh
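Do not cd into the bin directory and start it from there, because the startup script relies on the current working directory.
3.3 Granting execute permission to the startup scripts
Startup scripts uploaded to the server do not have execute permission by default, so you need to grant it: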
sudo chmod 755 xxx.sh
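3.4 Whitespace differences between Windows and Linux
Whitespace characters in shell scripts are not compatible between Windows and Linux (most commonly the CRLF line endings written by Windows editors), so scripts edited on Windows must be converted before they run on Linux. The conversion is described in this article: https://my.oschina.net/u/2988360/blog/868775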
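If the incompatibility is the usual line-ending one (Windows CRLF versus Linux LF; this is an assumption, since the text above only speaks of whitespace), the conversion amounts to dropping the carriage return before each newline. Here is a minimal Java sketch of that idea. The class and method names are made up for illustration, and it assumes the scripts are UTF-8 text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative helper, not part of Azkaban.
final class LineEndingSketch {

    // Replace every CRLF pair with a single LF; lone CR characters are left alone.
    static String toUnixLineEndings(String text) {
        return text.replace("\r\n", "\n");
    }

    // Rewrites each file named on the command line in place.
    public static void main(String[] args) throws IOException {
        for (String name : args) {
            Path script = Paths.get(name);
            String original = new String(Files.readAllBytes(script), StandardCharsets.UTF_8);
            Files.write(script, toUnixLineEndings(original).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

It could be run against the uploaded scripts, for example with bin/start-web.sh as the argument, before granting execute permission.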
3.5 Multiple Executor Mode: memory limits on executor hosts
azkaban.use.multiple.executors=true
# executor host filter configuration
azkaban.executorselector.filters=StaticRemainingFlowSize,MinimumFreeMemory,CpuStatus
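The MinimumFreeMemory filter checks whether the executor host has more than 6 GB of free memory. If it has less, the web server will not hand jobs to that host. The relevant source code is: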
private static final int MINIMUM_FREE_MEMORY = 6 * 1024;
/**
 * function to register the static Minimum Reserved Memory filter.
 * NOTE : this is a static filter which means the filter will be filtering based on the system standard which is not
 *        Coming for the passed flow.
 * This filter will filter out any executors that has the remaining memory below 6G
 *
 * */
private static FactorFilter getMinimumReservedMemoryFilter(){
  return FactorFilter.create(MINIMUMFREEMEMORY_FILTER_NAME, new FactorFilter.Filter() {
    private static final int MINIMUM_FREE_MEMORY = 6 * 1024;
    public boolean filterTarget(Executor filteringTarget, ExecutableFlow referencingObject) {
      if (null == filteringTarget){
        logger.debug(String.format("%s : filtering out the target as it is null.", MINIMUMFREEMEMORY_FILTER_NAME));
        return false;
      }
      ExecutorInfo stats = filteringTarget.getExecutorInfo();
      if (null == stats) {
        logger.debug(String.format("%s : filtering out %s as it's stats is unavailable.",
            MINIMUMFREEMEMORY_FILTER_NAME,
            filteringTarget.toString()));
        return false;
      }
      return stats.getRemainingMemoryInMB() > MINIMUM_FREE_MEMORY ;
    }
  });
}
The CpuStatus filter checks the CPU usage of the executor host. Once usage reaches 95%, the web server will not hand jobs to that host either:
/**
 *
 * function to register the static Minimum Reserved Memory filter.
 * NOTE : this is a static filter which means the filter will be filtering based on the system standard which
 *        is not Coming for the passed flow.
 * This filter will filter out any executors that the current CPU usage exceed 95%
 *
 * */
private static FactorFilter getCpuStatusFilter(){
  return FactorFilter.create(CPUSTATUS_FILTER_NAME, new FactorFilter.Filter() {
    private static final int MAX_CPU_CURRENT_USAGE = 95;
    public boolean filterTarget(Executor filteringTarget, ExecutableFlow referencingObject) {
      if (null == filteringTarget){
        logger.debug(String.format("%s : filtering out the target as it is null.", CPUSTATUS_FILTER_NAME));
        return false;
      }
      ExecutorInfo stats = filteringTarget.getExecutorInfo();
      if (null == stats) {
        logger.debug(String.format("%s : filtering out %s as it's stats is unavailable.",
            MINIMUMFREEMEMORY_FILTER_NAME,
            filteringTarget.toString()));
        return false;
      }
      return stats.getCpuUsage() < MAX_CPU_CURRENT_USAGE ;
    }
  });
}
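To make the effect of the two filters concrete, here is a small self-contained sketch that applies the same two comparisons to a few made-up executors. The Stats class, the host names and the numbers are hypothetical stand-ins for illustration, not Azkaban classes. Only the two thresholds and the strict comparisons are taken from the source quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch; only the thresholds come from the quoted Azkaban source.
final class ExecutorFilterSketch {

    static final int MINIMUM_FREE_MEMORY = 6 * 1024; // MB
    static final int MAX_CPU_CURRENT_USAGE = 95;     // percent

    // Hypothetical stand-in for the statistics an executor reports.
    static final class Stats {
        final String host;
        final long remainingMemoryInMB;
        final double cpuUsage;

        Stats(String host, long remainingMemoryInMB, double cpuUsage) {
            this.host = host;
            this.remainingMemoryInMB = remainingMemoryInMB;
            this.cpuUsage = cpuUsage;
        }
    }

    // Same comparison as the MinimumFreeMemory filter: strictly more than 6 GB.
    static boolean passesMinimumFreeMemory(Stats s) {
        return s != null && s.remainingMemoryInMB > MINIMUM_FREE_MEMORY;
    }

    // Same comparison as the CpuStatus filter: strictly below 95%.
    static boolean passesCpuStatus(Stats s) {
        return s != null && s.cpuUsage < MAX_CPU_CURRENT_USAGE;
    }

    // Hosts that survive both filters.
    static List<String> eligibleHosts(List<Stats> executors) {
        List<String> hosts = new ArrayList<>();
        for (Stats s : executors) {
            if (passesMinimumFreeMemory(s) && passesCpuStatus(s)) {
                hosts.add(s.host);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        List<Stats> executors = Arrays.asList(
            new Stats("exec-a", 8192, 40.0),   // passes both filters
            new Stats("exec-b", 6144, 10.0),   // exactly 6 GB free: rejected
            new Stats("exec-c", 16384, 95.0)); // exactly 95% CPU: rejected
        System.out.println(eligibleHosts(executors)); // prints [exec-a]
    }
}
```

Note that a host with exactly 6 GB free, or exactly 95% CPU usage, is filtered out because both comparisons are strict.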
3.6 Job cannot obtain memory
If a job fails with an error like the following:
14-09-2017 13:50:01 CST A INFO - Starting job A at 1505368201283
14-09-2017 13:50:01 CST A INFO - azkaban.webserver.url property was not set
14-09-2017 13:50:01 CST A INFO - job JVM args: -Dazkaban.flowid=C -Dazkaban.execid=184 -Dazkaban.jobid=A
14-09-2017 13:50:01 CST A INFO - Building command job executor.
14-09-2017 13:50:01 CST A ERROR - pluginLoadProps is null
14-09-2017 13:50:01 CST A ERROR - Job run failed!
java.lang.Exception: Cannot request memory (Xms 0 kb, Xmx 0 kb) from system for job A
at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:86)
at azkaban.execapp.JobRunner.runJob(JobRunner.java:590)
at azkaban.execapp.JobRunner.run(JobRunner.java:443)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
14-09-2017 13:50:01 CST A ERROR - Cannot request memory (Xms 0 kb, Xmx 0 kb) from system for job A cause: null
14-09-2017 13:50:01 CST A INFO - Finishing job A attempt: 0 at 1505368201336 with status FAILED
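the most likely cause is that none of the executor hosts has enough free memory. The Azkaban source requires the executor host to still have at least 3 GB of free memory, after the job's Xmx is subtracted, before a job is allowed to run.
The corresponding Azkaban source code is: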
private static final long LOW_MEM_THRESHOLD = 3L*1024L*1024L; //3 GB
/**
 * @param xms
 * @param xmx
 * @return System can satisfy the memory request or not
 *
 * Given Xms/Xmx values (in kb) used by java process, determine if system can
 * satisfy the memory request
 */
public synchronized static boolean canSystemGrantMemory(long xms, long xmx, long freeMemDecrAmt) {
  if (!memCheckEnabled) {
    return true;
  }
  //too small amount of memory left, reject
  if (freeMemAmount < LOW_MEM_THRESHOLD) {
    logger.info(String.format("Free memory amount (%d kb) is less than low mem threshold (%d kb), memory request declined.",
        freeMemAmount, LOW_MEM_THRESHOLD));
    return false;
  }
  //let's get newest mem info
  if (freeMemAmount >= LOW_MEM_THRESHOLD && freeMemAmount < 2 * LOW_MEM_THRESHOLD) {
    logger.info(String.format("Free memory amount (%d kb) is less than 2x low mem threshold (%d kb), re-read /proc/meminfo",
        freeMemAmount, LOW_MEM_THRESHOLD));
    readMemoryInfoFile();
  }
  //too small amount of memory left, reject
  if (freeMemAmount < LOW_MEM_THRESHOLD) {
    logger.info(String.format("Free memory amount (%d kb) is less than low mem threshold (%d kb), memory request declined.",
        freeMemAmount, LOW_MEM_THRESHOLD));
    return false;
  }
  if (freeMemAmount - xmx < LOW_MEM_THRESHOLD) {
    logger.info(String.format("Free memory amount minus xmx (%d - %d kb) is less than low mem threshold (%d kb), memory request declined.",
        freeMemAmount, xmx, LOW_MEM_THRESHOLD));
    return false;
  }
  if (freeMemDecrAmt > 0) {
    freeMemAmount -= freeMemDecrAmt;
    logger.info(String.format("Memory (%d kb) granted. Current free memory amount is %d kb", freeMemDecrAmt, freeMemAmount));
  } else {
    freeMemAmount -= xms;
    logger.info(String.format("Memory (%d kb) granted. Current free memory amount is %d kb", xms, freeMemAmount));
  }
  return true;
}
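The decision itself can be worked through with a few numbers. The sketch below is a simplified restatement of the check above, not the Azkaban implementation: it leaves out the re-read of /proc/meminfo and the bookkeeping of granted memory, and takes the free memory as a plain parameter. The class and method names are made up for illustration.

```java
// Simplified restatement of the threshold logic; not the Azkaban class.
final class MemoryGrantSketch {

    static final long LOW_MEM_THRESHOLD = 3L * 1024L * 1024L; // 3 GB, expressed in kb

    // True when a job asking for xmxKb could be granted memory on a host
    // that currently has freeMemKb of free memory.
    static boolean canGrant(long freeMemKb, long xmxKb) {
        if (freeMemKb < LOW_MEM_THRESHOLD) {
            return false; // less than 3 GB free: always declined
        }
        // At least 3 GB must remain after the job's Xmx is subtracted.
        return freeMemKb - xmxKb >= LOW_MEM_THRESHOLD;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L; // one GB in kb
        System.out.println(canGrant(2 * gb, 0));      // false
        System.out.println(canGrant(3 * gb, 0));      // true
        System.out.println(canGrant(4 * gb, 2 * gb)); // false
        System.out.println(canGrant(8 * gb, 2 * gb)); // true
    }
}
```

The first case matches the failing log above: the job asked for Xmx 0 kb and was still declined, which under this logic means the host had less than 3 GB free at the time.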
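3.7 Multiple Executor Mode does not yet support configuring executor hosts and ports
Multiple Executor Mode deployment does not yet support configuring the executor host and port mapping, so you have to insert the rows into the database table manually with SQL:
insert into executors(host,port) values('EXECUTOR_HOST',EXECUTOR_PORT);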
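4. Building the source directly on Windows (a Git client must be installed locally)
1. In the Windows command prompt, change to the target directory
2. Run git clone https://github.com/azkaban/azkaban
3. When the clone finishes, run gradlew build -x test to build (skipping the tests)
4. After a successful build, the packages are in the distributions directory under the build directory of the server and executor modules
5. Fixing errors in Azkaban 3.35
5.1 Fixing the Missing required property 'azkaban.native.lib' error
The error output is as follows: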
16-09-2017 19:48:28 CST A INFO - Starting job A at 1505562508575
16-09-2017 19:48:28 CST A INFO - azkaban.webserver.url property was not set
16-09-2017 19:48:28 CST A INFO - job JVM args: -Dazkaban.flowid=C -Dazkaban.execid=1 -Dazkaban.jobid=A
16-09-2017 19:48:28 CST A INFO - Building command job executor.
16-09-2017 19:48:28 CST A INFO - Memory granted for job A
16-09-2017 19:48:28 CST A INFO - 2 commands to execute.
16-09-2017 19:48:28 CST A INFO - cwd=/app/azkaban/source_buit/azkaban-exec-server-3.35.0/executions/1
16-09-2017 19:48:28 CST A INFO - effective user is: azkaban
16-09-2017 19:48:28 CST A ERROR - Job run failed!
azkaban.utils.UndefinedPropertyException: Missing required property 'azkaban.native.lib'
at azkaban.utils.Props.getString(Props.java:420)
at azkaban.jobExecutor.ProcessJob.run(ProcessJob.java:234)
at azkaban.execapp.JobRunner.runJob(JobRunner.java:748)
at azkaban.execapp.JobRunner.doRun(JobRunner.java:591)
at azkaban.execapp.JobRunner.run(JobRunner.java:552)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16-09-2017 19:48:28 CST A ERROR - Missing required property 'azkaban.native.lib' cause: null
16-09-2017 19:48:28 CST A INFO - Finishing job A at 1505562508845 with status FAILED
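Solution:
Edit commonprivate.properties so that the azkaban.native.lib property is defined. A commonly used setting, when jobs do not need to run as a different user, is execute.as.user=false together with azkaban.native.lib=false.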
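5.2 Fixing the UI styling problem
After switching to the latest source (3.35.0) and packaging it, the deployed web UI had styling problems.
The cause: the web folder I copied into the web-server directory on the server came from a directory that does not contain the azkaban.css stylesheet, hence the broken styling.
Solution:
Upload the web folder from the install directory produced by the build to the server.
After this change and a restart, the UI renders correctly.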
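Note:
Every job in Azkaban is a process, and Azkaban decides whether a job succeeded by whether that process completed successfully. When an MR or Spark job hits a code error during execution, however, the task running on the cluster stops and nothing is written to the target file, yet the process reported back to Azkaban has exited successfully, so the job node is marked as succeeded. This contradicts the actual result of the task.
For example:
While a jar was being executed, a NullPointerException occurred and the MR job stopped, but the Process was still reported as successful, and the node's final status was also success.
To prevent downstream nodes from running after a node they depend on has actually failed, you need a different criterion for judging whether a job ran correctly, such as checking the MR or Spark log files for a successful run, or checking whether the task succeeded on the cluster.
Summary: after the job finishes, query HDFS for the expected output file. If the file exists the job succeeded; if it does not, the job failed.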
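One way to apply this check is to add a small verification step after the MR or Spark step that exits with a non-zero code when the expected output is missing, so that Azkaban marks the node as failed. The sketch below does this by calling the Hadoop command hdfs dfs -test -e, which exits with 0 when the path exists. The class name and the way paths are passed in are assumptions for illustration, and the hdfs command is assumed to be on the PATH of the executor host.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative verification step; not part of Azkaban or Hadoop.
final class OutputCheckSketch {

    // Command line that asks HDFS whether the given path exists.
    static List<String> existsCommand(String hdfsPath) {
        return Arrays.asList("hdfs", "dfs", "-test", "-e", hdfsPath);
    }

    // Map the exit code of the existence check to the exit code of this step.
    static int jobExitCode(int checkExitCode) {
        return checkExitCode == 0 ? 0 : 1;
    }

    // Run a command and return its exit code.
    static int run(List<String> command) throws Exception {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    // Every path given on the command line must exist, otherwise exit non-zero.
    public static void main(String[] args) throws Exception {
        for (String path : args) {
            int code = jobExitCode(run(existsCommand(path)));
            if (code != 0) {
                System.err.println("Expected output not found: " + path);
                System.exit(code);
            }
        }
    }
}
```

Run as a command job that depends on the MR or Spark node, with the expected output path as its argument, this makes a missing output file fail the flow instead of letting downstream nodes continue.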