Common Problems When Scheduling Pig Jobs with Oozie, and How to Analyze Them


1.  Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.PigMain], exit code [7]

This error is baffling at first. According to Programming Pig (2011 edition), exit code [7] means "ParseException thrown (can happen after parsing if variable substitution is being done)", which still does not explain much. For reference, all of Pig's return codes are listed here.

Pig return codes and their meanings:

Code  Meaning                           Notes
0     Success
1     Retriable failure
2     Failure
3     Partial failure                   Used with multiquery
4     Illegal arguments passed to Pig
5     IOException thrown                Would usually be thrown by a UDF
6     PigException thrown               Usually means a Python UDF raised an exception
7     ParseException thrown             Can happen after parsing if variable substitution is being done
8     Throwable thrown                  An unexpected exception

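Back to the problem. Since the exit code alone is still not enough to go on, the only option is to open the job log of the Pig launcher and read it carefully. The log turns out to hold the real clue: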

Run pig script using PigRunner.run() for Pig version 0.8+
1332 [main] INFO  org.apache.pig.Main  - Logging error messages to: /hadoop/mapred/taskTracker/root/jobcache/job_201306261303_0109/attempt_201306261303_0109_m_000000_0/work/pig-job_201306261303_0109.log

Apache Pig version 0.8.1-cdh3u1 (rexported) 
compiled Jul 18 2011, 08:29:40

USAGE: Pig [options] [-] : Run interactively in grunt shell.
       Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
       Pig [options] [-f[ile]] file : Run cmds found in file.
  options include:
    -4, -log4jconf - Log4j configuration file, overrides log conf
    -b, -brief - Brief logging (no timestamps)
    -c, -check - Syntax check
    -d, -debug - Debug level, INFO is default
    -e, -execute - Commands to execute (within quotes)
    -f, -file - Path to the script to execute
    -h, -help - Display this message. You can specify topic to get help for that topic.
        properties is the only topic currently supported: -h properties.
    -i, -version - Display version information
.......

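Strange: why does Pig's help text show up in the log? The likely explanation is that, when the Pig job action was invoked from the Oozie configuration, Oozie passed bad arguments to Pig, so Pig printed its usage message and exited.
So how do you fix it?
Check which Oozie version you are running and rewrite the pig action according to the schema that version defines, ideally by following the example in the spec. Do not assume that element position is irrelevant just because the config is XML: the workflow schema defines the children of a pig action as an ordered sequence, and Oozie is completely unforgiving about it. Follow the example in the Oozie workflow spec to the letter, with every element in the same order. In my case, Oozie did not recognize configuration when it was placed after script, and reported an error. The pig action below is a working example for Oozie 2.3.2-cdh3u4: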

<workflow-app name="myAnalytic" xmlns="uri:oozie:workflow:0.2">

	<start to="cleanupFailure" />

	......

	<action name="analytic_pig">
		<pig>
			<job-tracker>${jobTracker}</job-tracker>
			<name-node>${nameNode}</name-node>
			<configuration>
				<property>
					<name>oozie.launcher.mapred.child.java.opts</name>
					<value>-Xmx2048m</value>
				</property>
				<property>
					<name>pig.spill.extragc.size.threshold</name>
					<value>100000000</value>
				</property>
				<property>
					<name>mapred.child.java.opts</name>
					<value>-Xmx2048m</value>
				</property>
				<property>
					<name>mapred.user.jobconf.limit</name>
					<value>100000000</value>
				</property>
			</configuration>
			
			<script>${script}</script>
			<param>logType=${logType}</param>
			<param>avro_schema=${avroSchema}</param>
			<param>startDate=${startDate}</param>
			<param>logsDirectory=${logsDirectory}</param>
			<param>outputDirectory=${outputDirectory}</param>
			
			<!-- common libraries, need to be abstracted out in future -->
			<file>${nameNode}${appBaseFolder}/${project.artifactId}/lib/json-simple-1.1.jar#json-simple-1.1.jar
			</file>
			<file>${nameNode}${appBaseFolder}/${project.artifactId}/lib/avro-1.4.1.jar#avro-1.4.1.jar
			</file>
			
			<file>${nameNode}${appBaseFolder}/${project.artifactId}/lib/pig-udfs.jar#pig-udfs.jar
			</file>
			<file>${nameNode}${appBaseFolder}/${project.artifactId}/lib/piggybank.jar#piggybank.jar
			</file>
			
		</pig>
		<ok to="unlock" />
		<error to="unlockOnError" />
	</action>
......
</workflow-app>
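
For contrast, here is a minimal sketch of the ordering described above as failing, with configuration placed after script. It is cut down from the working example and is not a verbatim copy of the workflow that originally failed; the property shown is just one of those used above, and the exact symptom may differ between Oozie versions.

```xml
<!-- Counter-example (illustrative only): do not copy this ordering -->
<action name="analytic_pig">
	<pig>
		<job-tracker>${jobTracker}</job-tracker>
		<name-node>${nameNode}</name-node>
		<script>${script}</script>
		<param>logType=${logType}</param>
		<!-- Problem: configuration appears after script and param;
		     in the working example it comes right after name-node -->
		<configuration>
			<property>
				<name>mapred.child.java.opts</name>
				<value>-Xmx2048m</value>
			</property>
		</configuration>
	</pig>
	<ok to="unlock" />
	<error to="unlockOnError" />
</action>
```

Moving the configuration block back to its position right after name-node, as in the working example, restores the expected order.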





