Scheduling Sqoop imports from Oracle into HBase with Oozie

   I have recently been using Sqoop to import data from Oracle into HBase. The table's data is imported once an hour, and Oozie triggers the job on a schedule.

    Hadoop version: hadoop-2.0.0-cdh4.3.0
    Oozie version: oozie-3.3.2-cdh4.3.0
    Sqoop version: sqoop-1.4.3-cdh4.3.0

   The configuration is as follows:
   coordinator.xml
  
   <coordinator-app name="cfg_check_formula-coord" frequency="${coord:hours(1)}" start="${start}" end="${end}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.2">
    <controls>
        <concurrency>1</concurrency>
    </controls>

    <action>
        <workflow>
            <app-path>${nameNode}/user/${coord:user()}/${testRoot}/apps/sqoop/cfg_check_formula</app-path>
        </workflow>
    </action>
</coordinator-app>
   


    workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.2" name="sqoop-cfg_check_formula-wf">
    <start to="sqoop-node"/>

    <action name="sqoop-node">
        <sqoop xmlns="uri:oozie:sqoop-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
  <!--          <prepare>
                <delete path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data/sqoop"/>
                <mkdir path="${nameNode}/user/${wf:user()}/${examplesRoot}/output-data"/>
            </prepare>
  -->
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:oracle:thin:@127.0.0.1:1523:TEST</arg>
            <arg>--username</arg>
            <arg>ora</arg>
            <arg>--password</arg>
            <arg>111</arg>
            <arg>-m</arg>
            <arg>1</arg>
            <arg>--query</arg>
            <arg>SELECT ROWID, a.* FROM cfg_check_formula a WHERE $CONDITIONS</arg>
            <arg>--map-column-java</arg>
            <arg>ROWID=String</arg>
            <arg>--hbase-table</arg>
            <arg>cfg_check_formula</arg>
            <arg>--hbase-row-key</arg>
            <arg>ROWID</arg>
            <arg>--column-family</arg>
            <arg>f_cfg_check_formula</arg>
        </sqoop>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Sqoop import cfg_check_formula failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
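
Before handing the import to Oozie, it can help to run the same arguments once by hand. The sketch below mirrors the `<arg>` list in workflow.xml as a plain `sqoop import` command. The `run` helper and the `DRY_RUN` switch are illustrative additions, not part of the original setup: by default the command is only printed, and it is executed when `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the command, run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

# One-off version of the import defined in workflow.xml.
# Connection values are the placeholders used in the workflow.
import_once() {
  run sqoop import \
    --connect 'jdbc:oracle:thin:@127.0.0.1:1523:TEST' \
    --username ora \
    --password 111 \
    -m 1 \
    --query 'SELECT ROWID, a.* FROM cfg_check_formula a WHERE $CONDITIONS' \
    --map-column-java ROWID=String \
    --hbase-table cfg_check_formula \
    --hbase-row-key ROWID \
    --column-family f_cfg_check_formula
}

import_once
```

The single quotes around the query stop the shell from expanding `$CONDITIONS`, which Sqoop needs to receive literally. Inside workflow.xml no shell is involved, so the `<arg>` value needs no such quoting.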


job.properties

nameNode=hdfs://master:8020
jobTracker=master:8032
queueName=default
testRoot=test
oozie.use.system.libpath=true
oozie.coord.application.path=${nameNode}/user/${user.name}/${testRoot}/apps/sqoop/cfg_check_formula
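# With the default oozie.processing.timezone (UTC), use the UTC form:
#start=2013-08-29T10:00Z
#end=2013-08-29T12:00Z
# The +0800 form below assumes oozie.processing.timezone=GMT+0800 in oozie-site.xml
start=2013-09-04T11:00+0800
end=2013-09-04T12:00+0800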


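Once the configuration above is in place, create a lib directory under the workflow directory and copy the jars from Sqoop's lib directory into it. Oozie can then launch the import job on schedule.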
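
The deployment steps can be sketched as a small script. Several values here are assumptions rather than details from the setup described above: `SQOOP_HOME`, the Oozie URL (`http://master:11000/oozie`, using Oozie's default port), the idea that coordinator.xml, workflow.xml and job.properties sit in the current directory, and that the shell user is the one who submits the job. The script copies the jars straight to the application's lib directory on HDFS. The `run` helper and `DRY_RUN` switch are hypothetical conveniences: commands are only printed unless `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
set -eu

SQOOP_HOME="${SQOOP_HOME:-/usr/lib/sqoop}"            # assumption: Sqoop install path
OOZIE_URL="${OOZIE_URL:-http://master:11000/oozie}"   # assumption: Oozie server URL
# Mirrors oozie.coord.application.path in job.properties (testRoot=test).
APP_DIR="/user/${USER:-$(id -un)}/test/apps/sqoop/cfg_check_formula"

# Hypothetical helper: print the command, run it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

deploy() {
  # Create the application directory and its lib subdirectory on HDFS.
  run hadoop fs -mkdir -p "${APP_DIR}/lib"
  # Upload the coordinator and workflow definitions.
  run hadoop fs -put coordinator.xml workflow.xml "${APP_DIR}/"
  # Copy the jars from Sqoop's lib directory into the workflow's lib directory.
  run hadoop fs -put "${SQOOP_HOME}"/lib/*.jar "${APP_DIR}/lib/"
  # Submit and start the coordinator job.
  run oozie job -oozie "${OOZIE_URL}" -config job.properties -run
}

deploy
```

Run it once as is to review the printed commands, then again with `DRY_RUN=0` to execute them. `hadoop fs -put` refuses to overwrite existing files, so a redeploy would need the old files removed first.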











   
