Integrating DataX into a Java Application

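  • For an introduction to DataX, see the official site
  • Code integration
    • Building and packaging DataX
    • Adding the dependencies
    • Writing the test code
      • 1. Create a test job file, testjob.json
      • 2. Write a main method
      • 3. Run the test
    • References
    • Demo code download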

For an introduction to DataX, see the official site

DataX official site

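Code integration

Building and packaging DataX

After downloading the DataX source code, build and package it locally. The core modules (core, common, and transformer) must be packaged as jars. All other modules are plugins and only need to be packaged when you use them. This example uses streamreader and streamwriter, so those two modules were packaged as well.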

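Adding the dependencies

Create a test project and add the corresponding jars as dependencies:

	<dependency>
		<groupId>com.alibaba.datax</groupId>
		<artifactId>datax-core</artifactId>
		<version>0.0.1-SNAPSHOT</version>
	</dependency>
	<dependency>
		<groupId>com.alibaba.datax</groupId>
		<artifactId>streamreader</artifactId>
		<version>0.0.1-SNAPSHOT</version>
	</dependency>
	<dependency>
		<groupId>com.alibaba.datax</groupId>
		<artifactId>streamwriter</artifactId>
		<version>0.0.1-SNAPSHOT</version>
	</dependency>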

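Besides the jars, the project also needs copies of the corresponding configuration files, as shown in the figure below. All of these files can be found in the official DataX source tree and copied over as they are.
[Figure 1: configuration files copied into the project]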
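
Since a missing configuration file only surfaces when the engine starts, it can help to check the copied files up front. The sketch below is one way to do that. DataxHomeCheck and findMissing are hypothetical names, and the file list assumes the stock DataX layout (conf/core.json plus one plugin.json per plugin) together with the job path used later in this post. The demo project may arrange its plugin files differently, so adjust the list to match.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Checks that the files DataX is expected to load exist under a home directory. */
class DataxHomeCheck {

    // Assumed layout relative to datax.home; not taken from the demo project.
    static final List<String> ASSUMED_FILES = Arrays.asList(
            "conf/core.json",
            "plugin/reader/streamreader/plugin.json",
            "plugin/writer/streamwriter/plugin.json",
            "job/testjob.json");

    /** Returns the relative paths that do not exist as regular files under home. */
    static List<String> findMissing(Path home, List<String> relativePaths) {
        List<String> missing = new ArrayList<>();
        for (String rel : relativePaths) {
            if (!Files.isRegularFile(home.resolve(rel))) {
                missing.add(rel);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Pass the directory you intend to use as datax.home, e.g. target/classes.
        Path home = Paths.get(args.length > 0 ? args[0] : ".");
        List<String> missing = findMissing(home, ASSUMED_FILES);
        if (missing.isEmpty()) {
            System.out.println("All expected files found under " + home.toAbsolutePath());
        } else {
            System.out.println("Missing under " + home.toAbsolutePath() + ": " + missing);
        }
    }
}
```

Running it against the build output directory before starting the job gives a quicker answer than reading an engine stack trace.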

Writing the test code

1. Create a test job file, testjob.json

with the following content:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 1,
                        "column": [
                            {
                                "type": "long",
                                "value": "10"
                            },
                            {
                                "type": "string",
                                "value": "hello,你好,世界-DataX"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
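
When values such as sliceRecordCount or channel change between runs, the job file can be generated instead of edited by hand. The following is a minimal sketch of that idea using only the JDK. TestJobWriter, buildJob, and escape are hypothetical names, and the escaping covers only backslashes and double quotes, so a real JSON library is the better choice for arbitrary text.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

/** Generates a streamreader-to-streamwriter job like the testjob.json above. */
class TestJobWriter {

    /** Escapes backslashes and double quotes for use inside a JSON string. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the job JSON; only the three arguments vary, the rest is fixed. */
    static String buildJob(int sliceRecordCount, String message, int channel) {
        String template = String.join("\n",
                "{",
                "  \"job\": {",
                "    \"content\": [{",
                "      \"reader\": {",
                "        \"name\": \"streamreader\",",
                "        \"parameter\": {",
                "          \"sliceRecordCount\": %d,",
                "          \"column\": [",
                "            {\"type\": \"long\", \"value\": \"10\"},",
                "            {\"type\": \"string\", \"value\": \"%s\"}",
                "          ]",
                "        }",
                "      },",
                "      \"writer\": {",
                "        \"name\": \"streamwriter\",",
                "        \"parameter\": {\"encoding\": \"UTF-8\", \"print\": true}",
                "      }",
                "    }],",
                "    \"setting\": {\"speed\": {\"channel\": %d}}",
                "  }",
                "}");
        return String.format(Locale.ROOT, template, sliceRecordCount, escape(message), channel);
    }

    public static void main(String[] args) throws IOException {
        // The output path is an assumption; point it at your job directory.
        Path out = Paths.get(args.length > 0 ? args[0] : "testjob.json");
        String json = buildJob(1, "hello,你好,世界-DataX", 1);
        Files.write(out, json.getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + out.toAbsolutePath());
    }
}
```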

2. Write a main method

import java.time.LocalTime;

import com.alibaba.datax.core.Engine;

public class DataxTest {

    public static void main(String[] args) {
        String classpath = getCurrentClasspath();
        System.out.println(classpath);
        // DataX locates its configuration and plugins relative to datax.home
        System.setProperty("datax.home", classpath);
        // Value substituted for placeholders in the job
        System.setProperty("now", LocalTime.now().toString());
        String[] dataxArgs = {"-job", classpath + "/job/testjob.json", "-mode", "standalone", "-jobid", "-1"};
        try {
            Engine.entry(dataxArgs);
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }

    public static String getCurrentClasspath() {
        ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
        String currentClasspath = classLoader.getResource("").getPath();
        // Detect the current operating system
        String osName = System.getProperty("os.name");
        if (osName.startsWith("Windows")) {
            // Drop the leading "/" from the path (e.g. "/C:/..." becomes "C:/...")
            currentClasspath = currentClasspath.substring(1);
        }
        return currentClasspath;
    }

}
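
One caveat about getCurrentClasspath: URL.getPath() returns the path in URL-encoded form, so a project directory containing a space comes back with %20 in it, and the Windows case needs the manual substring. A possible alternative, sketched below under the assumption that the classpath root is a plain directory on disk, is to convert the URL to a java.nio.file.Path. ClasspathRoot and its methods are hypothetical names, not part of DataX.

```java
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Resolves the classpath root as a filesystem path without manual string fixes. */
class ClasspathRoot {

    /** Converts a file: URL to a Path, decoding escapes such as %20. */
    static Path toPath(URL url) {
        try {
            return Paths.get(url.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URI: " + url, e);
        }
    }

    /** Returns the classpath root, or fails if it is not a directory on disk. */
    static Path currentClasspathRoot() {
        URL root = Thread.currentThread().getContextClassLoader().getResource("");
        if (root == null || !"file".equals(root.getProtocol())) {
            // Happens, for example, when the application runs from a jar.
            throw new IllegalStateException("Classpath root is not a plain directory: " + root);
        }
        return toPath(root);
    }

    public static void main(String[] args) {
        Path home = currentClasspathRoot();
        System.out.println("datax.home candidate: " + home);
        System.out.println("job file: " + home.resolve("job").resolve("testjob.json"));
    }
}
```

With this variant, home.toString() would be passed to System.setProperty("datax.home", ...) and the resolved job path to the -job argument, and the operating system check is no longer needed.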

3. Run the test

The console prints the following log:
21:11:01.582 [main] INFO com.alibaba.datax.common.statistics.VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
21:11:01.593 [main] INFO com.alibaba.datax.core.Engine - the machine info =>

osInfo:	Oracle Corporation 1.8 25.121-b13
jvmInfo:	Mac OS X x86_64 10.14.6
cpu num:	8

totalPhysicalMemory:	-0.00G
freePhysicalMemory:	-0.00G
maxFileDescriptorCount:	-1
currentOpenFileDescriptorCount:	-1

GC Names	[PS MarkSweep, PS Scavenge]

MEMORY_NAME                    | allocation_size                | init_size                      
PS Eden Space                  | 1,344.00MB                     | 64.00MB                        
Code Cache                     | 240.00MB                       | 2.44MB                         
Compressed Class Space         | 1,024.00MB                     | 0.00MB                         
PS Survivor Space              | 10.50MB                        | 10.50MB                        
PS Old Gen                     | 2,731.00MB                     | 171.00MB                       
Metaspace                      | -0.00MB                        | 0.00MB                         

21:11:01.619 [main] INFO com.alibaba.datax.core.Engine -
{
    "content":[
        {
            "reader":{
                "name":"streamreader",
                "parameter":{
                    "column":[
                        {
                            "type":"long",
                            "value":"10"
                        },
                        {
                            "type":"string",
                            "value":"hello,你好,世界-DataX"
                        }
                    ],
                    "sliceRecordCount":1
                }
            },
            "writer":{
                "name":"streamwriter",
                "parameter":{
                    "encoding":"UTF-8",
                    "print":true
                }
            }
        }
    ],
    "setting":{
        "speed":{
            "channel":1
        }
    }
}

21:11:01.621 [main] DEBUG com.alibaba.datax.core.Engine - {"common":{"column":{"dateFormat":"yyyy-MM-dd","datetimeFormat":"yyyy-MM-dd HH:mm:ss","encoding":"utf-8","extraFormats":["yyyyMMdd"],"timeFormat":"HH:mm:ss","timeZone":"GMT+8"}},"core":{"container":{"job":{"id":-1,"reportInterval":10000},"taskGroup":{"channel":5},"trace":{"enable":"false"}},"dataXServer":{"address":"http://localhost:7001/api","reportDataxLog":false,"reportPerfLog":false,"timeout":10000},"statistics":{"collector":{"plugin":{"maxDirtyNumber":10,"taskClass":"com.alibaba.datax.core.statistics.plugin.task.StdoutPluginCollector"}}},"transport":{"channel":{"byteCapacity":67108864,"capacity":512,"class":"com.alibaba.datax.core.transport.channel.memory.MemoryChannel","flowControlInterval":20,"speed":{"byte":-1,"record":-1}},"exchanger":{"bufferSize":32,"class":"com.alibaba.datax.core.plugin.BufferedRecordExchanger"}}},"entry":{"jvm":"-Xms1G -Xmx1G"},"job":{"content":[{"reader":{"name":"streamreader","parameter":{"column":[{"type":"long","value":"10"},{"type":"string","value":"hello,你好,世界-DataX"}],"sliceRecordCount":1}},"writer":{"name":"streamwriter","parameter":{"encoding":"UTF-8","print":true}}}],"setting":{"speed":{"channel":1}}},"plugin":{"reader":{"streamreader":{"class":"com.alibaba.datax.plugin.reader.streamreader.StreamReader","description":{"mechanism":"use datax framework to transport data from stream.","useScene":"only for developer test.","warn":"Never use it in your real job."},"developer":"alibaba","name":"streamreader","path":"/Users/zhikelin/lzk/source_zwdinding/dataxdemo/target/classes//plugin/reader/plugin.json"}},"writer":{"streamwriter":{"class":"com.alibaba.datax.plugin.writer.streamwriter.StreamWriter","description":{"mechanism":"use datax framework to transport data to stream.","useScene":"only for developer test.","warn":"Never use it in your real job."},"developer":"alibaba","name":"streamwriter","path":"/Users/zhikelin/lzk/source_zwdinding/dataxdemo/target/classes//plugin/writer/plugin.json"}}}}
21:11:01.656 [main] WARN com.alibaba.datax.core.Engine - prioriy set to 0, because NumberFormatException, the value is: null
21:11:01.659 [main] INFO com.alibaba.datax.common.statistics.PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
21:11:01.659 [main] INFO com.alibaba.datax.core.job.JobContainer - DataX jobContainer starts job.
21:11:01.661 [main] DEBUG com.alibaba.datax.core.job.JobContainer - jobContainer starts to do preHandle ...
21:11:01.661 [main] DEBUG com.alibaba.datax.core.job.JobContainer - jobContainer starts to do init ...
21:11:01.661 [main] INFO com.alibaba.datax.core.job.JobContainer - Set jobId = 0
21:11:01.673 [job-0] INFO com.alibaba.datax.core.job.JobContainer - jobContainer starts to do prepare ...
21:11:01.674 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX Reader.Job [streamreader] do prepare work .
21:11:01.674 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX Writer.Job [streamwriter] do prepare work .
21:11:01.674 [job-0] INFO com.alibaba.datax.core.job.JobContainer - jobContainer starts to do split ...
21:11:01.675 [job-0] INFO com.alibaba.datax.core.job.JobContainer - Job set Channel-Number to 1 channels.
21:11:01.675 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
21:11:01.677 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
21:11:01.677 [job-0] DEBUG com.alibaba.datax.core.job.JobContainer - transformer configuration: null
21:11:01.705 [job-0] DEBUG com.alibaba.datax.core.job.JobContainer - contentConfig configuration: [{"internal":{"reader":{"name":"streamreader","parameter":{"column":["{\"type\":\"long\",\"value\":\"10\"}","{\"type\":\"string\",\"value\":\"hello,你好,世界-DataX\"}"],"sliceRecordCount":1}},"taskId":0,"writer":{"name":"streamwriter","parameter":{"encoding":"UTF-8","print":true}}},"keys":["writer.parameter.print","writer.name","reader.name","writer.parameter.encoding","reader.parameter.column[1]","reader.parameter.column[0]","taskId","reader.parameter.sliceRecordCount"],"secretKeyPathSet":[]}]
21:11:01.705 [job-0] INFO com.alibaba.datax.core.job.JobContainer - jobContainer starts to do schedule ...
21:11:01.710 [job-0] INFO com.alibaba.datax.core.job.JobContainer - Scheduler starts [1] taskGroups.
21:11:01.713 [job-0] INFO com.alibaba.datax.core.job.JobContainer - Running by standalone Mode.
21:11:01.723 [taskGroup-0] DEBUG com.alibaba.datax.core.taskgroup.TaskGroupContainer - taskGroup[0]'s task configs[[{"internal":{"reader":{"name":"streamreader","parameter":{"column":["{\"type\":\"long\",\"value\":\"10\"}","{\"type\":\"string\",\"value\":\"hello,你好,世界-DataX\"}"],"sliceRecordCount":1}},"taskId":0,"writer":{"name":"streamwriter","parameter":{"encoding":"UTF-8","print":true}}},"keys":["writer.parameter.print","writer.name","reader.name","writer.parameter.encoding","reader.parameter.column[1]","reader.parameter.column[0]","taskId","reader.parameter.sliceRecordCount"],"secretKeyPathSet":[]}]]
21:11:01.724 [taskGroup-0] INFO com.alibaba.datax.core.taskgroup.TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
21:11:01.729 [taskGroup-0] INFO com.alibaba.datax.core.transport.channel.Channel - Channel set byte_speed_limit to -1, No bps activated.
21:11:01.729 [taskGroup-0] INFO com.alibaba.datax.core.transport.channel.Channel - Channel set record_speed_limit to -1, No tps activated.
21:11:01.737 [job-0] DEBUG com.alibaba.datax.core.job.scheduler.AbstractScheduler - com.alibaba.datax.core.statistics.communication.Communication@a1153bc[
counter={}
message={}
state=RUNNING
throwable=
timestamp=1594905061723
]
21:11:01.747 [taskGroup-0] INFO com.alibaba.datax.core.taskgroup.TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
21:11:01.747 [0-0-0-writer] DEBUG com.alibaba.datax.core.taskgroup.runner.WriterRunner - task writer starts to do init ...
21:11:01.747 [0-0-0-reader] DEBUG com.alibaba.datax.core.taskgroup.runner.ReaderRunner - task reader starts to do init ...
21:11:01.748 [0-0-0-reader] DEBUG com.alibaba.datax.core.taskgroup.runner.ReaderRunner - task reader starts to do prepare ...
21:11:01.748 [0-0-0-reader] DEBUG com.alibaba.datax.core.taskgroup.runner.ReaderRunner - task reader starts to read ...
21:11:01.748 [0-0-0-writer] DEBUG com.alibaba.datax.core.taskgroup.runner.WriterRunner - task writer starts to do prepare ...
21:11:01.748 [0-0-0-writer] DEBUG com.alibaba.datax.core.taskgroup.runner.WriterRunner - task writer starts to write ...
21:11:01.753 [0-0-0-reader] DEBUG com.alibaba.datax.core.taskgroup.runner.ReaderRunner - task reader starts to do post ...
21:11:01.753 [0-0-0-reader] DEBUG com.alibaba.datax.core.taskgroup.runner.ReaderRunner - task reader starts to do destroy ...
10 hello,你好,世界-DataX
21:11:01.754 [0-0-0-writer] DEBUG com.alibaba.datax.core.taskgroup.runner.WriterRunner - task writer starts to do post ...
21:11:01.754 [0-0-0-writer] DEBUG com.alibaba.datax.core.taskgroup.runner.WriterRunner - task writer starts to do destroy ...
21:11:01.851 [taskGroup-0] INFO com.alibaba.datax.core.taskgroup.TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[105]ms
21:11:01.852 [taskGroup-0] INFO com.alibaba.datax.core.taskgroup.TaskGroupContainer - taskGroup[0] completed it's tasks.
21:11:11.743 [job-0] DEBUG com.alibaba.datax.core.job.scheduler.AbstractScheduler - com.alibaba.datax.core.statistics.communication.Communication@34123d65[
counter={writeSucceedRecords=2, readSucceedRecords=1, totalErrorBytes=0, writeSucceedBytes=19, byteSpeed=0, totalErrorRecords=0, recordSpeed=0, waitReaderTime=0, writeReceivedBytes=19, stage=1, waitWriterTime=53545, percentage=1.0, totalReadRecords=1, writeReceivedRecords=2, readSucceedBytes=19, totalReadBytes=19}
message={}
state=SUCCEEDED
throwable=
timestamp=1594905071742
]
21:11:11.745 [job-0] INFO com.alibaba.datax.core.statistics.container.communicator.job.StandAloneJobContainerCommunicator - Total 1 records, 19 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
21:11:11.745 [job-0] INFO com.alibaba.datax.core.job.scheduler.AbstractScheduler - Scheduler accomplished all tasks.
21:11:11.745 [job-0] DEBUG com.alibaba.datax.core.job.JobContainer - jobContainer starts to do post ...
21:11:11.746 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX Writer.Job [streamwriter] do post work.
21:11:11.746 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX Reader.Job [streamreader] do post work.
21:11:11.746 [job-0] DEBUG com.alibaba.datax.core.job.JobContainer - jobContainer starts to do postHandle ...
21:11:11.746 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX jobId [0] completed successfully.
21:11:11.747 [job-0] INFO com.alibaba.datax.core.container.util.HookInvoker - No hook invoked, because base dir not exists or is a file: /Users/zhikelin/lzk/source_zwdinding/dataxdemo/target/classes/hook
21:11:11.751 [job-0] INFO com.alibaba.datax.core.job.JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%

 [total gc info] => 
	 NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime     
	 PS MarkSweep         | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             
	 PS Scavenge          | 0                  | 0                  | 0                  | 0.000s             | 0.000s             | 0.000s             

21:11:11.752 [job-0] INFO com.alibaba.datax.core.job.JobContainer - PerfTrace not enable!
21:11:11.752 [job-0] INFO com.alibaba.datax.core.statistics.container.communicator.job.StandAloneJobContainerCommunicator - Total 1 records, 19 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
21:11:11.753 [job-0] INFO com.alibaba.datax.core.job.JobContainer -
任务启动时刻 : 2020-07-16 21:11:01
任务结束时刻 : 2020-07-16 21:11:11
任务总计耗时 : 10s
任务平均流量 : 1B/s
记录写入速度 : 0rec/s
读出记录总数 : 1
读写失败总数 : 0

Disconnected from the target VM, address: '127.0.0.1:54202', transport: 'socket'

Process finished with exit code 0
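
If the totals are needed in code, for example to fail a build when records are lost, the summary line printed by StandAloneJobContainerCommunicator can be parsed. This is only a sketch: the pattern is derived solely from the line in the log above, larger numbers may be formatted differently, and LogSummary is a hypothetical class rather than anything DataX provides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Holds the totals from a "Total N records, M bytes | ..." summary line. */
class LogSummary {

    // Derived only from the summary line shown in the log above.
    private static final Pattern SUMMARY = Pattern.compile(
            "Total (\\d+) records, (\\d+) bytes \\| Speed \\S+, \\d+ records/s \\| Error (\\d+) records");

    final long totalRecords;
    final long totalBytes;
    final long errorRecords;

    LogSummary(long totalRecords, long totalBytes, long errorRecords) {
        this.totalRecords = totalRecords;
        this.totalBytes = totalBytes;
        this.errorRecords = errorRecords;
    }

    /** Returns the parsed totals, or null when the line is not a summary line. */
    static LogSummary parse(String line) {
        Matcher m = SUMMARY.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new LogSummary(
                Long.parseLong(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)));
    }

    public static void main(String[] args) {
        String line = "Total 1 records, 19 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes"
                + " | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%";
        LogSummary s = parse(line);
        // Prints: records=1 bytes=19 errors=0
        System.out.println("records=" + s.totalRecords + " bytes=" + s.totalBytes + " errors=" + s.errorRecords);
    }
}
```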

References

https://developer.aliyun.com/article/642896

Demo code download

Download dataxdemo
