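DataX official website
After downloading the DataX source code, build and package it locally. The core modules (core, common, and transformer) must be built into jars. All other modules are plugin packages and can be built as needed. This example uses streamreader and streamwriter, so those two modules were packaged as well.
Create a new test project and add the corresponding jar dependencies: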
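<dependency>
    <groupId>com.alibaba.datax</groupId>
    <artifactId>datax-core</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>com.alibaba.datax</groupId>
    <artifactId>streamreader</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>
<dependency>
    <groupId>com.alibaba.datax</groupId>
    <artifactId>streamwriter</artifactId>
    <version>0.0.1-SNAPSHOT</version>
</dependency>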
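Besides the jars, the project also needs the corresponding configuration files, as shown in the figure below. All of these files can be found in the official DataX source project and can simply be copied over.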
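Create the job file job/testjob.json (the path passed to -job in the test class) with the following content: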
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "sliceRecordCount": 1,
                        "column": [
                            {
                                "type": "long",
                                "value": "10"
                            },
                            {
                                "type": "string",
                                "value": "hello,你好,世界-DataX"
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "encoding": "UTF-8",
                        "print": true
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 1
            }
        }
    }
}
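The job above uses only literal values, but the test class in this post also sets a `now` system property, with a comment saying it replaces a placeholder in the job. As a rough illustration of that idea (not DataX's actual implementation), the sketch below substitutes `${name}` tokens in a string with the matching system property. The class name `PlaceholderSketch`, the exact `${name}` syntax, and the sample string are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PlaceholderSketch {
    // Assumed placeholder syntax for this sketch: ${name}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    /** Replaces each ${name} with System.getProperty(name); unknown names are left untouched. */
    static String resolve(String text) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String value = System.getProperty(m.group(1));
            String replacement = (value != null) ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.setProperty("now", "21:11:01");
        // Hypothetical job fragment containing a placeholder
        System.out.println(resolve("{\"value\": \"started at ${now}\"}"));
        // prints {"value": "started at 21:11:01"}
    }
}
```

Leaving unknown placeholders untouched makes a missing property easy to spot in the resolved text instead of silently producing an empty value.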
import com.alibaba.datax.core.Engine;

import java.time.LocalTime;

public class DataxTest {

    public static void main(String[] args) {
        System.out.println(getCurrentClasspath());
        // DataX resolves its configuration and plugin files relative to datax.home
        System.setProperty("datax.home", getCurrentClasspath());
        // System property used to replace a placeholder in the job file
        System.setProperty("now", LocalTime.now().toString());
        String[] dataxArgs = {"-job", getCurrentClasspath() + "/job/testjob.json", "-mode", "standalone", "-jobid", "-1"};
        try {
            Engine.entry(dataxArgs);
        } catch (Throwable e) {
            e.printStackTrace();
        }
    }

    /** Returns the root directory of the current classpath (e.g. target/classes/). */
    public static String getCurrentClasspath() {
        ClassLoader classLoader = Thread.currentThread().getContextClassLoader();
        String currentClasspath = classLoader.getResource("").getPath();
        // Detect the current operating system
        String osName = System.getProperty("os.name");
        if (osName.startsWith("Windows")) {
            // On Windows, strip the leading "/" from the path (e.g. "/C:/...")
            currentClasspath = currentClasspath.substring(1);
        }
        return currentClasspath;
    }
}
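The `getCurrentClasspath()` helper above relies on `URL.getPath()`, which keeps URL escapes such as `%20` and needs the Windows-specific `substring(1)` workaround. A possible alternative, sketched here under the assumption that the classpath root is a plain directory on disk (not a jar), is to convert the URL to a `java.nio.file.Path`. The class name `ClasspathRoot` and the sample URI are made up for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;

final class ClasspathRoot {

    /** Converts a file: URI to a platform path, decoding escapes such as %20. */
    static String toFilesystemPath(URI uri) {
        return Paths.get(uri).toString();
    }

    /** Resolves the classpath root; assumes it is a directory, not a jar. */
    static String currentClasspath() {
        URL root = Thread.currentThread().getContextClassLoader().getResource("");
        if (root == null) {
            throw new IllegalStateException("No directory-based classpath root found");
        }
        try {
            return toFilesystemPath(root.toURI());
        } catch (URISyntaxException e) {
            throw new IllegalStateException("Unexpected classpath URL: " + root, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical location containing a space, to show the decoding
        System.out.println(toFilesystemPath(URI.create("file:///tmp/my%20project/classes/")));
    }
}
```

The returned path has no trailing separator, so appending "/job/testjob.json" to it would avoid the doubled slash that shows up in the plugin paths of the log.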
The console prints the following log:
21:11:01.582 [main] INFO com.alibaba.datax.common.statistics.VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
21:11:01.593 [main] INFO com.alibaba.datax.core.Engine - the machine info =>
osInfo: Oracle Corporation 1.8 25.121-b13
jvmInfo: Mac OS X x86_64 10.14.6
cpu num: 8
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME | allocation_size | init_size
PS Eden Space | 1,344.00MB | 64.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 10.50MB | 10.50MB
PS Old Gen | 2,731.00MB | 171.00MB
Metaspace | -0.00MB | 0.00MB
21:11:01.619 [main] INFO com.alibaba.datax.core.Engine -
{
    "content":[
        {
            "reader":{
                "name":"streamreader",
                "parameter":{
                    "column":[
                        {
                            "type":"long",
                            "value":"10"
                        },
                        {
                            "type":"string",
                            "value":"hello,你好,世界-DataX"
                        }
                    ],
                    "sliceRecordCount":1
                }
            },
            "writer":{
                "name":"streamwriter",
                "parameter":{
                    "encoding":"UTF-8",
                    "print":true
                }
            }
        }
    ],
    "setting":{
        "speed":{
            "channel":1
        }
    }
}
21:11:01.621 [main] DEBUG com.alibaba.datax.core.Engine - {"common":{"column":{"dateFormat":"yyyy-MM-dd","datetimeFormat":"yyyy-MM-dd HH:mm:ss","encoding":"utf-8","extraFormats":["yyyyMMdd"],"timeFormat":"HH:mm:ss","timeZone":"GMT+8"}},"core":{"container":{"job":{"id":-1,"reportInterval":10000},"taskGroup":{"channel":5},"trace":{"enable":"false"}},"dataXServer":{"address":"http://localhost:7001/api","reportDataxLog":false,"reportPerfLog":false,"timeout":10000},"statistics":{"collector":{"plugin":{"maxDirtyNumber":10,"taskClass":"com.alibaba.datax.core.statistics.plugin.task.StdoutPluginCollector"}}},"transport":{"channel":{"byteCapacity":67108864,"capacity":512,"class":"com.alibaba.datax.core.transport.channel.memory.MemoryChannel","flowControlInterval":20,"speed":{"byte":-1,"record":-1}},"exchanger":{"bufferSize":32,"class":"com.alibaba.datax.core.plugin.BufferedRecordExchanger"}}},"entry":{"jvm":"-Xms1G -Xmx1G"},"job":{"content":[{"reader":{"name":"streamreader","parameter":{"column":[{"type":"long","value":"10"},{"type":"string","value":"hello,你好,世界-DataX"}],"sliceRecordCount":1}},"writer":{"name":"streamwriter","parameter":{"encoding":"UTF-8","print":true}}}],"setting":{"speed":{"channel":1}}},"plugin":{"reader":{"streamreader":{"class":"com.alibaba.datax.plugin.reader.streamreader.StreamReader","description":{"mechanism":"use datax framework to transport data from stream.","useScene":"only for developer test.","warn":"Never use it in your real job."},"developer":"alibaba","name":"streamreader","path":"/Users/zhikelin/lzk/source_zwdinding/dataxdemo/target/classes//plugin/reader/plugin.json"}},"writer":{"streamwriter":{"class":"com.alibaba.datax.plugin.writer.streamwriter.StreamWriter","description":{"mechanism":"use datax framework to transport data to stream.","useScene":"only for developer test.","warn":"Never use it in your real job."},"developer":"alibaba","name":"streamwriter","path":"/Users/zhikelin/lzk/source_zwdinding/dataxdemo/target/classes//plugin/writer/plugin.json"}}}}
21:11:01.656 [main] WARN com.alibaba.datax.core.Engine - prioriy set to 0, because NumberFormatException, the value is: null
21:11:01.659 [main] INFO com.alibaba.datax.common.statistics.PerfTrace - PerfTrace traceId=job_-1, isEnable=false, priority=0
21:11:01.659 [main] INFO com.alibaba.datax.core.job.JobContainer - DataX jobContainer starts job.
21:11:01.661 [main] DEBUG com.alibaba.datax.core.job.JobContainer - jobContainer starts to do preHandle …
21:11:01.661 [main] DEBUG com.alibaba.datax.core.job.JobContainer - jobContainer starts to do init …
21:11:01.661 [main] INFO com.alibaba.datax.core.job.JobContainer - Set jobId = 0
21:11:01.673 [job-0] INFO com.alibaba.datax.core.job.JobContainer - jobContainer starts to do prepare …
21:11:01.674 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX Reader.Job [streamreader] do prepare work .
21:11:01.674 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX Writer.Job [streamwriter] do prepare work .
21:11:01.674 [job-0] INFO com.alibaba.datax.core.job.JobContainer - jobContainer starts to do split …
21:11:01.675 [job-0] INFO com.alibaba.datax.core.job.JobContainer - Job set Channel-Number to 1 channels.
21:11:01.675 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
21:11:01.677 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
21:11:01.677 [job-0] DEBUG com.alibaba.datax.core.job.JobContainer - transformer configuration: null
21:11:01.705 [job-0] DEBUG com.alibaba.datax.core.job.JobContainer - contentConfig configuration: [{"internal":{"reader":{"name":"streamreader","parameter":{"column":["{\"type\":\"long\",\"value\":\"10\"}","{\"type\":\"string\",\"value\":\"hello,你好,世界-DataX\"}"],"sliceRecordCount":1}},"taskId":0,"writer":{"name":"streamwriter","parameter":{"encoding":"UTF-8","print":true}}},"keys":["writer.parameter.print","writer.name","reader.name","writer.parameter.encoding","reader.parameter.column[1]","reader.parameter.column[0]","taskId","reader.parameter.sliceRecordCount"],"secretKeyPathSet":[]}]
21:11:01.705 [job-0] INFO com.alibaba.datax.core.job.JobContainer - jobContainer starts to do schedule …
21:11:01.710 [job-0] INFO com.alibaba.datax.core.job.JobContainer - Scheduler starts [1] taskGroups.
21:11:01.713 [job-0] INFO com.alibaba.datax.core.job.JobContainer - Running by standalone Mode.
21:11:01.723 [taskGroup-0] DEBUG com.alibaba.datax.core.taskgroup.TaskGroupContainer - taskGroup[0]'s task configs[[{"internal":{"reader":{"name":"streamreader","parameter":{"column":["{\"type\":\"long\",\"value\":\"10\"}","{\"type\":\"string\",\"value\":\"hello,你好,世界-DataX\"}"],"sliceRecordCount":1}},"taskId":0,"writer":{"name":"streamwriter","parameter":{"encoding":"UTF-8","print":true}}},"keys":["writer.parameter.print","writer.name","reader.name","writer.parameter.encoding","reader.parameter.column[1]","reader.parameter.column[0]","taskId","reader.parameter.sliceRecordCount"],"secretKeyPathSet":[]}]]
21:11:01.724 [taskGroup-0] INFO com.alibaba.datax.core.taskgroup.TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
21:11:01.729 [taskGroup-0] INFO com.alibaba.datax.core.transport.channel.Channel - Channel set byte_speed_limit to -1, No bps activated.
21:11:01.729 [taskGroup-0] INFO com.alibaba.datax.core.transport.channel.Channel - Channel set record_speed_limit to -1, No tps activated.
21:11:01.737 [job-0] DEBUG com.alibaba.datax.core.job.scheduler.AbstractScheduler - com.alibaba.datax.core.statistics.communication.Communication@a1153bc[
counter={}
message={}
state=RUNNING
throwable=
timestamp=1594905061723
]
21:11:01.747 [taskGroup-0] INFO com.alibaba.datax.core.taskgroup.TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
21:11:01.747 [0-0-0-writer] DEBUG com.alibaba.datax.core.taskgroup.runner.WriterRunner - task writer starts to do init …
21:11:01.747 [0-0-0-reader] DEBUG com.alibaba.datax.core.taskgroup.runner.ReaderRunner - task reader starts to do init …
21:11:01.748 [0-0-0-reader] DEBUG com.alibaba.datax.core.taskgroup.runner.ReaderRunner - task reader starts to do prepare …
21:11:01.748 [0-0-0-reader] DEBUG com.alibaba.datax.core.taskgroup.runner.ReaderRunner - task reader starts to read …
21:11:01.748 [0-0-0-writer] DEBUG com.alibaba.datax.core.taskgroup.runner.WriterRunner - task writer starts to do prepare …
21:11:01.748 [0-0-0-writer] DEBUG com.alibaba.datax.core.taskgroup.runner.WriterRunner - task writer starts to write …
21:11:01.753 [0-0-0-reader] DEBUG com.alibaba.datax.core.taskgroup.runner.ReaderRunner - task reader starts to do post …
21:11:01.753 [0-0-0-reader] DEBUG com.alibaba.datax.core.taskgroup.runner.ReaderRunner - task reader starts to do destroy …
10 hello,你好,世界-DataX
21:11:01.754 [0-0-0-writer] DEBUG com.alibaba.datax.core.taskgroup.runner.WriterRunner - task writer starts to do post …
21:11:01.754 [0-0-0-writer] DEBUG com.alibaba.datax.core.taskgroup.runner.WriterRunner - task writer starts to do destroy …
21:11:01.851 [taskGroup-0] INFO com.alibaba.datax.core.taskgroup.TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[105]ms
21:11:01.852 [taskGroup-0] INFO com.alibaba.datax.core.taskgroup.TaskGroupContainer - taskGroup[0] completed it's tasks.
21:11:11.743 [job-0] DEBUG com.alibaba.datax.core.job.scheduler.AbstractScheduler - com.alibaba.datax.core.statistics.communication.Communication@34123d65[
counter={writeSucceedRecords=2, readSucceedRecords=1, totalErrorBytes=0, writeSucceedBytes=19, byteSpeed=0, totalErrorRecords=0, recordSpeed=0, waitReaderTime=0, writeReceivedBytes=19, stage=1, waitWriterTime=53545, percentage=1.0, totalReadRecords=1, writeReceivedRecords=2, readSucceedBytes=19, totalReadBytes=19}
message={}
state=SUCCEEDED
throwable=
timestamp=1594905071742
]
21:11:11.745 [job-0] INFO com.alibaba.datax.core.statistics.container.communicator.job.StandAloneJobContainerCommunicator - Total 1 records, 19 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
21:11:11.745 [job-0] INFO com.alibaba.datax.core.job.scheduler.AbstractScheduler - Scheduler accomplished all tasks.
21:11:11.745 [job-0] DEBUG com.alibaba.datax.core.job.JobContainer - jobContainer starts to do post …
21:11:11.746 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX Writer.Job [streamwriter] do post work.
21:11:11.746 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX Reader.Job [streamreader] do post work.
21:11:11.746 [job-0] DEBUG com.alibaba.datax.core.job.JobContainer - jobContainer starts to do postHandle …
21:11:11.746 [job-0] INFO com.alibaba.datax.core.job.JobContainer - DataX jobId [0] completed successfully.
21:11:11.747 [job-0] INFO com.alibaba.datax.core.container.util.HookInvoker - No hook invoked, because base dir not exists or is a file: /Users/zhikelin/lzk/source_zwdinding/dataxdemo/target/classes/hook
21:11:11.751 [job-0] INFO com.alibaba.datax.core.job.JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
PS Scavenge | 0 | 0 | 0 | 0.000s | 0.000s | 0.000s
21:11:11.752 [job-0] INFO com.alibaba.datax.core.job.JobContainer - PerfTrace not enable!
21:11:11.752 [job-0] INFO com.alibaba.datax.core.statistics.container.communicator.job.StandAloneJobContainerCommunicator - Total 1 records, 19 bytes | Speed 1B/s, 0 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
21:11:11.753 [job-0] INFO com.alibaba.datax.core.job.JobContainer -
任务启动时刻 : 2020-07-16 21:11:01
任务结束时刻 : 2020-07-16 21:11:11
任务总计耗时 : 10s
任务平均流量 : 1B/s
记录写入速度 : 0rec/s
读出记录总数 : 1
读写失败总数 : 0
Disconnected from the target VM, address: '127.0.0.1:54202', transport: 'socket'
Process finished with exit code 0
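When DataX is embedded in a Java program like this, it can be handy to check the outcome programmatically. One rough option, sketched below, is to parse the "Total ... | Speed ... | Error ..." summary line from the captured log output. The regular expression is derived only from the line format seen in the log above and may not hold for other DataX versions. The class name `JobSummary` and its fields are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JobSummary {
    // Pattern assumed from the summary line shown in the log above
    private static final Pattern SUMMARY = Pattern.compile(
            "Total (\\d+) records, (\\d+) bytes \\| Speed .*? \\| Error (\\d+) records, (\\d+) bytes");

    final long totalRecords;
    final long totalBytes;
    final long errorRecords;
    final long errorBytes;

    private JobSummary(long totalRecords, long totalBytes, long errorRecords, long errorBytes) {
        this.totalRecords = totalRecords;
        this.totalBytes = totalBytes;
        this.errorRecords = errorRecords;
        this.errorBytes = errorBytes;
    }

    /** Returns the parsed counters, or null if the line is not a summary line. */
    static JobSummary parse(String logLine) {
        Matcher m = SUMMARY.matcher(logLine);
        if (!m.find()) {
            return null;
        }
        return new JobSummary(
                Long.parseLong(m.group(1)), Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)), Long.parseLong(m.group(4)));
    }

    public static void main(String[] args) {
        String line = "Total 1 records, 19 bytes | Speed 1B/s, 0 records/s | "
                + "Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s";
        JobSummary s = parse(line);
        System.out.println(s.totalRecords + " records, " + s.errorRecords + " errors");
        // prints 1 records, 0 errors
    }
}
```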
https://developer.aliyun.com/article/642896
dataxdemo download