I. Installation
https://github.com/WeiYe-Jing/datax-web/blob/master/doc/datax-web/datax-web-deploy.md
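1. Download the DataX toolkit directly: DataX download link
After downloading, extract the archive to a local directory, change into the bin directory, and run a synchronization job:
$ cd {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}
Self-check script: python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json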
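The same invocation can be scripted, for example to run the self-check job and test whether it succeeded. The following is a minimal sketch, not part of DataX: the helper names `build_datax_command` and `run_datax_job` are hypothetical, and it assumes the launcher exits with a non-zero status when a job fails. Depending on the DataX release, the launcher may require a particular Python version, so the interpreter is passed in as a parameter.

```python
import subprocess
import sys
from pathlib import Path


def build_datax_command(datax_home, job_file, python_exe=sys.executable):
    """Return the argument list for `python datax.py <job.json>`."""
    launcher = Path(datax_home) / "bin" / "datax.py"
    return [python_exe, str(launcher), str(job_file)]


def run_datax_job(datax_home, job_file, python_exe=sys.executable):
    """Run one job; return True when the launcher exits with status 0.

    Assumption: a failed job is reported through a non-zero exit status.
    """
    command = build_datax_command(datax_home, job_file, python_exe)
    # check=False: a failure is reported via the return value, not an exception.
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0
```

For the self-check above, `run_datax_job(home, Path(home) / "job" / "job.json")` would be called with `home` set to the directory the archive was extracted to.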
II. Test script
C:\DATAX\datax\bin>CHCP 65001
Active code page: 65001
C:\DATAX\datax\bin>python datax.py C:\DATAX\datax\job\test.json
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2024-03-14 15:02:01.361 [main] INFO MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2024-03-14 15:02:01.363 [main] INFO MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2024-03-14 15:02:01.375 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2024-03-14 15:02:01.381 [main] INFO Engine - the machine info =>
osInfo: Windows 10 amd64 10.0
jvmInfo: Oracle Corporation 1.8 25.341-b10
cpu num: 6
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME | allocation_size | init_size
PS Eden Space | 256.00MB | 256.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 42.50MB | 42.50MB
PS Old Gen | 683.00MB | 683.00MB
Metaspace | -0.00MB | 0.00MB
2024-03-14 15:02:01.392 [main] INFO Engine -
{
"setting":{
"speed":{
"channel":4
}
},
"content":[
{
"reader":{
"name":"oraclereader",
"parameter":{
"username":"test",
"password":"******",
"column":[
"*"
],
"connection":[
{
"table":[
"CAR.S_U"
],
"jdbcUrl":[
"jdbc:oracle:thin:@//10.33.51.231:1521/helowin"
]
}
]
}
},
"writer":{
"name":"mysqlwriter",
"parameter":{
"username":"car",
"password":"******",
"writeMode":"replace",
"column":[
"*"
],
"connection":[
{
"jdbcUrl":"jdbc:mysql://10.33.51.231:3306/car",
"table":[
"S_U"
]
}
]
}
}
}
]
}
2024-03-14 15:02:01.417 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false
2024-03-14 15:02:01.418 [main] INFO JobContainer - DataX jobContainer starts job.
2024-03-14 15:02:01.420 [main] INFO JobContainer - Set jobId = 0
2024-03-14 15:02:01.623 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:oracle:thin:@//10.33.51.231:1521/helowin.
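2024-03-14 15:02:01.624 [job-0] WARN OriginalConfPretreatmentUtil - The column configuration in your job file carries some risk. Because you have not configured the columns to read from the database table, changes to the number or types of the table's fields may affect the correctness of the job or even cause it to fail. Please check your configuration and revise it.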
Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.
2024-03-14 15:02:07.748 [job-0] INFO OriginalConfPretreatmentUtil - table:[S_BU] all columns:[
M_ROW$$,ANNLRVW_ENDT_OFFST,ANNLRVW_STDT_OFFST,BU_FLG,CONFLICT_ID,CREATED,CREATED_BY,CURR_PRD_OBJ_NAME,CURR_PRD_RVW_NAME,DYN_HRCHY_ID,LAST_UPD,LAST_UPD_BY,MODIFICATION_NUM,NAME,PAR_ROW_ID,ROW_ID,X_LANG_ID
].
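2024-03-14 15:02:07.748 [job-0] WARN OriginalConfPretreatmentUtil - The column configuration in your job file carries risk. Because the columns you configured for the database table being written are *, changes to the number or types of the table's fields may affect the correctness of the job or even cause it to fail. Please check your configuration and revise it.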
2024-03-14 15:02:07.752 [job-0] INFO OriginalConfPretreatmentUtil - Write data [
replace INTO %s (M_ROW$$,ANNLRVW_ENDT_OFFST,ANNLRVW_STDT_OFFST,BU_FLG,CONFLICT_ID,CREATED,CREATED_BY,CURR_PRD_OBJ_NAME,CURR_PRD_RVW_NAME,DYN_HRCHY_ID,LAST_UPD,LAST_UPD_BY,MODIFICATION_NUM,NAME,PAR_ROW_ID,ROW_ID,X_LANG_ID) VALUES(?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://10.33.51.231:3306/carmen?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false]
2024-03-14 15:02:07.753 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2024-03-14 15:02:07.753 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] do prepare work .
2024-03-14 15:02:07.754 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2024-03-14 15:02:07.755 [job-0] INFO JobContainer - jobContainer starts to do split ...
2024-03-14 15:02:07.755 [job-0] INFO JobContainer - Job set Channel-Number to 4 channels.
2024-03-14 15:02:07.757 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] splits to [1] tasks.
2024-03-14 15:02:07.758 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2024-03-14 15:02:07.778 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2024-03-14 15:02:07.779 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2024-03-14 15:02:07.781 [job-0] INFO JobContainer - Running by standalone Mode.
2024-03-14 15:02:07.786 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2024-03-14 15:02:07.790 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2024-03-14 15:02:07.790 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2024-03-14 15:02:07.796 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2024-03-14 15:02:07.798 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select * from CARMEN.S_BU
] jdbcUrl:[jdbc:oracle:thin:@//10.33.51.231:1521/helowin].
2024-03-14 15:02:16.306 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select * from CARMEN.S_BU
] jdbcUrl:[jdbc:oracle:thin:@//10.33.51.231:1521/helowin].
2024-03-14 15:02:16.845 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[9050]ms
2024-03-14 15:02:16.847 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2024-03-14 15:02:17.798 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 14644530 bytes | Speed 1.40MB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 6.239s | All Task WaitReaderTime 1.988s | Percentage 100.00%
2024-03-14 15:02:17.800 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2024-03-14 15:02:17.805 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2024-03-14 15:02:17.805 [job-0] INFO JobContainer - DataX Reader.Job [oraclereader] do post work.
2024-03-14 15:02:17.806 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2024-03-14 15:02:17.807 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: C:\DATAX\datax\hook
2024-03-14 15:02:17.808 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 1 | 1 | 1 | 0.018s | 0.018s | 0.018s
PS Scavenge | 5 | 5 | 5 | 0.028s | 0.028s | 0.028s
2024-03-14 15:02:17.808 [job-0] INFO JobContainer - PerfTrace not enable!
2024-03-14 15:02:17.808 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 14644530 bytes | Speed 1.40MB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 6.239s | All Task WaitReaderTime 1.988s | Percentage 100.00%
2024-03-14 15:02:17.810 [job-0] INFO JobContainer -
Job start time : 2024-03-14 15:02:01
Job end time : 2024-03-14 15:02:17
Total elapsed time : 16s
Average throughput : 1.40MB/s
Record write speed : 10000rec/s
Total records read : 100000
Total read/write failures : 0
C:\DATAX\datax\bin>
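Both WARN lines in the run above come from using "*" as the column list on the reader and the writer. A job file with explicit columns avoids that risk. The sketch below generates one in Python; it mirrors the configuration echoed in the log and wraps it in the top-level "job" key that DataX job files use. The function names `build_job` and `write_job` are hypothetical, the passwords are the masked placeholders from the log, and the columns passed in are only an illustrative selection from the column list printed in the log.

```python
import json


def build_job(columns, channel=4):
    """Build an oraclereader -> mysqlwriter job with an explicit column list."""
    if not columns or "*" in columns:
        raise ValueError("list the columns explicitly instead of using '*'")
    reader = {
        "name": "oraclereader",
        "parameter": {
            "username": "test",
            "password": "******",  # placeholder, masked as in the log
            "column": list(columns),
            "connection": [{
                "table": ["CAR.S_U"],
                "jdbcUrl": ["jdbc:oracle:thin:@//10.33.51.231:1521/helowin"],
            }],
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "username": "car",
            "password": "******",  # placeholder, masked as in the log
            "writeMode": "replace",
            "column": list(columns),  # same columns, in the same order as the reader
            "connection": [{
                "jdbcUrl": "jdbc:mysql://10.33.51.231:3306/car",
                "table": ["S_U"],
            }],
        },
    }
    return {
        "job": {
            "setting": {"speed": {"channel": channel}},
            "content": [{"reader": reader, "writer": writer}],
        }
    }


def write_job(path, columns):
    """Write the job definition to `path` as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_job(columns), handle, indent=2)
```

Calling `write_job("test.json", ["ROW_ID", "NAME", "CREATED"])` would produce a file that can be passed to datax.py in place of the one used above. Note that the log reports the reader splitting into a single task even though four channels were requested, so the channel setting alone did not parallelize this run.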
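III. datax-web: DataX management and scheduling
GitHub - WeiYe-Jing/datax-web: DataX with an integrated visual interface. Selecting a data source generates a data synchronization job in one click. It supports data sources such as RDBMS, Hive, HBase, ClickHouse, and MongoDB, batch creation of RDBMS synchronization jobs, and an integrated open-source scheduling system, along with distributed execution, incremental data synchronization, real-time viewing of run logs, monitoring of executor resources, killing running processes, and encryption of data source information.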