Scheduled Data Synchronization with DataX 3.0, Alibaba's Offline Data Sync Tool

1. Get familiar with DataX 3.0 by working through the Quick Start guide: https://github.com/alibaba/DataX/wiki/Quick-Start

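2. Set up the sync configuration by creating a job configuration file in JSON. The example below reads generated records with streamreader and writes them with streamwriter: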

{
    "job": {
        "setting": {
            "speed": {
                "byte":10485760
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        },
        "content": [
            {
                "reader": {
                    "name": "streamreader",
                    "parameter": {
                        "column" : [
                            {
                                "value": "DataX",
                                "type": "string"
                            },
                            {
                                "value": 19890604,
                                "type": "long"
                            },
                            {
                                "value": "1989-06-04 00:00:00",
                                "type": "date"
                            },
                            {
                                "value": true,
                                "type": "bool"
                            },
                            {
                                "value": "test",
                                "type": "bytes"
                            }
                        ],
                        "sliceRecordCount": 100000
                    }
                },
                "writer": {
                    "name": "streamwriter",
                    "parameter": {
                        "print": false,
                        "encoding": "UTF-8"
                    }
                }
            }
        ]
    }
}
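
Before handing a job file to DataX, it can help to confirm that it parses as JSON and names a reader and a writer. The following is a minimal sketch of such a check, not part of DataX itself; the function name `describe_job` is hypothetical, and it assumes only the `job.content[].reader.name` / `writer.name` layout shown in the configuration above.

```python
import json

def describe_job(job_text):
    """Return (reader name, writer name) pairs for each entry in job.content."""
    job = json.loads(job_text)  # raises ValueError if the text is not valid JSON
    pairs = []
    for item in job["job"]["content"]:
        pairs.append((item["reader"]["name"], item["writer"]["name"]))
    return pairs

# A trimmed-down version of the configuration above
sample = """
{"job": {"content": [{"reader": {"name": "streamreader", "parameter": {}},
                      "writer": {"name": "streamwriter", "parameter": {}}}]}}
"""
print(describe_job(sample))
```

A missing `reader` or `writer` key surfaces as a `KeyError`, which is usually easier to diagnose than a failure partway through a DataX run.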

3. Test-run the sync job. This requires the precompiled DataX release rather than the raw source, and Python 2.6 or later must be installed.
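
A test run amounts to invoking `datax.py` with the job file as its argument. A minimal sketch of wrapping that call in Python follows. The `D:\datax` paths mirror the ones used later in this post, `stream2stream.json` is a hypothetical file name for the configuration from step 2, and treating a non-zero exit code as failure is an assumption about how `datax.py` reports errors.

```python
import subprocess

DATAX_PY = "D:\\datax\\bin\\datax.py"  # assumed install location

def run_job(job_file, runner=subprocess.call):
    """Run one DataX job; return True if the process exits with code 0.

    `runner` is injectable so the function can be exercised without DataX.
    """
    command = ["python", DATAX_PY, job_file]
    return runner(command) == 0

# Example call on a machine where DataX is installed (hypothetical job file):
#   ok = run_job("D:\\datax\\job\\stream2stream.json")
```

Checking the result once by hand before scheduling anything makes it easier to tell configuration problems apart from scheduling problems later.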

 

4. Write the Python script that syncs yesterday's data. On Windows it is launched from the batch (.bat) file shown in step 5.

 

# -*- coding:utf-8 -*-
## Windows scheduled task
## author zhujunbo
## Place this file in DataX's bin directory (the batch file in step 5 runs it as autoDataSync.py)

import datetime
import os
import subprocess

def startask(path, yesterday):
    for f in os.listdir(path):
        job_file = os.path.join(path, f)
        # Skip subdirectories; every regular file is treated as a job config
        if os.path.isfile(job_file):
            # Run DataX, passing yesterday's date as the job variable "yesterday"
            subprocess.call(['python', 'D:\\datax\\bin\\datax.py',
                             '-p', '-Dyesterday=' + str(yesterday), job_file])

if __name__ == "__main__":
    today = datetime.date.today()
    ## Yesterday's date
    yesterday = today - datetime.timedelta(days=1)
    startask('D:\\datax\\job\\', yesterday)
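
The `-Dyesterday=...` parameter only has an effect if the job files reference it: DataX substitutes `${yesterday}` placeholders in the job JSON with the value passed on the command line. The sketch below imitates that substitution in plain Python to show what a job file ends up containing. It is an illustration rather than DataX's own code, and the `where` clause and the `create_time` column are hypothetical.

```python
import datetime

def render(template, params):
    """Replace each ${name} placeholder with its value from params."""
    for name, value in params.items():
        template = template.replace("${" + name + "}", str(value))
    return template

def yesterday_of(today):
    """Return the date one day before the given date."""
    return today - datetime.timedelta(days=1)

# Hypothetical fragment of a reader configuration
fragment = '"where": "create_time >= \'${yesterday}\'"'
day = yesterday_of(datetime.date(2020, 3, 1))
print(render(fragment, {"yesterday": day}))
# prints: "where": "create_time >= '2020-02-29'"
```

Because `str()` of a `datetime.date` yields the `YYYY-MM-DD` form, the value can be dropped straight into a date comparison in most SQL dialects.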

5. Write the batch script for the Windows scheduled task, then configure the scheduled task, test it, and run it.

@echo off
D:
cd D:\datax\bin
start python autoDataSync.py
exit
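
The task can be registered in the Task Scheduler GUI or from the command line with `schtasks`. As a sketch in Python, to stay consistent with the script above, the following builds a `schtasks /Create` command for a daily run. The task name `DataXDailySync`, the batch file path `D:\datax\bin\autoDataSync.bat`, and the 01:00 start time are all assumptions for illustration.

```python
def build_schtasks_command(task_name, bat_path, start_time):
    """Build the argument list for creating a daily scheduled task."""
    return [
        "schtasks", "/Create",
        "/SC", "DAILY",      # schedule type: run once a day
        "/TN", task_name,    # task name
        "/TR", bat_path,     # program the task runs
        "/ST", start_time,   # start time, HH:mm in 24-hour format
    ]

command = build_schtasks_command(
    "DataXDailySync", "D:\\datax\\bin\\autoDataSync.bat", "01:00")
print(" ".join(command))
```

Passing the list to `subprocess.call(command)` on the Windows machine would register the task. Triggering it once by hand, for example with `schtasks /Run /TN DataXDailySync`, is a quick way to test it before relying on the schedule.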

 

 
