01-datax安装和简单实用

参考连接:

datax github官方地址:https://github.com/alibaba/DataX

1, 安装使用

1.1, 下载地址

http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

1.2, 使用方式(DataX工具包,非源码编译方式)
  • 下载后解压至本地某个目录,进入bin目录,即可运行同步作业:

    $ cd  {YOUR_DATAX_HOME}/bin
    $ python datax.py {YOUR_JOB.json}
    
  • 自检脚本:

    $ python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
    
1.3, 配置示例:从stream读取数据并打印到控制台
1.3.1, 第一步, 查找配置文件模板(json格式)

可以通过命令查看配置模板: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}

如:python datax.py -r streamreader -w streamwriter, 会打印如下信息:

当然也可以通过官网的github地址下载配置的模板, 里面还有具体的字段的详细解释

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


Please refer to the streamreader document:
     https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md 

Please refer to the streamwriter document:
     https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md 
 
Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader", 
                    "parameter": {
                        "column": [], 
                        "sliceRecordCount": ""
                    }
                }, 
                "writer": {
                    "name": "streamwriter", 
                    "parameter": {
                        "encoding": "", 
                        "print": true
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
1.3.2, 第二步, 根据模板自定义配置文件
#stream2stream.json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 10,
            "column": [
              {
                "type": "long",
                "value": "10"
              },
              {
                "type": "string",
                "value": "hello,你好,世界-DataX"
              }
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "encoding": "UTF-8",
            "print": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 5
       }
    }
  }
}
1.3.3, 第三步, 启动datax,根据配置json文件执行即可
  • 如下执行后即可通过streamwriter把streamreader从内存中读取的内存打印在控制台上
$ python ../bin/datax.py stream2stream.json

你可能感兴趣的:(01-datax安装和简单实用)