https://blog.csdn.net/mynameisjinxiaokai/article/details/128716467
一、datax需要python环境,需要先安装python
打开官网 https://www.python.org/downloads/
此处下载2.6.5版本安装
添加环境变量 path C:\Python27\
安装完成后 cmd 使用 python -V 查看是否已安装成功
datax 下载后解压至本地某个目录,进入bin目录,即可运行同步作业
E:\ruanjian\JAVA\datax\datax , YOUR_JOB.json 因为datax会自动从github拉取对应插件的json的模版给我们
例子
$ cd {YOUR_DATAX_HOME}/bin
$ python datax.py {YOUR_JOB.json}
本地
cd E:\ruanjian\JAVA\datax\datax\bin
python E:\ruanjian\JAVA\datax\datax\bin\datax.py E:\ruanjian\JAVA\datax\datax\bin\local-oracle-test-json.txt
备注,1\ json文件需要检查 json格式是否正确
2 执行输出乱码,临时解决, 输入 chcp 65001
长久解决, https://cloud.tencent.com/developer/article/2084809 修改window编码
3 写入模式(writeMode)配置错误. 因为Oracle不支持配置项 writeMode: xxx, Oracle只能使用insert sql 插入数据 ,所以删除 writeMode 这行配置
4\ datax :java.sql.SQLException: No suitable driver found for ["jdbc:oracle:thin:@localhost:1521:orcl"]
jobjson的writer中的jdbcUrl 不能使用[]扩起来,因为写入的数据库只能有一个
执行转出到 es 命令
python E:\ruanjian\JAVA\datax\datax\bin\datax.py E:\ruanjian\JAVA\datax\datax\bin\oracleToES-json.txt
{
"job": {
"content": [
{
"reader": {
"name": "oraclereader",
"parameter": {
"username": "test1",
"password": "Xia_12345",
"column": ["NAME","SEX","SCHOOL"],
"connection": [
{
"table": [
"DATAX1"
],
"jdbcUrl": [
"jdbc:oracle:thin:@localhost:1521:orcl"
]
}
]
}
},
"writer": {
"name": "oraclewriter",
"parameter": {
"username": "test1",
"password": "Xia_12345",
"column": [
"NAME",
"SEX",
"SCHOOL"
],
"preSql": [
"delete from DATAX2"
],
"connection": [
{
"table": [
"DATAX2"
],
"jdbcUrl": "jdbc:oracle:thin:@localhost:1521:orcl",
}
]
}
}
}
],
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 10,
"percentage": 0.05
}
}
}
}
{
"job": {
"content": [
{
"reader": {
"name": "oraclereader",
"parameter": {
"username": "test1",
"password": "Xia_12345",
"column": ["NAME","SEX","SCHOOL"],
"connection": [
{
"table": [
"DATAX1"
],
"jdbcUrl": [
"jdbc:oracle:thin:@localhost:1521:orcl"
]
}
]
}
},
"writer": {
"name": "elasticsearchwriter",
"parameter": {
"endpoint": "http://127.0.0.1:9200",
"index": "myindex",
"type": "data",
"cleanup": true, #true表示插入前清空,即覆盖同步;false则追加同步
"dynamic": true, #这里一定要指定为true,否则使用的是datax的模板(就是下面定义的字段类型),而不会使用es的模板
"settings": {"index" :{"number_of_shards": 2, "number_of_replicas": 1}},
"batchSize": 10000,
"splitter": ",",
"column": [
{"name": "NAME", "type": "text"},
{ "name": "SEX", "type": "text"},
{ "name": "SCHOOL", "type": "text"}
]
}
}
}
],
"setting": {
"speed": {
"channel": 1
},
"errorLimit": {
"record": 10,
"percentage": 0.05
}
}
}
}