Using the data synchronization middleware DataLink to sync data from Oracle to Greenplum

A while back I needed to build a real-time data synchronization platform and came across DataLink, an open-source middleware. After using it for a while I learned a lot, especially from the overall architecture of the platform. Since I haven't touched it for some time and don't want to forget, I'm writing down my notes here.
Project repository: https://github.com/ucarGroup/DataLink
The source code and documentation are available there, so I won't repeat them here.

If the source is MySQL, DataLink already supports it very well, along with a range of other data source types, providing real-time synchronization of incremental data.

Reading incremental data from Oracle, however, is not yet supported, and our business needed it, so I had to find a way. In the source code, the reader plugins are responsible for reading incremental data from the source database, so one option is to write a custom Oracle reader plugin. Writing the change-capture tooling itself from scratch would be too hard, so I evaluated several synchronization middlewares:

1. Alibaba's yugong: supports real-time synchronization of Oracle incremental data based on Oracle materialized views. Every table needs a corresponding materialized view, which carries a noticeable performance cost, and DDL is not supported.
2. Kafka Connect: a framework within Kafka that can read incremental data from many kinds of data sources into Kafka. It is lightweight and simple to configure, and once the data is in Kafka the downstream processing is very open-ended. The drawback is that DDL is not supported.
3. OGG, Oracle's official tool: quite powerful, and it records the log offset for every sync. On the other hand, you need to understand every item in its configuration files. It used to have no GUI tool; there is one now, although I haven't used it yet. DDL is supported.
After comparing them, I chose OGG because I needed DDL support.

If you don't strictly need DDL support, I would personally recommend Kafka Connect: write all the data into Kafka, then implement a DataLink reader that consumes it from Kafka.

I won't go into setting up and using OGG here; what follows is the idea behind capturing DDL:
OGG can capture table-structure (DDL) changes, but only for database-to-database replication, i.e. when the target is another database such as Oracle or MySQL. When the target is Kafka, the DDL statements cannot be delivered, so a workaround is needed. Under the OGG schema there is a history table, GGS_DDL_HIST, which gets a new row whenever the structure of a monitored table changes. You can probably guess what to do next: just add this table to the extract process configuration, right? Unfortunately not, which is the annoying part. The following steps are needed:
(1) Write a stored procedure that copies the GGS_DDL_HIST rows into a custom table, which can be created under the OGG schema.
(2) Create a scheduled job that runs the stored procedure from (1), scanning roughly every few seconds.
(3) Configure both the OGG extract process and the replicat process to capture incremental data from this custom table.
In this way the executed DDL statements finally land in Kafka, where DataLink can read them. Note that this is not suitable for scenarios with very strict latency requirements: plain OGG incremental replication from source to target already takes at least about 5 s, and with the extra delay of the scheduled job, changes typically arrive within 5-10 s.
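
The stored procedure and the database job themselves are not shown in this post. Purely to illustrate the copy logic of steps (1) and (2), here is a hypothetical standalone sketch in Java/JDBC; the SEQNO column, the connection details and the sync-table layout beyond OBJECTOWNER, OBJECTNAME and METADATA_TEXT are assumptions, and the real setup runs as a procedure and job inside the database rather than as a separate process.

// Hypothetical illustration only: mirrors the copy logic of the stored procedure
// and scheduled job described above. Column/table details beyond those named in
// this post, and the connection settings, are assumptions.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class DdlHistCopyJob {

    // copy DDL history rows that are not yet present in the custom sync table
    private static final String COPY_SQL =
            "INSERT INTO ggs_ddl_hist_sync (seqno, objectowner, objectname, metadata_text) " +
            "SELECT h.seqno, h.objectowner, h.objectname, h.metadata_text " +
            "  FROM ggs_ddl_hist h " +
            " WHERE h.seqno > (SELECT NVL(MAX(s.seqno), 0) FROM ggs_ddl_hist_sync s)";

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // scan every few seconds, like the scheduled job in step (2)
        scheduler.scheduleWithFixedDelay(() -> {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@//dbhost:1521/ORCL", "ogguser", "password");
                 PreparedStatement ps = conn.prepareStatement(COPY_SQL)) {
                conn.setAutoCommit(false);
                int copied = ps.executeUpdate();
                conn.commit();
                if (copied > 0) {
                    System.out.println("copied " + copied + " DDL history rows");
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 5, TimeUnit.SECONDS);
    }
}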

Next comes the code inside DataLink, which is mainly a Kafka client that reads data from Kafka. The key point is to commit the offset only after the data has been consumed successfully. As an aside, Kafka Connect manages offsets by itself. Part of the code is shown below:
Consumer

package com.ucar.datalink.reader.oracle.comsumer;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.ucar.datalink.domain.media.MediaSourceInfo;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Arrays;
import java.util.Map;
import java.util.Properties;

public class KafkaRetryConsumer {

    private KafkaConsumer<String, String> consumer;
    private ConsumerRecords<String, String> msgList;
    private final String topic;
    private TopicPartition topicPartition;

    public KafkaRetryConsumer(MediaSourceInfo mediaSourceInfo) {

        JSONObject jsonObject = JSON.parseObject( mediaSourceInfo.getParameter());
        String groupId = jsonObject.getString("groupId");
        String topicName = jsonObject.getString("topic");
        String servers = jsonObject.getString("bootstrapServers");
        Properties props = new Properties();
        props.put("bootstrap.servers", servers);
        props.put("group.id", groupId);
        props.put("enable.auto.commit", "false");
        props.put("session.timeout.ms", "30000");
        props.put("max.poll.interval.ms", "30000");
        props.put("max.poll.records", 1);//一次获取最大条数
        //要发送自定义对象,需要指定对象的反序列化类
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());
        this.consumer = new KafkaConsumer<>(props);
        this.topic = topicName;

        //for simplicity, only the first partition is used; the Kafka broker config sets one partition per topic: num.partitions=1
        this.topicPartition = new TopicPartition(topicName, 0);
        this.consumer.assign(Arrays.asList(topicPartition));
    }

    public ConsumerRecords<String, String> poll(){
        msgList = consumer.poll(Duration.ofMillis(100));//maximum wait time for a single poll
        return msgList;
    }

    //set the offset to consume from
    public void seek(Long offset){
        consumer.seek(topicPartition, offset);
    }

    //get the earliest offset to start reading from
    public Long beginningOffsets(){
        Map<TopicPartition, Long> map = consumer.beginningOffsets(Arrays.asList(topicPartition));
        Long offset = map.get(topicPartition);
        return offset == null ? 0L : offset;
    }

    //get the end (latest) offset
    public Long endOffsets(){
        Map<TopicPartition, Long> map = consumer.endOffsets(Arrays.asList(topicPartition));
        Long offset = map.get(topicPartition);
        return offset == null ? 0L : offset;
    }

    public void close(){
        consumer.close();
    }

    public void commit(){
        consumer.commitAsync();
    }
}

Reader

package com.ucar.datalink.reader.oracle;

import com.ucar.datalink.biz.service.MediaService;
import com.ucar.datalink.contract.log.rdbms.EventType;
import com.ucar.datalink.contract.log.rdbms.RdbEventRecord;
import com.ucar.datalink.domain.media.MediaSourceInfo;
import com.ucar.datalink.domain.media.MediaSourcesRel;

import com.ucar.datalink.domain.plugin.reader.oracle.OracleReaderParameter;
import com.ucar.datalink.reader.oracle.comsumer.KafkaRetryConsumer;
import com.ucar.datalink.reader.oracle.translator.KafkaTransToRecord;
import com.ucar.datalink.worker.api.model.TaskCacheModel;
import com.ucar.datalink.worker.api.task.RecordChunk;
import com.ucar.datalink.worker.api.task.TaskReader;
import com.ucar.datalink.worker.api.task.TaskReaderContext;
import com.ucar.datalink.worker.api.task.TaskCache;
import com.ucar.datalink.worker.api.util.statistic.BaseReaderStatistic;
import com.ucar.datalink.worker.api.util.statistic.ReaderStatistic;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;


/**
 * Created by swj on 2020/04/09
 * 1. The fetch method is polled; inside it, Kafka's poll method is called repeatedly until data is obtained.
 * 2. The data in Kafka is what OGG delivers to it; each Kafka message is converted into an RdbEventRecord object.
 * 3. Once the data has been synchronized, the overridden commit method is called automatically and the consumed Kafka data is committed.
 *
 * Notes on the Kafka offset: see com.ucar.datalink.worker.api.model.OracleTaskModel
 * OFFSET records the offset that the next poll will fetch; after a successful commit, the cache and the database are updated with the next offset.
 */

public class OracleTaskReader extends TaskReader {

    private static final Logger logger = LoggerFactory.getLogger(OracleTaskReader.class);

    private KafkaRetryConsumer kafkaDataComsumer;

    private MediaService mediaService;

    /*
    Flag indicating whether a commit is needed.
    If the task exits normally, no commit is performed.
     */
    private boolean commitFlag = true;

    @Override
    public void initialize(TaskReaderContext context) {
        super.initialize(context);
        this.mediaService = context.getService(MediaService.class);
    }

    @Override
    public void start() {
        if (isStart()) {
            return;
        }
        startInternal();
        super.start();
    }

    @Override
    public void close() {
        stopInternal();
    }

    @Override
    public void prePoll() {
        //print the statistics collected in the previous round
        BaseReaderStatistic statistic = context.taskReaderSession().getData(ReaderStatistic.KEY);
        if (statistic != null && parameter.isPerfStatistic()) {
            logger.info(statistic.toJsonString());
        }

        //reset the session before this round starts
        context.beginSession();
        context.taskReaderSession().setData(ReaderStatistic.KEY, new ReaderStatistic(context.taskId()));

        //check the offset information
        preSetKafkaOffset();
    }

    /*
        Check the offset before polling.
     */
    private void preSetKafkaOffset(){
        //check whether a Kafka offset exists in the cache
        TaskCacheModel oracleTaskModel = (TaskCacheModel)TaskCache.getCacheByTaskId(context.taskId());
        if (oracleTaskModel.getOffset() == null) {
            //fall back to the offset stored in the database
            MediaSourcesRel mediaSourcesRel = mediaService.findMediaSourcesByRelByTaskId(context.taskId());
            if(mediaSourcesRel.getOffset() != null){
                oracleTaskModel.setOffset(mediaSourcesRel.getOffset());
            }
        }
    }

    @Override
    protected RecordChunk fetch() throws InterruptedException {

        ConsumerRecords<String, String> consumerRecords = null;
        boolean setOffsetFlag = true;
        commitFlag = true;
        while (isStart()) {
            if(!TaskCache.getTaskStatus(context.taskId())){
                logger.info("停止循环任务:"+context.taskId());
                commitFlag = false;
                break;
            }
            if(setOffsetFlag){//the offset only needs to be set once per fetch loop
                setKafkaOffset();//position the consumer at the stored offset
                setOffsetFlag = false;
            }
            consumerRecords = kafkaDataComsumer.poll();
            if (consumerRecords != null && consumerRecords.count() > 0) {
                break;
            } else {
                Thread.sleep(100);
            }
        }
        if (!isStart()) {
            throw new InterruptedException();
        }

        RecordChunk result = null;
        if (consumerRecords != null && consumerRecords.count() > 0) {
            //convert the Kafka records into RdbEventRecord objects
            result = KafkaTransToRecord.getRdbEventRecord(consumerRecords);
        }else {
            result = new RecordChunk();
            result.setRecords(new ArrayList<>());
        }
        return result;
    }

    @Override
    protected void dump(RecordChunk recordChunk) {
    }

    /*
        Position the Kafka consumer at the offset it should consume from.
     */
    private void setKafkaOffset(){

        TaskCacheModel taskCacheModel = (TaskCacheModel)TaskCache.getCacheByTaskId(context.taskId());
        Long offset = taskCacheModel.getOffset();
        if(offset != null){
            //get the end and beginning offsets of the partition
            Long endOffset = kafkaDataComsumer.endOffsets();
            Long beginningOffset = kafkaDataComsumer.beginningOffsets();
            if(offset > endOffset){
                logger.info("最大offset为:"+endOffset+"将消费位点设置为此");
                offset = endOffset;
                taskCacheModel.setOffset(offset);
            }else if(offset < beginningOffset){
                logger.info("最小offset为:"+beginningOffset+"将消费位点设置为此");
                offset = beginningOffset;
                taskCacheModel.setOffset(offset);
            }
            kafkaDataComsumer.seek(offset);
        }else {
            //no offset recorded yet, start reading from the beginning
            offset = kafkaDataComsumer.beginningOffsets();
            logger.info("No offset stored; the beginning offset is " + offset + ", moving the consume position to it");
            kafkaDataComsumer.seek(offset);
            taskCacheModel.setOffset(offset);
        }
        TaskCache.putCacheByTaskId(context.taskId(),taskCacheModel);
    }

    @Override
    public void commit(RecordChunk recordChunk){
        if(commitFlag){
            updateOffsetInfo();//advance the stored offset
            kafkaDataComsumer.commit();
        }
    }

    /*
        Update the offset in both the cache and the database.
     */
    private void updateOffsetInfo(){
        TaskCacheModel oracleTaskModel = (TaskCacheModel)TaskCache.getCacheByTaskId(context.taskId());
        //persist the next offset to the database
        mediaService.updateOffsetByTaskId(oracleTaskModel.getOffset()+1, context.taskId());
        //cache the next offset
        oracleTaskModel.setOffset(oracleTaskModel.getOffset()+1);
        TaskCache.putCacheByTaskId(context.taskId(),oracleTaskModel);
    }

    @Override
    @SuppressWarnings({"unchecked"})
    public void rollback(RecordChunk recordChunk, Throwable t) {
        super.rollback(recordChunk, t);
    }

    private void startInternal() {

        MediaService mediaService = context.getService(MediaService.class);
        MediaSourceInfo mediaSourceInfo = mediaService.findMediaSourcesByRelTaskId(context.taskId());
        if (mediaSourceInfo == null) {
            logger.error("启动oracle-task,未查询到绑定的kafka介质,异常结束");
            stopInternal();
            return;
        }
        //use the taskId from the context to load the concrete configuration and create the consumer
        kafkaDataComsumer = new KafkaRetryConsumer(mediaSourceInfo);
    }

    private void stopInternal() {
        if (kafkaDataComsumer != null) {
            kafkaDataComsumer.close();
        }
    }

    public static void main(String[] args) {
        List list = new ArrayList<>();
        RdbEventRecord rdbEventRecord = new RdbEventRecord();
        rdbEventRecord.setTableName("t_dl_test_source");
        rdbEventRecord.setSchemaName("datalink");
        rdbEventRecord.setEventType(EventType.ALTER);
        rdbEventRecord.setDdlSchemaName("datalink");
        rdbEventRecord.setSql("ALTER TABLE t_dl_test_source ADD (identity3 varchar(2) default '1')");
        list.add(rdbEventRecord);
        RecordChunk rc = new RecordChunk(list, 123, 1111);
        RecordChunk result = rc.copyWithoutRecords();
        rc.getRecords().stream().forEach(r -> {

        });
    }
}

Data conversion

package com.ucar.datalink.reader.oracle.translator;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.ucar.datalink.common.utils.DateUtil;
import com.ucar.datalink.contract.log.rdbms.EventColumn;
import com.ucar.datalink.contract.log.rdbms.EventType;
import com.ucar.datalink.contract.log.rdbms.RdbEventRecord;
import com.ucar.datalink.reader.oracle.constant.OpType;
import com.ucar.datalink.worker.api.task.RecordChunk;
import org.apache.commons.lang.StringUtils;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;


/*
    Converts Kafka records into RdbEventRecord objects.
 */
public class KafkaTransToRecord {

    private static final Logger logger = LoggerFactory.getLogger(KafkaTransToRecord.class);

    //the OGG DDL history sync table; records from it need special handling
    public static final String ORACLE_DDL_TABLE="ggs_ddl_hist_sync";


    public static RecordChunk getRdbEventRecord(ConsumerRecords<String, String> consumerRecords) throws InterruptedException {
        List<RdbEventRecord> rdbEventRecords = new ArrayList<>();
        for (ConsumerRecord<String, String> record : consumerRecords) {
            logger.info("Fetched a record from Kafka: " + record);
            JSONObject jsonObject = JSON.parseObject(record.value());
            RdbEventRecord rdbEventRecord = getRdbEventRecord(jsonObject);
            rdbEventRecords.add(rdbEventRecord);
        }
        RecordChunk rdbEventRecordRecordChunk = new RecordChunk(rdbEventRecords,System.currentTimeMillis(),getByteSize(rdbEventRecords));

        return rdbEventRecordRecordChunk;
    }

    /*
    Handle the data according to its operation type.
     */
    private static RdbEventRecord getRdbEventRecord(JSONObject jsonObject) throws InterruptedException {
        RdbEventRecord rdbEventRecord = null;
        //handle DDL first
        rdbEventRecord = transAlterData(jsonObject);
        if(rdbEventRecord != null){
            return rdbEventRecord;
        }

        String opType = jsonObject.getString("op_type");
        switch (opType) {
            case OpType.INSERT:
                rdbEventRecord = transInsertData(jsonObject);
                break;
            case OpType.UPDATE:
                rdbEventRecord = transUpdateData(jsonObject);
                break;
            case OpType.DELETE:
                rdbEventRecord = transDeleteData(jsonObject);
                break;
            default:
                throw new InterruptedException("不支持的数据操作类型:" + opType);
        }
        return rdbEventRecord;
    }

    /*
        Insert operations.
     */
    private static RdbEventRecord transInsertData(JSONObject jsonObject) {

        RdbEventRecord rdbEventRecord = new RdbEventRecord();
        String schemaTable = jsonObject.getString("table");
        String[] strings = schemaTable.split("\\.");
        rdbEventRecord.setSchemaName(strings[0]);
        rdbEventRecord.setTableName(strings[1]);
        rdbEventRecord.setEventType(EventType.INSERT);
        rdbEventRecord.setExecuteTime(DateUtil.strToTimeMills(jsonObject.getString("op_ts")));

        //set the primary key columns
        JSONObject after = jsonObject.getJSONObject("after");
        JSONArray primaryKeys = jsonObject.getJSONArray("primary_keys");
        List keys = new ArrayList<>();
        Map<String, Integer> columnsIndexMap = getColumnsIndex(jsonObject);
        for (Object key : primaryKeys) {
            EventColumn eventColumn = new EventColumn();
            eventColumn.setNull(false);
            eventColumn.setUpdate(true);
            eventColumn.setKey(true);
            eventColumn.setColumnName(key.toString());
            eventColumn.setColumnValue(after.getString(key.toString()));
            eventColumn.setIndex(columnsIndexMap.get(key));
            keys.add(eventColumn);
        }
        rdbEventRecord.setKeys(keys);

        //set the non-key columns
        List columns = new ArrayList<>();
        for (String column : after.keySet()) {
            //skip the primary key columns
            if (primaryKeys.contains(column)) {
                continue;
            }
            EventColumn eventColumn = new EventColumn();
            eventColumn.setKey(false);
            eventColumn.setUpdate(true);
            if (StringUtils.isEmpty(after.getString(column))) {
                eventColumn.setNull(true);
            } else {
                eventColumn.setNull(false);
            }
            eventColumn.setIndex(columnsIndexMap.get(column));
            eventColumn.setColumnName(column);
            eventColumn.setColumnValue(after.getString(column));
            columns.add(eventColumn);
        }
        rdbEventRecord.setColumns(columns);

        return rdbEventRecord;
    }

    /*
        Update operations.
     */
    private static RdbEventRecord transUpdateData(JSONObject jsonObject) {
        RdbEventRecord rdbEventRecord = new RdbEventRecord();
        String schemaTable = jsonObject.getString("table");
        String[] strings = schemaTable.split("\\.");
        rdbEventRecord.setSchemaName(strings[0]);
        rdbEventRecord.setTableName(strings[1]);
        rdbEventRecord.setEventType(EventType.UPDATE);
        rdbEventRecord.setExecuteTime(DateUtil.strToTimeMills(jsonObject.getString("op_ts")));

        //set the primary key columns
        JSONObject before = jsonObject.getJSONObject("before");
        JSONObject after = jsonObject.getJSONObject("after");
        JSONArray primaryKeys = jsonObject.getJSONArray("primary_keys");
        List oldKeys = new ArrayList<>();
        List keys = new ArrayList<>();
        boolean updateKeyFlag = false;
        for (Object key : primaryKeys) {
            EventColumn beforeEventColumn = new EventColumn();
            beforeEventColumn.setNull(false);
            beforeEventColumn.setUpdate(true);
            beforeEventColumn.setKey(true);
            beforeEventColumn.setColumnName(key.toString());
            beforeEventColumn.setColumnValue(before.getString(key.toString()));
            oldKeys.add(beforeEventColumn);

            //if any primary key value changed, the full set of old keys has to be recorded
            String beforeKeyVal = before.getString(key.toString());
            String afterKeyVal = after.getString(key.toString());
            if(beforeKeyVal != null && afterKeyVal != null){
                if(!beforeKeyVal.equals(afterKeyVal)){
                    updateKeyFlag = true;
                }
            }

            EventColumn afterEventColumn = new EventColumn();
            afterEventColumn.setNull(false);
            afterEventColumn.setUpdate(true);
            afterEventColumn.setKey(true);
            afterEventColumn.setColumnName(key.toString());
            afterEventColumn.setColumnValue(after.getString(key.toString()));
            keys.add(afterEventColumn);
        }
        if(updateKeyFlag){
            rdbEventRecord.setOldKeys(oldKeys);
        }
        rdbEventRecord.setKeys(keys);

        //set the non-key columns
        List oldColumns = new ArrayList<>();
        List columns = new ArrayList<>();
        for (String column : after.keySet()) {
            //skip the primary key columns
            if (primaryKeys.contains(column)) {
                continue;
            }
            EventColumn oldEventColumn = new EventColumn();
            oldEventColumn.setKey(false);
            oldEventColumn.setUpdate(true);
            if (StringUtils.isEmpty(before.getString(column))) { //the null flag of the old column should follow the before image
                oldEventColumn.setNull(true);
            } else {
                oldEventColumn.setNull(false);
            }
            oldEventColumn.setColumnName(column);
            oldEventColumn.setColumnValue(before.getString(column));
            oldColumns.add(oldEventColumn);


            EventColumn eventColumn = new EventColumn();
            eventColumn.setKey(false);
            eventColumn.setUpdate(true);
            if (StringUtils.isEmpty(after.getString(column))) {
                eventColumn.setNull(true);
            } else {
                eventColumn.setNull(false);
            }
            eventColumn.setColumnName(column);
            eventColumn.setColumnValue(after.getString(column));
            columns.add(eventColumn);
        }
        rdbEventRecord.setOldColumns(oldColumns);
        rdbEventRecord.setColumns(columns);

        return rdbEventRecord;
    }

    /*
        Delete operations.
     */
    private static RdbEventRecord transDeleteData(JSONObject jsonObject) {

        RdbEventRecord rdbEventRecord = new RdbEventRecord();
        String schemaTable = jsonObject.getString("table");
        String[] strings = schemaTable.split("\\.");
        rdbEventRecord.setSchemaName(strings[0]);
        rdbEventRecord.setTableName(strings[1]);
        rdbEventRecord.setEventType(EventType.DELETE);
        rdbEventRecord.setExecuteTime(DateUtil.strToTimeMills(jsonObject.getString("op_ts")));

        //set the primary key columns
        JSONObject before = jsonObject.getJSONObject("before");
        JSONArray primaryKeys = jsonObject.getJSONArray("primary_keys");
        List keys = new ArrayList<>();
        for (Object key : primaryKeys) {
            EventColumn eventColumn = new EventColumn();
            eventColumn.setNull(false);
            eventColumn.setUpdate(true);
            eventColumn.setKey(true);
            eventColumn.setColumnName(key.toString());
            eventColumn.setColumnValue(before.getString(key.toString()));
            keys.add(eventColumn);
        }
        rdbEventRecord.setOldKeys(keys);
        rdbEventRecord.setKeys(keys);

        return rdbEventRecord;
    }

    /*
        Handle DDL data.
        1. DDL records are synced from GGS_DDL_HIST into the custom table GGS_DDL_HIST_SYNC.
        2. That custom table is inside the OGG replication scope.
     */
    private static RdbEventRecord transAlterData(JSONObject jsonObject) {
        RdbEventRecord rdbEventRecord = new RdbEventRecord();
        String schemaTable = jsonObject.getString("table");
        String[] strings = schemaTable.split("\\.");
        String opType = jsonObject.getString("op_type");
        //check whether the record comes from the DDL history table
        if(!strings[1].equalsIgnoreCase(ORACLE_DDL_TABLE)){
            return null;
        }
        if(!OpType.INSERT.equals(opType)){
            logger.debug("属于"+ORACLE_DDL_TABLE+"表,但不是插入操作,进行过滤");
            rdbEventRecord.setSchemaName("");
            rdbEventRecord.setTableName("");
            return rdbEventRecord;
        }
        JSONObject after = jsonObject.getJSONObject("after");
        String schema = after.getString("OBJECTOWNER");
        String table = after.getString("OBJECTNAME");
        String ddlText = after.getString("METADATA_TEXT").toUpperCase();
        //for the DDL history table, only newly inserted records are processed;
        //filter out records that contain no ALTER TABLE / COMMENT ON statement
        if(ddlText.indexOf("ALTER TABLE")<0 && ddlText.indexOf("COMMENT ON")<0){
            logger.debug("Record belongs to " + ORACLE_DDL_TABLE + " but contains no supported DDL statement, filtering it out");
            rdbEventRecord.setSchemaName("");
            rdbEventRecord.setTableName("");
            return rdbEventRecord;
        }
        ddlText = getDdlbyText(ddlText);
        rdbEventRecord.setSql(ddlText);
        rdbEventRecord.setSchemaName(schema);
        rdbEventRecord.setTableName(table);
        rdbEventRecord.setEventType(EventType.ALTER);
        return rdbEventRecord;
    }

    /*
        Strip special characters from the DDL text.
     */
    private static String getDdlbyText(String ddlText){

        ddlText = ddlText.substring(5,ddlText.length()-2);

        ddlText = ddlText.replaceAll("\\\\","");

        return ddlText;
    }

    /*
        Build a map from column name to column index (used when setting the key columns).
     */
    private static Map<String, Integer> getColumnsIndex(JSONObject jsonObject) {
        Map<String, Integer> columnsIndexMap = new HashMap<>();
        JSONObject after = jsonObject.getJSONObject("after");

        int count = 0;
        for (String column : after.keySet()) {
            columnsIndexMap.put(column,count);
            count++;
        }
        return columnsIndexMap;
    }

    public static long getByteSize(List datas) {
        long byteSize = 0;
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            ObjectOutputStream os = new ObjectOutputStream(baos);
            os.writeObject(datas);
            os.close();
            byteSize = baos.size();
            baos.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return byteSize;
    }
}
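
The translator above expects messages carrying the fields table, op_type, op_ts, before/after and primary_keys (OGG's JSON format for Kafka). Purely as a hypothetical sanity check, not part of the project, a minimal round trip could look like the sketch below; the field values are illustrative, OpType's constants are assumed to match OGG's op_type values, and the op_ts value has to match whatever format DateUtil.strToTimeMills accepts.

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.ucar.datalink.reader.oracle.constant.OpType;
import com.ucar.datalink.reader.oracle.translator.KafkaTransToRecord;
import com.ucar.datalink.worker.api.task.RecordChunk;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.common.TopicPartition;

import java.util.Collections;

/*
    Hypothetical sanity check: builds one OGG-style JSON message and runs it
    through KafkaTransToRecord. Field values are illustrative only.
 */
public class KafkaTransToRecordDemo {

    public static void main(String[] args) throws Exception {
        JSONObject after = new JSONObject();
        after.put("ID", "1");
        after.put("NAME", "foo");

        JSONArray primaryKeys = new JSONArray();
        primaryKeys.add("ID");

        JSONObject message = new JSONObject();
        message.put("table", "DATALINK.T_DL_TEST_SOURCE");
        message.put("op_type", OpType.INSERT);              //OpType constants are assumed to match OGG's "I"/"U"/"D"
        message.put("op_ts", "2020-04-09 10:00:00.000000"); //must match the format DateUtil.strToTimeMills expects
        message.put("after", after);
        message.put("primary_keys", primaryKeys);

        ConsumerRecord<String, String> record =
                new ConsumerRecord<>("ogg_topic", 0, 0L, null, message.toJSONString());
        ConsumerRecords<String, String> records = new ConsumerRecords<>(
                Collections.singletonMap(new TopicPartition("ogg_topic", 0),
                        Collections.singletonList(record)));

        //one INSERT RdbEventRecord for DATALINK.T_DL_TEST_SOURCE is expected
        RecordChunk chunk = KafkaTransToRecord.getRdbEventRecord(records);
        System.out.println(chunk.getRecords().size());
    }
}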

After the data is fetched, some changes are also needed in the dl-worker-writer-rdbms module. In my case I need to write from Oracle into Greenplum, so code for writing to GP has to be added. Another headache is that the DDL statements captured from Oracle have to be rewritten for each kind of target data source. I use Alibaba's Druid here, more precisely its lower-level SQL parsing packages, to convert the SQL, including converting the column types. For the type conversion you can refer to this series of articles: https://developer.aliyun.com/article/57142

Part of the code is shown below:

package com.ucar.datalink.writer.rdbms.handle.translator;

import com.alibaba.druid.sql.SQLUtils;
import com.alibaba.druid.sql.ast.SQLDataType;
import com.alibaba.druid.sql.ast.SQLStatement;
import com.alibaba.druid.util.JdbcConstants;
import com.ucar.datalink.writer.rdbms.handle.mapping.Oracle2PgColumnMapping;
import com.ucar.datalink.writer.rdbms.handle.visitor.Oracle2PgOutputVisitor;
import com.ucar.datalink.writer.rdbms.utils.ExpiryMap;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.IOException;
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/*
    Converts Oracle SQL into PostgreSQL/Greenplum SQL.
 */
public class Oracle2PgTranslator {

    private static final Logger LOGGER = LoggerFactory.getLogger(Oracle2PgTranslator.class);

    public static String sqlEntry(String sql) {
        LOGGER.info("oracle-pg 原sql:" + sql);
        sql = translator(sql);

        if (sql.contains("MODIFY COLUMN")) {//修改字段
            sql = sql.replaceAll("MODIFY COLUMN", "ALTER COLUMN");
        }

        LOGGER.info("转换后sql:" + sql);
        return sql;
    }


    private static String translator(String sql) {
        List<SQLStatement> stmtList = SQLUtils.parseStatements(sql, JdbcConstants.ORACLE);
        SQLStatement stmt = stmtList.get(0);

        StringWriter out = null;
        try {
            //a serial number identifying this particular SQL conversion
            String sqlSerial = UUID.randomUUID().toString().replace("-","").toUpperCase();
            out = new StringWriter();
            Oracle2PgOutputVisitor outputVisitor = new Oracle2PgOutputVisitor(out);
            outputVisitor.setSqlSerial(sqlSerial);
            stmt.accept(outputVisitor);
            //if the conversion produced extra split statements, append them after the main one
            out.append(out.toString().endsWith(";") ? "" : ";");
            out.append((ExpiryMap.getInstance().get(sqlSerial) == null ? "" : ExpiryMap.getInstance().get(sqlSerial)).toString());
            sql = out.toString();
            ExpiryMap.getInstance().remove(sqlSerial);
        } catch (Exception e) {
            LOGGER.error("解析转换sql出现异常", e);
        } finally {
            if (out != null) {
                try {
                    out.close();
                } catch (IOException e) {
                    LOGGER.error("StringWriter关闭异常", e);
                }
            }
        }
        return sql;
    }

    //column type mapping
    public static void transDataType(SQLDataType sqlDataType){

        String dataType = sqlDataType.getName();
        Map oracle2PgMap = Oracle2PgColumnMapping.getOracle2PgMap();
        if (oracle2PgMap.get(dataType) != null) {
            if(dataType.contains("interval")){
                sqlDataType.setName(oracle2PgMap.get("interval").toString());
                if(sqlDataType.getArguments() != null){
                    sqlDataType.getArguments().clear();
                }
                return;
            }
            //replace only the type name, keep the length/precision arguments
            sqlDataType.setName(oracle2PgMap.get(dataType).toString());
        }
    }  
}
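
The Oracle2PgColumnMapping class referenced above is not included in this post. Purely as an illustration of what such a mapping might contain (the type pairs below are assumptions based on common Oracle-to-PostgreSQL conversions, not the project's actual mapping), a stand-in could be as simple as:

import java.util.HashMap;
import java.util.Map;

/*
    Hypothetical stand-in for com.ucar.datalink.writer.rdbms.handle.mapping.Oracle2PgColumnMapping.
    The entries are illustrative assumptions; the real mapping (and the casing of its keys, which must
    match what the Druid parser returns from SQLDataType.getName()) may differ.
 */
public class Oracle2PgColumnMapping {

    private static final Map<String, String> ORACLE_2_PG_MAP = new HashMap<>();

    static {
        ORACLE_2_PG_MAP.put("VARCHAR2", "varchar");
        ORACLE_2_PG_MAP.put("NVARCHAR2", "varchar");
        ORACLE_2_PG_MAP.put("NUMBER", "numeric");
        ORACLE_2_PG_MAP.put("DATE", "timestamp");
        ORACLE_2_PG_MAP.put("CLOB", "text");
        ORACLE_2_PG_MAP.put("BLOB", "bytea");
    }

    public static Map<String, String> getOracle2PgMap() {
        return ORACLE_2_PG_MAP;
    }
}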

package com.ucar.datalink.writer.rdbms.handle.visitor;

import com.alibaba.druid.sql.ast.SQLDataType;
import com.alibaba.druid.sql.ast.SQLExpr;
import com.alibaba.druid.sql.ast.SQLName;
import com.alibaba.druid.sql.ast.statement.*;
import com.alibaba.druid.sql.dialect.oracle.ast.stmt.OracleAlterTableModify;
import com.alibaba.druid.sql.dialect.oracle.ast.stmt.OraclePrimaryKey;
import com.alibaba.druid.sql.dialect.oracle.visitor.OracleOutputVisitor;
import com.ucar.datalink.writer.rdbms.handle.translator.Oracle2PgTranslator;
import com.ucar.datalink.writer.rdbms.utils.ExpiryMap;

import java.util.Iterator;
import java.util.List;
import java.util.Map;

/*
    Rewrites the SQL while printing it out.
 */
public class Oracle2PgOutputVisitor extends OracleOutputVisitor {

    public Oracle2PgOutputVisitor(Appendable appender) {
        super(appender);
    }

    private String sqlSerial;

    public void setSqlSerial(String sqlSerial) {
        this.sqlSerial = sqlSerial;
    }

    public String getSqlSerial() {
        return this.sqlSerial;
    }

    /*
        ADD column
     */
    @Override
    public boolean visit(SQLAlterTableAddColumn x) {
        List<SQLColumnDefinition> sqlColumnDefinitions = x.getColumns();
        for (int n = 0;n < sqlColumnDefinitions.size();n++) {
            this.print0("ADD COLUMN ");
            sqlColumnDefinitions.get(n).accept(this);
            this.print0(n < sqlColumnDefinitions.size()-1 ? ",\n\t" : "");
        }
        return false;
    }

    /*
        Add a primary key
     */
    @Override
    public boolean visit(OraclePrimaryKey x) {
        this.print0(this.ucase ? "PRIMARY KEY (" : "primary key (");
        this.printAndAccept(x.getColumns(), ", ");
        this.print(')');
        return false;
    }

    /*
        Drop a primary key constraint
     */
    @Override
    public boolean visit(SQLAlterTableDropConstraint x) {
        this.print0(this.ucase ? "DROP CONSTRAINT " : "drop constraint ");
        x.getConstraintName().accept(this);
        return false;
    }

    /*
        DROP column
     */
    @Override
    public boolean visit(SQLAlterTableDropColumnItem x) {
        List<SQLName> sqlColumnDefinitions = x.getColumns();
        for (int n = 0;n < sqlColumnDefinitions.size();n++) {
            this.print0("DROP COLUMN ");
            sqlColumnDefinitions.get(n).accept(this);
            this.print0(n < sqlColumnDefinitions.size()-1 ? "," : "");
        }
        return false;
    }

    /*
        MODIFY column
     */
    @Override
    public boolean visit(OracleAlterTableModify x) {
        ++this.indentCount;
        int i = 0;
        for(int size = x.getColumns().size(); i < size; ++i) {
            this.print0("MODIFY COLUMN ");
            SQLColumnDefinition column = (SQLColumnDefinition)x.getColumns().get(i);
            column.accept(this);
            if (i != size - 1) {
                this.print0(",\n\t");
            }
        }
        --this.indentCount;
        return false;
    }

    /*
        Oracle ADD / MODIFY / CHANGE / RENAME column definitions
     */
    @Override
    public boolean visit(SQLColumnDefinition x) {
        //apply the column type mapping
        if (appender.toString().contains("CHANGE COLUMN")) {
            Oracle2PgTranslator.transDataType(x.getDataType());
        } else if (appender.toString().contains("MODIFY COLUMN")) {
            Oracle2PgTranslator.transDataType(x.getDataType());
        } else if (appender.toString().contains("ADD COLUMN")) {
            Oracle2PgTranslator.transDataType(x.getDataType());
        }
        return superColumnDefinitionVisit(x);
    }

    private boolean superColumnDefinitionVisit(SQLColumnDefinition x) {
        Map sqlMap = ExpiryMap.getInstance();
        StringBuilder stringBuilder = new StringBuilder();
        String appenderSql = this.appender.toString();
        String alterSql = appenderSql.split("\n\t")[0];
        if (appender.toString().contains("MODIFY COLUMN")) {
            this.print(x.getNameAsString());
            this.print(" type ");
            this.print(x.getDataType().toString());
            //handle the DEFAULT clause
            if (x.getDefaultExpr() != null) {
                stringBuilder.append("\n\t");
                stringBuilder.append(alterSql);
                stringBuilder.append(" ALTER COLUMN ");
                stringBuilder.append(x.getNameAsString());
                stringBuilder.append(" SET DEFAULT ");
                stringBuilder.append(x.getDefaultExpr().toString());
                stringBuilder.append(";");
            }
            if (x.getComment() != null) {
                String schemaTable = alterSql.split("\\s+")[2];
                stringBuilder.append("\n\t");
                stringBuilder.append(" COMMENT ON COLUMN ");
                stringBuilder.append(schemaTable);
                stringBuilder.append(".");
                stringBuilder.append(x.getNameAsString());
                stringBuilder.append(" IS ");
                stringBuilder.append(x.getComment().toString());
                stringBuilder.append(";");
            }
        } else if (appender.toString().contains("ADD COLUMN")) {
            x.getName().accept(this);
            SQLDataType dataType = x.getDataType();
            if (dataType != null) {
                this.print(' ');
                dataType.accept(this);
            }
            Iterator var6 = x.getConstraints().iterator();
            while (var6.hasNext()) {
                SQLColumnConstraint item = (SQLColumnConstraint) var6.next();
                this.print(' ');
                item.accept(this);
            }
            SQLExpr defaultExpr = x.getDefaultExpr();
            if (defaultExpr != null) {
                this.print0(this.ucase ? " DEFAULT " : " default ");
                defaultExpr.accept(this);
            }
            if (x.getComment() != null) {
                String schemaTable = alterSql.split("\\s+")[2];
                stringBuilder.append("\n\t");
                stringBuilder.append(" COMMENT ON COLUMN ");
                stringBuilder.append(schemaTable);
                stringBuilder.append(".");
                stringBuilder.append(x.getNameAsString());
                stringBuilder.append(" IS ");
                stringBuilder.append(x.getComment().toString());
                stringBuilder.append(";");
            }
        }
        sqlMap.put(sqlSerial, (sqlMap.get(sqlSerial) == null ? "" : sqlMap.get(sqlSerial)) + stringBuilder.toString());
        return false;
    }
}
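
To show roughly how these writer-side pieces fit together, here is a hypothetical usage sketch (the demo class is mine, and the expected output in the comment is only approximate since it depends on the visitor and the type mapping): feeding the ALTER statement from the earlier test data through sqlEntry should yield a Greenplum/PostgreSQL-flavored statement.

import com.ucar.datalink.writer.rdbms.handle.translator.Oracle2PgTranslator;

/*
    Hypothetical usage sketch, not part of the project.
 */
public class Oracle2PgTranslatorDemo {

    public static void main(String[] args) {
        String oracleSql = "ALTER TABLE t_dl_test_source ADD (identity3 varchar(2) default '1')";
        String pgSql = Oracle2PgTranslator.sqlEntry(oracleSql);
        //expected to be roughly: ALTER TABLE t_dl_test_source ADD COLUMN identity3 varchar(2) DEFAULT '1';
        System.out.println(pgSql);
    }
}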

Common issues:
1. After deploying on Linux, startup fails with com.ucar.datalink.common.errors.DatalinkException: Worker is not found for client id [null] or ip [172.17.0.1]
Fix:
Add the following property to worker.properties:
client.id=1   (the worker id registered in the database)

2. When starting the project from IDEA, the configuration file paths in the startup classes that point to Linux locations must be changed to local Windows paths, otherwise the corresponding configuration files cannot be loaded at startup.

3. The reader and writer plugins also need a copy on the local path, otherwise they cannot be loaded and the process will not start.

4. If the target is Greenplum, the back-end logic of the mapping management needs to be changed: under a MySQL database the tables sit directly below the database (e.g. mysql.table), but GP has an extra schema level (e.g. crm.public.table).

5. The configuration page for Oracle tasks has to be added yourself; just model it on the other pages.
