A while back I needed to build a platform for real-time data synchronization, and came across DataLink, an open-source middleware project. After using it for a while I learned a lot, especially from the overall architecture of the platform. Since I haven't touched it for some time and don't want to forget, I'm writing down what I learned.
Project repository: https://github.com/ucarGroup/DataLink
The source code and the documentation are both there, so I won't go through them here.
If the source is MySQL, DataLink already supports it very well, along with a range of other data source types; it can replicate incremental data from all of them in real time.
Incremental reads from Oracle, however, are not supported yet, and our business needed them, so I had to find a way. In the source code, the reader plugins are responsible for reading incremental data from the source database, so the plan was to write an Oracle reader plugin; writing a log-reading tool from scratch would have been far too hard, so I surveyed a few replication tools:
1. Alibaba's yugong: supports real-time incremental replication from Oracle, built on Oracle materialized views. Every table needs a corresponding materialized view, which causes a noticeable performance hit, and DDL is not supported.
2. Kafka Connect: a framework that ships with Kafka and can read incremental data from many kinds of sources into Kafka. It is lightweight and simple to configure, and once the data is in Kafka the downstream processing is completely open; the drawback is that DDL is not supported.
3. Oracle's own tool, OGG (GoldenGate): quite powerful, and every sync records its log offset. On the other hand you have to understand every option in its configuration files; there used to be no GUI tool, there is one now, but I have not actually used it. DDL is supported.
After comparing them, DDL support was what I needed, so I chose OGG.
If you do not strictly need DDL, I would personally recommend Kafka Connect: write everything into Kafka and add a DataLink reader that consumes from Kafka.
I will not cover installing and operating OGG here; instead, here is how the DDL capture works:
OGG can capture table-structure changes, but only for DB-to-DB replication, for example MySQL as the source and Oracle or MySQL as the target. When the target is Kafka, the DDL statements cannot be delivered, so a detour is needed. Under the OGG user there is a history table, GGS_DDL_HIST, which gets a new row whenever the structure of a monitored table changes. So you may already be guessing the solution: just add this table to the extract process configuration? Unfortunately no, and this is the painful part. The following steps are needed:
(1) Write a stored procedure that copies the GGS_DDL_HIST rows into a custom table, which can be created under the OGG user (a minimal sketch of this copy logic follows below).
(2) Create a scheduled job that runs the stored procedure from step (1) every few seconds.
(3) Configure both the OGG extract process and the replicat process to capture incremental data from that custom table.
With this in place the DDL statements eventually land in Kafka, and DataLink simply reads them from there. This approach is not suitable for scenarios with very strict latency requirements, though: OGG replication alone, from write on the source to apply on the target, already takes at least about 5 seconds, and with the scheduled job's interval added, data generally arrives within 5 to 10 seconds.
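For reference, here is a minimal, purely illustrative JDBC sketch of the copy logic that the stored procedure in step (1) implements; in production this is an Oracle stored procedure driven by a scheduler job. The SEQNO watermark column and the exact column list are assumptions; the target table name ggs_ddl_hist_sync matches the constant used by the converter later in this post, and OBJECTOWNER/OBJECTNAME/METADATA_TEXT are the columns read there.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DdlHistSyncJob {
    // Copy rows from GGS_DDL_HIST that have not been copied yet into the custom table.
    // SEQNO as watermark and the column list are assumptions; adapt to your environment.
    public static void syncOnce(String url, String user, String password) throws Exception {
        try (Connection conn = DriverManager.getConnection(url, user, password);
             Statement stmt = conn.createStatement()) {
            conn.setAutoCommit(false);
            stmt.executeUpdate(
                "INSERT INTO ggs_ddl_hist_sync (seqno, objectowner, objectname, metadata_text) " +
                "SELECT h.seqno, h.objectowner, h.objectname, h.metadata_text " +
                "FROM ggs_ddl_hist h " +
                "WHERE h.seqno > (SELECT NVL(MAX(s.seqno), 0) FROM ggs_ddl_hist_sync s)");
            conn.commit();
        }
    }
}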
Next comes the code inside DataLink. The main task is a Kafka client that reads the data back out of Kafka; the important point is that the offset is committed only after a record has been consumed successfully. As a side note, Kafka Connect manages offsets by itself. Part of the code is shown below:
Consumer
package com.ucar.datalink.reader.oracle.comsumer;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import com.ucar.datalink.domain.media.MediaSourceInfo;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Arrays;
import java.util.Map;
import java.util.Properties;
public class KafkaRetryConsumer {
private KafkaConsumer<String, String> consumer;
private ConsumerRecords<String, String> msgList;
private final String topic;
private TopicPartition topicPartition;
public KafkaRetryConsumer(MediaSourceInfo mediaSourceInfo) {
JSONObject jsonObject = JSON.parseObject( mediaSourceInfo.getParameter());
String groupId = jsonObject.getString("groupId");
String topicName = jsonObject.getString("topic");
String servers = jsonObject.getString("bootstrapServers");
Properties props = new Properties();
props.put("bootstrap.servers", servers);
props.put("group.id", groupId);
props.put("enable.auto.commit", "false");
props.put("session.timeout.ms", "30000");
props.put("max.poll.interval.ms", "30000");
props.put("max.poll.records", 1);//一次获取最大条数
//要发送自定义对象,需要指定对象的反序列化类
props.put("key.deserializer", StringDeserializer.class.getName());
props.put("value.deserializer", StringDeserializer.class.getName());
this.consumer = new KafkaConsumer(props);
this.topic = topicName;
//简单起见,就获取第一个partition.kafka配置文件设置一个topic一个partition:num.partitions=1
this.topicPartition = new TopicPartition(topicName, 0);
this.consumer.assign(Arrays.asList(topicPartition));
}
public ConsumerRecords poll(){
msgList = consumer.poll(Duration.ofMillis(100));//一次获取最大时间
return msgList;
}
//set the offset from which to consume
public void seek(Long offset){
consumer.seek(topicPartition, offset);
}
//earliest available offset of the partition
public Long beginningOffsets(){
Map<TopicPartition, Long> map = consumer.beginningOffsets(Arrays.asList(topicPartition));
Long offset = map.get(topicPartition);
return offset == null ? 0L : offset;
}
//latest (end) offset of the partition
public Long endOffsets(){
Map<TopicPartition, Long> map = consumer.endOffsets(Arrays.asList(topicPartition));
Long offset = map.get(topicPartition);
return offset == null ? 0L : offset;
}
public void close(){
consumer.close();
}
public void commit(){
consumer.commitAsync();
}
}
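The constructor above takes its connection settings from MediaSourceInfo.getParameter(), so the media source configured in DataLink is expected to carry a JSON parameter roughly like the following (the field names are the ones read in the code above; the values are placeholders):
{
  "bootstrapServers": "host1:9092,host2:9092",
  "groupId": "datalink-oracle-reader",
  "topic": "oracle_ogg_topic"
}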
Reader
package com.ucar.datalink.reader.oracle;
import com.ucar.datalink.biz.service.MediaService;
import com.ucar.datalink.contract.log.rdbms.EventType;
import com.ucar.datalink.contract.log.rdbms.RdbEventRecord;
import com.ucar.datalink.domain.media.MediaSourceInfo;
import com.ucar.datalink.domain.media.MediaSourcesRel;
import com.ucar.datalink.domain.plugin.reader.oracle.OracleReaderParameter;
import com.ucar.datalink.reader.oracle.comsumer.KafkaRetryConsumer;
import com.ucar.datalink.reader.oracle.translator.KafkaTransToRecord;
import com.ucar.datalink.worker.api.model.TaskCacheModel;
import com.ucar.datalink.worker.api.task.RecordChunk;
import com.ucar.datalink.worker.api.task.TaskReader;
import com.ucar.datalink.worker.api.task.TaskReaderContext;
import com.ucar.datalink.worker.api.task.TaskCache;
import com.ucar.datalink.worker.api.util.statistic.BaseReaderStatistic;
import com.ucar.datalink.worker.api.util.statistic.ReaderStatistic;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
/**
* Created by swj on 2020/04/09
* 1. fetch() keeps calling Kafka's poll() in a loop until data is returned.
* 2. The data in Kafka is what OGG writes to it (Kafka is the OGG delivery target); it is converted into RdbEventRecord objects.
* 3. Once the data has been written to the target, the overridden commit() method is called automatically and the consumed Kafka offset is committed.
*
* Notes on the Kafka offset, see com.ucar.datalink.worker.api.model.OracleTaskModel:
* OFFSET stores the offset that the next poll will read from; after a successful commit, the cache and the database are updated with the next offset.
*/
public class OracleTaskReader extends TaskReader {
private static final Logger logger = LoggerFactory.getLogger(OracleTaskReader.class);
private KafkaRetryConsumer kafkaDataComsumer;
private MediaService mediaService;
/*
Flag indicating whether a commit is needed.
If the task exits normally (it was stopped on purpose), no commit is needed.
*/
private boolean commitFlag = true;
@Override
public void initialize(TaskReaderContext context) {
super.initialize(context);
this.mediaService = context.getService(MediaService.class);
}
@Override
public void start() {
if (isStart()) {
return;
}
startInternal();
super.start();
}
@Override
public void close() {
stopInternal();
}
@Override
public void prePoll() {
//print the statistics of the previous round
BaseReaderStatistic statistic = context.taskReaderSession().getData(ReaderStatistic.KEY);
if (statistic != null && parameter.isPerfStatistic()) {
logger.info(statistic.toJsonString());
}
//reset before this round starts
context.beginSession();
context.taskReaderSession().setData(ReaderStatistic.KEY, new ReaderStatistic(context.taskId()));
//check the offset information
preSetKafkaOffset();
}
/*
Check the offset before polling.
*/
private void preSetKafkaOffset(){
//check whether a Kafka offset exists in the cache
TaskCacheModel oracleTaskModel = (TaskCacheModel)TaskCache.getCacheByTaskId(context.taskId());
if (oracleTaskModel.getOffset() == null) {
//fall back to the offset stored in the database
MediaSourcesRel mediaSourcesRel = mediaService.findMediaSourcesByRelByTaskId(context.taskId());
if(mediaSourcesRel.getOffset() != null){
oracleTaskModel.setOffset(mediaSourcesRel.getOffset());
}
}
}
@Override
protected RecordChunk fetch() throws InterruptedException {
ConsumerRecords<String, String> consumerRecords = null;
boolean setOffsetFlag = true;
commitFlag = true;
while (isStart()) {
if(!TaskCache.getTaskStatus(context.taskId())){
logger.info("停止循环任务:"+context.taskId());
commitFlag = false;
break;
}
if(setOffsetFlag){//the offset only needs to be set once per fetch call
setKafkaOffset();//position the consumer
setOffsetFlag = false;
}
consumerRecords = kafkaDataComsumer.poll();
if (consumerRecords != null && consumerRecords.count() > 0) {
break;
} else {
Thread.sleep(100);
}
}
if (!isStart()) {
throw new InterruptedException();
}
RecordChunk result = null;
if (consumerRecords != null && consumerRecords.count() > 0) {
//convert the Kafka records into RdbEventRecord objects
result = KafkaTransToRecord.getRdbEventRecord(consumerRecords);
}else {
result = new RecordChunk();
result.setRecords(new ArrayList<>());
}
return result;
}
@Override
protected void dump(RecordChunk recordChunk) {
}
/*
Position the Kafka consumer at the offset to consume from.
*/
private void setKafkaOffset(){
TaskCacheModel taskCacheModel = (TaskCacheModel)TaskCache.getCacheByTaskId(context.taskId());
Long offset = taskCacheModel.getOffset();
if(offset != null){
//get the latest (end) offset
Long endOffset = kafkaDataComsumer.endOffsets();
Long beginningOffset = kafkaDataComsumer.beginningOffsets();
if(offset > endOffset){
logger.info("最大offset为:"+endOffset+"将消费位点设置为此");
offset = endOffset;
taskCacheModel.setOffset(offset);
}else if(offset < beginningOffset){
logger.info("最小offset为:"+beginningOffset+"将消费位点设置为此");
offset = beginningOffset;
taskCacheModel.setOffset(offset);
}
kafkaDataComsumer.seek(offset);
}else {
//no offset yet, start from the beginning
offset = kafkaDataComsumer.beginningOffsets();
logger.info("offset is null, the earliest offset is: "+offset+", setting the consume position to it");
kafkaDataComsumer.seek(offset);
taskCacheModel.setOffset(offset);
}
TaskCache.putCacheByTaskId(context.taskId(),taskCacheModel);
}
@Override
public void commit(RecordChunk recordChunk){
if(commitFlag){
updateOffsetInfo();//update the offset first
kafkaDataComsumer.commit();
}
}
/*
Update the offset in the cache and in the database.
*/
private void updateOffsetInfo(){
TaskCacheModel oracleTaskModel = (TaskCacheModel)TaskCache.getCacheByTaskId(context.taskId());
//update the database
mediaService.updateOffsetByTaskId(oracleTaskModel.getOffset()+1, context.taskId());
//store the next offset in the cache
oracleTaskModel.setOffset(oracleTaskModel.getOffset()+1);
TaskCache.putCacheByTaskId(context.taskId(),oracleTaskModel);
}
@Override
@SuppressWarnings({"unchecked"})
public void rollback(RecordChunk recordChunk, Throwable t) {
super.rollback(recordChunk, t);
}
private void startInternal() {
MediaService mediaService = context.getService(MediaService.class);
MediaSourceInfo mediaSourceInfo = mediaService.findMediaSourcesByRelTaskId(context.taskId());
if (mediaSourceInfo == null) {
logger.error("启动oracle-task,未查询到绑定的kafka介质,异常结束");
stopInternal();
return;
}
//the taskId comes from the context; the concrete configuration is then used to create the consumer instance
kafkaDataComsumer = new KafkaRetryConsumer(mediaSourceInfo);
}
private void stopInternal() {
if (kafkaDataComsumer != null) {
kafkaDataComsumer.close();
}
}
public static void main(String[] args) {
//quick local test of RdbEventRecord / RecordChunk
List<RdbEventRecord> list = new ArrayList<>();
RdbEventRecord rdbEventRecord = new RdbEventRecord();
rdbEventRecord.setTableName("t_dl_test_source");
rdbEventRecord.setSchemaName("datalink");
rdbEventRecord.setEventType(EventType.ALTER);
rdbEventRecord.setDdlSchemaName("datalink");
rdbEventRecord.setSql("ALTER TABLE t_dl_test_source ADD (identity3 varchar(2) default '1')");
list.add(rdbEventRecord);
RecordChunk rc = new RecordChunk(list, 123, 1111);
RecordChunk result = rc.copyWithoutRecords();
rc.getRecords().stream().forEach(r -> {
});
}
}
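The messages that OGG writes into Kafka are JSON change records. The converter in the next section only relies on the fields table, op_type, op_ts, primary_keys and the before/after column images. As an illustration (the values are made up, and it assumes OGG's JSON formatter with its default op_type codes I/U/D, which must match the OpType constants in the project), an update record looks roughly like this:
{
  "table": "CRM.T_DL_TEST_SOURCE",
  "op_type": "U",
  "op_ts": "2020-04-09 10:20:30.000000",
  "primary_keys": ["ID"],
  "before": {"ID": "1", "NAME": "old value"},
  "after": {"ID": "1", "NAME": "new value"}
}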
Data conversion
package com.ucar.datalink.reader.oracle.translator;
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.ucar.datalink.common.utils.DateUtil;
import com.ucar.datalink.contract.log.rdbms.EventColumn;
import com.ucar.datalink.contract.log.rdbms.EventType;
import com.ucar.datalink.contract.log.rdbms.RdbEventRecord;
import com.ucar.datalink.reader.oracle.constant.OpType;
import com.ucar.datalink.worker.api.task.RecordChunk;
import org.apache.commons.lang.StringUtils;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/*
Convert Kafka messages into RdbEventRecord objects.
*/
public class KafkaTransToRecord {
private static final Logger logger = LoggerFactory.getLogger(KafkaTransToRecord.class);
//this is the OGG DDL history (sync) table, which needs special handling
public static final String ORACLE_DDL_TABLE="ggs_ddl_hist_sync";
public static RecordChunk getRdbEventRecord(ConsumerRecords<String, String> consumerRecords) throws InterruptedException {
List<RdbEventRecord> rdbEventRecords = new ArrayList<>();
for (ConsumerRecord<String, String> record : consumerRecords) {
logger.info("record fetched from kafka: " + record);
JSONObject jsonObject = JSON.parseObject(record.value());
RdbEventRecord rdbEventRecord = getRdbEventRecord(jsonObject);
rdbEventRecords.add(rdbEventRecord);
}
RecordChunk rdbEventRecordRecordChunk = new RecordChunk(rdbEventRecords,System.currentTimeMillis(),getByteSize(rdbEventRecords));
return rdbEventRecordRecordChunk;
}
/*
Handle the record according to its operation type.
*/
private static RdbEventRecord getRdbEventRecord(JSONObject jsonObject) throws InterruptedException {
RdbEventRecord rdbEventRecord = null;
//handle DDL first
rdbEventRecord = transAlterData(jsonObject);
if(rdbEventRecord != null){
return rdbEventRecord;
}
String opType = jsonObject.getString("op_type");
switch (opType) {
case OpType.INSERT:
rdbEventRecord = transInsertData(jsonObject);
break;
case OpType.UPDATE:
rdbEventRecord = transUpdateData(jsonObject);
break;
case OpType.DELETE:
rdbEventRecord = transDeleteData(jsonObject);
break;
default:
throw new InterruptedException("unsupported operation type: " + opType);
}
return rdbEventRecord;
}
/*
Insert operation.
*/
private static RdbEventRecord transInsertData(JSONObject jsonObject) {
RdbEventRecord rdbEventRecord = new RdbEventRecord();
String schemaTable = jsonObject.getString("table");
String[] strings = schemaTable.split("\\.");
rdbEventRecord.setSchemaName(strings[0]);
rdbEventRecord.setTableName(strings[1]);
rdbEventRecord.setEventType(EventType.INSERT);
rdbEventRecord.setExecuteTime(DateUtil.strToTimeMills(jsonObject.getString("op_ts")));
//set the primary key columns
JSONObject after = jsonObject.getJSONObject("after");
JSONArray primaryKeys = jsonObject.getJSONArray("primary_keys");
List<EventColumn> keys = new ArrayList<>();
Map<String, Integer> columnsIndexMap = getColumnsIndex(jsonObject);
for (Object key : primaryKeys) {
EventColumn eventColumn = new EventColumn();
eventColumn.setNull(false);
eventColumn.setUpdate(true);
eventColumn.setKey(true);
eventColumn.setColumnName(key.toString());
eventColumn.setColumnValue(after.getString(key.toString()));
eventColumn.setIndex(columnsIndexMap.get(key));
keys.add(eventColumn);
}
rdbEventRecord.setKeys(keys);
//set the non-key columns
List<EventColumn> columns = new ArrayList<>();
for (String column : after.keySet()) {
//skip primary key columns
if (primaryKeys.contains(column)) {
continue;
}
EventColumn eventColumn = new EventColumn();
eventColumn.setKey(false);
eventColumn.setUpdate(true);
if (StringUtils.isEmpty(after.getString(column))) {
eventColumn.setNull(true);
} else {
eventColumn.setNull(false);
}
eventColumn.setIndex(columnsIndexMap.get(column));
eventColumn.setColumnName(column);
eventColumn.setColumnValue(after.getString(column));
columns.add(eventColumn);
}
rdbEventRecord.setColumns(columns);
return rdbEventRecord;
}
/*
Update operation.
*/
private static RdbEventRecord transUpdateData(JSONObject jsonObject) {
RdbEventRecord rdbEventRecord = new RdbEventRecord();
String schemaTable = jsonObject.getString("table");
String[] strings = schemaTable.split("\\.");
rdbEventRecord.setSchemaName(strings[0]);
rdbEventRecord.setTableName(strings[1]);
rdbEventRecord.setEventType(EventType.UPDATE);
rdbEventRecord.setExecuteTime(DateUtil.strToTimeMills(jsonObject.getString("op_ts")));
//set the primary key columns
JSONObject before = jsonObject.getJSONObject("before");
JSONObject after = jsonObject.getJSONObject("after");
JSONArray primaryKeys = jsonObject.getJSONArray("primary_keys");
List<EventColumn> oldKeys = new ArrayList<>();
List<EventColumn> keys = new ArrayList<>();
boolean updateKeyFlag = false;
for (Object key : primaryKeys) {
EventColumn beforeEventColumn = new EventColumn();
beforeEventColumn.setNull(false);
beforeEventColumn.setUpdate(true);
beforeEventColumn.setKey(true);
beforeEventColumn.setColumnName(key.toString());
beforeEventColumn.setColumnValue(before.getString(key.toString()));
oldKeys.add(beforeEventColumn);
//if any primary key value was changed, the old keys must be populated in full
String beforeKeyVal = before.getString(key.toString());
String afterKeyVal = after.getString(key.toString());
if(beforeKeyVal != null && afterKeyVal != null){
if(!beforeKeyVal.equals(afterKeyVal)){
updateKeyFlag = true;
}
}
EventColumn afterEventColumn = new EventColumn();
afterEventColumn.setNull(false);
afterEventColumn.setUpdate(true);
afterEventColumn.setKey(true);
afterEventColumn.setColumnName(key.toString());
afterEventColumn.setColumnValue(after.getString(key.toString()));
keys.add(afterEventColumn);
}
if(updateKeyFlag){
rdbEventRecord.setOldKeys(oldKeys);
}
rdbEventRecord.setKeys(keys);
//set the non-key columns
List<EventColumn> oldColumns = new ArrayList<>();
List<EventColumn> columns = new ArrayList<>();
for (String column : after.keySet()) {
//skip primary key columns
if (primaryKeys.contains(column)) {
continue;
}
EventColumn oldEventColumn = new EventColumn();
oldEventColumn.setKey(false);
oldEventColumn.setUpdate(true);
if (StringUtils.isEmpty(before.getString(column))) {//the null flag of the old column is based on the before value
oldEventColumn.setNull(true);
} else {
oldEventColumn.setNull(false);
}
oldEventColumn.setColumnName(column);
oldEventColumn.setColumnValue(before.getString(column));
oldColumns.add(oldEventColumn);
EventColumn eventColumn = new EventColumn();
eventColumn.setKey(false);
eventColumn.setUpdate(true);
if (StringUtils.isEmpty(after.getString(column))) {
eventColumn.setNull(true);
} else {
eventColumn.setNull(false);
}
eventColumn.setColumnName(column);
eventColumn.setColumnValue(after.getString(column));
columns.add(eventColumn);
}
rdbEventRecord.setOldColumns(oldColumns);
rdbEventRecord.setColumns(columns);
return rdbEventRecord;
}
/*
Delete operation.
*/
private static RdbEventRecord transDeleteData(JSONObject jsonObject) {
RdbEventRecord rdbEventRecord = new RdbEventRecord();
String schemaTable = jsonObject.getString("table");
String[] strings = schemaTable.split("\\.");
rdbEventRecord.setSchemaName(strings[0]);
rdbEventRecord.setTableName(strings[1]);
rdbEventRecord.setEventType(EventType.DELETE);
rdbEventRecord.setExecuteTime(DateUtil.strToTimeMills(jsonObject.getString("op_ts")));
//set the primary key columns
JSONObject before = jsonObject.getJSONObject("before");
JSONArray primaryKeys = jsonObject.getJSONArray("primary_keys");
List<EventColumn> keys = new ArrayList<>();
for (Object key : primaryKeys) {
EventColumn eventColumn = new EventColumn();
eventColumn.setNull(false);
eventColumn.setUpdate(true);
eventColumn.setKey(true);
eventColumn.setColumnName(key.toString());
eventColumn.setColumnValue(before.getString(key.toString()));
keys.add(eventColumn);
}
rdbEventRecord.setOldKeys(keys);
rdbEventRecord.setKeys(keys);
return rdbEventRecord;
}
/*
Handle DDL data.
1. DDL rows are copied from the GGS_DDL_HIST table into the custom table GGS_DDL_HIST_SYNC.
2. That custom table is inside the OGG replication scope.
*/
private static RdbEventRecord transAlterData(JSONObject jsonObject) {
RdbEventRecord rdbEventRecord = new RdbEventRecord();
String schemaTable = jsonObject.getString("table");
String[] strings = schemaTable.split("\\.");
String opType = jsonObject.getString("op_type");
//does this row belong to the DDL history table?
if(!strings[1].equalsIgnoreCase(ORACLE_DDL_TABLE)){
return null;
}
if(!OpType.INSERT.equals(opType)){
logger.debug("属于"+ORACLE_DDL_TABLE+"表,但不是插入操作,进行过滤");
rdbEventRecord.setSchemaName("");
rdbEventRecord.setTableName("");
return rdbEventRecord;
}
JSONObject after = jsonObject.getJSONObject("after");
String schema = after.getString("OBJECTOWNER");
String table = after.getString("OBJECTNAME");
String ddlText = after.getString("METADATA_TEXT").toUpperCase();
//for the DDL history table, only newly inserted rows are handled;
//filter out rows whose DDL text is not an ALTER TABLE or COMMENT ON statement
if(ddlText.indexOf("ALTER TABLE")<0 && ddlText.indexOf("COMMENT ON")<0){
logger.debug("row from "+ORACLE_DDL_TABLE+" without an ALTER/COMMENT statement, filtering it out");
rdbEventRecord.setSchemaName("");
rdbEventRecord.setTableName("");
return rdbEventRecord;
}
ddlText = getDdlbyText(ddlText);
rdbEventRecord.setSql(ddlText);
rdbEventRecord.setSchemaName(schema);
rdbEventRecord.setTableName(table);
rdbEventRecord.setEventType(EventType.ALTER);
return rdbEventRecord;
}
/*
Strip wrapper characters from the DDL text.
*/
private static String getDdlbyText(String ddlText){
//the METADATA_TEXT column wraps the statement; drop the leading and trailing wrapper characters
ddlText = ddlText.substring(5,ddlText.length()-2);
ddlText = ddlText.replaceAll("\\\\","");
return ddlText;
}
/*
Build a column name -> index map (used for the key columns).
*/
private static Map<String, Integer> getColumnsIndex(JSONObject jsonObject) {
Map<String, Integer> columnsIndexMap = new HashMap<>();
JSONObject after = jsonObject.getJSONObject("after");
int count = 0;
for (String column : after.keySet()) {
columnsIndexMap.put(column,count);
count++;
}
return columnsIndexMap;
}
public static long getByteSize(List<RdbEventRecord> datas) {
long byteSize = 0;
try {
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream os = new ObjectOutputStream(baos);
os.writeObject(datas);
os.close();
byteSize = baos.size();
baos.close();
} catch (Exception e) {
e.printStackTrace();
}
return byteSize;
}
}
Once the data can be read, some changes are also needed in the dl-worker-writer-rdbms module. In my case the data goes from Oracle into Greenplum, so code for writing to GP had to be added. Another headache is that the DDL statements we get from Oracle have to be rewritten for each different target data source. I used Alibaba's Druid here (its underlying SQL parser) to transform the SQL, including mapping the column types; for the type mapping you can refer to this series of articles: https://developer.aliyun.com/article/57142
Part of the code is shown below:
package com.ucar.datalink.writer.rdbms.handle.translator;
import com.alibaba.druid.sql.SQLUtils;
import com.alibaba.druid.sql.ast.SQLDataType;
import com.alibaba.druid.sql.ast.SQLStatement;
import com.alibaba.druid.util.JdbcConstants;
import com.ucar.datalink.writer.rdbms.handle.mapping.Oracle2PgColumnMapping;
import com.ucar.datalink.writer.rdbms.handle.visitor.Oracle2PgOutputVisitor;
import com.ucar.datalink.writer.rdbms.utils.ExpiryMap;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import java.util.UUID;
/*
Oracle -> PostgreSQL (Greenplum) SQL translator.
*/
public class Oracle2PgTranslator {
private static final Logger LOGGER = LoggerFactory.getLogger(Oracle2PgTranslator.class);
public static String sqlEntry(String sql) {
LOGGER.info("oracle-pg 原sql:" + sql);
sql = translator(sql);
if (sql.contains("MODIFY COLUMN")) {//修改字段
sql = sql.replaceAll("MODIFY COLUMN", "ALTER COLUMN");
}
LOGGER.info("转换后sql:" + sql);
return sql;
}
private static String translator(String sql) {
List<SQLStatement> stmtList = SQLUtils.parseStatements(sql, JdbcConstants.ORACLE);
SQLStatement stmt = stmtList.get(0);
StringWriter out = null;
try {
//a serial number scoped to this single sql translation
String sqlSerial = UUID.randomUUID().toString().replace("-","").toUpperCase();
out = new StringWriter();
Oracle2PgOutputVisitor outputVisitor = new Oracle2PgOutputVisitor(out);
outputVisitor.setSqlSerial(sqlSerial);
stmt.accept(outputVisitor);
//if the statement had to be split, append the extra statements collected during the visit
out.append(out.toString().endsWith(";") ? "" : ";");
out.append((ExpiryMap.getInstance().get(sqlSerial) == null ? "" : ExpiryMap.getInstance().get(sqlSerial)).toString());
sql = out.toString();
ExpiryMap.getInstance().remove(sqlSerial);
} catch (Exception e) {
LOGGER.error("解析转换sql出现异常", e);
} finally {
if (out != null) {
try {
out.close();
} catch (IOException e) {
LOGGER.error("StringWriter关闭异常", e);
}
}
}
return sql;
}
//column type mapping
public static void transDataType(SQLDataType sqlDataType){
String dataType = sqlDataType.getName();
Map oracle2PgMap = Oracle2PgColumnMapping.getOracle2PgMap();
if (oracle2PgMap.get(dataType) != null) {
if(dataType.contains("interval")){
sqlDataType.setName(oracle2PgMap.get("interval").toString());
if(sqlDataType.getArguments() != null){
sqlDataType.getArguments().clear();
}
return;
}
//replace only the type name, keep the length arguments
sqlDataType.setName(oracle2PgMap.get(dataType).toString());
}
}
}
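As a usage sketch (the input statement is just an example; in the real flow it comes from RdbEventRecord.getSql()), the entry point above is called like this:
// illustrative only
String oracleSql = "ALTER TABLE crm.t_dl_test_source ADD (identity3 VARCHAR2(2) DEFAULT '1')";
String pgSql = Oracle2PgTranslator.sqlEntry(oracleSql);
// pgSql now holds the rewritten statement, e.g. with ADD COLUMN / ALTER COLUMN syntax and mapped column types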
package com.ucar.datalink.writer.rdbms.handle.visitor;
import com.alibaba.druid.sql.ast.SQLDataType;
import com.alibaba.druid.sql.ast.SQLExpr;
import com.alibaba.druid.sql.ast.SQLName;
import com.alibaba.druid.sql.ast.statement.*;
import com.alibaba.druid.sql.dialect.oracle.ast.stmt.OracleAlterTableModify;
import com.alibaba.druid.sql.dialect.oracle.ast.stmt.OraclePrimaryKey;
import com.alibaba.druid.sql.dialect.oracle.visitor.OracleOutputVisitor;
import com.ucar.datalink.writer.rdbms.handle.translator.Oracle2PgTranslator;
import com.ucar.datalink.writer.rdbms.utils.ExpiryMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
/*
Rewrites the SQL while printing it out.
*/
public class Oracle2PgOutputVisitor extends OracleOutputVisitor {
public Oracle2PgOutputVisitor(Appendable appender) {
super(appender);
}
private String sqlSerial;
public void setSqlSerial(String sqlSerial) {
this.sqlSerial = sqlSerial;
}
public String getSqlSerial() {
return this.sqlSerial;
}
/*
ADD column
*/
@Override
public boolean visit(SQLAlterTableAddColumn x) {
List<SQLColumnDefinition> sqlColumnDefinitions = x.getColumns();
for (int n = 0;n < sqlColumnDefinitions.size();n++) {
this.print0("ADD COLUMN ");
sqlColumnDefinitions.get(n).accept(this);
this.print0(n < sqlColumnDefinitions.size()-1 ? ",\n\t" : "");
}
return false;
}
/*
Add primary key
*/
@Override
public boolean visit(OraclePrimaryKey x) {
this.print0(this.ucase ? "PRIMARY KEY (" : "primary key (");
this.printAndAccept(x.getColumns(), ", ");
this.print(')');
return false;
}
/*
Drop primary key (constraint)
*/
@Override
public boolean visit(SQLAlterTableDropConstraint x) {
this.print0(this.ucase ? "DROP CONSTRAINT " : "drop constraint ");
x.getConstraintName().accept(this);
return false;
}
/*
DROP column
*/
@Override
public boolean visit(SQLAlterTableDropColumnItem x) {
List<SQLName> sqlColumnDefinitions = x.getColumns();
for (int n = 0;n < sqlColumnDefinitions.size();n++) {
this.print0("DROP COLUMN ");
sqlColumnDefinitions.get(n).accept(this);
this.print0(n < sqlColumnDefinitions.size()-1 ? "," : "");
}
return false;
}
/*
MODIFY column
*/
@Override
public boolean visit(OracleAlterTableModify x) {
++this.indentCount;
int i = 0;
for(int size = x.getColumns().size(); i < size; ++i) {
this.print0("MODIFY COLUMN ");
SQLColumnDefinition column = (SQLColumnDefinition)x.getColumns().get(i);
column.accept(this);
if (i != size - 1) {
this.print0(",\n\t");
}
}
--this.indentCount;
return false;
}
/*
Column definition inside ADD / MODIFY / CHANGE(rename)
*/
@Override
public boolean visit(SQLColumnDefinition x) {
//map the column data type
if (appender.toString().contains("CHANGE COLUMN")) {
Oracle2PgTranslator.transDataType(x.getDataType());
} else if (appender.toString().contains("MODIFY COLUMN")) {
Oracle2PgTranslator.transDataType(x.getDataType());
} else if (appender.toString().contains("ADD COLUMN")) {
Oracle2PgTranslator.transDataType(x.getDataType());
}
return superColumnDefinitionVisit(x);
}
private boolean superColumnDefinitionVisit(SQLColumnDefinition x) {
Map sqlMap = ExpiryMap.getInstance();
StringBuilder stringBuilder = new StringBuilder();
String appenderSql = this.appender.toString();
String alterSql = appenderSql.split("\n\t")[0];
if (appender.toString().contains("MODIFY COLUMN")) {
this.print(x.getNameAsString());
this.print(" type ");
this.print(x.getDataType().toString());
//handle DEFAULT
if (x.getDefaultExpr() != null) {
stringBuilder.append("\n\t");
stringBuilder.append(alterSql);
stringBuilder.append(" ALTER COLUMN ");
stringBuilder.append(x.getNameAsString());
stringBuilder.append(" SET DEFAULT ");
stringBuilder.append(x.getDefaultExpr().toString());
stringBuilder.append(";");
}
if (x.getComment() != null) {
String schemaTable = alterSql.split("\\s+")[2];
stringBuilder.append("\n\t");
stringBuilder.append(" COMMENT ON COLUMN ");
stringBuilder.append(schemaTable);
stringBuilder.append(".");
stringBuilder.append(x.getNameAsString());
stringBuilder.append(" IS ");
stringBuilder.append(x.getComment().toString());
stringBuilder.append(";");
}
} else if (appender.toString().contains("ADD COLUMN")) {
x.getName().accept(this);
SQLDataType dataType = x.getDataType();
if (dataType != null) {
this.print(' ');
dataType.accept(this);
}
Iterator var6 = x.getConstraints().iterator();
while (var6.hasNext()) {
SQLColumnConstraint item = (SQLColumnConstraint) var6.next();
this.print(' ');
item.accept(this);
}
SQLExpr defaultExpr = x.getDefaultExpr();
if (defaultExpr != null) {
this.print0(this.ucase ? " DEFAULT " : " default ");
defaultExpr.accept(this);
}
if (x.getComment() != null) {
String schemaTable = alterSql.split("\\s+")[2];
stringBuilder.append("\n\t");
stringBuilder.append(" COMMENT ON COLUMN ");
stringBuilder.append(schemaTable);
stringBuilder.append(".");
stringBuilder.append(x.getNameAsString());
stringBuilder.append(" IS ");
stringBuilder.append(x.getComment().toString());
stringBuilder.append(";");
}
}
sqlMap.put(sqlSerial, (sqlMap.get(sqlSerial) == null ? "" : sqlMap.get(sqlSerial)) + stringBuilder.toString());
return false;
}
}
Common problems:
1. After deploying on Linux, startup fails with com.ucar.datalink.common.errors.DatalinkException: Worker is not found for client id [null] or ip [172.17.0.1]
Solution:
Add the following property to worker.properties:
client.id=1 (the worker id in the database)
2. When starting the project from IDEA, the Linux paths to the various config files in the startup class have to be changed to local Windows paths, otherwise the config files cannot be loaded.
3. The reader and writer plugins also need a copy on a local path, otherwise the plugins cannot be loaded at startup.
4. If the target is Greenplum, the mapping-management backend logic has to be changed: under MySQL the table sits directly below the database, e.g. mysql.table, while Greenplum has an extra schema level, e.g. crm.public.table.
5. The configuration page for Oracle tasks has to be added yourself; just follow the pattern of the existing pages.