Official docs: https://debezium.io/documentation/reference/1.2/development/engine.html
Debezium is a distributed platform built for CDC (change data capture). It tails the database's transaction log and turns row-level changes into an event stream: when another application runs an insert, update, or delete against the database, Debezium picks it up almost immediately. Debezium is built on Kafka and ships Kafka Connect-compatible connectors for monitoring specific databases, covering most of the databases in common use today.
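Each row-level change becomes one event. A simplified sketch of the envelope Debezium emits for an UPDATE (the field names follow Debezium's event format; the values here are made up):

```json
{
  "before": { "id": 1, "name": "old value" },
  "after":  { "id": 1, "name": "new value" },
  "source": { "connector": "postgresql", "db": "mydb", "table": "users" },
  "op": "u",
  "ts_ms": 1600128000000
}
```

For an insert, `op` is `"c"` and `before` is null; for a delete, `op` is `"d"` and `after` is null.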
Reference: https://baijiahao.baidu.com/s?id=1661152367659134691&wfr=spider&for=pc
Installation is simple: just drop the connector jar into the directory Kafka Connect expects (its plugin path).
Official docs: http://kafka.apache.org/documentation/#connect_errorreporting
GET /connectors – returns the names of all running connectors.
POST /connectors – creates a new connector; the request body must be JSON containing a name field and a config field; name is the connector's name and config is a JSON object holding the connector's configuration.
GET /connectors/{name} – gets information about the specified connector.
GET /connectors/{name}/config – gets the specified connector's configuration.
PUT /connectors/{name}/config – updates the specified connector's configuration.
GET /connectors/{name}/status – gets the specified connector's status, including whether it is running, paused, or failed; if it failed, the error details are listed as well.
GET /connectors/{name}/tasks – gets the tasks currently running for the specified connector.
GET /connectors/{name}/tasks/{taskid}/status – gets the status of the specified connector task.
PUT /connectors/{name}/pause – pauses the connector and its tasks, halting data processing until the connector is resumed.
PUT /connectors/{name}/resume – resumes a paused connector.
POST /connectors/{name}/restart – restarts a connector; mostly useful after a connector has failed.
POST /connectors/{name}/tasks/{taskId}/restart – restarts a task, typically because it has failed.
DELETE /connectors/{name} – deletes a connector, stopping all its tasks and removing its configuration.
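As a concrete illustration, a POST /connectors request body takes the shape below; the connector name and connection settings are placeholders, not values from this project:

```json
{
  "name": "my-pg-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "127.0.0.1",
    "database.port": "5432",
    "database.user": "postgres",
    "database.password": "******",
    "database.dbname": "mydb",
    "database.server.name": "mydb",
    "table.whitelist": "public.users"
  }
}
```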
Reference: "Kafka Connector usage notes and custom connector development"
import com.alibaba.fastjson.JSONObject;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.*;
import java.util.List;
/**
* Description: client for the Debezium (Kafka Connect) REST API
*
* @author Bob
* @date 2020/9/11
**/
@FeignClient(url = "${debezium.url}", name = "debeziumUrl")
public interface DebeziumService {
/**
* @description Create a connector.
* create a new connector; the request body should be a JSON object containing a string name field and an object config field with the connector configuration parameters
* @author Bob
* @date 2020/9/14
*/
@PostMapping("/connectors")
JSONObject createConnector(@RequestBody JSONObject jsonObject);
/**
* @description Pause a connector.
* pause the connector and its tasks, which stops message processing until the connector is resumed
* @author Bob
* @date 2020/9/15
*/
@PutMapping("/connectors/{name}/pause")
void pauseConnector(@PathVariable("name") String name);
/**
* @description Resume a connector.
* resume a paused connector (or do nothing if the connector is not paused)
* @author Bob
* @date 2020/9/15
*/
@PutMapping("/connectors/{name}/resume")
void resumeConnector(@PathVariable("name") String name);
/**
* @description Restart a connector.
* restart a connector (typically because it has failed)
* @author Bob
* @date 2020/9/15
*/
@PostMapping("/connectors/{name}/restart")
void restartConnector(@PathVariable("name") String name);
/**
* @description Delete a connector.
* delete a connector, halting all tasks and deleting its configuration
* @author Bob
* @date 2020/9/15
*/
@DeleteMapping("/connectors/{name}")
void deleteConnector(@PathVariable("name") String name);
/**
* @description List the connectors currently active on this worker.
* return a list of active connectors
* @author Bob
* @date 2020/9/11
*/
@GetMapping("/connectors")
List<String> getConnectorList();
/**
* @description List the tasks currently running for a connector.
* get a list of tasks currently running for a connector
* @author Bob
* @date 2020/9/11
*/
@GetMapping("/connectors/{name}/tasks")
List<JSONObject> getTaskList(@PathVariable("name") String name);
/**
* @description Get information about a specific connector.
* get information about a specific connector
* @author Bob
* @date 2020/9/11
*/
@GetMapping("/connectors/{name}")
JSONObject getConnector(@PathVariable("name") String name);
/**
* @description Get the connector's current status.
* get current status of the connector, including if it is running, failed, paused, etc., which worker it is assigned to, error information if it has failed, and the state of all its tasks
* @author Bob
* @date 2020/9/11
*/
@GetMapping("/connectors/{name}/status")
JSONObject getConnectorStatus(@PathVariable("name") String name);
/**
* @description Get a connector's configuration.
* get the configuration parameters for a specific connector
* @author Bob
* @date 2020/9/15
*/
@GetMapping("/connectors/{name}/config")
JSONObject getConfig(@PathVariable("name") String name);
/**
* @description Update a connector's configuration.
* update the configuration parameters for a specific connector
* @author Bob
* @date 2020/9/15
*/
@PutMapping("/connectors/{name}/config")
JSONObject updateConfig(@PathVariable("name") String name, @RequestBody JSONObject config);
}
Partial code for creating a connector:
/**
* @param dfpTask
* @description Create a connector.
* @author Bob
* @date 2020/9/11
*/
@Override
public void create(DfpTask dfpTask) {
try {
// delete any existing connector with the same name first
debeziumService.deleteConnector(Constants.CONNECTOR_NAME + dfpTask.getId());
}catch (Exception e){
log.error(e.getMessage());
}
DfpDataSource dataSource = dataSourceService.queryById(dfpTask.getSDatasourceId());
JSONObject dsConfig = JSONObject.parseObject(dataSource.getConfig());
// build the connector configuration JSON
JSONObject jsonObject = getJsonConnector(dfpTask.getId(), dsConfig);
// call the Debezium create-connector endpoint
JSONObject connector = debeziumService.createConnector(jsonObject);
log.info(connector.toString());
}
/**
* @description Assemble the connector configuration.
* @author Bob
* @date 2020/9/15
*/
public JSONObject getJsonConnector(Long id, JSONObject dsConfig) {
JSONObject jsonConfig = new JSONObject();
JSONObject jsonObject = new JSONObject();
jsonObject.put("name", Constants.CONNECTOR_NAME + id);
// TODO branch on the database type instead of hardcoding PostgreSQL
jsonConfig.put("connector.class", "io.debezium.connector.postgresql.PostgresConnector");
jsonConfig.put("key.converter.schemas.enable", "false");
jsonConfig.put("value.converter.schemas.enable", "false");
jsonConfig.put("database.hostname", dsConfig.getString("ip"));
jsonConfig.put("database.port", dsConfig.getString("port"));
jsonConfig.put("database.user", dsConfig.getString("username"));
jsonConfig.put("database.password", dsConfig.getString("password"));
jsonConfig.put("database.dbname", dsConfig.getString("database"));
jsonConfig.put("database.server.name", dsConfig.getString("database"));
String whitelist = dsConfig.getJSONArray("checkedTables").toJavaList(String.class).stream().filter(Objects::nonNull).collect(Collectors.joining(","));
jsonConfig.put("table.whitelist", whitelist);
jsonConfig.put("plugin.name", "wal2json_streaming");
if (Constants.DRIVER_POSTGRESQL.equals(dsConfig.getString("driver"))) {
jsonConfig.put("slot.name", dsConfig.getString("slot"));
}
jsonConfig.put("snapshot.mode", "exported");
jsonObject.put("config", jsonConfig);
log.info(jsonObject.toJSONString());
return jsonObject;
}
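The table.whitelist line above filters out null entries and joins the checked tables into the comma-separated string Debezium expects. A minimal standalone sketch of that stream pipeline (the table names are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class WhitelistDemo {
    // Mirrors the pipeline in getJsonConnector: drop nulls, then join the
    // remaining table names with commas for the table.whitelist property.
    static String toWhitelist(List<String> tables) {
        return tables.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<String> checked = Arrays.asList("public.orders", null, "public.users");
        System.out.println(toWhitelist(checked)); // prints public.orders,public.users
    }
}
```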
6.1 Delete the connector before creating it to prevent duplicates; deleting a connector does not affect the source data, nothing is lost (verified in practice).
6.2 When a PostgreSQL column is of type timestamp, the value captured into Kafka arrives as a bigint, which made the field come out null when consumed from Spark Streaming.
Reference: https://www.pianshen.com/article/38061912108/
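On 6.2: with Debezium's default time.precision.mode, a PostgreSQL timestamp is encoded as microseconds since the epoch in a 64-bit integer, which is why the field shows up in Kafka as a bigint. A minimal sketch of decoding such a value on the consumer side (the sample value is illustrative):

```java
import java.time.Instant;

public class MicroTimestampDemo {
    // Debezium encodes TIMESTAMP columns as microseconds since epoch in a
    // long; convert back to an Instant before use downstream.
    static Instant fromMicros(long micros) {
        long seconds = Math.floorDiv(micros, 1_000_000L);
        long nanos = Math.floorMod(micros, 1_000_000L) * 1_000L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    public static void main(String[] args) {
        // 2020-09-15T00:00:00Z expressed in microseconds
        long micros = 1_600_128_000_000_000L;
        System.out.println(fromMicros(micros)); // prints 2020-09-15T00:00:00Z
    }
}
```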