

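Another option is canal, an open-source tool from Alibaba. It works much like go-mysql-elasticsearch, described below: both achieve synchronization by monitoring the MySQL binlog. I have not used it myself, so I won't say more about it; if you are interested, look it up.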


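The project is built on the Laravel framework, so I implemented the data synchronization with a Laravel Redis queue plus the ES API.

How it works: after the code inserts new data into MySQL, it dispatches an asynchronous job on the Laravel Redis queue, and that job calls the ES API to write the data into ES.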
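
The flow described above can be sketched without Laravel or Redis. The following minimal Python sketch swaps the Redis queue for an in-process `queue.Queue` and the ES API for a plain dictionary; `save_product`, `sync_worker`, `fake_db`, and `fake_es` are hypothetical names used only for illustration, not part of the project.

```python
import queue
import threading

jobs = queue.Queue()   # stands in for the Redis queue
fake_db = {}           # stands in for the MySQL table
fake_es = {}           # stands in for the ES index, keyed by document id


def save_product(product):
    """Write to the primary store, then enqueue a sync job instead of calling ES inline."""
    fake_db[product["id"]] = product
    jobs.put(product["id"])


def sync_worker():
    """Consume jobs and copy the current row into the search index."""
    while True:
        product_id = jobs.get()
        if product_id is None:      # sentinel value: stop the worker
            jobs.task_done()
            break
        fake_es[product_id] = dict(fake_db[product_id])
        jobs.task_done()


def run_demo():
    worker = threading.Thread(target=sync_worker)
    worker.start()
    save_product({"id": 1, "name": "demo product"})
    jobs.put(None)
    jobs.join()
    worker.join()
    return fake_es


if __name__ == "__main__":
    print(run_demo())
```

The point of the design is that the request that saves the product returns as soon as the job is queued; a slow or failing search cluster only delays the worker, not the write to the database.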



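  1. Create the index in ES first (this is a shopping-mall project, so adding a product is used as the example). The property names name, long_name, and brand_id below are assumed from the product columns used later in this article; adjust them to your own table.

    PUT /products/
    {
      "mappings": {
        "properties": {
          "name":        { "type": "text", "analyzer": "ik_smart" },
          "long_name":   { "type": "text", "analyzer": "ik_smart" },
          "brand_id":    { "type": "integer" },
          "create_time": { "type": "date" },
          "last_time":   { "type": "date" }
        }
      }
    }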
  2. Switch the Laravel queue driver to Redis

    # Edit the .env file
    QUEUE_CONNECTION=redis
    # Further defaults can be changed in config/queue.php
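  3. Configure the product model (App\Models\Product.php); it needs Illuminate\Support\Arr imported

    /**
     * Extract the data to sync to ES.
     * @return array
     */
    public function toESArray()
    {
        // Assumed key list: the product columns used by the sync configs later in this article
        return Arr::only($this->toArray(), ['id', 'name', 'long_name', 'brand_id', 'shop_id',
            'price', 'sold_count', 'review_count', 'status', 'create_time', 'last_time']);
    }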
  4. Create the job class

    php artisan make:job SyncProductToES
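  5. Write the job code

    // Inside the generated App\Jobs\SyncProductToES class; import App\Models\Product at the top
    protected $product;
    public function __construct(Product $product) { $this->product = $product; }
    /**
     * Execute the job.
     * @return void
     */
    public function handle()
    {
        $data = $this->product->toESArray();
        // Assumption: app('es') resolves to an Elasticsearch client registered in the container
        app('es')->index([
            'index' => 'products',
            'type'  => '_doc',
            'id'    => $data['id'],
            'body'  => $data,
        ]);
    }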
  6. Dispatch the job wherever data needs to be synchronized

    $form->saved(function (Form $form) {
        $product = $form->model();
        dispatch(new SyncProductToES($product));
    });
  7. Start the queue worker

    php artisan queue:work
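  8. Import the existing MySQL data into ES

    The steps above provide incremental synchronization: every newly added record is written to ES. For a full synchronization of the older data, I create an Artisan command.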


    php artisan make:command Elasticsearch/SyncProducts


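    // Inside the generated command class, with: protected $signature = 'es:sync-products';
    public function handle()
    {
        // Assumption: app('es') resolves to an Elasticsearch client registered in the container
        $es = app('es');
        Product::query()->chunkById(100, function ($products) use ($es) {
            $this->info(sprintf('Syncing products with IDs %s to %s', $products->first()->id, $products->last()->id));
            // Initialize the request body
            $req = ['body' => []];
            // Iterate over the products
            foreach ($products as $product) {
                // Convert the product model into the array used by ES
                $data = $product->toESArray();
                $req['body'][] = [
                    'index' => [
                        '_index' => 'products',
                        '_type'  => '_doc',
                        '_id'    => $data['id'],
                    ],
                ];
                $req['body'][] = $data;
            }
            try {
                // Create the documents in one batch with the bulk method
                $es->bulk($req);
            } catch (\Exception $e) {
                $this->error($e->getMessage());
            }
        });
        $this->info('Sync complete');
    }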


     php artisan es:sync-products
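  9. Production deployment

    In production you generally install the Horizon queue manager and the Supervisor process monitor to manage queues better and improve stability. For installing and configuring both tools, see the official Laravel documentation, which covers them in detail: https://learnku.com/docs/laravel/7.x/horizon/7514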
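
The full import in step 8 combines two ideas: reading the table in id-ordered chunks and sending each chunk as one bulk request. A minimal Python sketch of both follows, assuming an in-memory SQLite table in place of MySQL and building only the newline-delimited bulk body rather than sending it; the helper names `chunk_by_id`, `bulk_body`, and `demo` are hypothetical.

```python
import json
import sqlite3


def chunk_by_id(conn, size):
    """Yield rows in id order, `size` at a time, using `id > last_id` instead of OFFSET."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM products WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, size),
        ).fetchall()
        if not rows:
            return
        yield rows
        last_id = rows[-1][0]


def bulk_body(rows):
    """Build a bulk request body: one action line followed by one document line per row."""
    lines = []
    for row_id, name in rows:
        lines.append(json.dumps({"index": {"_index": "products", "_id": row_id}}))
        lines.append(json.dumps({"id": row_id, "name": name}))
    return "\n".join(lines) + "\n"   # the bulk body must end with a newline


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO products (id, name) VALUES (?, ?)",
        [(i, "product-%d" % i) for i in range(1, 6)],
    )
    return [bulk_body(rows) for rows in chunk_by_id(conn, 2)]


if __name__ == "__main__":
    for body in demo():
        print(body)
```

Paging by the last seen id keeps every query cheap on large tables, and because each document carries its own id, re-running the import overwrites documents instead of duplicating them.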

Option 2: using the go-mysql-elasticsearch tool


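How it works: it uses mysqldump to take a snapshot of the current MySQL data, then reads incremental changes starting from the binlog name and position recorded at that point, and turns the binlog events into RESTful API calls that write the data into ES.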
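
To make the last step more concrete, here is a minimal Python sketch of how row-change events could be turned into Elasticsearch bulk actions. The event shape (`action` plus a `rows` list, with updates carrying a before image and an after image) and the function name `event_to_actions` are assumptions made for illustration; this is not the tool's actual code.

```python
def event_to_actions(event, index="products", pk="id"):
    """Translate one row-change event into bulk lines: an action, then an optional document."""
    action = event["action"]
    if action == "insert":
        row = event["rows"][0]
        return [{"index": {"_index": index, "_id": row[pk]}}, row]
    if action == "update":
        # Assumed shape: rows = [before_image, after_image]
        before, after = event["rows"]
        return [{"update": {"_index": index, "_id": before[pk]}}, {"doc": after}]
    if action == "delete":
        row = event["rows"][0]
        # A delete action has no document line
        return [{"delete": {"_index": index, "_id": row[pk]}}]
    raise ValueError("unsupported action: %r" % action)


if __name__ == "__main__":
    print(event_to_actions({"action": "insert", "rows": [{"id": 1, "name": "a"}]}))
```

Every action addresses the document through the row's primary key, which is why the tool's requirements below insist on tables having one.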



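  1. The GitHub documentation states the version requirements as MySQL < 8.0 and ES < 6.0

    In my tests, however, MySQL 8.0.26 with ES 7.12.1 also handled incremental synchronization. The only limitation is that mysqldump cannot be used to synchronize old data: MySQL 8.0 changed considerably from earlier versions, and the current go-mysql-elasticsearch release does not yet support mysqldump from MySQL 8.0.

  2. The MySQL binlog format must be ROW


  3. Every MySQL table to be synchronized must have a primary key; tables without one are ignored. Without a primary key, UPDATE and DELETE operations cannot be synchronized because the corresponding document cannot be found in ES.

  4. The MySQL table structure must not be changed while go-mysql-elasticsearch is running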
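
The primary-key requirement is easier to see in a small sketch. The Python below assumes the document id is built from the primary-key values joined with `:` (the convention the sample river.toml describes for composite ids) and uses a dictionary in place of an ES index; `document_id` and `apply_change` are hypothetical helper names.

```python
def document_id(row, pk_columns):
    """Join the primary-key values with ':' to form a stable document id."""
    if not pk_columns:
        raise ValueError("a table without a primary key has no stable document id")
    return ":".join(str(row[column]) for column in pk_columns)


def apply_change(index, action, row, pk_columns):
    """Apply one row change to `index`, a dict standing in for an ES index."""
    doc_id = document_id(row, pk_columns)
    if action == "delete":
        index.pop(doc_id, None)
    else:
        # Insert and update both write the document stored under the same id
        index[doc_id] = dict(row)
    return doc_id


if __name__ == "__main__":
    index = {}
    apply_change(index, "insert", {"id": 7, "name": "old"}, ["id"])
    apply_change(index, "update", {"id": 7, "name": "new"}, ["id"])
    print(index)
```

Without primary-key columns there is nothing to derive the id from, so an UPDATE or DELETE has no way to locate the document written by the earlier INSERT.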

  1. Install Go


    [root@VM-0-8-centos]# wget https://golang.google.cn/dl/go1.15.5.linux-amd64.tar.gz
    [root@VM-0-8-centos]# tar -C /usr/local -zxvf go1.15.5.linux-amd64.tar.gz


    yum install -y go

    Configure the environment variables (GOPATH is the directory where Go project code is stored)

    [root@VM-0-8-centos go]# vim /etc/profile
    export GOROOT=/usr/local/go
    export GOPATH=/usr/local/app/go
    export PATH=$PATH:/usr/local/go/bin
    [root@VM-0-8-centos go]# source /etc/profile


    [root@VM-0-8-centos]# go version
    go version go1.15.5 linux/amd64
  2. Install go-mysql-elasticsearch


    yum install -y gettext-devel openssl-devel perl-CPAN perl-devel zlib-devel

    Fetch and build go-mysql-elasticsearch


    go get github.com/siddontang/go-mysql-elasticsearch

    Once downloaded, the source is placed under the project path configured in the environment variables above; change into it and run make

    [root@VM-0-8-centos ~]# cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch
    [root@VM-0-8-centos go-mysql-elasticsearch]# ls
    clear_vendor.sh  cmd  Dockerfile  elastic  etc  go.mod  go.sum  LICENSE  Makefile  README.md  river
    [root@VM-0-8-centos go-mysql-elasticsearch]# make

    After the build finishes, edit the configuration file, which is in the etc directory of the downloaded package


    [root@VM-0-8-centos go-mysql-elasticsearch]# vim etc/river.toml
    # MySQL address, user and password
    # user must have replication privilege in MySQL.
    my_addr = ""  # MySQL address and port
    my_user = "root"         # MySQL username
    my_pass = ""             # MySQL password
    my_charset = "utf8"          # MySQL character set
    # Set true when elasticsearch use https
    #es_https = false
    # Elasticsearch address
    es_addr = ""  # ES address and port
    # Elasticsearch user and password, maybe set by shield, nginx, or x-pack
    es_user = ""                # ES username; leave empty if there is none
    es_pass = ""             # ES password; leave empty if there is none
    # Path to store data, like master.info, if not set or empty,
    # we must use this to support breakpoint resume syncing.
    # TODO: support other storage, like etcd.
    data_dir = "./var"           # data storage directory
    # Inner Http status address
    stat_addr = ""
    stat_path = "/metrics"
    # pseudo server id like a slave
    server_id = 1001
    # mysql or mariadb
    flavor = "mysql"
    # mysqldump execution path
    # if not set or empty, ignore mysqldump.
    mysqldump = "mysqldump"      # if set to empty, the existing old data in MySQL is not synchronized
    # if we have no privilege to use mysqldump with --master-data,
    # we must skip it.
    #skip_master_data = false
    # minimal items to be inserted in one bulk
    bulk_size = 128
    # force flush the pending requests if we don't have enough items >= bulk_size
    flush_bulk_time = "200ms"
    # Ignore table without primary key
    skip_no_pk_table = false
    # MySQL data source
    [[source]]
    schema = "test"      # MySQL database to sync
    # Only below tables will be synced into Elasticsearch.
    # "t_[0-9]{4}" is a wildcard table format, you can use it if you have many sub tables, like table_0000 - table_1023
    # I don't think it is necessary to sync all tables in a database.
    tables = ["t", "t_[0-9]{4}", "tfield", "tfilter"] # MySQL tables to sync
    # Below is for special rule mapping
    # Very simple example
    # desc t;
    # +-------+--------------+------+-----+---------+-------+
    # | Field | Type         | Null | Key | Default | Extra |
    # +-------+--------------+------+-----+---------+-------+
    # | id    | int(11)      | NO   | PRI | NULL    |       |
    # | name  | varchar(256) | YES  |     | NULL    |       |
    # +-------+--------------+------+-----+---------+-------+
    # The table `t` will be synced to ES index `test` and type `t`.
    # Each rule defines how a MySQL table maps to ES; write one rule per mapping and delete the extra ones below
    [[rule]]
    schema = "test"      # MySQL database to sync
    table = "t"          # MySQL table to sync
    index = "test"       # target ES index
    type = "t"           # target ES type; ES 7 and later has a single type, so it can only be _doc
    # Wildcard table rule, the wildcard table must be in source tables
    # All tables which match the wildcard format will be synced to ES index `test` and type `t`.
    # In this example, all tables must have same schema with above table `t`;
    [[rule]]
    schema = "test"
    table = "t_[0-9]{4}"
    index = "test"
    type = "t"
    # Simple field rule
    # desc tfield;
    # +----------+--------------+------+-----+---------+-------+
    # | Field    | Type         | Null | Key | Default | Extra |
    # +----------+--------------+------+-----+---------+-------+
    # | id       | int(11)      | NO   | PRI | NULL    |       |
    # | tags     | varchar(256) | YES  |     | NULL    |       |
    # | keywords | varchar(256) | YES  |     | NULL    |       |
    # +----------+--------------+------+-----+---------+-------+
    [[rule]]
    schema = "test"
    table = "tfield"
    index = "test"
    type = "tfield"
    # This section maps MySQL columns to ES fields; delete it if the names are all identical
    [rule.field]
    # Map column `id` to ES field `es_id`
    id="es_id"       # the MySQL column id maps to the ES field es_id; the lines below follow the same pattern
    # Map column `tags` to ES field `es_tags` with array type
    tags="es_tags,list"
    # Map column `keywords` to ES with array type
    keywords=",list"
    # Filter rule
    # desc tfilter;
    # +-------+--------------+------+-----+---------+-------+
    # | Field | Type         | Null | Key | Default | Extra |
    # +-------+--------------+------+-----+---------+-------+
    # | id    | int(11)      | NO   | PRI | NULL    |       |
    # | c1    | int(11)      | YES  |     | 0       |       |
    # | c2    | int(11)      | YES  |     | 0       |       |
    # | name  | varchar(256) | YES  |     | NULL    |       |
    # +-------+--------------+------+-----+---------+-------+
    [[rule]]
    schema = "test"
    table = "tfilter"
    index = "test"
    type = "tfilter"
    # Only sync following columns
    filter = ["id", "name"]      # MySQL columns to sync
    # id rule
    # desc tid_[0-9]{4};
    # +----------+--------------+------+-----+---------+-------+
    # | Field    | Type         | Null | Key | Default | Extra |
    # +----------+--------------+------+-----+---------+-------+
    # | id       | int(11)      | NO   | PRI | NULL    |       |
    # | tag      | varchar(256) | YES  |     | NULL    |       |
    # | desc     | varchar(256) | YES  |     | NULL    |       |
    # +----------+--------------+------+-----+---------+-------+
    [[rule]]
    schema = "test"
    table = "tid_[0-9]{4}"
    index = "test"
    type = "t"
    # The es doc's id will be `id`:`tag`
    # It is useful for merging multiple tables into one type while these tables have the same PK
    id = ["id", "tag"]


    my_addr = ""  
    my_user = "root"
    my_pass = "root"
    my_charset = "utf8"
    es_addr = ""
    es_user = ""
    es_pass = ""
    data_dir = "/docker/data"
    stat_addr = ""
    stat_path = "/metrics"
    server_id = 1001
    flavor = "mysql"
    mysqldump = ""
    bulk_size = 128
    flush_bulk_time = "200ms"
    skip_no_pk_table = false
    [[source]]
    schema = "lmrs"
    tables = ["lmrs_products"]
    [[rule]]
    schema = "lmrs"
    table = "lmrs_products"
    index = "products"
    type = "_doc"
    filter = ["id","name","long_name","brand_id","shop_id","price","sold_count","review_count","status","create_time","last_time","three_category_id"]
    mysql = "three_category_id"
    elastic = "category_id"

    Start go-mysql-elasticsearch; output like the following confirms that it started successfully

    [root@VM-0-8-centos go-mysql-elasticsearch]# ./bin/go-mysql-elasticsearch -config=./etc/river.toml
    [2021/08/01 13:37:06] [info] binlogsyncer.go:141 create BinlogSyncer with config {1001 mysql 3306 root   utf8mb4 false false  false UTC false 0 0s 0s 0 false 0}
    [2021/08/01 13:37:06] [info] dump.go:180 skip dump, use last binlog replication pos (mysql-bin.000001, 2606) or GTID set 
    [2021/08/01 13:37:06] [info] binlogsyncer.go:362 begin to sync binlog from position (mysql-bin.000001, 2606)
    [2021/08/01 13:37:06] [info] binlogsyncer.go:211 register slave for master server
    [2021/08/01 13:37:06] [info] sync.go:25 start sync binlog at binlog file (mysql-bin.000001, 2606)
    [2021/08/01 13:37:06] [info] binlogsyncer.go:731 rotate to (mysql-bin.000001, 2606)
    [2021/08/01 13:37:06] [info] sync.go:71 rotate binlog to (mysql-bin.000001, 2606)
    [2021/08/01 13:37:06] [info] master.go:54 save position (mysql-bin.000001, 2606)
  3. If the two steps above seem like too much trouble, install go-mysql-elasticsearch directly with Docker; the image ships with the Go environment


    docker pull gozer/go-mysql-elasticsearch

    Create the container; the river.toml configuration file has the same content as above

    docker run -p 12345:12345 -d --name go-mysql-es -v /docker/go-mysql-es/river.toml:/config/river.toml --privileged=true gozer/go-mysql-elasticsearch

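Option 3: using the Logstash tool

Logstash is a free and open server-side data processing pipeline that ingests data from multiple sources, transforms it, and sends it to your favorite "stash". It integrates with a variety of deployments and offers a large number of plugins that help you parse, enrich, transform, and buffer data from many sources. If your data needs processing that Beats does not provide, add Logstash to your deployment.

Besides synchronizing MySQL data to ES, this tool is also used for log search (Logstash collects, processes, and forwards logs to Elasticsearch for storage, and Kibana displays them) and for ELK log analysis (Elasticsearch + Logstash + Kibana).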


  1. Install


    PS: the Logstash version must match the ES version. My ES is 7.12.1, so I downloaded Logstash 7.12.1 as well

    wget https://artifacts.elastic.co/downloads/logstash/logstash-7.12.1-linux-x86_64.tar.gz


    docker pull logstash:7.12.1
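  2. Install two plugins (in 7.12.1 logstash-input-jdbc is already provided by logstash-integration-jdbc, so its install is aborted with the error shown below)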



    [root@localhost]# tar -C /usr/local -zxvf logstash-7.12.1-linux-x86_64.tar.gz
    [root@localhost]# cd /usr/local/logstash-7.12.1/bin
    [root@localhost bin]# ./logstash-plugin install logstash-input-jdbc
    ERROR: Installation aborted, plugin 'logstash-input-jdbc' is already provided by 'logstash-integration-jdbc'
    [root@localhost bin]# ./logstash-plugin install logstash-output-elasticsearch
    Installation successful
    [root@localhost bin]#
  3. Download the MySQL JDBC driver jar (mysql-connector-java), using the version that matches your MySQL version

    [root@localhost logstash-7.12.1]# mkdir pipeline
    [root@localhost logstash-7.12.1]# cd pipeline/
    [root@localhost pipeline]# wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.26/mysql-connector-java-8.0.26.jar
  4. Edit the configuration files

    [root@localhost logstash-7.12.1]# vi config/logstash.yml
    # Add the following; the second entry is the ES address, so change it to match your setup
    http.host: ""
    xpack.monitoring.elasticsearch.hosts: [""]
    [root@localhost logstash-7.12.1]# vi config/pipelines.yml
    # Add the following; again, use your own actual path
    - pipeline.id: table1
      path.config: "/usr/local/logstash-7.12.1/pipeline/logstash.config"
  5. Create the logstash.config file referenced in the configuration above

    vi pipeline/logstash.config

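    input {
        stdin {}
        # Multiple jdbc blocks can be defined to sync different tables
        jdbc {
            # Type tag that distinguishes each jdbc block so the output can branch on it
            type => "product"
            # The MySQL connection address must use an IP, not localhost or similar
            jdbc_connection_string => "jdbc:mysql://"
            jdbc_user => "root"
            jdbc_password => "root"
            # Number of database reconnection attempts
            connection_retry_attempts => "3"
            # Database connection validation timeout, 3600s by default
            jdbc_validation_timeout => "3600"
            # The jar downloaded above; an absolute or a relative path works, as long as it is correct
            jdbc_driver_library => "/usr/local/logstash-7.12.1/pipeline/mysql-connector-java-8.0.26.jar"
            # Driver class name (Connector/J 8.x uses com.mysql.cj.jdbc.Driver; com.mysql.jdbc.Driver is the deprecated legacy name)
            jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
            # Enable paging, false by default
            jdbc_paging_enabled => "true"
            # Rows per page (100000 by default; lower it if the table has many columns)
            jdbc_page_size => "50000"
            # SQL to execute; the rows it returns are synced to ES
            statement => "select id,`name`,long_name,brand_id,three_category_id as category_id,shop_id,price,status,sold_count,review_count,create_time,last_time from lmrs_products"
            # Path of a SQL file to execute; use either this or the statement parameter above
            # statement_filepath => "/usr/local/logstash-7.12.1/pipeline/products.sql"
            # Whether to lowercase column names, true by default (false is recommended if you serialize or deserialize the data)
            lowercase_column_names => false
            # Set to true to track the value of a column from the query results; otherwise sql_last_value holds the timestamp of the last run
            use_column_value => true
            # Column to track for incremental sync; it must be a database column.
            # To fetch only rows added since the last run, add "where id > :sql_last_value" to the statement
            tracking_column => id
            # Data type of the tracked column
            tracking_column_type => numeric
            # Whether to persist the last tracked value
            record_last_run => true
            # File that stores the last sql_last_value; it must contain the initial value of the tracked column, so create it manually and grant read/write permission
            last_run_metadata_path => "/usr/local/logstash-7.12.1/pipeline/products.txt"
            # Whether to clear the record in last_run_metadata_path; must be false for incremental sync
            clean_run => false
            # Schedule in cron syntax (minute, hour, day of month, month, day of week); all * runs the job every minute
            schedule => "* * * * *"
        }
    }
    output {
        # Branch on the type
        if [type] == "product" {
            # ES settings
            elasticsearch {
                hosts => ""
                index => "products"
                document_type => "_doc"
                document_id => "%{id}"
            }
        }
        # Log output
        stdout {
            codec => json_lines
        }
    }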
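  6. Start Logstash (the --config.reload.automatic option enables automatic config reloading, so Logstash does not have to be stopped and restarted after every change to the configuration file)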

    [root@localhost logstash-7.12.1]# ./bin/logstash -f pipeline/logstash.config --config.reload.automatic

    If opening ip:9600 in a browser returns Logstash's node information, the startup succeeded
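
The incremental behaviour that `tracking_column` and `sql_last_value` provide in step 5 can be sketched in a few lines of Python. This sketch assumes an in-memory SQLite table in place of MySQL and keeps the last seen value in a variable instead of the `last_run_metadata_path` file; `poll_once` and `demo` are hypothetical names.

```python
import sqlite3


def poll_once(conn, last_value):
    """Fetch rows whose id is greater than the saved value and return the new value to save."""
    rows = conn.execute(
        "SELECT id, name FROM products WHERE id > ? ORDER BY id", (last_value,)
    ).fetchall()
    if rows:
        last_value = rows[-1][0]
    return rows, last_value


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "a"), (2, "b")])

    first, last_value = poll_once(conn, 0)            # the initial run picks up both rows
    second, last_value = poll_once(conn, last_value)  # nothing new since the last run
    conn.execute("INSERT INTO products VALUES (3, 'c')")
    third, last_value = poll_once(conn, last_value)   # only the newly added row
    return [len(first), len(second), len(third)], last_value


if __name__ == "__main__":
    print(demo())
```

Tracking by id only notices new rows; a row updated in place keeps its id and is skipped, so a table whose rows change needs a tracked column that also changes on update, such as a modification timestamp.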



Both go-mysql-elasticsearch and Logstash can be run under Supervisor to improve stability.

