
1 Production usage and caveats


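  • 8-hour time zone offset: when TIMESTAMP columns are captured, the values come out eight hours off. This can only be fixed by modifying the source code.
  • Bootstrap command-line bug: when the capture configurations for several database instances share one Maxwell configuration database, importing a full snapshot with the Bootstrap command-line tool is buggy. Insert the bootstrap request with SQL instead.
  • server_id conflict bug: Maxwell tells the configurations of different instances apart by the server_id of the MySQL instance being captured. If several captured instances have the same server_id, their binlog reads get mixed up.
  • Bootstrap time zone bug: because we are in UTC+8, time zone information must be added to jdbc_options. The value must be serverTimezone=GMT+8; leaving it unset or using serverTimezone=Asia/Shanghai both cause errors.
  • Kafka producer retries: the Kafka retry count defaults to 0, so a single failed send stops the Maxwell process. Raise it to 100 or more to improve fault tolerance.
  • No pattern matching for sharded databases and tables: unlike canal, Maxwell does not support pattern matching across sharded databases and tables. For example, database*.table* does not match database1.table1 and database2.table2, and the sharded tables cannot be merged into a single topic on Kafka output. Getting this behavior requires custom development on the source code.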
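
Two of the notes above, the Bootstrap time zone and the Kafka retry count, map directly to lines in Maxwell's config.properties. The fragment below is a minimal sketch, assuming the jdbc_options setting and the "kafka." pass-through prefix that the configuration in section 2 describes. The values are copied from the notes and have not been tested against a live deployment.

```properties
# Time zone for Maxwell's JDBC connections (value copied verbatim from the note above)
jdbc_options=serverTimezone=GMT+8

# Passed through to the Kafka producer as "retries"; the note recommends 100 or more
kafka.retries=100
```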

2 Production configuration example


# tl;dr config



# mysql login info

#     *** general ***
# choose where to produce data to. stdout|file|kafka|kinesis|pubsub|sqs|rabbitmq|redis
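# Illustrative example only (an assumption, not the author's preserved value): send output to kafka.
#producer=kafka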

# set the log level.  note that you can configure things further in log4j2.xml

# if set, maxwell will look up the scoped environment variables, strip off the prefix and inject the configs

#     *** mysql ***

# mysql host to connect to

# mysql port to connect to

# mysql user to connect as.  This user must have REPLICATION SLAVE permissions,
# as well as full access to the `maxwell` (or schema_database) database

# mysql password
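# Illustrative sketch of the connection settings described above; the host and
# credentials are placeholders, not the author's values.
#host=localhost
#port=3306
#user=maxwell
#password=maxwell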

# options to pass into the jdbc connection, given as opt=val&opt2=val2

# name of the mysql database where maxwell keeps its own state

# whether to use GTID or not for positioning

# SSL/TLS options
# To use VERIFY_CA or VERIFY_IDENTITY, you must either set the trust store with Java opts
# or import the MySQL cert into the global Java cacerts.
# turns on ssl for the maxwell-store connection, other connections inherit this setting unless specified
# for binlog-connector
# for the schema-capture connection, if used

# maxwell can optionally replicate from a different server than where it stores
# schema and binlog position info.  Specify that different server here:
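# Illustrative sketch; the host and credentials are placeholders, not the author's values.
#replication_host=replica.example.internal
#replication_port=3306
#replication_user=maxwell
#replication_password=maxwell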


# This may be useful when using MaxScale's binlog mirroring host.
# Specifies that Maxwell should capture schema from a different server than
# it replicates from:


#       *** output format ***

# records include binlog position (default false)

# records include a gtid string (default false)

# records include fields with null values (default true).  If this is false,
# fields where the value is null will be omitted entirely from output.

# records include server_id (default false)

# records include thread_id (default false)

# records include schema_id (default false)

# records include row query; binlog option "binlog_rows_query_log_events" must be enabled (default false)

# DML records include list of values that make up a row's primary key (default false)

# DML records include list of columns that make up a row's primary key (default false)

# records include commit and xid (default true)

# This controls whether maxwell will output JSON information containing
# DDL (ALTER/CREATE TABLE/ETC) information. (default: false)
# See also: ddl_kafka_topic
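# Illustrative sketch of a few of the output flags described above, shown with the
# defaults stated in the comments (the author's actual values were not preserved).
#output_binlog_position=false
#output_nulls=true
#output_commit_info=true
#output_ddl=false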

#       *** kafka ***

# list of kafka brokers

# kafka topic to write to
# this can be static, e.g. 'maxwell', or dynamic, e.g. namespace_%{database}_%{table}
# in the latter case 'database' and 'table' will be replaced with the values for the row being processed
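# Illustrative sketch of the two kafka settings described above; the broker addresses
# and the "namespace" prefix are placeholders, not the author's values.
#kafka.bootstrap.servers=broker1:9092,broker2:9092
#kafka_topic=namespace_%{database}_%{table}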

# alternative kafka topic to write DDL (alter/create/drop) to.  Defaults to kafka_topic

# hash function to use.  "default" is just the JVM's 'hashCode' function.
#kafka_partition_hash=default # [default, murmur3]

# how maxwell writes its kafka key.
# 'hash' looks like:
# {"database":"test","table":"tickets","pk.id":10001}
# 'array' looks like:
# ["test","tickets",[{"id":10001}]]
# default: "hash"
#kafka_key_format=hash # [hash, array]

# extra kafka options.  Anything prefixed "kafka." will get
# passed directly into the kafka-producer's config.

# a few defaults.
# These are 0.11-specific. They may or may not work with other versions.
# kafka.compression.type=snappy

# kafka+SSL example
# kafka.ssl.truststore.location=/var/private/ssl/kafka.client.truststore.jks
# kafka.ssl.truststore.password=test1234
# kafka.ssl.keystore.location=/var/private/ssl/kafka.client.keystore.jks
# kafka.ssl.keystore.password=test1234
# kafka.ssl.key.password=test1234

# controls a heuristic check that maxwell may use to detect messages that
# we never heard back from.  The heuristic check looks for "stuck" messages, and
# will timeout maxwell after this many milliseconds.
# See the Maxwell source code if you really want to get into it.
#producer_ack_timeout=120000 # default 0

#           *** partitioning ***

# What part of the data do we partition by?

# specify what fields to partition by when using producer_partition_by=column
# column separated list.

# when using producer_partition_by=column, partition by this when
# the specified column(s) don't exist.
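# Illustrative sketch of column-based partitioning; the column names are hypothetical.
#producer_partition_by=column
#producer_partition_columns=id,customer_id
#producer_partition_by_fallback=database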

#            *** kinesis ***


# AWS places a 256 unicode character limit on the max key length of a record
# Setting this option to true enables hashing the key with the md5 algorithm
# before we send it to kinesis so all the keys work within the key size limit.
# Values: true, false
# Default: false

#            *** sqs ***


# The sqs producer will need aws credentials configured in the default
# root folder and file format. See the AWS SDK credentials documentation for how to do it.

#            *** pub/sub ***


#            *** rabbit-mq ***


#           *** redis ***


# name of pubsub/list/whatever key to publish to

# this can be static, e.g. 'maxwell', or dynamic, e.g. namespace_%{database}_%{table}
# Valid values for redis_type = pubsub|lpush. Defaults to pubsub


#           *** custom producer ***

# the fully qualified class name for custom ProducerFactory
# see the Maxwell producer documentation for more details.

# custom producer properties can be configured using the custom_producer.* property namespace

#          *** filtering ***

# filter rows out of Maxwell's output.  Comma separated list of filter-rules, evaluated in sequence.
# A filter rule is:
#  <type> ":" <db> "." <tbl> [ "." <col> "=" <col_val> ]
#  type    ::= [ "include" | "exclude" | "blacklist" ]
#  db      ::= [ "/regexp/" | "string" | "`string`" | "*" ]
#  tbl     ::= [ "/regexp/" | "string" | "`string`" | "*" ]
#  col_val ::= "column_name"
# See the Maxwell filtering documentation for more details
filter= exclude: *.*, include: market_divide.md_customer_divide_info, include: market_divide.md_customer_divide_info_agent
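# Illustrative, untested sketch: per the grammar above, db and tbl may each be written
# as a /regexp/, so a rule for hypothetical sharded names might look like the line below.
# [0-9] is used instead of \d because backslash is an escape character in .properties files.
#filter= exclude: *.*, include: /database[0-9]+/./table[0-9]+/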

# javascript filter
# maxwell can run a bit of javascript for each row if you need very custom filtering/data munging.
# See the Maxwell documentation for more details

#       *** encryption ***

# Encryption mode. Possible values are none, data, and all. (default none)

# Specify the secret key to be used

#       *** monitoring ***

# Maxwell collects metrics via dropwizard. These can be exposed through the
# base logging mechanism (slf4j), JMX, HTTP or pushed to Datadog.
# Options: [jmx, slf4j, http, datadog]
# Supplying multiple is allowed.

# The prefix maxwell will apply to all metrics
#metrics_prefix=MaxwellMetrics # default MaxwellMetrics

# Enable (dropwizard) JVM metrics, default false

# When metrics_type includes slf4j this is the frequency metrics are emitted to the log, in seconds

# When metrics_type includes http or diagnostic is enabled, this is the port the server will bind to.

# When metrics_type includes http or diagnostic is enabled, this is the http path prefix, default /.
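# Illustrative sketch: expose metrics over http; the port and path are placeholders,
# not the author's values.
#metrics_type=http
#metrics_jvm=true
#http_port=8080
#http_path_prefix=/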

# ** The following are Datadog specific. **
# When metrics_type includes datadog this is the way metrics will be reported.
# Options: [udp, http]
# Supplying multiple is not allowed.

# datadog tags that should be supplied

# The frequency metrics are pushed to datadog, in seconds

# required if metrics_datadog_type = http

# required if metrics_datadog_type = udp
#metrics_datadog_host=localhost # default localhost
#metrics_datadog_port=8125 # default 8125

# Maxwell exposes an http diagnostic endpoint that checks the following in parallel:
# 1. binlog replication lag
# 2. producer (currently kafka) lag

# To enable Maxwell diagnostic
#http_diagnostic=true # default false

# Diagnostic check timeout in milliseconds, required if diagnostic = true
#http_diagnostic_timeout=10000 # default 10000

#    *** misc ***

# maxwell's bootstrapping functionality has a couple of modes.
# In "async" mode, maxwell will output the replication stream while it
# simultaneously outputs the database to the topic.  Note that it won't
# output replication data for any tables it is currently bootstrapping -- this
# data will be buffered and output after the bootstrap is complete.
# In "sync" mode, maxwell stops the replication stream while it
# outputs bootstrap data.
# async mode keeps ops live while bootstrapping, but carries the possibility of
# data loss (due to buffering transactions).  sync mode is safer but you
# have to stop replication.
#bootstrapper=async # [sync, async, none]

# output filename when using the "file" producer
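# Illustrative sketch (only relevant with producer=file); the path is a placeholder.
#output_file=/path/to/maxwell_output.json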
