mysql+maxwell+SparkStreaming: analyzing the data change stream

Maxwell can pose as a MySQL slave (replica) to read the binlog, parse it, and emit each row change as JSON:

{"database":"taskcenter","table":"lts_admin_job_tracker_monitor_data","type":"insert","ts":1535090880,"xid":1555704593,"commit":true,"data":{"id":415919,"gmt_created":1535090880803,"identity":"JT_192.168.31.8_1_10-10-55.371_1","receive_job_num":0,"push_job_num":0,"exe_success_num":0,"exe_failed_num":0,"exe_later_num":0,"exe_exception_num":0,"fix_executing_job_num":0,"timestamp":1535090820000}}
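As a rough illustration of how a consumer might read such an event, here is a minimal Python sketch using only the standard library. The `data` payload is shortened from the sample above, and `summarize_event` is a hypothetical helper name, not part of Maxwell. It assumes `ts` is in epoch seconds, as in the sample.

```python
import json
from datetime import datetime, timezone

# Sample event from above, with the "data" payload shortened for readability.
RAW_EVENT = (
    '{"database":"taskcenter","table":"lts_admin_job_tracker_monitor_data",'
    '"type":"insert","ts":1535090880,"xid":1555704593,"commit":true,'
    '"data":{"id":415919,"gmt_created":1535090880803,'
    '"identity":"JT_192.168.31.8_1_10-10-55.371_1","receive_job_num":0}}'
)


def summarize_event(raw: str) -> dict:
    """Parse one Maxwell JSON line and pull out the fields used for routing."""
    event = json.loads(raw)
    return {
        "source": f'{event["database"]}.{event["table"]}',
        "type": event["type"],
        # Assumption: "ts" is epoch seconds, as in the sample event.
        "event_time": datetime.fromtimestamp(event["ts"], tz=timezone.utc).isoformat(),
        "row": event.get("data", {}),
    }


summary = summarize_event(RAW_EVENT)
print(summary["source"], summary["type"], summary["event_time"])
```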
 

→ Download: https://github.com/zendesk/maxwell/releases/download/v1.17.1/maxwell-1.17.1.tar.gz 
→ Source: https://github.com/zendesk/maxwell

Configure MySQL


Server Config: Ensure your server_id is configured, and that row-based replication is turned on.

$ vi my.cnf

[mysqld]
server_id=1
log-bin=master
binlog_format=row
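Before restarting MySQL, it can help to sanity-check the config fragment. The following is only a sketch under assumptions: `check_binlog_settings` is a hypothetical helper, it reads the text with Python's `configparser`, and it does not handle MySQL-specific directives such as `!includedir`.

```python
import configparser

# The my.cnf fragment from above.
MY_CNF = """
[mysqld]
server_id=1
log-bin=master
binlog_format=row
"""


def check_binlog_settings(cnf_text: str) -> list:
    """Return a list of problems found in the [mysqld] section (empty if none)."""
    parser = configparser.ConfigParser(
        allow_no_value=True, strict=False, interpolation=None
    )
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        return ["missing [mysqld] section"]
    # MySQL accepts '-' and '_' interchangeably in option names.
    opts = {k.replace("-", "_"): v for k, v in parser.items("mysqld")}
    problems = []
    if not opts.get("server_id"):
        problems.append("server_id is not set")
    if "log_bin" not in opts:
        problems.append("log-bin is not enabled")
    if (opts.get("binlog_format") or "").lower() != "row":
        problems.append("binlog_format is not row")
    return problems


print(check_binlog_settings(MY_CNF) or "binlog settings look fine")
```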

Or on a running server:

mysql> set global binlog_format=ROW;
mysql> set global binlog_row_image=FULL;

Note: setting binlog_format globally only affects sessions opened afterwards. You will need to shut down all active connections to fully convert to row-based replication.

Permissions: Maxwell needs permissions to store state in the database specified by the schema_database option (default maxwell).

mysql> GRANT ALL on maxwell.* to 'maxwell'@'%' identified by 'XXXXXX';
mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE on *.* to 'maxwell'@'%';

# or for running maxwell locally:

mysql> GRANT SELECT, REPLICATION CLIENT, REPLICATION SLAVE on *.* to 'maxwell'@'localhost' identified by 'XXXXXX';
mysql> GRANT ALL on maxwell.* to 'maxwell'@'localhost';

Run Maxwell


Command line

bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' --producer=stdout

Docker

docker run -it --rm zendesk/maxwell bin/maxwell --user=$MYSQL_USERNAME \
    --password=$MYSQL_PASSWORD --host=$MYSQL_HOST --producer=stdout

Kafka

Boot kafka as described here: http://kafka.apache.org/documentation.html#quickstart, then:

bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' \
   --producer=kafka --kafka.bootstrap.servers=localhost:9092 --kafka_topic=maxwell

(or docker):

docker run -it --rm zendesk/maxwell bin/maxwell --user=$MYSQL_USERNAME \
    --password=$MYSQL_PASSWORD --host=$MYSQL_HOST --producer=kafka \
    --kafka.bootstrap.servers=$KAFKA_HOST:$KAFKA_PORT --kafka_topic=maxwell

Kinesis

docker run -it --rm --name maxwell -v `cd && pwd`/.aws:/root/.aws zendesk/maxwell sh -c 'cp /app/kinesis-producer-library.properties.example /app/kinesis-producer-library.properties && echo "Region=$AWS_DEFAULT_REGION" >> /app/kinesis-producer-library.properties && bin/maxwell --user=$MYSQL_USERNAME --password=$MYSQL_PASSWORD --host=$MYSQL_HOST --producer=kinesis --kinesis_stream=$KINESIS_STREAM'

Google Cloud Pub/Sub

bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' \
  --producer=pubsub --pubsub_project_id="$PUBSUB_PROJECT_ID" \
  --pubsub_topic='maxwell'

RabbitMQ

bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' \
    --producer=rabbitmq --rabbitmq_host='rabbitmq.hostname'

Redis

bin/maxwell --user='maxwell' --password='XXXXXX' --host='127.0.0.1' \
    --producer=redis --redis_host=redis.hostname

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------

The above is based on the official documentation.

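Maxwell emits the parsed changes as JSON and can write them directly to Kafka. Spark Streaming can then consume the topic, parse each event, and store the result in a data warehouse such as Hive or HBase.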
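The per-batch logic can be sketched in plain Python. This is not Spark API code: `route_events` is a hypothetical helper that a streaming job might apply to the lines of each micro-batch before writing every group to its own sink, and the sample lines in `BATCH` are made up for illustration.

```python
import json
from collections import defaultdict

# Made-up lines, shaped like what a consumer might read from the Kafka topic.
BATCH = [
    '{"database":"test","table":"maxwell","type":"insert","ts":1449786341,"data":{"id":1}}',
    '{"database":"test","table":"maxwell","type":"update","ts":1449786342,"data":{"id":1},"old":{}}',
    '{"database":"test","table":"other","type":"delete","ts":1449786343,"data":{"id":7}}',
    "not json at all",
]


def route_events(lines):
    """Group parsed events by 'database.table', skipping malformed lines."""
    routed = defaultdict(list)
    for line in lines:
        try:
            event = json.loads(line)
            key = f'{event["database"]}.{event["table"]}'
        except (ValueError, KeyError, TypeError):
            # Not a usable event; a real job might log it or send it to a dead-letter topic.
            continue
        routed[key].append(event)
    return dict(routed)


routed = route_events(BATCH)
for key, events in routed.items():
    print(key, [e["type"] for e in events])
```

Grouping by `database.table` keeps the sink logic simple, since each group usually maps to one target table.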

  maxwell: {
    "database": "test",
    "table": "maxwell",
    "type": "update",
    "ts": 1449786341,
    "xid": 940786,
    "commit": true,
    "data": {"id":1, "daemon": "Firebus!  Firebus!"},
    "old":  {"daemon": "Stanislaw Lem"}
  }

The event above is an update operation. It carries key information about the row, such as its primary key.

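Parsing the old field shows which columns changed and what their previous values were, which makes it possible to trace how a record changes over time.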
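A minimal sketch of that comparison, using the update event shown earlier. `changed_fields` is a hypothetical helper. It assumes `old` holds only the columns that changed, as in the sample, and pairs each one with its new value from `data`.

```python
import json

# The update event from the example above.
UPDATE_EVENT = (
    '{"database": "test", "table": "maxwell", "type": "update", '
    '"ts": 1449786341, "xid": 940786, "commit": true, '
    '"data": {"id": 1, "daemon": "Firebus!  Firebus!"}, '
    '"old": {"daemon": "Stanislaw Lem"}}'
)


def changed_fields(event: dict) -> dict:
    """Map each changed column to a (before, after) pair for an update event."""
    if event.get("type") != "update":
        return {}
    new_row = event.get("data", {})
    return {
        col: (before, new_row.get(col))
        for col, before in event.get("old", {}).items()
    }


event = json.loads(UPDATE_EVENT)
changes = changed_fields(event)
for col, (before, after) in changes.items():
    print(f'{event["table"]} id={event["data"]["id"]}: {col} {before!r} -> {after!r}')
```

Appending these (before, after) pairs per primary key, ordered by `ts`, gives a simple change trail for each row.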
