阿里云数仓实时同步任务binlog解析文档

背景:

阿里云dataworks中有实时同步数据库binlog的任务配置,起源是业务数据库中数据被回滚了,所以借助数仓这边的同步任务数据进行解析和监控更新时间。

以oneclickOdps_***_to_odps_first_hour_extra_level_checkStreamXDone为例

阿里云数仓实时同步任务binlog解析文档_第1张图片

 

目标表日志:

 

同步脚本:

阿里云数仓实时同步任务binlog解析文档_第2张图片

 

解析逻辑:

主要的字段是_execute_time_,_operation_type_,_sequence_id_,_before_image_,_after_image_,_ddl_sql_

执行时间,binlog操作的类型,序列号和前后数据镜像,_data_columns_这个字段中保存着数据行原始数据。

WITH delta_new_data AS 
  ( 
 select 
     _execute_time_,_operation_type_,_sequence_id_,_before_image_,_after_image_,_ddl_sql_, 
     x.`id` as `id`,x.`creator` as `creator`,x.`modifier` as `modifier`,cast( from_utc_timestamp(x.`gmt_created`, 'GMT') as Datetime) as `gmt_created`,cast( from_utc_timestamp(x.`gmt_modified`, 'GMT') as Datetime) as `gmt_modified`,x.`institution_id` as `institution_id`,x.`talent_id` as `talent_id`,x.`talent_source` as `talent_source`,x.`home_page` as `home_page`,x.`talent_name` as `talent_name`,x.`profile_photo` as `profile_photo`,x.`source_account` as `source_account`,x.`source_id` as `source_id`,x.`buyin_id` as `buyin_id`,x.`number_of_fans` as `number_of_fans`,x.`last_month_gmv` as `last_month_gmv`,x.`last_month_volume` as `last_month_volume`,x.`last_month_volume_num` as `last_month_volume_num`,x.`volume` as `volume`,x.`volume_num` as `volume_num`,x.`level` as `level`,x.`real_institution` as `real_institution`,x.`gmv_range` as `gmv_range`,x.`main_commodity_categories` as `main_commodity_categories`,x.`tier` as `tier`,x.`demand_direction` as `demand_direction`,x.`comment` as `comment`,x.`ext_info` as `ext_info`,x.`owner_state` as `owner_state`,x.`owner_id` as `owner_id`,cast( from_utc_timestamp(x.`claim_time`, 'GMT') as Datetime) as `claim_time`,cast( from_utc_timestamp(x.`mark_customer_time`, 'GMT') as Datetime) as `mark_customer_time`,x.`pool_id` as `pool_id`,cast( from_utc_timestamp(x.`enter_pool_time`, 'GMT') as Datetime) as `enter_pool_time`,cast( from_utc_timestamp(x.`customer_unassign_time`, 'GMT') as Datetime) as `customer_unassign_time`,x.`follow_state` as `follow_state`,x.`owner_state_version` as `owner_state_version`,x.`talent_from` as `talent_from`,cast( from_utc_timestamp(x.`last_sale_time`, 'GMT') as Datetime) as `last_sale_time`,x.`bind_owner_gmv` as `bind_owner_gmv`,cast( from_utc_timestamp(x.`last_follow_number_time`, 'GMT') as Datetime) as `last_follow_number_time`,cast( from_utc_timestamp(x.`last_apply_number_time`, 'GMT') as Datetime) as `last_apply_number_time`,cast( from_utc_timestamp(x.`adjust_pool_time`, 'GMT') as Datetime) as `adjust_pool_time`,cast( from_utc_timestamp(x.`last_sixty_days_sale_time`, 'GMT') as Datetime) as `last_sixty_days_sale_time`,cast( from_utc_timestamp(x.`last_thirty_days_init_time`, 'GMT') as Datetime) as `last_thirty_days_init_time`,cast( from_utc_timestamp(x.`last_thirty_days_sale_time`, 'GMT') as Datetime) as `last_thirty_days_sale_time`,x.`resource_type` as `resource_type`,x.`follow_state_tag` as `follow_state_tag`,x.`contact_type` as `contact_type`,cast( from_utc_timestamp(x.`follow_state_time`, 'GMT') as Datetime) as `follow_state_time`,cast( from_utc_timestamp(x.`owner_state_time`, 'GMT') as Datetime) as `owner_state_time` from (
       SELECT from_json(cast(_data_columns_ as string), '`id` BIGINT,`creator` STRING,`modifier` STRING,`gmt_created` BIGINT,`gmt_modified` BIGINT,`institution_id` BIGINT,`talent_id` BIGINT,`talent_source` STRING,`home_page` STRING,`talent_name` STRING,`profile_photo` STRING,`source_account` STRING,`source_id` STRING,`buyin_id` STRING,`number_of_fans` BIGINT,`last_month_gmv` DECIMAL(38,18),`last_month_volume` STRING,`last_month_volume_num` BIGINT,`volume` STRING,`volume_num` BIGINT,`level` STRING,`real_institution` STRING,`gmv_range` STRING,`main_commodity_categories` STRING,`tier` STRING,`demand_direction` STRING,`comment` STRING,`ext_info` STRING,`owner_state` STRING,`owner_id` STRING,`claim_time` BIGINT,`mark_customer_time` BIGINT,`pool_id` BIGINT,`enter_pool_time` BIGINT,`customer_unassign_time` BIGINT,`follow_state` STRING,`owner_state_version` BIGINT,`talent_from` STRING,`last_sale_time` BIGINT,`bind_owner_gmv` DECIMAL(38,18),`last_follow_number_time` BIGINT,`last_apply_number_time` BIGINT,`adjust_pool_time` BIGINT,`last_sixty_days_sale_time` BIGINT,`last_thirty_days_init_time` BIGINT,`last_thirty_days_sale_time` BIGINT,`resource_type` STRING,`follow_state_tag` STRING,`contact_type` STRING,`follow_state_time` BIGINT,`owner_state_time` BIGINT') x,
              *
       FROM   sbc_***_odps_first_log 
       WHERE  year='2022' and month='10' and day in('13','12')
       AND (_data_columns_ is not null) AND _dest_table_name_="ods_saas_***_df") )

SELECT _sequence_id_, _after_image_, _after_image_,`id`, `creator`, `modifier`, `gmt_created`, `gmt_modified`, `institution_id`, `talent_id`, `talent_source`, `home_page`, `talent_name`, `profile_photo`, `source_account`, `source_id`, `buyin_id`, `number_of_fans`, `last_month_gmv`, `last_month_volume`, `last_month_volume_num`, `volume`, `volume_num`, `level`, `real_institution`, `gmv_range`, `main_commodity_categories`, `tier`, `demand_direction`, `comment`, `ext_info`, `owner_state`, `owner_id`, `claim_time`, `mark_customer_time`, `pool_id`, `enter_pool_time`, `customer_unassign_time`, `follow_state`, `owner_state_version`, `talent_from`, `last_sale_time`, `bind_owner_gmv`, `last_follow_number_time`, `last_apply_number_time`, `adjust_pool_time`, `last_sixty_days_sale_time`, `last_thirty_days_init_time`, `last_thirty_days_sale_time`, `resource_type`, `follow_state_tag`, `contact_type`, `follow_state_time`, `owner_state_time`
FROM 
  (  SELECT row_number() OVER(PARTITION BY t.`id`
                            ORDER BY _execute_time_ DESC) AS row_number,
          _sequence_id_,sbc_***_odps_first_log
          _operation_type_,
          _after_image_,
          _execute_time_,
          `id`, `creator`, `modifier`, `gmt_created`, `gmt_modified`, `institution_id`, `talent_id`, `talent_source`, `home_page`, `talent_name`, `profile_photo`, `source_account`, `source_id`, `buyin_id`, `number_of_fans`, `last_month_gmv`, `last_month_volume`, `last_month_volume_num`, `volume`, `volume_num`, `level`, `real_institution`, `gmv_range`, `main_commodity_categories`, `tier`, `demand_direction`, `comment`, `ext_info`, `owner_state`, `owner_id`, `claim_time`, `mark_customer_time`, `pool_id`, `enter_pool_time`, `customer_unassign_time`, `follow_state`, `owner_state_version`, `talent_from`, `last_sale_time`, `bind_owner_gmv`, `last_follow_number_time`, `last_apply_number_time`, `adjust_pool_time`, `last_sixty_days_sale_time`, `last_thirty_days_init_time`, `last_thirty_days_sale_time`, `resource_type`, `follow_state_tag`, `contact_type`, `follow_state_time`, `owner_state_time`
   FROM
     (SELECT delta._sequence_id_,
            delta._operation_type_,
            delta._after_image_,
            delta._execute_time_,
            delta.`id`, delta.`creator`, delta.`modifier`, delta.`gmt_created`, delta.`gmt_modified`, delta.`institution_id`, delta.`talent_id`, delta.`talent_source`, delta.`home_page`, delta.`talent_name`, delta.`profile_photo`, delta.`source_account`, delta.`source_id`, delta.`buyin_id`, delta.`number_of_fans`, delta.`last_month_gmv`, delta.`last_month_volume`, delta.`last_month_volume_num`, delta.`volume`, delta.`volume_num`, delta.`level`, delta.`real_institution`, delta.`gmv_range`, delta.`main_commodity_categories`, delta.`tier`, delta.`demand_direction`, delta.`comment`, delta.`ext_info`, delta.`owner_state`, delta.`owner_id`, delta.`claim_time`, delta.`mark_customer_time`, delta.`pool_id`, delta.`enter_pool_time`, delta.`customer_unassign_time`, delta.`follow_state`, delta.`owner_state_version`, delta.`talent_from`, delta.`last_sale_time`, delta.`bind_owner_gmv`, delta.`last_follow_number_time`, delta.`last_apply_number_time`, delta.`adjust_pool_time`, delta.`last_sixty_days_sale_time`, delta.`last_thirty_days_init_time`, delta.`last_thirty_days_sale_time`, delta.`resource_type`, delta.`follow_state_tag`, delta.`contact_type`, delta.`follow_state_time`, delta.`owner_state_time`
      FROM delta_new_data delta 
      where delta.`id` = '**********'

) t) gt

 
  

你可能感兴趣的:(大数据,阿里云,dataworks,实时同步)