Structured Streaming checkpoint

This note covers checkpoints for kafkaStreamSource and rateStreamSource. A checkpoint directory contains:

/commits
/metadata
/offsets
/sources
/state

1. Commit log (commits/): one file per completed batch, named by batch id:
/commits/262
/commits/263

hadoop fs -cat /commits/164
v1
{}
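Note: the first line is the log format version, prefixed with v. The second line is the commit metadata, which is empty here.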
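Both commits/N and offsets/N share this layout: a version line followed by JSON lines. Below is a minimal Python sketch for reading such a file. The function name parse_metadata_log is hypothetical and is not a Spark API.

```python
import json


def parse_metadata_log(text):
    """Split a checkpoint log file (e.g. commits/N) into its format
    version and the JSON documents on the following lines."""
    lines = text.strip().splitlines()
    header = lines[0]
    # The first line is the log version, written as "v" + number.
    if not header.startswith("v"):
        raise ValueError(f"missing version prefix: {header!r}")
    version = int(header[1:])
    # Every remaining non-empty line holds one JSON document.
    records = [json.loads(line) for line in lines[1:] if line.strip()]
    return version, records


# Content of /commits/164 shown above.
print(parse_metadata_log("v1\n{}"))  # (1, [{}])
```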

2. Offset log (offsets/, OffsetSeq)

/offsets/263
/offsets/264
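Together, the commit log and the offset log let a restarted query tell finished batches from unfinished ones. Below is a simplified Python sketch of that decision. The function next_batch_on_restart is hypothetical and skips Spark's own consistency checks. If the newest batch in offsets/ also has a file in commits/, a new batch is planned. Otherwise that batch is re-run with the offsets already recorded. If the partial listings above are treated as complete (offsets 263 and 264, commits 262 and 263), the sketch picks batch 264 for re-execution.

```python
def next_batch_on_restart(offset_batches, commit_batches):
    """Simplified restart decision. Arguments are the batch ids found
    as file names under offsets/ and commits/.

    Returns (batch_id, rerun): rerun is True when the batch's offsets
    were already recorded but the batch never committed.
    """
    if not offset_batches:
        return 0, False  # empty checkpoint: start from batch 0
    latest_offsets = max(offset_batches)
    latest_commit = max(commit_batches) if commit_batches else -1
    if latest_commit == latest_offsets:
        # The last planned batch finished, so plan a new one.
        return latest_offsets + 1, False
    # Offsets were written but the batch never committed:
    # run it again with the same offset ranges.
    return latest_offsets, True


print(next_batch_on_restart([263, 264], [262, 263]))  # (264, True)
```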

hadoop fs -cat /offsets/265
v1
{"batchWatermarkMs":0,"batchTimestampMs":1533975004482,"conf":{"spark.sql.shuffle.partitions":"200","spark.sql.streaming.stateStore.providerClass":"org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider"}}
{"CBSS_ODS_GS":{"137":1124398818,"92":1124727398,"101":1124450643,"83":1124433278,"110":1124456872,"128":1124219743,"119":1124432547,"104":1124628336,"23":1124311329,"95":1124525375,"131":1124469343,"122":1124535024,"77":1124448871,"86":1124485192,"50":1124508471,"59":1124684544,"113":1124646095,"41":1124064203,"32":1124519454,"68":1124446354,"53":1124557152,"62":1124382089,"134":1124559698,"35":1124507617,"44":1124407028,"8":1124433001,"17":1124476420,"125":1124396390,"26":1124443966,"80":1124567565,"89":1124658207,"116":1124614654,"98":1124475544,"71":1124473950,"107":1124542529,"11":1124438204,"74":1124556643,"56":1124538763,"38":1124504126,"29":1124694681,"47":1124603677,"20":1124192614,"2":1124588467,"65":1124444154,"5":1124294139,"14":1124448849,"124":1124375546,"133":1124404400,"106":1124446976,"115":1124223326,"46":1124558944,"127":1124129305,"118":1124312050,"136":1124321917,"100":1124699504,"109":1124627319,"82":1124442521,"91":1124555678,"55":1124335963,"64":1124373113,"73":1124589828,"58":1124459124,"67":1124482459,"85":1124359891,"94":1124393984,"139":1124434665,"40":1124473917,"49":1124267140,"130":1124675189,"4":1124393400,"13":1124416361,"121":1124308740,"22":1124465366,"103":1124399814,"31":1124497397,"76":1124663123,"112":1124321464,"16":1124506270,"97":1124378643,"7":1124546058,"79":1124127427,"88":1124465497,"70":1124683417,"43":1124335991,"52":1124426670,"25":1123996945,"34":1124445672,"61":1124724327,"10":1124662549,"37":1124557737,"1":1124492770,"28":1124529006,"19":1124554560,"129":1124748745,"138":1124809837,"120":1124577963,"60":1124458993,"87":1124570910,"96":1124326628,"132":1124298235,"105":1124477087,"123":1124601206,"114":1124572178,"69":1124593835,"78":1124544523,"99":1124359214,"63":1124340260,"90":1124554065,"45":1124736498,"54":1124353713,"72":1124277878,"81":1124393509,"126":1124593551,"27":1124665644,"135":1124587297,"108":1124331277,"36":1124681226,"117":1124574196,"9":1124244797,"18":1124433233,"48":1124776095,"21":1124395649,"57":1124623562,"12":1124528086,"3":1124515899,"84":1124481011,"102":1124598175,"93":1124697719,"75":1124569607,"30":1124450004,"39":1124477532,"111":1124542936,"66":1124505569,"15":1124454816,"42":1124248608,"51":1124284882,"33":1124404806,"24":1124475933,"6":1124439966,"0":1124402328}}
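Below is a small Python sketch for reading an offsets/N file. It assumes the layout shown above: a version line, one line of batch metadata, then one JSON line per source. The sample is a shortened copy of the file above, keeping only three partitions. parse_offset_file is a hypothetical helper.

```python
import json


def parse_offset_file(text):
    """Parse an offsets/N file: version line, batch metadata line,
    then one JSON line per source."""
    lines = text.strip().splitlines()
    if not lines[0].startswith("v"):
        raise ValueError("missing version prefix")
    metadata = json.loads(lines[1])
    sources = [json.loads(line) for line in lines[2:]]
    return metadata, sources


# Shortened copy of the file above (only 3 partitions kept).
sample = """v1
{"batchWatermarkMs":0,"batchTimestampMs":1533975004482,"conf":{"spark.sql.shuffle.partitions":"200"}}
{"CBSS_ODS_GS":{"137":1124398818,"92":1124727398,"0":1124402328}}"""

meta, sources = parse_offset_file(sample)
for topic, partitions in sources[0].items():
    print(topic, len(partitions), "partitions")  # CBSS_ODS_GS 3 partitions
print(meta["conf"]["spark.sql.shuffle.partitions"])  # 200
```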

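batchWatermarkMs: the event-time watermark in effect for this batch, in milliseconds; it bounds how late data may arrive and still be processed.
batchTimestampMs: the processing time of this batch, in milliseconds since the Unix epoch (1533975004482 is 2018-08-11 08:10:04.482 UTC).
"spark.sql.shuffle.partitions":"200": the number of shuffle partitions, which is also the number of state store partitions. The value stored here is reused when the query restarts from this checkpoint.
"spark.sql.streaming.stateStore.providerClass": the state store implementation in use, here the default HDFSBackedStateStoreProvider.
"CBSS_ODS_GS":{"137":1124398818,...}: the Kafka source's offsets, keyed by topic and then by partition. Each value is the offset this batch reads up to (exclusive) in that partition.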
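Two of these fields can be checked with a little arithmetic. The Python sketch below converts batchTimestampMs to a UTC date. It also derives how many records a batch covered in each partition, assuming each Kafka batch reads the range [previous end offset, current end offset). The previous-batch offsets are made-up values for illustration. The count also ignores gaps such as compacted records or transaction markers. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def batch_time_utc(batch_timestamp_ms):
    """batchTimestampMs is milliseconds since the Unix epoch."""
    seconds, millis = divmod(batch_timestamp_ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc) + timedelta(milliseconds=millis)


def records_per_partition(prev_end, curr_end):
    """Records a batch read per partition, assuming it read the range
    [previous batch's end offset, this batch's end offset)."""
    return {p: curr_end[p] - prev_end[p] for p in curr_end}


print(batch_time_utc(1533975004482).isoformat())
# 2018-08-11T08:10:04.482000+00:00

# Hypothetical previous-batch offsets; current ones are from the file above.
prev = {"137": 1124398000, "0": 1124402000}
curr = {"137": 1124398818, "0": 1124402328}
print(records_per_partition(prev, curr))  # {'137': 818, '0': 328}
```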

3. State store (state/)
Stores the running state of aggregations and other stateful operators across batches.

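/state/0/199/261.snapshot
--Snapshot file: the complete state of partition 199 of stateful operator 0 as of version 261 (paths follow state/<operator id>/<partition id>/<version>).
/state/0/199/262.delta
--Delta file: only the changes (updates and removals) written in version 262.
--Both kinds of file are handled by a background maintenance thread. It runs periodically (spark.sql.streaming.stateStore.maintenanceInterval, 60s by default), merges deltas into new snapshots, and deletes old files. The number of versions kept is set by spark.sql.streaming.minBatchesToRetain (default 100), which counts batches rather than a time span.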
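To rebuild a given state version, an HDFS-backed store needs the newest snapshot at or before that version, plus every later delta. Below is a simplified Python sketch of that file selection for one partition directory such as state/0/199. files_to_load is a hypothetical helper, and the file list in the example is made up.

```python
import re

# File names inside state/<operator id>/<partition id>/
STATE_FILE = re.compile(r"^(\d+)\.(delta|snapshot)$")


def files_to_load(file_names, version):
    """Files needed to rebuild `version`: the newest snapshot at or
    before it, then every later delta file in order."""
    deltas, snapshots = set(), set()
    for name in file_names:
        m = STATE_FILE.match(name)
        if m is None:
            continue  # skip checksum and temporary files
        v = int(m.group(1))
        (deltas if m.group(2) == "delta" else snapshots).add(v)
    # Version 0 is the empty initial state, so with no snapshot we replay from 1.
    base = max((s for s in snapshots if s <= version), default=0)
    needed = [f"{base}.snapshot"] if base > 0 else []
    for v in range(base + 1, version + 1):
        if v not in deltas:
            raise FileNotFoundError(f"{v}.delta is missing")
        needed.append(f"{v}.delta")
    return needed


print(files_to_load(["261.snapshot", "261.delta", "262.delta"], 262))
# ['261.snapshot', '262.delta']
```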

4. Metadata (metadata): the query's unique id, which stays the same across restarts from this checkpoint
/metadata

hadoop fs -cat /metadata
{"id":"b4ead721-abef-4021-b556-1d6874ea19e3"}
