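Kafka deletes data in two ways. The first is time-based: messages older than a configured retention period are deleted. The second is size-based: once the total size of the stored messages exceeds a limit, the oldest data is deleted.
However, Kafka stores its data in files on the file system, so deleting arbitrary messages at random is not possible. So how does Kafka actually delete data?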
Main deletion logic in Kafka
Config: log.cleanup.interval.mins
Current value: 1
file: core/src/main/scala/kafka/log/LogManager.scala
line: 271
    /**
     * Delete any eligible logs. Return the number of segments deleted.
     */
    def cleanupLogs() {
      debug("Beginning log cleanup...")
      var total = 0
      val startMs = time.milliseconds
      for(log <- allLogs) {
        debug("Garbage collecting '" + log.name + "'")
        total += cleanupExpiredSegments(log) + cleanupSegmentsToMaintainSize(log)
      }
      debug("Log cleanup completed. " + total + " files deleted in " +
            (time.milliseconds - startMs) / 1000 + " seconds")
    }
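Kafka calls cleanupLogs once every log.cleanup.interval.mins minutes. The function runs both cleanup policies over all Logs. (I have not confirmed whether a Log corresponds to a topic or to a partition. Since the cleanup functions parse the log name with parseTopicPartitionName, it most likely corresponds to a partition.)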
Cleaning up expired data (mandatory policy)
Config: log.retention.hours
Current value: 72 (3 days)
file: core/src/main/scala/kafka/log/LogManager.scala
line: 237
    /**
     * Runs through the log removing segments older than a certain age
     */
    private def cleanupExpiredSegments(log: Log): Int = {
      val startMs = time.milliseconds
      val topic = parseTopicPartitionName(log.name).topic
      val logCleanupThresholdMs = logRetentionMsMap.get(topic).getOrElse(this.logCleanupDefaultAgeMs)
      val toBeDeleted = log.markDeletedWhile(startMs - _.messageSet.file.lastModified > logCleanupThresholdMs)
      val total = log.deleteSegments(toBeDeleted)
      total
    }
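The listing above relies on log.markDeletedWhile, whose body is not shown here. The following is a minimal sketch of the selection step, assuming markDeletedWhile behaves like takeWhile over segments ordered oldest first. The names ExpirySketch, Segment and expired are hypothetical stand-ins and are not part of Kafka.

```scala
object ExpirySketch {
  // Simplified stand-in for a log segment (not Kafka's LogSegment class).
  final case class Segment(name: String, lastModifiedMs: Long)

  // Assumption: segments are ordered oldest first, and selection stops at the
  // first segment whose file is still within the retention window.
  def expired(segments: Seq[Segment], nowMs: Long, retentionMs: Long): Seq[Segment] =
    segments.takeWhile(s => nowMs - s.lastModifiedMs > retentionMs)
}
```

Under that assumption, with a 72-hour retention and three segments last modified 80, 73 and 10 hours ago, only the first two are selected. A segment that follows a still-fresh segment is never selected, even if its own file is old.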
Cleaning up oversized data (optional policy)
Config: log.retention.bytes
Current value: -1 (the default, meaning this policy is not used)
file: core/src/main/scala/kafka/log/LogManager.scala
line: 250
    /**
     * Runs through the log removing segments until the size of the log
     * is at least logRetentionSize bytes in size
     */
    private def cleanupSegmentsToMaintainSize(log: Log): Int = {
      val topic = parseTopicPartitionName(log.dir.getName).topic
      val maxLogRetentionSize = logRetentionSizeMap.get(topic).getOrElse(config.logRetentionBytes)
      if(maxLogRetentionSize < 0 || log.size < maxLogRetentionSize) return 0
      var diff = log.size - maxLogRetentionSize
      def shouldDelete(segment: LogSegment) = {
        if(diff - segment.size >= 0) {
          diff -= segment.size
          true
        } else {
          false
        }
      }
      val toBeDeleted = log.markDeletedWhile( shouldDelete )
      val total = log.deleteSegments(toBeDeleted)
      total
    }
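Effects of deleting by segment
Effect on the time-based rule
Each segment file is deleted according to the time of its last message (in the code, the last-modified time of the segment file). As long as the last message in a segment has not expired, the file is not deleted, so older messages in the same segment are kept beyond the retention period.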
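To make the delay concrete, here is a small worked calculation. It assumes the file's last-modified time equals the time of the segment's last write and ignores the cleanup interval. RetentionDelaySketch and its function names are hypothetical, and the hour values are illustrative only.

```scala
object RetentionDelaySketch {
  // Earliest hour at which the segment becomes eligible for deletion,
  // assuming last-modified time == time of the last write to the segment.
  def earliestDeletionHour(lastWriteHour: Long, retentionHours: Long): Long =
    lastWriteHour + retentionHours

  // Age (in hours) that a given message has reached when its segment
  // first becomes eligible for deletion.
  def ageAtDeletionHours(messageHour: Long, lastWriteHour: Long, retentionHours: Long): Long =
    earliestDeletionHour(lastWriteHour, retentionHours) - messageHour
}
```

For example, with log.retention.hours set to 72, a message written at hour 0 into a segment that received its last write at hour 24 is kept for at least 96 hours, not 72.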
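Effect on the size-based rule
A segment is deleted only if the remaining data would still be at or above the size limit after it is removed. If removing the segment would bring the total size below the configured limit, the segment is kept, so the retained data can exceed the limit by up to almost one segment.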
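The rule can be traced with a small standalone sketch that mirrors the shouldDelete closure in cleanupSegmentsToMaintainSize. SizeRetentionSketch, Segment and toDelete are hypothetical names, and the selection again assumes that segments are visited oldest first and that the walk stops at the first segment that is kept.

```scala
object SizeRetentionSketch {
  // Simplified stand-in for a log segment (not Kafka's LogSegment class).
  final case class Segment(name: String, sizeBytes: Long)

  // Walk from the oldest segment and select segments for deletion while the
  // excess over the limit (diff) covers at least one whole segment.
  def toDelete(segments: Seq[Segment], maxBytes: Long): Seq[Segment] = {
    val total = segments.map(_.sizeBytes).sum
    if (maxBytes < 0 || total < maxBytes) Seq.empty
    else {
      var diff = total - maxBytes
      segments.takeWhile { s =>
        if (diff - s.sizeBytes >= 0) { diff -= s.sizeBytes; true } else false
      }
    }
  }
}
```

With four 100-byte segments and a 250-byte limit, the excess is 150 bytes, so only the oldest segment is selected and 300 bytes remain. With a 200-byte limit, the two oldest segments are selected and exactly 200 bytes remain. A negative limit, such as the default -1, selects nothing.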
Segment-related configs
log.segment.bytes
log.roll.hours
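Put together, the settings discussed in this note would look roughly like the following fragment of a broker properties file. The first three values are the ones quoted above. The values for log.segment.bytes and log.roll.hours are illustrative assumptions only, since this note does not state what is actually deployed.

```properties
# How often cleanupLogs runs, in minutes
log.cleanup.interval.mins=1
# Time-based retention: 72 hours (3 days)
log.retention.hours=72
# Size-based retention: -1 disables the policy
log.retention.bytes=-1
# Illustrative values (assumptions): roll a new segment at 1 GiB or after 7 days
log.segment.bytes=1073741824
log.roll.hours=168
```

Because deletion works on whole segments, smaller values for the last two settings make both retention rules act at a finer granularity, at the cost of more files on disk.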