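Comparing Methods for Deleting Millions of Rows in MySQL

Background

        The primary key is a Snowflake ID. The smallest ID is 1630961122999455744 and the largest is 1631593704371969722.

        The test table holds 6 million rows, and each batch deletes only 5,000 rows.

Approach 1

        Constrain the delete to the full primary key range and use LIMIT to cap how many rows each statement removes.

        Drawback: when the primary key range is large, each batch gets slower as the run progresses. A likely cause, assuming InnoDB, is that every statement restarts from the same lower bound and has to step past rows that earlier batches deleted but that have not yet been purged.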

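// java
// Assumes java.util.concurrent.ThreadLocalRandom, an SLF4J-style `log`, and a RandomUtil
// with randomLong(min, max) (presumably Hutool's cn.hutool.core.util.RandomUtil).
public void range() throws InterruptedException {
    Long min = 1630961122999455744L;
    Long max = 1631593704371969722L;
    int limit = 5000;
    int num = limit;
    int total = 0;
    int counter = 1;
    long sleepTotal = 0;
    log.info("Delete by primary key range: start...");
    long s = System.currentTimeMillis();
    // A batch smaller than `limit` means the range is exhausted.
    while (num == limit) {
        long startMill = System.currentTimeMillis();
        num = mapper.delByRange(min, max, limit);
        long endMill = System.currentTimeMillis();
        long sleep = RandomUtil.randomLong(100, 200);
        total = total + num;
        // Log roughly 20% of the batches to keep the output manageable.
        if (ThreadLocalRandom.current().nextInt(100) < 20) {
            log.info("[sampled] range delete: batch {}, deleted {} rows this batch, {} rows so far, took {} ms, sleeping {} ms",
                     counter, num, total, (endMill - startMill), sleep);
        }
        Thread.sleep(sleep);
        counter++;
        sleepTotal = sleepTotal + sleep;
    }
    long e = System.currentTimeMillis();
    // counter is 1-based and has already advanced past the last batch, hence counter - 1.
    log.info("Delete by primary key range: batches: {}, rows deleted: {}, total sleep: {} ms, total time: {} ms",
             counter - 1, total, sleepTotal, (e - s));
}

-- sql (the statement behind mapper.delByRange)
DELETE FROM `table`
WHERE `id` BETWEEN 1630961122999455744 AND 1631593704371969722
LIMIT 5000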

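Approach 2

        Start at the smallest primary key and move a fixed window forward: the first window ends at 1630961122999455744 + 5000, and each iteration shifts the window by another 5,000 before deleting.

        Drawback: when primary keys are not contiguous, many windows contain no rows and the delete is a no-op. Without a sleep between batches, the stream of empty deletes drives MySQL CPU usage up. With a sleep, the sheer number of empty deletes makes the total sleep time so long that overall throughput drops.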
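To put a number on that drawback, the arithmetic below estimates how many fixed-width windows approach 2 has to visit for the ID range given in the background. It is a rough sketch, not a measurement: the class name `EmptyWindowEstimate` is hypothetical, the step of 5,001 IDs per iteration is read off the loop in this article (`max = min + 5000`, then `min = max + 1`), and the 10 ms figure is the shortest sleep the loop uses after an empty delete.

```java
class EmptyWindowEstimate {
    static final long MIN_ID = 1630961122999455744L; // smallest ID, from the background section
    static final long MAX_ID = 1631593704371969722L; // largest ID, from the background section
    static final long ROWS = 6_000_000L;             // rows in the test table
    static final long STEP = 5_001L;                 // IDs covered per iteration: [min, min + 5000]
    static final long MIN_EMPTY_SLEEP_MS = 10L;      // shortest sleep after an empty delete

    /** Number of fixed-width windows needed to walk the whole ID range. */
    static long windows() {
        long span = MAX_ID - MIN_ID + 1;
        return (span + STEP - 1) / STEP; // ceiling division
    }

    /** Lower bound on empty windows: at most one window per row can be non-empty. */
    static long minEmptyWindows() {
        return windows() - ROWS;
    }

    /** Lower bound, in whole days, on the time spent sleeping after empty deletes. */
    static long minEmptySleepDays() {
        long millis = minEmptyWindows() * MIN_EMPTY_SLEEP_MS;
        return millis / 1000L / 86_400L;
    }

    public static void main(String[] args) {
        System.out.println("windows to visit    : " + windows());
        System.out.println("empty windows (min) : " + minEmptyWindows());
        System.out.println("sleep on empties    : " + minEmptySleepDays() + " days (min)");
    }
}
```

Under these assumptions the range splits into on the order of a hundred billion windows, nearly all of them empty, so even the shortest sleep adds up to decades. Shrinking the sleep does not help much either, since every empty window still costs a round trip to MySQL.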

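// java
public void add() throws InterruptedException {
    int limit = 5000;
    Long min = 1630961122999455744L;
    Long max = min + limit;
    int num = 1;
    int total = 0;
    int counter = 1;
    int emptyCounter = 0;
    long sleepTotal = 0;
    long emptySleepTotal = 0;
    log.info("Delete in +5k steps from the smallest primary key: start...");
    long s = System.currentTimeMillis();
    while (num > 0) {
        long startMill = System.currentTimeMillis();
        // `routing` is not defined in this snippet; it is presumably a field of the enclosing class.
        num = mapper.delByIncreament(routing, min, max, limit);
        long endMill = System.currentTimeMillis();
        // Sleep briefly after an empty delete, longer after one that removed rows.
        long sleep = num == 0 ? RandomUtil.randomLong(10, 20) : RandomUtil.randomLong(100, 200);
        if (num == 0) {
            emptyCounter++;
            emptySleepTotal = emptySleepTotal + sleep;
        }
        total = total + num;
        // Log roughly 20% of the batches to keep the output manageable.
        if (ThreadLocalRandom.current().nextInt(100) < 20) {
            log.info("[sampled] +5k window delete: deleted {} rows this batch, {} rows so far, took {} ms, sleeping {} ms",
                     num, total, (endMill - startMill), sleep);
        }
        // An empty window is not the end: keep looping until the window reaches the largest ID.
        if (max.compareTo(1631593704371969722L) < 0) {
            num = 1;
        }
        Thread.sleep(sleep);
        counter++;
        // Slide the window forward and clamp it to the largest ID.
        min = max + 1L;
        max = min + 5000L;
        if (max.compareTo(1631593704371969722L) > 0) {
            max = 1631593704371969722L;
        }
        sleepTotal = sleepTotal + sleep;
    }
    long e = System.currentTimeMillis();
    // counter is 1-based and has already advanced past the last batch, hence counter - 1.
    log.info("Delete in +5k steps: batches: {}, empty deletes: {}, rows deleted: {}, total sleep: {} ms, sleep after empty deletes: {} ms, total time: {} ms",
             counter - 1, emptyCounter, total, sleepTotal, emptySleepTotal, (e - s));
}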

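Approach 3

        Starting from the smallest primary key, look up the 5,000th and 5,001st rows in key order. The 5,000th ID is the upper bound of the current delete, and the 5,001st is where the next batch starts. From there, look up the next 5,000th and 5,001st rows, and repeat until every row has been deleted.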

// java
// Also needs java.util.List.
public void offset() throws InterruptedException {
    int limit = 5000;
    int num = limit;
    Long minId = 1630961122999455744L;
    Long maxId = 1631593704371969722L;
    int total = 0;
    int counter = 1;
    long sleepTotal = 0;
    long delTotal = 0;
    long scanTotal = 0;
    log.info("Delete by walking primary key offsets: start...");
    Long startId = minId;
    // ids.get(0) is the last ID of the current batch, ids.get(1) the first ID of the next one.
    List<Long> ids = mapper.scanId(startId, limit);
    Long endId = ids.size() == 0 ? maxId : ids.get(0);
    Long nextId = ids.size() > 1 ? ids.get(1) : null;
    long s = System.currentTimeMillis();
    while (num == limit) {
        long startMill = System.currentTimeMillis();
        num = mapper.delByScan(startId, endId, limit);
        long endMill = System.currentTimeMillis();
        delTotal = delTotal + (endMill - startMill);
        total = total + num;
        if (null == nextId) {
            // No row beyond the batch just deleted, so the walk is complete.
            break;
        } else {
            // Look up the bounds of the next batch before sleeping.
            startId = nextId;
            long startMill2 = System.currentTimeMillis();
            ids = mapper.scanId(startId, limit);
            long endMill2 = System.currentTimeMillis();
            scanTotal = scanTotal + (endMill2 - startMill2);
            endId = ids.size() == 0 ? maxId : ids.get(0);
            nextId = ids.size() > 1 ? ids.get(1) : null;
        }
        long sleep = RandomUtil.randomLong(100, 200);
        // Log roughly 20% of the batches to keep the output manageable.
        if (ThreadLocalRandom.current().nextInt(100) < 20) {
            log.info("[sampled] offset delete: batch {}, deleted {} rows this batch, {} rows so far, took {} ms, sleeping {} ms",
                     counter, num, total, (endMill - startMill), sleep);
        }
        Thread.sleep(sleep);
        counter++;
        sleepTotal = sleepTotal + sleep;
    }
    long e = System.currentTimeMillis();
    log.info("Delete by walking primary key offsets: batches: {}, rows deleted: {}, total sleep: {} ms, total delete time: {} ms, total offset lookup time: {} ms, total time: {} ms",
             counter, total, sleepTotal, delTotal, scanTotal, (e - s));
}

-- sql (the statement behind mapper.scanId)
SELECT `id`
FROM `table`
WHERE `id` > ${minId}
ORDER BY `id`
LIMIT ${limit - 2}, 2
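
The control flow of approach 3 can be checked without a database. The sketch below replays the same loop against an in-memory sorted set standing in for the table. Everything in it is illustrative: the class `OffsetDeleteSimulation` and the helper `sparseIds` are hypothetical, `scanId` mimics the SELECT shown in this article, and `delByScan` assumes a `DELETE ... WHERE id BETWEEN startId AND endId LIMIT limit` statement, which the article does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

class OffsetDeleteSimulation {
    private final NavigableSet<Long> table = new TreeSet<>(); // stand-in for the table's primary key index

    OffsetDeleteSimulation(List<Long> ids) {
        table.addAll(ids);
    }

    /** Mimics: SELECT id ... WHERE id > startId ORDER BY id LIMIT (limit - 2), 2 */
    List<Long> scanId(long startId, int limit) {
        List<Long> out = new ArrayList<>(2);
        int skipped = 0;
        for (Long id : table.tailSet(startId, false)) {
            if (skipped < limit - 2) {
                skipped++;
                continue;
            }
            out.add(id);
            if (out.size() == 2) {
                break;
            }
        }
        return out;
    }

    /** Assumed shape: DELETE ... WHERE id BETWEEN startId AND endId LIMIT limit */
    int delByScan(long startId, long endId, int limit) {
        if (startId > endId) {
            return 0;
        }
        List<Long> victims = new ArrayList<>();
        for (Long id : table.subSet(startId, true, endId, true)) {
            if (victims.size() == limit) {
                break;
            }
            victims.add(id);
        }
        table.removeAll(victims);
        return victims.size();
    }

    /** Same control flow as offset(), minus sleeping and logging. Returns the rows deleted. */
    int run(long minId, long maxId, int limit) {
        int total = 0;
        int num = limit;
        long startId = minId;
        List<Long> ids = scanId(startId, limit);
        long endId = ids.isEmpty() ? maxId : ids.get(0);
        Long nextId = ids.size() > 1 ? ids.get(1) : null;
        while (num == limit) {
            num = delByScan(startId, endId, limit);
            total += num;
            if (nextId == null) {
                break;
            }
            startId = nextId;
            ids = scanId(startId, limit);
            endId = ids.isEmpty() ? maxId : ids.get(0);
            nextId = ids.size() > 1 ? ids.get(1) : null;
        }
        return total;
    }

    int remaining() {
        return table.size();
    }

    /** Builds `count` ascending IDs starting at `first`, spaced `gap` apart, to imitate sparse keys. */
    static List<Long> sparseIds(long first, int count, long gap) {
        List<Long> ids = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            ids.add(first + i * gap);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Long> ids = sparseIds(1_000L, 23, 1_000_003L);
        OffsetDeleteSimulation sim = new OffsetDeleteSimulation(ids);
        int deleted = sim.run(ids.get(0), ids.get(ids.size() - 1), 5);
        System.out.println("deleted=" + deleted + ", remaining=" + sim.remaining());
    }
}
```

Because each batch is bounded by IDs that actually exist, the number of batches depends on the row count rather than on the width of the ID range, which is why sparse Snowflake IDs do not hurt this approach the way they hurt approach 2.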

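Timing comparison

Approach      Elapsed time
Approach 1    38 minutes
Approach 2    Fewer than 20,000 rows deleted in 20 minutes, because Snowflake ID primary keys are not contiguous
Approach 3    5 minutes (sleeping 100 to 200 ms per batch)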

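  1. With the sleep shortened to [30, 50] ms and to [80, 100] ms, the run over the 6 million rows finished in 3 minutes, and CPU usage rose slightly, to around 30%. However, with such short sleeps the deletion was incomplete: two consecutive runs each stopped at about 5.8 million rows, leaving more than 100,000 rows undeleted.

  2. With the sleep set to the sum of the delete time and the offset-lookup time of each batch, three consecutive runs all deleted the full 6 million rows.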
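
The pacing rule in item 2 can be sketched as a small helper that times the two steps of a batch and returns their sum as the next sleep. This is a minimal illustration under stated assumptions, not the code used in the tests above: `AdaptivePacer`, `sleepAfter`, and `pause` are hypothetical names, and the clock is injected so the rule can be exercised without real waiting.

```java
import java.util.function.LongSupplier;

class AdaptivePacer {
    private final LongSupplier clockMillis; // e.g. System::currentTimeMillis

    AdaptivePacer(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /**
     * Runs the delete step and then the offset-lookup step of one batch,
     * and returns how long to sleep afterwards: delete time + lookup time.
     */
    long sleepAfter(Runnable deleteStep, Runnable scanStep) {
        long t0 = clockMillis.getAsLong();
        deleteStep.run();
        long t1 = clockMillis.getAsLong();
        scanStep.run();
        long t2 = clockMillis.getAsLong();
        return (t1 - t0) + (t2 - t1);
    }

    /** Stand-in for real work in the demo below. */
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AdaptivePacer pacer = new AdaptivePacer(System::currentTimeMillis);
        for (int batch = 1; batch <= 3; batch++) {
            // In the real loop these would be mapper.delByScan(...) and mapper.scanId(...).
            long sleep = pacer.sleepAfter(() -> pause(15), () -> pause(5));
            System.out.println("batch " + batch + ": sleeping " + sleep + " ms");
            Thread.sleep(sleep);
        }
    }
}
```

Tying the pause to the measured batch time means the loop backs off automatically when MySQL slows down, instead of relying on a fixed interval that may be too short under load.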
