Notes on a Disk Space Alert and a Study of pg_rman's keep-data-days Parameter

I. Background

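I received a disk space alert. On inspection, the cause turned out to be too many local backups being retained. The fix was simple: after I manually removed the old backups (which had already been copied to a remote server automatically), the alert cleared.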

While reviewing the backup script, however, I noticed that keep-data-days was set to just 1. So why were there 3 backups on local disk (3 days' worth)?

pg_rman backup -b ${BACKUP_TYPE}  -s -C -Z -P --keep-data-days=1 --keep-arclog-files=…(command abbreviated)

I looked up the explanation of this option in the official pg_rman documentation, but it did not really clarify anything.

II. Reading the Source Code

1. The puzzling third backup

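I checked other servers that also set keep-data-days=1 and found that each held only the two most recent days of backups. When I was handling the alert, however, a backup was still running. A reasonable guess is that pg_rman removes expired backups only after the current backup has finished, so there are 3 days of backup files while a backup is running and only 2 once it completes.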

To verify this guess, one can simply run another backup, or read the pg_rman backup source code.

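The excerpt below is from the do_backup function in backup.c. The call to pgBackupDelete comes after all of the backup steps have finished, which is consistent with the guess above.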

int
do_backup(pgBackupOption bkupopt)
{
    parray *backup_list;
    parray *files_database;
    parray *files_arclog;
    parray *files_srvlog;
    int    ret;
    char   path[MAXPGPATH];

    /* repack the necesary options */
    int keep_arclog_files = bkupopt.keep_arclog_files;
    int keep_arclog_days  = bkupopt.keep_arclog_days;
    int keep_srvlog_files = bkupopt.keep_srvlog_files;
    int keep_srvlog_days  = bkupopt.keep_srvlog_days;
    int keep_data_generations = bkupopt.keep_data_generations;
    int keep_data_days        = bkupopt.keep_data_days;

…
    /*
     * Signal for backup_cleanup() that there may actually be some cleanup
     * for it to do from this point on.
     */
    in_backup = true;

    /* backup data */
    files_database = do_backup_database(backup_list, bkupopt);

    /* backup archived WAL */
    files_arclog = do_backup_arclog(backup_list);

    /* backup serverlog */
    files_srvlog = do_backup_srvlog(backup_list);

    pgut_atexit_pop(backup_cleanup, NULL);

    /* update backup status to DONE */
    current.end_time = time(NULL);
    current.status = BACKUP_STATUS_DONE;
   …

    /* Delete old backup files after all backup operation. */
    pgBackupDelete(keep_data_generations, keep_data_days);
    …
    return 0;
}

2. What keep-data-days means

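That settles the question of the 3 backups, but one remains: why does keep-data-days=1 retain 2 days of backup files rather than 1? The following excerpt is from the pgBackupDelete function in delete.c.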

/*
 * Delete backups that are older than KEEP_xxx_DAYS, or have more generations
 * than KEEP_xxx_GENERATIONS.
 */
void
pgBackupDelete(int keep_generations, int keep_days)
{
    int     i;
    parray *backup_list;
    int     existed_generations;
    bool    check_generations;
 …
    /* determine whether to check based on the given days */
    if (keep_days == KEEP_INFINITE)
    {
        check_days = false;
        strncpy(days_str, "INFINITE", lengthof(days_str));
    }
    else
    {
        check_days = true;
        snprintf(days_str, lengthof(days_str),
                "%d", keep_days);
        /*
         * Calculate the threshold day from given keep_days.
         * Any backup taken before this threshold day to be
         * a candidate for deletion.
         */
        tim = current.start_time - (keep_days * 60 * 60 * 24);
        ltm = localtime(&tim);
        ltm->tm_hour = 0;
        ltm->tm_min  = 0;
        ltm->tm_sec  = 0;
        keep_after = mktime(ltm);
        time2iso(keep_after_timestamp, lengthof(keep_after_timestamp),
                    keep_after);
    }
…
}

The key is this comment: Calculate the threshold day from given keep_days. Any backup taken before this threshold day to be a candidate for deletion.

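And how is this threshold day computed? With tim = current.start_time - (keep_days * 60 * 60 * 24); the hour, minute, and second fields are then zeroed, so the threshold is rounded down to local midnight of that day.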

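Take 20230809 as an example. With keep-data-days=1, the threshold day is the start time of the current backup minus 1 day, i.e. 20230808 (at 00:00:00, since the time of day is zeroed). Only backups taken before the threshold are considered expired, so the 20230808 backup does not qualify and is not deleted. The 20230807 backup does count as expired, so it is deleted once the current backup completes.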

3. How to keep only the current day's backup

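With the analysis above, the answer is simple: set keep-data-days=0. The threshold day is then the current backup's start time minus 0 days, i.e. 20230809, so every backup taken before today is expired, and the 20230808 files are deleted once the backup completes. A quick test: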

pg_rman backup -b ${BACKUP_TYPE}  -s -C -Z -P --keep-data-days=0 --keep-arclog-files=…(command abbreviated)

The result matched expectations.
