Source: https://github.com/facebook/rocksdb/wiki/How-to-backup-RocksDB
Backup API
BackupEngine is an object managing a directory of backed-up DBs, with functions to create, restore, delete, and inspect backup DBs. The on-disk format is specific to BackupEngine so that (a) files can be shared between backups even without hard links, and (b) extra metadata can be associated with backups, for data integrity and other purposes. The C++ API is in include/rocksdb/utilities/backupable_db.h (unfortunate legacy name).
A key feature of BackupEngine is taking backups from one FileSystem abstraction to another, such as from a local filesystem to a remote filesystem that is not provided through an OS abstraction. Writes are bound by a configurable RateLimiter and can use many threads. Checksums are computed for added data integrity, with the best data integrity provided when the DB uses whole file checksums with file_checksum_gen_factory.
BackupEngine is thread-safe (although this has not always been the case), but complex safety rules apply when a backup directory is accessed from more than one BackupEngine object simultaneously. (See the API documentation.) BackupEngineReadOnly provides a safer abstraction when writing is not intended.
Creating and verifying a backup
In RocksDB, we have implemented an easy way to back up your DB and verify its correctness. Here is a simple example:
    #include "rocksdb/db.h"
    #include "rocksdb/utilities/backupable_db.h"

    #include <cassert>
    #include <vector>

    using namespace rocksdb;

    int main() {
      Options options;
      options.create_if_missing = true;
      DB* db;
      Status s = DB::Open(options, "/tmp/rocksdb", &db);
      assert(s.ok());
      db->Put(...); // do your thing

      // Open the backup engine on the backup directory.
      BackupEngine* backup_engine;
      s = BackupEngine::Open(Env::Default(), BackupableDBOptions("/tmp/rocksdb_backup"), &backup_engine);
      assert(s.ok());
      s = backup_engine->CreateNewBackup(db);
      assert(s.ok());
      db->Put(...); // make some more changes
      s = backup_engine->CreateNewBackup(db);
      assert(s.ok());

      std::vector<BackupInfo> backup_info;
      backup_engine->GetBackupInfo(&backup_info);

      // you can get IDs from backup_info if there are more than two
      s = backup_engine->VerifyBackup(1 /* ID */);
      assert(s.ok());
      s = backup_engine->VerifyBackup(2 /* ID */);
      assert(s.ok());

      delete db;
      delete backup_engine;
    }
This simple example will create a couple of backups in "/tmp/rocksdb_backup". Note that you can create and verify multiple backups using the same engine.
Backups are normally incremental (see BackupableDBOptions::share_table_files). When you create a new backup with BackupEngine::CreateNewBackup(), only the new data is copied to the backup directory (for more details on what gets copied, see Under the hood).
Once you have some backups saved, you can call BackupEngine::GetBackupInfo() to get a list of all backups together with the timestamp and logical size of each backup. File details are optionally returned, from which sharing details can be determined. GetBackupInfo() even provides a way to open a backup in-place as a read-only DB, which can be useful for inspecting its exact state. Backups are identified by simple increasing integer IDs, which can be saved in an output parameter when creating a new backup or taken from GetBackupInfo().
When BackupEngine::VerifyBackup() is called, it checks the file sizes in the backup directory against the expected sizes recorded from the original DB directory. Checksum verification is optional but requires reading all the data. In either case, the purpose is to check for some sort of quiet failure during backup creation or accidental corruption afterward. BackupEngine::Open() essentially does a VerifyBackup(), without checksums, on each backup to determine whether to classify it as corrupt.
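To make the size check concrete, here is a minimal, dependency-free sketch. It does not use RocksDB at all: FileSizes and SizesMatch are hypothetical names invented for this illustration, and the real BackupEngine logic is more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for backup metadata: file name -> size recorded
// when the backup was created. Not a RocksDB type.
using FileSizes = std::map<std::string, std::uint64_t>;

// Returns true when every expected file is present with its recorded size.
// Extra files in `actual` are ignored in this toy model.
bool SizesMatch(const FileSizes& expected, const FileSizes& actual) {
  for (const auto& entry : expected) {
    const auto it = actual.find(entry.first);
    if (it == actual.end() || it->second != entry.second) {
      return false;  // missing file or size mismatch
    }
  }
  return true;
}
```

A truncated or missing file fails this check without reading any file contents, which is why it is cheap compared with checksum verification.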
Restoring a backup
Restoring is also easy:
    #include "rocksdb/db.h"
    #include "rocksdb/utilities/backupable_db.h"

    #include <cassert>

    using namespace rocksdb;

    int main() {
      // A read-only engine is enough for restoring.
      BackupEngineReadOnly* backup_engine;
      Status s = BackupEngineReadOnly::Open(Env::Default(), BackupableDBOptions("/tmp/rocksdb_backup"), &backup_engine);
      assert(s.ok());
      // Arguments: backup ID, target DB directory, target WAL directory.
      s = backup_engine->RestoreDBFromBackup(1, "/tmp/rocksdb", "/tmp/rocksdb");
      assert(s.ok());
      delete backup_engine;
    }
This code will restore the first backup to "/tmp/rocksdb". The first parameter of BackupEngineReadOnly::RestoreDBFromBackup() is the backup ID, the second is the target DB directory, and the third is the target location of the log files. In some DBs the log directory differs from the DB directory, but usually they are the same; see Options::wal_dir for more info. BackupEngineReadOnly::RestoreDBFromLatestBackup() will restore the DB from the latest backup, i.e., the one with the highest ID.
A checksum is calculated for every restored file and compared against the one stored at backup time. If a checksum mismatch is detected, the restore process is aborted and Status::Corruption is returned.
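The restore-time check can be illustrated with a small self-contained sketch. Everything here is an assumption made for illustration: ToyChecksum is a 32-bit FNV-1a hash used as a stand-in (it is not the checksum BackupEngine uses), and RestoreStatus and RestoreFile are hypothetical names, not RocksDB APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical result type; the real API returns a rocksdb::Status.
enum class RestoreStatus { kOk, kCorruption };

// Stand-in checksum (32-bit FNV-1a), chosen only to avoid dependencies.
std::uint32_t ToyChecksum(const std::string& data) {
  std::uint32_t hash = 2166136261u;
  for (unsigned char byte : data) {
    hash ^= byte;
    hash *= 16777619u;
  }
  return hash;
}

// Copies `backup_bytes` into `*restored` only when its checksum matches the
// value stored at backup time; otherwise leaves `*restored` untouched.
RestoreStatus RestoreFile(const std::string& backup_bytes,
                          std::uint32_t stored_checksum,
                          std::string* restored) {
  if (ToyChecksum(backup_bytes) != stored_checksum) {
    return RestoreStatus::kCorruption;
  }
  *restored = backup_bytes;
  return RestoreStatus::kOk;
}
```

In this model a single flipped byte in the backup copy changes the checksum, so the restore is rejected instead of silently producing a damaged DB.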
Backup directory structure
    /tmp/rocksdb_backup/
    ├── meta
    │   └── 1
    ├── private
    │   └── 1
    │       ├── CURRENT
    │       ├── MANIFEST-000008
    │       └── OPTIONS-000009
    └── shared_checksum
        └── 000007_1498774076_590.sst
The meta directory contains one "meta-file" per backup, named after the backup ID. Among other things, a meta-file lists all files belonging to that backup. The format is described fully in the implementation file (utilities/backupable/backupable_db.cc).
The private directory always contains the non-SST/blob files (options, current, manifest, and WALs). If BackupableDBOptions::share_table_files is unset, it also contains the SST/blob files.
The shared_checksum directory contains the SST/blob files when both BackupableDBOptions::share_table_files and BackupableDBOptions::share_files_with_checksum are set. In this directory, files are named using their name in the original database, their size, and their checksum. Together, these attributes uniquely identify files that can come from multiple RocksDB instances.
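As an illustration of this naming scheme, the sketch below splits a name such as 000007_1498774076_590.sst into its parts. The field order (file number, checksum, size) is an assumption read off the example above rather than something the text states, and SharedFileName and ParseSharedFileName are hypothetical helpers, not RocksDB APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <exception>
#include <string>

// Hypothetical holder for the parts of a shared_checksum file name.
struct SharedFileName {
  std::uint64_t file_number = 0;
  std::uint32_t checksum = 0;
  std::uint64_t size = 0;
  std::string extension;
};

// Parses "<number>_<checksum>_<size>.<ext>" (assumed field order).
// Returns false when a separator is missing or a field is not numeric.
bool ParseSharedFileName(const std::string& name, SharedFileName* out) {
  const std::size_t first = name.find('_');
  if (first == std::string::npos) return false;
  const std::size_t second = name.find('_', first + 1);
  if (second == std::string::npos) return false;
  const std::size_t dot = name.find('.', second + 1);
  if (dot == std::string::npos) return false;
  try {
    out->file_number = std::stoull(name.substr(0, first));
    out->checksum = static_cast<std::uint32_t>(
        std::stoul(name.substr(first + 1, second - first - 1)));
    out->size = std::stoull(name.substr(second + 1, dot - second - 1));
  } catch (const std::exception&) {
    return false;  // a field did not start with a number
  }
  out->extension = name.substr(dot + 1);
  return true;
}
```

Because the size and checksum are part of the name, two different files that happen to share a file number in different DBs still get different names in the shared directory.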
Deprecated: the shared directory (not shown) contains the SST files when BackupableDBOptions::share_table_files is set and BackupableDBOptions::share_files_with_checksum is false. In this directory, files are named using only their name in the original DB. If you ever restore from a backup other than the latest, this could lead to corruption even when backing up only a single DB.
Backup performance
Beware that the backup engine's Open() takes time proportional to the number of existing backups, since it initializes info about the files in each existing backup. So if you target a remote file system (like HDFS) and you have a lot of backups, initializing the backup engine can take some time due to all the network round-trips. We recommend keeping your backup engine alive rather than recreating it every time you need to do a backup or restore.
Another way to keep engine initialization fast is to remove unnecessary backups. To do so, just call PurgeOldBackups(N), where N is how many backups you'd like to keep. All backups except the N newest ones will be deleted. You can also delete an arbitrary backup by calling DeleteBackup(id).
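The retention rule can be sketched in a few lines. This is a toy model only: IdsToPurge is a hypothetical helper that computes which IDs would be removed when keeping the N newest backups, assuming (as stated earlier) that newer backups have higher IDs. It does not call RocksDB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns, in ascending order, the backup IDs that would be deleted if only
// the `keep` newest (highest-ID) backups were retained.
std::vector<std::uint32_t> IdsToPurge(std::vector<std::uint32_t> ids,
                                      std::size_t keep) {
  std::sort(ids.begin(), ids.end());
  if (ids.size() <= keep) {
    return {};  // nothing to delete
  }
  ids.resize(ids.size() - keep);  // drop the `keep` newest from the list
  return ids;
}
```

For example, with backups 1 through 5 and keep = 2, this model reports that backups 1, 2, and 3 would be deleted.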
Also beware that backup performance depends on both reading from the local DB and copying to the backup target. Since you may use different environments for reading and copying, the parallelism bottleneck can be on either side. For example, using more threads for backup (see Advanced usage) won't help if the local DB is on an HDD, because the bottleneck is then disk read throughput, which is already saturated. A small, underpowered HDFS cluster will not show good parallelism either. More threads are beneficial when the local DB is on an SSD and the backup target is a high-capacity HDFS cluster. In our benchmarks, using 16 threads reduced the backup time to 1/3 of that of a single-threaded job.
Under the hood
Creating backups is built on checkpoint. When you call BackupEngine::CreateNewBackup(), it does the following:
- Disable file deletions.
- Get live files (this includes table files, current, options and manifest files).
- Copy live files to the backup directory. Since table files are immutable and filenames unique, we don't copy a table file that is already present in the backup directory. Since version 6.12, we essentially have unique identifiers for SST files, using file numbers and DB session IDs in the SST file properties. Options, manifest and current files are always copied to the private directory, since they are not immutable.
- If flush_before_backup was set to false, we also need to copy log files to the backup directory. We call GetSortedWalFiles() and copy all live log files to the backup directory.
- Re-enable file deletions.
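The copy step above is what makes backups incremental, and it can be modeled in a few lines. This is a simplified sketch, not the engine's code: BackupPlan, IsTableFile, and PlanBackup are hypothetical names, table files are recognized only by an .sst suffix, and the shared directory is modeled as a set of file names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical summary of what one backup would copy.
struct BackupPlan {
  std::vector<std::string> copied_shared;   // table files newly copied
  std::vector<std::string> copied_private;  // mutable files, always copied
};

// Toy rule: treat files ending in ".sst" as immutable table files.
bool IsTableFile(const std::string& name) {
  const std::string suffix = ".sst";
  return name.size() >= suffix.size() &&
         name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Table files already in `shared_dir` are skipped; everything else is copied.
BackupPlan PlanBackup(const std::vector<std::string>& live_files,
                      std::set<std::string>* shared_dir) {
  BackupPlan plan;
  for (const std::string& name : live_files) {
    if (!IsTableFile(name)) {
      plan.copied_private.push_back(name);
    } else if (shared_dir->insert(name).second) {
      plan.copied_shared.push_back(name);  // first time this file is seen
    }
  }
  return plan;
}
```

Running PlanBackup twice against the same shared_dir shows the effect: the second backup copies only the table files that were created since the first one, plus the small mutable files.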
Advanced usage
We can store user-defined metadata in the backups. Pass your metadata to BackupEngine::CreateNewBackupWithMetadata() and then read it back later using BackupEngine::GetBackupInfo(). For example, this can be used to identify backups using different identifiers from our auto-incrementing IDs.
We also back up and restore the options file now. After a restore, you can load the options from the DB directory using rocksdb::LoadLatestOptions() or rocksdb::LoadOptionsFromFile(). The limitation is that not everything in an options object can be serialized to text in a file, so you still need a few manual steps to set up the missing items after restoring and loading. The good news is that far fewer steps are needed than before.
You need to instantiate an Env for the backup target and assign it to BackupableDBOptions::backup_env. Put your backup root directory in BackupableDBOptions::backup_dir. Under that directory, the files will be organized in the structure described above.
BackupableDBOptions::max_background_operations controls the number of threads used for copying files during backup and restore. For distributed file systems like HDFS, it can be very beneficial to increase the copy parallelism.
BackupableDBOptions::info_log is a Logger object that is used to print LOG messages if it is not nullptr. See the Logger wiki.
If BackupableDBOptions::sync is true, we will use fsync(2) to sync file data and metadata to disk after every file write, guaranteeing that backups will be consistent after a reboot or a machine crash. Setting it to false will speed things up a bit, but some (newer) backups might be inconsistent after such a failure. In most cases, everything should be fine, though.
If you set BackupableDBOptions::destroy_old_data to true, creating a new BackupEngine will delete all the old backups in the backup directory.
The BackupEngine::CreateNewBackup() method takes a parameter flush_before_backup, which is false by default. When flush_before_backup is true, BackupEngine will first issue a memtable flush and only then copy the DB files to the backup directory. Doing so prevents log files from being copied to the backup directory (since the flush will delete them). If flush_before_backup is false, no flush is issued before the backup starts, and the backup will also include the log files corresponding to live memtables. The backup will be consistent with the current state of the database regardless of the flush_before_backup parameter.
Further reading
For the implementation, see utilities/backupable/backupable_db.cc.