MANIFEST
The file that persists the metadata describing the storage engine's state.
CURRENT: points to the latest MANIFEST file
MANIFEST-
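In RocksDB, the state of the storage engine at any point in time is stored as a Version (that is, a set of SST files). Every modification to a Version is a VersionEdit, and these VersionEdits are what make up the contents of the manifest-log file.
The basic structure of the MANIFEST log file is as follows: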
version-edit      = Any RocksDB state change
version           = { version-edit* }
manifest-log-file = { version, version-edit* }
                  = { version-edit* }
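The grammar above says that a version is nothing more than the result of replaying version edits in order. A minimal sketch of that idea follows; `SimpleEdit`, `SimpleVersion`, and `Replay` are hypothetical stand-ins invented for illustration and are far simpler than RocksDB's own `VersionEdit` and `Version`.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for a version edit: it only tracks
// which SST file numbers are added and which are removed.
struct SimpleEdit {
  std::vector<uint64_t> added_files;
  std::vector<uint64_t> deleted_files;
};

// In this sketch a version is just the set of live SST file numbers.
using SimpleVersion = std::set<uint64_t>;

// Replays the edits in log order on top of a base version.
SimpleVersion Replay(SimpleVersion base, const std::vector<SimpleEdit>& log) {
  for (const SimpleEdit& edit : log) {
    for (uint64_t f : edit.deleted_files) {
      const bool existed = base.erase(f) == 1;
      assert(existed);  // an edit should only delete files that are live
      (void)existed;
    }
    for (uint64_t f : edit.added_files) {
      base.insert(f);
    }
  }
  return base;
}
```

Replaying "add files 1 and 2" followed by "add file 3, delete file 1" on an empty base leaves files 2 and 3 live, which is the state a reader of the manifest log would reconstruct.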
VersionSet implementation
VersionSet::LogAndApply
->VersionSet::ProcessManifestWrites
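descriptor_log_ is the write handle for the current manifest-log file.
manifest_writers_ is the list of pending writes that still have to be written to the manifest-log file.
ManifestWriter holds an array of VersionEdits (edit_list): the version edits that are about to be written to the manifest file.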
// this is used to batch writes to the manifest file
struct VersionSet::ManifestWriter {
  Status status;
  bool done;
  InstrumentedCondVar cv;
  ColumnFamilyData* cfd;
  const autovector<VersionEdit*>& edit_list;

  explicit ManifestWriter(InstrumentedMutex* mu, ColumnFamilyData* _cfd,
                          const autovector<VersionEdit*>& e)
      : done(false), cv(mu), cfd(_cfd), edit_list(e) {}
};
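To make the batching role of ManifestWriter more concrete, here is a single-threaded sketch of the pattern: the writer at the front of the queue acts as leader, gathers the edits of every queued writer into one batch, and marks them all done. `PendingWrite` and `CommitAsLeader` are hypothetical names; the real code also uses the mutex and condition variable shown above and applies extra rules about which writers may share a batch, none of which is modeled here.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a queued manifest write request.
struct PendingWrite {
  std::vector<std::string> edits;  // stand-in for the VersionEdit list
  bool done = false;
};

// The front writer acts as leader: it collects the edits of all queued
// writers, appends them to the (simulated) manifest log in one go, and
// marks every writer in the batch as done.
std::vector<std::string> CommitAsLeader(
    std::deque<PendingWrite*>& queue, std::vector<std::string>& manifest_log) {
  assert(!queue.empty());
  std::vector<std::string> batch;
  for (PendingWrite* w : queue) {
    batch.insert(batch.end(), w->edits.begin(), w->edits.end());
  }
  manifest_log.insert(manifest_log.end(), batch.begin(), batch.end());
  for (PendingWrite* w : queue) {
    w->done = true;  // a real follower would be woken through its cv here
  }
  queue.clear();
  return batch;
}
```

The point of the design is that several concurrent callers pay for one manifest append and one sync instead of one each.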
Modifying the VersionSet (LogAndApply)
Creating a new MANIFEST
// Create a new manifest once the manifest size crosses the threshold
...
assert(pending_manifest_file_number_ == 0);
// Start a new manifest if none is open yet or if the current one is
// larger than max_manifest_file_size
if (!descriptor_log_ ||
    manifest_file_size_ > db_options_->max_manifest_file_size) {
  TEST_SYNC_POINT("VersionSet::ProcessManifestWrites:BeforeNewManifest");
  pending_manifest_file_number_ = NewFileNumber();
  batch_edits.back()->SetNextFile(next_file_number_.load());
  new_descriptor_log = true;
} else {
  pending_manifest_file_number_ = manifest_file_number_;
}
// This is fine because everything inside of this block is serialized --
// only one thread can be here at the same time
if (new_descriptor_log) {
  // create new manifest file
  ROCKS_LOG_INFO(db_options_->info_log, "Creating manifest %" PRIu64 "\n",
                 pending_manifest_file_number_);
  std::string descriptor_fname =
      DescriptorFileName(dbname_, pending_manifest_file_number_);
  std::unique_ptr<WritableFile> descriptor_file;
  s = NewWritableFile(env_, descriptor_fname, &descriptor_file,
                      opt_env_opts);
  if (s.ok()) {
    descriptor_file->SetPreallocationBlockSize(
        db_options_->manifest_preallocation_size);
    std::unique_ptr<WritableFileWriter> file_writer(new WritableFileWriter(
        std::move(descriptor_file), descriptor_fname, opt_env_opts, env_,
        nullptr, db_options_->listeners));
    descriptor_log_.reset(
        new log::Writer(std::move(file_writer), 0, false));
    // Once the new file writer exists, write the current state to the
    // manifest: DB state, column family info, data file info, log number,
    // and so on, in that order
    s = WriteCurrentStateToManifest(descriptor_log_.get());
  }
}
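The rollover logic above can be modeled in a few lines: when the log has grown past the limit, start a fresh log whose first record is a snapshot of the current state, then append the new edit. This is only an in-memory sketch; `ManifestLog`, `AppendWithRollover`, and the string records are hypothetical, and the size check is a simplification of the condition in the snippet (which also rolls over when no log is open yet).

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory model of one manifest log file.
struct ManifestLog {
  uint64_t file_number = 0;
  std::vector<std::string> records;
  uint64_t size_bytes = 0;

  void Append(const std::string& record) {
    records.push_back(record);
    size_bytes += record.size();
  }
};

// Appends `edit`, first rolling over to a new log if the current one is
// larger than `max_size`. The new log begins with `state_snapshot` so it
// can be replayed without the old log.
void AppendWithRollover(ManifestLog& log, uint64_t max_size,
                        uint64_t next_file_number,
                        const std::string& state_snapshot,
                        const std::string& edit) {
  if (log.size_bytes > max_size) {
    assert(next_file_number > log.file_number);  // file numbers only grow
    ManifestLog fresh;
    fresh.file_number = next_file_number;
    fresh.Append(state_snapshot);
    log = fresh;
  }
  log.Append(edit);
}
```

Because the snapshot is written first, the old manifest can be discarded once the new one is installed: replaying the new file alone reproduces the same state.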
If a new manifest was created, record it in a new CURRENT file (rename is used to guarantee atomicity)
// If we just created a new descriptor file, install it by writing a
// new CURRENT file that points to it.
if (s.ok() && new_descriptor_log) {
  s = SetCurrentFile(env_, dbname_, pending_manifest_file_number_,
                     db_directory);
  TEST_SYNC_POINT("VersionSet::ProcessManifestWrites:AfterNewManifest");
}
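The write-then-rename idea behind updating CURRENT can be sketched with the standard library alone. This is not RocksDB's `SetCurrentFile`; `ManifestName`, `PointCurrentAt`, the `CURRENT.tmp` temporary name, and the six-digit file name format are assumptions made for illustration. The sketch also skips the fsync calls a durable implementation would need, and it relies on POSIX semantics where `rename` replaces an existing target atomically.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>

// Formats a manifest file name such as "MANIFEST-000005" (assumed format).
std::string ManifestName(uint64_t number) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "MANIFEST-%06llu",
                static_cast<unsigned long long>(number));
  return buf;
}

// Points CURRENT at the given manifest: write the name into a temporary
// file, then rename it over CURRENT. Returns false on I/O failure.
bool PointCurrentAt(const std::string& db_dir, uint64_t manifest_number) {
  assert(!db_dir.empty());
  const std::string tmp = db_dir + "/CURRENT.tmp";  // hypothetical temp name
  const std::string current = db_dir + "/CURRENT";
  {
    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    out << ManifestName(manifest_number) << "\n";
    out.flush();
    if (!out) return false;
  }  // the stream is closed here, before the rename
  if (std::rename(tmp.c_str(), current.c_str()) != 0) {
    std::remove(tmp.c_str());
    return false;
  }
  return true;
}
```

A reader that opens CURRENT sees either the old manifest name or the new one, never a half-written file, which is why the new name is never written into CURRENT in place.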
Install new version
Version Edit
Version Edit Layout
A version edit is the delta recorded for each metadata change (adding/deleting files, adding/dropping a column family)
Data Types
Simple data types
VarX - Variable-length encoding of intX
FixedX - Fixed-length encoding of intX
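As an illustration of the two simple types, the sketch below encodes a 32-bit integer both ways: the varint form stores 7 payload bits per byte, low-order group first, with the high bit marking "more bytes follow", while the fixed form always uses 4 little-endian bytes. The helper names `AppendVarint32`, `AppendFixed32`, and `ReadVarint32` are hypothetical and are not RocksDB's own coding functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends `v` as a varint: 7 payload bits per byte, low-order group first,
// high bit set on every byte except the last.
void AppendVarint32(std::string* dst, uint32_t v) {
  assert(dst != nullptr);
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Appends `v` as exactly 4 bytes in little-endian order.
void AppendFixed32(std::string* dst, uint32_t v) {
  assert(dst != nullptr);
  for (int i = 0; i < 4; ++i) {
    dst->push_back(static_cast<char>((v >> (8 * i)) & 0xff));
  }
}

// Decodes a varint32 starting at `*pos` and advances `*pos` past it.
// Returns false if the input is truncated or longer than 5 bytes.
bool ReadVarint32(const std::string& src, size_t* pos, uint32_t* out) {
  uint32_t result = 0;
  for (int shift = 0; shift <= 28 && *pos < src.size(); shift += 7) {
    const uint32_t byte = static_cast<unsigned char>(src[(*pos)++]);
    result |= (byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}
```

For example, 300 takes two bytes as a varint (0xAC 0x02) but four bytes as a Fixed32, which is why small values such as levels and tags are stored as VarX.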
Complex data types
String - Length prefixed string data
+-----------+--------------------+
| size (n) | content of string |
+-----------+--------------------+
|<- Var32 ->|<-- n -->|
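The String layout above can be sketched as a Var32 length followed by the raw bytes. `AppendLengthPrefixed` and `ReadLengthPrefixed` are hypothetical helpers written for this note, with the varint logic inlined so the block stands alone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: appends `v` as a varint32 (7 bits per byte).
void AppendVarint32(std::string* dst, uint32_t v) {
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Appends | size (Var32) | content | for the string `s`.
void AppendLengthPrefixed(std::string* dst, const std::string& s) {
  assert(dst != nullptr);
  assert(s.size() <= UINT32_MAX);
  AppendVarint32(dst, static_cast<uint32_t>(s.size()));
  dst->append(s);
}

// Reads a length-prefixed string at `*pos` and advances `*pos` past it.
// Returns false if the length or the content is truncated.
bool ReadLengthPrefixed(const std::string& src, size_t* pos,
                        std::string* out) {
  uint32_t len = 0;
  size_t p = *pos;
  bool have_len = false;
  for (int shift = 0; shift <= 28 && p < src.size(); shift += 7) {
    const uint32_t byte = static_cast<unsigned char>(src[p++]);
    len |= (byte & 0x7f) << shift;
    if ((byte & 0x80) == 0) {
      have_len = true;
      break;
    }
  }
  if (!have_len || src.size() - p < len) return false;
  out->assign(src, p, len);
  *pos = p + len;
  return true;
}
```

Because the length comes first, a reader can skip over a string it does not care about without scanning for a terminator, and the content may contain any byte value.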
Version Edit Record Types and Layout
Each kind of state change has its own record; the general format is:
+-------------+------ ......... ----------+
| Record ID | Variable size record data |
+-------------+------ .......... ---------+
<-- Var32 --->|<-- varies by type -->
Comparator edit record
Log number edit record
Previous File Number edit record
Next File Number edit record
Last Sequence Number edit record
Max Column Family edit record
Deleted File edit record
Mark a file as deleted from the database.
+-----------------+-------------+--------------+
| kDeletedFile    | level       | file number  |
+-----------------+-------------+--------------+
<--   Var32   --->|<-- Var32 -->|<-- Var64  -->|
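Putting the record ID and the field types together, a deleted-file record could be serialized as in the sketch below. `EncodeDeletedFile` and `AppendVarint` are hypothetical helpers, and the numeric tag value is an assumption used for illustration; check the tag enum of the RocksDB version you are reading before relying on it.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Assumed tag value, for illustration only.
constexpr uint32_t kDeletedFileTag = 6;

// Hypothetical helper: appends `v` as a varint. The same byte layout
// serves both Var32 and Var64 fields.
void AppendVarint(std::string* dst, uint64_t v) {
  assert(dst != nullptr);
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Encodes | tag (Var32) | level (Var32) | file number (Var64) |.
std::string EncodeDeletedFile(uint32_t level, uint64_t file_number) {
  std::string record;
  AppendVarint(&record, kDeletedFileTag);
  AppendVarint(&record, level);
  AppendVarint(&record, file_number);
  return record;
}
```

With these assumptions, deleting file 300 at level 2 produces the four bytes 0x06 0x02 0xAC 0x02: one byte each for the tag and the level, two for the file number.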
New File edit record
Mark a file as newly added to the database and provide RocksDB meta information.
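- File edit record with compaction information
+----------+-------+-------------+-----------+--------------+-------------+----------------+---------------+-----+
| kNewFile | Level | file number | file size | Smallest_key | Largest_key | Smallest_seqno | Largest_seqno | ... |
+----------+-------+-------------+-----------+--------------+-------------+----------------+---------------+-----+
| var32    | var32 | var64       | var64     | String       | String      | var64          | var64         | ... |
+----------+-------+-------------+-----------+--------------+-------------+----------------+---------------+-----+
- File edit record backward compatible
- File edit record with path information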
Column family status edit record
Column family add edit record
Add a column family.
+---------------------+----------------+
| kColumnFamilyAdd    | cf name        |
+---------------------+----------------+
<--     Var32     --->|<--  String  -->|
Column family drop edit record
Drop a column family.
+---------------------+
| kColumnFamilyDrop   |
+---------------------+
<--     Var32     --->|