Baidu File System (BFS) Source Code Analysis Series (2)

Write Path (Part 2)

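This installment continues the analysis of the write path. It is actually fairly complex and touches the implementation of many data structures, so we start with two classes: BlockMappingManager, which manages block metadata, and FileLockManager, which implements locking.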

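BlockMappingManager is the manager for BlockMapping, which is the class that actually manages blocks. The number of BlockMapping buckets is set at nameserver startup by the command-line flag FLAGS_blockmapping_bucket_num.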

    class BlockMappingManager {
    public:
        BlockMappingManager(int32_t bucket_num);
        ~BlockMappingManager();
        // ...
    private:
        int32_t blockmapping_bucket_num_;
        std::vector<BlockMapping*> block_mapping_;
        ThreadPool* thread_pool_;
    };

    BlockMappingManager::BlockMappingManager(int32_t bucket_num) :
        blockmapping_bucket_num_(bucket_num) {
        thread_pool_ = new ThreadPool(FLAGS_blockmapping_working_thread_num);
        block_mapping_.resize(blockmapping_bucket_num_);
        for (size_t i = 0; i < block_mapping_.size(); i++) {
            block_mapping_[i] = new BlockMapping(thread_pool_);
        }
        srand(time(NULL));
    }

    int32_t BlockMappingManager::GetBucketOffset(int64_t block_id) {
        return block_id % blockmapping_bucket_num_;
    }

Every interface of this class that takes a block_id ends up calling GetBucketOffset to locate the specific BlockMapping that owns the block.
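The bucket routing described above can be illustrated with a small sketch. ShardedBlockMap and Bucket are hypothetical names invented for this example (they are not BFS classes), the per-bucket map stands in for BlockMapping, and block ids are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-in for BlockMapping: one independent map per bucket.
struct Bucket {
    std::map<int64_t, int64_t> block_size;  // block_id -> size, illustration only
};

class ShardedBlockMap {
public:
    // bucket_num is expected to be positive.
    explicit ShardedBlockMap(int32_t bucket_num) : buckets_(bucket_num) {
        assert(bucket_num > 0);
    }
    // Same rule as GetBucketOffset in the article: block_id modulo bucket count.
    int32_t BucketOffset(int64_t block_id) const {
        return static_cast<int32_t>(block_id % static_cast<int64_t>(buckets_.size()));
    }
    // Every operation first routes to the owning bucket.
    void Put(int64_t block_id, int64_t size) {
        buckets_[BucketOffset(block_id)].block_size[block_id] = size;
    }
    bool Get(int64_t block_id, int64_t* size) const {
        const Bucket& b = buckets_[BucketOffset(block_id)];
        auto it = b.block_size.find(block_id);
        if (it == b.block_size.end()) return false;
        *size = it->second;
        return true;
    }
private:
    std::vector<Bucket> buckets_;
};
```

Because each bucket is independent, operations on blocks that land in different buckets never touch the same map, which is presumably why the manager splits the mapping this way.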

At nameserver startup, the constructor NameServerImpl::NameServerImpl leads to the following:

    void NameServerImpl::CheckLeader() {
        if (!sync_ || sync_->IsLeader()) {
            NameServerLog log;
            std::function<void (const FileInfo&)> task =
                std::bind(&NameServerImpl::RebuildBlockMapCallback, this, std::placeholders::_1);
            namespace_->Activate(task, &log);
            // The template argument was lost in extraction; void (bool) is assumed here.
            if (!LogRemote(log, std::function<void (bool)>())) {
                // ...
            }
            recover_timeout_ = FLAGS_nameserver_start_recover_timeout;
            start_time_ = common::timer::get_micros();
            work_thread_pool_->DelayTask(1000, std::bind(&NameServerImpl::CheckRecoverMode, this));
            is_leader_ = true;
        } else {
            is_leader_ = false;
            work_thread_pool_->DelayTask(100, std::bind(&NameServerImpl::CheckLeader, this));
        }
    }

Inside namespace_->Activate:

    void NameSpace::Activate(std::function<void (const FileInfo&)> callback, NameServerLog* log) {
        //more code...
        SetupRoot();
        RebuildBlockMap(callback);
        InitBlockIdUpbound(log);
    }

    bool NameSpace::RebuildBlockMap(std::function<void (const FileInfo&)> callback) {
        //more code...
        leveldb::Iterator* it = db_->NewIterator(leveldb::ReadOptions());
        for (it->Seek(std::string(7, '\0') + '\1'); it->Valid(); it->Next()) {
            FileInfo file_info;
            bool ret = file_info.ParseFromArray(it->value().data(), it->value().size());
            // ...
            FileType file_type = GetFileType(file_info.type());
            if (file_type == kDefault) {
                //a file
                for (int i = 0; i < file_info.blocks_size(); i++) {
                    if (file_info.blocks(i) >= next_block_id_) {
                        next_block_id_ = file_info.blocks(i) + 1;
                        block_id_upbound_ = next_block_id_;
                    }
                    ++block_num;
                }
                ++file_num;
                if (callback) {
                    callback(file_info);
                }
            }
        }
    //more code...

The callback invoked here is RebuildBlockMapCallback:

    void NameServerImpl::RebuildBlockMapCallback(const FileInfo& file_info) {
        for (int i = 0; i < file_info.blocks_size(); i++) {
            int64_t block_id = file_info.blocks(i);
            int64_t version = file_info.version();
            block_mapping_manager_->RebuildBlock(block_id, file_info.replicas(), version, file_info.size());
        }
    }

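The process above walks the metadata stored in leveldb at startup and builds the block mappings. It also initializes values such as last_entry_id_ and next_block_id_, which are used to allocate new block ids when files are actually written later; their exact roles are covered in a later article.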
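The startup scan can be reduced to a few lines to show how next_block_id_ is derived. This is only a sketch: SimpleFileInfo, RebuildResult and Rebuild are hypothetical names, a plain vector replaces the leveldb iterator, and the starting value of 1 for the next block id is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical, simplified stand-in for the FileInfo protobuf message.
struct SimpleFileInfo {
    std::vector<int64_t> blocks;
};

struct RebuildResult {
    int64_t next_block_id = 1;  // assumed starting value
    int64_t file_num = 0;
    int64_t block_num = 0;
};

// Walks every file record, tracks the largest block id seen, and hands each
// record to the callback (the role RebuildBlockMapCallback plays in BFS).
RebuildResult Rebuild(const std::vector<SimpleFileInfo>& files,
                      const std::function<void(const SimpleFileInfo&)>& callback) {
    RebuildResult r;
    for (const SimpleFileInfo& f : files) {
        for (int64_t id : f.blocks) {
            assert(id >= 0);  // this sketch assumes non-negative block ids
            if (id >= r.next_block_id) {
                r.next_block_id = id + 1;  // next id is one past the largest seen
            }
            ++r.block_num;
        }
        ++r.file_num;
        if (callback) {
            callback(f);
        }
    }
    return r;
}
```

After the scan, any newly allocated block id is guaranteed to be larger than every id already recorded in the metadata.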

The block structure itself looks like this:

    struct NSBlock {
        int64_t id;
        int64_t version;
        std::set<int32_t> replica;
        int64_t block_size;
        uint32_t expect_replica_num;
        RecoverStat recover_stat;
        std::set<int32_t> incomplete_replica;
        NSBlock();
        NSBlock(int64_t block_id, int32_t replica, int64_t version, int64_t size);
        bool operator<(const NSBlock &b) const {
            return (this->replica.size() >= b.replica.size());
        }
    };

    enum RecoverStat { // state of this block
        kNotInRecover = 0;
        kLoRecover = 1;
        kHiRecover = 2;
        kCheck = 3;
        kIncomplete = 4;
        kLost = 5;
        kBlockWriting = 6;
        kAny = 20;
    }

    void BlockMapping::RebuildBlock(int64_t block_id, int32_t replica,
                                    int64_t version, int64_t size) {
        NSBlock* nsblock = NULL;
        nsblock = new NSBlock(block_id, replica, version, size);
        if (size) { // not clear to me why the state is set based on size
            nsblock->recover_stat = kLost;
            lost_blocks_.insert(block_id);
        } else {
            nsblock->recover_stat = kBlockWriting;
        }

        MutexLock lock(&mu_);
        std::pair<NSBlockMap::iterator, bool> ret =
            block_map_.insert(std::make_pair(block_id, nsblock));
        // ...
    }

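During construction, NSBlock initializes recover_stat with (version < 0 ? kBlockWriting : kNotInRecover), so the version number decides the block's initial state. RebuildBlock then resets the state based on size, which I do not fully understand.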
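To make the two-step assignment easier to see, here is a minimal sketch that replays only what the excerpts show. StatFromVersion and StatAfterRebuild are hypothetical helper names, and only the three enum values involved are reproduced.

```cpp
#include <cassert>
#include <cstdint>

// Only the three states that appear in this code path (values as in the article).
enum RecoverStat { kNotInRecover = 0, kLost = 5, kBlockWriting = 6 };

// Step 1: the initializer described for the NSBlock constructor.
RecoverStat StatFromVersion(int64_t version) {
    return version < 0 ? kBlockWriting : kNotInRecover;
}

// Step 2: RebuildBlock then replaces that value based on size alone.
RecoverStat StatAfterRebuild(int64_t version, int64_t size) {
    RecoverStat stat = StatFromVersion(version);  // constructor default
    if (size) {
        stat = kLost;          // non-empty block: marked lost after a rebuild
    } else {
        stat = kBlockWriting;  // empty block: treated as still being written
    }
    return stat;
}
```

Written this way, it is clear that on the rebuild path the version-derived state never survives: the final state depends only on size.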

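Question:
A block can be in many states. Under what conditions is each one set? For example: a wrong version, a wrong size, not yet synchronized (raft), and so on. This will be analyzed later together with the actual file write.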

The remaining member variables of BlockMapping are covered later, along with the block update and recovery process.

    class BlockMapping {
    public:
        // ...
        bool UpdateBlockInfo(int64_t block_id, int32_t server_id, int64_t block_size, int64_t block_version);
        // ...
    private:
        void DealWithDeadBlockInternal(int32_t cs_id, int64_t block_id);
        typedef std::map<int32_t, std::set<int64_t> > CheckList;
        void TryRecover(NSBlock* block);
        // ...
    private:
        Mutex mu_;
        ThreadPool* thread_pool_;
        typedef std::map<int64_t, NSBlock*> NSBlockMap;
        NSBlockMap block_map_;

        CheckList hi_recover_check_;
        CheckList lo_recover_check_;
        CheckList incomplete_;
        std::set<int64_t> lo_pri_recover_;
        std::set<int64_t> hi_pri_recover_;
        std::set<int64_t> lost_blocks_;
    };

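Next, the FileLockManager implementation. It is easiest to start from where it is used. NameServerImpl::CreateFile creates metadata; because the process is multi-threaded, the same directory may be operated on by several threads at once (for example, creating a file while deleting its directory), and dispatching these operations to different threads could leave the data inconsistent. We look at the implementation first, then at why it is done this way and at the lock granularity: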

    void NameServerImpl::CreateFile(::google::protobuf::RpcController* controller,
                                    const CreateFileRequest* request,
                                    CreateFileResponse* response,
                                    ::google::protobuf::Closure* done) {
        //more code...
        FileLockGuard file_lock(new WriteLock(path));
        StatusCode status = namespace_->CreateFile(path, flags, mode, replica_num, &blocks_to_remove, &log);
        //more code...

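Below are the class declarations for the read lock and the write lock. They hold only path data and are not real locks; in the end they just call the corresponding member functions of file_lock_manager_: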

    class Lock {
    public:
        virtual ~Lock() {}
    };

    class WriteLock : public Lock {
    public:
        WriteLock(const std::string& file_path);
        // Locks two paths: compare them first, lock in lexicographic order,
        // and unlock in reverse order.
        WriteLock(const std::string& file_path_a,
                  const std::string& file_path_b);
        ~WriteLock();
        static void SetFileLockManager(FileLockManager* file_lock_manager);
    private:
        // will be initialized in NameServerImpl's constructor
        static FileLockManager* file_lock_manager_;
        std::vector<std::string> file_path_;
    };

    class ReadLock : public Lock {
    public:
        ReadLock(const std::string& file_path);
        ~ReadLock();
        static void SetFileLockManager(FileLockManager* file_lock_manager);
    private:
        // will be initialized in NameServerImpl's constructor
        static FileLockManager* file_lock_manager_;
        std::string file_path_;
    };

    WriteLock::WriteLock(const std::string& file_path) {
        file_path_.push_back(file_path);
        file_lock_manager_->WriteLock(file_path);
    }

    WriteLock::~WriteLock() {
        if (file_path_.size() == 1) {
            file_lock_manager_->Unlock(file_path_[0]);
        } else { // unlock in reverse order
            file_lock_manager_->Unlock(file_path_[1]);
            file_lock_manager_->Unlock(file_path_[0]);
        }
    }

    ReadLock::ReadLock(const std::string& file_path) {
        file_path_ = file_path;
        file_lock_manager_->ReadLock(file_path);
    }

    ReadLock::~ReadLock() {
        file_lock_manager_->Unlock(file_path_);
    }

    class FileLockManager {
    public:
        FileLockManager(int bucket_num = 19);
        ~FileLockManager();
        void ReadLock(const std::string& file_path);
        void WriteLock(const std::string& file_path);
        void Unlock(const std::string& file_path);
    private:
        enum LockType {
            kRead,
            kWrite
        };
        struct LockEntry {
            common::Counter ref_;
            common::RWLock rw_lock_;
        };
        struct LockBucket {
            Mutex mu;
            std::unordered_map<std::string, LockEntry*> lock_map;
        };
        void LockInternal(const std::string& path, LockType lock_type);
        void UnlockInternal(const std::string& path);
        int GetBucketOffset(const std::string& path);
    private:
        std::vector<LockBucket*> locks_;
    };

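As the implementation shows, locking a path actually locks every level of it. To quote: "The overall locking procedure is: take a write lock on the last-level file and read locks on all preceding directories. Locking starts from the root directory and proceeds downward one level at a time. For the read-write locks, writers take priority, which prevents write operations on a directory from starving." "Because rename involves two different paths, rename always acquires its locks in lexicographic order of the paths, to prevent deadlock."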
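The lexicographic ordering rule quoted above can be sketched as follows. LockOrder and UnlockOrder are hypothetical helpers written for this article, not BFS functions, and treating two identical paths as a single lock is an assumption of the sketch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the order in which two paths would be locked: the
// lexicographically smaller path first, regardless of argument order.
std::vector<std::string> LockOrder(const std::string& a, const std::string& b) {
    if (a == b) return {a};  // assumption: identical paths need only one lock
    if (a < b) return {a, b};
    return {b, a};
}

// Unlocking walks the same list backwards.
std::vector<std::string> UnlockOrder(const std::string& a, const std::string& b) {
    std::vector<std::string> order = LockOrder(a, b);
    assert(!order.empty());
    return std::vector<std::string>(order.rbegin(), order.rend());
}
```

Since every thread that needs both paths derives the same order, two concurrent renames such as A to B and B to A cannot each hold one lock while waiting for the other.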

To modify the metadata of a file or directory, as CreateFile above does, a WriteLock is taken, which ends up in:

    void FileLockManager::WriteLock(const std::string& file_path) {
        std::vector<std::string> paths;
        common::SplitString(file_path, "/", &paths); // split the path into levels
        // first lock "/"
        if (paths.size() == 0) { // write-lock the root directory
            LockInternal("/", kWrite);
            return;
        }
        LockInternal("/", kRead); // otherwise read-lock the root directory
        std::string cur_path;
        for (size_t i = 0; i < paths.size() - 1; i++) {
            cur_path += ("/" + paths[i]);
            LockInternal(cur_path, kRead); // read-lock each directory in turn
        }
        cur_path += ("/" + paths.back()); // write-lock the last level
        LockInternal(cur_path, kWrite);
    }

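The implementation above uses SplitString to split the path on "/" and checks whether any components exist. If there are none, it takes a write lock on the root directory and returns. Otherwise it takes a read lock on the root, then a read lock on the path of each intermediate directory, and finally a write lock on the full path, i.e. the absolute path of the file.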
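As a concrete illustration, the sketch below computes which lock would be taken at each level without locking anything. SplitPath and WriteLockPlan are hypothetical helpers; SplitPath is assumed to behave like common::SplitString for ordinary paths, dropping empty components.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

enum LockType { kRead, kWrite };

// Splits on '/' and drops empty components (assumed SplitString behavior).
std::vector<std::string> SplitPath(const std::string& path) {
    std::vector<std::string> parts;
    std::stringstream ss(path);
    std::string item;
    while (std::getline(ss, item, '/')) {
        if (!item.empty()) parts.push_back(item);
    }
    return parts;
}

// Returns the (path, lock type) pairs in acquisition order.
std::vector<std::pair<std::string, LockType> > WriteLockPlan(const std::string& path) {
    std::vector<std::pair<std::string, LockType> > plan;
    std::vector<std::string> parts = SplitPath(path);
    if (parts.empty()) {              // the path is the root itself
        plan.emplace_back("/", kWrite);
        return plan;
    }
    plan.emplace_back("/", kRead);    // root is read-locked first
    std::string cur;
    for (size_t i = 0; i + 1 < parts.size(); i++) {
        cur += "/" + parts[i];
        plan.emplace_back(cur, kRead);  // every intermediate directory
    }
    assert(!parts.empty());
    cur += "/" + parts.back();
    plan.emplace_back(cur, kWrite);   // only the last level is write-locked
    return plan;
}
```

For "/home/user/file" the plan is: read "/", read "/home", read "/home/user", write "/home/user/file". Two creations in the same directory therefore share read locks on the ancestors and only conflict if they target the same final path.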

    void FileLockManager::LockInternal(const std::string& path,
                                         LockType lock_type) {
        LockEntry* entry = NULL;

        int bucket_offset = GetBucketOffset(path); //hash path
        LockBucket* lock_bucket = locks_[bucket_offset];

        {
            MutexLock lock(&(lock_bucket->mu));
            auto it = lock_bucket->lock_map.find(path);
            if (it == lock_bucket->lock_map.end()) {
                entry = new LockEntry();
                // hold a ref for lock_map_
                entry->ref_.Inc();
                lock_bucket->lock_map.insert(std::make_pair(path, entry));
            } else {
                entry = it->second;
            }
            // inc ref_ first to prevent deconstruct
            entry->ref_.Inc();
        }

        if (lock_type == kRead) {
            // get read lock
            entry->rw_lock_.ReadLock();
        } else {
            // get write lock
            entry->rw_lock_.WriteLock();
        }
    }

    void FileLockManager::ReadLock(const std::string& file_path) {
        std::vector<std::string> paths;
        common::SplitString(file_path, "/", &paths);
        // first lock "/"
        LockInternal("/", kRead);
        std::string cur_path;
        for (size_t i = 0; i < paths.size(); i++) {
            cur_path += ("/" + paths[i]);
            LockInternal(cur_path, kRead);
        }
    }

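That is the concrete locking procedure. The path is first hashed to one LockBucket in the vector, then looked up to find its LockEntry. If none exists, one is constructed with its reference count set to 1 and stored in the map. The count is then incremented by 1 on behalf of the caller. Finally, depending on whether a read or a write lock was requested, the call returns immediately if the lock can be acquired and otherwise blocks until it succeeds.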

Taking a read lock is implemented in much the same way as taking a write lock, so it is not analyzed separately.

Now let's analyze the unlock process:

    void FileLockManager::Unlock(const std::string& file_path) {
        /// TODO maybe use NormalizePath is better
        std::vector<std::string> paths;
        common::SplitString(file_path, "/", &paths);
        std::string path;
        for (size_t i = 0; i < paths.size(); i++) {
            path += ("/" + paths[i]);
        }
        std::string cur_path = path;
        for (size_t i = 0; i < paths.size() ; i++) {
            UnlockInternal(cur_path);
            cur_path.resize(cur_path.find_last_of('/'));
        }
        // last unlock "/"
        UnlockInternal("/");
    }
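The order in which Unlock visits the path levels may be easier to see in isolation. The sketch below only computes that order; SplitPath and UnlockSequence are hypothetical helpers, and SplitPath is assumed to drop empty components the way common::SplitString is used here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Splits on '/' and drops empty components (assumed SplitString behavior).
std::vector<std::string> SplitPath(const std::string& path) {
    std::vector<std::string> parts;
    std::stringstream ss(path);
    std::string item;
    while (std::getline(ss, item, '/')) {
        if (!item.empty()) parts.push_back(item);
    }
    return parts;
}

// Returns the paths in the order Unlock would release them:
// deepest level first, then each parent, and "/" last.
std::vector<std::string> UnlockSequence(const std::string& file_path) {
    std::vector<std::string> parts = SplitPath(file_path);
    std::string cur;
    for (const std::string& p : parts) {
        cur += "/" + p;  // rebuild a normalized absolute path
    }
    std::vector<std::string> seq;
    for (size_t i = 0; i < parts.size(); i++) {
        seq.push_back(cur);
        size_t slash = cur.find_last_of('/');
        assert(slash != std::string::npos);
        cur.resize(slash);  // strip the last component
    }
    seq.push_back("/");
    return seq;
}
```

For "/home/user/file" this yields "/home/user/file", "/home/user", "/home", "/", which is exactly the reverse of the acquisition order used by WriteLock and ReadLock.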

    void FileLockManager::UnlockInternal(const std::string& path) {
        int bucket_offset = GetBucketOffset(path);
        LockBucket* lock_bucket = locks_[bucket_offset];

        MutexLock lock(&(lock_bucket->mu));
        auto it = lock_bucket->lock_map.find(path);
        assert(it != lock_bucket->lock_map.end());
        LockEntry* entry = it->second;
        // release lock
        entry->rw_lock_.Unlock();
        if (entry->ref_.Dec() == 1) {
            // we are the last holder
            /// TODO maybe don't need to deconstruct immediately
            delete entry;
            lock_bucket->lock_map.erase(it);
        }
    }

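To quote: "After an operation on some directory level or file completes, the locks must be released in the reverse order of acquisition. On release, the read-write lock is released first, then the LockEntry's reference count is decremented by 1; if the count drops to 0, the LockEntry is destructed." Note that the code shown checks entry->ref_.Dec() == 1 rather than 0, which is consistent with lock_map holding one reference of its own.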
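The reference-count bookkeeping on its own can be modeled in a few lines. RefTable is a hypothetical class that tracks only the counts (the real LockEntry also owns an RWLock), it is not thread-safe, and it assumes that common::Counter::Dec() returns the value after decrementing.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Tracks only reference counts per path; no actual locking is performed.
class RefTable {
public:
    // Mirrors LockInternal: a new entry starts at 1 (the map's own
    // reference), then every holder adds one more.
    void Acquire(const std::string& path) {
        auto it = refs_.find(path);
        if (it == refs_.end()) {
            it = refs_.emplace(path, 1).first;
        }
        ++it->second;
    }
    // Mirrors UnlockInternal: drop the holder's reference; if only the
    // map's reference is left, erase the entry.
    void Release(const std::string& path) {
        auto it = refs_.find(path);
        assert(it != refs_.end());
        if (--it->second == 1) {
            refs_.erase(it);
        }
    }
    // Returns 0 when no entry exists for the path.
    int64_t Count(const std::string& path) const {
        auto it = refs_.find(path);
        return it == refs_.end() ? 0 : it->second;
    }
private:
    std::unordered_map<std::string, int64_t> refs_;
};
```

With one holder the count is 2, with two holders it is 3, and the entry disappears as soon as the last holder releases it, which matches the "we are the last holder" branch in UnlockInternal.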

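As described in the linked document, the implementation above may have performance problems. To quote:
"Since each file's lock is relatively independent, the mapping information in FileLockManager can be split into buckets;

In the vast majority of cases the metadata of the same file is not operated on concurrently, so every lock and unlock causes a LockEntry to be constructed and destructed. This could be mitigated with a cache: put the LockEntry that is about to be released into the cache and let the cache take care of the final release; when a new LockEntry is needed, simply reclaim one from the cache, similar to the slab allocator in the kernel;"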
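The caching idea in the quote could look roughly like the sketch below. This is not something BFS implements in the code shown; Entry and EntryCache are hypothetical names, Entry stands in for LockEntry, and the cache is not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Entry { int ref = 0; };  // hypothetical stand-in for LockEntry

// A bounded free list: released entries are kept for reuse instead of
// being destroyed immediately.
class EntryCache {
public:
    explicit EntryCache(size_t capacity) : capacity_(capacity) {}

    // Reuse a cached entry if one exists, otherwise allocate a new one.
    std::unique_ptr<Entry> Get() {
        if (free_.empty()) {
            return std::unique_ptr<Entry>(new Entry());
        }
        std::unique_ptr<Entry> e = std::move(free_.back());
        free_.pop_back();
        e->ref = 0;  // reset state before handing it out again
        return e;
    }

    // Keep up to capacity_ entries; anything beyond that is destroyed
    // when the unique_ptr goes out of scope.
    void Put(std::unique_ptr<Entry> e) {
        assert(e != nullptr);
        if (free_.size() < capacity_) {
            free_.push_back(std::move(e));
        }
    }

    size_t cached() const { return free_.size(); }

private:
    size_t capacity_;
    std::vector<std::unique_ptr<Entry> > free_;
};
```

In a design like this, UnlockInternal would hand the entry to the cache instead of calling delete, and LockInternal would call Get() instead of new, trading a little memory for fewer allocations on the hot path.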

TODO
Block recovery, the update process, and synchronization will be analyzed in the next article.
