原文地址: 数据库Buffer Pool实现原理
Introduction
实现一个buffer manager,用来对buffer pool进行管理。
Requirement
In this project milestone, you are going to implement a buffer manager discussed in the class. You should use C++11 to finish this project. If you use more recent C++ standards (e.g., C++17), make sure to update the required standard in CMakeLists.txt. See Tutorial 2 for a quick overview of C++11 features. If your code (as a result of using more recent C++ standards) requires a specific version of GCC/Clang, please document it in your submission and make corresponding changes in your CMakeLists.txt file. You are not allowed to use external libraries other than standard C++ library. All code must run on a Linux machine. We will grade your code on a Linux machine.
Note: It is very important that you are familiar with the provided codebase to be able to finish the required tasks (described later). Make sure you understand the following two sections before attempting the problems in Section 3. Also check out Tutorial 1 for basic information on project architecture and building and working on the project code.
Setting up the Base Code
We have prepared the basic code skeleton for you to start with. Assuming you have set up your repo as in Project Milestone 0, issue the following commands to get it in your code repo:
$ git fetch --all
$ git merge base/master --allow-unrelated-histories
Code Structure
Now you should see a new yase_internal.h header file and a BufferManager directory in which there are multiple header (.h) and source (.cc) files:
- buffer_manager.{h,cc}: buffer manager implementation, including page frame management, replacement algorithm.
- table.{h,cc}: user-visible table abstraction. The test program uses structures defined and implemented in these files to manipulate data.
- file.{h,cc}: implements storage files that hold data for tables. Each table corresponds to a file.
- page.{h,cc}: data and directory page abstractions.
Building the Code
Dependencies: The code depends on the glog, g_ags, gtest libraries for logging, handling command-line parameters and testing, and CMake for configuring and building. You will need to have these libraries installed in your system (or build your own). If you are using CSIL server, then all these packages should have already been installed. Each Linux distribution has its own package manager so you will need to find out yourself. Below are guidelines for Arch Linux and Ubuntu.
For Arch Linux users: The following command will get you all set directly:
$sudo pacman -S cmake gtest gflags google- glog.
Proceed to the "Building the code" section.
For Ubuntu users: To install these libraries in your own Ubuntu system, issue the following commands:
$ sudo apt-get install cmake
$ sudo apt-get install libgoogle-glog-dev libgflags-dev libgtest-dev // Note the 'dev' suffix
To use gtest on Ubuntu, you need to do some extra work to build gtest by yourself as the gtest packages in Ubuntu only give source code stored under /usr/src/gtest
. To do so, follow the steps below:
$ mkdir ~/gtest && cd ~/gtest // Create a gtest folder in your own directory
$ cd gtest
$ cmake -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=on /usr/src/gtest // Build a shared library
$ make -jN // Replace N with the number of threads you'd like to use for compiling
Then issue sudo make install
if you are using your own machine. If you are using CSIL machine, you may skip this step, and change your top-level CMakeLists.txt to include the library install path using -L/home/[username]/gtest
when linking with gtest (i.e., tell the compiler where to find libgtest.so). This can be done by adding the following to the top-level CMakeLists.txt: link_directories("/home/[username]/gtest")
. Finally, in some environments you may need to also add -lpthread to your CMakeLists.txt (e.g., by adding a new line link_libraries("-lpthread -pthread")
.
Building the code: Once you have all the dependencies set up, use the following commands to build the project:
$ mkdir build && cd build // Create a separate directory to build the project
$ cmake -DCMAKE_BUILD_TYPE=[Debug/Release/RelWithDebInfo] ..
$ make -jN
YASE Architecture
We design YASE to be a row-store that is capable of storing fixed-length records (size can be specified by the user) using slotted pages. Free and occupied pages are tracked by directory pages. Each table is backed by a single file that stores both directory and data pages.
Supported operations
Insert, delete, read, and update a record. The BufferManager/table.{h,cc} files define these user- visible interfaces. The insert operation will return the record ID (RID) of the inserted record, the other operations require an RID as the input and returns true if the operation succeeded. RID and page ID are defined in yase_internal.h.
Table and Space Management
Each table is backed by a file (represented by the File structure defined in file.{h,cc}). We use directory pages to manage pages in the table. Upon initialization, the user specifies the maximum amount of data that can be stored by the table (see Table's constructor in table.h). Since the maximum size of a file is known, the number of directory pages is also fixed. In each file, we store all directory pages in the beginning of the file, and each directory entry includes two field: free_slots and allocated, which represents the number of free slots in the corresponding data page and whether the data page was allocated.
Data Page
Data pages follow the bit array design where at the end of the page is a header area that includes (1) record count and (2) a bit array (one bit per slot to indicate whether the slot is vacant). Data records are stored one after another starting from the beginning of the page. To delete a record, we simply reset the slot's bit in the bit array to 0 and decrement record count, then increment the free_slots field in the corresponding directory page.
Page, File, Record IDs
RIDs are 8-byte integers subdivided into three parts (file ID, page ID and slot ID) occupying different parts of the 8-byte (64 bits). The file ID indicates which file (table) this record belongs to; the page ID indicates which page inside the file stores the record; the slot ID indicates the slot in the page storing the record. File IDs are 16-bit integers, while both page and slot IDs are 24-bit long. Detailed definitions for PageID and RID can be found in yase_internal.h. An important note is that page IDs are local to files, i.e., page IDs run from 0 to maximum for each different file ID. Similarly, slot IDs are page-local, running from 0 to maximum for each different page.
Tasks
There are three tasks in this project milestone. Storage-level functionality (insert/delete/read/update) are already provided in the base code. You will need to finish the missing buffer pool functionality based on the given code. Since the code is missing in functionality, buffer_manager_test will not pass until you correctly implement the required components.
Task 1 - Buffer Pool Structure
The buffer pool is defined by the BufferManager class in buffer_manager.h. First, finish the constructor function to initialize the buffer pool with an array of page frames (described by the Page structure). Detailed steps are provided in the code. Also finish the destructor function to free the page frames allocated upon class destruction.
Next, follow the approach discussed in class (slide 5 of Storage Management - Part 2), set up a PageID - Page Frame mapping. Note that PageIDs are local to each file. Your buffer manager should support multiple tables. That is, you should setup another FileID - File object mapping in order to be able to process pages belonging to different table.
Hint: For both mappings, you may use std::map or std::unorderd_map. An example is included in the code at lines 83-87 of buffer_manager.h
Next, implement the PinPage and UnpinPage functions that pins and unpins a page with a given page ID. PinPage is used by the Table and File classes to load a page in the buffer pool in order to modify (e.g., insert record) it. PinPage's signature is Page *PinPage(PageID page_id, bool new_page = false);
. The return value (Page *
)is a page frame that contains a page backed by storage.
The UnpinPage function is as simple as just decrementing the pin count in the Page structure provided.
Check the buffer_manager.{h,cc} for details. You are free to modify the buffer manager code (including removing/replacing variable/function definitions), but you are not allowed to change the insert/delete/read/update interfaces exported by table.h.
Task 2 - LRU Replacement Algorithm
Your PinPage function should use LRU to evict a page when needed. You are free to use different approaches to implement LRU. One way is to follow the approach that is discussed in class, by using a queue of unpinned pages. A page frame (i.e., Page*) is added to the tail of the queue when its pin count is zero. The head page on the queue will be the candidate for eviction. When eviting a page, make sure to properly _ush the page back back using File::FlushPage.
Task 3 - Testing Your Code
Under the Tests directory a new test program (buffer_manager_test.cc) is given to get you started on testing your code. You task here is to add one more test case that covers more than one table. You may follow the similar structure and operations of the existing test, but modify it to use two different tables.
We will test your code using more complex cases so you are encouraged to use this code as a starting point and vary as many parameters as possible to test your code.
Submission
You need to do two things to submit your solution:
- Add a CONTRIBUTIONS file that documents each group member's contribution to your solution. If you are doing the project alone, you may just indicate this in the CONTRIBUTIONS file.
- Check in (commit and push!) your code and the CONTRIBUTIONS file to your project repo by the deadline.
There is no need to submit anything to CourSys (grades will be posted there, though). Make sure you are familiar with the course policies. Especially, we do not accept late submissions or submissions by means other than specified here. You can get partial marks for coding style if your code does not compile. You can (and should) also continue to work on your solutions throughout the semester to possibly get bonus marks in the end of the semester. Sample solutions will not be provided for course project.
(本文出自csprojectedu.com,转载请注明出处)