This page describes how & where Bitcoin Core stores blockchain data.
There are basically four pieces of data that are maintained:
Note that the LevelDB's are redundant in the sense that they can be rebuilt from the block data. But validation and other operations would become intolerably slow without them.
See here: StackExchange post by Pieter Wuille (2013)
Block files store the raw blocks as they were received over the network.
Block files are about 128 MB, allocated in 16 MB chunks to prevent excessive fragmentation. As of October 2015, the block chain is stored in about 365 block files, for a total of about 45 GB.
Each block file (blk1234.dat) has a corresponding undo file (rev1234.dat) which contains the data necessary to remove blocks from the blockchain in the event of a reorganization (fork).
Info about the block files is stored in the block index (the LevelDB) in two places:
Accessing the block data files from the code
The block files are accessed through:
1) DiskBlockPos: a struct that is simply a pointer to a block's location on disk (a file number and an offset.)
2) vInfoBlockFiles: a vector of BlockFileInfo objects. This variable is used to perform such tasks as:
Blocks are written to disk as soon as they are received, in AcceptBlock. (The actual disk write operation is in WriteBlockToDisk [main.cpp:1164]). Note that there is some overlap of the code that accesses block files with the code that accesses and writes to the coins database (/chainstate). There is a complex system of when to flush state to disk. None of this code affects block files, which are simply written to disk when received. Once they have been received and stored, the block files are only needed for serving blocks to other nodes.
More info about block files
See here: the commit that puts multiple blocks in a block file (2012)
The block index holds metadata about all known blocks, including where the block is stored on disk.
Note that the set of "known blocks" is a superset of the longest chain, because it includes blocks that were received and processed but are not part of the active chain - for example, orphaned blocks that were detached from the active chain in a small reorganization.
Terminology
The terminology can be a little confusing here, because while people normally think of the "blockchain" as being synonymous with the active chain (an uninterrupted, linear chain of X blocks starting with the genesis block and continuing to the current tip), there are some places in the code where "blockchain" refers to the active chain plus the numerous, mostly short forks off the chain that our node happens to know about.
a) Block Tree
A better term for the set of known blocks stored on disk is "block tree," as this term contemplates a tree structure with numerous branches (albeit small ones) from the main chain. Indeed, the block index LevelDB is accessed through the "CBlockTreeDB" wrapper class, defined in src/txdb.h. Note that it's perfectly fine, indeed it is expected, that different nodes would have slightly different block trees; what matters is that they agree on the active chain.
Key-value pairs
Inside the actual LevelDB, the used key/value pairs are:
'b' + 32-byte block hash -> block index record. Each record stores: * The block header * The height. * The number of transactions. * To what extent this block is validated. * In which file, and where in that file, the block data is stored. * In which file, and where in that file, the undo data is stored.
'f' + 4-byte file number -> file information record. Each record stores: * The number of blocks stored in the block file with that number. * The size of the block file with that number ($DATADIR/blocks/blkNNNNN.dat). * The size of the undo file with that number ($DATADIR/blocks/revNNNNN.dat). * The lowest and highest height of blocks stored in the block file with that number. * The lowest and highest timestamp of blocks stored in the block file with that number.
'l' -> 4-byte file number: the last block file number used.
'R' -> 1-byte boolean ('1' if true): whether we're in the process of reindexing.
'F' + 1-byte flag name length + flag name string -> 1 byte boolean ('1' if true, '0' if false): various flags that can be on or off. Currently defined flags include: * 'txindex': Whether the transaction index is enabled.
't' + 32-byte transaction hash -> transaction index record. These are optional and only exist if 'txindex' is enabled (see above). Each record stores: * Which block file number the transaction is stored in. * Which offset into that file the block the transaction is part of is stored at. * The offset from the start of that block to the position where that transaction itself is stored.
See here: StackExchange post by Pieter Wuille (2014)
Data Access Layer
The database is accessed through CBlockTreeDB wrapper class. See txdb.h.
The wrapper is instantiated in a global variable called pblocktree, defined in main.cpp.
CBlockIndex
Blocks stored in the database are represented in memory as CBlockIndex objects. An object of this type is first created after the header is received; the code does not wait to receive the full block. When headers are received over the network, they are streamed into a vector of CBlockHeaders, which are then checked. Each header that checks out causes a new CBlockIndex to be created, which is stored to the database.
CBlock / CBlockHeader
Note that these objects have little to do with the /blocks LevelDB. A CBlock holds the full set of transactions in the block, the data for which is stored in two places - in full, in raw format, in the blk???.dat files, and in pruned format in the UTXO database. The block index database cares not for such details, since it holds only the metadata for the block.
Loading the block database into memory
The entire database is loaded into memory on startup. See LoadBlockIndexGuts (txdb.cpp). This only takes a few seconds.
The blocks ('b' keys) are loaded into the global "mapBlockIndex" variable. "mapBlockIndex" is an unordered_map that holds CBlockIndex for each block in the entire block tree; not just the active chain.
mapBlockIndex is described in more detail in Chapter 6 - The Blockchain.
The block file metadata ('f' keys) is loaded into vInfoBlockFiles.
The UTXO database was introduced in 2012 in pull request #1677 - "Ultraprune."
The idea behind "Ultraprune" is to reduce the size of (prune) the set of past transactions, keeping only those parts of past transactions that are necessary to validate later transactions.
Say you have a transaction T1 which takes two inputs and sends to 3 outputs: O1,O2,O3. Two of those outputs (O1, O2) have been used as inputs in a later transaction, T2. Once T2 has been mined, T1 only has one item of interest (O3). There's no reason to keep T1 around in its entirety. Instead, a slimmed-down version of T1 will suffice, consisting only of O3 (locking script and amount) and certain basic information about T1 (height, whether it is a coinbase, etc.)
The description of ultraprune is on the specific "ultraprune" commit within the pull:
See: Ultraprune - July 2012
Terminology
"UTXO (Unspent Transaction Out):" An output from a transaction. This is colloquially referred to as a "coin." For this reason, the UTXO db is sometimes referred to as the "coins database."
"UTXO set / coins database / chainstate database:" These terms are more or less synonymous and are used interchangeably.
"Provably Unspendable:" A coin is provably unspendable if its scriptPubKey cannot be satisfied - for example, an OP_RETURN. A provably unspendable coin can be eliminated from the utxo database regardless of its amount.
Key-value pairs
The records in the chainstate levelDB are:
'c' + 32-byte transaction hash -> unspent transaction output record for that transaction. These records are only present for transactions that have at least one unspent output left. Each record stores: * The version of the transaction. * Whether the transaction was a coinbase or not. * Which height block contains the transaction. * Which outputs of that transaction are unspent. * The scriptPubKey and amount for those unspent outputs. 'B' -> 32-byte block hash: the block hash up to which the database represents the unspent transaction outputs.
See here: StackExchange post by Pieter Wuille (2014)
Data Access Layer and Caching
Access to the UTXO database is considerably more complex than the block index. This is because its performance is critical to the overall performance of the Bitcoin system. The block index is not so critical to performance because there are only a few hundred thousand blocks and a node running on decent hardware can retrieve and scroll through them in a few seconds (and does not need to do so very often.) On the other hand, there are millions of coins in the UTXO database and they must be checked and modified for each input of each transaction going into the mempool or included in a block.
As sipa said in the ultraprune commit:
This is not stated as clearly as it might have been, however; at least, not for the current state of the code.
In 0.11, the instantiations of the CoinsView are:
Separate from that chain of caches is the memory pool's CoinsView, which is backed by the database.
The class diagram (data types) for the views is:
CCoinsView (abstract class) / \ ViewDB ViewBacked (database) / \ ViewMempool ViewCache
Each class has one key characteristic:
Those are the defined classes; whereas the object diagram is:
Database / \ MemPool Blockchain cache (pcoinsTip) View/Cache \ Validation cache
Here is a table summarizing the instantiations of Views:
Object | Type | Backed By? | Description / Purpose |
---|---|---|---|
DB view | ViewDB | n/a | Represents the UTXO set according to the /chainstate LevelDB. Retrieves coins and flushes changes to the LevelDB. Creation in code (instantiation): see init.cpp:1131 |
pCoinsTip (blockchain cache) |
ViewCache | DB view | Holds the UTXO set corresponding to the active chain's tip. Retrieves/flushes to the database view. Creation in code: see init.cpp:1133 |
Validation cache | ViewCache | pCoinsTip | This cache's lifetime is within ConnectTip (or DisconnectTip). Its purpose is to keep track of modifications to the UTXO set while processing a block. If the block validates, the cache is flushed to pcoinsTip. If the block fails, the cache is discarded. Creation in code: see main.cpp:2231: CCoinsViewCache view(pcoinsTip); |
Mempool view | ViewMemPool | pCoinsTip | This object brings the mempool into view, meaning it can see both a UTXO set and the mempool. Its purpose is to enable validation of chains of transactions, a.k.a. "zero-confirmation" transactions. (If chains of transactions weren't permitted, the mempool could simply validate against pcoinsTip.) Thus, when queried, it can check if a given input can be found either in the mempool (i.e., "zero-conf") or in the blockchain's utxo set ("confirmed.") Note that this object is not a cache; rather, it is a view that is used by the object below, which does contain a cache. Creation in code: Its lifetime is that of AcceptToMemoryPool in main.cpp. |
Mempool cache | ViewCache | Mempool view | The cache for the mempool. It contains a cache and sets its backend to be the mempool view. Creation in code: Its lifetime is also that of AcceptToMemoryPool in main.cpp. |
Loading the UTXO set
Access to the coins database is initialized in init.cpp: 1131-1133:
pcoinsdbview = new CCoinsViewDB(nCoinDBCache, false, fReindex); pcoinscatcher = new CCoinsViewErrorCatcher(pcoinsdbview); pcoinsTip = new CCoinsViewCache(pcoinscatcher);
The code starts by initializing a CoinsViewDB, which is equipped with methods to load coins from the LevelDB.
The error catcher is a little hack that can be ignored.
Next, the code initalizes pCoinsTip, which is the cache representing the state of the active chain, and is backed by the database view.
Cache vs. Database
The FetchCoins function in coins.cpp demonstrates how the code uses the cache vs. the database:
1 CCoinsMap::iterator it = cacheCoins.find(txid); 2 if (it != cacheCoins.end()) 3 return it; 4 CCoins tmp; 5 if (!base->GetCoins(txid, tmp)) 6 return cacheCoins.end(); 7 CCoinsMap::iterator ret = cacheCoins.insert(std::make_pair(txid, CCoinsCacheEntry())).first;
First, the code searches the cache for the coins for a given transaction id. (line 1)
If found, it returns the "fetched" coins. (lines 2-3)
If not, it searches the database. (line 5)
If found in the database, it updates the cache. (line 7)
Note: if the cache's backend is another cache, then the term "database" really means "parent cache."
Flushing the Validation Cache to the Blockchain Cache
The validation cache is flushed to the blockchain cache after connecting a block, just before it goes out of scope. The scope is captured in ConnectTip, and specifically, in the code block main.cpp:2231-2243. In that code block, there is a call to ConnectBlock, during which the code stores the new coins in the validation cache. (Specifically, see UpdateCoins() in main.cpp.) At the end of the code block, the validation cache is flushed. Since its "parent view" is also a cache (pcoinsTip, the "blockchain cache") the code will call the parent's ViewCache::BatchWrite, which swaps the updated coin entries into its own cache. (Polymorphism in action: Later, when the the blockchain cache flushes to the database view, the code will run CoinsViewDB::BatchWrite, the last line of which writes to the LevelDB.)
In summary, usage of the validation cache is straightforward: it is instantiated, used, flushed, and goes out of scope in the aforementioned code block.
Flushing the Blockchain Cache to the Database
Flushing the validate cache was simple because the code only shuffled items between two caches in memory (of which no one is aware outside of the caching code.) Flushing the blockchain cache to the database is a bit more complicated. At the lowest level, the mechanics of flushing the blockchain cache (pcoinsTip) is the same as the validation cache: the Flush() method calls BatchWrite on its backend (the "base" pointer), and in this case that means BatchWrite on the database view. Up a level, Flush() is called from FlushStateToDisk (FSTD) - main.cpp:2098. FlushStateToDisk is invoked at a few different points, with a given mode:
Flush Mode | Description | When called |
---|---|---|
IF_NEEDED | Flush only if the cache is over its size limit. | Right after connecting (or disconnecting) a block and flushing the validation cache. See ConnectTip / DisconnectTip. |
ALWAYS | Flush cache. | During initialization only. |
PERIODIC | Here, the code considers other data points to decide whether to flush. Is the code almost over its size limit? Has it been a long time since the cache was flushed? If so, then proceed. |
At end of ActivateBestChain() (Code comment: "write changes periodically to disk, after relay"). |
The idea is to flush the block cache frequently (to avoid having to download a large number of blocks if the program crashes), but the coins cache infrequently (in order to maximize the benefit from the coins cache.)
Specifically, the block cache is guaranteed to be flushed once an hour, whereas the coins cache once per day. (See here: Sipa comment on PR 6102)
The FlushStateToDisk code is well-commented so for more info, the curious reader can check main.cpp.
The undo data contains the information that is necessary to disconnect or "roll back" a block: specifically, the coins that were spent by the block in question.
So, the data being written is essentially a set of CTxOut objects. (A CTxOut is simply an amount and a script - see primitives/transaction.h:107-108).
The matter is complicated slightly by the fact that if the coin is the last one being spent by its transaction, the undo data needs to store the transaction's metadata (the txn's block height, whether it's a coinbase, and its version.) So, if you have a transaction T with outputs O1,O2,O3 spent in that order, for O1 and O2 all that will be written to the undo file is the amount and the script. For 03, the undo file will have the amount, the script, plus T's height and version, and whether T is a coinbase.
The undo data is written to the raw file with the following code:
fileout << blockundo; (main.cpp:1567 [UndoWriteToDisk])
This line of code calls the serialization function on the CBlockUndo - which is basically just a vector of coins (CTxOuts.) Finally, a checksum is written to the undo file. The checksum is used during initialization to verify that any undo data being checked is intact. See Pull 2145
The undo data is used when disconnecting a block. The DisconnectBlock() code is discussed further down this wiki page in The Blockchain: Reorganizations.
LevelDB is a key-value store that was introduced to store the block index and UTXO set (chainstate) in 2012 as part of the complex "Ultraprune" pull (PR 1677). See here: the 27 commits on Ultraprune.
On the subject of why LevelDB is used, core developer Greg Maxwell stated the following to the bitcoin-dev mailing list in October 2015:
One might ask whether different nodes could use different databases - as long as they retrieve the same data, what's the difference? The issue here is "bug-for-bug compatibility" - if one database has a bug that causes records to not be returned under certain circumstances, then all other nodes bst have the same bug, else the network could fork as a result.
Greg Maxwell stated the following in the same thread referenced above (in response to a proposal to switch to using sqlite):