Lightning Memory-Mapped Database Manager (MDB)

Introduction

MDB is a Btree-based database management library modeled loosely on the BerkeleyDB API, but much simplified. The entire database is exposed in a memory map, and all data fetches return data directly from the mapped memory, so no malloc's or memcpy's occur during data fetches. As such, the library is extremely simple because it requires no page caching layer of its own, and it is extremely high performance and memory-efficient. It is also fully transactional with full ACID semantics, and when the memory map is read-only, the database integrity cannot be corrupted by stray pointer writes from application code.
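The zero-copy fetch described above can be illustrated with plain POSIX calls. This is a minimal sketch, not MDB's code: `map_file_readonly` and `fetch_at` are hypothetical helpers invented here, and the "database" is just a flat file. It only shows that a fetch from a read-only map is pointer arithmetic, with no malloc or memcpy involved.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of MDB): map a whole file read-only.
 * Returns the base address, or NULL on failure; *len receives the size. */
static const char *map_file_readonly(const char *path, size_t *len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    /* PROT_READ: a stray write through this pointer faults instead of
     * silently changing the file. */
    void *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (base == MAP_FAILED)
        return NULL;

    *len = (size_t)st.st_size;
    return base;
}

/* Hypothetical "fetch": hand back a pointer into the map, not a copy.
 * Returns NULL when the offset lies outside the mapped region. */
static const char *fetch_at(const char *base, size_t len, size_t offset)
{
    return offset < len ? base + offset : NULL;
}
```

A caller would map the file once, call `fetch_at` as often as needed, and release the region with `munmap` when done. The returned pointers are only valid while the mapping exists.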

The library is fully thread-aware and supports concurrent read/write access from multiple processes and threads. Data pages use a copy-on-write strategy so no active data pages are ever overwritten, which also provides resistance to corruption and eliminates the need for any special recovery procedures after a system crash. Writes are fully serialized; only one write transaction may be active at a time, which guarantees that writers can never deadlock. The database structure is multi-versioned so readers run with no locks; writers cannot block readers, and readers don't block writers.
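The copy-on-write and multi-version behavior can be sketched with a toy in-memory model. Everything here (`Page`, `ToyDB`, `reader_begin`, `writer_set`) is hypothetical and far simpler than MDB's B-tree pages; it is single-threaded, and a real implementation would need an atomic publish step. The point is only that a writer edits a copy and publishes it, so a reader holding the older version is never disturbed.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SLOTS 4

/* Hypothetical page: one version of the data, stamped with the id of the
 * write transaction that produced it. */
typedef struct Page {
    unsigned long txn_id;
    int slots[PAGE_SLOTS];
} Page;

/* Hypothetical database handle: a pointer to the newest committed page. */
typedef struct ToyDB {
    const Page *committed;
} ToyDB;

/* A reader "begins" by remembering whichever page is current right now.
 * It takes no lock; it simply keeps using that version. */
static const Page *reader_begin(const ToyDB *db)
{
    return db->committed;
}

/* A writer never modifies the committed page. It copies it, edits the copy,
 * and publishes the copy with one pointer store. Returns 0 on success.
 * Old pages are deliberately not freed here: reclaiming them safely is a
 * separate problem that this toy leaves out. */
static int writer_set(ToyDB *db, int slot, int value)
{
    if (slot < 0 || slot >= PAGE_SLOTS)
        return -1;

    Page *copy = malloc(sizeof *copy);
    if (copy == NULL)
        return -1;

    if (db->committed != NULL)
        memcpy(copy, db->committed, sizeof *copy);
    else
        memset(copy, 0, sizeof *copy);

    copy->slots[slot] = value;
    copy->txn_id += 1;
    db->committed = copy; /* the previous version stays intact for readers */
    return 0;
}
```

If a reader calls `reader_begin` and a writer then calls `writer_set`, the reader's page still holds the old value while a new `reader_begin` sees the new one. Because the old version is never overwritten, an interrupted write leaves the last committed version usable, which is the intuition behind needing no recovery step.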

Unlike other well-known database mechanisms which use either write-ahead transaction logs or append-only data writes, MDB requires no maintenance during operation. Both write-ahead loggers and append-only databases require periodic checkpointing and/or compaction of their log or database files; otherwise they grow without bound. MDB tracks free pages within the database and re-uses them for new write operations, so the database size does not grow without bound in normal use.

The memory map can be used as a read-only or read-write map. It is read-only by default as this provides total immunity to corruption. Using read-write mode offers much higher write performance, but adds the possibility of stray application writes through pointers silently corrupting the database. Of course if your application code is known to be bug-free (...) then this is not an issue.

Caveats

Troubleshooting the lock file, plus semaphores on BSD systems:

  • A broken lockfile can cause sync issues. Stale reader transactions left behind by an aborted program cause further writes to grow the database quickly, and stale locks can block further operation.

Fix: Check for stale readers periodically, using the mdb_reader_check function or the mdb_stat tool. Or just make all programs using the database close it; the lockfile is always reset on first open of the environment.

  • On BSD systems or others configured with MDB_USE_POSIX_SEM, startup can fail due to semaphores owned by another userid.

Fix: Open and close the database as the user which owns the semaphores (likely last user) or as root, while no other process is using the database.

Restrictions/caveats (in addition to those listed for some functions):

  • Only the database owner should normally use the database on BSD systems or when otherwise configured with MDB_USE_POSIX_SEM. Multiple users can cause startup to fail later, as noted above.
  • A thread can only use one transaction at a time, plus any child transactions. Each transaction belongs to one thread. See below. The MDB_NOTLS flag changes this for read-only transactions.
  • Use an MDB_env* in the process which opened it, without fork()ing.
  • Do not have an MDB database open twice in the same process at the same time, not even via a plain open() call: close()ing it breaks flock() advisory locking.
  • Avoid long-lived transactions. Read transactions prevent reuse of pages freed by newer write transactions, thus the database can grow quickly. Write transactions prevent other write transactions, since writes are serialized.
  • Avoid suspending a process with active transactions. These would then be "long-lived" as above. Also, read transactions suspended when writers commit could sometimes see wrong data.
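Why a long-lived or stale read transaction makes the database grow can be shown with a small model. This is a sketch under stated assumptions, not MDB's actual bookkeeping: `FreedPage`, `NO_READER`, `oldest_reader`, and `count_reusable` are hypothetical names, and the rule used (a freed page is reusable only if it was freed by a transaction older than every active reader's snapshot) is a simplified, conservative assumption.

```c
#include <stddef.h>

#define NO_READER 0UL /* hypothetical marker for an empty reader slot */

/* Hypothetical record of a freed page: the page number and the id of the
 * write transaction that freed it. */
typedef struct FreedPage {
    unsigned long page_no;
    unsigned long freed_by_txn;
} FreedPage;

/* Return the oldest snapshot id still held by a reader, or `fallback` when
 * every slot is empty. Each slot holds the transaction id a reader started
 * from, or NO_READER. */
static unsigned long oldest_reader(const unsigned long *slots, size_t n,
                                   unsigned long fallback)
{
    unsigned long oldest = fallback;
    for (size_t i = 0; i < n; i++) {
        if (slots[i] != NO_READER && slots[i] < oldest)
            oldest = slots[i];
    }
    return oldest;
}

/* Count the freed pages a writer may reuse. Under this toy rule a page is
 * reusable only if it was freed before the oldest active snapshot. With no
 * readers, everything freed up to latest_txn is reusable. */
static size_t count_reusable(const FreedPage *freed, size_t n_freed,
                             const unsigned long *slots, size_t n_slots,
                             unsigned long latest_txn)
{
    unsigned long oldest = oldest_reader(slots, n_slots, latest_txn + 1);
    size_t reusable = 0;
    for (size_t i = 0; i < n_freed; i++) {
        if (freed[i].freed_by_txn < oldest)
            reusable++;
    }
    return reusable;
}
```

In this model a single reader slot stuck at an old transaction id caps `count_reusable` no matter how many pages later writers free, so new writes must take fresh pages and the file grows. Clearing that slot raises the count immediately.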

...when several processes can use a database concurrently:

  • Avoid aborting a process with an active transaction. The transaction becomes "long-lived" as above until a check for stale readers is performed or the lockfile is reset, since the process may not remove it from the lockfile.
  • If you do that anyway, do a periodic check for stale readers. Or close the environment once in a while, so the lockfile can get reset.
  • Do not use MDB databases on remote filesystems, even between processes on the same host. This breaks flock() on some OSes, possibly memory map sync, and certainly sync between programs on different hosts.
  • Opening a database can fail if another process is opening or closing it at exactly the same time.

Author: Howard Chu, Symas Corporation.

Copyright: Copyright 2011-2013 Howard Chu, Symas Corp. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted only as authorized by the OpenLDAP Public License.

A copy of this license is available in the file LICENSE in the top-level directory of the distribution or, alternatively, at <http://www.OpenLDAP.org/license.html>.

Derived From: This code is derived from btree.c written by Martin Hedenfalk.

Copyright (c) 2009, 2010 Martin Hedenfalk <[email protected]>

Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.
