Database Architecture: Reading Guide and Notes

Original paper: Architecture of Database Systems: pdf, a review by a UMichigan student, an overview written by someone else

"This paper presents an architectural discussion of DBMS design principles, including process models, parallel architecture, storage system design, transaction system implementation, query processor and optimizer architectures, and typical shared components and utilities."

It's a pretty long read at 119 pages, so here's a heads-up on what's in it:

Core Concepts

Process Models
Relational Query Processor
Storage Management
Transactions, Concurrency Control, and Recovery
Shared Components

Only item 4 (Transactions, Concurrency Control, and Recovery) is somewhat standard textbook material (at least it is for Stanford's CS 245). Everything else was pretty new to me.

DBMS Architecture: Main Components

Notes and Quotes

The goal of this section is to condense the paper into the fewest snippets that can still reasonably communicate its core concepts. All content here comes from the referenced paper.

1. Introduction

Relational Systems: The Life of a Query

2. Process Models

This is basically a discussion of how to address the execution of concurrent user requests and how to map them to operating system processes or threads.

Keywords

A Lightweight Thread Package is an application-level construct that supports multiple threads within a single OS process. Lightweight threads are scheduled by an application-level thread scheduler in user-space without kernel scheduler involvement or knowledge.
A DBMS Client is the software component that implements the API used by application programs to communicate with a DBMS. Some example database access APIs are JDBC, ODBC, and OLE/DB.
A DBMS Worker is the thread of execution in the DBMS that does work on behalf of a DBMS Client. A single DBMS worker handles all SQL requests from a single DBMS Client.
The topic of interest here is how to map DBMS workers to OS threads or processes.
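
To make the lightweight thread idea concrete, here is a minimal sketch of an application-level scheduler. It assumes Python generators as stand-ins for lightweight threads; the names worker and run_round_robin are made up for illustration and do not come from the paper. The point is only that all switching happens in user space, with no kernel scheduler involvement.

```python
from collections import deque

def worker(name, steps, trace):
    """A lightweight thread: does one step of work, then yields control voluntarily."""
    for i in range(steps):
        trace.append(f"{name}:{i}")
        yield  # cooperative yield point; the kernel never sees this switch

def run_round_robin(threads):
    """Application-level scheduler: resumes each ready thread in turn until all finish."""
    ready = deque(threads)
    while ready:
        t = ready.popleft()
        try:
            next(t)          # run the thread until its next yield
            ready.append(t)  # still runnable, so it goes to the back of the queue
        except StopIteration:
            pass             # the thread finished; drop it

trace = []
run_round_robin([worker("A", 2, trace), worker("B", 3, trace)])
print(trace)
```

Because the threads only switch at explicit yield points, a thread that blocks (for example on I/O) would stall every other thread in the process, which is why such packages are normally paired with non-blocking I/O.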

2.1 Uniprocessors and Lightweight Threads

Assumptions

OS thread support: OS has efficient support for kernel threads; a process can have a very large number of threads; small memory overhead; inexpensive context switches.
Uniprocessor hardware: design for a single machine with a single CPU.

2.1.1. Process per DBMS worker

The OS scheduler manages the timesharing of DBMS workers and the DBMS programmer can rely on OS protection facilities to isolate standard bugs like memory overruns.
Shared in-memory data structures (e.g. lock table, buffer pool) must be explicitly allocated in OS-supported shared memory accessible across all DBMS processes. This requires OS support and some special DBMS coding.
Used by IBM DB2, PostgreSQL, Oracle.
Not great for scaling due to process overhead and context switching.

2.1.2. Thread per DBMS worker

Single multithreaded process hosts all DBMS worker activity; dispatcher thread listens for new client connections.
Each connection gets a new thread.
Issues: OS does not protect threads from each other’s memory overruns and stray pointers; debugging is tricky, especially with race conditions; portability issues due to lack of thread interface standardization.
Extensive use of shared memory.
Scales well for concurrent connections.
Used by IBM DB2, Microsoft SQL Server, MySQL, Informix, and Sybase.
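
The thread-per-worker model can be sketched roughly as follows. This is a toy illustration, not how any of the systems above implement it: SharedState, dbms_worker, and dispatcher are hypothetical names, and the "SQL" is just strings appended to a shared list. It shows one thread per connection, with shared state living in the ordinary process heap and protected by a mutex.

```python
import threading

class SharedState:
    """Heap-resident state visible to every worker thread in the process."""
    def __init__(self):
        self.mutex = threading.Lock()
        self.executed = []  # stands in for shared structures such as the lock table

def dbms_worker(client_id, requests, shared):
    """One worker thread handles every request from its single client."""
    for sql in requests:
        with shared.mutex:  # the OS offers no protection between threads, so lock explicitly
            shared.executed.append((client_id, sql))

def dispatcher(connections, shared):
    """Starts one new thread per client connection and waits for all of them."""
    workers = []
    for client_id, requests in connections:
        t = threading.Thread(target=dbms_worker, args=(client_id, requests, shared))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()
    return len(workers)

shared = SharedState()
connections = [(1, ["SELECT 1", "SELECT 2"]), (2, ["SELECT 3"])]
n_threads = dispatcher(connections, shared)
print(n_threads, sorted(shared.executed))
```

Note that no shared-memory segment has to be set up: the threads see the same heap by default, which is the convenience (and the danger) of this model.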

2.1.3. Process pool

A central process holds all DBMS client connections and, as each SQL request comes in from a client, the request is given to one of the processes in the process pool.
Dynamic resizing of process pool enables more efficient use of memory than one process per worker.
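
Here is a small sketch of the pool idea: requests from many clients are multiplexed onto a bounded set of workers. To keep the example self-contained it uses threads as stand-ins for the pool's OS processes, and a fixed pool size instead of dynamic resizing; both are simplifying assumptions, and pool_worker and serve are hypothetical names.

```python
import queue
import threading

def pool_worker(requests, results, results_mutex):
    """Pool member: serves requests from any client, one request at a time."""
    while True:
        item = requests.get()
        if item is None:  # sentinel value: shut this pool member down
            break
        client_id, sql = item
        with results_mutex:
            results.append((client_id, sql, threading.current_thread().name))

def serve(incoming, pool_size):
    """Central dispatcher: hands each incoming request to whichever pool member is free."""
    requests = queue.Queue()
    results, results_mutex = [], threading.Lock()
    pool = [
        threading.Thread(target=pool_worker,
                         args=(requests, results, results_mutex),
                         name=f"pool-{i}")
        for i in range(pool_size)
    ]
    for t in pool:
        t.start()
    for item in incoming:
        requests.put(item)
    for _ in pool:
        requests.put(None)  # one sentinel per pool member
    for t in pool:
        t.join()
    return results

incoming = [(c, f"SELECT {c}") for c in range(5)]
results = serve(incoming, pool_size=2)
print(len(results), {name for _, _, name in results})
```

Five clients are served by at most two workers, so the number of workers is bounded by the pool size rather than by the number of connections.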

2.1.4. Shared Data and Process Boundaries

Disk I/O Buffers

Database I/O requests: with thread per DBMS worker, the buffer pool is a heap-resident data structure available to all threads in the shared DBMS address space. In the other two models, the buffer pool is allocated in shared memory available to all processes.
Log requests: log entries are staged to an in-memory queue and periodically flushed to the log disk in FIFO order.
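
The log staging described above can be sketched as a FIFO queue that is drained to disk in order. This is a minimal sketch under stated assumptions: the LogTail class is hypothetical, the "disk" is a plain list, and the flush is triggered by a record-count threshold, which is an illustrative policy rather than what the paper prescribes.

```python
from collections import deque

class LogTail:
    """In-memory staging queue for log records, flushed to 'disk' in FIFO order."""
    def __init__(self, flush_threshold):
        self.queue = deque()
        self.flush_threshold = flush_threshold  # assumed policy: flush every N records
        self.disk = []                          # stands in for the log disk

    def append(self, record):
        self.queue.append(record)
        if len(self.queue) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Drain oldest-first so the on-disk order matches the generation order.
        while self.queue:
            self.disk.append(self.queue.popleft())

log = LogTail(flush_threshold=3)
for rec in ["begin T1", "update T1 x", "commit T1", "begin T2"]:
    log.append(rec)
print(log.disk, list(log.queue))
```

After the loop, the first three records are on the simulated disk and the fourth is still staged in memory until the next flush.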

Client Communication Buffers