Database System Implementation

Introduction

Megatron 2000 Implementation Details

   To begin, Megatron 2000 uses the file system to store its relations. For example, the relation Students(name, id, dept) would be stored in the file /usr/db/Students. The file Students has one line for each tuple. Values of components of a tuple are stored as character strings, separated by the special marker character #. For instance, the file /usr/db/Students might look like:
   Smith#123#CS
   Johnson#522#EE
   The database schema is stored in a special file named /usr/db/schema. For each relation, the file schema has a line beginning with that relation name, in which attribute names alternate with types. The character # separates elements of these lines. For example, the schema file might contain lines such as:
   Students#name#STR#id#INT#dept#STR
   Depts#name#STR#office#STR

   (Note that the schema is itself stored as a table.)
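   The schema-line format described above can be read back with a few lines of code. The following is only a minimal Python sketch of that idea, not Megatron 2000's own code; the function name parse_schema_line is hypothetical.

```python
def parse_schema_line(line):
    """Split one schema line into (relation_name, [(attribute, type), ...])."""
    fields = line.strip().split("#")
    name, rest = fields[0], fields[1:]
    if len(rest) % 2 != 0:
        raise ValueError(f"attribute without a type in schema line: {line!r}")
    # After the relation name, attribute names and types alternate.
    attributes = list(zip(rest[0::2], rest[1::2]))
    return name, attributes


name, attrs = parse_schema_line("Students#name#STR#id#INT#dept#STR")
print(name, attrs)
```

   Because attribute names and types alternate, a line with an odd number of fields after the relation name is rejected as malformed.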

   How Megatron 2000 Executes Queries

   SELECT *
   FROM Students
   WHERE id >= 500 | HighId #

  

   Let us consider a common form of SQL query (this simply explains the case above):
   SELECT * FROM R WHERE <Condition>
   Megatron 2000 will do the following:
   1. Read the file schema to determine the attributes of relation R and their types.
   2. Check that the <Condition> is semantically valid for R.
   3. Display each of the attribute names as the header of a column, and draw a line. (This displays the table header.)
   4. Read the file named R, and for each line: (This reads the data and filters it.)
      (a) Check the condition, and
      (b) Display the line as a tuple, if the condition is true.
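   The four steps above can be sketched in Python. This is an illustration under stated assumptions, not the system's real code: the condition is restricted to a single comparison "attr op value", the set of operators is made up for the example, and a temporary directory stands in for /usr/db. The names read_schema and select_all are hypothetical.

```python
import operator
import os
import tempfile

# Comparison operators supported by this sketch (an assumption; the text does
# not list which conditions Megatron 2000 accepts).
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}


def read_schema(db_dir, relation):
    """Step 1: find the relation's line in the schema file."""
    with open(os.path.join(db_dir, "schema")) as f:
        for line in f:
            fields = line.strip().split("#")
            if fields[0] == relation:
                return list(zip(fields[1::2], fields[2::2]))
    raise LookupError(f"unknown relation {relation}")


def select_all(db_dir, relation, attr, op, value):
    """Run SELECT * FROM relation WHERE attr op value; return the output lines."""
    schema = read_schema(db_dir, relation)
    names = [n for n, _ in schema]
    # Step 2: the condition must mention an attribute of the relation.
    if attr not in names or op not in OPS:
        raise ValueError("condition is not valid for this relation")
    pos = names.index(attr)
    convert = int if schema[pos][1] == "INT" else str
    # Step 3: column headers and a separating line.
    out = ["\t".join(names), "-" * 24]
    # Step 4: scan the file, keeping the tuples that satisfy the condition.
    with open(os.path.join(db_dir, relation)) as f:
        for line in f:
            tup = line.rstrip("\n").split("#")
            if OPS[op](convert(tup[pos]), convert(value)):
                out.append("\t".join(tup))
    return out


# Usage with a throwaway directory standing in for /usr/db.
db = tempfile.mkdtemp()
with open(os.path.join(db, "schema"), "w") as f:
    f.write("Students#name#STR#id#INT#dept#STR\n")
with open(os.path.join(db, "Students"), "w") as f:
    f.write("Smith#123#CS\nJohnson#522#EE\n")
result = select_all(db, "Students", "id", ">=", 500)
print("\n".join(result))
```

   Note that every query rereads the schema file and scans the whole data file, which is exactly what the steps describe.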

 
  To execute
  SELECT * FROM R WHERE <condition> | T
  (the interpreter takes the purpose of this statement to be producing a table T),

   Megatron 2000 does the following:
  1. Process query as before, but omit step (3), which generates column headers and a line separating the headers from the tuples.
  2. Write the result to a new file /usr/db/T. (Write the table.)
  3. Add to the file /usr/db/schema an entry for T that looks just like the entry for R, except that relation name T replaces R. That is, the schema for T is the same as the schema for R. (Write the schema.)
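  The three steps for storing a result as a new relation might look as follows in Python. This is a hedged sketch: the function name select_into is hypothetical, the condition is passed as a Python predicate rather than parsed from SQL, and a temporary directory replaces /usr/db.

```python
import os
import tempfile


def select_into(db_dir, source, target, predicate):
    """Write the tuples of `source` that satisfy `predicate` to `target`,
    then register `target` in the schema file with the schema of `source`."""
    schema_path = os.path.join(db_dir, "schema")
    with open(schema_path) as f:
        entries = [line.rstrip("\n") for line in f if line.strip()]
    source_entry = next((e for e in entries if e.split("#")[0] == source), None)
    if source_entry is None:
        raise LookupError(f"unknown relation {source}")

    # Steps 1 and 2: filter as before, but write raw lines and no header.
    with open(os.path.join(db_dir, source)) as src, \
         open(os.path.join(db_dir, target), "w") as dst:
        for line in src:
            if predicate(line.rstrip("\n").split("#")):
                dst.write(line)

    # Step 3: same attribute list, new relation name.
    entries.append("#".join([target] + source_entry.split("#")[1:]))
    with open(schema_path, "w") as f:
        f.write("\n".join(entries) + "\n")


db = tempfile.mkdtemp()
with open(os.path.join(db, "schema"), "w") as f:
    f.write("Students#name#STR#id#INT#dept#STR\n")
with open(os.path.join(db, "Students"), "w") as f:
    f.write("Smith#123#CS\nJohnson#522#EE\n")
select_into(db, "Students", "T", lambda t: int(t[1]) >= 500)
print(open(os.path.join(db, "schema")).read())
```

  The sketch does nothing to protect the schema file if the program stops between writing T and updating the schema, so a crash at that point would leave a data file with no schema entry.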


  SELECT office
  FROM Students, Depts
  WHERE Students.name = 'Smith' AND
  Students.dept = Depts.name #

  The algorithm can be described informally as:
  for (each tuple s in Students)
      for (each tuple d in Depts)
          if (s and d satisfy the WHERE-condition)
              display the office value from Depts;
  (Note: this is the algorithm when the query only filters data.)
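  A direct Python rendering of this nested loop is shown below as a sketch. The relations are held as in-memory lists instead of files, the function name office_of is hypothetical, and the office values are invented for the example.

```python
def office_of(student_name, students, depts):
    """Nested-loop join of Students and Depts, returning matching offices."""
    offices = []
    for s_name, s_id, s_dept in students:      # each tuple s in Students
        for d_name, d_office in depts:         # each tuple d in Depts
            # The WHERE-condition of the query above.
            if s_name == student_name and s_dept == d_name:
                offices.append(d_office)
    return offices


students = [("Smith", 123, "CS"), ("Johnson", 522, "EE")]
depts = [("CS", "Room 101"), ("EE", "Room 202")]   # made-up office values
print(office_of("Smith", students, depts))  # prints ['Room 101']
```

  The condition is tested once for every pair of tuples, so the work grows with the product of the two relation sizes even though only Smith's tuple can ever match.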

Overview of a Database Management System

   In Fig. 1.1 we see an outline of a complete DBMS. Single boxes represent system components, while double boxes represent in-memory data structures. The solid lines indicate control and data flow, while dashed lines indicate data flow only.

   [Fig. 1.1: Outline of a complete DBMS]

  Schema-altering commands are shown in Fig. 1.1 as entered by the database administrator, or DBA, who needs special authority
to execute them, since these commands can have profound effects
on the database. (Schema and data are kept separate, so permission checks can also be performed in different places.)

Overview of Query Processing

There are two paths along which user
actions affect the database:
1. Answering the query. The query is parsed and optimized by a query
compiler. The resulting query plan, or sequence of actions to be taken
to answer the query, is passed to the execution engine. The execution
engine issues a sequence of requests for small pieces of data, typically
records or tuples of a relation, to a resource manager that knows about
data files (holding relations), the format and size of records in those files,
and index files, which help find elements of data files quickly. (Open question: what exactly is the engine's role?)

The requests for data are translated into pages, and these requests are passed
to the buffer manager. Its task is to bring appropriate portions of
the data from secondary storage (disk, normally), where it is kept
permanently, to main-memory buffers. Normally, the page or "disk block" is
the unit of transfer between buffers and disk. The buffer manager
communicates with a storage manager to get data from disk. The storage
manager might involve operating-system commands, but more typically,
the DBMS issues commands directly to the disk controller.
2. Transaction processing. Queries and other actions are grouped into
transactions, which are units that must be executed atomically and in isolation,
as discussed in the introduction to this chapter; often each query or
modification action is a transaction by itself. In addition, the execution of
transactions must be durable, meaning that the effect of any completed
transaction must be preserved even if the system fails in some way right
after completion of the transaction. We divide the transaction processor
into two major parts:
(a) A concurrency-control manager, or scheduler, responsible for
assuring atomicity and isolation of transactions, and
(b) A logging and recovery manager, responsible for the durability of
transactions.
We shall consider these components further in Section 1.2.4.
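To make the durability requirement concrete, here is a small Python sketch of a logging and recovery manager. It assumes a redo-style log in which all changes of a transaction are written as one record at commit time; that is only one possible policy, chosen here for brevity. The class name RedoLog and the JSON record format are hypothetical.

```python
import json
import os
import tempfile


class RedoLog:
    """Append-only log: one JSON record per committed transaction."""

    def __init__(self, path):
        self.path = path

    def commit(self, txn_id, changes):
        """Append the transaction's changes and force them to disk
        before the commit is reported as complete."""
        with open(self.path, "a") as f:
            f.write(json.dumps({"txn": txn_id, "changes": changes}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def recover(self):
        """Rebuild the database state by replaying the log in order."""
        state = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    state.update(json.loads(line)["changes"])
        return state


log = RedoLog(os.path.join(tempfile.mkdtemp(), "log"))
log.commit(1, {"x": 1})
log.commit(2, {"x": 2, "y": 5})
print(log.recover())  # prints {'x': 2, 'y': 5}
```

The sketch ignores partially written records and never truncates the log; it only shows why a completed transaction can survive a crash once its log record is on disk.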

Main-Memory Buffers and the Buffer Manager

All DBMS components that need
information from the disk will interact with the buffers and the buffer manager,
either directly or through the execution engine.
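The interaction between components and the buffer manager can be illustrated with a short Python sketch. Two assumptions are made that the text does not state: pages are evicted in least-recently-used order, and a plain dictionary stands in for the storage manager. The class name BufferManager and its methods are hypothetical.

```python
from collections import OrderedDict


class BufferManager:
    """Keeps at most `capacity` pages in memory; evicts the least recently used."""

    def __init__(self, disk, capacity):
        self.disk = disk              # stand-in for the storage manager: page_id -> bytes
        self.capacity = capacity
        self.buffers = OrderedDict()  # page_id -> page contents, oldest first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.buffers:
            self.buffers.move_to_end(page_id)   # mark as most recently used
            return self.buffers[page_id]
        if len(self.buffers) >= self.capacity:
            self.buffers.popitem(last=False)    # evict the least recently used page
        self.disk_reads += 1
        page = self.disk[page_id]               # a whole page is the unit of transfer
        self.buffers[page_id] = page
        return page


disk = {0: b"page-0", 1: b"page-1", 2: b"page-2"}
bm = BufferManager(disk, capacity=2)
for pid in [0, 1, 0, 2, 1]:
    bm.get_page(pid)
print(bm.disk_reads)  # prints 4
```

Of the five requests, only the second request for page 0 is served from memory. The sketch handles reads only; a real buffer manager must also write modified pages back to disk before evicting them.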

A DBMS offers the
guarantee of durability: that the work of a completed transaction will never be
lost. The transaction manager therefore accepts transaction commands from an
application, which tell the transaction manager when transactions begin and
end, as well as information about the expectations of the application (some may
not wish to require atomicity, for example). The transaction processor performs
the following tasks:
1. Logging: In order to assure durability, every change in the database is
logged separately on disk. The log manager follows one of several policies
designed to assure that no matter when a system failure or "crash" occurs,
a recovery manager will be able to examine the log of changes and restore
the database to some consistent state. The log manager initially writes
the log in buffers and negotiates with the buffer manager to make sure that
buffers are written to disk (where data can survive a crash) at appropriate
times.
2. Concurrency control: Transactions must appear to execute in isolation.
But in most systems, there will in truth be many transactions executing
at once. Thus, the scheduler (concurrency-control manager) must assure
that the individual actions of multiple transactions are executed in such
an order that the net effect is the same as if the transactions had in
fact executed in their entirety, one-at-a-time. A typical scheduler does
its work by maintaining locks on certain pieces of the database. These
locks prevent two transactions from accessing the same piece of data in
ways that interact badly. Locks are generally stored in a main-memory
lock table, as suggested by Fig. 1.1. The scheduler affects the execution of
queries and other database operations by forbidding the execution engine
from accessing locked parts of the database.

Properly implemented transactions are commonly said to meet the "ACID
test," where:
• "A" stands for "atomicity," the all-or-nothing execution of
transactions.
• "I" stands for "isolation," the fact that each transaction must appear
to be executed as if no other transaction is executing at the same
time.
• "D" stands for "durability," the condition that the effect on the
database of a transaction must never be lost, once the transaction
has completed.
The remaining letter, "C," stands for "consistency." That is, all databases
have consistency constraints, or expectations about relationships among
data elements (e.g., a certain attribute is a key, students may not take
more than 8 courses at a time, and so on). Transactions are expected to
preserve the consistency of the database. (Every operation in a transaction should preserve the consistency of the database; a single operation is treated as one operation.)
3. Deadlock resolution: As transactions compete for resources through the
locks that the scheduler grants, they can get into a situation where none
can proceed because each needs something another transaction has. The
transaction manager has the responsibility to intervene and cancel
("abort") one or more transactions to let the others proceed.
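The lock table and deadlock resolution described above can be sketched together in Python. The sketch assumes exclusive locks only and detects deadlock by following "waits for" links between transactions; both choices are simplifications for illustration, and the class name LockTable and its methods are hypothetical.

```python
class LockTable:
    """Exclusive locks only; records which transaction waits for which."""

    def __init__(self):
        self.owner = {}      # item -> transaction holding the lock
        self.waits_for = {}  # blocked transaction -> transaction it waits for

    def request(self, txn, item):
        """Return True if the lock is granted, False if txn must wait."""
        holder = self.owner.get(item)
        if holder is None or holder == txn:
            self.owner[item] = txn
            self.waits_for.pop(txn, None)
            return True
        self.waits_for[txn] = holder
        return False

    def release_all(self, txn):
        """Called when txn commits or is aborted."""
        self.owner = {i: t for i, t in self.owner.items() if t != txn}
        self.waits_for.pop(txn, None)
        # Transactions that were waiting for txn are free to retry.
        self.waits_for = {w: h for w, h in self.waits_for.items() if h != txn}

    def deadlocked(self, txn):
        """Follow waits-for links from txn; a path back to txn is a deadlock."""
        seen, current = set(), self.waits_for.get(txn)
        while current is not None and current not in seen:
            if current == txn:
                return True
            seen.add(current)
            current = self.waits_for.get(current)
        return False


locks = LockTable()
locks.request("T1", "A")          # granted
locks.request("T2", "B")          # granted
locks.request("T1", "B")          # T1 waits for T2
locks.request("T2", "A")          # T2 waits for T1: a cycle
if locks.deadlocked("T2"):
    locks.release_all("T2")       # abort T2 so that T1 can proceed
print(locks.request("T1", "B"))   # prints True
```

Aborting the transaction that closed the cycle is just one possible choice of victim; the text only requires that one or more transactions be cancelled so the others can proceed.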

The Query Processor

1. The query compiler, which translates the query into an internal form called
a query plan. The latter is a sequence of operations to be performed on
the data. Often the operations in a query plan are implementations of
"relational algebra" operations, which are discussed in Section 6.1 and
with which you may be familiar already. The query compiler consists of
three major units:
(a) A query parser, which builds a tree structure from the textual form
of the query.
(b) A query preprocessor, which performs semantic checks on the query
(e.g., making sure all relations mentioned by the query actually
exist), and performs some tree transformations to turn the parse tree
into a tree of algebraic operators representing the initial query plan.
(c) A query optimizer, which transforms the initial query plan into the
best available sequence of operations on the actual data.

The query compiler uses metadata and statistics about the data to decide
which sequence of operations is likely to be the fastest. For example, the
existence of an index can make one plan much faster than another.

2. The execution engine, which has the responsibility for executing each of
the steps in the chosen query plan. The execution engine interacts with
most of the other components of the DBMS, either directly or through
the buffers. It must get the data from the database into buffers in order
to manipulate that data. It needs to interact with the scheduler to avoid
accessing data that is locked, and with the log manager to make sure that
all database changes are properly logged.
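The remark that an index can make one plan much faster than another can be illustrated with a toy plan chooser in Python. The cost formulas below are crude assumptions made up for this example (a full scan reads every block; an index lookup reads one block per index level plus one block per matching tuple), and the function name choose_plan is hypothetical.

```python
def choose_plan(num_blocks, matching_tuples, index_height=None):
    """Pick the cheaper of a full scan and an index lookup, in block reads.

    The cost formulas are deliberately crude assumptions for illustration.
    """
    plans = {"full scan": num_blocks}
    if index_height is not None:
        # Walk the index, then fetch each matching tuple from its own block.
        plans["index lookup"] = index_height + matching_tuples
    best = min(plans, key=plans.get)
    return best, plans[best]


print(choose_plan(num_blocks=1000, matching_tuples=5, index_height=3))
print(choose_plan(num_blocks=1000, matching_tuples=5))
print(choose_plan(num_blocks=10, matching_tuples=50, index_height=3))
```

With these numbers the index wins on the large relation when few tuples match, while the full scan is chosen when no index exists or when the relation is small and many tuples match. This is the kind of decision that depends on the metadata and statistics mentioned above.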

