[Database] Database Notes

Introduction

Main content

  1. the data models
  2. SQL language and user interfaces
  3. Key principles of DBMS: mainly architecture, query optimization, concurrency control, and recovery
  4. the security and integrity constraints of databases
  5. Database design
  6. new research and application fields

What is a database

definition

A very large, integrated collection of data.

Function

Models a real-world enterprise:

  1. entities, e.g. students and courses
  2. relationships, e.g. course enrollment (students taking courses)

Why use a DBMS

  1. data independence and efficient access
  2. reduced application development time
  3. data integrity and security
  4. uniform data administration
  5. concurrent access, recovery from crashes

File vs. Database

With plain files, every application has to handle the following itself; a DBMS provides them centrally:

  1. staging large datasets between main memory and secondary storage
  2. special code for different queries
  3. protecting data from inconsistency caused by multiple concurrent users
  4. crash recovery
  5. security and access control

Concepts

  1. Data are symbols that describe things in the real world; they are the form in which information exists
  2. Data model is a collection of concepts and definitions for describing data
  3. a schema is a description of a particular collection of data, using a given data model.
  4. the relational model of data is the most widely used model today
    1. Relation: basically a table with rows and columns
    2. every relation has a schema, which describes the columns or fields

The ANSI-SPARC architecture

[Figure: the ANSI-SPARC three-level architecture]

  1. views: views describe how users see the data
  2. conceptual schema: defines the logical structure
  3. physical schema: describes the files and indexes used

What is a DBMS

definition

A software package designed to store and manage databases.

History of DBMS

  1. no data model
  2. simple file operation
  3. network data model
  4. hierarchy data model
  5. relational data model

database system

  1. database system = applications + DBMS + database + DBA (database administrator)
  2. DBMS is the core of database system
    1. high level user interface
    2. query processing and optimization
    3. catalog management
    4. concurrency control and recovery
    5. integrity constraints checking
    6. Access control

Data Models

Hierarchical Data Model

Basic idea

Because many things in the real world are organized hierarchically, the hierarchical model describes the real world with a tree structure.

Basic concepts

  1. Record
  2. Field
  3. PCR (parent-child relationship): the most basic data relationship in the hierarchical model

[Figure 1]

Hierarchical data schema
  1. a hierarchical data schema consists of PCRs.
  2. Every PCR expresses one 1:N relationship
  3. every record type except the root has exactly one parent

[Figure 2]

Virtual record

Virtual records are used to represent situations that the hierarchical model cannot represent directly, such as M:N relationships or a record type with more than one parent.

[Figure 3]

Network data model

Basic idea

  1. The basic data structure is the “set”, which represents a 1:N relationship between things in the real world. The “1” side is called the owner, and the “N” side is called the member
  2. One record type can be the owner of multiple sets, and can also be a member of multiple sets. Many sets together form a network structure that expresses the real world
  3. It breaks through the limits of the hierarchical structure, so it can express non-hierarchical data more easily
  4. Records and data items: data items are similar to fields in the hierarchical model, but a data item can be a vector
  5. Link record type: used to represent self relationships and end-to-end relationships

Set

The basic unit of a network data schema.

[Figure 4]

Link record type

Link record types are used to represent relationships that a set cannot represent directly:

  1. self relationships
  2. end-to-end relationships

[Figure 5]

L represents a LINK record type.

Relation data model

basic idea

  1. The basic data structure is the “table”, or relation. Things in the real world and the relationships between them are all expressed as tables, so they can be studied with rigorous mathematical methods. This raised database technology to a theoretical level.

features

  1. based on set theory, with a high level of abstraction
  2. hides all lower-level details; simple, clear, and easy to understand
  3. supports a new algebraic system: relational algebra
  4. non-procedural query language: SQL
  5. soft links: the essential difference from earlier data models

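Soft link

Relationships are represented by matching attribute values (for example, a foreign key value equal to a key value in another table) instead of by physical pointers.

[Figure 6]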

Some concepts

Attributes and domain
  1. the features of an entity in real world are expressed as attributes in relational model
  2. the set of values an attribute may take is called its domain. Example: age is a positive integer and cannot be larger than 1000
Relation and tuple
  1. an entity of the real world can be expressed as one or more relations
  2. a relation is an n-ary relationship defined on the domains of all its attributes: $R=(A_1, A_2, \dots, A_n)$
  3. This is called the schema of R; n, the number of attributes, is called the degree of R.
Primary key
  1. a set of attributes is a candidate key for a relation if
    1. no two distinct tuples can have the same values in this set of attributes, and
    2. this is not true for any proper subset of the set (e.g. if id is unique, id + name is also unique, but only id is minimal)
  2. a set of attributes that satisfies condition 1 but not necessarily condition 2 is a super key: id is a candidate key, and “id + name” is a super key
  3. if there is more than one candidate key for a relation, one of them is chosen to be the primary key, and the others are called alternate keys
  4. if the primary key consists of all attributes of a relation, it is called an all-key
  5. a key determines a tuple uniquely: sid is a key for Students, and the set {sid, gpa} is a super key
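The key definitions can be checked mechanically on one relation instance. The following is a minimal C sketch, not a DBMS feature. It assumes a hypothetical relation with three integer-coded columns (for example id, name, gpa) and passes attribute sets as bit masks. A key is a property of the schema, so checking one instance can only refute a key; it cannot prove one.

```c
#include <stdbool.h>
#include <stddef.h>

#define NCOLS 3 /* hypothetical relation with 3 integer-coded attributes */

typedef struct { int v[NCOLS]; } Tuple;

/* attrs is a bit mask of column indexes: bit c set = attribute c is in the set.
   Super key (on this instance): no two distinct tuples agree on every
   attribute of the set. */
bool is_superkey(const Tuple *t, size_t n, unsigned attrs) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            bool same = true;
            for (int c = 0; c < NCOLS && same; c++)
                if (((attrs >> c) & 1u) && t[i].v[c] != t[j].v[c])
                    same = false;
            if (same) return false; /* two tuples collide on the set */
        }
    }
    return true;
}

/* Candidate key: a super key with no proper subset that is a super key.
   Dropping one attribute at a time is enough, because every superset of a
   super key is itself a super key. */
bool is_candidate_key(const Tuple *t, size_t n, unsigned attrs) {
    if (!is_superkey(t, n, attrs)) return false;
    for (int c = 0; c < NCOLS; c++)
        if (((attrs >> c) & 1u) && is_superkey(t, n, attrs & ~(1u << c)))
            return false; /* a smaller set already identifies tuples */
    return true;
}
```

With columns (id, name, gpa), mask 0x1 is {id} and 0x3 is {id, name}. On an instance with distinct ids, {id} passes as a candidate key, while {id, name} is only a super key.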
Foreign key

A set of attributes in one relation that is used to “refer” to a tuple in another relation, like a logical pointer.

ER Data Model

  1. entity (E): a real-world object distinguishable from other objects. An entity is described by a set of attributes
  2. entity set: a collection of similar entities
    1. all entities in an entity set have the same set of attributes
    2. each entity set has a key
    3. each attribute has a domain
    4. composite and multi-valued attributes are permitted
  3. relationship (R): an association among two or more entities
    1. relationships can have attributes
  4. relationship set: a collection of similar relationships

[Figure 7]

Object-Oriented Data Model

Relational algebra

Basic operations

  1. selection ($\sigma$): selects a subset of rows from a relation
  2. projection ($\pi$): deletes unwanted columns from a relation
  3. cross-product ($\times$): allows us to combine two relations
  4. set-difference ($-$): tuples in reln. 1 but not in reln. 2
  5. union ($\cup$): tuples in reln. 1 or in reln. 2

$\{\sigma, \pi, \cup, -, \times\}$ is a complete set of operations: the other operations, such as join and division, can be expressed with them

The algebra is “closed”: every operation takes relations as input and returns a relation, so operations can be composed

Other operations

  1. condition join: $R \bowtie_C S = \sigma_C(R\times S)$
  2. division: $A / B = \{\langle x\rangle \mid \forall \langle y\rangle \in B:\ \langle x, y\rangle \in A\} = \pi_x(A)-\pi_x((\pi_x(A)\times B)-A)$
  3. outer union ($\underline{\cup}$): the values of attributes that do not exist in the original tuples are filled with NULL
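Division can be traced on small data. The following is a minimal C sketch that assumes both relations hold plain int values: a hypothetical A(x, y) and B(y). It checks the "for all y in B" condition directly rather than evaluating the difference expression. Both approaches yield the same set.

```c
#include <stdbool.h>
#include <stddef.h>

/* One tuple <x, y> of the dividend relation A (hypothetical int-only schema). */
typedef struct { int x; int y; } Pair;

/* true if <x, y> is a tuple of A */
static bool contains_pair(const Pair *a, size_t na, int x, int y) {
    for (size_t i = 0; i < na; i++)
        if (a[i].x == x && a[i].y == y) return true;
    return false;
}

/* true if x is already in out[0..nout) (keeps the result a set) */
static bool seen(const int *out, size_t nout, int x) {
    for (size_t i = 0; i < nout; i++)
        if (out[i] == x) return true;
    return false;
}

/* A / B: every x in pi_x(A) such that <x, y> is in A for all y in B.
   Writes distinct x values to out (capacity >= na) and returns how many. */
size_t divide(const Pair *a, size_t na, const int *b, size_t nb, int *out) {
    size_t nout = 0;
    for (size_t i = 0; i < na; i++) {
        int x = a[i].x;
        if (seen(out, nout, x)) continue;      /* x already accepted */
        bool all = true;
        for (size_t j = 0; j < nb && all; j++)
            all = contains_pair(a, na, x, b[j]); /* "for all y in B" */
        if (all) out[nout++] = x;
    }
    return nout;
}
```

If B is empty, the condition holds vacuously, so every x in $\pi_x(A)$ is returned. This matches the formula, because $\pi_x(A)\times B$ is then empty.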

Relational Calculus

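Algebra describes the procedure (the order of operations) used to compute a result; calculus only describes the conditions the result must satisfy, not how to compute it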

  1. Two flavors:
    1. tuple relational calculus: variables range over tuples
    2. domain relational calculus: variables range over domain elements

Tuple relational calculus

Query has the form:

$\{t\langle\text{attribute list}\rangle \mid P(t)\}$

t is called a tuple variable

The answer includes all tuples t that make the formula P(t) true

Example: find the names of all sailors whose rating is above 7 and who are younger than 50

$\{t[N] \mid t\in Sailors \land t.T>7 \land t.A<50\}$

Domain relational calculus

  1. General form:
    1. Query has the form: $\{\langle x_1, x_2, \dots, x_n\rangle \mid P(x_1, x_2, \dots, x_n, \dots, x_{n+m})\}$
    2. $x_1, x_2, \dots, x_{n+m}$ are called domain variables; only $x_1, x_2, \dots, x_n$ appear in the result
    3. the answer includes all tuples $\langle x_1, x_2, \dots, x_n\rangle$ that make the formula $P(x_1, x_2, \dots, x_n, \dots, x_{n+m})$ true
    4. the formula is recursively defined, starting with simple atomic formulas and building bigger formulas using the logical connectives
Formula
atomic formula
  1. a formula consisting of a single atomic operation:
  2. $\langle x_1, x_2, \dots, x_n\rangle \in Rname$, or $X \text{ op } Y$, or $X \text{ op constant}$, where op is one of $>, <, =, \le, \ge, \ne$
Definition
  1. an atomic formula, or
  2. $\lnot p$, $p\land q$, $p\lor q$, where p and q are formulas, or
  3. $\exists X(P(X))$ or $\forall X(P(X))$, where $X$ is free in $P(X)$. If a quantifier is applied to X, X is bound; if X is not bound, X is free

Queries that have an infinite number of answers are called unsafe, e.g. $\{\langle I,N,T,A\rangle \mid \lnot(\langle I,N,T,A\rangle \in Sailors)\}$

Example: find all sailors with a rating above 7

$\{\langle I,N,T,A\rangle \mid \langle I,N,T,A\rangle \in Sailors \land T>7\}$

Differences and Similarities between relational calculus and relational algebra

  1. differences
    1. relational algebra needs to specify the order of operations
    2. relational calculus only needs to indicate the logic condition the result must be fulfilled
  2. similarities:
    1. they are equivalent in terms of expressive power (for safe calculus queries)
    2. SQL can express any query that is expressible in relational algebra or relational calculus

User Interfaces and SQL Language

Content

  1. query language
    1. formal query language
    2. tabular query language
    3. graphic query language
    4. limited natural language query language
  2. interfaces and maintenance tools
  3. APIs
  4. class library

Important terms and concepts

  1. base table
  2. view
  3. data types supported
  4. null
  5. unique
  6. default
  7. primary key
  8. foreign key
  9. check integrity constraint

Conceptual evaluation strategy

The semantics of an SQL query are defined in terms of the following conceptual evaluation strategy:

  1. Compute the cross-product of relation-list.
  2. Discard resulting tuples if they fail qualifications.
  3. Delete attributes that are not in target-list.
  4. If DISTINCT is specified, eliminate duplicate rows.
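The four steps map directly onto nested loops. The following C sketch covers one hypothetical query over Sailors(sid, sname) and Reserves(sid, bid). Real query processors do not evaluate queries this way; the strategy only defines what the answer must be.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical rows of Sailors(sid, sname) and Reserves(sid, bid). */
typedef struct { int sid; const char *sname; } Sailor;
typedef struct { int sid; int bid; } Reserve;

/* Conceptual evaluation of
     SELECT DISTINCT S.sname FROM Sailors S, Reserves R
     WHERE S.sid = R.sid AND R.bid = bid
   Writes result names to out (capacity >= ns * nr) and returns the count. */
size_t eval_query(const Sailor *s, size_t ns, const Reserve *r, size_t nr,
                  int bid, const char **out) {
    size_t nout = 0;
    for (size_t i = 0; i < ns; i++) {          /* 1. cross-product ...      */
        for (size_t j = 0; j < nr; j++) {      /*    ... of the FROM list   */
            if (!(s[i].sid == r[j].sid && r[j].bid == bid))
                continue;                      /* 2. discard failing tuples */
            const char *name = s[i].sname;     /* 3. keep target-list attrs */
            int dup = 0;                       /* 4. DISTINCT: drop repeats */
            for (size_t k = 0; k < nout && !dup; k++)
                dup = strcmp(out[k], name) == 0;
            if (!dup) out[nout++] = name;
        }
    }
    return nout;
}
```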

Levels of abstraction: ANSI-SPARC Architecture

[Figure 8]

  1. views describe how users see the data
  2. conceptual schema defines logical structure
  3. physical schema describes the files and indexes used

Query Language

Category

  1. Data Definition Language (DDL): used to define, drop, or alter the data schema
  2. Query Language (QL): used to retrieve data
  3. Data Manipulation Language (DML): used to insert, delete, and update data
  4. Data Control Language (DCL): used to control users' access rights to data

Basic SQL query

[Figure 9]

  1. compute the cross-product of relation-list
  2. discard resulting tuples if they fail qualifications
  3. delete attributes that are not in target list
  4. if DISTINCT is specified, eliminate duplicate rows

Union

definition

UNION can be used to compute the union of any two union-compatible sets of tuples (e.g. the results of two queries)

example
  1. question: find the sids of sailors who’ve reserved a red or a green boat

  2. solution 1: use an OR condition:

    SELECT S.sid FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND (B.color='red' OR B.color='green')
    
  3. solution 2: use UNION:

    SELECT S.sid FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND (B.color='red')
    UNION
    SELECT S.sid FROM Sailors S, Boats B, Reserves R
    WHERE S.sid=R.sid AND R.bid=B.bid AND (B.color='green')
    

Intersect

  1. question: find the sids of sailors who’ve reserved a red and a green boat

  2. solution 1: use AND conditions. A single boat cannot be both red and green, so the condition B.color='red' AND B.color='green' on one boat returns no rows; two copies of Boats and Reserves are needed:

    SELECT S.sid FROM Sailors S, Boats B1, Reserves R1, Boats B2, Reserves R2
    WHERE S.sid=R1.sid AND R1.bid=B1.bid AND S.sid=R2.sid AND R2.bid=B2.bid
    AND (B1.color='red' AND B2.color='green')
    
  3. solution 2: use INTERSECT:

    SELECT S.sid FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND (B.color='red')
    INTERSECT
    SELECT S.sid FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND (B.color='green')
    

Nested queries

  1. IN (find the names of sailors who’ve reserved boat 103):

    SELECT S.sname FROM Sailors S WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid=103)
    
  2. EXISTS (the same query with a correlated subquery):

    SELECT S.sname FROM Sailors S WHERE EXISTS (SELECT * FROM Reserves R WHERE R.bid=103 AND S.sid=R.sid)
    

Division in SQL

  1. question: find sailors who’ve reserved all boats

  2. solution 1: EXCEPT:

    SELECT S.sname
    FROM Sailors S
    WHERE NOT EXISTS(
        (SELECT B.bid FROM Boats B)
        EXCEPT
        (SELECT R.bid FROM Reserves R WHERE R.sid=S.sid))
    
  3. solution2:

    SELECT S.sname
    FROM Sailors S
    WHERE NOT EXISTS
    	(SELECT B.bid FROM Boats B
    	WHERE NOT EXISTS
    		(SELECT R.bid
    		FROM Reserves R 
    		WHERE R.bid=B.bid And R.sid=S.sid))
    

Aggregate Operator

Aggregation Operators
  1. COUNT(*)
  2. COUNT([DISTINCT] A)
  3. SUM([DISTINCT] A)
  4. AVG([DISTINCT] A)
  5. MAX(A)
  6. MIN(A)
Example
  1. find those ratings for which the average age is the minimum over all ratings (aggregate operations cannot be nested)

    -- wrong
    SELECT S.rating
    FROM Sailors S
    WHERE S.age = (SELECT MIN(AVG(S2.age)) FROM Sailors S2)
    
    -- right: the derived table Temp cannot be queried again inside its own WHERE,
    -- so the per-rating averages are recomputed in the subquery
    SELECT Temp.rating
    FROM (
        SELECT S.rating, AVG(S.age) AS avgage
        FROM Sailors S
        GROUP BY S.rating
    ) AS Temp
    WHERE Temp.avgage <= ALL (SELECT AVG(S2.age) FROM Sailors S2 GROUP BY S2.rating)
    

Grouping

[Figure 10]

  1. find the age of the youngest sailor with age $\ge$ 18, for each rating with at least 2 such sailors

    SELECT S.rating, MIN(S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT(*) > 1
    
  2. find the age of the youngest sailor with age $\ge$ 18, for each rating with at least 2 such sailors, all of whom are at most 60 (uses the EVERY keyword)

    SELECT S.rating, MIN(S.age) AS minage
    FROM Sailors S
    WHERE S.age >= 18
    GROUP BY S.rating
    HAVING COUNT(*) > 1 AND EVERY (S.age <= 60)
    
  3. for each red boat, find the number of reservations for this boat (grouping over a join of two relations)

    SELECT B.bid, COUNT(*) AS scount
    FROM Boats B, Reserves R
    WHERE R.bid = B.bid AND B.color='red'
    GROUP BY B.bid
    
    -- using HAVING instead of WHERE: B.color must appear in GROUP BY to be used in HAVING
    SELECT B.bid, COUNT(*) AS scount
    FROM Boats B, Reserves R
    WHERE R.bid = B.bid
    GROUP BY B.bid, B.color
    HAVING B.color='red'
    
  4. find the age of the youngest sailor with age > 18, for each rating with at least 2 sailors of any age (subquery in HAVING)

    SELECT S.rating, MIN(S.age)
    FROM Sailors S
    WHERE S.age>18
    GROUP BY S.rating
    HAVING 1<(
    	SELECT COUNT(*)
        FROM Sailors S2
        WHERE S2.rating = S.rating
    )
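The order in which WHERE, GROUP BY, and HAVING act in query 1 can be sketched in C. The sketch assumes a hypothetical SailorRow that holds only rating and age, and an assumed limit of MAX_GROUPS distinct ratings. Groups are reported in order of first appearance, whereas SQL leaves the row order unspecified.

```c
#include <stddef.h>

#define MAX_GROUPS 64 /* hypothetical limit on distinct ratings */

typedef struct { int rating; double age; } SailorRow;

/* SELECT S.rating, MIN(S.age) FROM Sailors S WHERE S.age >= 18
   GROUP BY S.rating HAVING COUNT(*) > 1
   Writes result groups to rating_out/minage_out (capacity >= n);
   returns the number of groups kept. */
size_t group_min_age(const SailorRow *s, size_t n,
                     int *rating_out, double *minage_out) {
    int rating[MAX_GROUPS];
    double minage[MAX_GROUPS];
    size_t count[MAX_GROUPS];
    size_t ng = 0;

    for (size_t i = 0; i < n; i++) {
        if (!(s[i].age >= 18)) continue;                /* WHERE */
        size_t g = 0;
        while (g < ng && rating[g] != s[i].rating) g++; /* GROUP BY rating */
        if (g == ng) {                                  /* first row of a new group */
            if (ng == MAX_GROUPS) continue;             /* beyond the assumed limit: skipped */
            rating[ng] = s[i].rating;
            minage[ng] = s[i].age;
            count[ng] = 0;
            ng++;
        }
        count[g]++;                                     /* COUNT(*) */
        if (s[i].age < minage[g]) minage[g] = s[i].age; /* MIN(age) */
    }

    size_t nout = 0;
    for (size_t g = 0; g < ng; g++) {
        if (count[g] > 1) {                             /* HAVING COUNT(*) > 1 */
            rating_out[nout] = rating[g];
            minage_out[nout] = minage[g];
            nout++;
        }
    }
    return nout;
}
```

The HAVING filter runs only after all rows have been grouped, because COUNT(*) is known only at that point. This is also why HAVING may test aggregates while WHERE may not.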
    

Cast Expression

  1. change the expression to the target data type
  2. valid target type
  3. uses
    1. match function parameters
    2. change precision while calculating
    3. assign a data type to NULL value
-- Students(name, school)
-- Soldiers(name, service)

CREATE VIEW prospects (name, school, service) AS
	SELECT name, school, CAST(NULL AS Varchar(20))
	FROM Students
UNION
	SELECT name, CAST(NULL AS Varchar(20)), service
	FROM Soldiers

Case Expression

SELECT name, CASE status 
				WHEN 1 THEN 'Active Duty'
				WHEN 2 THEN 'Reserve'
				WHEN 3 THEN 'Special Assignment'
				When 4 THEN 'Retired'
				ELSE 'Unknown'
			 END AS status
		FROM Officers

subquery

  1. scalar sub-query: the result of the sub-query is a single value. It can be used wherever a value can occur
  2. table expression: the result of the sub-query is a table. It can be used wherever a table can occur
  3. common table expression: in some complex queries, a table expression may need to occur more than once in the same SQL statement. In this case, it is defined once with the keyword WITH
Scalar Sub-query

definition: the result of a sub-query is a single value. It can be used in the place where a value can occur

  1. find the departments’ names whose average bonus is higher than average salary
SELECT d.deptname
FROM dept AS d
WHERE (SELECT avg(bonus) FROM emp WHERE deptno=d.deptno)
	>
	(SELECT avg(salary) FROM emp WHERE deptno=d.deptno)
  2. list the deptno, deptname, and max salary of all departments located in New York
SELECT d.deptno, d.deptname,
(SELECT MAX(salary) FROM emp WHERE deptno=d.deptno) AS maxpay
FROM dept AS d
WHERE d.location='New York'
Table Expression

definition: the result of a sub query is a table. It can be used in the place where a table can occur

  1. find the departments whose total payment is greater than 200000
SELECT deptno, totalpay
FROM
(SELECT deptno, SUM(salary)+SUM(bonus) AS totalpay FROM emp GROUP BY deptno) AS payroll
WHERE totalpay>200000
Common Table Expression

definition: in some complex queries, a table expression may need to occur more than once in the same SQL statement; WITH defines it once and gives it a name.

  1. find the department who has the highest total payment
WITH payroll (deptno, totalpay) AS
	(SELECT deptno, sum(salary)+sum(bonus) FROM emp GROUP BY deptno)
SELECT deptno
FROM payroll
WHERE totalpay = (SELECT MAX(totalpay) FROM payroll)

Outer Join

  1. an extension of the join operation

  2. In a join, only tuples that satisfy the join condition are kept in the result; an outer join also keeps unmatched tuples, with the missing attributes set to NULL

  3. left outer join ($*\bowtie$): keeps all tuples of the left relation in the result

  4. right outer join ($\bowtie*$): keeps all tuples of the right relation in the result

  5. full outer join ($*\bowtie*$): keeps all tuples of both the left and right relations in the result
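The following C sketch shows a left outer join built on a nested-loop join. It assumes hypothetical int relations R(a, b) and S(b, c), and it models SQL NULL with a flag because C has no NULL value for int.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical relations R(a, b) and S(b, c), joined on R.b = S.b. */
typedef struct { int a; int b; } RRow;
typedef struct { int b; int c; } SRow;

/* One result tuple; c_is_null stands in for SQL NULL in S.c. */
typedef struct { int a; int b; int c; bool c_is_null; } JoinRow;

/* R left outer join S on R.b = S.b: every matching pair, as in an inner
   join, plus each unmatched R tuple padded with NULL.
   out needs capacity >= nr * ns + nr; returns the number of rows. */
size_t left_outer_join(const RRow *r, size_t nr, const SRow *s, size_t ns,
                       JoinRow *out) {
    size_t nout = 0;
    for (size_t i = 0; i < nr; i++) {
        bool matched = false;
        for (size_t j = 0; j < ns; j++) {
            if (r[i].b == s[j].b) {
                out[nout++] = (JoinRow){ r[i].a, r[i].b, s[j].c, false };
                matched = true;
            }
        }
        if (!matched) /* keep the left tuple; the right part is NULL */
            out[nout++] = (JoinRow){ r[i].a, r[i].b, 0, true };
    }
    return nout;
}
```

A right outer join is the same loop with the roles of R and S swapped. A full outer join additionally emits every S tuple that never matched.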

Recursion

If a common table expression references itself in its definition, the query is recursive.

  1. find all employees under the management of Hoover and whose salary is more than 100000

    WITH agents (name, salary) AS
    	((SELECT name, salary FROM FedEmp WHERE manager='Hoover') 
    	UNION ALL
    	(SELECT f.name, f.salary FROM agents AS a, FedEmp AS f WHERE f.manager=a.name))
    SELECT name
    FROM agents
    WHERE salary>100000
    
  2. find how many rivets are used in one wing (recursive calculation)

    WITH wingpart(subpart, qty) AS
        ((SELECT subpart, qty FROM components WHERE part = 'wing')
        UNION ALL
        (SELECT c.subpart, w.qty*c.qty FROM wingpart w, components c WHERE w.subpart=c.part))
    SELECT sum(qty) AS qty FROM wingpart WHERE subpart='rivet'
    
  3. find the lowest total cost route from SFO to JFK (recursive search)

    WITH trips (destination, route, nsegs, totalcost) AS
        ((SELECT destination, CAST(destination AS varchar(20)), 1, cost FROM flights WHERE origin='SFO')
        UNION ALL
        (SELECT f.destination, CAST(t.route||','||f.destination AS varchar(20)),
         t.nsegs+1, t.totalcost+f.cost FROM trips t, flights f WHERE t.destination=f.origin
         AND f.destination != 'SFO' AND f.origin != 'JFK' AND t.nsegs <= 3))
    SELECT route, totalcost FROM trips WHERE destination='JFK' AND totalcost = (SELECT min(totalcost) FROM trips WHERE destination='JFK')
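A recursive CTE is evaluated by applying the recursive member to newly produced rows until no new rows appear. The following C sketch evaluates the wingpart query this way. It assumes a hypothetical in-memory components table, an acyclic parts hierarchy, and a cap of MAX_ROWS work-table rows. Like UNION ALL, it keeps duplicates: one row per path from the wing to a part.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical components(part, subpart, qty) table. */
typedef struct { const char *part; const char *subpart; int qty; } Component;

#define MAX_ROWS 256 /* hypothetical cap on rows produced by the recursion */

/* Emulates
     WITH wingpart(subpart, qty) AS (seed UNION ALL recursive step)
     SELECT SUM(qty) FROM wingpart WHERE subpart = target
   Each work-table row is joined with components exactly once; the loop
   ends when no new rows are produced. Returns -1 if MAX_ROWS is exceeded. */
long total_qty(const Component *c, size_t nc, const char *root, const char *target) {
    const char *sub[MAX_ROWS];
    long qty[MAX_ROWS];
    size_t nrows = 0;

    /* seed: SELECT subpart, qty FROM components WHERE part = root */
    for (size_t j = 0; j < nc; j++) {
        if (strcmp(c[j].part, root) == 0) {
            if (nrows == MAX_ROWS) return -1;
            sub[nrows] = c[j].subpart;
            qty[nrows] = c[j].qty;
            nrows++;
        }
    }
    /* recursive step: join row w with components on w.subpart = c.part;
       nrows grows while we scan, so new rows are processed too */
    for (size_t i = 0; i < nrows; i++) {
        for (size_t j = 0; j < nc; j++) {
            if (strcmp(sub[i], c[j].part) == 0) {
                if (nrows == MAX_ROWS) return -1;
                sub[nrows] = c[j].subpart;
                qty[nrows] = qty[i] * c[j].qty; /* w.qty * c.qty */
                nrows++;
            }
        }
    }
    long total = 0; /* SELECT SUM(qty) ... WHERE subpart = target */
    for (size_t i = 0; i < nrows; i++)
        if (strcmp(sub[i], target) == 0) total += qty[i];
    return total;
}
```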
    

Data Manipulation Language

  1. Insert: insert a tuple into a table
  2. Delete: delete the tuples that fulfill the qualifications
  3. Update: update the attribute values of the tuples that fulfill the qualifications

View in SQL

  1. general view
    1. virtual tables derived from base tables
    2. logical data independence
    3. security of data
    4. update problems of views
  2. temporary views and recursive queries
    1. WITH
    2. RECURSIVE

Embedded SQL

To access the database from programs and further process the query results, SQL must be combined with a programming language.

Usage of Embedded SQL in C

  1. every embedded statement begins with EXEC SQL and ends with ;
  2. host variables transfer information between C and SQL. They must be declared between EXEC SQL BEGIN DECLARE SECTION and EXEC SQL END DECLARE SECTION
  3. in SQL statements, a colon (:) is added before host variables to distinguish them from SQL's own variables or attribute names
  4. in the host language, such as C, host variables are used as ordinary variables
  5. host variables cannot be defined as structures
  6. a special host variable, SQLCA (SQL Communication Area), is included with EXEC SQL INCLUDE SQLCA
  7. use SQLCA.SQLCODE to check the status of the result
  8. use an indicator variable (short int) to handle NULL in the host language

[Figure 11]

Example of defining host variables
EXEC SQL BEGIN DECLARE SECTION;
	char SNO[7];
	char GIVENSNO[7];
	char CNO[6];
	char GIVENCNO[6];
	float GRADE;
	short GRADEI; /*indicator of GRADE*/
EXEC SQL END DECLARE SECTION;
// CONNECT
EXEC SQL CONNECT :uid IDENTIFIED BY :pwd;

// Execute DDL or DML Statements
EXEC SQL INSERT INTO SC(SNO,CNO,GRADE)
VALUES(:SNO, :CNO, :GRADE);

// Execute Query Statements
EXEC SQL SELECT GRADE
INTO :GRADE :GRADEI
FROM SC
WHERE SNO=:GIVENSNO AND
CNO=:GIVENCNO;
Cursor
// Define a cursor
EXEC SQL DECLARE <cursor_name> CURSOR FOR
SELECT …
FROM …
WHERE …;

EXEC SQL OPEN <cursor_name>;
// Fetch data from cursor
EXEC SQL FETCH <cursor_name>
INTO :hostvar1, :hostvar2, …;
// SQLCA.SQLCODE returns 100 when the end of the cursor is reached
EXEC SQL CLOSE <cursor_name>;

// an example of query with cursor
EXEC SQL DECLARE C1 CURSOR FOR
SELECT SNO, GRADE
FROM SC
WHERE CNO = :GIVENCNO;
EXEC SQL OPEN C1;
if (SQLCA.SQLCODE<0) exit(1); /* There is an error in the query */
while (1) {
    EXEC SQL FETCH C1 INTO :SNO, :GRADE:GRADEI;
    if (SQLCA.SQLCODE==100) break; /* end of cursor */
    /* treat data fetched from cursor, omitted */
}
EXEC SQL CLOSE C1;
Dynamic SQL
// dynamic SQL executed directly
EXEC SQL BEGIN DECLARE SECTION;
char sqlstring[200];
EXEC SQL END DECLARE SECTION;


char cond[150];
strcpy(sqlstring, "DELETE FROM STUDENT WHERE ");
printf(" Enter search condition :");
scanf("%149s", cond); /* reads one word; the condition is appended as-is */
strcat(sqlstring, cond);

EXEC SQL EXECUTE IMMEDIATE :sqlstring;
// Dynamic SQL with dynamic parameters
EXEC SQL BEGIN DECLARE SECTION;
char sqlstring[200];
int birth_year;
EXEC SQL END DECLARE SECTION;

strcpy(sqlstring, "DELETE FROM STUDENT WHERE YEAR(BDATE) <= :y");
printf(" Enter birth year for delete :");
scanf("%d", &birth_year);
EXEC SQL PREPARE purge FROM :sqlstring;
EXEC SQL EXECUTE purge USING :birth_year;
// Dynamic SQL for query
EXEC SQL BEGIN DECLARE SECTION;
char sqlstring[200];
char SNO[7];
float GRADE;
short GRADEI;
char GIVENCNO[6];
EXEC SQL END DECLARE SECTION;


char orderby[150];
strcpy(sqlstring, "SELECT SNO,GRADE FROM SC WHERE CNO= :c ");
printf(" Enter the ORDER BY clause :");
fgets(orderby, sizeof orderby, stdin);  /* whole line, e.g. ORDER BY GRADE */
orderby[strcspn(orderby, "\n")] = '\0'; /* drop the trailing newline */
strcat(sqlstring, orderby);
printf(" Enter the course number :");
scanf("%5s", GIVENCNO);
EXEC SQL PREPARE query FROM :sqlstring;

EXEC SQL DECLARE grade_cursor CURSOR FOR query;
EXEC SQL OPEN grade_cursor USING :GIVENCNO;
if (SQLCA.SQLCODE<0) exit(1); /* There is an error in the query */
while (1) {
    EXEC SQL FETCH grade_cursor INTO :SNO, :GRADE:GRADEI;
    if (SQLCA.SQLCODE==100) break; /* end of cursor */
    /* treat data fetched from cursor, omitted */
}
EXEC SQL CLOSE grade_cursor;
// Stored procedure
EXEC SQL
CREATE PROCEDURE drop_student
(IN student_no CHAR(7),
OUT message CHAR(30))
BEGIN ATOMIC
DELETE FROM STUDENT
WHERE SNO=student_no;
DELETE FROM SC
WHERE SNO=student_no;
SET message=student_no || ' dropped';
END;
EXEC SQL
CALL drop_student(:SNO, :msg); /* call this stored procedure later; msg is a hypothetical char[31] host variable */

Database management System

[Figure 12]

DBMS process structure

  1. single-process structure: the application and the DBMS are compiled into a single .exe file
  2. multi-process structure: each application process corresponds to one DBMS core process
  3. multi-thread structure: there is only one DBMS process; each application process corresponds to a DBMS core thread

Database Access Management

Access types

  1. query all or most records of a file (>15%)
  2. query a specific record
  3. query some records (<15%)
  4. range query
  5. update

File Organization

  1. heap file: records stored according to their inserted order, and retrieved sequentially. This is the most basic and general form of file organization
  2. direct file: the record address is computed by a hash function from the value of some attribute
  3. index file: index + heap file/cluster
  4. Grid structure file: suitable for multi attributes queries
  5. raw disk
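The direct (hash) file organization can be sketched in a few lines of C. The sketch assumes static hashing with a hypothetical fixed number of buckets (NBUCKETS) and a hypothetical BUCKET_CAP records per bucket, each bucket standing for one disk page. Overflow chains, which a real hash file needs, are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBUCKETS 8   /* hypothetical number of buckets (static hashing) */
#define BUCKET_CAP 4 /* hypothetical records per bucket, i.e. per page */

typedef struct { int key; int value; } Record;
typedef struct { Record slot[BUCKET_CAP]; size_t used; } Bucket;

/* hash function: maps the key attribute to a bucket (page) address */
static size_t bucket_of(int key) {
    return (size_t)((unsigned)key % NBUCKETS);
}

/* returns false if the target bucket is full (overflow handling omitted) */
bool hash_insert(Bucket *file, Record rec) {
    Bucket *b = &file[bucket_of(rec.key)];
    if (b->used == BUCKET_CAP) return false;
    b->slot[b->used++] = rec;
    return true;
}

/* an equality lookup reads one bucket instead of scanning the whole file */
const Record *hash_lookup(const Bucket *file, int key) {
    const Bucket *b = &file[bucket_of(key)];
    for (size_t i = 0; i < b->used; i++)
        if (b->slot[i].key == key) return &b->slot[i];
    return NULL;
}
```

Hashing helps equality lookups (access type 2 above). It does not help range queries, because neighbouring key values are scattered across buckets.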

Index Technique

  1. B+ tree (very common)
  2. clustering index (common)
  3. inverted file
  4. dynamic hashing
  5. grid structure file and partitioned hash function
  6. bitmap index (used in data warehouses)
  7. others

Why do we use B+ trees in a DBMS:

In a B+ tree, keys (index entries) are stored in the internal nodes, and records (or pointers to them) are stored in the leaf nodes. In a B tree, keys are not stored repeatedly: each key and its record appear exactly once. In a B+ tree, the leaf nodes are linked to each other to provide sequential access; in a B tree, the leaf nodes are not linked.

A B+ tree is a balanced multi-way search tree, not a binary one. Each node holds many keys, so the tree stays shallow and a lookup needs only a few disk reads. All leaf nodes are at the same depth, and the leaves are linked in a list. Therefore, a B+ tree supports both random access and sequential (range) access.
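To show why linked leaves help range queries, the following C sketch performs a B+ tree range lookup: one descent from the root to a leaf, then a walk along the leaf chain. The node layout (ORDER, child, next) is a hypothetical choice for illustration, and insertion and node splitting are omitted.

```c
#include <stddef.h>

#define ORDER 4 /* hypothetical: at most 4 keys per node */

typedef struct Node {
    int is_leaf;
    size_t nkeys;
    int keys[ORDER];               /* sorted */
    struct Node *child[ORDER + 1]; /* internal node: nkeys + 1 children */
    struct Node *next;             /* leaf: right sibling, NULL at the end */
} Node;

/* Descend to the leaf that may contain key. Keys >= separator keys[i]
   live to the right of it, so follow the child after the last separator
   that is <= key. */
static const Node *find_leaf(const Node *n, int key) {
    while (!n->is_leaf) {
        size_t i = 0;
        while (i < n->nkeys && key >= n->keys[i]) i++;
        n = n->child[i];
    }
    return n;
}

/* Range query lo <= k <= hi: one root-to-leaf descent (random access),
   then follow the leaf links (sequential access). Writes up to cap keys
   to out and returns how many were written. */
size_t range_scan(const Node *root, int lo, int hi, int *out, size_t cap) {
    size_t n = 0;
    for (const Node *leaf = find_leaf(root, lo); leaf != NULL; leaf = leaf->next) {
        for (size_t i = 0; i < leaf->nkeys; i++) {
            if (leaf->keys[i] > hi) return n; /* sorted: past the range */
            if (leaf->keys[i] >= lo && n < cap) out[n++] = leaf->keys[i];
        }
    }
    return n;
}
```

In a B tree without leaf links, the same range query would have to climb back up through internal nodes to reach the next leaf.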

Query optimization

First “rewrite” the query submitted by the user, then decide the most efficient operation methods and steps.

  1. algebra optimization
  2. operation optimization

Equivalent Transform

  1. commutative rule of $\bowtie/\times$: $E_1 \times E_2 \equiv E_2\times E_1$
  2. associative rule of $\bowtie/\times$: $E_1 \times (E_2\times E_3) \equiv (E_1\times E_2) \times E_3$
  3. cascade rule of $\Pi$: $\Pi_{A_1\dots A_n}(\Pi_{B_1\dots B_m}(E))\equiv \Pi_{A_1\dots A_n}(E)$ when $\{A_1,\dots,A_n\} \subseteq \{B_1,\dots,B_m\}$

Basic principle

  1. push the unary operations (selection, projection) down the query tree as low as possible
  2. look for and combine common sub-expressions
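The benefit of pushing a selection below a join shows up when counting how many tuple pairs each plan examines. The following C sketch uses hypothetical relations R(a, b) and S(b, c) and a naive nested-loop join in both plans.

```c
#include <stddef.h>

/* Hypothetical R(a, b) and S(b, c); query: sigma_{R.a = k}(R join_{R.b = S.b} S). */
typedef struct { int a; int b; } RRow;
typedef struct { int b; int c; } SRow;

/* Plan 1: join first, then apply the selection to every joined pair. */
size_t join_then_select(const RRow *r, size_t nr, const SRow *s, size_t ns,
                        int k, size_t *pairs_examined) {
    size_t hits = 0;
    *pairs_examined = 0;
    for (size_t i = 0; i < nr; i++)
        for (size_t j = 0; j < ns; j++) {
            (*pairs_examined)++;
            if (r[i].b == s[j].b && r[i].a == k) hits++;
        }
    return hits;
}

/* Plan 2: the unary selection is pushed below the join, so only
   qualifying R tuples are paired with S. Same result, fewer pairs. */
size_t select_then_join(const RRow *r, size_t nr, const SRow *s, size_t ns,
                        int k, size_t *pairs_examined) {
    size_t hits = 0;
    *pairs_examined = 0;
    for (size_t i = 0; i < nr; i++) {
        if (r[i].a != k) continue; /* sigma applied first */
        for (size_t j = 0; j < ns; j++) {
            (*pairs_examined)++;
            if (r[i].b == s[j].b) hits++;
        }
    }
    return hits;
}
```

Both plans return the same number of result tuples. The second examines only (qualifying R tuples) x |S| pairs instead of |R| x |S|.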

operation optimization

  1. nested loop: for every tuple of the outer relation, scan the inner relation once
  2. merge scan: sort relations R and S on the join attribute beforehand, then merge them in one pass
  3. use an index or hash to look up the matching tuples
  4. hash join
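The following C sketch shows the merge-scan (sort-merge) join on hypothetical integer join keys. Both inputs are sorted with the standard `qsort` and then scanned once. When a key value repeats on both sides, every combination of the matching tuples is produced.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct { int key; int payload; } Row;
typedef struct { int left_payload; int right_payload; } Pair;

static int cmp_row(const void *x, const void *y) {
    int a = ((const Row *)x)->key, b = ((const Row *)y)->key;
    return (a > b) - (a < b);
}

/* Merge-scan join of R and S on key. Both inputs are sorted in place;
   out needs capacity >= nr * ns. Returns the number of joined pairs. */
size_t merge_join(Row *r, size_t nr, Row *s, size_t ns, Pair *out) {
    qsort(r, nr, sizeof *r, cmp_row);
    qsort(s, ns, sizeof *s, cmp_row);
    size_t i = 0, j = 0, nout = 0;
    while (i < nr && j < ns) {
        if (r[i].key < s[j].key) i++;      /* advance the smaller side */
        else if (r[i].key > s[j].key) j++;
        else {
            int k = r[i].key;
            size_t j_start = j;
            /* pair each R tuple with key k against the whole S group with key k */
            for (; i < nr && r[i].key == k; i++)
                for (j = j_start; j < ns && s[j].key == k; j++)
                    out[nout++] = (Pair){ r[i].payload, s[j].payload };
        }
    }
    return nout;
}
```

After sorting, each relation is read once. The exception is a run of duplicate keys, where the matching S group is rescanned for every R tuple in the group.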

Recovery

Introduction

  1. reduce the likelihood of failures (prevent)
  2. recover from failures
    1. redundancy
    2. should inspect all possible failures

Periodical dumping

backup + log

  1. log: a record of all changes to the DB since the last backup was made
    1. some transactions may be half done: they should be undone
    2. some transactions have finished but their results have not been written to the DB: they should be redone

Transaction

A transaction T is a finite sequence of actions on the DB with the following properties (ACID):

  1. Atomicity: all or nothing
  2. Consistency preservation: the DB moves from one consistent state to another
  3. Isolation: concurrent transactions should run as if they were independent of each other
  4. Durability: the effects of a successfully completed transaction are permanently reflected in the DB
Commit rule and log-ahead rule
  1. related structures

    1. Active Transaction List (ATL): records the TIDs of all transactions that are executing and not yet committed
    2. Transaction Identifier (TID)
    3. Committed Transaction List (CTL): records the identifiers (TIDs) of all committed transactions
    4. Before Image (BI), After Image (AI): each can be regarded as a file
    5. Check Point (CP)
    6. Message Manager (MM)
  2. commit rule: the A.I. (after image) must be written to non-volatile storage before the transaction commits. Then, even if a failure occurs after the transaction enters the commit stage, the recorded A.I. can still be used to redo the update, so the transaction satisfies the ACID properties

  3. log-ahead rule: if the A.I. is written directly to the database before the transaction commits, the corresponding B.I. must first be written to the log. Then the update can be undone if a failure occurs before the transaction enters the commit stage, so the transaction satisfies the ACID properties

  4. recovery strategies (undo and redo must be idempotent):

    1. undo(undo(…)) = undo()
    2. redo(redo(…)) = redo()
  5. three types of update strategy

    1. first write (AI -> DB before commit)
      1. TID -> active list
      2. BI -> log
      3. AI -> DB
      4. TID -> commit list
      5. delete TID from active list (steps 4 and 5 are the commit procedure)
    2. write after commit (AI -> DB after commit)
      1. TID -> active list
      2. AI -> log
      3. TID -> commit list
      4. AI -> DB
      5. delete TID from active list
    3. AI -> DB concurrently with commit
      1. TID -> active list
      2. AI, BI -> log
      3. AI -> DB (partially done)
      4. TID -> commit list
      5. AI -> DB (complete)
      6. delete TID from active list
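The undo/redo idea behind these rules can be sketched as a restart procedure. The following C code is an illustration under assumptions, not a real recovery manager. It assumes a hypothetical log of (TID, item, B.I., A.I.) records that is complete on disk, as guaranteed by the log-ahead rule, and a CTL listing the committed TIDs. An in-memory array stands in for the database.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical log record: transaction tid changed db[item] from bi to ai. */
typedef struct { int tid; size_t item; int bi; int ai; } LogRec;

/* true if tid appears in the Committed Transaction List */
static bool in_ctl(const int *ctl, size_t nctl, int tid) {
    for (size_t i = 0; i < nctl; i++)
        if (ctl[i] == tid) return true;
    return false;
}

/* Restart after a crash:
   1. undo: scan the log backwards and restore the B.I. of every update
      made by a transaction that is not in the CTL;
   2. redo: scan the log forwards and re-apply the A.I. of every update
      made by a committed transaction.
   Both passes only copy stored images, so running recover() twice leaves
   the same state as running it once (undo and redo are idempotent). */
void recover(int *db, const LogRec *entries, size_t n, const int *ctl, size_t nctl) {
    for (size_t i = n; i-- > 0; )
        if (!in_ctl(ctl, nctl, entries[i].tid))
            db[entries[i].item] = entries[i].bi;
    for (size_t i = 0; i < n; i++)
        if (in_ctl(ctl, nctl, entries[i].tid))
            db[entries[i].item] = entries[i].ai;
}
```

In the test scenario, T1 committed but one of its after images never reached the DB, so redo applies it. T2 did not commit but had already written an after image, so undo restores the before image.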

Concurrency Control

Introduction to Concurrency

  1. In a multi-user DBMS, multiple transactions are permitted to access the database concurrently
Why
  1. improves system utilization and response time
  2. different transactions may access different parts of the database, so they need not wait for each other
Problems arising from concurrency
  1. lost update
  2. dirty read
  3. non-repeatable read
How to avoid problems caused by concurrency

Solution: use concurrency control methods such as locking and timestamp ordering

Serializability: the criterion for concurrency consistency

definition:

  1. Suppose $\{T_1,T_2,\dots,T_n\}$ is a set of transactions executing concurrently. If a schedule of $\{T_1,T_2,\dots,T_n\}$ produces the same effect on the database as some serial execution of this set of transactions, then the schedule is serializable
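A common practical check is the precedence-graph test. This test checks conflict serializability, which is stricter than the definition above: every conflict-serializable schedule is serializable, but not every serializable schedule is conflict-serializable. The following C sketch assumes transactions are numbered below a hypothetical MAX_T.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_T 8 /* hypothetical: transaction numbers are 0..7 */

/* One schedule step: transaction t reads ('r') or writes ('w') item x. */
typedef struct { int t; char op; int x; } Step;

/* Depth-first search; state: 0 = unvisited, 1 = on current path, 2 = done. */
static bool cycle_from(bool g[MAX_T][MAX_T], int u, int *state) {
    state[u] = 1;
    for (int v = 0; v < MAX_T; v++) {
        if (!g[u][v]) continue;
        if (state[v] == 1) return true; /* back edge closes a cycle */
        if (state[v] == 0 && cycle_from(g, v, state)) return true;
    }
    state[u] = 2;
    return false;
}

/* Precedence graph: edge Ti -> Tj if a step of Ti comes before a
   conflicting step of Tj (same item, different transactions, at least one
   write). The schedule is conflict-serializable iff the graph is acyclic. */
bool conflict_serializable(const Step *s, size_t n) {
    bool g[MAX_T][MAX_T] = {{false}};
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (s[i].t != s[j].t && s[i].x == s[j].x &&
                (s[i].op == 'w' || s[j].op == 'w'))
                g[s[i].t][s[j].t] = true;
    int state[MAX_T] = {0};
    for (int u = 0; u < MAX_T; u++)
        if (state[u] == 0 && cycle_from(g, u, state)) return false;
    return true;
}
```

The lost-update schedule r1(A) r2(A) w1(A) w2(A) yields the edges T1 -> T2 and T2 -> T1, a cycle, so it is rejected.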

Locking Protocol

Basic idea

Before a transaction operates on a shared data object, it asks the system to lock that object. Once the lock request is granted, the transaction has certain control over the object: until it releases the lock, other transactions cannot obtain a conflicting lock on the object or operate on it. This avoids access conflicts and ensures the correct execution of concurrent transactions.

After adopting a locking protocol, problems such as livelock and deadlock may appear; the one that must be solved is deadlock, caused by transactions waiting for each other in a cycle.

Locking
  1. two-phase locking: in a transaction, if all lock operations precede all unlock operations, the transaction is called a two-phase transaction. This restriction is called the two-phase locking (2PL) protocol
  2. well-formed: a transaction is well-formed if it acquires a lock on an object before operating on it
  3. serializability: if S is any schedule of well-formed, two-phase transactions, then S is serializable
X-Lock
  1. only one type of lock, for both read and write
  2. two-phase lock: all locks precede all unlocks
  3. well-formed: acquire a lock before operating on the object
S,X lock
  1. S lock: if read access is intended
  2. X lock: if update access is intended
SUX lock
  1. S lock: if read access is intended
  2. X lock: if update access is intended
  3. U lock: for an update access, the transaction first acquires a U lock and then promotes it to an X lock
conclusions
  1. well-formed + 2PL: serializable
  2. well-formed + 2PL + releasing update (X) locks only at EOT: serializable and recoverable
  3. well-formed + 2PL + holding all locks to EOT: strict two-phase locking transaction
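The two properties in the X-lock scheme can be checked on one transaction's action sequence. The following C sketch uses hypothetical action codes ('L' lock, 'U' unlock, 'R' read, 'W' write) and items numbered below MAX_ITEMS.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ITEMS 16 /* hypothetical: data items are numbered 0..15 */

/* One action of a single transaction under the X-lock-only scheme:
   'L' lock, 'U' unlock, 'R' read, 'W' write on item x. */
typedef struct { char op; int x; } Action;

/* Well-formed: every read/write happens while the item is locked. */
bool well_formed(const Action *a, size_t n) {
    bool held[MAX_ITEMS] = {false};
    for (size_t i = 0; i < n; i++) {
        switch (a[i].op) {
        case 'L': held[a[i].x] = true;  break;
        case 'U': held[a[i].x] = false; break;
        default:  if (!held[a[i].x]) return false; /* 'R' or 'W' */
        }
    }
    return true;
}

/* Two-phase: no lock is acquired after the first unlock
   (a growing phase followed by a shrinking phase). */
bool two_phase(const Action *a, size_t n) {
    bool shrinking = false;
    for (size_t i = 0; i < n; i++) {
        if (a[i].op == 'U') shrinking = true;
        else if (a[i].op == 'L' && shrinking) return false;
    }
    return true;
}
```

A transaction that unlocks item 0 and only then locks item 1 is still well-formed, but it is not two-phase. Such a transaction can take part in a non-serializable schedule.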

Deadlock

Prevention and Detection
  1. timeout: if a transaction has waited longer than a specified time, a deadlock is assumed and the transaction is aborted
  2. detection with a wait-for graph: if the graph contains a cycle, there is a deadlock
  3. request all locks at the start of the transaction
  4. request locks in a fixed, predefined order of resources
  5. abort a transaction as soon as a lock conflict occurs
  6. transaction retry (timestamp-based), where $T_A$ requests a lock held by $T_B$
    1. wait-die: $T_A$ waits if it is older than $T_B$; otherwise it “dies” (aborts) and is retried later with its original timestamp
    2. wound-wait: $T_A$ waits if it is younger than $T_B$; otherwise it “wounds” $T_B$ (forces $T_B$ to abort), and $T_B$ is retried
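
Two of the techniques listed above, wait-for graph detection and the timestamp-based retry schemes, can be sketched as follows. Assumptions: the wait-for graph is a dict in which an edge Ti -> Tj means Ti waits for a lock held by Tj, a smaller timestamp means an older transaction, and all function names are hypothetical.

```python
from typing import Dict, Set


def has_deadlock(wait_for: Dict[str, Set[str]]) -> bool:
    """Return True if the wait-for graph contains a cycle (depth-first search)."""
    WHITE, GREY, BLACK = 0, 1, 2       # unvisited, on current DFS path, finished
    color: Dict[str, int] = {}

    def visit(t: str) -> bool:
        color[t] = GREY
        for u in wait_for.get(t, ()):
            c = color.get(u, WHITE)
            if c == GREY:              # back edge: u is on the current path, so a cycle
                return True
            if c == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color.get(t, WHITE) == WHITE and visit(t) for t in list(wait_for))


def wait_die(ts_requester: int, ts_holder: int) -> str:
    """Requester waits only if it is older; otherwise it dies (aborts itself)."""
    return "wait" if ts_requester < ts_holder else "abort requester"


def wound_wait(ts_requester: int, ts_holder: int) -> str:
    """Requester waits only if it is younger; otherwise it wounds (aborts) the holder."""
    return "wait" if ts_requester > ts_holder else "abort holder"


print(has_deadlock({"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}}))  # True
print(has_deadlock({"T1": {"T2"}, "T2": {"T3"}}))                # False
print(wait_die(5, 10), "|", wound_wait(5, 10))                    # wait | abort holder
```

In both timestamp schemes waiting is allowed in only one direction (older waits for younger, or younger waits for older), so no cycle can form. Because a restarted transaction keeps its original timestamp, it eventually becomes the oldest and is not starved.
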

Security and Integrity of Databases

Introduction

  1. main reasons the data in a database can be damaged or become incorrect
    1. system failure
    2. inconsistency caused by concurrent access
    3. man-made destruction
    4. incorrect input data, or update transactions that do not obey the rule of consistency preservation

Security of database

  1. protect the database against illegal access by means of
    1. view and query rewriting
    2. access control
    3. identification and authentication of users
    4. authorization
    5. role
    6. data encryption
    7. audit trail
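
View-based protection can be sketched with Python's built-in `sqlite3` module; the table, view and data are hypothetical. SQLite has no user accounts or `GRANT`, so only the view part is shown. In a multi-user DBMS one would additionally grant `SELECT` on the view (for example `GRANT SELECT ON cs_staff TO some_user`, with `some_user` hypothetical) but not on the base table. When a query refers to the view, the DBMS rewrites it into a query on `employee` with the view's predicate added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (eno INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES (1, 'Li', 'CS', 9000), (2, 'Wang', 'EE', 8000),
                                (3, 'Zhao', 'CS', 7000);
    -- the view exposes only CS employees and hides the salary column
    CREATE VIEW cs_staff AS SELECT eno, name FROM employee WHERE dept = 'CS';
""")

cur = conn.execute("SELECT * FROM cs_staff ORDER BY eno")
print([d[0] for d in cur.description])  # ['eno', 'name']
print(cur.fetchall())                   # [(1, 'Li'), (3, 'Zhao')]
```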

Security of Statistical Databases

  1. In many situations, statistical data is public while detailed individual data is confidential, yet some individual data can be derived from the public statistics
Tracker
  1. individual tracker: suppose exactly one person in the database is male and a programmer. A statistical query over sex = male AND occupation = programmer (for example, the average salary) then reveals that person's individual data. The basic idea is to use statistical information to infer individual information
  2. general tracker: based on the same idea as the individual tracker
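
The individual tracker can be reproduced against a toy statistical interface. Assumptions, not from the notes: the DBMS answers only `SUM` queries and refuses any query whose query set has fewer than `K` rows (a common protection), and the table and data are made up. With C1 = male and C2 = programmer, the refused query over C1 AND C2 is recovered as SUM(C1) minus SUM(C1 AND NOT C2).

```python
import sqlite3

K = 2  # hypothetical minimum query-set size; smaller query sets are refused

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, sex TEXT, occupation TEXT, salary INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?)", [
    ("Alice", "F", "programmer", 9000), ("Bob", "M", "programmer", 8000),
    ("Carol", "F", "programmer", 8500), ("Dan", "M", "teacher", 5000),
    ("Eve", "F", "teacher", 5200), ("Frank", "M", "teacher", 4800),
    ("Grace", "F", "doctor", 10000), ("Hank", "M", "doctor", 11000),
])


def stat_sum(predicate: str) -> int:
    """Statistical interface: SUM(salary) over a predicate, refused for small query sets.
    The predicate is spliced in as trusted text, which is acceptable only in this demo."""
    count, total = conn.execute(
        f"SELECT COUNT(*), SUM(salary) FROM person WHERE {predicate}").fetchone()
    if count < K:
        raise PermissionError(f"query set has only {count} row(s)")
    return total


C1, C2 = "sex = 'M'", "occupation = 'programmer'"
try:
    stat_sum(f"{C1} AND {C2}")        # direct question about the only male programmer
except PermissionError as err:
    print("refused:", err)

tracker = f"{C1} AND NOT ({C2})"      # T = C1 AND NOT C2: male non-programmers
leaked = stat_sum(C1) - stat_sum(tracker)
print(leaked)                         # 8000, Bob's individual salary
```

Both permitted queries cover several rows, so the size restriction alone does not stop the inference.
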

Integrity Constraints

Category
  1. static constraints: constraints on each state of the database
    1. inherent constraints, such as 1NF
    2. implicit constraints: implied by the data schema, such as primary key and foreign key constraints
    3. domain constraints: field values must be of the right type (domain)
  2. dynamic constraints: constraints on the transition of the database from one state to another; they can be enforced with triggers
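
Both kinds of constraints can be tried out in SQLite through Python's `sqlite3` module. The `employee` table, its age range and the no-salary-cut rule are hypothetical. The key and the `CHECK` clause are static constraints that every state must satisfy, while the trigger is a dynamic constraint that compares the old and new value of a row; SQLite reports all three violations as `sqlite3.IntegrityError`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (
        eno    INTEGER PRIMARY KEY,                    -- implicit (key) constraint
        age    INTEGER CHECK (age BETWEEN 18 AND 65),  -- domain constraint
        salary INTEGER NOT NULL
    );
    -- dynamic constraint: a salary may never decrease, checked on each state transition
    CREATE TRIGGER no_salary_cut BEFORE UPDATE OF salary ON employee
    WHEN NEW.salary < OLD.salary
    BEGIN
        SELECT RAISE(ABORT, 'salary cannot decrease');
    END;
    INSERT INTO employee VALUES (1, 30, 5000);
""")


def try_sql(sql: str) -> str:
    """Run one statement and report whether the DBMS accepted or rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


print(try_sql("INSERT INTO employee VALUES (1, 40, 6000)"))        # rejected: duplicate key
print(try_sql("INSERT INTO employee VALUES (2, 12, 6000)"))        # rejected: age outside domain
print(try_sql("UPDATE employee SET salary = 4000 WHERE eno = 1"))  # rejected: salary cut
print(try_sql("UPDATE employee SET salary = 5500 WHERE eno = 1"))  # ok
```
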
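Database Modification
  1. if $\alpha$ is a foreign key in $r_2$ that references the key $K_1$ of $r_1$, then
    1. referential integrity requires $\Pi_\alpha(r_2) \subseteq \Pi_{K_1}(r_1)$
    2. when a tuple $t_2$ is inserted into $r_2$, or $t_2[\alpha]$ is updated: check that $t_2[\alpha] \in \Pi_{K_1}(r_1)$
    3. when a tuple $t_1$ is deleted from $r_1$, or $t_1[K_1]$ is updated: compute $\sigma_{\alpha = t_1[K_1]}(r_2)$, the set of tuples that reference $t_1$; if it is not empty, the operation is rejected or the change is cascaded to those tuples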
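
SQLite enforces these checks for declared foreign keys once `PRAGMA foreign_keys = ON` is set. A sketch with hypothetical tables, where `dept` plays $r_1$ with key $K_1$ = `dno` and `emp` plays $r_2$ with foreign key $\alpha$ = `dno`; with the default (`NO ACTION`) behaviour the delete is rejected rather than cascaded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE dept (dno INTEGER PRIMARY KEY, dname TEXT);   -- r1, key K1 = dno
    CREATE TABLE emp  (eno INTEGER PRIMARY KEY,
                       dno INTEGER REFERENCES dept(dno));      -- r2, alpha = dno
    INSERT INTO dept VALUES (10, 'CS');
    INSERT INTO emp  VALUES (1, 10);
""")


def attempt(sql: str) -> bool:
    """Return True if the statement satisfies referential integrity and is applied."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


print(attempt("INSERT INTO emp VALUES (2, 99)"))   # False: t2[alpha] not in Pi_K1(r1)
print(attempt("DELETE FROM dept WHERE dno = 10"))  # False: sigma_{alpha = t1[K1]}(r2) not empty
print(attempt("DELETE FROM emp WHERE eno = 1"))    # True
print(attempt("DELETE FROM dept WHERE dno = 10"))  # True: no tuple references t1 any more
```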

Definition of Integrity Constraints

  1. specified procedurally: application programs are responsible for checking the integrity constraints
  2. specified with an ASSERTION: defined by an assertion specification and checked automatically by the DBMS
  3. specified with a CHECK clause in the base table definition and checked automatically by the DBMS
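-- CHECK clause
CREATE TABLE Reserves (
    sname CHAR(10),
    bid   INTEGER,
    day   DATE,
    PRIMARY KEY (bid, day),
    CONSTRAINT noInterlakeRes
    -- qualify Reserves.bid: an unqualified bid inside the subquery would resolve to B.bid
    CHECK ('Interlake' <> (SELECT B.bname FROM Boats B WHERE B.bid = Reserves.bid))
);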
-- constraints over multiple relations
-- flawed: a table constraint only has to hold when its table is non-empty, so if Sailors is empty
-- the number of Boats tuples is unconstrained, and inserting into Boats never evaluates this CHECK;
-- it is still a useful illustration of a constraint spanning two relations
CREATE TABLE Sailors (
    sid   INTEGER,
    sname CHAR(10),
    age   REAL,
    PRIMARY KEY (sid),
    CHECK ((SELECT COUNT(S.sid) FROM Sailors S) + (SELECT COUNT(B.bid) FROM Boats B) < 100)
);
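-- the right solution: an assertion, which is not tied to any single table
-- (standard SQL; mainstream systems such as PostgreSQL and MySQL implement neither
-- CREATE ASSERTION nor subqueries in CHECK)
CREATE ASSERTION smallClub
CHECK ((SELECT COUNT(S.sid) FROM Sailors S) + (SELECT COUNT(B.bid) FROM Boats B) < 100);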
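
Since `CREATE ASSERTION` is rarely available, a common workaround is to emulate it with one trigger per relation the assertion mentions. A sketch in SQLite via Python's `sqlite3`, assuming minimal `Sailors` and `Boats` schemas (the `Boats` columns are an assumption) and hypothetical trigger names. Unlike the `CHECK` on `Sailors`, the trigger on `Boats` also catches insertions into `Boats`.

```python
import sqlite3

LIMIT = 100  # the smallClub bound from the assertion

conn = sqlite3.connect(":memory:")
conn.executescript(f"""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, age REAL);
    CREATE TABLE Boats   (bid INTEGER PRIMARY KEY, bname TEXT);
    -- one trigger per relation named in the assertion, so inserting into either table is checked
    CREATE TRIGGER smallClub_sailors AFTER INSERT ON Sailors
    WHEN (SELECT COUNT(*) FROM Sailors) + (SELECT COUNT(*) FROM Boats) >= {LIMIT}
    BEGIN SELECT RAISE(ABORT, 'smallClub violated'); END;
    CREATE TRIGGER smallClub_boats AFTER INSERT ON Boats
    WHEN (SELECT COUNT(*) FROM Sailors) + (SELECT COUNT(*) FROM Boats) >= {LIMIT}
    BEGIN SELECT RAISE(ABORT, 'smallClub violated'); END;
""")

# 99 boats keep the total below the bound, so every insert is accepted
conn.executemany("INSERT INTO Boats VALUES (?, ?)",
                 [(i, f"boat{i}") for i in range(LIMIT - 1)])
try:
    conn.execute("INSERT INTO Sailors VALUES (1, 'Dustin', 45.0)")  # total would reach 100
    print("accepted")
except sqlite3.IntegrityError as err:
    print("rejected:", err)  # RAISE(ABORT) undoes the offending INSERT
```
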
Triggers
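  1. definition: a procedure that starts automatically when specified changes occur in the database
  2. three parts
    1. event: a change to the database that activates the trigger
    2. condition: a query or test that is evaluated when the trigger is activated
    3. action: a procedure that is executed when the trigger is activated and its condition is true
  3. Execution of rules
    1. immediate execution: the action runs as soon as the triggering event occurs
    2. deferred execution: the action runs at the end of the transaction
    3. decoupled (detached) mode: the action runs as a separate transaction
    4. cascading triggers: the action of one trigger may activate other triggers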
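-- syntax
CREATE TRIGGER <trigger name>
{BEFORE|AFTER} <trigger event>
ON <table name>
[REFERENCING <reference name>]
FOR EACH {ROW|STATEMENT}
[WHEN (<condition>)]
<action>

<trigger event> = INSERT | DELETE | UPDATE [OF <attribute list>]
<reference name> = OLD [ROW] [AS] <old tuple name>
<reference name> = NEW [ROW] [AS] <new tuple name>
<reference name> = OLD TABLE [AS] <old table name>
<reference name> = NEW TABLE [AS] <new table name>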

-- a creation example
CREATE TRIGGER insert_grade_check
AFTER INSERT ON enroll
REFERENCING NEW TABLE AS NE
FOR EACH STATEMENT
WHEN (EXISTS(SELECT * FROM NE WHERE grade<3.0))
INSERT
INTO failedcourse
SELECT * FROM NE
WHERE NE.grade<3.0

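-- another, simpler example: reject accommodation rows without a check-out date
create trigger insertRollback
before insert on accommodation
referencing new row as A
for each row                     -- a row reference (new row) requires a row-level trigger
when (A.check_out_date is null)  -- "= NULL" is never true; hyphens are not valid in identifiers
rollback                         -- many DBMSs disallow ROLLBACK in a trigger and raise an error instead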
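
Both examples can be run in SQLite with small changes, sketched below through Python's `sqlite3`. Assumptions: the column names `sno`, `cno`, `guest` are hypothetical; SQLite only supports row-level triggers without transition tables, so `insert_grade_check` becomes a per-row trigger using `NEW`; and `insertRollback` cancels the insert with `RAISE(ABORT, ...)` instead of `ROLLBACK`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enroll        (sno TEXT, cno TEXT, grade REAL);
    CREATE TABLE failedcourse  (sno TEXT, cno TEXT, grade REAL);
    CREATE TABLE accommodation (guest TEXT, check_out_date TEXT);

    -- row-level version of insert_grade_check (no NEW TABLE in SQLite)
    CREATE TRIGGER insert_grade_check AFTER INSERT ON enroll
    FOR EACH ROW WHEN NEW.grade < 3.0
    BEGIN
        INSERT INTO failedcourse VALUES (NEW.sno, NEW.cno, NEW.grade);
    END;

    -- insertRollback: abort the INSERT when no check-out date is given
    CREATE TRIGGER insertRollback BEFORE INSERT ON accommodation
    FOR EACH ROW WHEN NEW.check_out_date IS NULL
    BEGIN
        SELECT RAISE(ABORT, 'check_out_date must not be NULL');
    END;
""")

conn.executemany("INSERT INTO enroll VALUES (?, ?, ?)",
                 [("s1", "c1", 3.5), ("s2", "c1", 2.0), ("s3", "c2", 1.5)])
print(conn.execute("SELECT sno FROM failedcourse ORDER BY sno").fetchall())  # [('s2',), ('s3',)]

try:
    conn.execute("INSERT INTO accommodation VALUES ('guest1', NULL)")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```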

Database Design

Data Dependency and Normalization of Relational Schema

  1. dependency relations exist between the values of attributes
  2. functional dependency: the most basic kind of data dependency; the value of one attribute or a group of attributes determines the value of other attributes
  3. multivalued dependency: the value of some attributes determines a set of values of some other attributes
  4. join dependency: the constraint that a relation can be decomposed and rebuilt by joining without loss (lossless-join decomposition)
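
A functional dependency X -> Y requires that any two tuples agreeing on X also agree on Y. The sketch below tests this on one relation instance (rows as dicts, hypothetical data). An instance can only refute a dependency: dependencies are a property of the schema, not of one particular state.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if no two tuples in this instance agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:  # same lhs, different rhs
            return False
    return True


student_course = [
    {"sno": "s1", "sname": "Li",   "cno": "c1", "grade": 90},
    {"sno": "s1", "sname": "Li",   "cno": "c2", "grade": 85},
    {"sno": "s2", "sname": "Wang", "cno": "c1", "grade": 78},
]
print(fd_holds(student_course, ["sno"], ["sname"]))         # True
print(fd_holds(student_course, ["sno"], ["grade"]))         # False
print(fd_holds(student_course, ["sno", "cno"], ["grade"]))  # True
```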

NF

1NF

every attribute value of a relation must be atomic

2NF

$R \in \mathrm{1NF}$, and no non-prime attribute is partially functionally dependent on a candidate key

problems
  1. Insertion anomaly: the information of a student who has not selected any course cannot be inserted
  2. Deletion anomaly: if a student drops all courses, the student's basic information is lost as well
  3. Update difficulty: because of redundancy, it is hard to keep the data consistent when updating

3NF

$R \in \mathrm{2NF}$, and no non-prime attribute is transitively functionally dependent on a candidate key

problems
  1. Insertion anomaly: until some employee is assigned a salary level, the correspondence between that level and its salary cannot be entered
  2. Deletion anomaly: if only one employee has a certain salary level, the correspondence between that level and its salary is lost when the employee is deleted
  3. Update difficulty: because of redundancy, it is hard to keep the data consistent when updating
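
The usual remedy is to decompose the relation so that the transitive dependency eno -> level -> salary disappears. A sketch with hypothetical data, splitting Employee(eno, level, salary) into (eno, level) and (level, salary) and checking that the natural join gives back the original rows. The decomposition is lossless because the common attribute `level` is a key of (level, salary).

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Relational projection with duplicate elimination."""
    seen: Set[Tuple] = set()
    out = []
    for r in rows:
        t = tuple(r[a] for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(zip(attrs, t)))
    return out


def natural_join(r1: List[Row], r2: List[Row]) -> List[Row]:
    """Join two relations on all attributes they have in common."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    return [{**a, **b} for a in r1 for b in r2 if all(a[c] == b[c] for c in common)]


def as_set(rows: List[Row]) -> Set[Tuple]:
    return {tuple(sorted(r.items())) for r in rows}


employee = [
    {"eno": "e1", "level": 1, "salary": 3000},
    {"eno": "e2", "level": 1, "salary": 3000},
    {"eno": "e3", "level": 2, "salary": 4500},
]
emp_level = project(employee, ["eno", "level"])
level_salary = project(employee, ["level", "salary"])
print(as_set(natural_join(emp_level, level_salary)) == as_set(employee))  # True: lossless

# deleting e3 no longer loses the fact that level 2 earns 4500
emp_level_after = [r for r in emp_level if r["eno"] != "e3"]
print(level_salary)  # [{'level': 1, 'salary': 3000}, {'level': 2, 'salary': 4500}]
```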

Database Design method

  1. ER Model and ER Diagram
  2. Procedure-oriented method

The procedure is basically the same as that of software engineering.
