Introduction
Main content
- the data models
- SQL language and user interface
- Key principles of DBMS, mainly architecture, query optimization, concurrency control, and recovery
- the security and integrity constraints of databases
- Database design
- new research and application fields
What is database
definition
A very large, integrated collection of data
Function
models a real-world enterprise
- entities, e.g. students and courses
- relationships, e.g. which courses a student elects
Why use a DBMS
- data independence and efficient access
- reduced application development time
- data integrity and security
- uniform data administration
- concurrent access, recovery from crashes
File vs. Database
- application must stage large datasets between main memory and secondary storage
- special code for different queries
- must protect data from inconsistency due to multiple concurrent users
- crash recovery
- security and access control
Concepts
- Data are symbols that describe things in the real world. They are the form in which information exists
- A data model is a collection of concepts and definitions for describing data
- a schema is a description of a particular collection of data, using a given data model.
- the relational model of data is the most widely used model today
- Relation: basically a table with rows and columns
- every relation has a schema, which describes the columns or fields
The ANSI-SPARC architecture
- many views: each view describes how users see the data
- conceptual schema: defines the logical structure
- physical schema: describes the files and indexes used
What is DBMS
definition
A software package designed to store and manage databases
History of DBMS
- no data model
- simple file operation
- network data model
- hierarchy data model
- relational data model
database system
- database system = applications + DBMS + database + DBA (database administrator)
- DBMS is the core of database system
- high level user interface
- query processing and optimization
- catalog management
- concurrency control and recovery
- integrity constraints checking
- Access control
Data Models
Hierarchical Data Model
Basic idea
Because many things in the real world are organized hierarchically, the hierarchical model describes the real world as a tree structure
Basic concepts
Record
Field
PCR (parent-child-relationship)
the most basic data relationship in hierarchical model
Hierarchical data schema
- a hierarchical data schema consists of PCRs.
- Every PCR expresses one 1:N relationship
- every record type can only have one parent
Virtual record
virtual records are used to represent situations that the hierarchical model cannot represent directly
Network data model
Basic idea
- the basic data structure is the “set”, which represents a 1:N relationship between things in the real world. The “1” side is called the owner, and the “N” side is called the member
- One record type can be the owner of multiple sets, and can also be a member of multiple sets. Many sets together form a network structure that expresses the real world
- It breaks through the limits of the hierarchical structure, so it can express non-hierarchical data more easily
- record and data items: data items are similar to fields in the hierarchical model, but a data item can be a vector
- Link record type: used to represent self relationships and end-to-end relationships
Set
basic unit for network data schema
Link record type
use a link record type to represent relationships that sets cannot represent directly
self relationship
end to end relationship
L represent LINK
Relation data model
basic idea
- the basic data structure is the “table”, or relation. Real-world things and the relationships between them are all expressed as tables, so they can be studied with rigorous mathematical methods. This raised database technology to the level of theory.
features
- based on set theory, with a high level of abstraction
- shields all lower-level details; simple, clear, and easy to understand
- makes it possible to establish a new algebra system: relational algebra
- non-procedural query language: SQL
- soft links: the essential difference from the earlier data models
Soft link
Some concepts
Attributes and domain
- the features of an entity in the real world are expressed as attributes in the relational model
- the range of values of an attribute is called its domain. Example: age is a positive integer and cannot be larger than 1000
Relation and tuple
- an entity of the real world can be expressed as one or more relations
- a relation is an N-ary relationship defined on the domains of all its attributes: $R=(A_1, A_2, \dots, A_n)$
- This is called the schema of R, and n, the number of attributes, is called the degree of R.
Primary key
- a set of attributes is a candidate key for a relation if
- no two distinct tuples can have the same values in this set of attributes
- this is not true for any proper subset of this set of attributes
- a super key is a set of attributes that contains a key. Example: if id is unique, then id is a candidate key, while “id+name” is also unique but is only a super key
- if there is more than one candidate key for a relation, one of them is chosen to be the primary key, and the others are called alternate keys
- if the primary key consists of all attributes of a relation, it is called an all-key
- a key determines a tuple uniquely. Example: sid is a key for Students, and the set {sid, gpa} is a super key
Foreign key
A set of attributes in one relation that is used to “refer” to a tuple in another relation, like a logical pointer
ER Data Model
- entity (E): a real-world object distinguishable from other objects. An entity is described using a set of attributes
- entity set: a collection of similar entities
- all entities in an entity set have the same set of attributes
- each entity has a key
- each attribute has a domain
- combined or multi-valued attributes are permitted
- relationship (R): an association among two or more entities
- relationship can have attributes
- relationship set: Collection of similar relationships
Object-Oriented Data Model
Relational algebra
Basic operations
- selection ($\sigma$): selects a subset of rows from a relation
- projection ($\pi$): deletes unwanted columns from a relation
- cross-product ($\times$): allows us to combine two relations
- set-difference ($-$): tuples in reln. 1 but not in reln. 2
- union ($\cup$): tuples in reln. 1 or in reln. 2
$\{\sigma, \pi, \cup, -, \times\}$ is a complete set of operations
the algebra is “closed”: every operation takes relations as input and returns a relation
Other operations
- condition join: $R \bowtie_C S = \sigma_C(R\times S)$
- division: $A/B = \{x \mid \forall y \in B,\ (x,y) \in A\} = \pi_x(A)-\pi_x((\pi_x(A)\times B)-A)$
- outer union ($\underline{\cup}$): the attributes that do not exist in the original tuples are filled with NULL
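The operations above can be sketched in Python by modeling a relation as a set of tuples. This is only an illustration under assumptions: column positions stand in for attribute names, and the function names and sample data are hypothetical. The division follows the identity built from projection, cross-product and set-difference.

```python
# Relations are modeled as sets of tuples; column positions replace attribute names.

def select(rel, pred):
    """sigma: keep the rows that satisfy pred."""
    return {t for t in rel if pred(t)}

def project(rel, cols):
    """pi: keep only the listed column positions (duplicates collapse in a set)."""
    return {tuple(t[c] for c in cols) for t in rel}

def cross(r, s):
    """Cross-product: every row of r concatenated with every row of s."""
    return {a + b for a in r for b in s}

def divide(a, b):
    """A / B for binary A(x, y) and unary B(y), built from the basic operations."""
    xs = project(a, [0])
    disqualified = project(cross(xs, b) - a, [0])  # x values missing some y in B
    return xs - disqualified

# Hypothetical data: (sid, bid) reservations and the set of all boat ids.
reserves = {(1, 101), (1, 102), (2, 101)}
boats = {(101,), (102,)}

reserved_all = divide(reserves, boats)
print(reserved_all)  # sailors who reserved every boat
```

Because every function takes sets of tuples and returns a set of tuples, the calls can be nested freely, which is what closure of the algebra means in practice.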
Relational Calculus
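relational algebra is procedural: it describes the steps used to compute the result. Relational calculus is non-procedural: it describes only the condition the result must satisfy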
- Two flavors:
- tuple relational calculus: variables range over tuples
- domain relational calculus: variables range over domain elements
Tuple relational calculus
Example:
Query has the form:
$\{t[\langle\text{attribute list}\rangle] \mid P(t)\}$
t is called a tuple variable
The answer includes all tuples t that make the formula P(t) true
Example: find the names of all sailors whose rating is above 7 and who are younger than 50
$\{t[N] \mid t\in Sailors \land t.T>7 \land t.A<50\}$
Domain relational calculus
- Example:
- Query has the form: $\{\langle x_1,x_2,\dots,x_n\rangle \mid P(x_1,x_2,\dots,x_n,\dots,x_{n+m})\}$
- $x_1,x_2,\dots,x_n$ are called domain variables and appear in the result
- the answer includes all tuples $\langle x_1,x_2,\dots,x_n\rangle$ that make the formula $P(x_1,x_2,\dots,x_n,\dots,x_{n+m})$ true
- formula is recursively defined, starting with simple atomic formulas and building bigger and better formulas using the logical connectives
Formula
atomic formula
- a formula with an atomic operation
- $\langle x_1,x_2,\dots,x_n\rangle \in Rname$, or X op Y, or X op constant, where op is one of $>, <, =, \le, \ge, \ne$
Definition
- an atomic formula
- or $\lnot p$, $p\land q$, $p\lor q$, where p and q are formulas
- or $\exists X(P(X))$ or $\forall X(P(X))$, where $X$ is free in $P(X)$. If a quantifier is applied to X, then X is bound; if X is not bound, then X is free
queries that have an infinite number of answers are called unsafe
example: find all sailors with a rating above 7
$\{\langle I,N,T,A\rangle \mid \langle I,N,T,A\rangle\in Sailors \land T>7\}$
Differences and Similarities between relational calculus and relational algebra
- differences
- relational algebra needs to specify the order of operations
- relational calculus only needs to state the logical condition that the result must fulfill
- similarities:
- they are equivalent in terms of expressive power
- SQL can express any query that is expressible in relational algebra or relational calculus
User Interfaces and SQL Language
Content
- query language
- formal query language
- tabular query language
- graphic query language
- limited natural language query language
- interface and maintenance tools
- APIs
- class library
Important terms and concepts
- base table
- view
- data type supported
- null
- unique
- default
- primary key
- foreign key
- CHECK integrity constraint
Conceptual evaluation strategy
The semantics of an SQL query are defined in terms of the following conceptual evaluation strategy:
- Compute the cross-product of relation-list.
- Discard resulting tuples if they fail qualifications.
- Delete attributes that are not in target-list.
- If DISTINCT is specified, eliminate duplicate rows.
Levels of abstraction: ANSI-SPARC Architecture
- views describe how users see the data
- conceptual schema defines logical structure
- physical schema describes the files and indexes used
Query Language
Category
- Data definition language (DDL): used to define, delete, or alter data schemas
- Query language (QL): used to retrieve data
- Data manipulation language (DML): used to insert, delete, and update data
- Data control language (DCL): used to control users’ access authority to data
Basic SQL query
- compute the cross-product of relation-list
- discard resulting tuples if they fail qualifications
- delete attributes that are not in target list
- if DISTINCT is specified, eliminate duplicate rows
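The four conceptual steps can be mimicked directly in Python. This is a minimal sketch, assuming each tuple is a dict keyed by qualified attribute names such as "S.sid"; the function `evaluate` and the sample rows are hypothetical, and a real DBMS would choose a far more efficient plan with the same result.

```python
from itertools import product

def evaluate(relations, qualification, target, distinct=False):
    """Conceptual evaluation of SELECT [DISTINCT] target FROM relations WHERE qualification."""
    rows = []
    for combo in product(*relations):            # 1. cross-product of the relation-list
        row = {}
        for part in combo:
            row.update(part)
        if qualification(row):                   # 2. discard tuples failing the qualification
            rows.append(tuple(row[a] for a in target))  # 3. keep only target-list attributes
    if distinct:                                 # 4. eliminate duplicate rows
        rows = list(dict.fromkeys(rows))
    return rows

# Hypothetical sample data.
sailors = [{"S.sid": 22, "S.sname": "Dustin"}, {"S.sid": 31, "S.sname": "Lubber"}]
reserves = [{"R.sid": 22, "R.bid": 101}, {"R.sid": 22, "R.bid": 103},
            {"R.sid": 31, "R.bid": 103}]

# SELECT [DISTINCT] S.sname FROM Sailors S, Reserves R WHERE S.sid = R.sid
same_sid = lambda row: row["S.sid"] == row["R.sid"]
with_duplicates = evaluate([sailors, reserves], same_sid, ["S.sname"])
without_duplicates = evaluate([sailors, reserves], same_sid, ["S.sname"], distinct=True)
print(with_duplicates, without_duplicates)
```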
Union
definition
UNION can be used to compute the union of any two union-compatible sets of tuples
example
- question: find the sids of sailors who’ve reserved a red or a green boat
- solution 1: use an OR condition:
SELECT S.sid FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid AND (B.color='red' OR B.color='green')
- solution 2: use UNION:
SELECT S.sid FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
UNION
SELECT S.sid FROM Sailors S, Boats B, Reserves R
WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='green'
Intersect
- question: find the sids of sailors who’ve reserved a red and a green boat
- solution 1: use an AND condition. A single Boats tuple cannot be both red and green, so the red boat and the green boat need separate aliases:
SELECT S.sid FROM Sailors S, Boats B1, Reserves R1, Boats B2, Reserves R2
WHERE S.sid=R1.sid AND R1.bid=B1.bid AND S.sid=R2.sid AND R2.bid=B2.bid AND (B1.color='red' AND B2.color='green')
- solution 2: use INTERSECT:
SELECT S.sid FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='red'
INTERSECT
SELECT S.sid FROM Sailors S, Boats B, Reserves R WHERE S.sid=R.sid AND R.bid=B.bid AND B.color='green'
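A quick way to try the "red and green" query is an in-memory SQLite database driven from Python. The sketch below uses hypothetical sample rows and assumes SQLite's dialect. It contrasts INTERSECT with a tempting shortcut that tests both colors on one Boats alias: since one boat row cannot have two colors, that shortcut returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
INSERT INTO Sailors VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Boats VALUES (101, 'red'), (102, 'green');
INSERT INTO Reserves VALUES (1, 101), (1, 102), (2, 101);
""")

# Shared join prefix; each query appends its own color test.
JOIN = ("SELECT S.sid FROM Sailors S, Boats B, Reserves R "
        "WHERE S.sid = R.sid AND R.bid = B.bid AND ")

# Both color tests on the same alias B: no single row can satisfy them.
one_alias = conn.execute(JOIN + "B.color = 'red' AND B.color = 'green'").fetchall()

# INTERSECT of the "red" sailors and the "green" sailors.
intersect = conn.execute(
    JOIN + "B.color = 'red' INTERSECT " + JOIN + "B.color = 'green'"
).fetchall()

print(one_alias, intersect)
```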
Nested queries
- IN:
SELECT S.sname FROM Sailors S WHERE S.sid IN (SELECT R.sid FROM Reserves R WHERE R.bid=103)
- EXISTS:
SELECT S.sname FROM Sailors S WHERE EXISTS (SELECT * FROM Reserves R WHERE R.bid=103 AND S.sid=R.sid)
Division in SQL
- question: find sailors who’ve reserved all boats
- solution 1: use EXCEPT:
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS (
(SELECT B.bid FROM Boats B)
EXCEPT
(SELECT R.bid FROM Reserves R WHERE R.sid=S.sid))
- solution 2: use nested NOT EXISTS:
SELECT S.sname
FROM Sailors S
WHERE NOT EXISTS
(SELECT B.bid FROM Boats B
WHERE NOT EXISTS
(SELECT R.bid
FROM Reserves R
WHERE R.bid=B.bid And R.sid=S.sid))
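The double NOT EXISTS form reads as "there is no boat that this sailor has not reserved". A minimal sketch that runs it against an in-memory SQLite database follows; the sample rows are hypothetical and SQLite's dialect is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE Boats (bid INTEGER PRIMARY KEY, color TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
INSERT INTO Sailors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO Boats VALUES (101, 'red'), (102, 'green');
INSERT INTO Reserves VALUES (1, 101), (1, 102), (2, 101);
""")

# Division: sailors for whom no boat exists without a matching reservation.
division = """
SELECT S.sname FROM Sailors S
WHERE NOT EXISTS (
    SELECT B.bid FROM Boats B
    WHERE NOT EXISTS (
        SELECT R.bid FROM Reserves R
        WHERE R.bid = B.bid AND R.sid = S.sid))
"""
reserved_all = conn.execute(division).fetchall()
print(reserved_all)  # Ann reserved both boats; Bob and Cid did not
```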
Aggregate Operator
Aggregation Operators
- COUNT(*)
- COUNT([DISTINCT] A)
- SUM([DISTINCT] A)
- AVG([DISTINCT] A)
- MAX(A)
- MIN(A)
Example
- find those ratings for which the average age is the minimum over all ratings
wrong, because aggregate operations cannot be nested:
SELECT S.rating
FROM Sailors S
WHERE S.age = (SELECT MIN(AVG(S2.age)) FROM Sailors S2)
correct, using a table expression:
SELECT Temp.rating
FROM (
SELECT S.rating, AVG(S.age) AS avgage
FROM Sailors S
GROUP BY S.rating
) AS Temp
WHERE Temp.avgage = (SELECT MIN(Temp.avgage) FROM Temp)
Grouping
- find the age of the youngest sailor with age $\ge$ 18, for each rating with at least 2 such sailors
SELECT S.rating, MIN(S.age) AS minage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT(*) > 1
- find the age of the youngest sailor with age $\ge$ 18, for each rating with at least 2 such sailors and every such sailor under 60 (the EVERY keyword)
SELECT S.rating, MIN(S.age) AS minage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT(*) > 1 AND EVERY (S.age <= 60)
- for each red boat, find the number of reservations for this boat (grouping over a join of two relations)
SELECT B.bid, COUNT(*) AS scount
FROM Boats B, Reserves R
WHERE R.bid = B.bid AND B.color='red'
GROUP BY B.bid
the color test can instead be moved to HAVING, but then B.color must appear in GROUP BY:
SELECT B.bid, COUNT(*) AS scount
FROM Boats B, Reserves R
WHERE R.bid = B.bid
GROUP BY B.bid, B.color
HAVING B.color='red'
- find the age of the youngest sailor with age > 18, for each rating with at least 2 sailors of any age (subquery in HAVING)
SELECT S.rating, MIN(S.age)
FROM Sailors S
WHERE S.age>18
GROUP BY S.rating
HAVING 1<(
SELECT COUNT(*)
FROM Sailors S2
WHERE S2.rating = S.rating
)
Cast Expression
- change the expression to the target data type
- valid target type
- use
- match function parameters
- change precision while calculating
- assign a data type to NULL value
CREATE VIEW prospects (name, school, service) AS
SELECT name, school, CAST(NULL AS Varchar(20))
FROM Students
UNION
SELECT name, CAST(NULL AS Varchar(20)), service
FROM Soldiers
Case Expression
SELECT name, CASE status
WHEN 1 THEN 'Active Duty'
WHEN 2 THEN 'Reserve'
WHEN 3 THEN 'Special Assignment'
WHEN 4 THEN 'Retired'
ELSE 'Unknown'
END AS status
FROM Officers
subquery
- scalar sub-query: the result of a sub-query is a single value. It can be used in the place where a value can occur
- table expression: the result of a sub-query is a table. It can be used in the place where a table can occur
- common table expression: in some complex queries, a table expression may need to occur more than once in the same SQL statement. In this case, we use the keyword WITH
Scalar Sub-query
definition: the result of a sub-query is a single value. It can be used in the place where a value can occur
- find the departments’ names whose average bonus is higher than average salary
SELECT d.deptname
FROM dept AS d
WHERE (SELECT avg(bonus) FROM emp WHERE deptno=d.deptno)
>
(SELECT avg(salary) FROM emp WHERE deptno=d.deptno)
- list the deptno, deptname, and the max salary of all departments located in New York
SELECT d.deptno, d.deptname,
(SELECT MAX(salary) FROM emp WHERE deptno=d.deptno) AS maxpay
FROM dept AS d
WHERE d.location='New York'
Table Expression
definition: the result of a sub query is a table. It can be used in the place where a table can occur
- find departments whose total payment is greater than 200000
SELECT deptno, totalpay
FROM
(SELECT deptno, SUM(salary)+SUM(bonus) AS totalpay FROM emp GROUP BY deptno) AS payroll
WHERE totalpay>200000
Common Table Expression
definition: in some complex queries, a table expression may need to occur more than once in the same SQL statement. It is then defined once with the keyword WITH
- find the department that has the highest total payment
WITH payroll (deptno, totalpay) AS
(SELECT deptno, sum(salary)+sum(bonus) FROM emp GROUP BY deptno)
SELECT deptno
FROM payroll
WHERE totalpay = (SELECT MAX(totalpay) FROM payroll)
Outer Join
- the extension of the join operation
- In a join operation, only matching tuples that fulfill the join condition are kept in the result; outer joins also keep unmatched tuples, and the vacant part is set to NULL
- left outer join ($*\bowtie$): keeps all tuples of the left relation in the result
- right outer join ($\bowtie*$): keeps all tuples of the right relation in the result
- full outer join ($*\bowtie*$): keeps all tuples of the left and right relations in the result
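The difference between an ordinary join and a left outer join can be seen on a two-row example. The sketch below assumes SQLite's dialect and hypothetical sample data; only the left outer join is shown because it is available in every SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT);
CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
INSERT INTO Sailors VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Reserves VALUES (1, 101);
""")

# Ordinary join: Bob has no reservation, so he disappears from the result.
inner = conn.execute(
    "SELECT S.sname, R.bid FROM Sailors S JOIN Reserves R ON S.sid = R.sid "
    "ORDER BY S.sid").fetchall()

# Left outer join: Bob is kept and the vacant bid is NULL (None in Python).
left_outer = conn.execute(
    "SELECT S.sname, R.bid FROM Sailors S LEFT OUTER JOIN Reserves R ON S.sid = R.sid "
    "ORDER BY S.sid").fetchall()

print(inner, left_outer)
```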
Recursion
If a common table expression uses itself in its definition, this is called recursion.
-
find all employees under the management of Hoover and whose salary is more than 100000
WITH agents (name, salary) AS
((SELECT name, salary FROM FedEmp WHERE manager='Hoover')
UNION ALL
(SELECT f.name, f.salary FROM agents AS a, FedEmp AS f WHERE f.manager=a.name))
SELECT name
FROM agents
WHERE salary>100000
- find how many rivets are used in one wing (recursive calculation)
WITH wingpart (subpart, qty) AS
((SELECT subpart, qty FROM components WHERE part='wing')
UNION ALL
(SELECT c.subpart, w.qty*c.qty FROM wingpart w, components c WHERE w.subpart=c.part))
SELECT sum(qty) AS qty FROM wingpart WHERE subpart='rivet'
- find the lowest total cost route from SFO to JFK (recursive search)
WITH trips (destination, route, nsegs, totalcost) AS
((SELECT destination, CAST(destination AS varchar(20)), 1, cost FROM flights WHERE origin='SFO')
UNION ALL
(SELECT f.destination, CAST(t.route||','||f.destination AS varchar(20)),
t.nsegs+1, t.totalcost+f.cost FROM trips t, flights f WHERE t.destination=f.origin
AND f.destination != 'SFO' AND f.origin != 'JFK' AND t.nsegs <= 3))
SELECT route, totalcost FROM trips WHERE destination='JFK' AND totalcost = (SELECT min(totalcost) FROM trips WHERE destination='JFK')
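The rivet-counting recursion can be tried on a tiny bill of materials. This sketch assumes SQLite, where the recursive common table expression is conventionally written with WITH RECURSIVE; the components rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE components (part TEXT, subpart TEXT, qty INTEGER);
INSERT INTO components VALUES
  ('wing', 'strut', 5), ('wing', 'aileron', 1), ('wing', 'rivet', 100),
  ('strut', 'rivet', 10), ('aileron', 'rivet', 5);
""")

rivets = conn.execute("""
WITH RECURSIVE wingpart(subpart, qty) AS (
    SELECT subpart, qty FROM components WHERE part = 'wing'
    UNION ALL
    -- multiply quantities while descending one level of the part hierarchy
    SELECT c.subpart, w.qty * c.qty
    FROM wingpart w, components c
    WHERE w.subpart = c.part
)
SELECT SUM(qty) FROM wingpart WHERE subpart = 'rivet'
""").fetchone()[0]

print(rivets)  # 100 direct + 5 struts * 10 + 1 aileron * 5
```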
Data Manipulation Language
- Insert: insert a tuple into a table
- Delete: delete the tuples that fulfill the qualification
- Update: update attribute values of the tuples that fulfill the qualification
View in SQL
- general view
- virtual tables derived from base tables
- Logical data independence
- security of data
- update problems of view
- temporary view and recursive query
- WITH
- RECURSIVE
Embedded SQL
To access a database from a program and process the query results further, SQL must be combined with a programming language
Usage of Embedded SQL in C
- each statement begins with EXEC SQL and ends with ;
- host variables transfer information between C and SQL. Host variables must be declared in a section that begins with EXEC SQL
- in SQL statements, a : must be added before each host variable to distinguish it from SQL’s own variables or attribute names
- in the host language, such as C, host variables are used as ordinary variables
- host variables cannot be defined as structures
- a special host variable is the SQLCA (SQL Communication Area), included with EXEC SQL INCLUDE SQLCA
- use SQLCA.SQLCODE to judge the state of the result
- use an indicator (short int) to handle NULL in the host language
Example of host variables defining
EXEC SQL BEGIN DECLARE SECTION;
char SNO[7];
char GIVENSNO[7];
char CNO[6];
char GIVENCNO[6];
float GRADE;
short GRADEI;
EXEC SQL END DECLARE SECTION;
// CONNECT
EXEC SQL CONNECT :uid IDENTIFIED BY :pwd;
// Execute DDL or DML Statements
EXEC SQL INSERT INTO SC(SNO,CNO,GRADE)
VALUES(:SNO, :CNO, :GRADE);
// Execute Query Statements
EXEC SQL SELECT GRADE
INTO :GRADE :GRADEI
FROM SC
WHERE SNO=:GIVENSNO AND
CNO=:GIVENCNO;
Cursor
// Define a cursor
EXEC SQL DECLARE <cursor name> CURSOR FOR
SELECT …
FROM …
WHERE …;
EXEC SQL OPEN <cursor name>;
// Fetch data from cursor
EXEC SQL FETCH <cursor name>
INTO :hostvar1, :hostvar2, …;
// SQLCA.SQLCODE will return 100 when arriving at the end of the cursor
EXEC SQL CLOSE <cursor name>;
// an example of query with cursor
EXEC SQL DECLARE C1 CURSOR FOR
SELECT SNO, GRADE
FROM SC
WHERE CNO = :GIVENCNO;
EXEC SQL OPEN C1;
if (SQLCA.SQLCODE<0) exit(1);/* There is error in query*/
while (1) {
EXEC SQL FETCH C1 INTO :SNO, :GRADE:GRADEI;
if (SQLCA.SQLCODE==100)break;
/* treat data fetched from cursor, omitted*/
}
EXEC SQL CLOSE C1;
Dynamic SQL
// dynamic SQL executed directly
EXEC SQL BEGIN DECLARE SECTION;
char sqlstring[200];
EXEC SQL END DECLARE SECTION;
char cond[150];
strcpy(sqlstring, "DELETE FROM STUDENT WHERE ");
printf("Enter search condition: ");
scanf("%149[^\n]", cond); /* read the whole line, bounded by the size of cond */
strcat(sqlstring, cond);
EXEC SQL EXECUTE IMMEDIATE :sqlstring;
// Dynamic SQL with dynamic parameters
EXEC SQL BEGIN DECLARE SECTION;
char sqlstring[200];
int birth_year;
EXEC SQL END DECLARE SECTION;
strcpy(sqlstring, "DELETE FROM STUDENT WHERE YEAR(BDATE) <= :y");
printf(" Enter birth year for delete :");
scanf("%d", &birth_year);
EXEC SQL PREPARE purge FROM :sqlstring;
EXEC SQL EXECUTE purge USING :birth_year;
// Dynamic SQL for query
EXEC SQL BEGIN DECLARE SECTION;
char sqlstring[200];
char SNO[7];
float GRADE;
short GRADEI;
char GIVENCNO[6];
EXEC SQL END DECLARE SECTION;
char orderby[150];
strcpy(sqlstring, "SELECT SNO,GRADE FROM SC WHERE CNO= :c ");
printf("Enter the ORDER BY clause: ");
scanf(" %149[^\n]", orderby); /* read the whole clause, spaces included */
strcat(sqlstring, orderby);
printf("Enter the course number: ");
scanf("%5s", GIVENCNO);
EXEC SQL PREPARE query FROM :sqlstring;
EXEC SQL DECLARE grade_cursor CURSOR FOR query;
EXEC SQL OPEN grade_cursor USING :GIVENCNO;
if (SQLCA.SQLCODE<0) exit(1);/* There is error in query*/
while (1) {
EXEC SQL FETCH grade_cursor INTO :SNO, :GRADE:GRADEI;
if (SQLCA.SQLCODE==100)break;
/* treat data fetched from cursor, omitted*/
∶
}
EXEC SQL CLOSE grade_cursor;
EXEC SQL
CREATE PROCEDURE drop_student
(IN student_no CHAR(7),
OUT message CHAR(30))
BEGIN ATOMIC
DELETE FROM STUDENT
WHERE SNO=student_no;
DELETE FROM SC
WHERE SNO=student_no;
SET message=student_no || ' dropped';
END;
EXEC SQL
∶
CALL drop_student(…);
Database management System
DBMS process structure
- single process structure: compiled as a single .exe file
- multi process structure: one application process corresponding to one DBMS core process
- multi threads structure: only one DBMS process, every application process corresponding to a DBMS core thread
Database Access Management
Access types
- query all or most records of a file (>15%)
- query some special record
- query some records(<15%)
- scope query
- update
File Organization
- heap file: records stored according to their inserted order, and retrieved sequentially. This is the most basic and general form of file organization
- direct file: the record address is mapped through hash function according to some attribute’s value
- index file: index + heap file/cluster
- Grid structure file: suitable for multi attributes queries
- raw disk
Index Technique
- B+ Tree very common
- Clustering index common
- inverted file
- dynamic hashing
- grid structure file and partitioned hash function
- bitmap index used in data warehouse
- others
why do we use B+ tree in DBMS:
In a B+ tree, the internal nodes store only keys, which serve as the index, and the records are stored in the leaf nodes. In a B tree, each key is stored only once, so keys and records are not duplicated between internal nodes and leaves. In a B+ tree, the leaf nodes are linked to each other to provide sequential access; in a B tree, the leaf nodes are not linked
The B+ tree is a balanced multiway search tree: all leaf nodes stay at the same height. Because the leaf nodes are linked in a list, a B+ tree supports random access as well as sequential access
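To make the two access paths concrete, here is a minimal sketch of a static B+ tree in Python. It is an illustration under assumptions, not how a DBMS implements it: the tree is bulk-loaded bottom-up from a non-empty sorted list, there is no insertion or deletion, minimum node occupancy is not enforced, and all names are hypothetical.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, keys, values):
        self.keys, self.values, self.next = keys, values, None

class Internal:
    def __init__(self, keys, children):
        self.keys, self.children = keys, children  # len(children) == len(keys) + 1

def smallest_key(node):
    while isinstance(node, Internal):
        node = node.children[0]
    return node.keys[0]

def bulk_load(items, fanout=3):
    """Build the tree level by level from sorted (key, value) pairs."""
    leaves = [Leaf([k for k, _ in items[i:i + fanout]],
                   [v for _, v in items[i:i + fanout]])
              for i in range(0, len(items), fanout)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right                          # leaf chain for sequential access
    level = leaves
    while len(level) > 1:                          # every pass adds one level, so leaves share a depth
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        level = [Internal([smallest_key(c) for c in g[1:]], g) for g in groups]
    return level[0]

def find_leaf(root, key):
    """Random access: descend from the root using the separator keys."""
    node = root
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node

def range_scan(root, lo, hi):
    """Sequential access: find the first leaf, then follow the leaf links."""
    leaf, out = find_leaf(root, lo), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append((k, v))
        leaf = leaf.next
    return out

root = bulk_load([(k, f"r{k}") for k in range(1, 11)])
print(range_scan(root, 5, 8))
```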
Query optimization
“rewrite” the query statement submitted by the user first, and then decide the most efficient operation method and steps
- algebra optimization
- operation optimization
Equivalent Transform
- commutative rule of $\bowtie/\times$: $E_1 \times E_2 \equiv E_2\times E_1$
- associative rule of $\bowtie/\times$: $E_1 \times (E_2\times E_3) \equiv (E_1\times E_2) \times E_3$
- cascade rule of $\Pi$: $\Pi_{A_1\dots A_n}(\Pi_{B_1\dots B_m}(E))\equiv \Pi_{A_1\dots A_n}(E)$ when $\{A_1,\dots,A_n\} \subseteq \{B_1,\dots,B_m\}$
- …
Basic principle
- push down the unary operations as low as possible
- look for and combine the common sub-expression
operation optimization
- nested loop: scan the inner relation once for every tuple of the outer relation
- merge scan: sort the relations R and S on the join attribute ahead of time, then merge them in one pass
- using an index or hash to look up matching tuples
- hash join
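Two of the join methods listed above can be sketched in a few lines of Python. This is an in-memory illustration under assumptions: relations are lists of tuples, the join is an equi-join on one column position, and the function names and data are hypothetical. A real DBMS works page by page on disk.

```python
from collections import defaultdict

def nested_loop_join(outer, inner, key_outer, key_inner):
    """Scan the whole inner relation once for every outer tuple."""
    return [o + i for o in outer for i in inner if o[key_outer] == i[key_inner]]

def hash_join(build, probe, key_build, key_probe):
    """Hash the build relation once, then look up each probe tuple."""
    table = defaultdict(list)
    for b in build:
        table[b[key_build]].append(b)
    return [b + p for p in probe for b in table.get(p[key_probe], [])]

# Hypothetical data: Sailors(sid, sname) and Reserves(sid, bid).
sailors = [(1, "Ann"), (2, "Bob")]
reserves = [(1, 101), (2, 102), (1, 103)]

nl = nested_loop_join(sailors, reserves, 0, 0)
hj = hash_join(sailors, reserves, 0, 0)
print(sorted(nl) == sorted(hj))  # same tuples, possibly in a different order
```

The nested loop compares every pair of tuples, while the hash join touches each tuple once plus the lookups, which is why the choice matters for large relations.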
Recovery
Introduction
- reduce the likelihood of failures (prevent)
- recover from failures
- redundancy
- should inspect all possible failures
Periodical dumping
backup + log
- log: a record of all changes to the DB since the last backup was made
- some transactions may be half done: they should be undone
- some transactions have finished but their results have not been written to the DB: they should be redone
Transaction
A transaction T is a finite sequence of actions on DB exhibiting the following effects(ACID):
- Atomicity: all or nothing
- Consistency preservation: a transaction takes the DB from one consistent state to another
- Isolation: concurrent transactions should run as if they were independent of each other
- Durability: the effects of a successfully completed transaction are permanently reflected in the DB
Commit rule and log ahead rule
- some related structures
- Active Transaction List (ATL): records the TIDs of all transactions that are executing and not yet committed
- Transaction Identifier (TID)
- Committed Transaction List (CTL): records the identifiers (TIDs) of all committed transactions
- Before Image (BI), After Image (AI): can be viewed as a pair of files
- Check Point (CP)
- Message Manager (MM)
- commit rule: the A.I. (after image) must be written to non-volatile storage before the transaction is committed. Even if a failure occurs after the transaction enters the commit stage, the recorded A.I. can still be used to redo the update, so the transaction satisfies the ACID properties
- log ahead rule: if the A.I. is written directly to the database before the transaction is committed, the corresponding B.I. must first be written to the log. If a failure occurs before the transaction enters the commit stage, the B.I. can be used to undo the update, so the transaction satisfies the ACID properties
-
recover strategies:
- undo(undo(…)) = undo()
- redo(redo(…)) = redo()
-
three type of update strategy
- first write
- AI -> DB before commit
- TID->active list
- BI -> log
- AI -> DB
- …
- TID -> commit list
- delete TID from active list (the last two steps are the commit procedure)
- write after commit
- TID -> active list
- AI -> log
- …
- TID -> commit list
- AI -> DB
- …
- delete TID from active list
- AI -> DB concurrently with commit
- TID -> active list
- AI, BI -> log
- AI -> DB partially done
- …
- TID -> commit list
- AI ->DB (complete)
- delete TID from active list
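The recovery idea behind these strategies can be sketched in Python. This is a simplified illustration under loud assumptions: the log is a list of hypothetical (TID, item, BI, AI) records, the commit list is a set of TIDs, and no two uncommitted transactions wrote the same item (as under strict locking). It shows that redo and undo are idempotent, so recovery can safely be repeated after a second crash.

```python
def recover(db, log, committed):
    """Redo committed transactions from after images, undo the rest from before images."""
    for tid, item, before, after in log:             # forward pass: redo
        if tid in committed:
            db[item] = after
    for tid, item, before, after in reversed(log):   # backward pass: undo
        if tid not in committed:
            db[item] = before
    return db

# Hypothetical crash state: T1 committed but its AI never reached the DB,
# T2 did not commit but its AI was already written to the DB.
log = [("T1", "x", 0, 5), ("T2", "y", 10, 20)]
committed = {"T1"}
crashed_db = {"x": 0, "y": 20}

once = recover(dict(crashed_db), log, committed)
twice = recover(dict(once), log, committed)
print(once, twice)  # running recovery again changes nothing
```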
Concurrency Control
Introduction to Concurrency
- A multi-user DBMS permits multiple transactions to access the database concurrently
Why
- improving system utilization and response time
- different transactions may access different parts of the database
problem arise from concurrency
- lost update
- dirty read
- unrepeatable read
How to avoid problems caused by concurrency
Solution: concurrency control methods such as locking method and time stamp method can be used
Serialization: the criterion for concurrency consistency
definition:
- suppose $\{T_1,T_2,\dots,T_n\}$ is a set of transactions executing concurrently. If a schedule of $\{T_1,T_2,\dots,T_n\}$ produces the same effect on the database as some serial execution of this set of transactions, then the schedule is serializable
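One common way to test a schedule is the precedence graph. Note the assumption: this checks conflict serializability, which is a sufficient condition for the effect-based definition above, not an exact equivalent. The schedule encoding and function name below are hypothetical.

```python
def conflict_serializable(schedule):
    """schedule: list of (transaction, action, item) with action 'R' or 'W'."""
    edges = {}
    for i, (ti, ai, xi) in enumerate(schedule):
        edges.setdefault(ti, set())
        for tj, aj, xj in schedule[i + 1:]:
            # Two operations conflict if they come from different transactions,
            # touch the same item, and at least one of them is a write.
            if ti != tj and xi == xj and "W" in (ai, aj):
                edges[ti].add(tj)

    state = {}
    def has_cycle(t):                     # depth-first search for a cycle
        state[t] = "visiting"
        for u in edges[t]:
            if state.get(u) == "visiting" or (u not in state and has_cycle(u)):
                return True
        state[t] = "done"
        return False

    return not any(t not in state and has_cycle(t) for t in edges)

# Lost update: both transactions read x before either writes it.
lost_update = [("T1", "R", "x"), ("T2", "R", "x"), ("T1", "W", "x"), ("T2", "W", "x")]
serial = [("T1", "R", "x"), ("T1", "W", "x"), ("T2", "R", "x"), ("T2", "W", "x")]
print(conflict_serializable(lost_update), conflict_serializable(serial))
```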
Locking Protocol
Basic idea
Before a transaction operates on a data object, it sends the system a request to lock that object. Once the lock request is granted, the transaction has a certain degree of control over the object. Until the transaction releases its lock, other transactions cannot obtain a conflicting lock on the object or operate on it, which avoids access conflicts and ensures the correct execution of concurrent transactions
With a locking protocol, problems such as livelock and deadlock may arise. The one that must be solved is deadlock, which is caused by circular waiting between transactions
Locking
- definition of two phase locking: if all lock requests in a transaction precede all its unlocks, the transaction is called a two phase transaction. This restriction is called the two phase locking protocol
- definition of well formed: a transaction is well formed if it acquires a lock on an object before operating on it
- theorem on serializability: if S is any schedule of well formed, two phase transactions, then S is serializable
X-Lock
- only one type of lock, for both read and write
- two phase lock: all locks precede all unlocks
- well formed: acquire a lock on an object before operating on it
S,X lock
- S lock: if read access is intended
- X lock: if update access is intended
SUX lock
- S lock: if read access is intended
- X lock: if update access is intended
- U lock: for an update access, the transaction first acquires a U-lock and then promotes it to an X-lock.
conclusions
- well formed + 2PL: serializable
- well formed + 2PL + unlock update at EOT: serializable and recoverable
- well formed and 2PL + holding all locks to EOT: strict two phase locking transaction
Dead Lock
Prevention
- timeout: if a transaction waits longer than some specified time, deadlock is assumed and the transaction is aborted
- detect deadlock with a wait-for graph: if there is a cycle in the graph, there is a deadlock
- requesting all locks at initial time of transaction
- request locks in a specified order of resource
- abort once conflicted
- transaction retry
- wait-die ($T_A$ requests a lock held by $T_B$): $T_A$ waits if it is older than $T_B$; otherwise it “dies” and then retries with its original timestamp
- wound-wait ($T_A$ requests a lock held by $T_B$): $T_A$ waits if it is younger than $T_B$; otherwise it “wounds” $T_B$, and $T_B$ retries
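The two timestamp schemes reduce to a one-line decision each. The sketch below assumes that a smaller timestamp means an older transaction and that the requester asks for a lock the holder currently owns; the function names and return strings are hypothetical.

```python
def wait_die(ts_requester, ts_holder):
    """Older requester waits; younger requester dies and retries with its old timestamp."""
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    """Older requester wounds (aborts) the holder; younger requester waits."""
    return "wound holder" if ts_requester < ts_holder else "wait"

# Timestamp 1 is older than timestamp 2.
print(wait_die(1, 2), wait_die(2, 1))
print(wound_wait(1, 2), wound_wait(2, 1))
```

In both schemes waiting only ever goes in one direction of age (older waits for younger in wait-die, younger waits for older in wound-wait), so a cycle of waiting transactions cannot form.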
The Security and Integrity in Database
Introduction
- main reasons
- system failure
- inconsistency caused by concurrent access
- man-made destruction
- incorrect input data, or an updating transaction that did not obey the rule of consistency preservation
Security of database
- protect databases from illegal access
- view and query rewriting
- access control
- identification and authentication of users
- authorization
- role
- data encryption
- audit trail
Security of statistical Database
- In many situations, statistical data is public while the detailed individual data is secret, yet some detailed individual data can be derived from the public statistical data
Tracker
- individual tracker: suppose only one person is both male and a programmer. A statistical query restricted to sex = male and occupation = programmer then describes that single person. The basic idea is to use statistical information to infer individual information
- general tracker: basically the same as the individual tracker
Integrity Constraints
Category
- static constraints: constraints on the database state
- inherent constraints, such as 1NF
- implicit constraints: implied in the data schema, such as primary key and foreign key constraints
- domain constraints: field values must be of the right type
- dynamic constraints: constraints on the transition of the database from one state to another; can be combined with triggers
Database Modification
- if $\alpha$ is a foreign key in $r_2$ that references $K_1$ in $r_1$, then
- $\Pi_\alpha(r_2) \subseteq \Pi_{K_1}(r_1)$ must hold (referential integrity)
- $t_2[\alpha]\in\Pi_{K_1}(r_1)$ must be checked when a tuple $t_2$ is inserted into $r_2$, and also when $\alpha$ is updated
- $\sigma_{\alpha=t_1[K_1]}(r_2)$ must be checked when a tuple $t_1$ referenced through $\alpha$ is deleted from $r_1$ or updated
Definition of Integrity Constraints
- defined by procedure: application programs are responsible for checking the integrity constraints
- defined with ASSERTION: defined with an assertion specification and checked by the DBMS automatically
- defined with a CHECK clause in the base table definition and checked by the DBMS automatically
CREATE TABLE Reserves (
sname CHAR(10),
bid INTEGER,
day DATE,
PRIMARY KEY (bid, day),
CONSTRAINT noInterlakeRes
CHECK('Interlake'!=(SELECT B.bname FROM Boats B WHERE B.bid=bid))
)
CREATE TABLE Sailors(
sid INTEGER,
sname CHAR(10),
age REAL,
PRIMARY KEY (sid),
CHECK
((SELECT COUNT(S.sid) FROM Sailors S) + (SELECT COUNT(B.bid) FROM Boats B)<100)
)
CREATE ASSERTION smallClub
CHECK
((SELECT COUNT(S.sid) FROM Sailors S) + (SELECT COUNT(B.bid) FROM Boats B)<100)
Triggers
- definition: a procedure that starts automatically if specified changes occur to the database
- three parts
- event: activates the trigger
- condition
- action
- Execution of rules
- immediate execution
- deferred execution
- decoupled or detached mode
- cascading trigger
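CREATE TRIGGER <trigger name>
{BEFORE|AFTER} <trigger event>
ON <table name>
[REFERENCING <reference name>]
FOR EACH {ROW|STATEMENT}
WHEN <condition>
<action>
<trigger event> = INSERT|DELETE|UPDATE [OF <attribute list>]
<reference name> = OLD [ROW] [AS] <old tuple name>
<reference name> = NEW [ROW] [AS] <new tuple name>
<reference name> = OLD [TABLE] [AS] <old table name>
<reference name> = NEW [TABLE] [AS] <new table name>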
CREATE TRIGGER insert_grade_check
AFTER INSERT ON enroll
REFERENCING NEW TABLE AS NE
FOR EACH STATEMENT
WHEN (EXISTS(SELECT * FROM NE WHERE grade<3.0))
INSERT
INTO failedcourse
SELECT * FROM NE
WHERE NE.grade<3.0
create trigger insertRollback
before insert on accommodation
referencing new as A
for each row
when (A.check-out-date is NULL)
rollback
Database Design
Data Dependency and Normalization of Relational Schema
- some dependent relations exist between attributes
- functional dependency: the most basic kind of data dependency. The value of one attribute or a group of attributes determines the value of other attributes
- multi-valued dependency: the value of some attributes determines a group of values of some other attributes
- join dependency: the constraint of lossless join decomposition
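Functional dependencies can be explored with the standard attribute-closure computation. The sketch below is an illustration with a hypothetical schema R(sno, cno, grade, sname): it shows that {sno, cno} determines every attribute (so it is a key), while sno alone already determines sname, which is the kind of partial dependency that normalization removes.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Hypothetical dependencies: sno -> sname, and (sno, cno) -> grade.
fds = [({"sno"}, {"sname"}), ({"sno", "cno"}, {"grade"})]

key_closure = closure({"sno", "cno"}, fds)
partial_closure = closure({"sno"}, fds)
print(sorted(key_closure), sorted(partial_closure))
```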
NF
1NF
the attributes of a relation must be atomic
2NF
R$\in$1NF, and no non-key attribute is partially functionally dependent on a key
problem
- insertion anomaly: cannot insert the information of students who have not selected any course
- deletion anomaly: if a student drops all courses, the student’s basic information is also lost
- hard to update: because of redundancy, it is hard to keep consistency when updating
3NF
R$\in$2NF, and no transitive functional dependency exists between attributes
problem
- insertion anomaly: before the employees’ salary levels are decided, the correspondence between salary level and salary cannot be input
- deletion anomaly: if some salary level has only one employee, the correspondence between that salary level and its salary is lost when the employee is deleted
- hard to update: because of redundancy, it is hard to keep consistency when updating
Database Design method
- ER Model and ER Diagram
- Procedure oriented method
basically the same as the procedure of software engineering