Filesystem vs. Database System
Drawbacks of storing data in a file system:
Data redundancy and inconsistency
Difficulty in accessing data
Data isolation
Integrity problems
Atomicity of updates
Concurrent access by multiple users
Security problems
Levels of abstraction in a database system: physical, logical, view
Schema: the logical structure of the database
Instance: the actual content of the database at a particular point in time.
Physical Data Independence – the ability to modify the physical schema without changing the logical schema
Entity Relationship Model
Entities and Relationships between entities.
DDL (Data Definition Language)
The DDL compiler generates a set of tables stored in a data dictionary
DML (Data Manipulation Language)
SQL: a widely used non-procedural language
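The DDL/DML distinction above can be illustrated with a small sketch using Python's built-in sqlite3 module. The table name `account`, its columns, and the sample rows are hypothetical and chosen only for illustration; SQLite's `sqlite_master` catalog stands in here for the data dictionary.

```python
import sqlite3


def demo_ddl_dml():
    """Run one DDL statement and a few DML statements on an in-memory database."""
    conn = sqlite3.connect(":memory:")

    # DDL: defines the schema. The definition is recorded in SQLite's catalog
    # table sqlite_master, which plays the role of a data dictionary here.
    conn.execute(
        "CREATE TABLE account (account_number TEXT PRIMARY KEY, balance REAL)"
    )

    # DML: inserts, updates, and queries the instance (the actual content).
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)",
        [("A-101", 500.0), ("A-102", 400.0)],
    )
    conn.execute(
        "UPDATE account SET balance = balance + 100 WHERE account_number = ?",
        ("A-102",),
    )
    rows = conn.execute(
        "SELECT account_number, balance FROM account ORDER BY account_number"
    ).fetchall()

    # Read the table names back from the catalog.
    tables = [
        name
        for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    ]
    conn.close()
    return tables, rows
```

The SELECT statement states which rows are wanted, not how to find them, which is what "non-procedural" refers to.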
Entity: an object that exists and is distinguishable from other objects.
Entity Set: a set of entities of the same type that share the same properties.
Domain – the set of permitted values for each attribute
Attribute types: simple and composite attributes; single-valued and multi-valued attributes; derived attributes.
Relationship: an association among several entities.
Relationship Set: a mathematical relation among n >= 2 entities, each taken from an entity set.
Degree of a Relationship Set:
Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary.
E-R Diagrams:
Rectangles represent entity sets.
Query Processing Basic Steps: (a picture in the ppt)
1. Parsing and translation
Parsing checks syntax and verifies relations. The query is translated into its internal form, which is then translated into relational algebra.
2. Optimization
A relational-algebra expression can have many equivalent forms; the optimizer chooses the evaluation plan with the lowest estimated cost.
3. Evaluation
The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.
Measure of Query cost:
Time cost: disk accesses, CPU time, or network communication
(br denotes the number of blocks containing records from relation r)
A1 (linear search): scan each file block and test every record against the selection condition. Cost = br, or br/2 on average when the selection is on a key attribute (the scan can stop once the record is found).
A2 (binary search): applicable if the selection is an equality comparison on the attribute on which the file is ordered. Cost = ⌈log2(br)⌉ to locate the first matching block.
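As a rough illustration of the A1 and A2 estimates above, the following sketch evaluates both formulas for a given block count. The function names and the `on_key` flag are assumptions made for this example, not part of the notes.

```python
import math


def linear_search_cost(br, on_key=False):
    """A1: scan all br blocks; on a key attribute the scan stops, on average, halfway."""
    return br / 2 if on_key else br


def binary_search_cost(br):
    """A2: block accesses needed to locate the first match in a file ordered on the attribute."""
    return math.ceil(math.log2(br))
```

For a relation stored in 1000 blocks, a full linear scan costs 1000 block accesses, while a binary search on the ordering attribute needs about 10.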
A3 (primary index on candidate key, equality): retrieves a single record.
Cost = HTi + 1 (HTi denotes the height of index i)
A4 (primary index on non-key, equality):
Cost = HTi + number of blocks containing retrieved records
A5 (equality on search-key of secondary index):
Retrieve a single record if the search-key is a candidate key
Cost = HTi + 1
Retrieve multiple records if the search-key is not a candidate key
Cost = HTi + number of records retrieved (each record may be on a different block)
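The index-based estimates for A3 through A5 can be written as small helper functions. This is only a sketch: the function and parameter names are hypothetical, and `ht` stands for the index height used in the formulas above.

```python
def primary_index_key_cost(ht):
    """A3: traverse the index (ht levels), then read the one block holding the record."""
    return ht + 1


def primary_index_nonkey_cost(ht, matching_blocks):
    """A4: matching records are stored contiguously, so count blocks, not records."""
    return ht + matching_blocks


def secondary_index_cost(ht, matching_records=1):
    """A5: every matching record may cost its own block access."""
    return ht + matching_records
```

With an index of height 3 and 2000 matching records, `secondary_index_cost(3, 2000)` gives 2003 block accesses, which is worse than a linear scan of a 1000-block file. This is why a secondary index on a non-key attribute is not always the cheapest choice.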
A6 (primary index, comparison):
For σ_{A ≥ v}(r), use the index to find the first tuple >= v and scan the relation sequentially from there.
For σ_{A ≤ v}(r), just scan the relation sequentially until the first tuple > v; do not use the index.
A7 (secondary index, comparison):
For σ_{A ≥ v}(r), use the index to find the first index entry >= v and scan the index sequentially from there to find pointers to records.
For σ_{A ≤ v}(r), just scan the leaf pages of the index, finding pointers to records, until the first entry > v.
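The two comparison strategies above can be sketched on an in-memory sorted list of keys. Here `bisect_left` from Python's standard library stands in for the index lookup; the function names and the list-based "file" are assumptions for illustration only.

```python
from bisect import bisect_left


def select_ge(sorted_keys, v):
    """A >= v: jump to the first key >= v (the index lookup), then scan forward."""
    start = bisect_left(sorted_keys, v)
    return sorted_keys[start:]


def select_le(sorted_keys, v):
    """A <= v: scan from the beginning and stop at the first key > v; no lookup needed."""
    result = []
    for key in sorted_keys:
        if key > v:
            break
        result.append(key)
    return result
```

For example, `select_ge([1, 3, 5, 7, 9], 5)` skips straight to 5, while `select_le([1, 3, 5, 7, 9], 5)` reads from the start and stops when it sees 7.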
A8(conjunctive selection using one index).
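Select the condition θi and one of the algorithms A1 through A7 that together give the least cost for σ_{θi}(r); test the remaining conditions on each tuple after fetching it into memory.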
A9 (conjunctive selection using multiple-key index).
Use appropriate composite (multiple-key) index if available
A10 (conjunctive selection by intersection of identifiers).
Requires indices with record pointers.
Use corresponding index for each condition, and take intersection of all the obtained sets of record pointers.
Then fetch records from file
A11 (disjunctive selection by union of identifiers).
Applicable only if all conditions have available indices; otherwise use a linear scan on the file.
Use the corresponding index for each condition, and take the union of all the obtained sets of record pointers.
Then fetch the records from the file.
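A10 and A11 both combine sets of record pointers before touching the file. A minimal sketch, assuming each index lookup has already produced a collection of integer record pointers (the function names are hypothetical, and at least one pointer set must be supplied):

```python
def conjunctive_by_intersection(pointer_sets):
    """A10: a record qualifies only if every index returned its pointer."""
    return sorted(set.intersection(*map(set, pointer_sets)))


def disjunctive_by_union(pointer_sets):
    """A11: a record qualifies if at least one index returned its pointer."""
    return sorted(set.union(*map(set, pointer_sets)))
```

Sorting the final pointers is an extra step added in this sketch so that the records could be fetched in file order. Only the records whose pointers survive the intersection or union are fetched.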
----------------------------------------------------------------------------------------------------------------
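For relations that fit in memory, techniques like quicksort can be used. For relations that don’t fit in memory, external sort-merge is a good choice.
Let M denote memory size (in pages).
Creating the initial sorted runs reads and writes every block once, each of the ⌈log_{M-1}(br / M)⌉ merge passes reads and writes every block once, and the final write of the result is not counted.
Thus the total number of disk accesses for external sorting is:
br (2 ⌈log_{M-1}(br / M)⌉ + 1)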
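The external-sorting cost above can be evaluated with a short sketch. It counts merge passes by repeated ceiling division instead of calling a floating-point logarithm, which gives the same value as ⌈log_{M-1}(br / M)⌉ without rounding surprises. The function names are hypothetical, and M is assumed to be at least 3.

```python
def merge_passes(br, M):
    """Number of merge passes: each pass merges up to M - 1 runs into one."""
    runs = -(-br // M)  # ceil(br / M) initial sorted runs
    passes = 0
    while runs > 1:
        runs = -(-runs // (M - 1))  # ceil(runs / (M - 1))
        passes += 1
    return passes


def external_sort_cost(br, M):
    """Total disk accesses: br * (2 * passes + 1), final write not counted."""
    return br * (2 * merge_passes(br, M) + 1)
```

For br = 1000 blocks and M = 11 pages, there are 91 initial runs and 2 merge passes, so the estimate is 1000 * (2 * 2 + 1) = 5000 disk accesses.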
(r is called the outer relation and s the inner relation of the join.)
Nested-loop join:
for each tuple tr in r do begin
    for each tuple ts in s do begin
        test pair (tr, ts) to see if they satisfy the join condition θ
        if they do, add tr • ts to the result.
    end
end
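Requires no indices and can be used with any kind of join condition, but it is expensive because it examines every pair of tuples.
Worst case (memory holds only one block of each relation): Cost = nr * bs + br disk accesses, where nr is the number of tuples in r.
Best case (the smaller relation fits entirely in memory and is used as the inner relation): Cost = br + bs disk accesses.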
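The nested-loop join above translates almost line for line into Python. In this sketch, relations are lists of tuples, the join condition is passed in as a function, and tuple concatenation plays the role of tr • ts. The function names and the `inner_fits_in_memory` flag are assumptions for illustration.

```python
def nested_loop_join(r, s, theta):
    """Pair every tuple of the outer relation r with every tuple of the inner relation s."""
    result = []
    for tr in r:
        for ts in s:
            if theta(tr, ts):
                result.append(tr + ts)  # concatenation stands in for tr • ts
    return result


def nested_loop_cost(nr, br, bs, inner_fits_in_memory=False):
    """Disk accesses: worst case nr * bs + br, best case br + bs."""
    if inner_fits_in_memory:
        return br + bs
    return nr * bs + br
```

With nr = 5000, br = 100 and bs = 400, the worst case is 5000 * 400 + 100 = 2,000,100 disk accesses, against 500 when the inner relation fits in memory.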
Block nested-loop join:
for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr in Br do begin
            for each tuple ts in Bs do begin
                check if (tr, ts) satisfy the join condition
                if they do, add tr • ts to the result.
            end
        end
    end
end
Worst case estimate: Cost = br * bs + br block accesses.
Best case: br + bs block accesses.
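A sketch of the block nested-loop join, under the assumption that a relation is a Python list split into fixed-size blocks. The helper `blocks`, the parameter `tuples_per_block`, and the read counter are hypothetical additions that make the br * bs term visible.

```python
def blocks(relation, tuples_per_block):
    """Yield the relation one block (a slice of tuples) at a time."""
    for start in range(0, len(relation), tuples_per_block):
        yield relation[start:start + tuples_per_block]


def block_nested_loop_join(r, s, theta, tuples_per_block=2):
    """Join block by block; also count how many inner blocks are read."""
    result = []
    inner_block_reads = 0
    for block_r in blocks(r, tuples_per_block):
        for block_s in blocks(s, tuples_per_block):
            inner_block_reads += 1  # one read of an s block per (Br, Bs) pair
            for tr in block_r:
                for ts in block_s:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result, inner_block_reads
```

Each block of the inner relation is read once per block of the outer relation rather than once per tuple, which is where the saving over the plain nested-loop join comes from.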
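Improvements to the nested-loop and block nested-loop algorithms:
Use M - 2 memory blocks as the blocking unit for the outer relation, keeping one block for the inner relation and one for output:
Cost = ⌈br / (M-2)⌉ * bs + br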
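To see how the amount of memory changes the block nested-loop estimate, the formula above can be evaluated directly. The function name is hypothetical; M is the number of memory blocks and must be at least 3.

```python
import math


def block_nested_loop_cost(br, bs, M=3):
    """Cost = ceil(br / (M - 2)) * bs + br block accesses."""
    return math.ceil(br / (M - 2)) * bs + br
```

With br = 100 and bs = 400: M = 3 gives the worst case 100 * 400 + 100 = 40,100; M = 22 gives 5 * 400 + 100 = 2,100; and M = 102, where the whole outer relation fits in memory, gives br + bs = 500.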
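Indexed nested-loop join:
Cost of the join: br + nr * c, where c is the cost of one index lookup on s that fetches all tuples matching a given tuple of r.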
If indices are available on join attributes of both r and s, use the relation with fewer tuples as the outer relation.
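The choice of outer relation can be checked numerically with the br + nr * c estimate. In this sketch, each relation is described by a dictionary with the hypothetical keys `blocks`, `tuples`, and `lookup_cost` (the cost c of one index lookup into that relation); all the numbers in the usage note are made up for illustration.

```python
def indexed_nested_loop_cost(outer_blocks, outer_tuples, inner_lookup_cost):
    """Read the outer relation once, plus one index lookup per outer tuple."""
    return outer_blocks + outer_tuples * inner_lookup_cost


def choose_outer(r, s):
    """Return ('r' or 's', cost) for the cheaper choice of outer relation."""
    cost_r_outer = indexed_nested_loop_cost(r["blocks"], r["tuples"], s["lookup_cost"])
    cost_s_outer = indexed_nested_loop_cost(s["blocks"], s["tuples"], r["lookup_cost"])
    if cost_r_outer <= cost_s_outer:
        return "r", cost_r_outer
    return "s", cost_s_outer
```

With r at 100 blocks, 5000 tuples and lookup cost 4, and s at 400 blocks, 10000 tuples and lookup cost 5, using r as the outer relation costs 100 + 5000 * 5 = 25,100 against 400 + 10000 * 4 = 40,400 for s, which agrees with the rule of thumb of using the relation with fewer tuples as the outer one.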
Merge-join:
Sort both relations on their join attribute (if not already sorted on the join attributes), then merge the sorted relations to join them.
Can be used only for equi-joins and natural joins.
Cost = br + bs block accesses, plus the cost of sorting if the relations are unsorted.
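A minimal in-memory sketch of a sort-then-merge equi-join, assuming relations are lists of tuples and the join attribute is extracted by the hypothetical functions `key_r` and `key_s`. It is meant to show the merge step, not a disk-based implementation.

```python
def merge_join(r, s, key_r, key_s):
    """Equi-join r and s by sorting both on the join attribute and merging."""
    r = sorted(r, key=key_r)
    s = sorted(s, key=key_s)
    result = []
    i = j = 0
    while i < len(r) and j < len(s):
        kr, ks = key_r(r[i]), key_s(s[j])
        if kr < ks:
            i += 1
        elif kr > ks:
            j += 1
        else:
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and key_s(s[j_end]) == kr:
                j_end += 1
            # Pair every r tuple with this value against the whole group.
            while i < len(r) and key_r(r[i]) == kr:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result
```

Each relation is passed over once during the merge, which corresponds to the br + bs term. The comparison with `<` and `>` relies on the join being an equality on an ordered attribute, which is why the technique is limited to equi-joins and natural joins.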
hybrid merge-join:
Applicable for equi-joins and natural joins.
November 23, 2004, 11:07:52