


Chapter 1:  Introduction



Filesystem           Database System

Data redundancy and inconsistency       Levels of abstraction: physical, logical, view

Difficulty in accessing data

Data isolation

Integrity problem

Atomicity of updates

Concurrent access by multiple users

Security problem


Some concepts:


Schema: the logical structure of the database.

Instance: the actual content of the database at a particular point in time.

Physical Data Independence – the ability to modify the physical schema without changing the logical schema



Entity Relationship Model


Entities and Relationships between entities.


DDL compiler generates a set of tables stored in a data dictionary



SQL: widely used non-procedural language




Chapter 2:  Entity-Relationship Model

Entity: an object that is distinguishable from other objects.

Entity Set: a set of entities of the same type that share the same properties.

Domain – the set of permitted values for each attribute

Attribute types: simple and composite attributes; single-valued and multi-valued attributes; derived attributes.

Relationship: is an association among several entities

Relationship Set: a mathematical relation among n >= 2 entities, each taken from an entity set.

Degree of a Relationship Set:

Relationship sets that involve two entity sets are binary (or degree two).  Generally, most relationship sets in a database system are binary.
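
As a small illustration, a relationship set can be modelled as a set of tuples whose components are drawn from entity sets. The names customers, accounts, and depositor below are hypothetical examples and do not come from these notes.

```python
# Hypothetical entity sets; each entity is represented by a simple key.
customers = {"c1", "c2", "c3"}
accounts = {"a1", "a2"}

# A binary relationship set: a subset of customers x accounts.
depositor = {("c1", "a1"), ("c2", "a1"), ("c3", "a2")}


def is_relationship_set(rel, *entity_sets):
    """Check that the i-th component of every tuple comes from the i-th entity set."""
    return all(
        len(t) == len(entity_sets) and all(e in s for e, s in zip(t, entity_sets))
        for t in rel
    )


# Degree = number of entity sets that participate in the relationship set.
degree = len(next(iter(depositor)))
```

Here degree is 2, so depositor is a binary relationship set.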


E-R Diagrams:

Rectangles represent entity sets.



Query Processing Basic Steps: (a picture in the ppt)

1.        Parsing and translation

The parser checks syntax and verifies that the relations exist. The query is translated into its internal form, which is then translated into relational algebra.

2.        Optimization

A query can be expressed as many equivalent relational-algebra expressions, each of which can be evaluated in several ways; the optimizer chooses the plan with the lowest estimated cost.

3.        Evaluation

The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.


Measure of Query cost:

Time cost: disk accesses, CPU, or network communication


Selection Operation:

File Scan:

(br  denotes number of blocks containing records from relation r)

A1 (linear search): scan each file block and test every record against the selection condition. Cost = br, or br / 2 on average when the selection is an equality on a key attribute (the scan can stop at the first match).

A2 (binary search): applicable if the selection is an equality comparison on the attribute on which the file is ordered. Cost = ⌈log2(br)⌉ to locate the first matching record.
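
The two file-scan estimates can be written as small helper functions. This is only a sketch of the formulas above; the function names and the parameter b_r (number of blocks of r) are chosen here for illustration.

```python
import math


def linear_search_cost(b_r, on_key=False):
    """A1: read all b_r blocks; for an equality on a key attribute the scan
    stops at the match, so the average cost is b_r / 2."""
    return b_r / 2 if on_key else b_r


def binary_search_cost(b_r):
    """A2: ceil(log2(b_r)) block reads to locate the first matching record
    in a file ordered on the selection attribute."""
    return math.ceil(math.log2(b_r))
```

For a relation of 1000 blocks, the linear scan reads 1000 blocks (500 on average for a key), while binary search reads 10.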

Index scan – search algorithms that use an index (HTi denotes the height of index i)

A3 (primary index on candidate key, equality):

Cost = HTi + 1

A4 (primary index on non-key, equality):

Cost = HTi + number of blocks containing retrieved records

A5 (equality on search key of secondary index):

Retrieve a single record if the search key is a candidate key

Cost = HTi + 1

Retrieve multiple records if the search key is not a candidate key

Cost = HTi + number of records retrieved
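
The index-scan estimates differ mainly in whether matching records are counted per block or per record. A minimal sketch, with function and parameter names invented here for illustration:

```python
def primary_index_key_cost(ht):
    """A3: traverse the index (ht levels), then read one data block."""
    return ht + 1


def primary_index_nonkey_cost(ht, matching_blocks):
    """A4: matching records are stored on consecutive blocks, so count blocks."""
    return ht + matching_blocks


def secondary_index_cost(ht, matching_records=1):
    """A5: each matching record may lie on a different block, so count records."""
    return ht + matching_records
```

With an index of height 3 and 50 matching records packed into 3 blocks, the primary index costs 6 block accesses while a secondary index may cost 53, which is why a secondary index on a non-key attribute can be more expensive than expected.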

Selections involving comparisons (relation is sorted on A):

A6 (primary index, comparison).

For σA≥v(r), use the index to find the first tuple >= v and scan the relation sequentially from there.

For σA≤v(r), just scan the relation sequentially until the first tuple > v; do not use the index.

A7 (secondary index, comparison).

For σA≥v(r), use the index to find the first index entry >= v and scan the index sequentially from there to find pointers to records.

For σA≤v(r), just scan the leaf pages of the index, finding pointers to records, until the first entry > v.
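
The idea behind comparison selections on sorted data can be sketched in memory, assuming conditions of the form A >= v and A <= v. A sorted Python list stands in for the sorted relation, and a binary search via the standard bisect module stands in for the index lookup; both are simplifications made for this example.

```python
from bisect import bisect_left


def select_ge(sorted_values, v):
    """A >= v: locate the first value >= v (the 'index lookup'),
    then scan sequentially from that position."""
    return sorted_values[bisect_left(sorted_values, v):]


def select_le(sorted_values, v):
    """A <= v: scan from the start and stop at the first value > v;
    no index is needed."""
    result = []
    for x in sorted_values:
        if x > v:
            break
        result.append(x)
    return result
```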

Complex Selections: Conjunction

A8(conjunctive selection using one index).

Select a combination of θi and algorithms A1 through A7 that results in the least cost, then test the remaining conditions on each tuple after fetching it.


A9 (conjunctive selection using multiple-key index).

Use appropriate composite (multiple-key) index if available

A10 (conjunctive selection by intersection of identifiers).

Requires indices with record pointers.

Use corresponding index for each condition, and take intersection of all the obtained sets of record pointers.

Then fetch records from file
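
A minimal in-memory sketch of A10, assuming equality conditions only. Dictionaries mapping attribute values to sets of record identifiers stand in for indices with record pointers; the sample records and the names build_index and conjunctive_select are hypothetical.

```python
from collections import defaultdict


def build_index(records, attr):
    """Map each value of attr to the set of record ids holding that value."""
    index = defaultdict(set)
    for rid, rec in records.items():
        index[rec[attr]].add(rid)
    return index


def conjunctive_select(records, indices, conditions):
    """conditions: dict attr -> required value, combined with AND."""
    rid_sets = [indices[attr].get(value, set()) for attr, value in conditions.items()]
    # Intersect the pointer sets first, then fetch only the surviving records.
    rids = set.intersection(*rid_sets) if rid_sets else set(records)
    return [records[rid] for rid in sorted(rids)]


records = {
    0: {"branch": "Perryridge", "balance": 500},
    1: {"branch": "Perryridge", "balance": 900},
    2: {"branch": "Downtown", "balance": 500},
}
indices = {attr: build_index(records, attr) for attr in ("branch", "balance")}
```

Intersecting pointers before touching the file means only records that satisfy every condition are fetched.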

Complex Selections: Disjunction

A11 (disjunctive selection by union of identifiers).

Applicable only if all conditions have available indices; otherwise use a linear scan.

Use the corresponding index for each condition, and take the union of all the obtained sets of record pointers.

Then fetch the records from the file.
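
A11 can be sketched the same way, again assuming equality conditions and using dictionaries of record-id sets as stand-ins for indices. The sample data and the name disjunctive_select are hypothetical.

```python
def disjunctive_select(records, indices, conditions):
    """conditions: list of (attr, value) equality tests combined with OR."""
    # If any condition lacks an index, every record must be examined anyway.
    if any(attr not in indices for attr, _ in conditions):
        return [r for r in records.values()
                if any(r[a] == v for a, v in conditions)]
    rids = set()
    for attr, value in conditions:
        rids |= indices[attr].get(value, set())  # union of record pointers
    return [records[rid] for rid in sorted(rids)]


records = {
    0: {"branch": "Perryridge", "balance": 500},
    1: {"branch": "Perryridge", "balance": 900},
    2: {"branch": "Downtown", "balance": 500},
    3: {"branch": "Brighton", "balance": 700},
}
indices = {
    "branch": {"Perryridge": {0, 1}, "Downtown": {2}, "Brighton": {3}},
    "balance": {500: {0, 2}, 900: {1}, 700: {3}},
}
```

Unlike the conjunctive case, a single missing index forces the linear scan, because the unindexed condition alone could match any record.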


Negation: use a linear scan on the file.





For relations that fit in memory, techniques like quicksort can be used.  For relations that don’t fit in memory, external sort-merge is a good choice.

Let M denote memory size (in pages).

External Sort-Merge


Cost: the total number of disk accesses for external sorting is

br * (2 * ⌈log_(M-1)(br / M)⌉ + 1)
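
The cost formula can be checked with a short function. This sketch counts merge passes with integer arithmetic, which gives the same value as the ceiling of the logarithm while avoiding floating-point rounding; the function names are chosen here for illustration, and a relation that already fits in memory is treated as needing zero merge passes.

```python
import math


def merge_passes(runs, fan_in):
    """Number of passes needed to merge `runs` sorted runs, fan_in at a time.
    Equals ceil(log_fan_in(runs)) for runs >= 1."""
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / fan_in)
        passes += 1
    return passes


def external_sort_cost(b_r, M):
    """Disk accesses for external sort-merge on b_r blocks with M memory pages:
    b_r * (2 * passes + 1), not counting the final write of the result."""
    if M < 3:
        raise ValueError("need at least 3 pages: 2 input buffers and 1 output")
    initial_runs = math.ceil(b_r / M)          # each run is M blocks long
    passes = merge_passes(initial_runs, M - 1)  # M - 1 runs merged per pass
    return b_r * (2 * passes + 1)
```

For example, sorting 1000 blocks with M = 11 creates 91 initial runs, needs 2 merge passes, and costs 1000 * (2 * 2 + 1) = 5000 disk accesses.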



Join Operation:

( r is called the outer relation and s the inner relation of the join.)

Nested-loop join :

for each tuple tr in r do begin
    for each tuple ts in s do begin
        test pair (tr, ts) to see if they satisfy the join condition θ
        if they do, add tr • ts to the result
    end
end

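Requires no indices and can be used with any kind of join condition, but it is expensive because it examines every pair of tuples.


Worst case (memory holds only one block of each relation): Cost = nr * bs + br disk accesses, where nr is the number of tuples in r.

Best case (the smaller relation fits entirely in memory and is used as the inner relation): Cost = br + bs disk accesses.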
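
The nested-loop join can be sketched directly in Python, with relations as lists of tuples and the join condition passed in as a function. The names nested_loop_join and nested_loop_cost, and the sample figures in the note below, are illustrative assumptions.

```python
def nested_loop_join(r, s, theta):
    """Join r (outer) with s (inner) under an arbitrary condition theta."""
    result = []
    for tr in r:
        for ts in s:
            if theta(tr, ts):
                result.append(tr + ts)  # tuple concatenation
    return result


def nested_loop_cost(n_r, b_r, b_s, inner_fits_in_memory=False):
    """Disk accesses: n_r * b_s + b_r in the worst case, b_r + b_s if the
    inner relation can be kept in memory after being read once."""
    if inner_fits_in_memory:
        return b_r + b_s
    return n_r * b_s + b_r
```

With n_r = 5000 tuples, b_r = 100 blocks and b_s = 400 blocks, the worst case is 2,000,100 disk accesses, against 500 in the best case.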

Block nested-loop join :

for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr in Br do begin
            for each tuple ts in Bs do begin
                check if (tr, ts) satisfy the join condition
                if they do, add tr • ts to the result
            end
        end
    end
end


Worst case estimate:  Cost = br * bs + br  block accesses.

Best case: br + bs block accesses.
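
A runnable sketch of the block nested-loop join, where fixed-size list slices play the role of disk blocks. The helper blocks and the default block_size are assumptions made for this example.

```python
def blocks(relation, block_size):
    """Yield consecutive slices of the relation, simulating disk blocks."""
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]


def block_nested_loop_join(r, s, theta, block_size=2):
    """Each block of s is read once per block of r rather than once per tuple of r."""
    result = []
    for Br in blocks(r, block_size):
        for Bs in blocks(s, block_size):
            for tr in Br:
                for ts in Bs:
                    if theta(tr, ts):
                        result.append(tr + ts)
    return result
```

The result contains the same tuples as the tuple-at-a-time version; only the order in which pairs are examined, and therefore the number of block reads, changes.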

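Improvements to the nested-loop and block nested-loop algorithms: use M - 2 memory blocks as the blocking unit for the outer relation, leaving one block for the inner relation and one for the output.

Cost = ⌈br / (M - 2)⌉ * bs + br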
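
The effect of extra memory on the block nested-loop cost can be seen with a one-line function (the name block_nested_loop_cost is chosen here for illustration):

```python
import math


def block_nested_loop_cost(b_r, b_s, M=3):
    """Block accesses when the outer relation is read M - 2 blocks at a time.
    M = 3 gives the plain worst case b_r * b_s + b_r."""
    if M < 3:
        raise ValueError("need at least 3 memory blocks")
    return math.ceil(b_r / (M - 2)) * b_s + b_r
```

With b_r = 100 and b_s = 400, the cost falls from 40,100 block accesses at M = 3 to 4,100 at M = 12, and reaches the best case of 500 once the whole outer relation fits (M = 102).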


Indexed Nested-Loop Join

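Cost of the join:  br + nr * c, where c is the cost of one index lookup on s (traversing the index and fetching all matching s tuples for one tuple of r).

Note: if indices are available on the join attributes of both r and s, use the relation with fewer tuples as the outer relation.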
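
A minimal sketch of an indexed nested-loop equi-join. A dictionary built over the inner relation stands in for an existing index on s; in a real system the index would already be on disk rather than built at join time. The function names and the positional key parameters are assumptions for this example.

```python
from collections import defaultdict


def indexed_nested_loop_join(r, s, r_key, s_key):
    """Equi-join r and s on r[r_key] == s[s_key] using a lookup structure on s."""
    index = defaultdict(list)          # stand-in for an index on s
    for ts in s:
        index[ts[s_key]].append(ts)
    result = []
    for tr in r:                       # one index lookup per outer tuple
        for ts in index.get(tr[r_key], []):
            result.append(tr + ts)
    return result


def indexed_nested_loop_cost(b_r, n_r, c):
    """b_r block reads for r plus one lookup of cost c per tuple of r."""
    return b_r + n_r * c
```

Because the cost grows with n_r, the number of tuples in the outer relation, a smaller outer relation means fewer index lookups.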



Merge-join:

Sort both relations on their join attribute (if not already sorted on the join attributes).

Can be used only for equi-joins and natural joins

Cost = br + bs, plus the cost of sorting if the relations are unsorted.
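
The sort-then-merge procedure can be sketched in memory as follows. Relations are lists of tuples, the join attributes are given by position, and Python's built-in sorted stands in for whatever sort the system would use; the name merge_join and its parameters are assumptions for this example.

```python
def merge_join(r, s, r_key=0, s_key=0):
    """Equi-join by sorting both inputs and advancing two cursors in step."""
    r = sorted(r, key=lambda t: t[r_key])
    s = sorted(s, key=lambda t: t[s_key])
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        a, b = r[i][r_key], s[j][s_key]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the group of s tuples that share this join value.
            j_end = j
            while j_end < len(s) and s[j_end][s_key] == a:
                j_end += 1
            # Pair every r tuple with that value against the whole group.
            while i < len(r) and r[i][r_key] == a:
                for ts in s[j:j_end]:
                    result.append(r[i] + ts)
                i += 1
            j = j_end
    return result
```

Each input is traversed once after sorting, which is where the br + bs term in the cost comes from. The two cursors only make sense when matching tuples are adjacent in sort order, hence the restriction to equi-joins and natural joins.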

hybrid merge-join:


Applicable for equi-joins and natural joins.












