1.补充视频

1.1 Computing minimal cover

Steps to produce Functions:
1.ReduceRight: ensure all functional dependencies have one attribute on right hand side;
2.ReduceLeft: remove redundant attributes on left hand side;
3.RemoveRedundant: eliminate redundant functional dependencies

例题：
R = {ABCDEG}
F = {A—>BCD, B—>CDE, AC—>E}

After step 1:
A—>B, A—>C, A—>D, B—>C, B—>D, B—>E, AC—>E

After step 2:
consider AC—>E, if remove A, it will be C—>E, while in step 1, C cannot produce anything; if remove C, A—>E, based on step 1, A—>B, B—>E, so A—>E, it works.
A—>B, A—>C, A—>D, B—>C, B—>D, B—>E, A—>E

After step 3:
consider A—>B, if we delete it, based on the rest, A—>C, A—>D, A—>E, but C/D/E cannot produce B, so it cannot be removed.
consider A—>C, if we delete it, based on the rest, A—>B, B—>C, so it can be removed.
A—>B, A—>D, B—>C, B—>D, B—>E, A—>E
consider A—>D, if we delete it, based on the rest, A—>B, B—>D, so it can be removed.
A—>B, B—>C, B—>D, B—>E, A—>E
consider B—>C, it cannot be removed.
consider B—>D, it cannot be removed.
consider B—>E, it cannot be removed.
consider A—>E, if we delete it, based on the rest, A—>B, B—>E, so it can be removed.
A—>B, B—>C, B—>D, B—>E

This is the minimal cover, and the primary key should be AB

1.2 BCNF normalisation

Inputs: a schema R, a set Functional Dependencies(FDs)
Output: a set Res of BCNF schemas

BCNF requirement:
for all FDs X —> Y
either
X —> Y is trivial
or
X is a (super)key

method:
Res = {R}
while (any schema S in Res is not in BCNF) {
choose an fd X —> Y on S that violates BCNF
Res = (Res - S) U (S - Y) U XY
}
Res - S含义是出去S关系的剩余schema，S - Y 同理

例题：
R = ABCDEFGH
FDs = ABH—>C, A—>DE, BGH—>F, F—>ADH, BH—>GE

解法：
（1）找key
关系中F—>ADH是影响最广的，而推出F的是BGH，在BGH三者间，BH—>GE，所以BH更重要，而BH二者间没有推出关系，所以尝试从BH作为key推导。
BH—>GE(BEGH), BGH—>F(BEFGH), F—>ADH(ABDEFGH), ABH—>C(ABCDEFGH)，推导出完整的R，所以key是BH。
（2）根据FDs寻找违背BCNF的关系
A—>DE最明显，因为A不是key，BCNF不允许A和其他element建立关系。根据上述方法
Res = {ABCDEFGH}

Fix A—>DE
XY = ADE, S - Y = ABCFGH

Res = {ADE, ABCFGH}
ADE commits with BCNF
（3）根据FDs寻找违背BCNF的关系
R1 = ABCFGH
FDs = ABH—>C, BGH—>F, F—>AH, BH—>G（因为DE被放置在上一步的符合关系中，所有右侧涉及DE的都删除，剩余关系成立）
再次选择key，尝试继续从BH推导：
BH—>G(BGH), BGH—>F(BFGH), F—>AH(ABFGH), ABH—>C(ABCFGH)，成立
Key = BH

这次发现F—>AH违背BCNF关系
Fix F—>AH
XY = FAH, S - Y = BCFG

Res = {ADE, FAH, BCFG}
ADE, FAH commit with BCNF
对于剩下的BCFG而言，剩下的所有关系都不成立，ABH—>C中H不在，BGH—>F中H不在，F—>AH中AH都不在，BH—>G中H不在
所以BCFG commits with BCNF

最后结果是{ADE, FAH, BCFG}

1.3 3NF normalisation

Inputs: a schema R, a set F of Functional Dependencies(FDs)
Output: a set Res of 3NF schemas

3NF requirement:
for all FDs X—>Y
either
X—>Y is trivial
or
X is a (super)key
or
Y is a single attribute and Y is in the key

methods:
F’ = condensed minimal cover of F
Res = {}
for each fd X—>Y in F’ {
if (no schema S in Res contains XY)
Res = Res U XY
}
if (no schema S in Res contains key(R))
Res = Res U key(R)

例题：
R = ABCDEFGH
F = ABH—>C, A—>D, C—>E, BGH—>F, F—>AD, E—>F, BH—>E
(1)minimal cover:
ABH—>C, A—>D, C—>E, BGH—>F, F—>A, F—>D, E—>F, BH—>E
BH—>C, A—>D, C—>E, BH—>F, F—>A, F—>D, E—>F, BH—>E
BH—>C, BH—>E, F—>A, F—>D, E—>F
(2)key:
key = BHG
(3)F’:
F’ = BH—>CE, E—>F, F—>AD

BHCE(key = BH), EF(key = E), FAD(key = F), BHG(key = BHG)

2. Week9 Continue

继续上周的normal forms，重点是BCNF和3NF。简述如下：

2.1 BCNF

对于所有的FDs X-->Y
（1）要么X是superkey
（2）要么Y是X的子集
做BCNF decomposition的时候，从全集R开始，对于任意一个不符合BCNF的FD关系，拆解出来，更新FDs，直到全部符合BCNF。

2.2 3NF

对于所有的FDs X-->Y
（1）要么X是superkey
（2）要么Y是X的子集
（3）或者Y是candidate key的一个attribute
做3NF decomposition的时候，从空集开始。先把所有的FDs做minimal cover，过程是先拆解右侧FDs，再拆解左侧FDs，最后检查有没有任何FD是冗余的。将所有minimal cover组合成table，如果其中未包括candidate key，要加入candidate key形成一个table。

3. DBMS Architecture

DBMS应用中常见的technique有两类：query processing（QP）用来提升查询效率，transaction processsing（TxP）用来保障数据库安全。

在数据库效率考量中：
join比子查询效率高，如果subquery中包括外层查询的信息，那么效率更低；
要尽可能先筛选filter后输出返回；
减少在where和group by语句中的函数使用；
如果已有的数据很少做更新，那么可以建立indexes提升查询效率，代价是每次update都要同时update真实关系和index。

例如：

select name from Employee
where  salary > 50000 and
       empid in (select empid from WorksIn
                 where  dept = 'Sales')

使用了subquery效率低，可以更新为：

SalesEmps = (select empid from WorksIn where dept='Sales')
foreach e in Employee {
    if (e.empid in SalesEmps && e.salary > 50000)
        add e to result set
}

或者写成：

select name
from   Employee join WorksIn using (empid)
where  Employee.salary > 5000 and
       WorksIn.dept = 'Sales'

还可以更新为：

SalesEmps = (select * from WorksIn where dept='Sales')
foreach e in (Employee join SalesEmps) {
    if (e.salary > 50000)
        add e to result set
}

DBMS的构成如下：
--File manager: manages allocation of disk space and data structures
--Buffer manager: manages data transfer between disk and main memory
--Query optimiser: translates queries into efficient sequence of relational operations
--Recovery manager: ensures consistent database state after system failures
--Concurrency manager: controls concurrent access to database
--Integrity manager: verifies integrity constraints and user privileges

DBMS Architecture

4. Relational Algebra

Relational Algebra (RA)通常包括：
--operands: relations, or variables representing relations
--operators that map relations to relations
--rules for combining operands/operators into expressions
--rules for evaluating such expressions

4.1 Selection

可以理解为筛选tuples，减少rows，课上使用的表达法是Sel[expr](Rel)，标准表达法是σexpr(Rel)。
例如：SelC，从关系r(R)中选择一个条件为C的子集tuples。

4.2 Projection

可以理解为筛选attributes，减少columns，课上使用的表达法是Proj[A, B, C](Rel)，标准表达法是ΠA,B,C(Rel)。
例如：Proj[X](r)，从关系r(R)中选择attributes X。

4.3 Product

课上使用的表达法是Rel1 Join[expr] Rel2。
例如：r ✖️ s，要注意r和s可能有attribute命名相同，为了避免名字的冲突，要对一些attribute重命名。

4.4 Join

（1）Natural Join: r Join s，选取r和s中attribute一致的所有tuples。
r Join s = Proj[R U S](Sel[match](r ✖️ s))
（2）Theta Join: r JoinC s，在条件C下join，如果条件C是True，则与笛卡尔积一致(r ✖️ s)

4.5 Union

r1 U r2，与集合关系一致，但是和sql不同，RA不允许重复element。

4.6 Intersection

r1 ∩ r2，与集合关系一致，但是和sql不同，RA不允许重复element。

4.7 Difference

r1 - r2，代表只属于r1，不属于r2的所有elements。

4.8 Division

r/s，显示的column是r - s的attributes，显示的tuples是所有在r中包含全部s的tuples value的value。

RA Division

4.9 Rename

课上使用的表达法是Rename[schema](Rel)。
例如：如果expression E returns a relation R(A1, A2...An)，那么Rename[S(B1, B2...Bn)](E)。tuples还是E，但是Ai改名为Bi

COMP9311 Database Systems WEEK10