1. SDLC: software development life cycle
- a. requirements gathering & analysis
- b. design
documents are signed by the parties involved
a change request after sign-off costs more money
- c. development
insert, update, delete
- d. testing
- e. implementation / deployment
- f. maintenance
in ddlc, design means data modeling
design = data modeling:
- A. conceptual: concept blueprint / rough sketch / paper drawings
eg: based on the requirements, how many tables do we need?
- B. logical design: using software (ER diagram)
- C. physical design:
based on the ER diagram, create the database in SQL Server / MySQL
there are three steps during design:
conceptual design: more like a sketch, or blueprint
logical design: using software
physical design: based on the ER diagram, create the db
3. DDLC: database development life cycle
eg: like building a house: learn the customer's requirements, then design
4. requirements gathering & analysis
business meeting:
4.1 JRD: joint requirement definition
between users, management, and head techs / project leaders
including: SME (subject matter expert): an industry expert with a little IT skill / business analyst: a bridge between technical and non-technical people (a business analyst is an SME, but an SME is not necessarily a BA)
find out what the customer's requirements are
focus on the user
what do we need to do?
if the customer does not speak up: questions; use cases; one-to-one interviews
if there are requirements, the team needs to analyze them
4.2 JAD: joint application design
- users are not needed; held between head techs, developers, and the tech team
- develop a plan to meet the requirements of the users, or the requirements from the JRD
- still need an SME here
- how are we going to do it?
review process
find out what the user really wants
translation of the users' requirements
4.3 alternative methods:
- Face to Face Interviews: can bring out extra information
- Anonymous Questionnaires: between the customer and the IT company
- Feedback Surveys or Equivalent: between the client and the IT company; give the client a survey
5. Types of Documentation
- MOM: minutes of meeting
notes taken during the meeting
5.1 BRD: business requirements document
- find out user requirements from JRD meetings
- agree and finalize what the users hope to have accomplished
5.2 FRD: functional requirements document
- decide approach from JAD meetings
- agree and finalize technical approach to user
5.3 alternative documents:
these are actually subsets of the BRD
the BRD and FRD are both main documents!
- stakeholder analysis: find all those involved
- business analysis plan: plan of analysis
- Current state analysis: how long will the project need?
- scope statement specification: AKA (also known as) the BRD
- data dictionary: explains your database (a document that defines the tables and columns in a db)
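A data dictionary can be produced from the database itself. This is only a minimal sketch, assuming SQLite through Python's `sqlite3` module (the notes target SQL Server / MySQL, where the catalog views are different); the `customer` table and the `data_dictionary` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, here only so there is something to document.
conn.execute(
    "CREATE TABLE customer ("
    " cid INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " phone TEXT)"
)

def data_dictionary(conn, table):
    """Return one dict per column: name, type, nullable, part of the PK."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # PRAGMA table_info columns: cid, name, type, notnull, dflt_value, pk
    return [
        {"column": r[1], "type": r[2],
         "nullable": not (r[3] or r[5]), "pk": bool(r[5])}
        for r in rows
    ]

dictionary = data_dictionary(conn, "customer")
for entry in dictionary:
    print(entry)
```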
5.4 why document?
- Keep track of changes
- Share information among all persons
- Finalize and find agreement on requirements: after the BRD and FRD
- Prevent changes or legal issues: keeps the customer from changing the requirements
how can we approach this requirement?
what process do we use to achieve the requirement?
6.1 Waterfall:
- Linear Design
- Straight Shot
- Strict in Timing
the customer wants you to design a database and needs it by a deadline
6.2 Agile:
- Flexible Design: no boss; everyone needs to be responsible; the team is flat
- Anticipating Changes: if one person leaves, the whole project may fall apart
- High Dependency on Communication
needs a lot of communication: talk with developers, managers
6.3 Scrum: (literally a scuffle or scramble; the rugby scrum)
Empirical Design: the product owner has a scrum master (a facilitator; the SM is responsible only for the product, not for reporting); the SM has a team
Scrum Master: roughly equivalent to an SME (subject matter expert)
Work Divided in Sprints of 1, 2, or 3 weeks: story = task; a story holds a bunch of work,
and a small batch of stories is worked on in each sprint
every sprint < 3 weeks, ending with a review meeting (let the customer see a demo)
different people are in charge of different parts and report to the manager / BA / users
the purpose of the retrospective meeting: to analyze the most recently completed sprint
6.3.2 what's the difference between the product backlog and the sprint backlog?
the product backlog contains everything we might ever work on, while the sprint backlog contains just the things we'll work on during one sprint.
6.3.4 should the team expect to know all the tasks necessary to complete the committed PBIs (product backlog items) during the sprint planning meeting?
no, only 60% of the tasks are likely to be identified during the sprint planning meeting. other tasks, such as unanticipated dependencies, will be discovered during sprint execution.
6.3.5 in scrum, is it acceptable to postpone testing until another sprint?
no, in scrum, teams attempt to build a potentially shippable product increment every sprint.
6.3.7 what does "done" mean?
properly tested, refactored, potentially shippable
6.4 Spiral:
- Hybrid Design
- Analyze Risks of Project and Use Different Methods as Needed
eg: used in scientific projects
POC: Proof of Concept
only once you have a POC can you continue to the next step.
you need to check every step to minimize the risk
eg: before moving a database from on-premises to the cloud, do a POC first so the customer knows how many days it will take and what problems may come up
7. variables to monitor
- requirements
- schedule / time of year
- budget
- size of project: small: waterfall; large: a different methodology
- size of team: 4 people: waterfall; 20 people: communication matters more: spiral
requirements: need more communication
schedule: different timing
it is important to choose the right methodology
budget simply means money: waterfall
8. business process
- requirements
- development
- QA (quality assurance)
- Sample batch
- public
9. data modeling
9.1 the purpose of data modeling?
9.1.1 before project
- Blueprint
Find and plan out the overall design of the project
- Assess the Requirements
Creating rough sketches and designs can provide new information on the compatibility of the user requirements and technical planning
9.1.2 during the project
- Uniform Design
Having a data model in place sets the blueprint for what everyone will follow
- Project Tracking
With the project designed in data modeling, assessments can be made on time to create and schedule the project accordingly
10. data modeling terminology
10.1 terminology
entity: structure holding data (table)
attribute: column label, name
tuples: rows in a table
domain: set of valid values for an attribute
eg: phone number varchar(10)
null: absence of any value
relationship: how entities relate
degree: how many entities are in a relationship (usually three or fewer)
cardinality: measure of participation
explains the relationship
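The terms above map onto a concrete table roughly as follows. This is a sketch only, assuming SQLite via Python's `sqlite3`; the `customer` table, its rows, and the 10-character phone rule are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity: the customer table. Attributes: its columns.
# Domain: phone holds at most 10 characters (hypothetical rule).
conn.execute("""
    CREATE TABLE customer (
        cid   INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        phone TEXT CHECK (length(phone) <= 10)
    )
""")
# Tuples: the rows.
conn.execute("INSERT INTO customer VALUES (1, 'Ann', '5551234567')")
conn.execute("INSERT INTO customer VALUES (2, 'Bob', NULL)")  # null: no value

try:
    conn.execute("INSERT INTO customer VALUES (3, 'Cy', '555123456789')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the value falls outside the domain

tuple_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(rejected, tuple_count)
```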
10.2 cardinality
- Cardinality is the number of times the entity participates in the relationship
- Good reason needed for specific cardinality
- Cardinality =/= Degree
10.2.2 Cardinality Options
- Maximum Cardinality
Can’t exceed a certain number
- Minimum Cardinality
Must be a specific minimum number
- Fixed Cardinality
Must always be a specific number
a customer can have a minimum of 0 orders (the customer may not buy anything) -> min
weak relationship
partial participation
eg: one student takes one to five courses?
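Observed cardinality can be measured from data with a left join, so that customers with zero orders still show up. A minimal sketch, assuming SQLite via Python's `sqlite3`; the `customer` and `orders` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (oid INTEGER PRIMARY KEY,
                         cid INTEGER REFERENCES customer(cid));
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1), (11, 1);
""")

# How many times does each customer participate in "customer places order"?
participation = dict(conn.execute("""
    SELECT c.name, COUNT(o.oid)
    FROM customer c LEFT JOIN orders o ON o.cid = c.cid
    GROUP BY c.cid, c.name
""").fetchall())

min_card = min(participation.values())  # 0 means partial participation
max_card = max(participation.values())
print(participation, min_card, max_card)
```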
10.3 degree
10.3.1 type of degree
- unary (degree = 1)
Only one entity involved
- binary
Two entities involved in relationship
- ternary
Three entities involved in relationship
11. keys used in data modeling
- primary key: unique identifier
must be unique; does not allow null values
follows 1NF
a composite key can be the primary key, but this is rare
- foreign key: points to PK
A foreign key in one table points to a primary key in another table. Foreign keys prevent actions that would leave rows with foreign key values when there are no primary keys with that value.
an FK can be null; a row whose FK value has no matching PK is called an orphan record
An orphaned record is a record whose foreign key value references a non-existent primary key value.
- candidate key: key that could be used in place of the PK
identified before choosing the primary key
- alternate key: any candidate key not being used as the PK
- composite key: key made of multiple columns
unique and not null
- unique key: unique identifier
allows one null value
in comparisons, a null is not equal to another null
- surrogate key: a system-generated stand-in key
used as the PK in a data warehouse
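The key rules above can be tried out directly. A rough sketch, assuming SQLite via Python's `sqlite3` (SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`); the `dept` and `emp` tables and the `try_sql` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE dept (did INTEGER PRIMARY KEY, dname TEXT UNIQUE);
    CREATE TABLE emp (
        eid INTEGER PRIMARY KEY,
        did INTEGER REFERENCES dept(did)   -- foreign key, may be NULL
    );
    INSERT INTO dept VALUES (1, 'Sales');
    INSERT INTO emp VALUES (100, 1);
    INSERT INTO emp VALUES (101, NULL);
""")

def try_sql(sql):
    """Run one statement and report whether a constraint rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

results = {
    "duplicate PK": try_sql("INSERT INTO dept VALUES (1, 'HR')"),
    "duplicate unique": try_sql("INSERT INTO dept VALUES (2, 'Sales')"),
    "FK without PK": try_sql("INSERT INTO emp VALUES (102, 99)"),
    "valid row": try_sql("INSERT INTO emp VALUES (103, 1)"),
}
print(results)
```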
11.2 about null in unique values
by default in SQL Server, the comparison null = null is not true, yet a unique constraint still allows only one null
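The comparison rule can be checked in any engine that follows standard three-valued logic. A tiny sketch, assuming SQLite via Python's `sqlite3`; note that SQLite's handling of NULLs inside a UNIQUE column differs from SQL Server's, so only the comparison is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NULL = NULL evaluates to NULL (unknown), not true; IS NULL is the right test.
equal, is_null = conn.execute("SELECT NULL = NULL, NULL IS NULL").fetchone()
print(equal, is_null)
```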
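11.3 join problem: only one table (self join)
before writing the query, remember to do a dry run
strict left join (keep only the left rows that have no match)
-- employees that have no matching row in manager
select e.*, m.*
from employe e left join manager m
on e.eid = m.mgid
where m.mgid is null;
-- self join: rows in employe that no other row points to through mgid
select *
from employe e1 left join employe e2
on e1.eid = e2.mgid
where e2.eid is null;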
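A dry run of the self join is easy to do on a few rows. This sketch assumes SQLite via Python's `sqlite3`, and assumes that `mgid` holds the `eid` of the employee's manager; the four sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employe (eid INTEGER PRIMARY KEY, name TEXT, mgid INTEGER);
    INSERT INTO employe VALUES
        (1, 'Ann', NULL),   -- top manager
        (2, 'Bob', 1),
        (3, 'Cy',  1),
        (4, 'Dee', 2);
""")

# Left join each employee to the people who report to them,
# then keep only employees with no reports (no match on the right).
non_managers = [r[0] for r in conn.execute("""
    SELECT e1.name
    FROM employe e1 LEFT JOIN employe e2
      ON e1.eid = e2.mgid
    WHERE e2.eid IS NULL
    ORDER BY e1.eid
""")]
print(non_managers)
```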
12. concept in data modeling
12.1 conceptual phase
- First phase in the data modeling steps
- Here we design a basic simplified overview of the project
- Restrictions (technical or user) not taken into account in beginning.
- Rough design to be built upon
- Good in understanding the concept as a whole
12.2 logical phase
- Second phase in data modeling
- Here we take into account the user requirements and begin to see what restrictions we have
- Incorporates using software like Visio or ERwin
- Apply logical understanding to overall sections of concept
- Can define constraints
12.3 physical phase
- Third phase of data modeling
- Take designs from logical phase and apply the technical restrictions
- Design the physical overall structure of a database or data warehouse
- Final blueprints
- ER-Diagrams with Crow’s Feet (physical); Chen's notation (logical)
12.4 difference
in normalization, DML (insert, delete, update) is GOOD (only one table needs to be inserted into / modified); DQL (data query language: select) is BAD (because joins are needed)
in denormalization, DML is BAD; DQL is GOOD (no joins needed)
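The trade-off can be seen by renaming one professor in both designs. A sketch under stated assumptions: SQLite via Python's `sqlite3`, with hypothetical `prof`, `course`, and `course_flat` tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: the professor name is stored once.
    CREATE TABLE prof (pid INTEGER PRIMARY KEY, pname TEXT);
    CREATE TABLE course (cid INTEGER PRIMARY KEY, cname TEXT,
                         pid INTEGER REFERENCES prof(pid));
    INSERT INTO prof VALUES (1, 'Smith');
    INSERT INTO course VALUES (10, 'DB', 1), (11, 'OS', 1);
    -- Denormalized: the name is repeated on every course row.
    CREATE TABLE course_flat (cid INTEGER PRIMARY KEY, cname TEXT, pname TEXT);
    INSERT INTO course_flat VALUES (10, 'DB', 'Smith'), (11, 'OS', 'Smith');
""")

# DML: the normalized design touches one row, the flat one touches two.
norm_rows = conn.execute(
    "UPDATE prof SET pname = 'Smyth' WHERE pid = 1").rowcount
flat_rows = conn.execute(
    "UPDATE course_flat SET pname = 'Smyth' WHERE pname = 'Smith'").rowcount

# DQL: the normalized design needs a join, the flat one does not.
joined = conn.execute("""
    SELECT c.cname, p.pname
    FROM course c JOIN prof p ON p.pid = c.pid
    ORDER BY c.cid
""").fetchall()
flat = conn.execute(
    "SELECT cname, pname FROM course_flat ORDER BY cid").fetchall()
print(norm_rows, flat_rows, joined == flat)
```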
13.0 data anomalies
Without normalization many problems can occur when trying to load an integrated conceptual model into the DBMS. These problems, which arise from relations generated directly from user views, are called anomalies.
not consistent (not uniform: something you want to do, but are not able to do)
and not reliable
13.0.1 insert Anomalies
- An insertion anomaly is the inability to add data to the database due to absence of other data.
For example, assume Student_Group is defined so that null values are not allowed. If a new employee is hired but not immediately assigned to a Student_Group then this employee could not be entered into the database. This results in database inconsistencies due to omission.
a row also cannot be inserted when its primary key has no value
eg: another case: a new professor who has no students yet cannot be added to the table
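The professor case can be reproduced with a table whose key requires a student. A minimal sketch, assuming SQLite via Python's `sqlite3`; the `enrollment` table and the name 'Jones' are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollment (
        sid      INTEGER NOT NULL,
        cid      INTEGER NOT NULL,
        profname TEXT,
        PRIMARY KEY (sid, cid)
    )
""")

# A new professor teaches course 10 but has no student yet,
# so there is no sid to put in the key.
try:
    conn.execute(
        "INSERT INTO enrollment (sid, cid, profname) VALUES (NULL, 10, 'Jones')")
    inserted = True
except sqlite3.IntegrityError:
    inserted = False  # insert anomaly: the fact cannot be recorded
print(inserted)
```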
13.0.2 delete Anomalies
- A deletion anomaly is the unintended loss of data due to deletion of other data.
For example, if the student group Beta Alpha Psi disbanded and was deleted from the table above, J. Longfellow and the Accounting department would cease to exist. This results in database inconsistencies and is an example of how combining information that does not really belong together into one table can cause problems.
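The referenced table is not reproduced in these notes, so the following sketch rebuilds a plausible version of it. Assumptions: SQLite via Python's `sqlite3`, a hypothetical `advisor` table, and an invented second row (its department and group are placeholders).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE advisor (employee TEXT, department TEXT, student_group TEXT);
    INSERT INTO advisor VALUES
        ('J. Longfellow', 'Accounting', 'Beta Alpha Psi'),
        ('A. Bruchs',     'Marketing',  'Marketing Club');
""")

# The group disbands, so its row is deleted...
conn.execute("DELETE FROM advisor WHERE student_group = 'Beta Alpha Psi'")

# ...and the employee and the department disappear along with it.
remaining_departments = [
    r[0] for r in conn.execute("SELECT DISTINCT department FROM advisor")
]
print(remaining_departments)
```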
13.0.3 update Anomalies
- An update anomaly is a data inconsistency that results from data redundancy and a partial update.
For example, each employee in a company has a department associated with them as well as the student group. If A. Bruchs’ department is an error it must be updated at least 2 times or there will be inconsistent data in the database. If the user performing the update does not realize the data is stored redundantly the update will not be done properly.
sid sname cid cname profname
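one student can take many courses; if we update one professor's name using student id 1 as the filter, other professors' names may change too
(there is no primary key; sid is not unique)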
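The flat table above shows the problem on three rows. A sketch only, assuming SQLite via Python's `sqlite3`; the `enrollment` table name and all sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enrollment (sid, sname, cid, cname, profname);
    INSERT INTO enrollment VALUES
        (1, 'Ann', 10, 'DB', 'Smith'),
        (1, 'Ann', 11, 'OS', 'Jones'),
        (2, 'Bob', 10, 'DB', 'Smith');
""")

# Intent: rename the DB professor. Filtering on sid = 1 goes wrong twice:
# the OS professor is renamed too, and Bob's DB row is left behind.
conn.execute("UPDATE enrollment SET profname = 'Smyth' WHERE sid = 1")

rows = conn.execute(
    "SELECT sid, cid, profname FROM enrollment ORDER BY sid, cid").fetchall()
db_professors = {
    r[0] for r in conn.execute(
        "SELECT profname FROM enrollment WHERE cid = 10")
}
print(rows, db_professors)
```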
13.1 what is normalization?
- Normalization is a concept in databases and data warehouses where we focus on meeting certain forms
- Often involves breaking tables down into smaller data sets
- Prevents Update, Insert, and Deletion Anomalies
- Isolate Data so that changes are not propagated throughout the database
13.2 main goal of normalization
- The main focus of normalization is to reduce redundancy and create a well structured series of tables without error or inconsistencies
- This increases speed and organizes the data so that, when queries are written, the data is arranged efficiently and is easy to fetch
13.3 When should we Normalize?
- Normalization is an expensive process
- Designing can be difficult
- Good for final designs, not testing
- More tables leads to more time
- Joins are costly, having more tables can cause slowing
13.4 What are Functional Dependencies?
- Functional Dependencies are how different attributes relate in a table.
- At this level, we focus on individual tables
- We see how individual attributes relate to the keys in the table
- Primary Key & Candidate Keys = Prime Attributes
- Attributes that aren’t keys = Non-Prime Attributes
Reference: https://opentextbc.ca/dbdesign01/chapter/chapter-11-functional-dependencies/
A functional dependency (FD) is a relationship between two attributes, typically between the PK and other non-key attributes within a table. For any relation R, attribute Y is functionally dependent on attribute X (usually the PK), if for every valid instance of X, that value of X uniquely determines the value of Y. This relationship is indicated by the representation below :
X -> Y
The left side of the above FD diagram is called the determinant, and the right side is the dependent. Here are a few examples.
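Whether a dependency X -> Y holds in a given set of rows can be checked mechanically. This is a small sketch; the `holds` helper and the sample rows are hypothetical, and a passing check only says the FD holds in this sample, not that it is a rule of the schema.

```python
def holds(rows, x, y):
    """Return True if every value of x maps to exactly one value of y."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in x)
        dependent = tuple(row[a] for a in y)
        # Remember the first dependent seen; any later mismatch breaks the FD.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True

rows = [
    {"sid": 1, "sname": "Ann", "cid": 10},
    {"sid": 1, "sname": "Ann", "cid": 11},
    {"sid": 2, "sname": "Bob", "cid": 10},
]

sid_determines_sname = holds(rows, ["sid"], ["sname"])
sid_determines_cid = holds(rows, ["sid"], ["cid"])
print(sid_determines_sname, sid_determines_cid)
```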
13.5 Types of Dependencies
- Full Dependencies
Depends on all Prime Attributes Fully
- Partial Dependencies
Depends on some Prime Attributes
a non-prime attribute is functionally dependent on part of a candidate key.
eg: when StudentID is only part of a composite key, StudentName can be determined by StudentID alone, which makes the dependency partial.
- Transitive Dependencies
Depends on an attribute that depends on a Prime Attribute
eg: author(aid, author, book, author_nationality)
author_nationality depends on author, and author depends on aid
13.6 How do Dependencies affect Normalization
- The types of dependencies each show how a column relates to the rest of the data in the table
- Good data should be identified by the Prime Attributes
- Dependencies decide Normal Forms
13.7 The Normalization Process
The normalization process is broken down into formal steps; these are the normal forms
In each form, we’ll be focusing on different aspects of the attributes and table design
13.7.1 1NF
- we find and remove redundant values, often by breaking a large table down into smaller groups
- Redundancy can be seen via dependencies
Each table cell should contain a single value.
Each record needs to be unique.
choose the primary key
eg: customers(id, name, age, address, orders)
the problem is that one customer id can have multiple orders, which causes data redundancy!
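One way to bring the customers example into 1NF is to move the repeating orders into rows of their own. A sketch in plain Python, assuming the orders arrive as a comma-separated string; the sample values are invented, and age and address are left out for brevity.

```python
# Unnormalized input: the "orders" cell holds several values.
customers = [
    {"id": 1, "name": "Ann", "orders": "A100,A101"},
    {"id": 2, "name": "Bob", "orders": "B200"},
]

# Customer table: one row per customer, every cell holds a single value.
customer_rows = [{"id": c["id"], "name": c["name"]} for c in customers]

# Order table: one row per order, pointing back to its customer.
order_rows = [
    {"order_no": order_no, "customer_id": c["id"]}
    for c in customers
    for order_no in c["orders"].split(",")
]
print(customer_rows)
print(order_rows)
```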
13.7.2 2NF
- Meets all of 1NF
- Make sure all non-prime attributes are fully dependent on the whole key (no partial dependencies)
13.7.3 3NF
- Meet 1NF and 2NF
- Every non-prime attribute is non-transitively dependent on the prime attributes
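Removing a transitive dependency usually means moving the dependent attribute into its own table. A sketch in plain Python using the author(aid, author, book, author_nationality) shape; the sample values are placeholders.

```python
rows = [
    {"aid": 1, "author": "Author A", "book": "Book 1",
     "author_nationality": "Country X"},
    {"aid": 2, "author": "Author A", "book": "Book 2",
     "author_nationality": "Country X"},
    {"aid": 3, "author": "Author B", "book": "Book 3",
     "author_nationality": "Country Y"},
]

# Table 1: everything that depends directly on the key aid.
books = [{"aid": r["aid"], "author": r["author"], "book": r["book"]}
         for r in rows]

# Table 2: author -> author_nationality, stored once per author.
nationalities = {r["author"]: r["author_nationality"] for r in rows}

# Joining the two tables gives back the original rows, so nothing was lost.
rebuilt = [dict(b, author_nationality=nationalities[b["author"]])
           for b in books]
print(len(nationalities), rebuilt == rows)
```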
14. attributes
- Key Attribute
primary attribute
- Multivalued Attribute: double ellipse
A multivalued attribute may have one or more values for a particular entity.
For example, Location as the attribute of an entity called ENTERPRISE is multivalued, because each enterprise can have one or more locations.
- Composite Attribute
Composite attributes are not atomic because they are assembled using some other atomic attributes.
can be divided into subparts.
A typical example of a composite attribute is a person’s address, which is composed of atomic attributes, such as City, Zip, and Street.
- Derived Attribute
dashed (imaginary) circle
- Weak Entity
A weak entity does not have a primary key of its own and depends on its parent entity.
- Weak/Partial Relationship
when the minimum cardinality is 0
eg: customer / order: a customer may not have bought anything
14.2 chen's notation
14.2.1 chen's notation VS crow's notation
crow's foot notation: for the technical team; they handle and develop the model
chen's notation: used in business requirements
15. database integrity
keeps the database consistent and reliable
15.1 Entity Integrity
- Design of the table or entity
- Strong PK with no nulls or repeats
15.2 User-Defined Integrity
- Rules or constraints applied by the user to maintain rules of design
builds on domain integrity and adds the customer's needs.
eg: setting the primary key to 1, 2, 3, 4 is domain integrity.
eg: setting the primary key to e1, e2, e3 is user-defined integrity.
15.3 Domain Integrity
- Correct and proper domains specified with proper use of columns
eg: email address: ensure the info is entered correctly
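Both kinds of rule can be written as CHECK constraints. A rough sketch, assuming SQLite via Python's `sqlite3`; the `employee` table, the "e followed by a digit" key format, and the very loose email pattern are illustrative assumptions, not a real validator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        -- user-defined rule: key looks like e1, e2, e3
        eid   TEXT PRIMARY KEY CHECK (eid GLOB 'e[0-9]*'),
        -- domain rule: value must at least look like an email address
        email TEXT NOT NULL CHECK (email LIKE '%_@_%._%')
    )
""")

def try_insert(eid, email):
    """Return True if the row satisfies every constraint."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (eid, email))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = [
    try_insert("e1", "ann@example.com"),  # satisfies both rules
    try_insert("7", "bob@example.com"),   # breaks the key format
    try_insert("e2", "not-an-email"),     # breaks the email domain
]
print(accepted)
```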
15.4 Referential Integrity
- Proper FK setup with proper PK reference
- Good design for connection and joins
The purpose of referential integrity is to prevent orphans and keep references in sync so that this hypothetical situation never occurs.
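When referential integrity was not enforced, existing orphans can be found with a left join. A minimal sketch, assuming SQLite via Python's `sqlite3` with foreign key enforcement switched off on purpose; the `dept` and `emp` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")  # simulate a database without enforcement
conn.executescript("""
    CREATE TABLE dept (did INTEGER PRIMARY KEY);
    CREATE TABLE emp (eid INTEGER PRIMARY KEY,
                      did INTEGER REFERENCES dept(did));
    INSERT INTO dept VALUES (1);
    INSERT INTO emp VALUES (100, 1), (101, 99), (102, NULL);
""")

# Orphan: the FK has a value, but no dept row carries that value.
# A NULL FK is not an orphan, it simply points nowhere.
orphans = [r[0] for r in conn.execute("""
    SELECT e.eid
    FROM emp e LEFT JOIN dept d ON d.did = e.did
    WHERE e.did IS NOT NULL AND d.did IS NULL
""")]
print(orphans)
```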
15.5 database design guideline
16. er-diagram
Entity Relationship Diagram
- Used to Create or Design a Blueprint of the Database or Data Warehouse
- Design Entities, Domains, and show Relationships
16.1 different notation
16.1.1 chen's notation
Entity: Boxed
Use a singular term for a person, place, or idea
Attribute: Oval or Circle
Relationship: Diamond
Use descriptive verb phrases
16.1.2 crow's feet notation
- Uses crow feet like connecting points between tables
- Each “foot” that connects one entity to another describes the technical relationship
- Used in Physical Phase
16.2 ER with software / ERwin
16.3 converting er to tables
16.3.1 rules
Each entity type becomes a table
Each single-valued attribute becomes a column
Derived attributes are ignored
Composite attributes are represented by components
Multi-valued attributes are represented by a separate table
eg: Subject(Teach_ID, Subjects)
The key attribute of the entity type becomes the primary key of the table
Many-to-many changes to one-to-many and many-to-one with the addition of a junction table
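The conversion rules above can be sketched as DDL. Assumptions: SQLite via Python's `sqlite3`; the `teacher`, `student`, `course`, and `enrollment` tables are hypothetical, and `subject` follows the Subject(Teach_ID, Subjects) example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Entity type -> table; key attribute -> primary key.
    CREATE TABLE teacher (teach_id INTEGER PRIMARY KEY, name TEXT);
    -- Multi-valued attribute "subjects" -> a separate table.
    CREATE TABLE subject (
        teach_id INTEGER REFERENCES teacher(teach_id),
        subject  TEXT,
        PRIMARY KEY (teach_id, subject)
    );
    -- Many-to-many student/course -> junction table with two FKs.
    CREATE TABLE student (sid INTEGER PRIMARY KEY);
    CREATE TABLE course (cid INTEGER PRIMARY KEY);
    CREATE TABLE enrollment (
        sid INTEGER REFERENCES student(sid),
        cid INTEGER REFERENCES course(cid),
        PRIMARY KEY (sid, cid)
    );
    INSERT INTO teacher VALUES (1, 'Ann');
    INSERT INTO subject VALUES (1, 'Math'), (1, 'Physics');
""")

subjects = [r[0] for r in conn.execute(
    "SELECT subject FROM subject WHERE teach_id = 1 ORDER BY subject")]
tables = sorted(r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(subjects, tables)
```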
16.3.2 Weak Entity and Strong Entity
Weak entity types are converted into a table of their own, with the primary key of the strong entity acting as a foreign key in the table
This foreign key along with the key of the weak entity form the composite primary key of this table
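in short: the weak entity set becomes a table of its own, borrowing the strong entity set's primary key and combining it with the weak entity's key to form a new primary key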
16.3.3 relationship
- Unary Relationships
one to one
one to many
- Binary Relationships
Binary Weak Relationship
Binary Strong Relationship
Binary Many to Many
16.3.4 relation schema
open question: what is a relation schema?
17. Reverse and Forward Engineering
17.1 Forward Engineering
- Process of building from the ground up
- Taking basic knowledge and building with it
- Commonly done with new systems being designed
- Take requirements and build from those
17.2 Reverse Engineering
- Process of breaking something down to understand how it works, and then rebuilding it back up with changes
- Commonly done to improve or redesign systems in place for a company
17.3 Use in Data Modeling
- In the data modeling process reverse or forward engineering can happen
- Reverse engineer a database from physical phase, to logical, and then conceptual. Make changes and rebuild from conceptual
- Forward engineering would build a database from requirements gathered and start from nothing
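The first step of reverse engineering, reading the physical structure back out of an existing database, can be sketched like this. Assumptions: SQLite via Python's `sqlite3` (tools such as ERwin do this against SQL Server / MySQL catalogs instead); the `dept` and `emp` tables stand in for an existing system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for an existing database we want to understand.
conn.executescript("""
    CREATE TABLE dept (did INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp (eid INTEGER PRIMARY KEY,
                      did INTEGER REFERENCES dept(did));
""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()]

model = {}
for table in tables:
    # PRAGMA table_info columns: cid, name, type, notnull, dflt_value, pk
    columns = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]
    # PRAGMA foreign_key_list columns: id, seq, table, from, to, ...
    references = [(r[3], r[2], r[4])
                  for r in conn.execute(f"PRAGMA foreign_key_list({table})")]
    model[table] = {"columns": columns, "references": references}

print(model)
```

The recovered `model` (entities, attributes, and relationships) is the starting point for moving back up to a logical and then a conceptual view.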