Data Modeling

Basic Concepts

  • Data Subjects / Entities: something that "exists" in the business domain, such as a student or a grade. Entities are not the same as database tables: tables are sometimes artificial or duplicative (e.g. aggregates).

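    • Strong Entity: has a primary key of its own.
    • Weak Entity: has no primary key of its own, only a partial key that acts as a discriminator among the entities of a weak entity set. It is fully identified by combining that partial key with the key of its owning strong entity.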
  • Data Attributes of the Data Subjects: field / database column, like ID, name, etc.
  • Relationship between Data Subjects: like instructor teaches a class, class is taught by an instructor.

    • Gerund: A relationship that also exhibits characteristics of an entity, and can have attributes attached to it.
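  • Business Rules applied to our data: cardinality (how many instances of one entity can relate to another, e.g. 1:1, 1:M, M:M; for a single column, the number of distinct values it holds), mandatory or optional relationships, permissible attribute values (like NULL), data change dynamics.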
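
A minimal sketch of the concepts above, using Python dataclasses. All class and field names (Instructor, Course, Section, Teaches, semester) are hypothetical and chosen only for illustration; they are not part of any particular modeling tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instructor:
    """Strong entity: identified by its own primary key."""
    instructor_id: int
    name: str


@dataclass(frozen=True)
class Course:
    """Strong entity: identified by its own primary key."""
    course_id: str
    title: str


@dataclass(frozen=True)
class Section:
    """Weak entity: section_no alone does not identify a section."""
    course_id: str   # key of the owning strong entity
    section_no: int  # partial key (discriminator within one course)

    @property
    def key(self):
        # Full identification = owner's key + partial key.
        return (self.course_id, self.section_no)


@dataclass(frozen=True)
class Teaches:
    """Gerund: a relationship that carries its own attribute."""
    instructor_id: int
    course_id: str
    section_no: int
    semester: str    # attribute attached to the relationship itself


# Two sections share the same discriminator but belong to different courses.
s1 = Section("CS101", 1)
s2 = Section("MA200", 1)
same_discriminator = s1.section_no == s2.section_no
distinct_entities = s1.key != s2.key
```

The partial key only becomes unique once it is paired with the owner's key, which is why the weak entity exposes a composite key.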

Modeling

  • Systems Modeling

    • Data Modeling

      • Classic ER (Entity–Relationship)
      • Post Classic ER
    • System Modeling

      • Semantic
      • UML: Unified Modeling Language

Database Design vs Data Modeling

Database Design

  • Targets a specific DBMS (Database Management System) model (e.g. relational)
  • Goes below schema to physical storage
  • Implementation/product specific restrictions from the very beginning

Data Modeling

  • Conceptual / Semantic level
  • Unconstrained by RDBMS (Relational Database Management System) or other implementation rules
  • Closer to real world

Data Modeling Life Cycle

Conceptual Modeling
Logical Modeling
Physical Modeling

Data Modeling Methodologies

Transactional:

  • Conceptual level: mirror real world
  • Logical level:

    • Relational: data normalization with deliberate denormalization
    • Non Relational: NoSQL, OODBMS(Object-Oriented Database Management System) constructs, etc.
  • Physical level: blocks/tracks, MPP(Massively Parallel Processing) distribution, etc.

Analytical (DW):

  • Conceptual level: dimensional
  • Logical level:

    • Relational: fact and dimension tables
    • Non Relational: cubes, columnar databases, etc.
  • Physical level: blocks/tracks, MPP(Massively Parallel Processing) distribution, AWS buckets, HDFS name nodes and data nodes, etc.
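
As a rough illustration of the relational fact-and-dimension layout mentioned above, here is a tiny star schema built with Python's standard sqlite3 module. The table names (dim_student, dim_course, fact_enrollment), the columns, and the sample rows are all assumptions made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context.
CREATE TABLE dim_student (student_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_course  (course_key  INTEGER PRIMARY KEY, course_name TEXT);

-- Fact table: one row per enrollment, with a numeric measure.
CREATE TABLE fact_enrollment (
    student_key INTEGER REFERENCES dim_student(student_key),
    course_key  INTEGER REFERENCES dim_course(course_key),
    credits     INTEGER
);

INSERT INTO dim_student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO dim_course  VALUES (10, 'Databases'), (20, 'Statistics');
INSERT INTO fact_enrollment VALUES (1, 10, 4), (1, 20, 3), (2, 10, 4);
""")


def credits_by_course(connection):
    """Aggregate the fact measure, sliced by one dimension."""
    rows = connection.execute("""
        SELECT c.course_name, SUM(f.credits)
        FROM fact_enrollment AS f
        JOIN dim_course AS c ON c.course_key = f.course_key
        GROUP BY c.course_name
        ORDER BY c.course_name
    """).fetchall()
    return dict(rows)


totals = credits_by_course(conn)
```

The query pattern (join the fact table to a dimension, then group by a dimension attribute) is the typical way such a schema is read.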

Classic ER Notation / Chen Notation

[Figure 1: Data Modeling]

Multi-Valued Attribute (MVA): an attribute that can hold several values for one entity, e.g. one person can have multiple email addresses.

Crow's Foot Notation

More closely aligned with logical modeling
[Figure 2: Data Modeling]

Normalization

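1NF: "the key". Every table has a key and every attribute is single-valued, so multi-valued attributes must be converted to some other data structure.
2NF: "the whole key". Every non-key attribute depends on the entire key, not on just part of a composite key.
3NF: "nothing but the key". Non-key attributes depend only on the key, not on other non-key attributes.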
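
The "whole key" rule can be made concrete by checking functional dependencies on sample rows. The helper below is a minimal sketch; the function name `holds` and the sample grade rows are hypothetical. A dependency that holds in sample data is only evidence, not proof, that it holds as a business rule.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # The first value seen for a key is recorded; any later mismatch breaks the FD.
        if seen.setdefault(key, value) != value:
            return False
    return True


# A Grade table keyed by (student_id, course_id) that also stores course_name.
grades = [
    {"student_id": 1, "course_id": "CS101", "course_name": "Databases", "grade": "A"},
    {"student_id": 2, "course_id": "CS101", "course_name": "Databases", "grade": "B"},
    {"student_id": 1, "course_id": "MA200", "course_name": "Statistics", "grade": "B"},
]

# course_name depends on only part of the key: a 2NF violation.
partial_dependency = holds(grades, ["course_id"], ["course_name"])

# grade needs the whole key: neither half determines it on its own.
grade_from_course_only = holds(grades, ["course_id"], ["grade"])
grade_from_student_only = holds(grades, ["student_id"], ["grade"])
grade_from_whole_key = holds(grades, ["student_id", "course_id"], ["grade"])
```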
Normal Form Violations:

  • 1NF violations (repeating groups): move the offending data to a separate table. For example, a student can have multiple email addresses, and an email address can be shared by multiple students. Instead of keeping email in the Student table, separate it out into an Email table so the other student information is not duplicated.
  • 2NF violations (partial key dependencies): in a Grade table keyed by student ID and course ID, course name depends only on course ID. Remove course name from the Grade table and store course ID and course name in a Course table.
  • Transform many-to-many relationships: decompose each M:M relationship into multiple "semantically equivalent" relationships, typically two 1:M relationships through an intermediate table.
  • Add foreign keys to link the resulting tables.
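
The fixes above can be sketched with Python's standard sqlite3 module. The schema is an assumption built from the student / email / course / grade example in these notes; the column names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- 1NF fix: emails live in their own table, one row per (student, address).
CREATE TABLE email (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    address    TEXT    NOT NULL,
    PRIMARY KEY (student_id, address)
);

-- 2NF fix: course_name is stored once, keyed by course_id alone.
CREATE TABLE course (course_id TEXT PRIMARY KEY, course_name TEXT NOT NULL);

-- M:M between student and course, decomposed into two 1:M relationships.
CREATE TABLE grade (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  TEXT    NOT NULL REFERENCES course(course_id),
    grade      TEXT,
    PRIMARY KEY (student_id, course_id)
);
""")

conn.execute("INSERT INTO student VALUES (1, 'Ana'), (2, 'Ben')")
conn.execute(
    "INSERT INTO email VALUES (1, 'ana@example.edu'), "
    "(1, 'shared@example.edu'), (2, 'shared@example.edu')"
)
conn.execute("INSERT INTO course VALUES ('CS101', 'Databases')")
conn.execute("INSERT INTO grade VALUES (1, 'CS101', 'A'), (2, 'CS101', 'B')")


def rejects_orphan_grade(connection):
    """Return True if a grade for a nonexistent course is refused."""
    try:
        connection.execute("INSERT INTO grade VALUES (1, 'NOPE', 'C')")
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = rejects_orphan_grade(conn)
```

With the foreign keys in place, the course name appears once no matter how many grades reference it, and a grade cannot point at a course that does not exist.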

Software

PowerPoint
Visio: general drawing tool
CA ERwin: specialized data modeling tool
