[Web Excerpt] Introduction to Data Modeling

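Data is the symbolic record that describes things.
A model is an abstraction of the real world.
A data model is an abstraction of the characteristics of data. It is the mathematical formal framework underlying database management: the formal structure a database system uses to represent and manipulate information. A data model comprises a structure part, an operation part, and a set of constraints on the data in the database.
A data model therefore describes three things: data structure, data operations, and data constraints.
1) Data structure: describes the types, content, and properties of the data and the relationships among data items. The data structure is the foundation of the data model; operations and constraints are essentially defined on top of it, and different data structures have different operations and constraints.
2) Data operations: describe the kinds of operations available on the corresponding data structure and how they are carried out.
3) Data constraints: describe the syntactic and semantic relationships among the data within the data structure, the ways data items restrict and depend on one another, and the rules governing how data may change over time, so that the data stays correct, valid, and consistent.
The data structure is a set of object types. Object types are the building blocks of a database and generally fall into two classes: data types and the relationships between data types. Examples of data types are the record types and data items of the DBTG (Data Base Task Group) network model and the relations and domains of the relational model. An example of a relationship is the set type of the DBTG network model. The operation part is a set of operators, including a number of operations and inference rules, used to operate on a database made up of valid instances of the object types. The constraint part is a set of integrity rules that restrict the database states that conform to the data model, and the changes between states. Constraints can be classified in different ways: constraints on data values versus constraints on relationships between data; static versus dynamic constraints; entity constraints versus referential constraints between entities; and so on.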
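The three parts of a data model (structure, operations, constraints) can be sketched with a small relational example. This is only an illustration: it assumes SQLite through Python's standard sqlite3 module, and the department and employee tables are hypothetical names that do not come from the source.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Data structure: two relations, their attributes, and a link between them.
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL NOT NULL CHECK (salary >= 0),              -- constraint on data values
    dept_id INTEGER NOT NULL REFERENCES department(dept_id) -- referential constraint
);
""")

# Data operations: insert and query instances of the structure.
conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (10, 'Ann', 5000.0, 1)")
rows = conn.execute(
    "SELECT e.name, d.name FROM employee e JOIN department d USING (dept_id)"
).fetchall()


def violates_constraints(sql: str) -> bool:
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True
```

Calling violates_constraints with an insert that has a negative salary, or one that references a department that does not exist, shows the constraint part rejecting a state change that the structure alone would allow.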
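As the database discipline developed, the concept of the data model deepened and matured. In the early days a data model was generally understood to be just a data structure. Later, some database systems took the data model to cover four aspects: the logical structure of the data, its physical layout, its access paths, and its integrity constraints. The modern view holds that the data structure is only one component of the data model. Physical layout and access paths are matters of data storage and are not part of the data model. In addition, because a database is dynamic rather than static, a data model must provide not only a means of representing data but also the types and methods of operations on it, so the data model also includes an operation part.
Data models fall into three types according to the level at which they are applied: the conceptual data model, the logical data model, and the physical data model.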

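Conceptual Model

The conceptual data model is a model of the real world oriented toward database users. It mainly describes the conceptual structure of the world. It frees database designers, in the initial design stage, from the technical details of the computer system and the database management system (DBMS) so that they can concentrate on analyzing the data and the relationships among the data; it is independent of any particular DBMS. A conceptual data model must be converted into a logical data model before it can be implemented in a DBMS.
The conceptual model is used to model the information world. On the one hand it should have strong semantic expressiveness and be able to express the semantic knowledge of the application conveniently and directly; on the other hand it should be simple, clear, and easy for users to understand.
The most commonly used conceptual data models are the E-R model, the extended E-R model, the object-oriented model, and the predicate model.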

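Logical Model

The logical data model is the model that users see from the database. It is the data model supported by a specific DBMS, such as the network data model or the hierarchical data model. It must face both the user and the system, and is mainly used in implementing the DBMS.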

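Physical Model

The physical data model is oriented toward the physical representation in the computer. It describes how data is organized on the storage medium, and it depends not only on the specific DBMS but also on the operating system and hardware. Every logical data model has a corresponding physical data model when it is implemented. To preserve independence and portability, the DBMS carries out most of the physical data model implementation automatically, and the designer only designs special structures such as indexes and clusters.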

Important Models

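The hierarchical model, the network model, and the relational model are three important data models, each named after its data structure. The first two use formatted structures. In these structures an entity is represented by a record type, a record type is abstracted as a vertex of a graph, and a relationship between record types is abstracted as an arc connecting vertices, so the whole data structure corresponds to a graph. A data model that corresponds to a tree is a hierarchical model; one that corresponds to a general network graph is a network model. The relational model is a non-formatted structure that uses a single construct, the two-dimensional table, to represent both entities and the relationships between them. A two-dimensional table that satisfies certain conditions is called a relation (see relational database).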
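Hierarchical model
  • Description: Organizes data into a structure of one-to-many relationships; each part of each level of the hierarchy is accessed by key.
  • Advantages: Access is convenient and fast; the structure is clear and easy to understand; data modification and database extension are easy to implement; retrieval by key attributes is very convenient.
  • Disadvantages: The structure is rigid and inflexible; the same attribute data must be stored several times, so redundancy is high (for example, shared edges); not suitable for organizing topological spatial data.

Network model
  • Description: Uses link instructions or pointers to establish explicit connections between data items; a way of organizing data with many-to-many relationships.
  • Advantages: Expresses complex relationships between data clearly and conveniently; low data redundancy.
  • Disadvantages: The complexity of the network structure makes it harder for users to query and locate data; pointers for the relationships must be stored, which increases data volume; modifying data is inconvenient (pointers must be updated).

Relational model
  • Description: Organizes data as sets of records or tables so that the relationships between geographic entities and attributes can be used for storage and transformation; it has neither levels nor pointers and is a very effective way of relating spatial data to attribute data.
  • Advantages: Very flexible structure built on a single concept; satisfies queries formed from all Boolean logic and mathematical operations; can search, combine, and compare different types of data; adding and deleting data is very convenient; offers higher data independence and better security and confidentiality.
  • Disadvantages: When the database is large, finding data that satisfies a particular relationship is time-consuming; cannot meet the needs of spatial relationships.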
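Of these, the relational model is the most widely used; the logical data models most commonly used are the hierarchical, network, and relational models. [1] Formatted models can describe complex data structures and implement relationships between data through access paths. Since the late 1960s, many DBMSs based on the hierarchical and network models have been built on a variety of computer systems and applied successfully to data processing. These systems nevertheless have the following shortcomings. First, they use many concepts unrelated to the data manipulation task, such as the set in DBTG systems, and users must follow access paths to reach the data in the database. Second, they do not support set processing; that is, they offer no way to process multiple records at once. Third, they did not fully recognize end users' need to converse directly with the database, and they lack the ability to add system facilities to accommodate unanticipated queries. These shortcomings reduce the independence of programs and data, lower the productivity of application programmers, and limit end users' use of the database.
The relational model conforms strictly to the modern definition of a data model. Its data structure is simple and clear. Access paths are completely hidden from users, which gives programs and data a high degree of independence. The data languages of the relational model are highly non-procedural, user-friendly, and capable of set processing, and they have the advantage of integrating definition, manipulation, and control. In the relational model the three parts (structure, operations, and integrity rules) are closely connected. Relational database systems provide a practical basis for raising programmer productivity and for letting end users use the database directly.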
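The contrast between following access paths record by record and processing a whole set at once can be sketched as follows. This is an illustration under assumptions: the nested dictionary only imitates the parent-to-child links of a formatted model, SQLite stands in for a relational DBMS, and all names and data are hypothetical.

```python
import sqlite3

# Navigational style (hierarchical/network systems): the program follows
# stored links record by record along a fixed access path.
departments = {
    "Sales": {"employees": [{"name": "Ann", "salary": 5000}, {"name": "Bob", "salary": 4200}]},
    "IT": {"employees": [{"name": "Cy", "salary": 6100}]},
}


def high_earners_navigational(threshold):
    found = []
    for dept in departments.values():      # walk the parent records
        for emp in dept["employees"]:      # follow the link to each child record
            if emp["salary"] > threshold:
                found.append(emp["name"])
    return sorted(found)


# Relational style: one declarative statement over a whole set of rows;
# the access path is left to the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(e["name"], d, e["salary"]) for d, rec in departments.items() for e in rec["employees"]],
)


def high_earners_relational(threshold):
    rows = conn.execute(
        "SELECT name FROM employee WHERE salary > ? ORDER BY name", (threshold,)
    ).fetchall()
    return [name for (name,) in rows]
```

Both functions return the same names, but only the first one has to change if the way departments link to employees is reorganized.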
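One topic in relational model research is extending the relational model and formalizing data models. In 1981 it was proposed that a data model can be viewed as an abstract programming language: the database is a variable, the database schema is the type of the variable, and a database state is the value taken by a variable of that type. The data definition language gives the syntax for declaring database variables, the data manipulation language is a concrete realization of the operation types in the data model, and a DBMS is a concrete implementation of a particular data definition language and data manipulation language. In this way the formal concepts and methods of programming languages can be used to study data models.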








Resource:

Data Modeling Resource Center: http://infogoal.com/dmc/dmcdmd.htm

Data Warehousing Tutorial:  http://www.infogoal.com/datawarehousing/

What is Data Modeling?

Data modeling is the process of creating and extending data models, which are visual representations of data and its organization. The ERD (Entity Relationship Diagram) is the most popular type of data model. Data models exist at multiple levels, including:


  • The Conceptual Data Model describes data from a high level. It defines the problem rather than the solution from the business point of view. It includes entities and their relationships. Typically the conceptual data model is developed first.

  • The Logical Data Model describes a logical solution to a data project. It provides more details than the conceptual data model and is nearly ready for the creation of a database. These details include attributes, the individual pieces of information that will be included. Typically the logical data model is developed second.

  • The Physical Data Model describes the implementation of data in a physical database. It is the blueprint for the database. Typically the physical data model is developed third.

================================================================================

Data Modeling Basics

from http://www.infogoal.com/datawarehousing/data_modeling_basics.htm

Data models, whether used for data warehousing or other purposes, tend to be designed using commonly understood symbols and terminology. This section provides that grounding.

There are three levels of data modeling:

  • Conceptual Data Model - a high level model that describes a problem.
  • Logical Data Model - a detailed data model that describes a solution using business terms.
  • Physical Data Model - a detailed data model that defines database objects. This model is needed to implement the models in a database and produce a working solution.

An entity is an object of interest to the enterprise. An entity can be a person, organization, place, thing, activity, event, abstraction, or idea.

Entities are represented as rectangles in the data model:

[Image 1]

An attribute is a characteristic of an entity.  It may contain data values, for example:

  • order number

  • account balance amount

  • marital status code

  • item description

  • skill level code

A relationship is an association between entities.  Relationships are expressed as verbs or verb phrases.  Data modeling focuses on binary relationships, for example:

customer places order

region contains district

employee has developed skill

sales representative is responsible for territory
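As a rough sketch of how entities, attributes, and a binary relationship fit together, the "customer places order" example can be written with Python dataclasses. The class names, attribute names, and the place_order helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Customer:                      # entity
    customer_number: int             # attribute
    name: str                        # attribute
    orders: List["Order"] = field(default_factory=list)


@dataclass
class Order:                         # entity
    order_number: int                # attribute
    customer: Customer               # the other end of the binary relationship


def place_order(customer: Customer, order_number: int) -> Order:
    """Relationship 'customer places order': it links exactly two entities."""
    order = Order(order_number, customer)
    customer.orders.append(order)
    return order


alice = Customer(1, "Alice")
first = place_order(alice, 1001)
```

The verb phrase becomes the name of the operation that creates the link, and each end of the relationship is reachable from the other.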

===============================================================================

How to read a data model

from http://www.stevehoberman.com/readadatamodel.htm

Cardinality represents the symbols on both ends of a relationship that define the number of instances of each entity that can participate in the relationship. It is through cardinality that the data rules are captured and enforced. Without cardinality, the most we can say about a relationship is that two entities are connected in some way through a rule. For example, person and company have some kind of relationship, but we don’t know much more than this.

The domain of values to choose from to represent cardinality on a relationship is limited to three values: zero, one, or many. Many (some people read it as more) means any number greater than one. We can’t specify an exact number (other than through documentation), as in “A car has four tires.” We can only say, “A car has many tires.”

Each side of a relationship can have a combination of zero, one, or many. Through the specification of one or many, the structural portion of the cardinality represents the quantity of each entity instance in the relationship. The referential integrity (RI) portion of the cardinality focuses on ensuring valid values through the specification of zero or one.

Each of the cardinality symbols is illustrated through the following example of product and order line. A product is something that an organization sells in the hope of making a profit. An order line exists for each product on a particular order. For example, an order for five widgets and two doodads would mean two order lines in one order, with each order line tying back to a particular product. Formalizing the rules between product and order line, we have

Each product can appear on one or many order lines.

Each order line must contain one and only one product.

Fig. 5.1 captures these business rules.

Fig. 5.1 Product and order line, take 1

You need to know only three symbols to read any relationship. The small vertical line means “one.” The circle means “zero.” The triangle with a line through the middle means “many.” Some people call the “many” symbol a crow’s foot. The label (“Appear on”) on the line in this example helps in reading the relationship and understanding the rule that the relationship represents.

Every relationship has a parent and child. The parent entity appears on the “one” side of the relationship, and the child appears on the “many” side of the relationship. When you read a relationship, it is a good practice to start on the “one” side, because the entity there is typically the more independent of the two entities and sometimes provides a context for the child entity. Starting with the parent also adds consistency when interpreting the rules.

We use the word each in reading each relationship, starting with the parent side. 

The relationship in fig. 5.1 is therefore read as follows:

Each product can appear on zero, one, or many order lines.

Each order line must belong to one and only one product.

This relationship most closely matches the original business rule that we talked about several paragraphs earlier. Let’s change the cardinality slightly between these two entities and read the resulting business rules.

Fig. 5.2 Product and order line, take 2

Fig. 5.2 is interpreted as follows:

Each product can appear on one or many order lines.

Each order line can belong to zero or one product.

Fig. 5.3 Product and order line, take 3

Fig. 5.3 is interpreted as follows:

Each product can appear on zero or one order line.

Each order line can belong to one and only one product.

Fig. 5.4 Product and order line, take 4

Fig. 5.4 is interpreted as follows:

Each product can appear on one or many order lines.

Each order line can belong to one or many products.
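The readings above can be turned into simple checks on sample data. The sketch below is an assumption-laden illustration: the MANY marker, the function names, and the sample products are hypothetical, and each order line is reduced to the single product it refers to.

```python
from collections import Counter

MANY = None  # hypothetical marker for an unbounded "many" upper limit


def check_child_count(parent_ids, child_parent_ids, minimum, maximum):
    """Return the parents whose number of children falls outside [minimum, maximum]."""
    counts = Counter(child_parent_ids)
    bad = []
    for pid in parent_ids:
        n = counts.get(pid, 0)
        if n < minimum or (maximum is not MANY and n > maximum):
            bad.append(pid)
    return bad


def orphans(parent_ids, child_parent_ids):
    """Return child references that do not point at an existing parent."""
    known = set(parent_ids)
    return [ref for ref in child_parent_ids if ref not in known]


products = ["widget", "doodad", "gizmo"]
order_lines = ["widget", "doodad", "widget"]  # the product named on each order line

# "zero, one, or many order lines" per product: nothing is violated.
zero_to_many = check_child_count(products, order_lines, 0, MANY)
# "one or many order lines" per product: the unsold gizmo breaks the rule.
one_to_many = check_child_count(products, order_lines, 1, MANY)
# "zero or one order line" per product: widget appears twice.
zero_or_one = check_child_count(products, order_lines, 0, 1)
```

The parent-side minimum and maximum correspond to the symbols next to the child entity, while orphans checks the "one and only one product" rule on the order line side.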

================================================================================

Data Modeling for Data Warehouse

from http://www.infogoal.com/datawarehousing/data_models_for_data_warehousing_and_business_intelligence.htm

A data model is a graphical view of data created for analysis and design purposes. Data modeling includes designing data warehouse databases in detail; it follows principles and patterns established in Architecture for Data Warehousing and Business Intelligence.

If you need to understand this subject from the beginning, see the article Data Modeling Basics to learn key terms and concepts.

Data warehouse modeling includes:

  • Top Down / Requirements Driven Approach
  • Fact Tables and Dimension Tables
  • Multidimensional Model/Star Schema
  • Support Roll Up, Drill Down, and Pivot Analysis
  • Time Phased / Temporal Data
  • Operational Logical and Physical Data Models
  • Normalization and Denormalization
  • Model Granularity : Level of Detail

Areas that require specialized patterns are:

  • Data Mart / Frontroom - uses dimension modeling - the ROLAP star schema or the MOLAP cube
  • Data warehouse / Backroom - uses normalized ERD
  • Staging / Landing Area - looks like source system
 
 
Data Modeling Tools

Data modeling involves visualizing data through use of graphical tools, so you will want to obtain a data modeling software package or use graphical capabilities in existing software. See the Data Management Center Data Modeling Directory for a list of data modeling tools and other resources.

We have used open source software to develop examples for this article so that readers will be able to learn on their own without licensing fees. Data models have been developed using DBDesigner4 from FabForce.Net. The target database is MySQL from Sun Software.

Data Modeling for Business Intelligence

Data should be organized to best meet the needs of its users.  Business intelligence commonly performs analytic operations on data such as:

  • Query by multiple criteria
  • "Slice and dice"
  • Drill Down
  • Roll Up

The "Dimensional Data Model", otherwise known as the "Star Schema", was developed by Ralph Kimball in the 1980s to support these business needs.  This approach has stood the test of time and is the recommended way to organize data for business query and analysis.

The two major table types of the Star Schema are the Fact and the Dimension.  The Fact contains quantitative measurements while the Dimension contains classification information.  Each Fact is surrounded by the Dimensions that provide context to it, giving the appearance of a star.

[Image 2]

The Order Fact with dimensions is a classic example.  In this case the Order Fact measures order quantity and currency amount.  Dimensions of Calendar Date, Product, Customer, Geo Location and Sales Organization put the Order Fact into context.

This star schema supports looking at orders like a cube, enabling slicing and dicing by customer, time and product.

[Image 3]
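A reduced version of the Order Fact star can be sketched in code. The example below makes several assumptions: it uses SQLite through Python's sqlite3 module rather than the MySQL target mentioned earlier, it keeps only three of the five dimensions, and every table name, column name, and data row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, product_name  TEXT);
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE calendar_date_dim (
    date_key INTEGER PRIMARY KEY, calendar_date TEXT, calendar_year INTEGER
);
-- The fact row holds the measurements plus one integer key per dimension.
CREATE TABLE order_fact (
    date_key       INTEGER REFERENCES calendar_date_dim(date_key),
    product_key    INTEGER REFERENCES product_dim(product_key),
    customer_key   INTEGER REFERENCES customer_dim(customer_key),
    order_quantity INTEGER,
    order_amount   REAL
);
INSERT INTO product_dim  VALUES (1, 'Widget'), (2, 'Doodad');
INSERT INTO customer_dim VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO calendar_date_dim VALUES (1, '2024-01-15', 2024), (2, '2025-03-02', 2025);
INSERT INTO order_fact VALUES
    (1, 1, 1, 5, 50.0),
    (1, 2, 1, 2, 30.0),
    (2, 1, 2, 4, 40.0);
""")


def quantity_by(dimension_table, key_column, label_column):
    """Total order quantity grouped by one attribute of one dimension."""
    # Table and column names come from this script, never from user input.
    sql = f"""
        SELECT d.{label_column}, SUM(f.order_quantity)
        FROM order_fact f
        JOIN {dimension_table} d ON f.{key_column} = d.{key_column}
        GROUP BY d.{label_column}
        ORDER BY d.{label_column}
    """
    return conn.execute(sql).fetchall()
```

For instance, quantity_by("product_dim", "product_key", "product_name") slices the same facts by product, and swapping in the customer or calendar dimension changes the viewing angle without touching the fact table.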

Surrogate Keys Improve Data Mart Efficiency and Performance

Surrogate keys, typically stored as integers, improve efficiency and increase performance. Joins between facts and dimensions are faster with integers.  Indexes on integers are compact and provide rapid access.

 
Facts - the Data Mart Measuring Stick

Facts contain quantitative measurements.  They focus on answering the questions: how much and how many.

The grain determines the level of detail of the data mart fact.  A fact can be fine grained and represent a single event or transaction, or it can be coarse grained and aggregate measurements over a period of time.
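The difference between a fine grain and a coarse grain can be illustrated by rolling transaction-level rows up to one row per month and product. The data and the roll_up_to_month helper are hypothetical.

```python
from collections import defaultdict

# Fine grain: one fact row per order transaction (date, product, quantity).
transactions = [
    ("2024-01-03", "Widget", 5),
    ("2024-01-17", "Widget", 3),
    ("2024-02-09", "Widget", 4),
    ("2024-02-11", "Doodad", 2),
]


def roll_up_to_month(rows):
    """Aggregate transaction-grain facts to a coarser (month, product) grain."""
    totals = defaultdict(int)
    for date, product, quantity in rows:
        month = date[:7]  # the 'YYYY-MM' prefix of an ISO date string
        totals[(month, product)] += quantity
    return dict(totals)


monthly = roll_up_to_month(transactions)
```

The roll-up can always be computed from the fine-grained rows, but the individual transactions cannot be recovered from the monthly totals, which is why the grain has to be chosen deliberately.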

 
Dimensions Put Data Mart Facts in Context

Dimensions enable business intelligence users to analyze data using simple queries.  They focus on questions of: who, when, where and what.  Typical dimensions include:

  • Time period / calendar
  • Product
  • Customer
  • Household
  • Market Segment
  • Geographic Area

The primary key of a dimension should be a surrogate key and is typically an integer.  This primary key is then related to facts to put the facts into context.  Much of the data in a dimension is descriptive and stored in character format. It often contains both code and expanded values such as territory_code and territory_name to simplify and speed up query.

Time Dimensions are an important part of almost every dimensional model.  We recommend that you establish Time Dimension tables rather than hard-coded date logic.
Use the finest grain, probably daily, and account for days, weeks, quarters, seasons, holidays, etc.
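One way to populate a daily-grain Time Dimension is to generate a row per calendar day. The sketch below is an assumption: the column names, the sequential surrogate key, and the holiday set are hypothetical, and seasons are left out because their definition depends on the business.

```python
from datetime import date, timedelta


def build_time_dimension(start, end, holidays=frozenset()):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    key = 1  # sequential integer surrogate key
    while day <= end:
        rows.append({
            "date_key": key,
            "calendar_date": day.isoformat(),
            "day_of_week": day.isoweekday(),      # 1 = Monday ... 7 = Sunday
            "iso_week": day.isocalendar()[1],
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
            "is_holiday": day in holidays,
        })
        day += timedelta(days=1)
        key += 1
    return rows


dim = build_time_dimension(
    date(2024, 1, 1), date(2024, 12, 31), holidays={date(2024, 1, 1)}
)
```

Queries can then group or filter on columns such as quarter or is_holiday instead of repeating date arithmetic in every report.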

Avoid Data Mart Snowflakes

Dimensions are directly related to facts to enable simple and rapid query.  The snowflake is an extension to a dimension intended to reduce storage and duplication.  It has the undesirable side effect of complicating and slowing queries. 

  
Bridge Tables Implement Data Mart Hierarchies

The bridge table supports a many to many relationship between facts and dimensions.  For example, a bridge table could show the percentage commission split between multiple sales reps (dimension) and a sale (fact).
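The commission split example can be sketched with plain Python structures standing in for the three tables. All keys, names, and percentages here are hypothetical.

```python
sales_fact = [
    {"sale_key": 1, "sale_amount": 1000.0},
    {"sale_key": 2, "sale_amount": 500.0},
]
sales_rep_dim = {10: "Ann", 11: "Bob"}

# Bridge rows: one per (sale, sales rep) pair, carrying that rep's share.
sale_rep_bridge = [
    {"sale_key": 1, "rep_key": 10, "split_pct": 60.0},
    {"sale_key": 1, "rep_key": 11, "split_pct": 40.0},
    {"sale_key": 2, "rep_key": 11, "split_pct": 100.0},
]


def credited_amount_by_rep():
    """Allocate each sale amount to reps according to the bridge percentages."""
    amounts = {fact["sale_key"]: fact["sale_amount"] for fact in sales_fact}
    credited = {name: 0.0 for name in sales_rep_dim.values()}
    for row in sale_rep_bridge:
        name = sales_rep_dim[row["rep_key"]]
        credited[name] += amounts[row["sale_key"]] * row["split_pct"] / 100.0
    return credited
```

Because the percentages for each sale add up to 100, the credited amounts add up to the total sales, so the many-to-many link does not double count the fact.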

  
Data Modeling Slowly Changing Dimensions

At times, dimension data must change and that must be handled in the data mart.  Ralph Kimball has identified the following slowly changing dimension (SCD) types that are widely recognized in data mart design:

(1) SCD Type 1 ---- Data is overwritten and prior data is not retained.

(2) SCD Type 2 ---- A new row is added with the changed data, and the prior row is retained.

(3) SCD Type 3 ---- Attributes are updated within the dimension row.  For example, we could maintain both current customer status code and prior customer status code.
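The three SCD types can be compared side by side on a single customer row. This is a minimal sketch under assumptions: the start_date and end_date columns used to mark the current Type 2 row, the prior_status column, and all function names are hypothetical choices, not something the source prescribes.

```python
from copy import deepcopy


def scd_type1(rows, customer_id, new_status):
    """Type 1: overwrite in place; the prior value is lost."""
    rows = deepcopy(rows)
    for row in rows:
        if row["customer_id"] == customer_id:
            row["status"] = new_status
    return rows


def scd_type2(rows, customer_id, new_status, change_date):
    """Type 2: close the current row and add a new row with a new surrogate key."""
    rows = deepcopy(rows)
    next_key = max(row["customer_key"] for row in rows) + 1
    for row in rows:
        if row["customer_id"] == customer_id and row["end_date"] is None:
            row["end_date"] = change_date
    rows.append({"customer_key": next_key, "customer_id": customer_id,
                 "status": new_status, "start_date": change_date, "end_date": None})
    return rows


def scd_type3(rows, customer_id, new_status):
    """Type 3: keep the current and prior values side by side in the same row."""
    rows = deepcopy(rows)
    for row in rows:
        if row["customer_id"] == customer_id:
            row["prior_status"] = row["status"]
            row["status"] = new_status
    return rows


customer_dim = [{"customer_key": 1, "customer_id": "C-42", "status": "SILVER",
                 "start_date": "2023-01-01", "end_date": None}]
```

Type 1 keeps one row and no history, Type 2 grows the dimension by one row per change, and Type 3 keeps one row but remembers only the immediately preceding value.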

=================================================================================
