The dimensional model is the best way to meet our primary design goals:
· To present the needed information to users as simply as possible
· To return query results to the users as quickly as possible
· To provide relevant information that accurately tracks the underlying business processes
The primary key of a fact table is a composite key made up of the foreign keys to its dimension tables.
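A minimal sketch of a composite fact-table key, using Python's built-in `sqlite3` module as a stand-in for the relational engine. The table and column names (`DimDate`, `DimProduct`, `FactSales`) are hypothetical, and the key assumes a grain of one row per product per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: two dimensions and one fact table.
conn.executescript("""
CREATE TABLE DimDate    (DateKey    INTEGER PRIMARY KEY, FullDate TEXT);
CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, ProductName TEXT);
CREATE TABLE FactSales (
    DateKey     INTEGER NOT NULL REFERENCES DimDate(DateKey),
    ProductKey  INTEGER NOT NULL REFERENCES DimProduct(ProductKey),
    SalesAmount REAL    NOT NULL,
    -- The primary key is the combination of the dimension foreign keys.
    PRIMARY KEY (DateKey, ProductKey)
);
""")
conn.execute("INSERT INTO DimDate VALUES (20050301, '2005-03-01')")
conn.execute("INSERT INTO DimProduct VALUES (1, 'Road Bike')")
conn.execute("INSERT INTO FactSales VALUES (20050301, 1, 1200.0)")

try:
    # A second row with the same key combination violates the composite key.
    conn.execute("INSERT INTO FactSales VALUES (20050301, 1, 50.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```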
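Non-additive facts: behave the same way in every query context, because they can never be summed; e.g., unit price.
Semi-additive facts: whether they can be summed depends on the query context; e.g., a month-end balance can be summed across accounts but not across months. These are well suited to handling in SSAS.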
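A small pure-Python sketch of the difference, using hypothetical balances and order lines. It shows one common way to treat each kind of fact: sum a semi-additive balance across accounts but take the last period's value across time, and recompute a non-additive unit price from additive parts instead of summing it.

```python
# Hypothetical month-end balances keyed by (month, account).
balances = {
    ("2005-01", "A"): 100, ("2005-01", "B"): 50,
    ("2005-02", "A"): 120, ("2005-02", "B"): 40,
}

def total_for_month(month):
    # Semi-additive: summing across accounts within one month is meaningful.
    return sum(v for (m, _), v in balances.items() if m == month)

def period_end_balance(account):
    # Across time, take the last period's value instead of summing.
    months = sorted(m for (m, a) in balances if a == account)
    return balances[(months[-1], account)]

# Non-additive: unit price is recomputed from additive parts, never summed.
order_lines = [(2, 20.0), (2, 28.0)]  # (quantity, extended amount)
unit_price = sum(amt for _, amt in order_lines) / sum(q for q, _ in order_lines)

print(total_for_month("2005-02"))  # 160
print(period_end_balance("A"))     # 120
print(unit_price)                  # 12.0
```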
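It is strongly recommended to record facts in the fact table at the most detailed level possible, and all facts in a table should be at the same grain.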
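Dimensions are the foundation of the dimensional model; they describe the business objects.
Dimension attributes are the data that describe business objects; they generally change infrequently.
A dimension has one or more hierarchies, that is, one-to-many relationships among its attributes.
Dimension tables do not follow third normal form: they store redundant data, which makes them easier for users to understand and improves query performance.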
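A minimal sketch of a denormalized dimension, using Python's `sqlite3`. The Subcategory and Category names are repeated on every product row, so rolling up the hierarchy needs only a single join from the fact table. Table names and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Denormalized (non-3NF) product dimension: Subcategory and Category are
# stored redundantly on every product row instead of in separate tables.
conn.executescript("""
CREATE TABLE DimProduct (
    ProductKey  INTEGER PRIMARY KEY,
    ProductName TEXT, Subcategory TEXT, Category TEXT
);
CREATE TABLE FactSales (ProductKey INTEGER, SalesAmount REAL);
INSERT INTO DimProduct VALUES
  (1, 'Road-150',     'Road Bikes',     'Bikes'),
  (2, 'Mountain-100', 'Mountain Bikes', 'Bikes'),
  (3, 'Sport Helmet', 'Helmets',        'Accessories');
INSERT INTO FactSales VALUES (1, 300), (2, 200), (3, 50), (1, 100);
""")

# Rolling up the Category level of the hierarchy takes one join.
rows = conn.execute("""
    SELECT p.Category, SUM(f.SalesAmount)
    FROM FactSales f JOIN DimProduct p ON p.ProductKey = f.ProductKey
    GROUP BY p.Category ORDER BY p.Category
""").fetchall()
print(rows)  # [('Accessories', 50.0), ('Bikes', 600.0)]
```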
Adventure Works Data Warehouse Bus Matrix (the columns after Business Priority are the conformed dimensions)
| Business Process | Business Priority | Date (Order, Start, Ship) | Product | Promotion | End Customer | Employee | Reseller | Page | Internet Registered User | Part | Vendor | Shipper | Problem | Account | Department | Currency (Source, Dest.) | Benefits Plan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Advertising | | | | | | | | | | | | | | | | | |
| TV | | x | x | x | | | | | | | | | | | | | |
| | | x | x | x | | | | | | | | | | | | | |
| Online | | x | x | x | x | | | | | | | | | | | | |
| Promotions | | x | x | x | x | | x | | | | | | | | | | |
| Co-op programs | | x | x | x | | x | x | | | | | | | | | | |
| Web Site Marketing | | x | x | x | x | | | x | x | | | | | | | | |
| PR | | x | x | x | | | | | | | | | | | | | |
| Orders Forecasting | 2 | x | x | x | | x | x | | | | | | | | | | |
| Reseller Orders | 1 | x | x | x | | x | x | | | | | | | | | | |
| Internet Orders | 1 | x | x | x | x | | | x | x | | | | | | | | |
| Purchasing | | x | x | | x | x | | | | x | x | x | | | | | |
| Parts Inventory | | x | x | x | | | | | | x | x | | | | | | |
| Manufacturing | 6 | x | x | | | | | | | x | | | | | | | |
| Finished Goods Inv. | | x | x | x | | | | | | | | | | | | | |
| Shipping | | x | x | x | x | x | x | | | | | x | | | | | |
| Returns | 5 | x | x | | x | x | x | | | | | x | | | | | |
| Registration cards | | x | x | | x | | | | | | | | | | | | |
| Customer Calls | 4 | x | x | x | x | x | x | | | x | | | x | | | | |
| Web Support | | x | x | | x | x | x | x | x | | | | x | | | | |
| Financial Forecasting | | x | x | x | x | x | x | | | | x | | | x | x | | |
| Exchange Rate Mgmt. | 3 | x | | | | | | | | | | | | | | x | |
| GL-Revenue & Expense | | x | | | | | | | | | | | | x | x | | |
| Cost Accounting | | x | x | | | | | | | | | | | x | x | | |
| Payroll | | x | | | | x | | | | | | | | | x | | |
| Benefits Enrollment | | x | | | | x | | | | | | | | | | | x |
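A bus matrix can also be read programmatically, for example to find which conformed dimensions two business processes share and can therefore be used to drill across them. This sketch copies three rows of the Adventure Works matrix into a Python dict; the shortened label "Date" for the "Date (Order, Start, Ship)" column is an assumption made for brevity.

```python
# Three rows copied from the Adventure Works bus matrix.
bus_matrix = {
    "Reseller Orders": {"Date", "Product", "Promotion", "Employee", "Reseller"},
    "Internet Orders": {"Date", "Product", "Promotion", "End Customer",
                        "Page", "Internet Registered User"},
    "Shipping": {"Date", "Product", "Promotion", "End Customer",
                 "Employee", "Reseller", "Shipper"},
}

def shared_dimensions(*processes):
    """Conformed dimensions common to every given business process."""
    return set.intersection(*(bus_matrix[p] for p in processes))

def usage_count(dimension):
    """Number of business processes (rows) that use a dimension."""
    return sum(dimension in dims for dims in bus_matrix.values())

print(sorted(shared_dimensions("Reseller Orders", "Internet Orders")))
# ['Date', 'Product', 'Promotion']
print(usage_count("Employee"))  # 2
```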
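Advantages of surrogate keys:
· They protect the DW system from changes in the source systems
· They make it easier to integrate data from multiple source systems
· They let you add data to the DW that does not exist in the source systems, e.g., pointing missing values at a special row in the dimension table
· They allow you to track dimension attributes that change over time
· Numeric surrogate keys are efficient in both the relational database and Analysis Services (AS)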
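A minimal sketch of a surrogate key map covering three of the advantages above: keys independent of the source, integration of several source systems, and a reserved row for missing values. The class name, the `-1` "Unknown" key, and the source codes are all hypothetical choices for illustration.

```python
import itertools

class SurrogateKeyMap:
    """Hypothetical key map: (source system, natural key) -> integer surrogate key."""

    UNKNOWN_KEY = -1  # assumed reserved "Unknown" row in the dimension table

    def __init__(self):
        self._keys = {}
        self._counter = itertools.count(1)

    def get_or_assign(self, source, natural_key):
        # Missing source values point at the reserved Unknown row.
        if natural_key is None:
            return self.UNKNOWN_KEY
        k = (source, natural_key)
        if k not in self._keys:
            self._keys[k] = next(self._counter)
        return self._keys[k]

    def link(self, source, natural_key, surrogate_key):
        # Integration: map another system's code for the same entity to an existing row.
        self._keys[(source, natural_key)] = surrogate_key

keys = SurrogateKeyMap()
crm_key = keys.get_or_assign("CRM", "C001")
keys.link("ERP", "88-17", crm_key)          # same customer, different ERP code
print(keys.get_or_assign("ERP", "88-17"))   # 1
print(keys.get_or_assign("ERP", "C001"))    # 2 (a different ERP entity)
print(keys.get_or_assign("ERP", None))      # -1
```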
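If a dimension attribute can change value and you need to track those changes, build a slowly changing dimension (SCD).
There are three types:
· Type 1: overwrite the old value; no history is kept
· Type 2: keep the full history
· Type 3: keep only the previous value
Use case: if you need historical sales by region and the regions have been redrawn, only a dimension that preserves history gives a correct analysis.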
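A minimal in-memory sketch contrasting Type 1 and Type 2 handling for the region example. The row layout (`row_start`, `row_end`, with `None` marking the current row) is one common convention, assumed here rather than taken from these notes.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

@dataclass
class CustomerRow:
    surrogate_key: int
    customer_id: str         # natural key from the source system
    region: str
    row_start: date
    row_end: Optional[date]  # None marks the current row

def scd_type1(rows: List[CustomerRow], customer_id: str, new_region: str) -> None:
    # Type 1: overwrite in place, so history is lost.
    for r in rows:
        if r.customer_id == customer_id:
            r.region = new_region

def scd_type2(rows: List[CustomerRow], customer_id: str, new_region: str, changed: date) -> None:
    # Type 2: expire the current row and add a new row with a new surrogate key.
    current = next(r for r in rows if r.customer_id == customer_id and r.row_end is None)
    if current.region == new_region:
        return
    current.row_end = changed
    new_key = max(r.surrogate_key for r in rows) + 1
    rows.append(CustomerRow(new_key, customer_id, new_region, changed, None))

history = [CustomerRow(1, "C001", "West", date(2004, 1, 1), None)]
scd_type2(history, "C001", "North", date(2005, 7, 1))
print([(r.surrogate_key, r.region, r.row_end) for r in history])
# [(1, 'West', datetime.date(2005, 7, 1)), (2, 'North', None)]
```

Sales facts loaded before 2005-07-01 still carry surrogate key 1, so they stay attributed to West; under Type 1 they would silently move to North.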
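The date dimension should use a meaningful date value (typically an integer of the form YYYYMMDD) as its surrogate key. The date dimension is also the classic role-playing dimension, since a single date table serves several roles such as order date and ship date.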
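A minimal sketch of generating date dimension rows with a meaningful YYYYMMDD key. The attribute set is a hypothetical minimum; a real date dimension usually carries many more columns (fiscal periods, holidays, and so on).

```python
from datetime import date, timedelta

def date_key(d: date) -> int:
    """Meaningful integer key in YYYYMMDD form, e.g. 2005-03-15 -> 20050315."""
    return d.year * 10000 + d.month * 100 + d.day

def date_rows(start: date, days: int):
    # One row per calendar day, with a few typical attributes.
    for offset in range(days):
        d = start + timedelta(days=offset)
        yield {"DateKey": date_key(d), "FullDate": d.isoformat(),
               "Year": d.year, "Month": d.month, "DayName": d.strftime("%A")}

rows = list(date_rows(date(2005, 3, 14), 2))
print([r["DateKey"] for r in rows])  # [20050314, 20050315]
```

Because such a key sorts in date order and is readable on its own, fact rows can be filtered or partitioned by key range without joining to the date table.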
A dimension with no attributes of its own, such as a transaction ID, is kept in the fact table as a degenerate dimension.
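Using a snowflake schema for part of the model is reasonable, especially when the dimensional model is very complex, but make sure business users never work directly with such a complex model.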
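Two kinds of complex relationships occur in the real world:
· Many-to-many relationships between fact tables and dimension tables
· Many-to-many relationships between dimension tables
For example, several salespeople may take part in a single sales order; this calls for a bridge table.
This introduces a risk, however: facts can be counted more than once, e.g., when summing sales amounts.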
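A small sketch of the double-counting risk with a bridge table. One common remedy, assumed here rather than taken from these notes, is to store a weighting factor on each bridge row so that the factors for one group sum to 1. All data is hypothetical.

```python
fact_sales = [  # (order id, sales-group key, sales amount)
    ("SO1", 10, 1000.0),
    ("SO2", 11, 500.0),
]
bridge = [  # (sales-group key, salesperson, weighting factor); factors sum to 1 per group
    (10, "Alice", 0.6), (10, "Bob", 0.4),
    (11, "Bob", 1.0),
]

def sales_by_salesperson(use_weights):
    totals = {}
    for _, group, amount in fact_sales:
        for g, person, factor in bridge:
            if g == group:
                credited = amount * factor if use_weights else amount
                totals[person] = totals.get(person, 0.0) + credited
    return totals

naive = sales_by_salesperson(use_weights=False)
weighted = sales_by_salesperson(use_weights=True)
print(sum(naive.values()))     # 2500.0 (SO1 is counted once per salesperson)
print(sum(weighted.values()))  # 1500.0 (matches the fact table total)
```

The unweighted figures are still useful for "sales each person took part in"; they just must not be added up into a grand total.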
Note
Analysis Services 2005 has new functionality to support many-to-many dimensions. Analysis Services expects exactly the same kind of structure that we described in this section. They call the bridge table an intermediate fact table, which is exactly what it is.
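Variable-depth hierarchies are implemented as a parent-child dimension, that is, a parent-child data structure.
To preserve the history of a parent-child dimension, convert the dimension table into a fact table.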
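A minimal sketch of a parent-child dimension in Python's `sqlite3`: each row points at its parent's key, and a recursive CTE walks a hierarchy of arbitrary depth. The employee table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Parent-child dimension: each row points at its parent's key (NULL for the root).
conn.executescript("""
CREATE TABLE DimEmployee (EmployeeKey INTEGER PRIMARY KEY, Name TEXT,
                          ParentEmployeeKey INTEGER);
INSERT INTO DimEmployee VALUES
  (1, 'CEO', NULL), (2, 'VP Sales', 1), (3, 'Rep A', 2), (4, 'Rep B', 2), (5, 'CFO', 1);
""")

# Walk the variable-depth hierarchy below one employee with a recursive CTE.
rows = conn.execute("""
WITH RECURSIVE Subordinates(EmployeeKey, Name, Depth) AS (
    SELECT EmployeeKey, Name, 0 FROM DimEmployee WHERE EmployeeKey = ?
    UNION ALL
    SELECT e.EmployeeKey, e.Name, s.Depth + 1
    FROM DimEmployee e JOIN Subordinates s ON e.ParentEmployeeKey = s.EmployeeKey
)
SELECT Name, Depth FROM Subordinates ORDER BY Depth, Name
""", (2,)).fetchall()
print(rows)  # [('VP Sales', 0), ('Rep A', 1), ('Rep B', 1)]
```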
Each subcategory includes a mix of color, size, and weight attributes, for example. Therefore, most of the columns in the Product table do not make sense at the Subcategory level. This results in a much shorter dimension, hence the common term shrunken dimension.
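A minimal sketch of deriving a shrunken Subcategory dimension from the product dimension with `SELECT DISTINCT`, using Python's `sqlite3`. Table and column names are hypothetical, and a production version would also give the shrunken table its own surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, ProductName TEXT,
                         Color TEXT, Size TEXT, Subcategory TEXT, Category TEXT);
INSERT INTO DimProduct VALUES
  (1, 'Road-150 Red 44',   'Red',   '44', 'Road Bikes', 'Bikes'),
  (2, 'Road-150 Black 48', 'Black', '48', 'Road Bikes', 'Bikes'),
  (3, 'Sport Helmet Red',  'Red',   NULL, 'Helmets',    'Accessories');
-- Shrunken dimension: keep only the attributes that make sense at Subcategory grain.
CREATE TABLE DimSubcategory AS
  SELECT DISTINCT Subcategory, Category FROM DimProduct;
""")
subcategories = conn.execute(
    "SELECT * FROM DimSubcategory ORDER BY Subcategory").fetchall()
print(subcategories)  # [('Helmets', 'Accessories'), ('Road Bikes', 'Bikes')]
```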
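After the initial design is done, you often find a few attributes that do not belong to any dimension but are still important; they are really code tables.
There are four ways to handle them:
1. Do nothing and leave them in the fact table.
2. Create a separate dimension for each attribute so that none of them are forgotten.
3. Put all of these attributes together in a single dimension of their own (a junk dimension).
4. …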
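A minimal sketch of option 3: build one junk dimension holding every combination of the leftover code values, each with its own surrogate key, plus a lookup the ETL could use to find the key for an incoming row. The flag names and values are hypothetical.

```python
from itertools import product

# Hypothetical low-cardinality codes left over after the initial design.
flags = {
    "PaymentType": ["Cash", "Credit"],
    "OrderChannel": ["Phone", "Web"],
    "IsGift": ["Y", "N"],
}

# One junk dimension row per combination, keyed by a surrogate key.
names = list(flags)
junk_rows = [dict(zip(names, combo), JunkKey=i)
             for i, combo in enumerate(product(*flags.values()), start=1)]

# ETL lookup: code values from a source row -> junk dimension key.
lookup = {tuple(r[n] for n in names): r["JunkKey"] for r in junk_rows}

print(len(junk_rows))                  # 8
print(lookup[("Credit", "Web", "Y")])  # 7
```

Pre-building the full Cartesian product only works while the combinations stay few; with many codes, adding combinations as they first appear in the data is the usual alternative.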
There are three fundamental types of fact tables in the DW/BI system: transaction, periodic snapshot, and accumulating snapshot.
Note
Transaction fact tables are clearly what Analysis Services was designed for. Your Analysis Services database can accommodate periodic and accumulating snapshots, but you do need to be careful. The problem is not the model, but the process for updating the data. Snapshot fact tables, particularly accumulating snapshots, tend to be updated a lot in the ETL process. This is expensive but not intolerable in the relational database. It's far more expensive for Analysis Services, which doesn't really support fact table updates at all. For snapshot facts to work in Analysis Services for even moderate-sized data sets, you'll need the Enterprise Edition feature that allows dimensional database partitioning.
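A small sketch contrasting the snapshot types with transaction data. The periodic snapshot is derived from hypothetical inventory movements; the accumulating snapshot keeps one row per order and revisits it as each milestone occurs. The milestone column names are assumptions for illustration.

```python
from datetime import date

# Transaction grain: one row per inventory movement (hypothetical data).
transactions = [
    (date(2005, 1, 5), "P1", 100),
    (date(2005, 1, 20), "P1", -30),
    (date(2005, 2, 3), "P1", -20),
]

def monthly_snapshot(txns):
    """Periodic snapshot: (month, product) -> on-hand quantity at month end.
    Simplification: months with no movements get no row here; a real load
    would carry the prior balance forward."""
    on_hand, snapshot = {}, {}
    for d, product, qty in sorted(txns):
        on_hand[product] = on_hand.get(product, 0) + qty
        snapshot[(d.strftime("%Y-%m"), product)] = on_hand[product]
    return snapshot

# Accumulating snapshot: one row per order, revisited as each milestone occurs.
order_pipeline = {"SO1": {"OrderDate": date(2005, 1, 5), "ShipDate": None, "DeliveryDate": None}}

def record_milestone(order_id, milestone, when):
    order_pipeline[order_id][milestone] = when  # updates an existing row, no insert

print(monthly_snapshot(transactions))  # {('2005-01', 'P1'): 70, ('2005-02', 'P1'): 50}
record_milestone("SO1", "ShipDate", date(2005, 1, 7))
```

The in-place update in `record_milestone` is the kind of operation the note flags as expensive for Analysis Services.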
Figure 2.9: The dimensional modeling process flow diagram
Table 2.5: Major Participants in Creating the Dimensional Model
| PARTICIPANT | PURPOSE/ROLE IN MODELING PROCESS |
|---|---|
| Data modeler | Primary responsibility |
| Business analyst | Analysis and source expert, business definitions |
| Data steward | Drive agreement on enterprise names, definitions and rules |
| Business power user | Describe and refine data sources and business rules from a user perspective |
| Source system developer | Source expert, business rules |
| DBA | Design guidance, early learning |
| ETL designer | Early learning |
| ETL developer | Early learning |
| Steering Committee | Naming, business definitions, model validation |
Decide on the technical architecture: staging DB -> relational DB -> OLAP.
Do the initial model design in an Excel workbook, which makes it easy to modify. Once the model is stable, convert it into a standard data modeling tool.
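If the modeling workbook keeps one row per column, exporting it to CSV makes it easy to generate draft DDL before the model moves into a modeling tool. This is a minimal sketch under an assumed worksheet layout (`Table`, `Column`, `DataType`, `Key`); it is not the format of any particular template.

```python
import csv
import io
from collections import defaultdict

# Hypothetical worksheet layout (one row per column), exported to CSV.
MODEL_CSV = """Table,Column,DataType,Key
DimProduct,ProductKey,INTEGER,PK
DimProduct,ProductName,NVARCHAR(50),
FactSales,ProductKey,INTEGER,FK
FactSales,SalesAmount,MONEY,
"""

def generate_ddl(csv_text):
    tables = defaultdict(list)  # table name -> column definitions, in sheet order
    for row in csv.DictReader(io.StringIO(csv_text)):
        col = f"    {row['Column']} {row['DataType']}"
        if row["Key"] == "PK":
            col += " NOT NULL PRIMARY KEY"
        elif row["Key"] == "FK":
            col += " NOT NULL"
        tables[row["Table"]].append(col)
    return "\n\n".join(
        f"CREATE TABLE {name} (\n" + ",\n".join(cols) + "\n);"
        for name, cols in tables.items()
    )

print(generate_ddl(MODEL_CSV))
```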
Reference: http://www.lifecycle-toolkit.com/tools/nmngcnv/modelstd/namest.htm
Look at the data itself to assess its quality and discover relationships within it:
· The domain and distribution of the data in each column in a table
· Relationships between columns, such as hierarchies and derivations
· Relationships between tables, such as hidden foreign key relationships or columns with similar content
· Common patterns in the data, such as telephone numbers, zip codes, or money
· Data quality problems, such as outlier values, exceptions to a common pattern, or exceptions to a relationship
This data profiling work is usually done with a tool.
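Dedicated profiling tools do much more, but a minimal Python sketch shows the kind of per-column statistics involved: null count, distinct count, min/max, and value patterns (digits mapped to `9`, letters to `A`) that expose exceptions to a common pattern. The zip-code sample is hypothetical.

```python
import re
from collections import Counter

def profile_column(values):
    """Minimal column profile: nulls, distinct values, min/max, and value patterns."""
    non_null = [v for v in values if v not in (None, "")]
    # Map digits to 9 and letters to A to expose the shape of each value.
    patterns = Counter(re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v)) for v in non_null)
    return {
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "min": min(non_null, default=None),
        "max": max(non_null, default=None),
        "patterns": patterns.most_common(),
    }

zip_codes = ["98052", "98052", "10001", "1000", None, "9805-2"]
p = profile_column(zip_codes)
print(p["nulls"], p["distinct"])  # 1 4
print(p["patterns"])              # [('99999', 3), ('9999', 1), ('9999-9', 1)]
```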
Talk with source system developers, report developers, and business users to learn more.
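The four steps:
1) Choose the business process, i.e., the subject area to model.
2) Declare the grain.
3) Choose the dimensions.
4) Choose the facts (the measures in the fact table).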
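Declaring the grain means stating which combination of columns identifies one fact row. A minimal sketch of checking loaded data against that declaration, with hypothetical order-line rows and column names:

```python
from collections import Counter

def grain_violations(fact_rows, grain_columns):
    """Return key combinations that appear more than once at the declared grain."""
    counts = Counter(tuple(r[c] for c in grain_columns) for r in fact_rows)
    return [k for k, n in counts.items() if n > 1]

# Hypothetical order lines; the declared grain is one row per order line.
rows = [
    {"SalesOrderNumber": "SO1", "LineNumber": 1, "ProductKey": 7, "Qty": 2},
    {"SalesOrderNumber": "SO1", "LineNumber": 2, "ProductKey": 9, "Qty": 1},
    {"SalesOrderNumber": "SO2", "LineNumber": 1, "ProductKey": 7, "Qty": 5},
]
print(grain_violations(rows, ["SalesOrderNumber", "LineNumber"]))  # []
print(grain_violations(rows, ["ProductKey"]))                      # [(7,)]
```

An empty result confirms the data matches the declared grain; the second call shows that "one row per product" would be the wrong declaration for this table.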
Dimensions: Date, Product (BOM), Sales Territory, Employee, Customer and Reseller, Promotion, Currency.
A role-playing dimension means the same table is used multiple times, through views, synonyms, or physical copies of the table.
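A minimal sketch of the view-based approach, using Python's `sqlite3`: one physical date table exposed as an order-date role and a ship-date role. Table, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, FullDate TEXT);
CREATE TABLE FactInternetSales (OrderDateKey INTEGER, ShipDateKey INTEGER, SalesAmount REAL);
INSERT INTO DimDate VALUES (20050301, '2005-03-01'), (20050308, '2005-03-08');
INSERT INTO FactInternetSales VALUES (20050301, 20050308, 99.5);
-- One physical date table, exposed once per role through a view.
CREATE VIEW OrderDate AS SELECT DateKey AS OrderDateKey, FullDate AS OrderFullDate FROM DimDate;
CREATE VIEW ShipDate  AS SELECT DateKey AS ShipDateKey,  FullDate AS ShipFullDate  FROM DimDate;
""")
result = conn.execute("""
    SELECT o.OrderFullDate, s.ShipFullDate, f.SalesAmount
    FROM FactInternetSales f
    JOIN OrderDate o ON o.OrderDateKey = f.OrderDateKey
    JOIN ShipDate  s ON s.ShipDateKey  = f.ShipDateKey
""").fetchall()
print(result)  # [('2005-03-01', '2005-03-08', 99.5)]
```

Renaming the columns in each view keeps the roles distinguishable when both appear in the same query or report.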
Fact table issues include:
· Derived fields
· Allocations
· Shared dimensions
· Degenerate dimensions
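For allocations, a typical case is a charge recorded only on the order header (such as freight) that must be spread over the order lines to fit a line-grain fact table. A minimal sketch, assuming allocation proportional to line amount and integer cents; both are illustrative choices, not the only possible rules.

```python
def allocate(total_cents, weights):
    """Split a header-level amount across lines in proportion to `weights`.
    Integer cents avoid float drift; the rounding remainder goes to the line
    with the largest weight so the pieces add back to the header total."""
    total_weight = sum(weights)
    shares = [total_cents * w // total_weight for w in weights]
    shares[weights.index(max(weights))] += total_cents - sum(shares)
    return shares

# Hypothetical order: 10.00 freight on the header, three lines worth 30, 20 and 10.
line_amounts = [3000, 2000, 1000]          # cents
freight_by_line = allocate(1000, line_amounts)
print(freight_by_line)                     # [501, 333, 166]
print(sum(freight_by_line))                # 1000
```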