《精通数据仓库设计》中英对照_第三章


第二部分 模型开发

数据仓库应该体现企业视角的数据,而这一视角始于主题域模型和业务数据模型。在第3章,我们借助一个虚构的公司,逐步讲解这两个模型的开发过程。然后在第4章,以业务数据模型为起点,通过八个依次进行的转换步骤开发数据仓库数据模型。随后的四章深入探讨数据仓库数据模型的各个具体方面,并包含演示这些原则的案例研究。这些案例研究主要使用两个公司场景。第一个是无所不在购物天堂(GOSH)。GOSH是一家全国性连锁百货商店,计划向国际扩张。第二个是美食公司(DFC)。DFC是一家大型包装消费品制造商,生产多种食品,从粉状食品、罐装食品,到乳制品、速冻餐和冰激凌。

数据仓库集成来自多个数据源的数据,并长时间保存集成后的数据。各源系统使用的键在本系统内可能唯一标识记录,但对数据仓库未必适用。在第5章,我们回顾操作型系统的键结构带来的问题,以及在数据仓库数据模型中应当如何处理这些问题。数据仓库的显著特征之一是它的历史视角。在第6章,我们解释在数据仓库里为日历建模的重要性,以及在数据模型里维护历史视角的不同方法。这些方法帮助我们应对数据仓库不同寻常的特性,即它用于捕获数据随时间推移的快照。

第7章和第8章深入探讨数据仓库中常见的两种数据类型的建模:层次和事务。数据仓库的设计体现了一种折中,既要考虑源系统(一般为关系型)的结构,也要考虑流行的多维数据集市的结构。对层次与事务的处理提供了取得适当平衡的技术。本部分的最后一章讨论确保数据仓库性能良好所需的步骤:在第9章,我们描述如何优化物理数据仓库模式。

 

PART 2 Model Development

The data warehouse should represent the enterprise perspective of the data, and that perspective starts with the subject area and business data models. Using a fictitious company, we provide a step-by-step process to develop these two models in Chapter 3. Then using the business data model as the starting point, Chapter 4 develops the data warehouse data model using eight sequential transformation steps. The following four chapters delve into specific aspects of the data warehouse data model and include case studies demonstrating the principles. These case studies primarily use two company scenarios to develop the business case. The first is the General Omnificent Shopping Haven (GOSH). GOSH is a national department store chain with designs to expand internationally. The second is The Delicious Food Company (DFC). DFC is a large consumer packaged goods manufacturer that produces a wide range of food products, from powders and canned goods to dairy products, frozen dinners, and ice cream.

The data warehouse integrates data from multiple sources and stores the integrated form of that data for a long period of time. The keys that are used in each source system may have uniquely identified records within each system, but they may not be appropriate for the data warehouse. In Chapter 5, we review the problems posed by key structures in the operational systems and how these should be addressed in the data warehouse data model. One of the distinguishing characteristics of the data warehouse is its historical perspective. In Chapter 6, we explain the importance of modeling the calendar in the data warehouse and different approaches for maintaining the historical perspective in this data model. These approaches help us deal with the unusual nature of the data warehouse, that is, it is used to capture snapshots of the data over time.

Chapters 7 and 8 delve into modeling two types of data frequently stored in the data warehouse – hierarchies and transactions. The design of the data warehouse reflects a compromise. It recognizes both the structure of the source systems (typically relational) and the structure of the popular dimensional data marts. The treatment of the hierarchies and transactions provides techniques for striking the right balance. We close this part of the book with a chapter on the steps needed to ensure that the data warehouse performs well. In Chapter 9, we describe what is needed to optimize the physical data warehouse schema.

 

第三章 理解业务模型

所有的应用系统以及数据仓库,都包含基于公司所用数据的信息。业务数据模型表示这些数据,是所有系统模型(包括数据仓库模型)的基础。在第2章,我们描述了第三范式模型如何提供数据一致性并限制数据冗余。我们在本章将讲到,业务模型是一个第三范式模型。因为数据仓库的目标之一是给企业提供一个关于事实和数据的一致视图,所以从满足这些标准的模型开始非常重要;因此,业务模型被用作数据仓库模型的基础。数据仓库模型通过八个定义良好的步骤转换业务数据模型而建立,这会在第4章介绍。

3.1  业务场景

3.2  主题域模型

3.2.1  关于特定行业的考虑

3.2.2  主题域模型开发过程

3.2.3  Zenith汽车公司的主题域模型

3.3  业务数据模型

3.4  小结

 

Chapter 3 Understanding the Business Model

All application systems, as well as the data warehouse, contain information based on the data used by the company. The business data model represents that data and is the foundation for all systems’ models, including the data warehouse model. In Chapter 2, we described how a third normal form model provides data consistency and restricts data redundancy. As we will present in this chapter, the business model is a third normal form model. Since one of the objectives of the data warehouse is to provide a consistent view of the facts and figures for the enterprise, it is important to start with a model that meets those criteria; therefore, the business model is used as the foundation for the data warehouse model. Building the data warehouse model consists of transforming the business data model using eight well-defined steps, and this is covered in Chapter 4.

一个完全开发的业务数据模型可能包含数百个实体。主题域模型定义了主要的信息分组,提供了对实体进行分组的逻辑方法,是管理这些实体的好办法。本章以描述主题域模型开始,重点介绍它怎样帮助确保数据仓库模型的一致性并管理冗余。然后,我们概述业务数据模型、它与主题域模型的关系,以及其开发步骤。关于业务数据模型的常见抱怨是它太深奥并且实践价值有限。关于业务数据模型的一节将消除这些顾虑,并说明这个模型是用速记符号(即矩形和线条)描述业务的一种方式,能为后续开发配套的应用系统带来便利。该节对建立主题域模型和业务数据模型的过程提供高层级的描述。专门论述这个主题的著作已有不少,需要更多信息时应当查阅它们。我们在本书的“推荐阅读”部分列出了关于这个话题的几本我们喜爱的图书。

这是一本关于“怎样进行”数据仓库建模的书,纵观本书第2和第3部分,建模的概念会使用实际的场景来演示。我们使用一个业务场景来演示建模活动,这在本章的开始部分介绍。

A fully developed business data model may contain hundreds of entities. A subject area model, which defines the major groupings of information, is a good way to manage these entities by providing a logical approach for grouping the entities. This chapter begins by describing the subject area model, with particular emphasis on how it helps ensure consistency and manage redundancy in the data warehouse model. We then outline the business data model, its relationship to the subject area model, and the steps required to develop it. A common complaint about the business data model is that it is esoteric and of limited practical value. The section on the business data model dispels these concerns and demonstrates that this model is a means of describing the business in a shorthand notation (that is, rectangles and lines) that facilitates the subsequent development of supporting application systems. It provides a high-level description of the process for building the subject area model and business data model. Complete books have been written on this subject alone, and they should be consulted for additional information. We’ve included some of our favorite books on this topic in the “Recommended Reading” section of this book.

This is a “how to” book on data warehouse modeling. Throughout Parts Two and Three of the book, the modeling concepts will be demonstrated using practical scenarios. We use a business scenario to demonstrate the modeling activities, and it is described at the beginning of this chapter.

1.   业务场景

我们使用一个汽车制造商的业务场景,在本章用来开发主题域模型和业务数据模型,在第4章用来开发数据仓库数据模型。介绍完业务场景之后,我们将深入探讨主题域模型。

我们的汽车制造公司名叫杰力士汽车公司(Zenith Automobile Company,ZAC)。ZAC创建于1935年,生产两个款式的汽车:杰力士(Zenith),以及更高档豪华的途克多(Tuxedo)。每个款式都有描述汽车类型的型号,每个型号有三个系列可选。表3.1描述了这些型号,表3.2描述了这些系列。

Business Scenario

We use a business scenario of an automobile manufacturer to develop the subject area model and business data model in this chapter and the data warehouse data model in Chapter 4. Following the description of the business scenario, we will dive into the subject area model.

Our automotive manufacturing firm is named Zenith Automobile Company (ZAC). ZAC was founded in 1935 and manufactures two makes of automobile: Zeniths and the higher-end luxury Tuxedos. Each of these makes has models that describe the type of car, and each model has three series available. The models are described in Table 3.1, and the series are described in Table 3.2.

Table 3.1 Car Models  汽车型号

| MAKE 款式 | MODEL NAME 型号名称 | TARGET GROUP 目标群体 | DESCRIPTION 描述 |
|---|---|---|---|
| Zenith | Zipster | The young at heart (and age) | The Zipster is a sporty, subcompact-class car with a small price tag, excellent gas mileage, and limited options. This is the low-end offering in the Zenith line of cars. |
| Zenith | Zombie | Older retired drivers with a limited income | The Zombie is a compact-sized, four-door automobile, noted for its economical upkeep and good gas mileage. |
| Zenith | Zoo | Families with small children | The Zoo is a four-door, mid-size car. The car is moderately priced and has good gas mileage. |
| Zenith | Zoom | Sports car enthusiast of modest means seeking excitement | The Zoom is a moderately expensive, big-engine performance car that offers quick response, agile handling, and fast acceleration. |
| Zenith | Zeppelin | Luxury-minded individual | The Zeppelin is the top-of-the-line Zenith car offering unsurpassed quality and features. It is a four-door, full-sized model. |
| Tuxedo | Topsail | Young professionals | The Topsail is a mid-sized, two-door sedan equipped with a full complement of luxury features, including leather seats, an eight-way power-adjustable seat, a tilt steering wheel, and a high-tech car alarm. |
| Tuxedo | Tiara | The truly discriminating sophisticated driver | The Tiara is a full-sized, four-door sedan that is the top-of-the-line Tuxedo automobile and is priced accordingly. It has many of the same features found in the Topsail but offers every conceivable luxury, including seat and outside mirror heaters. |
| Tuxedo | Thunderbolt | Wealthy sports car enthusiasts | The Thunderbolt marks an acknowledged milestone in sports cars. It combines all the breathtaking performance of a thoroughbred with the ease of operation, comfort, and reliability of a passenger car. |

 

所有ZAC汽车都通过全美各地的代理商销售。代理商是独立的实体,但要保留作为ZAC代理商的资格,就必须遵守ZAC的规则,其中一条规则要求他们每月提交财务报表。代理商位于销售片区内,销售片区组成销售地区,销售地区再组成销售大区。配额在销售片区一级分配,激励计划由ZAC公司总部制定。

All of ZAC’s cars are sold through dealers throughout the United States. Dealers are independent entities, but to retain their right to serve as ZAC dealers, they are governed by ZAC’s rules. One of those rules requires them to submit monthly financial statements. The dealers are located within sales areas, which are grouped into sales territories, which are grouped into sales regions. Allocations are made at the sales area level, and incentive programs are developed by ZAC corporate.
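The dealer hierarchy described above (dealer, sales area, sales territory, sales region) can be sketched as a set of lookups plus a roll-up. This is only an illustrative sketch: the dealer identifiers, area, territory, and region names, and the function names `hierarchy_path` and `roll_up` are all hypothetical and do not come from the ZAC scenario.

```python
from collections import defaultdict

# Hypothetical sample data: each dealer sits in one sales area, each sales
# area in one sales territory, and each sales territory in one sales region.
DEALER_TO_AREA = {"D001": "Area-1", "D002": "Area-1", "D003": "Area-2"}
AREA_TO_TERRITORY = {"Area-1": "Territory-A", "Area-2": "Territory-A"}
TERRITORY_TO_REGION = {"Territory-A": "Region-East"}


def hierarchy_path(dealer_id):
    """Return (sales area, sales territory, sales region) for a dealer."""
    area = DEALER_TO_AREA[dealer_id]
    territory = AREA_TO_TERRITORY[area]
    region = TERRITORY_TO_REGION[territory]
    return area, territory, region


def roll_up(units_by_dealer):
    """Sum dealer-level unit counts at each level of the sales hierarchy."""
    totals = {
        "area": defaultdict(int),
        "territory": defaultdict(int),
        "region": defaultdict(int),
    }
    for dealer_id, units in units_by_dealer.items():
        area, territory, region = hierarchy_path(dealer_id)
        totals["area"][area] += units
        totals["territory"][territory] += units
        totals["region"][region] += units
    # Convert to plain dicts so the result is easy to inspect and compare.
    return {level: dict(values) for level, values in totals.items()}
```

Calling `roll_up({"D001": 5, "D002": 3, "D003": 2})` aggregates the three hypothetical dealers into their areas, territory, and region. Because each level has exactly one parent, a figure recorded at the dealer level can be summed at any higher level without double counting.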

 

 

Table 3.2 Car Series  汽车系列

 

| MAKE 款式 | SERIES NAME 系列名称 | ACRONYM 缩写 | DESCRIPTION 描述 |
|---|---|---|---|
| Zenith | No Frills 无装饰 | NF | This is the base level containing no upgrades. Base level consists of vinyl seats, low-end carpeting, smaller engines, manual transmissions, and three paint colors. |
| Zenith | Some Frills 少许装饰 | SF | This is the next level and comes with upgraded fabric for the interior seats, moderately upgraded carpet, automatic transmission, larger engines, tinted windows, radio, five paint colors including metallic colors, and so on. |
| Zenith | Executive Frills 高级装饰 | EF | The cars in this series come with leather interior, high-quality carpet, automatic transmission, larger engines, air conditioning, tinted windows, cruise control, power windows and locks, radio/tape player, eight paint colors including metallic colors, and so on. This series is not available for the Zipster or the Zombie. |
| Tuxedo | Pricey Frills 高价装饰 | PF | Cars in this series come with leather interior, radio/tape deck, air conditioning, optional automatic transmission, cruise control, power windows and door lock, and keyless entry system. |
| Tuxedo | Decadent Frills 豪华装饰 | DF | Cars in this series come with all the features for the CF Series plus tinted windows, antitheft car alarm, moon roof, and radio/tape player/CD player with eight speakers. |
| Tuxedo | Truly Decadent 超豪华装饰 | TDF | Cars in this series have all the Frills features listed for the PF Series plus power-operated moon roof, advanced sound system and insulation, automatic climate control system, dual illuminated vanity mirrors, and heated front seats. |
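Tables 3.1 and 3.2 together determine which make, model, and series combinations exist. The sketch below encodes them as plain Python data and applies the one exclusion stated in Table 3.2 (Executive Frills is not offered for the Zipster or the Zombie). The names `MODELS_BY_MAKE`, `SERIES_BY_MAKE`, `EXCLUDED`, and `available_series` are hypothetical, chosen only for this illustration.

```python
# Models per make, taken from Table 3.1.
MODELS_BY_MAKE = {
    "Zenith": ["Zipster", "Zombie", "Zoo", "Zoom", "Zeppelin"],
    "Tuxedo": ["Topsail", "Tiara", "Thunderbolt"],
}

# Series acronyms per make, taken from Table 3.2.
SERIES_BY_MAKE = {
    "Zenith": ["NF", "SF", "EF"],
    "Tuxedo": ["PF", "DF", "TDF"],
}

# Table 3.2: the EF series is not available for the Zipster or the Zombie.
EXCLUDED = {("Zipster", "EF"), ("Zombie", "EF")}


def available_series(make, model):
    """Return the series acronyms that can be ordered for a make and model."""
    if model not in MODELS_BY_MAKE.get(make, []):
        raise ValueError(f"{model!r} is not a model of make {make!r}")
    return [s for s in SERIES_BY_MAKE[make] if (model, s) not in EXCLUDED]
```

Under these assumptions, `available_series("Zenith", "Zipster")` yields only the NF and SF series, while every other model gets all three series of its make. Keeping the exclusion as data rather than as branching logic makes it easy to see that the rule comes from the series definition, not from the model.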

 

多年来,ZAC在大型机、小型机甚至PC机上开发了大量系统。它还自建和(或)收购了其他汽车制造工厂,导致出现更多彼此分离的系统和数据库。现在,它拥有IBM 3090、DEC VAX、Tandem、Sun、HP,还有PC机和Macintosh。数据分布在DB2、VSAM和Enscribe文件、Non-stop SQL、RDB、Oracle、Sybase和Informix中。终端用户使用Paradox、Rbase、Microsoft Access和Lotus Notes之类的工具。不用说,数据分布在公司上百个分离的数据库里,其中很多采用无法访问的格式。

ZAC刚刚启动一项信息工程工作来再造其业务。再造工作指出的第一个对公司生存至关重要的项目,是一个包含代理商汽车销售信息的数据仓库。为这个数据仓库确定的主题域是汽车和代理商,其次才是激励计划和销售组织。

这个数据仓库的推动力在于,这些主题域的数据目前不容易得到,导致机会丢失、金钱浪费,高管对公司及其汽车销售的方向和健康状况感到不安。在与关键利益相关者面谈之后,ZAC决定开发一个数据仓库及一组数据集市,以回答如下问题:

 

Over the years, ZAC has developed a myriad of systems on mainframes, minicomputers, and even PCs. It built and/or bought other automobile manufacturing facilities, which resulted in even more disparate systems and databases. Currently, it has IBM 3090s, DEC VAXs, Tandems, Suns, and HPs, plus PCs and Macintoshes. Their data is spread out in DB2, VSAM and Enscribe files, Non-stop SQL, RDB, Oracle, Sybase, and Informix. End users have tools such as Paradox, Rbase, Microsoft Access, and Lotus Notes. Needless to say, the data is spread out in hundreds of disparate databases throughout the company, with many in inaccessible formats.

ZAC is just beginning an information engineering effort to reengineer its business. The first project the reengineering effort highlighted as critical to the survival of the company is a data warehouse containing information about its dealers’ car sales. The subject areas it has identified for this warehouse are Automobiles and Dealers, with less emphasis on Incentive Programs and Sales Organizations.

The impetus for the data warehouse is the fact that the data from these subject areas is not easily obtained today, causing opportunities to be lost, money to be wasted, and high-level executives to be uneasy about the direction and health of their company and their automotive sales. Based on interviews with key stakeholders, ZAC decided to undertake development of a data warehouse and a set of data marts that could answer the following questions:

■■每月销售趋势如何,即每个代理商、销售片区、地区、大区、州和大都市圈(MSA)销售的各款式、样式、系列、颜色(MMSC)的数量和金额。

■■每月的库存情况如何,即每个代理商、销售片区、地区、大区、大都市圈(MSA),每个MMSC的库存数量。

■■具有特定排放类型的汽车,每月按MMSC、代理商、工厂、销售片区、地区、大区统计的销售数量和金额,与去年同期、前年同期相比如何。

■■每个代理商、销售片区、地区、大区按MMSC统计的每月实际销售(金额与数量)与其目标相比的趋势如何。用户需要这些信息的每月汇总数及年累计数(YTD)。

■■每月按MMSC统计的销售数量及相应金额,零售代理商与批发代理商对比的历史情况(两年比较)如何。

■■每个代理商今年迄今按MMSC统计的每月销售金额和数量,与去年同期相比如何。

■■针对特定类型的激励计划,每月按MMSC、代理商、销售片区、地区、大区和MSA统计的销售金额和数量趋势如何。

■■代理商销售某个MMSC所需平均时间(称为周转速度,等于从代理商收到汽车到售出之间的天数)的每月趋势如何,按销售片区、地区、大区和MSA统计。

■■每个代理商、销售片区、地区、大区和MSA,每个MMSC的每月平均销售价格是多少。

■■代理商仅在本月以及全年被信用冻结了多少天?此外,过去两年中该代理商被信用冻结的月份总数是多少?

■■按销售大区比较上一代车身款式(车身款式 = 款式 + 型号)与当前车身款式的每月销售金额和数量。车身款式每四年更换一次。

 

■■ What is the monthly sales trend in terms of quantity and dollar amounts sold of each make, model, series, and color (MMSC) for a specific dealer, by each sales area, sales territory, and sales region, for each state and for each metropolitan statistical area (MSA)?

■■ What is the pattern in the monthly quantity of inventory by MMSC for each dealer, by each sales area, sales territory, sales region, and MSA?

■■ How does the monthly quantity and dollars of sold automobiles by MSC (translator's note: this should read MMSC) having a particular emissions type, by Dealer, Factory, Sales Area, Sales Territory, and Sales Region, compare with the same time frame last year and the year before?

■■ What is the trend in monthly actual sales (dollars and quantities) of MMSC for each dealer, sales area, sales territory, and sales region compared to their objectives? Users require this information both by monthly totals and cumulative year to date (YTD).

■■ What is the history (two-year comparisons) of the monthly quantity of units sold by MMSC and associated dollar amounts by retail versus wholesale dealers?

■■ What are the monthly dollar sales and quantities by MMSC this year to date as compared to the same time last year for each dealer?

■■ What is the monthly trend in sales dollars and quantities by MMSC for particular types of incentive programs, by dealer, sales area, sales territory, sales region, and MSA?

■■ What is the monthly trend in the average time it takes a dealer to sell a particular MMSC (called velocity and equal to the number of days from when a dealer receives the car to the date it is sold) by sales area, sales territory, sales region, and MSA?

■■ What was the monthly average selling price of an MMSC for each dealer, sales area, sales territory, sales region, and MSA?

■■ How many days was a dealer placed on credit hold for this month only and for the entire year? In addition, what was the total number of months in the past two years that the dealer was put on credit hold?

■■ Compare monthly sales dollars and quantities from the last body style (body style is make + model) to the current body style for each sales region. Body styles change every four years.
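Two of the measures in the questions above are defined precisely enough to sketch: velocity (the number of days from when a dealer receives a car to the date it is sold) and cumulative year-to-date (YTD) totals. The following is a minimal sketch under stated assumptions: the record layout (dictionaries with `mmsc`, `received`, and `sold` keys), the sample values, and the function names are hypothetical and are not taken from ZAC's systems.

```python
from collections import defaultdict
from datetime import date


def velocity_days(received, sold):
    """Velocity: days from the dealer receiving the car to the sale date."""
    return (sold - received).days


def monthly_average_velocity(sales):
    """Average velocity per (year, month of sale, MMSC)."""
    buckets = defaultdict(list)
    for sale in sales:
        key = (sale["sold"].year, sale["sold"].month, sale["mmsc"])
        buckets[key].append(velocity_days(sale["received"], sale["sold"]))
    return {key: sum(days) / len(days) for key, days in buckets.items()}


def year_to_date(monthly_totals, year, month):
    """Cumulative total from January of `year` through `month`, inclusive."""
    return sum(
        amount
        for (y, m), amount in monthly_totals.items()
        if y == year and m <= month
    )


# Hypothetical sample records for one MMSC (make, model, series, color).
zoom_sf_red = ("Zenith", "Zoom", "SF", "Red")
sample_sales = [
    {"mmsc": zoom_sf_red, "received": date(2023, 1, 10), "sold": date(2023, 1, 30)},
    {"mmsc": zoom_sf_red, "received": date(2023, 1, 5), "sold": date(2023, 1, 15)},
]
```

In this sketch the two sample cars took 20 and 10 days to sell, so `monthly_average_velocity(sample_sales)` reports an average of 15.0 days for that MMSC in January 2023. Grouping by dealer, sales area, sales territory, sales region, or MSA would only change the bucket key, which is why the same base facts can serve several of the questions.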

 

2.   主题域模型

数据仓库按主题域组织,所以数据仓库数据模型的方法论从主题域模型开始是很自然的。数据仓库的面向主题性把它和传统的应用系统区分开。在传统的操作型系统里,虽然数据模型也应该从主题域模型开始,但这一步常常被省略。因为操作型系统面向特定的业务功能与流程,其设计需要强调处理相关事务的效率。因此,它的模型会做出调整以突出事务处理能力,使用它的流程会极大地影响数据的组织方式。在数据仓库里,面向主题仍然是物理数据库设计的核心。核心业务流程体现在源操作型系统中,以及人们用来从数据仓库获取数据的数据集市中,但核心的数据仓库设计仍然是面向主题的。

就如我们在第2章指出的那样,主题域是企业感兴趣的物理项、概念、人、地点、事件等的主要分组。我们也指出主题域模型可以很快开发出来。首次开发主题域模型的组织可以借鉴他人已完成的工作,而不必从零开始。有很多主题域在各行业通用:几乎所有的组织都有顾客、供应商、产品和设施,这些都是候选的主题域。正如本章后面将讲到的,一个好的起点是通用模型,例如表3.3所示的模型。

 

Subject Area Model

A data warehouse is organized by subject area, so it is only natural that the methodology for a data warehouse data model should begin with the subject area model. The subject-orientation of the data warehouse distinguishes it from a traditional application system. In the traditional operational system, although the data model should begin with a subject area model, this step is often omitted. Since the operational system is oriented toward specific business functions and processes, its design needs to emphasize the efficiency with which it can process the related transactions. Its model, therefore, is adjusted to emphasize the transaction-processing capabilities, with the processes that use it greatly influencing the data’s organization. With the data warehouse, the subject orientation remains at the core of the physical database design. The core business processes are depicted in the source operational systems and with the data marts that people use to obtain data from the data warehouse, but the core data warehouse design remains subject oriented.

As we indicated in Chapter 2, subject areas are major groupings of physical items, concepts, people, places, and events of interest to the enterprise. We also indicated that the subject area model can be developed very quickly. An organization developing its first subject area model can benefit from work performed by others so that it doesn’t need to start from scratch. There are many subject areas that are common across industries; virtually all organizations have customers, suppliers, products, and facilities. These are candidates for subject areas. A good point at which to start, as explained later in this chapter, is a generic model, such as the one shown in Table 3.3.

Table 3.3 Common Subject Areas   公共主题域

| SUBJECT AREA 主题域 | DEFINITION 定义 | EXAMPLES 示例 | REMARKS 备注 |
|---|---|---|---|
| Business Environment 业务环境 | Conditions, external to the company, which affect its business activities. 影响公司业务活动的外部条件。 | Regulation 规章制度; Competition 竞争; License 执照 | These are often not implemented in a data warehouse. 这些一般不在数据仓库里实现。 |
| Communications 沟通 | Messages and the media used to transmit the messages. 消息及用于传达消息的媒体。 | Advertisement 广告; Audience 受众; Web Site Content 网站内容 | These often pertain to marketing activities, though they can apply to internal and other communications. 这些通常与市场活动有关,但也可以用于内部沟通及其他沟通。 |
| Customers¹ 顾客 | People and organizations who acquire and/or use the company’s products. 获得并且/或者使用公司产品的人与组织。 | Customer 顾客; Prospect 潜在顾客; Consumer 消费者 | The definition provides for capturing potential customers (prospects) and for distinguishing between parties who buy the product and those who use it. 这个定义既涵盖潜在顾客,也区分购买产品的一方与使用产品的一方。 |
| External Organizations 外部组织 | Organizations, except Customers and Suppliers, external to the company. 除顾客和供应商以外的公司外部组织。 | Competitor 竞争对手; Partner 合作伙伴; Regulator 监管机构 | The exclusion of Customers and Suppliers is consistent with the subject areas’ being mutually exclusive. 排除顾客和供应商,与主题域互斥的原则一致。 |
| Equipment 设备 | Movable machinery, devices, and tools and their integrated components. 可移动的机械、装置、工具及其集成部件。 | Computer 计算机; Vehicle 车辆; Crane 起重机 | Software that is integral to equipment is included within this subject area; other software is included within the Information subject area. 作为设备组成部分的软件包括在这个主题域里;其他软件包含在信息主题域里。 |
| Facilities 设施 | Real estate and structures and their integrated components. 不动产、构筑物及其集成部件。 | Real Estate 不动产; Building 建筑物; Mountain 山脉 | Integrated components (for example, an alarm system within a building) are often included as part of the facility unless a company is specifically interested in those components. 集成部件(如建筑物里的警报系统)常常作为设施的一部分,除非公司对这些部件特别关注。 |
| Financials 财务 | Information about money that is received, retained, expended, or tracked by the company. 关于公司收取、持有、支出或跟踪的金钱的信息。 | Money 现金; Receivable 应收; Payable 应付 | |
| Human Resources¹ 人力资源 | Individuals who perform work for the company and the formal and informal organizations to which they belong. 为公司工作的个人及他们所属的正式、非正式的组织。 | Employee 职员; Contractor 承包人; Position 职位 | Includes prospective (for example, applicants) and former (for example, retirees) employees. Some companies prefer to establish the organizational structure within a separate subject area. 包括未来的(例如应聘者)与以前的(例如退休人员)职员。有些公司更愿意在一个单独的主题域里建立组织结构。 |
| Information 信息 | Facts and the information about facts and mechanisms that manage them. 事实、关于事实的信息及管理它们的机制。 | Application System 应用系统; Database 数据库; Meta Data 元数据 | This includes the information about the company’s computing environment, and also includes non-electronic information. 包括公司计算环境的信息,还包括非电子信息。 |
| Locations 位置 | Geographical points or areas. 地理位置或区域。 | Geopolitical Boundary 地缘政治边界; Country 国家; Address 地址 | This can be expanded to include electronic locations such as email addresses and phone numbers. 这可以拓展到电子位置,例如email地址、电话号码等。 |
| Materials 原材料 | Goods and services that are used or consumed by the company or that are included in a piece of equipment, facility, or product. 公司使用或消耗的商品和服务,或者包含在设备、设施、产品中的商品和服务。 | Chemical 化工产品; Fuel 燃料; Supply 供应品 | Sometimes, a product is used as a component of another product. When this is the case, a relationship between the relevant entities will be indicated in the business data model. 有时,一个产品是另一个产品的部件。这种情况下,相关实体之间的关系会在业务数据模型中指明。 |
| Products 产品 | Goods and related services that the company or its competitors provide or make available to Customers. 公司或其竞争对手向顾客提供或供顾客获得的商品与相关服务。 | Product 产品; Service 服务; Advice 建议 | Competitor items that the company does not provide are often included to facilitate monitoring these and to support future decisions. 公司自己不提供的竞争对手产品也常常包含进来,以便监控并支持将来的决策。 |
| Sales 销售 | Transactions that shift the ownership or control of a product from the Company to a Customer. 把产品的所有权或控制权从公司转移给顾客的交易。 | Sales Transaction 销售交易; Sales Transaction Detail 销售交易明细; Credit Memo 贷项通知单 | Sales is actually an associative subject area in that it is the intersection of the Customer, Store, Product, and so on. Some companies may choose to include the entities related to sales within one of those subject areas instead. 销售实际上是一个关联主题域,是顾客、商店、产品等的交集。有些公司可能选择把与销售相关的实体放在这些主题域之一中。 |
| Suppliers¹ 供应商 | Legal entities that provide the company with goods and services. 向公司提供商品和服务的法人实体。 | Broker 经纪人; Manufacturer 制造厂家; Supplier 供应商 | In the case of a contractor, the person doing the work is included in Human Resources, and the company that provides that person is included in Suppliers. 对于承包人,执行工作的人属于人力资源主题域,提供该人员的公司属于供应商主题域。 |

 

1 另一个方法是创建“参与方”(Parties)主题域来代替顾客、外部组织、人力资源和供应商。虽然参与方这一概念有助于避免物理实现上的重复,但区分主要的参与方(如顾客、外部组织、人力资源、供应商)能使主题域模型更容易理解和使用。

1 Another approach is to create “Parties” as a subject area in lieu of Customers, External Organizations, Human Resources, and Suppliers. While Parties may be a useful concept to avoid duplication in a physical implementation, distinguishing among the major parties (for example, Customers, External Organizations, Human Resources, and Suppliers) improves comprehension and usage of the subject area model.
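Table 3.3 relies on the subject areas being mutually exclusive, so every entity should belong to exactly one of them. The sketch below shows one way to check a draft model for that property. The example entities come from the EXAMPLES column of Table 3.3, while the dictionary layout and the helper name `find_overlaps` are assumptions made for this illustration.

```python
# A fragment of a draft subject area model: subject area -> example entities.
SUBJECT_AREAS = {
    "Customers": {"Customer", "Prospect", "Consumer"},
    "External Organizations": {"Competitor", "Partner", "Regulator"},
    "Human Resources": {"Employee", "Contractor", "Position"},
    "Suppliers": {"Broker", "Manufacturer", "Supplier"},
}


def find_overlaps(subject_areas):
    """Return {entity: [areas]} for entities assigned to more than one area."""
    owners = {}
    for area, entities in subject_areas.items():
        for entity in entities:
            owners.setdefault(entity, []).append(area)
    return {
        entity: sorted(areas)
        for entity, areas in owners.items()
        if len(areas) > 1
    }
```

For the fragment above, `find_overlaps(SUBJECT_AREAS)` returns an empty dictionary. If a draft also listed Contractor under Suppliers, the check would flag it, which mirrors the remark in the table: the person doing the work belongs in Human Resources and the company providing that person belongs in Suppliers.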

 

作为进一步的帮助,我们建议你考虑所在行业特有的特征。下一节描述特定行业的组织在着手开发主题域模型时需要考虑的事项。

As a further aid, we recommend that you consider characteristics specific to your industry. The next section describes considerations for organizations in specific industries embarking on development of a subject area model.

 

2.1. 对具体行业的考虑

每个行业都有一些在该行业内的公司中普遍存在的特征。理解这些特点,可以进一步简化创建主题域模型的过程。一些示例如下:

Considerations for Specific Industries

Each industry has characteristics that are common to companies within that industry. By understanding these distinctions, the process of creating the subject area model can be further simplified. Some examples follow.

 

零售行业

建立零售行业的主题域模型特别要考虑以下问题:

■■在零售行业,销售组织层次常常是重点。因此,这个行业的公司倾向于把表3.3中的人力资源主题域分成两个主题域:人力资源和内部组织。

■■零售商当然关注设施,但其中一种特殊的设施,即商店,往往最受关注。因此,商店有时被分出来作为一个单独的主题域。

■■零售商一般不制造产品,常常把所销售的东西称为商品项(Items)。这会替代产品主题域,定义也要做相应的调整。

Retail Industry Considerations

Special considerations for building the subject area model in the retail industry are:

■■ Within the retail industry, major emphasis is often placed on the sales organization hierarchy. Companies in this industry would, therefore, tend to separate the Human Resources subject area as described in Table 3.3 into two subject areas: Human Resources and Internal Organizations.

■■ While facilities are certainly of interest to retailers, one particular facility, the Store, is often of major interest. As a result, stores are sometimes distinguished as a separate subject area.

■■ Retailers typically don’t create products and often refer to what they sell as Items. This would replace the Products subject area, with the definition adjusted accordingly.

 

制造业:

建立制造业的主题域模型特别要考虑以下问题:

■■在制造业,制造设施受到特别关注,因此常常单独作为一个主题域。

■■制造过程常常产生废物,而且有法律管理废物。废物有时作为独立的主题域。

Manufacturing Industry Considerations

Special considerations for building the subject area model in the manufacturing industry are:

■■ Within the manufacturing industry, the manufacturing facilities are of particular interest, so these are often distinguished within a separate subject area.

■■ Waste is often produced as part of the manufacturing process, and there are laws that govern the waste. Waste is sometimes isolated as a separate subject area.

 

公用事业

建立公用事业的主题域模型特别要考虑以下问题:

■■在公用事业行业,发电设施(例如发电厂)受到特别关注,这些可能被分成独立的主题域。

■■电力网络或天然气管道包含物理部件与逻辑部件。物理部件包括实际的电线、开关、管道、阀门等,逻辑部件包括负载能力、网络拓扑结构等。有时会把它们分成两个主题域:设备针对物理部件,网络针对逻辑部件。

Utility Industry Considerations

Special considerations for building the subject area model in the utility industry are:

■■ Within the utility industry, power-producing facilities (for example, power plants) are of particular interest, and these may be distinguished into separate subject areas.

■■ The electrical network or gas pipeline consists of both physical and logical components. The physical components consist of the actual wires, switches, pipes, valves, and so on; the logical components consist of the load-carrying capacity, network topology, and so forth. These are sometimes split into two subject areas with Equipment addressing the physical components and Networks addressing the logical components.

 

财产及意外伤害保险行业

建立财产及意外伤害保险行业的主题域模型特别要考虑以下问题:

■■财产及意外伤害保险行业常常处理保费、保单和索赔,每一种都被当作一个独立的主题域。

■■在财务主题域,这些公司还需要处理准备金。由于准备金的重要性,可以把它当作一个独立的主题域。

■■顾客的定义需要调整,以纳入保单所有人和可能从索赔中受益的一方这两个概念。在某些方面,这类似于购买产品的顾客和使用产品的消费者。

Property and Casualty Insurance Industry Considerations

Special considerations for building the subject area model in the property and casualty insurance industry are:

■■ The property and casualty insurance industry typically deals with premiums, policies, and claims. Each of these is usually treated as a separate subject area.

■■ In the Financials subject area, these companies also need to deal with reserves, and due to the importance of the reserves, they could be treated in a separate subject area.

■■ The definition of customer needs to be adjusted to incorporate the concept of the party that owns an insurance policy and the party that may benefit from a claim. In some respects, this is similar to the concept of the customer who buys a product and the consumer who uses it.

 

石油行业

建立石油行业的主题域模型特别要考虑的是:油井和炼油厂可以描述为设施,但由于它们在这个行业的重要性,每一个都应当成为单独的主题域。

 

Petroleum Industry Considerations

A special consideration for building the subject area model in the petroleum industry is that wells and refineries could be described as facilities, but due to their significance within this industry, each deserves to be its own subject area.

 

医疗卫生行业

建立医疗卫生行业的主题域模型特别要考虑以下问题:

■■在医疗卫生行业,有几种类型的供应商,包括保健设施、内科医师,药剂师等等。对每一类都要考虑他们在主题域模型的位置。

■■在有些医疗卫生行业的公司,唯一感兴趣的顾客是患者,这样,顾客主题域要改名为患者。

Health Industry Considerations

Special considerations for building the subject area model in the health industry are:

■■ There are several types of suppliers in the health industry, including the healthcare facility, the physician, the pharmacist, and so on. Consideration needs to be given to each of these to determine their positioning in the subject area model.

■■ In some companies within the health industry, the only customer of interest is the patient, and the Customers subject area would then be named Patients.

2.2. 主题域模型开发过程

在本章的前面我们已经提出,主题域模型可以在几天内完成开发。有三种主要开发主题域模型的方法:

■■封闭房间

■■面谈

■■小型会议

每一种方法,你都可以从零开始,也可以以一个通用的模型为起点。两种做法都有效,选择取决于参与者的偏好和背景。这三种主要的方法总结在表3.4中。如果可行,我们推荐使用第三种方法,即小型会议。我们会在后面的小节解释理由。

Subject Area Model Development Process

As stated earlier in this chapter, the subject area model can be developed in a matter of days. There are three major ways of developing the subject area model:

■■ Closed room

■■ Interviews

■■ Facilitated sessions

In each of the methods, you have the option of either starting from a clean slate or using a generic model as the starting point. Both approaches are valid, and the selection depends on the participants’ preferences and background. The three major methods are summarized in Table 3.4. We recommend that the third approach—the use of facilitated sessions—be used if feasible. We explain why in the sections that follow.

 

表3.4 主题域模型开发方法比较

| 方法 | 描述 | 好处 | 缺点 |
|---|---|---|---|
| 封闭房间 | 数据建模员基于已有的信息在隔离环境下开发主题域模型,然后提交审批。 | • 建模员理解流程。 • 模型可以快速开发。 | • 建模员可能不具备足够的业务知识。 • 业务人员没有主人翁意识。 |
| 面谈 | 与关键业务代表逐一面谈,建模员使用这些信息创建模型,然后提交审批。 | • 每个人都有机会参与模型开发。 • 参与者拥有业务知识。 • 获得一定的业务认同。 | • 逐一面谈花费更多的时间。 • 虽然得到了业务知识,但是没有达成一致意见。 |
| 小型会议 | 由引导人带领一组业务代表共同开发主题域模型。 | • 参与者拥有业务知识。 • 产生业务认同。 • 通过交流达成一致意见。 | • 安排所需参与者的日程可能比较困难。 |

 

 

Table 3.4 Subject Area Model Development Options

| METHOD | DESCRIPTION | ADVANTAGES | DISADVANTAGES |
|---|---|---|---|
| Closed Room | Data modeler(s) develop the subject area model in a vacuum, based on information they have, and then submit it for approval. | • Modelers understand the process. • A model can be developed quickly. | • Modelers may not possess sufficient business knowledge. • The business has no sense of ownership. |
| Interviews | Key business representatives are interviewed individually, and the modelers use this information to create the model. The result is then submitted for approval. | • Each person has the opportunity to contribute to the model. • Contributors possess the business knowledge. • Some business ownership is obtained. | • Individual interviews take more time. • While business knowledge is obtained, consensus isn’t built. |
| Facilitated Sessions | A facilitator leads a group of business representatives in the development of the subject area model. | • Contributors possess the business knowledge. • Business ownership is generated. • Consensus is developed through the interaction. | • Scheduling the required participants may be difficult. |

 

 

 

2.2.1.    封闭开发

封闭开发是指建模员独自工作,业务代表很少参与或者根本不参与。它所秉持的理念是,建模专业技能是开发主题域模型所需的最重要的技能,并进一步假设建模员理解业务。使用这种方法时,建模员基于自己对业务的认识来开发主题域模型。建模员通常的做法是尝试把企业信息归纳成15至20个主要的组,每一个组作为一个主题域。完成之后,建模员为每个主题域创建定义,并确保所有的定义互相排斥。

一般不推荐这种开发方法。建模员很少(如果有的话)能充分了解整个业务,以创建一个持久的主题域模型。模型的某些方面更像艺术,而不是科学。例如,建模员需要决定是把人力资源保留为单一的主题域,还是为职员、承包人、应聘者等创建人力资源主题域,再加上一个独立的内部组织主题域用于处理职位、组织层次、工作分类等。从建模的角度看,只要定义反映了主题域的范围,两种方法都是正确的。这类决定常常取决于人们的偏好,重要的是不应由建模员来做这个决定。使用这种方法开发的模型随后提交评审时,业务代表倾向于把它当作又一个信息技术演练,因而很难为它争取到支持。

有些情况下这个方法是必要的。如果建模员不能得到足够的业务支持来创建模型,那么选择就变成了要么使用这种方法,要么没有模型。在这种情况下,有一个可能比较接近的模型总比完全没有模型好。建模员应当充分预料到模型需要调整,并应当不断争取业务方面对模型的建设性批评。虽然在只有极少业务支持的情况下,主题域模型的工作可以继续,但如果开始开发业务数据模型时业务支持仍未到位,就应当认真考虑中止项目。

 

Closed Room Development

Closed room development entails the modelers working on their own with little or no involvement by business representatives. It is in keeping with a philosophy that the modeling expertise is the most important skill needed in developing the subject area model. It further presumes that the modeler understands the business. When this approach is used, the modeler develops the subject area model based on his or her perceptions of the business. The process that the modeler typically uses consists of trying to group the enterprise’s information into 15–20 major groupings, each of which would be a subject area. Once this is done, the modeler would create a definition for each one and would ensure that all of the definitions are mutually exclusive.

This approach is generally not recommended. The modeler rarely, if ever, fully understands the entire business sufficiently to create a durable subject area model. There are some aspects of the model that are more art than science. For example, the modeler needs to decide whether to keep Human Resources as a single subject area or to create a Human Resources Subject Area for the employees, contractors, applicants, and so on, and a separate Internal Organizations Subject Area for the positions, organizational hierarchy, job classifications, and so on. Either approach is correct from a modeling perspective, as long as the definitions reflect the scope of the subject area. The decision is often based on people’s preferences and it is important that the modeler not be the one to make this decision. When a model developed using this approach is subsequently presented for review, the business representatives are prone to treat this as another information technology exercise, thus making it difficult to garner support for it.

There are circumstances under which this approach is necessary. If the modeler cannot get sufficient business support to create the model, then the choice becomes whether to use this approach or to have no model. When this situation exists, it is better to have a model that is likely to be close than to have no model at all. The modeler should fully expect that adjustments will be needed and should continuously try to gain constructive business criticism of the model. While work on the subject area model can proceed with minimal business support, if the business support is not forthcoming when work on the business data model begins, serious consideration should be given to halting the project.

2.2.2.    面谈开发

面谈是从各个业务代表处获得信息的优秀方法。建立面谈的第一个挑战是决定需要谁参与。因为主题域模型代表整个企业,从组织结构图开始是一个好的方法。建模员应该会见企业内代表主要部门的人员,根据他们现在的职位或者以前的职位。

要会见的企业代表人数大概10到15人,每个人都要求描述他/她所在领域的高层次的工作流。使用这些信息,面谈者应当尝试定义每个人感兴趣的主要的信息组及他们之间的交叉点。下面给出一个“与销售经理面谈”的示例。

Development through Interviews

Interviews provide an excellent means of obtaining information from individual business representatives. The first challenge in setting up the interviews is determining who needs to participate. Since the subject area model represents the entire enterprise, a good place to start is the organizational chart. The modeler should interview people who represent the major departments in the enterprise either by their current position or by virtue of their previous positions.

A reasonable representation of the enterprise should be available by interviewing 10–15 people. Each of these people should be asked to describe the high-level workflow in his or her area. Using this information, the interviewer should try to identify the major groupings of information of interest to each person and the interactions among them. A sample interview is provided in the “Interview with the Sales Executive” sidebar.

与销售经理面谈

以下是示例会谈的开始:

采访者:早上好,Jim(销售副总)。非常感谢你从百忙之中抽出时间与我谈谈。

(采访者应当简要介绍谈话的目的及从JIM那里得到信息的重要性)

采访者:请你概要的说说销售过程。

销售副总:客户来到我们商店四处看看,选择他们想要的东西,把它们放进购物车。在收款台,这些物品通过电子扫描,终端会提醒销售员介绍促销产品,然后销售员询问客户是否对这些感兴趣,并尝试获得客户的电话号码。如果数据库中已经存在,销售员会确认客户姓名和地址;如果是一个新客户,销售员尝试获得客户的姓名和地址并输入数据库。我们已经成功地获取了70%的客户信息。然后,客户离开商店。

采访者:基于我们的讨论,我定义了几个主要感兴趣的事情:客户、商店、销售人员、销售事务和商品,对不对?

销售副总:我们的销售人员可以获得促销信息,这给我们带来很大价值。我想这也很重要。

采访者:谢谢,我遗漏了这点。让我们谈谈这些事情之间的关系。客户来到这个商店——客户可以从其他地方购买吗?

销售副总:现在没有,但是我们正考虑建立一个电子商务平台。

采访者:是所有的客户都是个人客户,还是有一些作为组织的代表?

销售副总:我们的客户可能是消费者,也可能是企业代表。

采访者:对待这两类客户有什么不同吗?

(面谈继续,采访者基于从回答者获得的信息提出更深入的问题)

Interview with the Sales Executive

Following is the beginning of a sample interview:

Interviewer: Good morning, Jim (vice president of sales). I appreciate your taking time from your busy schedule to speak with me this morning.

 (The interviewer would then briefly explain the purpose of the interview and the importance of getting information from Jim’s perspective.)

Interviewer: Please describe the sales process to me at a high level.

Sales VP: Our customers come into our store and look around. They select items that they would like and then place them in a cart. At the checkout counter, the items they’ve selected are electronically scanned. The terminal alerts the salesperson to promotional items, then the salesperson asks the customer about his or her interest in these. The salesperson also tries to obtain the customer’s phone number. If it is already in our database, the salesperson confirms the customer’s name and address; if it is a new one, the salesperson tries to get the customer’s name and address and enters them into our database. We’ve been successful in obtaining information to identify about 70 percent of our customers. The customer then leaves the store.

Interviewer: Based on our discussion, I’ve identified the following major things of interest: customers, stores, salespeople, sales transactions, and items. Is that correct?

Sales VP: We gain a lot of value from having the promotional information available to our salespeople. I think that’s important, too.

Interviewer: Thanks, I missed that one. Let’s take a look at the relationships among these things. The customer comes into the store—can customers buy the items elsewhere?

Sales VP: Not at this time, but we’re considering establishing an electronic commerce facility.

Interviewer: Are all the customers individual consumers, or are some considered representatives of organizations?

Sales VP: Our customers may be either consumers or representatives of businesses.

Interviewer: Is there any difference in the treatment of the two types of customers?

(Interview continues with the interviewer delving further into items based on the answers received.)

 

面谈的主要产物应该是一系列主题域及其定义(从受访者的观点)。得到的这些信息用于帮助创建主题域模型,也为业务模型提供信息。通过深入面谈,我们较好的利用了业务代表的时间。之后创建业务模型时,我们可以以面谈得到的这些信息作为开始,把重点放在确认和提炼工作上。

One of the major products of the interview should be a set of subject areas and definitions from that person’s perspective. The information obtained will help create the subject area model and will also provide information for the business data model. By delving further within this interview, we make better use of the business representatives’ time. When we subsequently work on the business data model, we can start with the information we obtained from these interviews and then focus on confirmation and refinement.

技巧

在面谈前准备好一些问题,但不期望全部用到。这些问题提供一个好的检查单,以保证覆盖所有关键点;然而,一个好的采访者根据提供的信息与提供信息的方式调整面谈。面谈一结束,建模员要整合这些信息。建模员可能会收到一些矛盾的信息,这些矛盾需要解决。有时,解决方法可能是使用最一般化的情形,但是另一方面,需要主持一个讨论来澄清这些分歧。最后的主题域模型要提供给每一个受访者确认。根据个人的职位与技术倾向,确认过程可能通过一个简短的讨论,而不是通过递交模型。

 

TIP

Go to an interview prepared with a set of questions, but don’t expect to use them all. The questions provide a good checklist for ensuring that key points are covered; however, a good interviewer adjusts the interview to reflect the information being provided and the way that it is provided. Once the interviews are completed, the modeler needs to consolidate the information. It is possible that the modeler will receive conflicting information, and these conflicts need to be resolved. Sometimes, the resolution may be one of using the most generalized case, but at other times, a discussion to clarify the differences may be needed. The resultant subject area model should be provided to each of the interviewees for verification. Depending upon the person’s position and technical disposition, the verification may be conducted through a brief discussion rather than through submission of the model for review.

 

 

 

2.2.3.    通过小型会议开发

作者发现使用小型会议是最快速有效的方法。会议的参与者包括各个业务领域的代表,这与面谈一样。最大的不同是,代表之间互相交流,而不是单独参与。虽然有时把这些人召集到一起很困难,但一旦做到,模型可以很快完成,并反映业务代表认同的折中方案。主要的步骤是准备、一到两次会议、会议之间的工作,以及后续工作。如果开发组从头开始,需要两次会议;如果开发组使用一个现成的模型,可能一次会议就可以完成工作。

Development through Facilitated Sessions

The approach that the authors have found to be the most effective and efficient is the use of facilitated sessions. These sessions involve representatives of the various business areas, just as the interviews do. The significant difference is that the people are interacting with each other instead of providing individual contributions. While it is sometimes difficult to get the people together, when this is accomplished, the product is completed very quickly and reflects compromises with which the business representatives agree. The major steps in the process are preparation, one or two facilitated sessions, work between the facilitated sessions, and follow-on work. If the group is starting from a clean slate, two facilitated sessions will be needed; if the group is using a starter model, it may be possible to complete the effort in one session.

 

准备

准备工作包括选择和邀请参与者及后勤安排。至少要在会议前一到两周开始准备。会议成功的一个关键点是要让参与者理解会议的目的、过程及他们的角色。这要在邀请函中描述清楚。

Preparation

Preparation consists of selecting and inviting the participants and making the logistical arrangements. The preparation should be performed at least one to two weeks before the session. One of the keys to a successful session is to ensure that the participants understand the purpose, the process, and their role. These should be described in the invitation letter.

第一次会议

第一次会议的议程应包括以下几项:

介绍:参与者作自我介绍,讨论会议目标。

培训:对有关概念和过程进行培训。

头脑风暴:头脑风暴用于开发一个潜在主题域清单。

提炼:总结与提炼主题域清单,得到主题域。

结论:回顾会议结果,安排模型创建。

这个议程假设开发组从头开始,如果开发组以一个普通的或行业模型开始,可以使用下列议程:

介绍:参与者作自我介绍,讨论会议目标。

培训:对有关概念和过程,及开始模型进行培训。

讨论及提炼主题域:讨论开始模型里的主题域,推导出一系列主题域,对这些主题域的定义进行讨论和提炼。

提炼:总结与提炼主题域清单,得到主题域。

第一次会议议程中非常重要的一部份是培训。在会议的培训部份,演示者解释什么是主题域,如何区分及定义他们,它为何对后续的模型有益。会议过程(例如头脑风暴)与规则一起描述。

First Facilitated Session

The agenda for the first session should include the following items:

Introductions. The participants introduce themselves, and the session objectives are reviewed.

Education. Education is provided on the relevant concepts and on the process.

Brainstorming. Brainstorming is used to develop a list of potential subject areas.

Refinement. The list of potential subject areas is reviewed and refined to arrive at the set of subject areas.

Conclusion. The session results are reviewed, and assignments for definition creation are made.

 

This agenda presumes that the group will be starting with a clean slate. If the group starts with a generic or industry model, the following agenda would apply:

Introductions. The participants introduce themselves, and the session objectives are reviewed.

Education. Education is provided on the relevant concepts, on the process, and on the starter model.

Review and refinement of subject areas. The subject areas in the starter model are reviewed, and a set of subject areas is derived. Definitions for those subject areas are then reviewed and refined.

Refinement. The list of potential subject areas is reviewed and refined to arrive at the set of subject areas.

A critical part of the agenda for the first session is education. During the educational portion of the meeting, the facilitator explains what a subject area is, how it should be identified and defined, and why the resultant model is beneficial. The processes (for example, brainstorming) to be employed are also described along with the rules for the facilitated session.

技巧

如果组内有些成员了解这些概念,而有些不了解,可以在真正的会议之前组织一次培训会议。这为参与者提供了选择,而不必强迫已经了解这个议题的人参加多余的培训。

 

TIP

If some members of the group understand the concepts and others don’t, consider having an educational session before the actual facilitated session. This provides the attendees with a choice and does not force people who know the topic to attend redundant education.

本节的剩余部分假设开发组不以一个现成模型开始。在培训会议之后,开发组投入一个头脑风暴会议提出潜在的主题域。在头脑风暴会议里,所有的发言都被记录下来,不经过任何讨论。因此,对人们定义主题域,及报表、过程、功能、实体、属性、组织等等来说不是罕见的。图3.1显示一个头脑风暴会议的潜在结果,事例为Zenith 汽车公司这样的制造厂商。如果你仔细地审视这些图表,你会看到大多数第二页及部分第三页指出了太细的细节。当这种情况发生时,指导者应该提醒开发组 定义主题域。

The remainder of this section presumes that the group is not beginning with a starter model. Following the educational session, the group engages in a brainstorming session to identify potential subject areas. In a brainstorming session, all contributions are recorded, without any discussion. It is, therefore, not uncommon for people to identify reports, processes, functions, entities, attributes, organizations, and so on, in addition to real subject areas. Figure 3.1 shows the potential result of such a brainstorming session for an automobile manufacturer such as the Zenith Automobile Company. If you look closely at the flip charts, you’ll see that most of the second sheet and part of the third sheet deviated into too great a level of detail. When this happens, the facilitator should remind the group of the definition of a subject area.

 

 

下一步是检查这些项目并排除不是潜在主题域的项目。每个项目都要讨论,如果它不符合潜在主题域的定义,它被移出,可能被其他表达相同概念且符合主题域定义的项目代替。这个过程完毕后,清单上的主题域会减少,如图3.2 所示。一些转换活动如下:

The next step in the process is to examine the contributed items and exclude items that are not potential subject areas. Each item is discussed and, if it does not conform to the definition of a potential subject area, it is removed and possibly replaced by something that conveys the concept and could conform to the definition of a subject area. When this process is over, there will be fewer subject areas on the list, as shown in Figure 3.2. Some of the transformation actions that took place follow:

■■物品和产品用来指同一样事情,选用“汽车”术语,因为所有的物品和产品都是由汽车而来。而且,这些包含轿车、油漆、豪华轿车、部件、组件、发动机、二手车。

■■客户和消费者用于指同一样事情,选用“客户”术语。潜在客户也被吸收到这个领域。

■■变迁报表和销售分析报表用“报表”表示,并不再使用。

■■市场用“功能”表示,且不再使用,在讨论过程里,增加了广告和促销主题。

■■信用卡和贷款组合成付款方式。

■■职员和承包商组合成人力资源。

■■代理权和代理商认为是相同的,代理商选作主题域名称。

 

■■ ITEMS and PRODUCTS were determined to be the same thing and AUTOMOBILES was selected as the term to be used since all the products and items were driven by the automobiles. Further, these were found to encompass CARS, PAINT, LUXURY CAR, PARTS, PACKAGES, MOTORS, USED CARS.

■■ CUSTOMER and CONSUMER were determined to be the same thing and CUSTOMERS was selected as the term to be used. PROSPECTS was absorbed into this area.

■■ VARIANCE REPORT and SALES ANALYSIS REPORT were determined to be reports and eliminated.

■■ MARKETING was determined to be a function and was eliminated. During the discussion, ADVERTISEMENTS and PROMOTIONS were added.

■■ CREDIT CARD and LOAN were grouped into PAYMENT METHODS.

■■ EMPLOYEES and CONTRACTOR were combined into HUMAN RESOURCES.

■■ DEALERSHIPS and DEALERS were deemed to be the same, and DEALERS was chosen as the subject area.

这样整理之后,清单包含独立的数据组,但是其中有些比其他的更重要。下一步,要求开发组再次检查列表并把这些项目分组。例如,图3.2中列出了仓库、分发中心、工厂等,仓库和分发中心可以分组为一个潜在的主题域“设施”,工厂也作为一个主题域。这个过程完成后,最有可能的候选主题域已经确定了,如图3.3 所示。

The resultant list should consist solely of data groupings, but some may be more significant than others. Next, the group is asked to look at the list and try to group items together. For example, WAREHOUSES, DISTRIBUTION CENTERS, and FACTORIES are shown in Figure 3.2. WAREHOUSES and DISTRIBUTION CENTERS could be grouped into a potential subject area of FACILITIES, with FACTORIES also established as a subject area. When this process is over, the most likely candidates for the subject areas will have been identified, as shown in Figure 3.3.
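The consolidation decisions described above can be recorded as a simple lookup so that later sessions apply them consistently. The following is a minimal sketch, not part of the original method: the mapping reproduces a subset of the decisions listed for the Zenith example, while the names `CONSOLIDATION`, `ELIMINATED`, and `consolidate` are assumptions introduced for illustration.

```python
# Illustrative only: the mapping mirrors a subset of the consolidation
# decisions listed above; the structure and names are assumptions.
CONSOLIDATION = {
    "ITEMS": "AUTOMOBILES",
    "PRODUCTS": "AUTOMOBILES",
    "CARS": "AUTOMOBILES",
    "CUSTOMER": "CUSTOMERS",
    "CONSUMER": "CUSTOMERS",
    "PROSPECTS": "CUSTOMERS",
    "CREDIT CARD": "PAYMENT METHODS",
    "LOAN": "PAYMENT METHODS",
    "EMPLOYEES": "HUMAN RESOURCES",
    "CONTRACTOR": "HUMAN RESOURCES",
    "DEALERSHIPS": "DEALERS",
    "DEALERS": "DEALERS",
    "WAREHOUSES": "FACILITIES",
    "DISTRIBUTION CENTERS": "FACILITIES",
    "FACTORIES": "FACTORIES",
}

# Items the group judged to be reports or functions, not data groupings.
ELIMINATED = {"VARIANCE REPORT", "SALES ANALYSIS REPORT", "MARKETING"}


def consolidate(brainstormed):
    """Return (candidate subject areas in first-seen order, unresolved items)."""
    candidates, unresolved = [], []
    for item in brainstormed:
        term = item.strip().upper()
        if term in ELIMINATED:
            continue  # dropped during the review step
        target = CONSOLIDATION.get(term)
        if target is None:
            unresolved.append(term)  # still needs group discussion
        elif target not in candidates:
            candidates.append(target)
    return candidates, unresolved


if __name__ == "__main__":
    flip_chart = ["Items", "Customer", "Marketing", "Warehouses", "Consumer"]
    print(consolidate(flip_chart))
```

Anything the lookup does not cover is returned as unresolved rather than guessed, which matches the point that these decisions belong to the group, not to the modeler.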

这样第一次小型会议实际上完成了。在准备第二次会议的过程中,每一个主题域要分给两个人,每个人写出主题域的定义草稿,且每个主题域至少要包含三个实体。(有些人可能不只负责一个主题域)。这个工作应该在会议之后很快完成并且提交给指导者。建议小组成员在两次会议之间,指导者使用这些信息及主题域模型模版信息(如果有)来作为下次会议的开始。

This virtually completes the first facilitated session. In preparation for the next session, each subject area should be assigned to two people. Each of these people should draft a definition for the subject area and should identify at least three entities that would be included within it. (Some people may be responsible for more than one subject area.) The work should be completed shortly following the meeting and submitted to the facilitator. The group should be advised that on the intervening day, the facilitator uses this information and information from subject area model templates (if available) to provide a starting point for the second session.

 

巩固并准备第二次小型会议

在两次会议之中(至少有一天时间),指导者回顾定义与事例实体,并使用这些创建一个主题域的定义列表,这将用于第二次会议。指导者应当创建一个文档,显示已提供的文献及建议。例如,主题域客户,应该有以下文献:

文献1:“客户是那些购买或者准备购买产品的人。”示例实体有客户,批发商、潜在购买者。

文献2:“客户是那些获取我们产品用于内部消费的组织。” 示例实体有客户、客户子公司、购买代理。

主题域模版(前面表3.3所示)信息提供了一个客户的定义“获得/或者使用公司产品的个人与组织,”并且提供客户、潜在客户、消费者作为示例实体。使用这些信息,指导者应该包含“客户”信息如表3.5所示。每一个主题域都要提供类似的信息。

 

Consolidation and Preparation for Second Facilitated Session

During the period (potentially as little as one day) between the two facilitated sessions, the facilitator reviews the definitions and sample entities and uses these to create the defined list of subject areas that will be used in the second facilitated session. The facilitator should create a document that shows the contributions provided, along with a recommendation. For example, for the subject area of Customers, the following contributions could have been made:

Contribution 1. “Customers are people who buy or are considering buying our items.” Sample entities are Customer, Wholesaler, and Prospect.

Contribution 2. “Customers are organizations that acquire our items for their internal consumption.” Sample entities are Customer, Customer Subsidiary, and Purchasing Agent.

 

The subject area template information (previously shown in Table 3.3) provides a definition of Customers as “People and organizations who acquire and/or use the company’s products,” and provides Customer, Prospect, and Consumer as sample entities. Using this information, the facilitator could include the information for CUSTOMERS shown in Table 3.5. Similar information would be provided for each of the subject areas.
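One way to picture the facilitator's working document is as a small record per subject area that holds each contribution next to the template text. The sketch below uses the Customers contributions quoted above; the class names, field names, and the idea of counting how often each sample entity is proposed are assumptions made for illustration, not the layout of Table 3.5.

```python
from dataclasses import dataclass, field


@dataclass
class Contribution:
    source: str            # who supplied it (hypothetical labels)
    definition: str
    sample_entities: list


@dataclass
class SubjectAreaWorksheet:
    name: str
    contributions: list = field(default_factory=list)
    recommended_definition: str = ""

    def entity_counts(self):
        """Count how many contributions proposed each sample entity."""
        counts = {}
        for contribution in self.contributions:
            for entity in contribution.sample_entities:
                counts[entity] = counts.get(entity, 0) + 1
        return counts


customers = SubjectAreaWorksheet(
    name="Customers",
    contributions=[
        Contribution(
            "Contribution 1",
            "Customers are people who buy or are considering buying our items.",
            ["Customer", "Wholesaler", "Prospect"],
        ),
        Contribution(
            "Contribution 2",
            "Customers are organizations that acquire our items for their "
            "internal consumption.",
            ["Customer", "Customer Subsidiary", "Purchasing Agent"],
        ),
        Contribution(
            "Template",
            "People and organizations who acquire and/or use the company's products.",
            ["Customer", "Prospect", "Consumer"],
        ),
    ],
)

if __name__ == "__main__":
    print(customers.entity_counts())
```

Entities proposed by several contributors are natural discussion starters for the second session; the recommended definition is left blank here because choosing it is the facilitator's judgment.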

 

第二次会议

第二次会议的议程包括以下几项:

回顾:回顾第一次会议的成果及以后所作的工作。

提炼:回顾和提炼主题域及他们的定义。

关系。创建每对主题域之间的主要关系。

结论:复习模型,讨论没有解决的问题,定义下一步的工作。

Second Facilitated Session

The agenda for the second session should include the following items:

Review. The results of the first session and the work performed since then are reviewed.

Refinement. The subject areas and their definitions are reviewed and refined.

Relationships. Major relationships between pairs of subject areas are created.

Conclusion. The model is reviewed, unresolved issues are discussed, and follow-up actions are defined.

第二次会议的成功高度依赖于每个参与者是否及时完成分派的任务及指导者编辑的文档。在每个主题域的讨论时间,会指出他们的限制。如果主题域没有在分配的结束时间完成,完成剩下工作的责任会分给小组其他成员。常常,剩下的工作会包含提炼定义的用词(而不是意义)。

在所有的主题域讨论完成之后,定义主题域之间的主要关系,并绘制主题域草图。这个步骤是最不严格的,因为主题域的关系可以自然地从业务数据模型得出。第二次会议一个严格的最后步骤是开发问题列表及下一步的计划。

The success of the second session is highly dependent on each of the participants completing his or her assignment on time and on the facilitator compiling a document that reflects the input received and best practices. A limit should be placed on the discussion time for each subject area. If the subject area is not resolved by the end of the allotted time, the responsibility to complete the remaining work should be assigned to a member of the team. Often, the remaining work will consist of refining the wording (but not the meaning) of the definition.

After all of the subject areas have been discussed, the major relationships among the subject areas are identified and the resultant subject area diagram is drawn. This step is the least critical one in the process because the subject area relationships can be derived naturally from the business data model as it is developed. A critical final step of the second facilitated session is the development of the issues list and action plan.

下一步工作

问题列表和工作计划是第二次会议非常重要的产品,因为它提供了保证下一步工作完成的方法。问题列表包含会议中提出的需要解决的问题。每一项都应包含负责人的名称和期限。工作计划大概列出开发主题域模型剩下的工作步骤。常常,会议的产品能很快应用于支持业务数据模型的开发,在提炼工作完成以后。

 

Follow-on Work

The issues list and action plan are important products of the second facilitated session, since they provide a means of ensuring that the follow-on work is completed. The issues list contains questions that were raised during the session that need to be resolved. Each item should include the name of the person responsible and the due date. The action plan summarizes the remaining steps for the subject area model. Often, the product of the session can be applied immediately to support development of the business data model, with refinements being completed over time based on their priority.

2.2.4.    主题域模型的好处

不管主题域模型可以开发得多快,也只有在能得到好处的情况下才值得付出努力。在第二章已经列出了三个主要的好处:

主题域模型指导业务模型的开发。

它影响数据仓库项目选择。

它指导数据仓库开发项目。

主题域模型是一个工具,它帮助建模员组织工作,并帮助从事数据仓库项目的多个小组识别相互重叠的领域。侧栏显示主题域模型如何用于辅助数据仓库项目的定义和选择。

Subject Area Model Benefits

Regardless of how quickly the subject area model can be developed, the effort should be undertaken only if there are benefits to be gained. Three major benefits were cited in Chapter 2:

■■ The subject area model guides the business data model development.

■■ It influences data warehouse project selection.

■■ It guides data warehouse development projects.

The subject area model is a tool that helps the modeler organize his or her work and helps multiple teams working on data warehouse projects recognize areas of overlap. The sidebar shows how the subject area model can be used to assist in data warehouse project definition and selection.

 

2.3. Zenith汽车公司的主题域模型

Zenith 汽车公司潜在的主题域模型如图3.5。只显示出了需要回答业务问题和客户的主题域。

数据仓库项目定义与选择

图3.4显示需要回答Zenith汽车公司业务问题的主要主题域。

Subject Area Model for Zenith Automobile Company

A potential subject area model for the Zenith Automobile Company is provided in Figure 3.5. Only the subject areas needed to answer the business questions and Customers are shown.

Data Warehouse Project Definition and Selection

Figure 3.4 shows the primary subject areas that are needed to answer the business questions for the Zenith Automobile Company.

使用图3.4的信息,一个逻辑的实现顺序是首先开发汽车、代理商、销售组织主题域,因为实际上所有的问题都依赖于他们。工厂或激励程序可以在下一步开发,紧接着开发其中剩下的另一个。对于提出的业务问题,不需要关于客户和供应商的任何信息。即使业务认为问题3或问题7是最重要的,他们也不应该第一步解决。得出这个结论的理由是,为了回答这些问题,你仍然需要其他三个主题域的信息。

这是一个迭代开发方法的示例,这样数据仓库增量创建,一直盯着最终的目标。

Using the information in Figure 3.4, a logical implementation sequence would be to develop the Automobiles, Dealers, and Sales Organizations subject areas first since virtually all the questions are dependent on them. Factories or Incentive Programs could be developed next, followed by the remaining one of those two. For the business questions posed, no information about Customers and Suppliers is needed. Even if the business considered question 3 or 7 to be the most significant, they should not be addressed first. The reason for this conclusion is that in order to answer those questions, you still need information for the other three subject areas.

This is an example of the iterative development approach whereby the data warehouse is built in increments, with an eye toward the final deliverable.
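The sequencing argument can be checked mechanically from a question-to-subject-area matrix. Figure 3.4 is not reproduced here, so the matrix below is hypothetical: it assumes seven questions, that all of them need the three core subject areas, and that questions 3 and 7 additionally need Factories and Incentive Programs respectively. Only the overall pattern comes from the text; the specific assignments and the function names are assumptions.

```python
from collections import Counter

CORE = {"Automobiles", "Dealers", "Sales Organizations"}

# Hypothetical stand-in for Figure 3.4: question -> subject areas it needs.
NEEDS = {
    "Q1": CORE,
    "Q2": CORE,
    "Q3": CORE | {"Factories"},           # assumed assignment
    "Q4": CORE,
    "Q5": CORE,
    "Q6": CORE,
    "Q7": CORE | {"Incentive Programs"},  # assumed assignment
}


def implementation_order(needs):
    """Order subject areas by how many questions depend on them."""
    counts = Counter(area for areas in needs.values() for area in areas)
    return sorted(counts, key=lambda area: (-counts[area], area))


def answerable(needs, built):
    """Questions whose required subject areas have all been built."""
    return sorted(q for q, areas in needs.items() if areas <= set(built))


if __name__ == "__main__":
    print(implementation_order(NEEDS))
    print(answerable(NEEDS, CORE))
    print(answerable(NEEDS, {"Factories"}))
```

Under these assumptions, building only Factories answers nothing, because question 3 still depends on the three core subject areas. That is the reason given above for not starting with questions 3 or 7 even if the business ranks them highest.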

每一个主题域定义如下:

■■汽车是一种交通装置,相关的部件由Zenith汽车公司制造并由代理商销售。

■■客户是从代理商处获得汽车及其部件的伙伴。

■■代理商是授权出售Zenith汽车公司制造的汽车和部件的经销人。

■■工厂是Zenith汽车公司制造汽车及其部件的设施。

■■激励程序是鼓励汽车销售的一些考量。

■■销售组织是代理商按照利润的信息的分组。

图3.6提供了一个零售公司的潜在主题域模型。这个模型作为第5章到第8章的学习案例参考。

 

Definitions for each subject area follow:

■■ Automobiles are the vehicles and associated parts manufactured by Zenith Automobile Company and sold through its dealers.

■■ Customers are the parties that acquire automobiles and associated parts from Dealers.

■■ Dealers are agencies authorized to sell Zenith Automobile Company automobiles and associated parts.

■■ Factories are the facilities in which Zenith Automobile Company manufactures its automobiles and parts.

■■ Incentive Programs are financial considerations designed to foster the sale of automobiles.

■■ Sales Organizations are the groupings of Dealers for which information is of interest.

Figure 3.6 provides a potential subject area model for a retail company. This model is provided as a reference point for some of the case studies used in Chapters 5–8.

 

每一个主题域的定义示例如下:

■■沟通指通讯及用于通讯的媒体。

■■客户是取得或使用公司商品的人或组织。

■■设备是可移动的机械、装置、工具及其集成组件。

■■人力资源为公司完成工作的个人。

■■财务是关于公司收取、保有、期望或跟踪金钱等的信息。

■■内部组织是人力资源所属的正式或非正式的分组。

■■物品是公司或其竞争对手提供的商品或服务。

■■位置是地理点或区域。

■■其他设施是实体资产或其他构造,及其集成组件,商店除外。

■■销售是把物品所有权或控制权从公司交给客户的过程。

■■商店是销售发生的地方,包括小货摊。

■■卖主是制造或提供物品给公司的法人实体。

 

Sample definitions for each of the subject areas follow.

■■ Communications are messages and the media used to transmit the messages.

■■ Customers are people and organizations who acquire and/or use the company’s items.

■■ Equipment is movable machinery, devices, and tools and their integrated components.

■■ Human Resources are individuals who perform work for the company.

■■ Financials is information about money that is received, retained, expended, or tracked by the company.

■■ Internal Organizations are formal and informal groups to which Human Resources belong.

■■ Items are goods and services that the company or its competitors provide or make available to Customers.

■■ Locations are geographic points and areas.

■■ Other Facilities are real estate and other structures and their integrated components, except stores.

■■ Sales are transactions that shift the ownership or control of an item from the Company to a Customer.

■■ Stores are places, including kiosks, at which Sales take place.

■■ Vendors are legal entities that manufacture or provide the company with items.

3.   业务数据模型

我们在第2章已经说明,模型是事物的抽象和表示,表示或者演示原事物的部分或全部。业务模型是一种模型,它是数据在一个指定业务环境的抽象和表示,它帮助人们直观的了解业务信息之间的关系(“各部分如何组合起来”) 。应用业务数据模型的产品包括应用系统,数据仓库,数据集市。而且,模型提供数据库的元数据(即关于数据的信息),帮助人们理解和使用最终的数据。主题域模型提供业务数据模型的基础,而且这个模型减少应用系统正确反映业务环境的开发风险。

 

Business Data Model

As we explained in Chapter 2, a model is an abstraction or representation of a subject that looks or behaves like all or part of the original. The business data model is one type of model, and it is an abstraction or representation of the data in a given business environment. It helps people envision how the information in the business relates to other information in the business (“how the parts fit together”). Products that apply the business data model include application systems, the data warehouse, and data mart databases. In addition, the model provides the meta data (or information about the data) for these databases to help people understand how to use or apply the final product. The subject area model provides the foundation for the business data model, and that model reduces the development risk by ensuring that the application system correctly reflects the business environment.

 

3.1. 业务数据开发过程

如果象本节描述的,业务数据模型还不存在,那么在着手数据仓库数据模型之前应该先进行这部分开发。开发业务数据模型的过程不能缺少第一步:定义参与者。在理想情况下,数据管理者与建模员联合开发业务数据模型。大多数公司没有正式的数据管理人员,而且业务中心( 有时是信息技术中心)可能也看不到开发业务数据模型的价值。总而言之,它耽误了写代码!业务数据模型的好处已经在第二章罗列,但是在缺乏正式的数据管理人员的情况下,数据建模员需要指定关键业务代表,他们必须具有必要的知识和权利去做出有关数据定义和关系的决定。这些常常叫做“主题专家”,简称SME。一旦这些被确定,建模员需要得到他们的委托开始建模活动。这不是小事,常常需要做出折衷。例如,SME可能更愿意回答问题及评审进度,但是不愿意参与建模进程。在建模员理解了这些参与层次后,他/她应该评估减少SME参与完成模型的风险。

 

 

Business Data Development Process

If a business data model does not exist, as is assumed in this section, then a portion of it should be developed prior to embarking on the data warehouse data model development. The process for developing the business data model cannot be described without first defining the participants. In the ideal world, the data stewards and the data modelers develop the business data model jointly.

Most companies do not have formal data stewardship programs, and the business community (and sometimes the information technology community) may not see any value in developing the business data model. After all, it delays producing the code! The benefits of the business data model were presented in Chapter 2, but in the absence of formal data stewards, the data modeler needs to identify the key business representatives with the necessary knowledge and the authority to make decisions concerning the data definitions and relationships. These are often called “subject matter experts” or SMEs (pronounced “smeeze”). Once these people are identified, the modeler needs to obtain their commitment to participate in the modeling activities. This is no small chore, and often compromises need to be made. For example, the SMEs may be willing to answer questions and review progress, but may not be willing to participate in the modeling sessions. After the modeler understands the level of participation, he or she should evaluate the risk of the reduced SME involvement to the accuracy and completeness of the model.

 

然后,建模员应该调整他/她的精力与时间,如果在SME委托的任务与计划任务之间存在显著的分歧时。开发一个完整的业务数据模型可能需要6——12个月,在这段时间内,没有切实的业务产出。然而,这可能是一个理论上正确的方法,实际上很少这样做。我们建议使用以下方法:

1、  定义主题域,用于项目迭代需要的数据。

2、  定义主题域内感兴趣的实体,并建立标识符。

3、  指定实体之间的关系。

4、  增加属性。

5、  确认模型结构。

6、  确认模型内容。

本节剩余部分描述这6个活动。

Then, the modeler should adjust his or her effort estimate and schedule if there is a significant difference between the SMEs’ committed level of involvement and the level that was assumed when the plan was created. Development of a complete business data model can take 6 to 12 months, with no tangible business deliverable being provided during that timeframe. While this may be the theoretically correct approach, it is rarely a practical one. We recommend using the following approach:

1. Identify the subject area(s) from which data is needed for the project iteration.

2. Identify the entities of interest within the affected subject area(s) and establish the identifiers.

3. Determine the relationships between pairs of entities.

4. Add attributes.

5. Confirm the model’s structure.

6. Confirm the model’s content.

The remainder of this section describes these six activities.

 

3.1.1.    定义有关的主题域

在这个案例中,主题域要回答的问题如图3.5所示,他们是:汽车、代理商、工厂、激励机制、销售组织。在这个主题域模型中,还有其他的主题域,但是这些在首次迭代数据仓库时不需要。主题域模型的第一个应用给我们提供一个快速限定工作范围的方法。如果我们仅仅着眼于里面的几个问题,我们能进一步减少范围。例如,假设我们第一次迭代不回答问题3和问题7,那我们就不需要工厂和激励机制的信息,也不需要任何关于客户的信息。能够排除这些主题域是非常重要的。例如,客户数据,事实上是最难取得的数据之一。如果在开始时业务只关注于销售统计,那么关于具体客户的数据能够排除在第一次迭代的范围之外。这样,避免了对客户进行普遍的定义及解决多个客户文件的集成问题(在汽车业,关于客户的信息需要从代理商处获取)。请记住,排除一个主题域,并不是降低它的重要性,仅仅是降低定义业务规则的紧迫性,让数据仓库的后续提交物能够早点实现,如图3.7 。类似地,在开发主题域的细节时,应该把焦点放在本次迭代需要用到的实体上。图3.7指出了使用主题域模型限定范围的好处。首先,这个项目可以分为多次迭代,每一次都比整个项目短。第二,迭代是可以重叠的(如果资源允许),这样进一步缩短了整个项目的时间。例如,第一次迭代的分析和建模一旦完成,就可以进行第二次迭代的分析建模,同时进行首次迭代的开发。随着后续迭代的进行,可能需要一些返工,但是这常常可以通过合理的计划避免。能更快地提供业务提交品,往往值得这个风险。

 

 

Identify Relevant Subject Areas

The subject areas with information needed to answer the questions posed in the scenario are shown in Figure 3.5. These are: Automobiles, Dealers, Factories, Incentive Programs, and Sales Organizations.

There are other subject areas in the subject area model, but these do not appear to be needed for the first few iterations of the data warehouse. This first application of the subject area model provides us with a quick way of limiting the scope of our work. We could further reduce our scope if we want to address only a few of the questions. For example, let’s assume that the first iteration doesn’t answer questions 3 and 7. To answer the remaining questions, we don’t need any information from Factories and Incentive Programs, nor do we need information about Customers for any of the questions.

Being able to exclude these subject areas is extremely important. Customer data, for example, is one of the most difficult to obtain accurately. If the business is initially interested in sales statistics, then information about the specific customers can be excluded from the scope of the first iteration of the model. This avoids the need to gain a common definition of “customer” and to solve the integration issues that often exist with multiple customer files. (In the automotive industry, information about Customers requires cooperation from the Dealers.) It is important to remember that excluding a subject area has no bearing on its importance—it only has a bearing on the urgency of defining the business rules governing that area and hence the speed with which the next business deliverable of the data warehouse can be created, as shown in Figure 3.7. Similarly, in developing the details for the other subject areas, the focus should remain on the entities needed for the iteration being developed. Figure 3.7 points out several benefits of using the subject areas to limit scope.

First, the project can be subdivided into independent iterations, each of which is shorter than the full project. Second, the iterations can often overlap (if resources are available) to further shorten the elapsed time for completing the entire effort. For example, once the analysis and modeling are completed for the first iteration, these steps can begin for the second iteration, while the development for the first iteration proceeds. Some rework may be needed as additional iterations are pursued, but this can often be avoided through reasonable planning. The value of providing the business deliverables quicker is usually worth the risk.
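The effect of overlapping iterations on elapsed time can be illustrated with simple arithmetic. The sketch below assumes each iteration has an analysis-and-modeling phase followed by a development phase, that separate staff handle the two phases (the "if resources are available" condition), and that development for an iteration cannot start until both its own analysis and the previous iteration's development are finished. The durations and the function name are invented for illustration.

```python
def elapsed_weeks(iterations, overlap):
    """Total elapsed weeks for a list of (analysis, development) durations.

    With overlap=True, analysis for the next iteration starts as soon as the
    previous analysis ends; otherwise every phase runs strictly in sequence.
    """
    analysis_end = 0
    development_end = 0
    for analysis, development in iterations:
        if overlap:
            analysis_end += analysis
            start = max(analysis_end, development_end)
        else:
            analysis_end = development_end + analysis
            start = analysis_end
        development_end = start + development
    return development_end


if __name__ == "__main__":
    plan = [(4, 8), (4, 8), (4, 8)]  # hypothetical durations in weeks
    print(elapsed_weeks(plan, overlap=False))
    print(elapsed_weeks(plan, overlap=True))
```

For three iterations of 4 weeks of analysis and 8 weeks of development, the strictly sequential plan takes 36 weeks and the overlapped plan takes 28. The model ignores the rework risk mentioned above, so the real saving would be somewhat smaller.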

 

3.1.2.    定义主要的实体并建立标识符

实体是公司感兴趣的人、地点、事物、事件或概念,并且公司有能力也有意愿获取其信息。通过听用户描述业务、阅读某个领域的描述文档,或者与主题域专家面谈,常常可以发现实体。我们已得出结论,回答前三个问题需要来自汽车、代理商、销售组织这三个主题域的信息。让我们检查:销售。

潜在实体应当通过头脑风暴会议、面谈或分析获得。不应期望初始列表很完善。随着模型的开发,会有实体增加到列表中,而原来列入的一些项目可能会删除,对数据仓库的首次迭代尤其如此。每一个实体都需要定义,但是在某个实体上花太多时间之前,建模员应该快速判断这个实体是否在本次迭代范围内。这样筛选的理由是显然的:定义一个实体需要花费时间,如果存在争议,还可能需要大量讨论。等到需要某个实体时再去定义,不仅能更好地利用时间,SME们也更愿意投入定义工作,因为他们理解这项工作的重要性。

最终,模型会转换成物理数据库,数据库中的每个表都需要一个键来唯一标识每个实例。因此,我们应该给每一个要建模的实体指定一个标识符。既然这是业务模型,我们不需要考虑这个标识符的物理特性,因此可以简单地为每个实体创建一个主键属性,叫“[实体名]ID”或者“[实体名]编码”。ID与编码的不同在“实体与属性建模习俗”栏里说明。大多数建模工具在关系需要时会生成外键,而由于包含了标识符,我们的模型会包含层叠的外键。“实体与属性建模习俗”栏总结了我们命名和定义实体与属性时采用的习俗。表3.6是这个活动的产出物。

 

Identify Major Entities and Establish Identifiers

An entity is a person, place, thing, event, or concept of interest to a company and for which the company has the capability and willingness to capture information. Entities can often be uncovered by listening to a user describe the business, by reviewing descriptive documents for an area, and by interviewing subject matter experts. We concluded that information from three subject areas—Automobiles, Dealers, and Sales Organizations—is needed to address the first three questions. Let’s examine Sales.

Potential entities should be developed through a brainstorming session, interviews, or analysis. The initial list should not be expected to be complete. As the model is developed, entities will be added to the list and some items initially inserted in the list may be eliminated, particularly for the first iteration of the data warehouse. Each of the entities needs to be defined, but before spending too much time on an entity, the modeler should quickly determine whether or not the entity is within the scope of the data warehouse iteration being pursued. The reason for this screening is obvious—defining an entity takes time and may involve a significant amount of discussion if there is any controversy. By waiting until an entity is needed, not only is time better spent, but the SMEs are also more inclined to work on the definition since they understand the importance of doing so.

Eventually, the model will be transformed into a physical database with each table in that database requiring a key to uniquely identify each instance. We therefore should designate an identifier for each entity that we will be modeling. Since this is a business model, we need not be concerned with the physical characteristics of the identifier; therefore, we can simply create a primary key attribute of “[Entity Name] Identifier” or “[Entity Name] Code” for each entity. The difference between Identifier and Code is described in the “Entity- and Attribute-Modeling Conventions” sidebar, which shows the entity-modeling conventions we’ve adopted. Most modeling tools generate foreign keys when the relationships dictate the need and, by including the identifier, our model will include the cascaded foreign keys. The “Entity- and Attribute-Modeling Conventions” sidebar summarizes the conventions we used to name and define entities and attributes. Table 3.6 presents the results of this activity for the entities of interest for the business questions that need to be answered.
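As a rough illustration of the identifier convention, the following Python sketch gives every entity a key attribute named “[Entity Name] Identifier” or “[Entity Name] Code”. The `Entity` class and its `uses_code` flag are hypothetical names introduced here, and treating Color as a coded entity is an assumption made only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A business-model entity whose first attribute is its identifier."""
    name: str
    uses_code: bool = False  # hypothetical flag: True -> "Code", False -> "Identifier"
    attributes: list = field(default_factory=list)

    def __post_init__(self):
        # Place the identifier first so that every entity carries a key attribute.
        self.attributes.insert(0, self.identifier_name)

    @property
    def identifier_name(self) -> str:
        suffix = "Code" if self.uses_code else "Identifier"
        return f"{self.name} {suffix}"


dealer = Entity("Dealer")
color = Entity("Color", uses_code=True)  # assumption: Color is identified by a code
print(dealer.attributes)  # ['Dealer Identifier']
print(color.attributes)   # ['Color Code']
```

Nothing physical (data type, length) is attached to the identifier here, which matches the point that a business model need not be concerned with those characteristics.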

 

实体与属性建模习俗

每个企业都应该建立给实体与属性命名、定义的规则。实体与属性表示面向业务的视图,命名习俗不仅局限于物理约束。一些要考虑的习俗如下。

实体命名习俗包括:

每一个实体应当有一个唯一的名字。

实体名应当首字母大写(介词和连词除外)。

实体名应当由面向业务的术语组成。

使用不缩写的全词。

在单词之间使用空格。

使用单数名词。

避免冠词,介词和连词。

名字的长度没有限制(一个好的名字应该是Bill to Customer,一个不好的名字是BTC,或者Bill-to-Cust)。

 

Entity- and Attribute-Modeling Conventions

The rules for naming and defining entities and attributes should be established within each enterprise. Entities and attributes represent business-oriented views, and the naming conventions are not limited by physical constraints. Some of the conventions to consider are as follows.

Entity naming conventions include:

Each entity should have a unique name.

The entity name should be in title case (that is, all words except for prepositions and conjunctions are capitalized).

Entity names should be composed of business-oriented terms:

Use full, unabbreviated words.

Use spaces between words.

Use singular nouns.

Avoid articles, prepositions, and conjunctions.

The length of the name is not limited. (A good entity name would be Bill to Customer; a poor one would be BTC or Bill-to-Cust.)

 

属性命名习俗包括:

属性名应包含一到多个主单词,零到多个修饰符,一个类型单词。

主单词描述项目,它常常与属性所在的实体同名。

修饰符(限定符)进一步描述项目。

类型词(如数量,名称)是项目类型的描述。

每个属性应有一个在实体内唯一的名字。如果同一个属性(除主单词以外相同,如过期日期、状态)用于多个实体,它应始终有同样的定义。

属性名应首字母大写。

每个属性名应由面向业务的术语组成。

使用不缩写的全词,名字的长度没有限制。

在单词之间使用空格。

使用单数名词。

避免冠词、介词和连词,如the、and等。

 

Attribute naming conventions include:

Attribute names should contain one or more prime words, zero or more modifiers, and one class word.

The prime word describes the item. It is often the same as the name of the entity within which the attribute belongs.

The modifier (qualifier) is a further description of the item.

The class word (for example, amount, name) is a description of the type of item.

Each attribute should have a unique name within an entity. If the same attribute, except for the prime word (for example, expiration date, status) is used in several entities, it should always have the same definition.

The attribute name should be in title case.

Each attribute name should be composed of business-oriented terms:

Use full, unabbreviated words. The length of the name is not limited.

Use spaces between words.

Use singular nouns.

Avoid articles, prepositions, and conjunctions such as “the” and “and.”
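Some of the entity and attribute naming conventions listed above can be checked mechanically. The sketch below is one possible checker, not part of the book's method: the function name `naming_problems` and the list of small words exempt from capitalization are assumptions. It tests only title case, spaces between words, and all-capital abbreviations; it cannot detect a truncated word such as “Cust”, nor judge whether a term is business-oriented.

```python
import re

# Small words exempt from capitalization (an assumed, non-exhaustive list).
SMALL_WORDS = {"a", "an", "the", "and", "or", "of", "to", "by", "for", "in", "on"}


def naming_problems(name: str) -> list:
    """Return the convention violations found in an entity or attribute name."""
    problems = []
    # Words must consist of letters and be separated by single spaces.
    if not re.fullmatch(r"[A-Za-z]+( [A-Za-z]+)*", name):
        problems.append("use letters only, with single spaces between words")
    for word in re.findall(r"[A-Za-z]+", name):
        if word.lower() in SMALL_WORDS:
            continue  # articles, prepositions, and conjunctions are not capitalized
        if len(word) > 1 and word.isupper():
            problems.append(f"'{word}' looks like an abbreviation")
        elif word != word.capitalize():
            problems.append(f"'{word}' is not in title case")
    return problems


print(naming_problems("Bill to Customer"))  # []
print(naming_problems("BTC"))               # ["'BTC' looks like an abbreviation"]
print(naming_problems("Bill-to-Cust"))      # one problem: the hyphens
```

The three sample names are the good and poor examples given in the sidebar.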

 

实体和属性定义习俗包括:

定义要使用一致的形式。

定义应自成一体。

定义应清晰、简洁。

定义不应循环,不应用一个词来定义它自己。

定义应面向业务。

定义应互斥。

定义应独立于物理系统约束。

 

Entity and attribute definition conventions include:

Definitions should use consistent formats.

Definitions should be self-sufficient.

Definitions should be clear and concise.

Definitions should not be recursive. A word should not be used to define itself.

Definitions should be business-oriented.

Definitions should be mutually exclusive.

Definitions should be independent of physical system constraints.

 

在业务模型里,我们可以提供一个属性用于描述(从而避免用一个引用实体把代码翻译成描述)。只有当我们迁移到数据仓库时才需要代码,在那里代码用于保证只使用有效的代码(域约束也可以实现这一点),或者用于节省存储空间。我们在建立数据仓库模型时才创建代码-描述实体。

 

In the business model, we can provide an attribute for the description (and avoid having a reference entity for translating the code into the description). The code is needed only when we migrate to the data warehouse, where it is used either to ensure that only valid codes are used (domain constraints can also accomplish this) or to reduce the storage requirements. We create code-description entities when we build the data warehouse model.
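To make the code-description idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names (`color`, `automobile`, `color_code`, and so on) and the sample codes are assumptions chosen for illustration; the book does not prescribe them. The reference table holds each code with its description, and a foreign key keeps invalid codes out of the referencing table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    -- Code-description (reference) entity, created in the warehouse model.
    CREATE TABLE color (
        color_code        TEXT PRIMARY KEY,
        color_description TEXT NOT NULL
    );
    -- The referencing table stores only the short code.
    CREATE TABLE automobile (
        automobile_identifier INTEGER PRIMARY KEY,
        color_code            TEXT NOT NULL REFERENCES color (color_code)
    );
    INSERT INTO color VALUES ('RD', 'Red'), ('BL', 'Blue');
    INSERT INTO automobile VALUES (1, 'RD');
""")

# An unknown code is rejected by the foreign key.
try:
    conn.execute("INSERT INTO automobile VALUES (2, 'XX')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The description is recovered by joining to the reference table.
row = conn.execute("""
    SELECT a.automobile_identifier, c.color_description
    FROM automobile AS a JOIN color AS c USING (color_code)
""").fetchone()
print(rejected, row)  # True (1, 'Red')
```

In the business model, by contrast, Automobile could simply carry a color description attribute with no reference entity at all.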

3.1.3.    定义关系

开发和维护所有数据模型时,建模工具必不可少。下面列出了市面上的一些常用工具。每一种工具都有各自的优缺点,但是每一种都至少提供基本的建模功能。工具之间的差异随每个版本而变化,因此本书不作讨论。

常用的数据建模工具包括:

■■ ERwin,Computer Associates 公司

■■ ER Studio,Embarcadero 公司

■■ Oracle Designer,Oracle 公司

■■ Silverrun,Magna Solutions 公司

■■ System Architect,Popkin 公司

■■ Visio,Microsoft 公司

■■ Warehouse Designer,Sybase 公司

 

Define Relationships

A modeling tool is essential for developing and maintaining all data models. Some of the common tools on the market follow. There are advantages and disadvantages to each of the tools, but every one of them performs at least the basic modeling functions. The differences among the tools change with each release and hence are not described in this book.

 

Common data modeling tools include

■■ ERwin by Computer Associates

■■ ER Studio by Embarcadero

■■ Oracle Designer by Oracle

■■ Silverrun by Magna Solutions

■■ System Architect by Popkin

■■ Visio by Microsoft

■■ Warehouse Designer by Sybase

 

关系以图形方式描述业务规则。下面列出业务数据模型需要反映的部分业务规则:

■■ 汽车按制造厂家、款式、系列、颜色分类。

■■ 汽车在工厂制造。

■■ 一个选项包包含一个或多个选项,每个选项都可以包含在多个选项包内。

■■ 汽车包含零个、一个或多个选项包。

■■ 汽车分配给代理商。

■■ 汽车由代理商销售。

这些规则可以通过和有关的主题域专家讨论获得。下一步是定义每对实体间的关系。图3.8显示了模型中支持这些问题所需的实体。

 

The relationships diagrammatically portray the business rules. Following is a partial set of business rules that need to be reflected in the business data model.

■■ An automobile is classified by make, model, series, and color.

■■ An automobile is manufactured in a factory.

■■ An option package contains one or more options, each of which may be included in several option packages.

■■ An automobile contains zero, one, or more option packages.

■■ An automobile is allocated to a dealer.

■■ An automobile is sold by a dealer.

These rules would be uncovered through discussions with appropriate subject matter experts. The next step in the process is to define the relationships between pairs of entities. Figure 3.8 shows the entities needed in the model to support these questions.
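The business rules above can be recorded as relationships between pairs of entities before they are drawn. The sketch below is one hypothetical way to do so in Python; the `Relationship` class is introduced here for illustration, and any cardinality not stated in the rules (for example, that a dealer may be allocated many automobiles, or that an option package may appear on many automobiles) is an assumption. It flags many-to-many relationships, which are commonly resolved with an associative entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One business rule between a pair of entities."""
    left: str
    verb: str
    right: str
    left_many: bool   # True if one `right` instance may relate to many `left` instances
    right_many: bool  # True if one `left` instance may relate to many `right` instances

    @property
    def is_many_to_many(self) -> bool:
        return self.left_many and self.right_many


rules = [
    Relationship("Factory", "manufactures", "Automobile", left_many=False, right_many=True),
    Relationship("Option Package", "contains", "Option", left_many=True, right_many=True),
    Relationship("Automobile", "contains", "Option Package", left_many=True, right_many=True),
    Relationship("Dealer", "is allocated", "Automobile", left_many=False, right_many=True),
    Relationship("Dealer", "sells", "Automobile", left_many=False, right_many=True),
]

# Candidate names for associative entities that resolve many-to-many relationships.
needs_associative_entity = [f"{r.left} {r.right}" for r in rules if r.is_many_to_many]
print(needs_associative_entity)
# ['Option Package Option', 'Automobile Option Package']
```

Each assumed cardinality is exactly the kind of detail that would be confirmed with subject matter experts.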

 

小贴士:

业务数据模型的另一个信息来源是现有系统的数据库。虽然它是一个来源,但建模员需要认识到,系统使用的物理数据库反映了技术约束和设计者所做的其他(往往没有文档的)假设,以及错误或过时的业务规则。因此,不应把它当作业务模型,但它当然可以作为模型的输入。使用现有系统数据库时发现的任何差异都应记录下来,在制定转换规则时会用到。

 

TIP

Another source of information for the business data model is the database of an existing system. While this is a source, the modeler needs to recognize that the physical database used by a system reflects technical constraints and other (frequently undocumented) assumptions made by the person who designed it as well as erroneous or outdated business rules. It should, therefore, not be considered to be the business model, but it certainly can be used as input to the model. Any differences discovered in using a database from an existing system should be documented. These will be used when the transformation rules are developed.

3.1.4.      增加属性

属性是与实体有关的一项事实或离散的信息片。图里已经包含了这样的一个属性,即标识符。此时要增加回答所关注的业务问题所需的其他属性。例如,涉及商店的问题需要商店店龄的信息。基于这个需求,应把商店成立日期增加为一个属性。

小贴士:

在业务模型里,随时间改变的信息应尽可能与日历日期关联。例如,与其存储店龄,不如记录商店开业日期或最近一次翻新日期。在数据仓库模型里,我们可以选择只存储出生日期,或者既存储出生日期,又存储年龄。如果我们要按年龄段做分析,可以选择在集市里存储年龄段。(如果这样做,就需要包含更新年龄段的逻辑,除非集市在每个装入周期重建。)

 

Add Attributes

An attribute is a fact or discrete piece of information pertaining to an entity. One such attribute has already been included in the diagram—the identifier. At this point, additional attributes needed to answer the business questions of interest are added. For example, the questions involving the Store requested information on the store’s age. Based on that requirement, the store inception date should be added as an attribute.

 

TIP

In the business model, information that changes with time should be tied to calendar dates whenever possible. For example, instead of store age, the date the store was opened or last renovated should be shown. In the data warehouse model, we have options on whether to store just the date of birth or both the date of birth and the age. If we’re doing analysis based on a range of ages, we may choose to store the age range in the mart. (If we choose this option, we will need to include logic for updating the age range unless the mart is rebuilt with each load cycle.)
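The tip can be illustrated with a short Python sketch: only the date is stored, and the age and the age range are derived for a given “as of” date. The function names and the range boundaries are assumptions made for this example.

```python
from datetime import date


def age_in_years(start: date, as_of: date) -> int:
    """Whole years elapsed between start and as_of."""
    years = as_of.year - start.year
    # Subtract one if the anniversary has not yet occurred in the as_of year.
    if (as_of.month, as_of.day) < (start.month, start.day):
        years -= 1
    return years


def age_range(years: int) -> str:
    """Bucket an age into a range label (boundaries are assumed)."""
    if years < 2:
        return "0-1"
    if years < 5:
        return "2-4"
    return "5+"


inception = date(2001, 6, 15)  # hypothetical store inception date
print(age_in_years(inception, date(2003, 6, 14)))             # 1
print(age_range(age_in_years(inception, date(2003, 6, 15))))  # 2-4
```

Because the range changes as time passes, a mart that stores it needs either this recalculation in its load logic or a full rebuild on each load cycle, as the tip notes.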

 

数据仓库数据模型的难点在于预见业务用户最终想要的属性。既然建立业务数据模型主要是为了支持数据仓库,这个问题在此时就会显现出来。造成这个困难的部分原因是业务用户确实不知道自己需要的全部内容,他们会在使用这个环境的过程中发现一部分需求。识别潜在数据元素时可以考虑的来源有现有的报表、查询和源系统数据库。这个问题在第4章有更深入的讨论,那是建立数据仓库数据模型的第一步的一部分。

“实体与属性建模习俗”栏总结了我们命名和定义属性时采用的习俗。图3.9显示了包含属性的扩展模型。与实体的情况一样,随着模型继续演进,应预期会有增加、删除和修改。

 

The difficulty with a data warehouse data model is anticipating the attributes that business users will eventually want. Since the business data model is being built primarily to support the warehouse, that problem manifests itself at this point. Part of the reason for the difficulty is that the business users truly do not know everything they need. They will discover some of their needs as they use the environment. Some sources to consider in identifying the potential elements are existing reports, queries, and source system databases. This area is discussed more thoroughly in Chapter 4 as part of the first step of creating the data warehouse data model.

The “Entity- and Attribute-Modeling Conventions” sidebar summarizes the conventions we used to name and define attributes. Figure 3.9 shows the expanded model, with the attributes included. As was the case with the entities, we should expect additions, deletions, and changes as the model continues to evolve.

 

3.1.5.    评审模型结构

业务数据模型应满足第三范式(在第2章有说明)。简言之,在第三范式里,每一个属性都依赖于所在实体的键,依赖于整个键,并且只依赖于键。

记住,业务模型不需要提供好的性能,它永远不会被实现。它是后续模型的基础,这些模型可用于数据仓库、操作型系统或数据集市。对于这种用途,第三范式提供最大程度的灵活性和稳定性,并保证最大程度的一致性。

 

Confirm Model Structure

The business data model should be presented in what is known as “third normal form.” The third normal form was described in Chapter 2. By way of summary, in the third normal form, each attribute is dependent on the key of the entity in which it appears, on the whole key, and on nothing but the key.

Remember that the business model does not need to provide good performance. It is never implemented. It is the basis of subsequent models that may be used for a data warehouse, an operational system, or data marts. For that usage, the third normal form provides the greatest degree of flexibility and stability and ensures the greatest degree of consistency.
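One way to look for third normal form violations is to test whether a non-key attribute determines another attribute. The sketch below checks a candidate dependency against sample rows; the function name `determines`, the attribute names, and the data are all hypothetical. Sample data can only disprove a dependency, so the business rule itself still has to be confirmed with the business.

```python
def determines(rows, determinant, dependent):
    """True if each value of `determinant` maps to exactly one value of `dependent`."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # setdefault stores the first value seen for a key and returns it afterwards.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical, denormalized Automobile rows that also carry the dealer's name.
automobiles = [
    {"Automobile Identifier": 1, "Dealer Identifier": 10, "Dealer Name": "North Motors"},
    {"Automobile Identifier": 2, "Dealer Identifier": 10, "Dealer Name": "North Motors"},
    {"Automobile Identifier": 3, "Dealer Identifier": 20, "Dealer Name": "Lakeside Autos"},
]

# Dealer Name depends on Dealer Identifier, which is not the key of Automobile.
print(determines(automobiles, "Dealer Identifier", "Dealer Name"))      # True
print(determines(automobiles, "Dealer Name", "Automobile Identifier"))  # False
```

If the first dependency reflects a real business rule, Dealer Name depends on something other than the key of Automobile, so in third normal form it would belong to the Dealer entity instead.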

 

小贴士

对业务数据模型的纯粹主义观点是:它是一个第三范式模型,只关注逻辑视图(不关注物理视图)。

好几个数据建模工具会存储物理特性信息,即每个实体对应的表和每个属性对应的列的物理特性。理论家只在模型应用于具体应用(如数据仓库)时才处理这些信息。

更实际的方法是在业务模型里包含一些与物理模型有关的信息。原因是,好几个应用会使用这个业务模型,而且都是从拷贝模型的有关部分开始。如果不止一个应用需要同一个实体,那么每一个应用都不得不为生成的表及其列确定数据类型等物理特性。这造成重复劳动,并带来潜在的不一致。更好的方法是在建模工具的业务模型里创建其中一部分信息。

使用域定义是有经验的建模员用来减少工作量并提供灵活性的另一种技术。建模工具的域特性可用于定义有效值、数据类型、可否为空等等。域的一种用法是为这些特性的每一种唯一组合建立一个域,这样就不必逐一定义每一列的物理特性,只需把列指派给某个域。除了减少工作量,这还提供了适应未来变化的灵活性。

TIP

A purist view of the business data model is that it is a third normal form model that is concerned only with the logical (and not physical) view. Several of the data modeling tools store information about the physical characteristics of the table for each entity and about the physical characteristics of the column for each attribute. The theoretician would address these only when the model is applied for an application such as the data warehouse.

A more practical approach is to include some information pertaining to the physical model in the business model. The reason for this is that several applications will use the business model, and they start by copying the relevant section of the model. If more than one application needs the same entity, then each is forced to establish the physical characteristics such as data type for the resultant table and its columns. This creates duplicate effort and introduces a potential for inconsistency. A better approach is to create some of this information within the business model in the modeling tool.

The use of domain definitions is another technique that experienced modelers use to minimize work and provide flexibility. The domain feature of the modeling tool can be used to define valid values, data types, nullability, and so on. One application of domains is to establish one for each unique combination of these, then instead of defining each of the physical characteristics of a column, it is merely assigned to a domain. In addition to reducing the workload, this provides the flexibility to accommodate future changes.
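The domain technique can be sketched in a few lines of Python. This is not how any particular modeling tool implements domains; the `Domain` class, the domain name “Yes No Indicator”, and the two column names are hypothetical. Columns are assigned to a domain by name, so redefining the domain once changes every column that uses it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Domain:
    """One unique combination of data type, nullability, and valid values."""
    data_type: str
    nullable: bool
    valid_values: Optional[frozenset] = None  # None means any value of the data type


domains = {
    "Yes No Indicator": Domain("CHAR(1)", nullable=False, valid_values=frozenset({"Y", "N"})),
}

# Columns are assigned to a domain instead of repeating its characteristics.
column_domains = {
    "Dealer Active Indicator": "Yes No Indicator",
    "Option Standard Indicator": "Yes No Indicator",
}


def accepts(column_name, value) -> bool:
    """Check a value against the domain assigned to the column."""
    domain = domains[column_domains[column_name]]
    if value is None:
        return domain.nullable
    return domain.valid_values is None or value in domain.valid_values


before = accepts("Dealer Active Indicator", "U")  # False: "U" is not yet valid

# A future change is made once, in the domain, and reaches both columns.
domains["Yes No Indicator"] = Domain("CHAR(1)", False, frozenset({"Y", "N", "U"}))
after = [accepts(name, "U") for name in column_domains]
print(before, after)  # False [True, True]
```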

 

3.1.6.    评审模型内容

 

开发业务数据模型的最后一步,也可能是最重要的一步,是验证模型内容。这通过与业务代表讨论来完成,使用的技术各不相同。在与业务用户会谈时,建模员必须记住,模型只是达到目的的手段,它是一种描述业务的技术,这种方式方便系统和数据仓库的开发。一些业务代表可能既愿意也有能力评审实际的模型;而对另一些业务代表,建模员可能需要用平常的语言提问,来验证业务规则和定义。例如,建模员可能需要进行一次面谈,逐一询问业务代表各个关系所表示的业务规则是否成立,以此确认这些关系。

Confirm Model Content

The last, and possibly most important, step in developing the business data model is to verify its content. This is accomplished through a discussion with business representatives. The techniques used vary. In meeting with the business users, the modeler must remember that the model is a means to an end. It is a technique for describing the business in a way that facilitates the development of systems and data warehouses. Some business representatives may be both willing and able to review the actual model. With others, the modeler may need to ask questions in plain English that verify the business rules and definitions. For example, the modeler may need to conduct an interview in which he or she confirms the relationships by asking the business representative if each of the business rules that the relationships represent is valid.
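For interviews of this kind, plain-English questions can be generated from the relationships in the model. The helper below is a hypothetical sketch: the function name and the question wording are assumptions, and the verb phrases are supplied by the modeler rather than derived automatically.

```python
def confirmation_questions(parent: str, child: str, active: str, passive: str) -> list:
    """Turn one relationship into yes/no questions for a business representative."""
    return [
        f"Can one {parent} {active} more than one {child}?",
        f"Can one {child} be {passive} more than one {parent}?",
        f"Can one {child} exist without being {passive} any {parent}?",
    ]


for question in confirmation_questions("Dealer", "Automobile", "sell", "sold by"):
    print(question)
# Can one Dealer sell more than one Automobile?
# Can one Automobile be sold by more than one Dealer?
# Can one Automobile exist without being sold by any Dealer?
```

The answers confirm or correct the cardinality and optionality of the relationship without requiring the representative to read the diagram.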

 

4.   小结

主题域模型是数据仓库基础的固有部分,因为数据仓库本身就是“面向主题的”。主题域模型提供了一个组织业务数据模型的好方法。主题域模型为企业确定15-25个重要的主要分组,各组之间互斥。使用引导式会议,主题域模型能在几天内创建出来。两次引导式会议中的第一次包括有关概念的培训、头脑风暴出潜在主题域清单,以及提炼清单。在第二次会议前先拟定初步定义;在第二次会议上,回顾第一次会议的结果和此后完成的工作,评审并提炼主题域及其定义,给模型增加主要的关系,并评审模型。还可以记录未解决的问题并确定后续行动。

 

Summary

The subject area model is inherent in the foundation of the data warehouse, since the warehouse itself is “subject oriented.” The subject area model provides a good way of organizing the business data model. The subject area model identifies the 15–25 major groupings of significance to the company, with each one mutually exclusive of the others. The subject area model can be created in a few days, using facilitated sessions. The first of two facilitated sessions includes education on the relevant concepts, brainstorming a list of potential subject areas, and refinement of the list. Preliminary definitions are developed prior to the second meeting, at which the results of the first session and the work performed since then are reviewed, the subject areas and their definitions are reviewed and refined, major relationships are added to the model, and the model is reviewed. Unresolved issues and follow-up actions may also be identified.

业务数据模型是后面所有事情的基础。显著的错误会产生连锁效应,所以验证模型的结构与内容非常重要。业务数据模型描述了对企业重要的信息,以及这些信息之间如何关联。它完全独立于任何组织、功能或技术方面的考虑。因此,它为任何应用系统(包括数据仓库)的数据库设计提供坚实的基础。一个完整的业务数据模型很复杂,很可能需要一年时间来完成。与其开发一个完整的业务数据模型,数据仓库建模员应只创建模型中支持当前业务问题所需的那些部分。

 

This business data model is the foundation of everything that follows. Significant errors can have a cascading effect, so it is very important to verify both the structure and the content of the model. The business data model describes the information of importance to an enterprise and how pieces of information are related to each other. It is completely independent of any organizational, functional, or technological considerations. It therefore provides a solid foundation for designing the database for any application system, including a data warehouse. A complete business data model is complex and can easily require a year to complete. Instead of developing a complete business data model, the data warehouse modeler should create only those portions of the model that are needed to support the business questions being asked. Within the scope of the business questions being asked, the business data model is developed by identifying the subject areas from which data is needed, identifying and defining the major entities, establishing the relationships between pairs of entities, adding attributes, conforming to the third normal form, and confirming the content of the model.

在所提业务问题的范围内,业务数据模型的开发过程是:确定需要其数据的主题域,识别并定义主要的实体,建立每对实体之间的关系,增加属性,使其满足第三范式,并且确认模型内容。

 

 
