Dimensional Modeling Digest

The data warehouse must make an organization's information easily accessible.

The data warehouse must present the organization's information consistently.

The data warehouse must be adaptive and resilient to change.

The data warehouse must be a secure bastion that protects our information assets.

The data warehouse must serve as the foundation for improved decision making.

The business community must accept the data warehouse if it is to be deemed successful.

 

The fact table itself generally has its own primary key made up of a subset of the foreign keys. This key is often called a composite or concatenated key. (P18)
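A minimal sketch of a composite key that uses only some of the foreign keys, using an in-memory SQLite database. The table, the column names, and the assumed daily grain (at most one promotion condition per product, store, and day) are hypothetical. The promotion key is a foreign key but is left out of the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (
        date_key       INTEGER NOT NULL,
        product_key    INTEGER NOT NULL,
        store_key      INTEGER NOT NULL,
        promotion_key  INTEGER NOT NULL,   -- foreign key, not part of the PK
        sales_quantity INTEGER,
        sales_dollars  REAL,
        PRIMARY KEY (date_key, product_key, store_key)  -- composite key
    )
""")
conn.execute("INSERT INTO sales_fact VALUES (20240101, 1, 10, 0, 3, 29.97)")

# A second row with the same date/product/store violates the composite key,
# even though its promotion key differs.
try:
    conn.execute("INSERT INTO sales_fact VALUES (20240101, 1, 10, 5, 1, 9.99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```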

 

You must avoid null keys in the fact table. A proper design includes a row in the corresponding dimension table to identify that the dimension is not applicable to the measurement. (P49)
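A minimal sketch of the "not applicable" dimension row, with hypothetical names. Key 0 is an arbitrary choice for the reserved "No Promotion" row. Because every fact row carries a real key, an ordinary inner join keeps all rows. With NULL keys, the same inner join would silently drop the unpromoted sales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE promotion_dim (
        promotion_key  INTEGER PRIMARY KEY,
        promotion_name TEXT NOT NULL
    );
    -- Row 0 is reserved to mean "the promotion dimension does not apply".
    INSERT INTO promotion_dim VALUES (0, 'No Promotion'), (1, 'Spring Sale');

    CREATE TABLE sales_fact (
        promotion_key INTEGER NOT NULL REFERENCES promotion_dim,
        sales_dollars REAL NOT NULL
    );
    -- Two unpromoted sales point at row 0 instead of storing NULL.
    INSERT INTO sales_fact VALUES (1, 100.0), (0, 40.0), (0, 60.0);
""")

# The "not applicable" case appears as a labeled group, not a lost row.
rows = conn.execute("""
    SELECT p.promotion_name, SUM(f.sales_dollars)
    FROM sales_fact AS f
    JOIN promotion_dim AS p ON p.promotion_key = f.promotion_key
    GROUP BY p.promotion_name
    ORDER BY p.promotion_name
""").fetchall()
print(rows)  # [('No Promotion', 100.0), ('Spring Sale', 100.0)]
```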

 

Promotion Coverage Factless Fact Table (P49)
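As we understand the idea, a promotion coverage table is a factless fact table. It holds only keys and records which products were on promotion in which stores on which days, whether or not they sold. Products that were promoted but did not sell are then found by set difference against the sales facts. The schema and data below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Factless: keys only, no numeric measurements.
    CREATE TABLE promotion_coverage_fact (
        date_key INTEGER, product_key INTEGER,
        store_key INTEGER, promotion_key INTEGER
    );
    CREATE TABLE sales_fact (
        date_key INTEGER, product_key INTEGER,
        store_key INTEGER, promotion_key INTEGER,
        sales_quantity INTEGER
    );
    INSERT INTO promotion_coverage_fact VALUES
        (20240301, 1, 10, 7), (20240301, 2, 10, 7), (20240301, 3, 10, 7);
    INSERT INTO sales_fact VALUES
        (20240301, 1, 10, 7, 5), (20240301, 3, 10, 7, 2);
""")

# Coverage rows minus the key combinations that actually had sales.
unsold = conn.execute("""
    SELECT date_key, product_key, store_key, promotion_key
    FROM promotion_coverage_fact
    EXCEPT
    SELECT date_key, product_key, store_key, promotion_key
    FROM sales_fact
""").fetchall()
print(unsold)  # [(20240301, 2, 10, 7)]
```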

 

Degenerate Transaction Number Dimension

 

Degenerate dimensions often play an integral role in the fact table’s primary key. Often, the primary key of a fact table is a subset of the table’s foreign keys. We typically do not need every foreign key in the fact table to guarantee the uniqueness of a fact table row.

 

Operational control numbers such as order numbers, invoice numbers, and bill-of-lading numbers usually give rise to empty dimensions and are represented as degenerate dimensions (that is, dimension keys without corresponding dimension tables) in fact tables where the grain of the table is the document itself or a line item in the document.
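A minimal sketch of a degenerate transaction number, with hypothetical names. The POS transaction number lives directly in the fact table with no dimension table behind it. Together with the product key, it forms the primary key, assuming each product appears at most once per ticket. It remains useful for grouping line items into tickets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE retail_sales_fact (
        date_key               INTEGER NOT NULL,
        product_key            INTEGER NOT NULL,
        store_key              INTEGER NOT NULL,
        pos_transaction_number INTEGER NOT NULL,  -- degenerate dimension
        sales_quantity         INTEGER,
        sales_dollars          REAL,
        PRIMARY KEY (pos_transaction_number, product_key)
    );
    INSERT INTO retail_sales_fact VALUES
        (20240301, 1, 10, 5001, 2, 4.00),
        (20240301, 2, 10, 5001, 1, 3.50),
        (20240301, 1, 10, 5002, 1, 2.00);
""")

# Group line items by ticket without joining to any transaction table.
baskets = conn.execute("""
    SELECT pos_transaction_number, COUNT(*), SUM(sales_dollars)
    FROM retail_sales_fact
    GROUP BY pos_transaction_number
    ORDER BY pos_transaction_number
""").fetchall()
print(baskets)  # [(5001, 2, 7.5), (5002, 1, 2.0)]
```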

 

If, for some reason, one or more attributes are legitimately left over after all the other dimensions have been created and seem to belong to this header entity, we would simply create a normal dimension record with a normal join. However, we would no longer have a degenerate dimension. (P50)

 

When a completely new data source is added, involving existing dimensions as well as unexpected new dimensions, we should avoid force-fitting the new measurements into an existing fact table of consistent measurements. (P54)

 

Surrogate Keys

We strongly encourage the use of surrogate keys in dimensional models rather than relying on operational production codes. The surrogate keys merely serve to join the dimension tables to the fact table. If the fifth through ninth characters in the operational code identify the manufacturer, then the manufacturer’s name should be included as a dimension table attribute. Every join between dimension and fact tables in the data warehouse should be based on meaningless integer surrogate keys. You should avoid using the natural operational production codes. None of the data warehouse keys should be smart, where you can tell something about the row just by looking at the key. (P58/59)
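A minimal sketch of surrogate key assignment. `ProductKeyMap` and `MANUFACTURER_NAMES` are hypothetical, and the code layout follows the text's example (manufacturer in characters 5 through 9). Each new operational code gets the next meaningless integer. The "smart" part of the code is decoded once into explicit attributes. This simplified version mints one key per code. A real slowly changing dimension would mint additional keys for the same code when tracked attributes change.

```python
from dataclasses import dataclass, field

# Hypothetical lookup from manufacturer code to manufacturer name.
MANUFACTURER_NAMES = {"ACME1": "Acme Foods", "BOLT7": "Bolt Beverages"}


@dataclass
class ProductKeyMap:
    """Assigns meaningless sequential surrogate keys to operational product codes."""
    next_key: int = 1
    keys: dict = field(default_factory=dict)  # natural code -> surrogate key
    rows: dict = field(default_factory=dict)  # surrogate key -> dimension row

    def key_for(self, product_code: str) -> int:
        if product_code not in self.keys:
            surrogate = self.next_key
            self.next_key += 1
            self.keys[product_code] = surrogate
            # Characters 5-9 (1-based) are Python slice [4:9].
            mfr_code = product_code[4:9]
            self.rows[surrogate] = {
                "product_code": product_code,  # natural key kept as a plain attribute
                "manufacturer_code": mfr_code,
                "manufacturer_name": MANUFACTURER_NAMES.get(mfr_code, "Unknown"),
            }
        return self.keys[product_code]


key_map = ProductKeyMap()
k1 = key_map.key_for("SKU-ACME1-0042")
k2 = key_map.key_for("SKU-BOLT7-0001")
print(k1, k2, key_map.key_for("SKU-ACME1-0042"))  # 1 2 1
print(key_map.rows[1]["manufacturer_name"])        # Acme Foods
```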

 

Market Basket Analysis / Affinity Grouping (P62)
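A basic pair-count sketch of affinity grouping. It is not the book's full approach, which deals with the combinatorial cost at scale. The sketch counts how many baskets contain each pair of distinct products, using the transaction number to define a basket. The line items are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical (transaction_number, product) line items.
line_items = [
    (5001, "bread"), (5001, "butter"), (5001, "milk"),
    (5002, "bread"), (5002, "butter"),
    (5003, "milk"), (5003, "cereal"),
]

# Group products into baskets keyed by transaction number.
baskets = {}
for txn, product in line_items:
    baskets.setdefault(txn, set()).add(product)

# Count baskets containing each unordered pair (sorted so pairs are canonical).
pair_counts = Counter()
for products in baskets.values():
    for pair in combinations(sorted(products), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(1))  # [(('bread', 'butter'), 2)]
```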

 

Isolated, independent data marts are worse than simply a lost opportunity for analysis. (P81)

 

Unpredictable Changes with Single-Version Overlay (P103)
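As we understand this hybrid technique, a change is handled in two ways at once. A type 2 response adds a new row that preserves the as-was value. A type 1 overwrite then updates a single "current" column on every row for that natural key. Facts can therefore be reported either by the historical value or by the current one. The sales rep and district names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SalesRepRow:
    rep_key: int              # surrogate key; a new one per type 2 version
    rep_id: str               # natural (operational) key
    historical_district: str  # as-was value, frozen in each version
    current_district: str     # as-is value, overwritten on every version
    is_current: bool


def reassign(dim: list, rep_id: str, new_district: str) -> None:
    """Type 2 change plus a type 1 overlay of current_district."""
    next_key = max(row.rep_key for row in dim) + 1
    for row in dim:
        if row.rep_id == rep_id:
            # Overlay: every existing version now reports the latest district.
            row.current_district = new_district
            row.is_current = False
    # New row: older versions keep their historical_district unchanged.
    dim.append(SalesRepRow(next_key, rep_id, new_district, new_district, True))


dim = [SalesRepRow(1, "R42", "East", "East", True)]
reassign(dim, "R42", "West")
print([(r.rep_key, r.historical_district, r.current_district) for r in dim])
# [(1, 'East', 'West'), (2, 'West', 'West')]
```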

 

Role-playing in a data warehouse occurs when a single dimension simultaneously appears several times in the same fact table. The underlying dimension may exist as a single physical table, but each of the roles should be presented to the data access tools in a separately labeled view. (P111)
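A minimal sketch of role-playing views over one physical date dimension. All table, view, and column names are hypothetical. Each view renames the columns so the order date and ship date roles are unambiguous in the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim (
        date_key   INTEGER PRIMARY KEY,
        full_date  TEXT,
        month_name TEXT
    );
    INSERT INTO date_dim VALUES
        (20240301, '2024-03-01', 'March'),
        (20240405, '2024-04-05', 'April');

    -- One separately labeled view per role, same physical table underneath.
    CREATE VIEW order_date AS
        SELECT date_key AS order_date_key, full_date AS order_full_date,
               month_name AS order_month
        FROM date_dim;
    CREATE VIEW ship_date AS
        SELECT date_key AS ship_date_key, full_date AS ship_full_date,
               month_name AS ship_month
        FROM date_dim;

    CREATE TABLE order_fact (
        order_date_key INTEGER, ship_date_key INTEGER, order_dollars REAL
    );
    INSERT INTO order_fact VALUES (20240301, 20240405, 250.0);
""")

row = conn.execute("""
    SELECT o.order_month, s.ship_month, f.order_dollars
    FROM order_fact AS f
    JOIN order_date AS o ON o.order_date_key = f.order_date_key
    JOIN ship_date  AS s ON s.ship_date_key  = f.ship_date_key
""").fetchone()
print(row)  # ('March', 'April', 250.0)
```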

 

A junk dimension is a convenient grouping of typically low-cardinality flags and indicators. By creating an abstract dimension, we remove the flags from the fact table while placing them into a useful dimensional framework. (P118)
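A minimal sketch of a junk dimension built as every combination of a few hypothetical flags. Each combination gets one surrogate key. The fact row then stores a single key instead of three flag columns. Pre-building the full Cartesian product works when it is small. An alternative is to create rows only for the combinations that actually occur in the source data.

```python
from itertools import product

# Hypothetical low-cardinality flags found on order transactions.
PAYMENT_TYPES = ["Cash", "Credit"]
ORDER_CHANNELS = ["Web", "Phone", "Store"]
GIFT_WRAP = ["Y", "N"]

# Every combination gets a meaningless surrogate key: 2 * 3 * 2 = 12 rows.
junk_dim = {
    combo: key
    for key, combo in enumerate(product(PAYMENT_TYPES, ORDER_CHANNELS, GIFT_WRAP), start=1)
}


def junk_key(payment: str, channel: str, gift_wrap: str) -> int:
    """Replace three flag columns on a fact row with one junk dimension key."""
    return junk_dim[(payment, channel, gift_wrap)]


print(len(junk_dim))                     # 12
print(junk_key("Cash", "Web", "Y"))      # 1
print(junk_key("Credit", "Store", "N"))  # 12
```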

 

We shouldn’t mix fact granularities (for example, order and order line facts) within a single fact table. Instead, we need to either allocate the higher-level facts to a more detailed level or create two separate fact tables to handle the differently grained facts. Allocation is the preferred approach. Optimally, a finance or business team (not the data warehouse team) spearheads the allocation effort. (P122)
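A minimal sketch of allocating a header-level fact down to order lines. The allocation rule here (proportional to each line's amount) and the `allocate` helper are assumptions. Choosing the real rule is the business team's decision. Shares are rounded to cents, and the rounding remainder goes to the last line so the allocated values add back exactly to the header amount.

```python
from decimal import Decimal, ROUND_HALF_UP


def allocate(header_amount: Decimal, line_amounts: list) -> list:
    """Allocate a header fact to lines in proportion to the line amounts."""
    total = sum(line_amounts)
    cent = Decimal("0.01")
    shares = [
        (header_amount * amt / total).quantize(cent, rounding=ROUND_HALF_UP)
        for amt in line_amounts[:-1]
    ]
    # Last line absorbs rounding so the shares sum to the header amount.
    shares.append(header_amount - sum(shares))
    return shares


# Hypothetical order: 10.00 freight on the header, three lines of 50/30/20.
lines = [Decimal("50.00"), Decimal("30.00"), Decimal("20.00")]
freight = allocate(Decimal("10.00"), lines)
print(freight)  # [Decimal('5.00'), Decimal('3.00'), Decimal('2.00')]
```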

 

Packaging all the facts and conversion factors together in the same fact table row provides the safest guarantee that these factors will be used correctly. The converted facts are presented to the users in one or more views. (P132)
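A minimal sketch of conversion factors stored on the fact row and applied in a view. The quantities stored in shipping cases, the factor columns, and all names are hypothetical. Users query the view, so each converted fact is always computed with its own row's factor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipment_fact (
        product_key            INTEGER,
        quantity_shipped_cases REAL,
        retail_units_per_case  REAL,  -- conversion factor
        list_price_per_case    REAL   -- conversion factor
    );
    INSERT INTO shipment_fact VALUES (1, 10, 24, 36.0), (2, 4, 12, 18.0);

    CREATE VIEW shipment_fact_converted AS
        SELECT product_key,
               quantity_shipped_cases,
               quantity_shipped_cases * retail_units_per_case AS quantity_shipped_units,
               quantity_shipped_cases * list_price_per_case   AS extended_list_dollars
        FROM shipment_fact;
""")

converted = conn.execute(
    "SELECT product_key, quantity_shipped_units, extended_list_dollars "
    "FROM shipment_fact_converted ORDER BY product_key"
).fetchall()
print(converted)  # [(1, 240.0, 360.0), (2, 48.0, 72.0)]
```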

 

In a large organization, the customer dimension can be extremely deep (with millions of rows), extremely wide (with dozens or even hundreds of attributes), and sometimes subject to rather rapid change. One leading direct marketer maintains over 3,000 attributes about its customers. The biggest retailers, credit card companies, and government agencies have monster customer dimensions whose sizes exceed 100 million rows. (P146)

 

Aggregated Facts as Attributes (P152)

 

Dimension Outriggers for a Low-Cardinality Attribute Set
Dimension outriggers are permissible, but they should be the exception rather than the rule. A red warning flag should go up if your design is riddled with outriggers; you may have succumbed to the temptation to overly normalize the design. (P153)

 

Large Changing Customer Dimensions: the rapidly changing monster dimension (P154)

 

The minidimension terminology refers to when the demographics key is part of the fact table composite key; if the demographics key is a foreign key in the customer dimension, we refer to it as an outrigger. (P156)

 

The demographic dimension itself cannot be allowed to grow too large. If we have 5 demographic attributes, each with 10 possible values, then the demographics dimension could have 100,000 (10^5) rows. This is a reasonable upper limit for the number of rows in a minidimension. (P157)

 

The best approach for efficiently browsing and tracking changes of key attributes in really huge dimensions is to break off one or more minidimensions from the dimension table, each consisting of small clumps of attributes that have been administered to have a limited number of values. (P157)
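A minimal sketch of a banded demographics minidimension. The bands, the banding functions, and the fact row layout are hypothetical. Following the distinction drawn above, the demographics key sits in the fact row next to the customer key, rather than inside the customer dimension (which would make it an outrigger). The row count is the product of the band counts, the same arithmetic that gives 10^5 rows for five 10-valued attributes.

```python
from itertools import product

# Hypothetical bands: each attribute is administered to a few values.
AGE_BANDS = ["<25", "25-44", "45-64", "65+"]
INCOME_BANDS = ["<50K", "50-100K", "100K+"]


def band_age(age: int) -> str:
    if age < 25:
        return "<25"
    if age < 45:
        return "25-44"
    if age < 65:
        return "45-64"
    return "65+"


def band_income(income: float) -> str:
    if income < 50_000:
        return "<50K"
    if income < 100_000:
        return "50-100K"
    return "100K+"


# One row per combination of band values: 4 * 3 = 12 rows here.
demographics_dim = {
    combo: key
    for key, combo in enumerate(product(AGE_BANDS, INCOME_BANDS), start=1)
}


def fact_row(date_key: int, customer_key: int, age: int, income: float, dollars: float) -> tuple:
    """Fact row carrying both the customer key and the demographics key."""
    demo_key = demographics_dim[(band_age(age), band_income(income))]
    return (date_key, customer_key, demo_key, dollars)


print(len(demographics_dim))                        # 12
print(fact_row(20240301, 1001, 38, 72_000, 19.99))  # (20240301, 1001, 5, 19.99)
```

When a customer moves into a new age or income band, the customer dimension row can stay as it is. Later fact rows simply carry a different demographics key.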

 

The secret to building complex behavioral study group queries is to capture the keys of the customers or products whose behavior you are tracking. You then use the captured keys to constrain other fact tables without having to rerun the original behavior analysis. (P160)
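A minimal sketch of a behavioral study group. The "spent more than 100" criterion, the returns fact table, and all names are hypothetical. The behavior analysis runs once and only its customer keys are saved. A different fact table is then constrained by joining to the saved keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (customer_key INTEGER, product_key INTEGER, dollars REAL);
    CREATE TABLE returns_fact (customer_key INTEGER, product_key INTEGER, dollars REAL);
    INSERT INTO sales_fact VALUES (1, 7, 500.0), (2, 7, 20.0), (3, 8, 900.0);
    INSERT INTO returns_fact VALUES (1, 7, 50.0), (2, 7, 20.0), (3, 8, 10.0);

    -- Step 1: run the behavior analysis once and capture only the keys.
    CREATE TABLE study_group AS
        SELECT customer_key
        FROM sales_fact
        GROUP BY customer_key
        HAVING SUM(dollars) > 100;
""")

# Step 2: constrain another fact table with the captured keys.
returned = conn.execute("""
    SELECT SUM(r.dollars)
    FROM returns_fact AS r
    JOIN study_group AS g ON g.customer_key = r.customer_key
""").fetchone()[0]
print(returned)  # 60.0
```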


Scripts in (P166)

 

Organization hierarchies and parts-explosion hierarchies may be represented with the help of a bridge table. This approach allows the regular SQL grouping and summarizing functions to work through ordinary query tools. (P167)
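A minimal sketch of a hierarchy bridge table. The organization tree, the revenue facts, and the column names are hypothetical, and flag columns such as lowest/highest level are omitted. The bridge holds one row for every ancestor/descendant pair, including each node paired with itself at depth 0. Rolling up a subtree then needs only a plain join and `SUM`.

```python
import sqlite3

# Hypothetical parent -> children relationships.
PARENT_CHILD = {1: [2, 3], 2: [4], 3: [], 4: []}


def bridge_rows(parent_child):
    """Yield (parent_key, subsidiary_key, depth) for every ancestor/descendant pair."""
    for top in parent_child:
        stack = [(top, 0)]
        while stack:
            node, depth = stack.pop()
            yield (top, node, depth)
            for child in parent_child[node]:
                stack.append((child, depth + 1))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE org_bridge (parent_key INTEGER, subsidiary_key INTEGER, depth INTEGER)")
conn.executemany("INSERT INTO org_bridge VALUES (?, ?, ?)", bridge_rows(PARENT_CHILD))
conn.execute("CREATE TABLE revenue_fact (customer_key INTEGER, revenue REAL)")
conn.executemany("INSERT INTO revenue_fact VALUES (?, ?)",
                 [(1, 100.0), (2, 50.0), (3, 25.0), (4, 10.0)])

# Ordinary join + SUM: revenue for organization 2 and everything beneath it.
total = conn.execute("""
    SELECT SUM(f.revenue)
    FROM org_bridge AS b
    JOIN revenue_fact AS f ON f.customer_key = b.subsidiary_key
    WHERE b.parent_key = ?
""", (2,)).fetchone()[0]
print(total)  # 60.0
```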

 

Be very careful when simultaneously joining a single dimension table to two fact tables of different cardinality. In many cases, relational systems will return the wrong answer. A similar problem arises when joining two fact tables of different granularity together directly. (P170)
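A minimal sketch of the wrong answer, with hypothetical data. Joining one product row to three sales rows and two inventory rows (two warehouses on the same day) in a single query produces 3 x 2 = 6 combined rows, which inflates both sums. A drill-across style fix aggregates each fact table separately to the shared dimension's grain and then joins the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_fact (product_key INTEGER, sales_dollars REAL);
    CREATE TABLE inventory_fact (product_key INTEGER, on_hand_units INTEGER);
    INSERT INTO product_dim VALUES (1, 'Widget');
    INSERT INTO sales_fact VALUES (1, 10.0), (1, 20.0), (1, 30.0);  -- 3 rows
    INSERT INTO inventory_fact VALUES (1, 5), (1, 7);  -- 2 warehouses, same day
""")

# Naive single query: every sales row pairs with every inventory row.
naive = conn.execute("""
    SELECT SUM(s.sales_dollars), SUM(i.on_hand_units)
    FROM product_dim AS p
    JOIN sales_fact AS s ON s.product_key = p.product_key
    JOIN inventory_fact AS i ON i.product_key = p.product_key
""").fetchone()
print(naive)  # (120.0, 36): sales doubled, inventory tripled

# Aggregate each fact table first, then join the result sets.
correct = conn.execute("""
    SELECT s.total_sales, i.total_on_hand
    FROM (SELECT product_key, SUM(sales_dollars) AS total_sales
          FROM sales_fact GROUP BY product_key) AS s
    JOIN (SELECT product_key, SUM(on_hand_units) AS total_on_hand
          FROM inventory_fact GROUP BY product_key) AS i
      ON i.product_key = s.product_key
""").fetchone()
print(correct)  # (60.0, 12)
```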

 

In general, to-date totals should be calculated, not stored in the fact table. (P177)
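A minimal sketch of calculating month-to-date totals at query time from additive daily facts, with hypothetical names. One practical reason not to store them is that a stored to-date column cannot be summed across days without double counting. The correlated subquery is used instead of a window function so the sketch also runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_sales_fact (date_key INTEGER, month_key INTEGER, sales_dollars REAL);
    INSERT INTO daily_sales_fact VALUES
        (20240301, 202403, 100.0),
        (20240302, 202403, 50.0),
        (20240303, 202403, 25.0),
        (20240401, 202404, 80.0);
""")

# Month-to-date derived on the fly; nothing cumulative is stored.
mtd = conn.execute("""
    SELECT f.date_key,
           (SELECT SUM(g.sales_dollars)
            FROM daily_sales_fact AS g
            WHERE g.month_key = f.month_key AND g.date_key <= f.date_key) AS mtd_dollars
    FROM daily_sales_fact AS f
    ORDER BY f.date_key
""").fetchall()
print(mtd)
# [(20240301, 100.0), (20240302, 150.0), (20240303, 175.0), (20240401, 80.0)]
```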

 

We'll drive stakes in the ground regarding the goals of the data warehouse while observing the uncanny similarities between the responsibilities of a data warehouse manager and those of a publisher.

 

We need to slice and dice the data every which way.

 

We also can identify items that should be nongoals for the magazine editor-in-chief. These would include such things as building the magazine around the technology of a particular printing press, putting management's energy into operational efficiencies exclusively, imposing a technical writing style that readers don't easily understand, or creating an intricate and crowded layout that is difficult to peruse or read.

 
