The data warehouse must make an organization's information easily accessible.
The data warehouse must present the organization's information consistently.
The data warehouse must be adaptive and resilient to change.
The data warehouse must be a secure bastion that protects our information assets.
The data warehouse must serve as the foundation for improved decision making.
The business community must accept the data warehouse if it is to be deemed successful.
The fact table itself generally has its own primary key made up of a subset of the foreign keys. This key is often called a
composite or concatenated key. (P18)
You must avoid null keys in the fact table. A proper design includes a row in the corresponding dimension table to identify that
the dimension is not applicable to the measurement. (P49)
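A minimal sketch of the "not applicable" row, using an in-memory SQLite database. The table names, the column names and the convention of reserving key 0 for "No Promotion" are illustrative assumptions, not taken from the source.

```python
import sqlite3

# Hypothetical retail schema: promotion_key 0 is a reserved "No Promotion"
# row, so sales without a promotion still join instead of carrying NULL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE promotion_dim (
    promotion_key  INTEGER PRIMARY KEY,
    promotion_name TEXT NOT NULL
);
CREATE TABLE sales_fact (
    date_key      INTEGER NOT NULL,
    product_key   INTEGER NOT NULL,
    promotion_key INTEGER NOT NULL REFERENCES promotion_dim(promotion_key),
    sales_dollars REAL NOT NULL
);
INSERT INTO promotion_dim VALUES (0, 'No Promotion'), (1, 'Spring Coupon');
INSERT INTO sales_fact VALUES
    (20240301, 10, 1, 25.0),
    (20240301, 11, 0, 40.0);  -- not sold on promotion: key 0, never NULL
""")

# An ordinary inner join keeps every fact row and labels it meaningfully.
rows = conn.execute("""
    SELECT p.promotion_name, SUM(f.sales_dollars)
    FROM sales_fact f JOIN promotion_dim p USING (promotion_key)
    GROUP BY p.promotion_name
    ORDER BY p.promotion_name
""").fetchall()
print(rows)  # [('No Promotion', 40.0), ('Spring Coupon', 25.0)]
```

The `NOT NULL` constraint on the foreign key enforces the rule. An attempt to load a fact row with a missing promotion fails, which forces the load process to map it to the reserved row.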
Promotion Coverage Factless Fact Table (P49)
Degenerate Transaction Number Dimension
Degenerate dimensions often play an integral role in the fact table’s primary key. Often, the primary key of a fact table is a
subset of the table’s foreign keys. We typically do not need every foreign key in the fact table to guarantee the uniqueness of a
fact table row.
Operational control numbers such as order numbers, invoice numbers, and bill-of-lading numbers usually give rise to empty
dimensions and are represented as degenerate dimensions (that is, dimension keys without corresponding dimension tables) in fact
tables where the grain of the table is the document itself or a line item in the document.
If, for some reason, one or more attributes are legitimately left over after all the other dimensions have been created and seem
to belong to this header entity, we would simply create a normal dimension record with a normal join. However, we would no longer
have a degenerate dimension. (P50)
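A minimal sketch of a degenerate transaction number in a line-item fact table. The schema is hypothetical. It also assumes that POS transaction numbers are unique across stores and dates and that each product appears at most once per transaction. Under those assumptions, product key plus transaction number is enough for the primary key.

```python
import sqlite3

# Hypothetical POS line-item fact table. pos_transaction_number is a
# degenerate dimension: it lives in the fact table but has no dimension table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pos_sales_fact (
    date_key               INTEGER NOT NULL,
    product_key            INTEGER NOT NULL,
    store_key              INTEGER NOT NULL,
    pos_transaction_number INTEGER NOT NULL,  -- degenerate dimension
    sales_quantity         INTEGER NOT NULL,
    -- Assumption: transaction numbers are unique across stores and dates,
    -- so this subset of the keys already guarantees row uniqueness.
    PRIMARY KEY (product_key, pos_transaction_number)
);
INSERT INTO pos_sales_fact VALUES
    (20240301, 10, 7, 5001, 2),
    (20240301, 11, 7, 5001, 1),
    (20240301, 10, 7, 5002, 3);
""")

# Group line items into baskets by the degenerate key, with no extra join.
baskets = conn.execute("""
    SELECT pos_transaction_number, COUNT(*) AS line_items
    FROM pos_sales_fact
    GROUP BY pos_transaction_number
    ORDER BY pos_transaction_number
""").fetchall()
print(baskets)  # [(5001, 2), (5002, 1)]
```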
When a completely new data source is added, involving existing dimensions as well as unexpected new dimensions, we should avoid
force-fitting the new measurements into an existing fact table of consistent measurements. (P54)
Surrogate Keys
We strongly encourage the use of surrogate keys in dimensional models rather than relying on operational production codes. The
surrogate keys merely serve to join the dimension tables to the fact table. If the fifth through ninth characters in the
operational code identify the manufacturer, then the manufacturer’s name should be included as a dimension table attribute. Every
join between dimension and fact tables in the data warehouse should be based on meaningless integer surrogate keys. You should
avoid using the natural operational production codes. None of the data warehouse keys should be smart, where you can tell
something about the row just by looking at the key. (P58/59)
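A minimal sketch of surrogate key assignment. The product code format, the `MANUFACTURERS` lookup and the `ProductKeyAssigner` class are all hypothetical. The sketch follows the note's example, in which characters 5 through 9 of the operational code identify the manufacturer. The surrogate key carries no meaning, and the manufacturer becomes an explicit attribute.

```python
from dataclasses import dataclass

# Hypothetical lookup: characters 5-9 of the operational product code
# identify the manufacturer (the "smart" part we expose as an attribute).
MANUFACTURERS = {"ACME1": "Acme Foods", "BOLT2": "Bolt Beverages"}

@dataclass
class ProductRow:
    product_key: int        # meaningless integer surrogate key
    product_code: str       # natural operational code, kept as a plain attribute
    manufacturer_name: str  # decoded from the code, stored explicitly

class ProductKeyAssigner:
    """Hands out sequential surrogate keys, one per distinct natural code."""

    def __init__(self) -> None:
        self.next_key = 1
        self.rows: dict[str, ProductRow] = {}

    def key_for(self, product_code: str) -> int:
        if product_code not in self.rows:
            # Characters 5-9 are positions 4..8 with 0-based slicing.
            mfr = MANUFACTURERS.get(product_code[4:9], "Unknown")
            self.rows[product_code] = ProductRow(self.next_key, product_code, mfr)
            self.next_key += 1
        return self.rows[product_code].product_key

assigner = ProductKeyAssigner()
k1 = assigner.key_for("0001ACME1XYZ")
k2 = assigner.key_for("0002BOLT2XYZ")
k3 = assigner.key_for("0001ACME1XYZ")  # same code, same surrogate key
print(k1, k2, k3)  # 1 2 1
print(assigner.rows["0002BOLT2XYZ"].manufacturer_name)  # Bolt Beverages
```

Fact tables would store only `product_key`. Queries filter on `manufacturer_name` and never parse the code.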
Market Basket Analysis / Affinity Grouping (P62)
Isolated, independent data marts are worse than simply a lost opportunity for analysis. (P81)
Unpredictable Changes with Single-Version Overlay (P103)
Role-playing in a data warehouse occurs when a single dimension simultaneously appears several times in the same fact table. The
underlying dimension may exist as a single physical table, but each of the roles should be presented to the data access tools in a
separately labeled view. (P111)
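A minimal sketch of role-playing views in SQLite. One physical `date_dim` table serves as both the order date and the ship date of a hypothetical `order_fact`. Each role is a view whose columns are relabeled so query tools show distinct names. All names here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical date dimension...
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, full_date TEXT, month_name TEXT);
INSERT INTO date_dim VALUES (20240301, '2024-03-01', 'March'),
                            (20240405, '2024-04-05', 'April');

-- ...plays two roles in the hypothetical order fact table.
CREATE TABLE order_fact (
    order_date_key INTEGER NOT NULL,
    ship_date_key  INTEGER NOT NULL,
    order_dollars  REAL NOT NULL
);
INSERT INTO order_fact VALUES (20240301, 20240405, 100.0);

-- Each role gets its own view with distinctly labeled columns.
CREATE VIEW order_date AS
    SELECT date_key AS order_date_key, full_date AS order_full_date,
           month_name AS order_month FROM date_dim;
CREATE VIEW ship_date AS
    SELECT date_key AS ship_date_key, full_date AS ship_full_date,
           month_name AS ship_month FROM date_dim;
""")

row = conn.execute("""
    SELECT o.order_month, s.ship_month, f.order_dollars
    FROM order_fact f
    JOIN order_date o ON f.order_date_key = o.order_date_key
    JOIN ship_date  s ON f.ship_date_key  = s.ship_date_key
""").fetchone()
print(row)  # ('March', 'April', 100.0)
```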
A junk dimension is a convenient grouping of typically low-cardinality flags and indicators. By creating an abstract dimension, we
remove the flags from the fact table while placing them into a useful dimensional framework. (P118)
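A minimal sketch of building a junk dimension as the cross product of a few hypothetical flags. Each combination receives one surrogate key, and the fact table stores that single key instead of three separate flag columns.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise clutter the fact table.
PAYMENT_TYPES = ["Cash", "Credit"]
ORDER_TYPES = ["Inbound", "Outbound"]
COMMISSION_FLAGS = ["Commissionable", "Non-Commissionable"]

# Enumerate every combination once; each combination gets one surrogate key.
junk_dim = {
    combo: key
    for key, combo in enumerate(
        product(PAYMENT_TYPES, ORDER_TYPES, COMMISSION_FLAGS), start=1
    )
}

def junk_key(payment: str, order_type: str, commission: str) -> int:
    """Return the single junk-dimension key a fact row stores instead of 3 flags."""
    return junk_dim[(payment, order_type, commission)]

print(len(junk_dim))                                     # 8 rows = 2 * 2 * 2
print(junk_key("Credit", "Outbound", "Commissionable"))  # 7
```

If the full cross product would be too large, a common alternative is to add rows only for the combinations that actually occur in the source data.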
We shouldn’t mix fact granularities (for example, order and order line facts) within a single fact table. Instead, we need to
either allocate the higher-level facts to a more detailed level or create two separate fact tables to handle the differently
grained facts. Allocation is the preferred approach. Optimally, a finance or business team (not the data warehouse team)
spearheads the allocation effort. (P122)
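A minimal sketch of allocating a header-level fact down to the line-item grain. The allocation rule shown here is an assumption: it is proportional to each line's extended dollar amount, with rounding to cents and the remainder assigned to the last line. In practice, the finance or business team would choose the rule. The `allocate` helper is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def allocate(header_amount: Decimal, line_weights: list[Decimal]) -> list[Decimal]:
    """Spread a header-level fact across line items in proportion to line_weights.

    Rounds each share to cents and gives any rounding remainder to the last line,
    so the allocated shares always sum back to the header amount.
    Assumes line_weights is non-empty and sums to a nonzero value.
    """
    total = sum(line_weights)
    cents = Decimal("0.01")
    shares = [
        (header_amount * w / total).quantize(cents, rounding=ROUND_HALF_UP)
        for w in line_weights[:-1]
    ]
    shares.append(header_amount - sum(shares))
    return shares

# Hypothetical order: a $10.00 shipping charge recorded on the order header,
# allocated by each line's extended dollar amount.
line_amounts = [Decimal("50.00"), Decimal("30.00"), Decimal("20.00")]
shipping = allocate(Decimal("10.00"), line_amounts)
print(shipping)  # [Decimal('5.00'), Decimal('3.00'), Decimal('2.00')]
```

Because the shares always add back to the header amount, the allocated shipping fact can be summed safely at any level alongside the line-level facts.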
Packaging all the facts and conversion factors together in the same fact table row provides the safest guarantee that these
factors will be used correctly. The converted facts are presented to the users in one or more views. (P132)
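A minimal sketch of storing conversion factors in the fact row and exposing the converted facts through a view. The shipment schema, the choice of shipping cases as the stored unit and the two factors are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical shipment fact: the quantity is stored once, in shipping cases,
-- alongside the conversion factors needed to express it in other units.
CREATE TABLE shipment_fact (
    product_key           INTEGER NOT NULL,
    shipped_cases         INTEGER NOT NULL,
    retail_units_per_case INTEGER NOT NULL,  -- conversion factor
    cases_per_pallet      INTEGER NOT NULL   -- conversion factor
);
INSERT INTO shipment_fact VALUES (10, 40, 12, 20);

-- Users query the view, so the factors are always applied the same way.
CREATE VIEW shipment_facts_converted AS
SELECT product_key,
       shipped_cases,
       shipped_cases * retail_units_per_case          AS shipped_retail_units,
       CAST(shipped_cases AS REAL) / cases_per_pallet AS shipped_pallets
FROM shipment_fact;
""")

row = conn.execute("SELECT * FROM shipment_facts_converted").fetchone()
print(row)  # (10, 40, 480, 2.0)
```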
In a large organization, the customer dimension can be extremely deep (with millions of rows), extremely wide (with dozens or even
hundreds of attributes), and sometimes subject to rather rapid change. One leading direct marketer maintains over 3,000 attributes
about its customers.
The biggest retailers, credit card companies, and government agencies have monster customer dimensions whose sizes exceed 100
million rows. (P146)
Aggregated Facts as Attributes (P152)
Dimension Outriggers for a Low-Cardinality Attribute Set
Dimension outriggers are permissible, but they should be the exception rather than the rule. A red warning flag should go up if
your design is riddled with outriggers; you may have succumbed to the temptation to overly normalize the design. (P153)
Large Changing Customer Dimensions: the rapidly changing monster dimension (P154)
The term minidimension applies when the demographics key is part of the fact table’s composite key; if the demographics key is
instead a foreign key in the customer dimension, we refer to it as an outrigger. (P156)
The demographics dimension itself cannot be allowed to grow too large. If we have 5 demographic attributes, each with 10 possible
values, then the demographics dimension could have 100,000 (10^5) rows. This is a reasonable upper limit for the number of rows in
a minidimension. (P157)
The best approach for efficiently browsing and tracking changes of key attributes in really huge dimensions is to break off one or
more minidimensions from the dimension table, each consisting of small clumps of attributes that have been administered to have a
limited number of values. (P157)
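A minimal sketch of a banded demographics minidimension. The attributes, band boundaries and labels are hypothetical. Because each row is one combination of banded values, the row count is the product of the band counts, which is the same arithmetic that gives 10^5 rows for 5 attributes of 10 values each.

```python
from itertools import product

# Hypothetical demographic attributes, each administered into a few bands
# so the minidimension stays small. Bands are half-open: low <= value < high.
AGE_BANDS = [(0, 25, "Under 25"), (25, 45, "25-44"), (45, 65, "45-64"), (65, 200, "65+")]
INCOME_BANDS = [(0, 30_000, "<$30K"), (30_000, 75_000, "$30K-$75K"), (75_000, 10**9, "$75K+")]
PURCHASE_FREQ = ["Low", "Medium", "High"]

def band(value, bands):
    """Return the label of the band that contains value."""
    for low, high, label in bands:
        if low <= value < high:
            return label
    raise ValueError(f"{value} falls outside every band")

# Every combination of band labels is one minidimension row.
demographics_dim = {
    combo: key
    for key, combo in enumerate(
        product([b[2] for b in AGE_BANDS], [b[2] for b in INCOME_BANDS], PURCHASE_FREQ),
        start=1,
    )
}

def demographics_key(age: int, income: int, frequency: str) -> int:
    """Map a customer's current raw values to the minidimension key."""
    return demographics_dim[(band(age, AGE_BANDS), band(income, INCOME_BANDS), frequency)]

print(len(demographics_dim))                  # 4 * 3 * 3 = 36 rows
print(demographics_key(30, 50_000, "High"))   # 15
```

When a customer's age or income crosses a band boundary, new fact rows simply carry a different demographics key, and the large customer dimension row does not change.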
The secret to building complex behavioral study group queries is to capture the keys of the customers or products whose behavior
you are tracking. You then use the captured keys to constrain other fact tables without having to rerun the original behavior
analysis. (P160)
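A minimal sketch of a behavioral study group in SQLite. The behavior analysis runs once, and only the resulting customer keys are saved in a small table. Other fact tables are then constrained by joining to those keys. Both fact tables and the "visited the pricing page" behavior are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Two hypothetical fact tables keyed by customer_key.
CREATE TABLE web_visit_fact (customer_key INTEGER, page TEXT);
CREATE TABLE sales_fact     (customer_key INTEGER, sales_dollars REAL);
INSERT INTO web_visit_fact VALUES (1, 'pricing'), (2, 'home'), (3, 'pricing');
INSERT INTO sales_fact     VALUES (1, 100.0), (2, 50.0), (3, 25.0), (3, 5.0);

-- Run the (possibly expensive) behavior analysis once and capture only the keys.
CREATE TABLE study_group_pricing_visitors AS
    SELECT DISTINCT customer_key FROM web_visit_fact WHERE page = 'pricing';
""")

# Later queries reuse the captured keys to constrain any other fact table.
total = conn.execute("""
    SELECT SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN study_group_pricing_visitors g USING (customer_key)
""").fetchone()[0]
print(total)  # 130.0
```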
Scripts in (P166)
Organization hierarchies and parts-explosion hierarchies may be represented with the help of a bridge table. This approach allows
the regular SQL grouping and summarizing functions to work through ordinary query tools. (P167)
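A minimal sketch of an organization bridge table. The hierarchy, table names and `build_bridge` helper are hypothetical. The bridge holds one row for every ancestor/descendant pair, including a depth-0 row that pairs each node with itself, so a plain join plus `SUM` rolls up any subtree without recursive SQL. The helper assumes the hierarchy is a tree with no cycles.

```python
import sqlite3

# Hypothetical organization hierarchy as (parent, child) pairs.
EDGES = [("Corp", "East"), ("Corp", "West"), ("East", "Boston"), ("East", "NYC")]

def build_bridge(edges):
    """Return (parent, subsidiary, depth) rows, one per ancestor/descendant pair,
    including depth-0 rows that pair each node with itself."""
    children = {}
    nodes = set()
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
        nodes.update((parent, child))
    rows = []
    for root in sorted(nodes):
        stack = [(root, 0)]
        while stack:  # depth-first walk of everything below root
            node, depth = stack.pop()
            rows.append((root, node, depth))
            stack.extend((c, depth + 1) for c in children.get(node, []))
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE org_bridge (parent_org TEXT, subsidiary_org TEXT, depth INTEGER)")
conn.executemany("INSERT INTO org_bridge VALUES (?, ?, ?)", build_bridge(EDGES))
conn.execute("CREATE TABLE revenue_fact (org TEXT, revenue REAL)")
conn.executemany("INSERT INTO revenue_fact VALUES (?, ?)",
                 [("Boston", 10.0), ("NYC", 20.0), ("West", 5.0), ("East", 1.0)])

def rollup(org):
    """Sum revenue for org and everything beneath it, via an ordinary join."""
    return conn.execute("""
        SELECT SUM(f.revenue)
        FROM org_bridge b JOIN revenue_fact f ON f.org = b.subsidiary_org
        WHERE b.parent_org = ?
    """, (org,)).fetchone()[0]

print(rollup("East"))  # 31.0
print(rollup("Corp"))  # 36.0
```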
Be very careful when simultaneously joining a single dimension table to two fact tables of different cardinality. In many cases,
relational systems will return the wrong answer. A similar problem arises when joining two fact tables of different granularity
together directly. (P170)
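A minimal sketch of the wrong-answer problem. The schema and numbers are hypothetical. When one SQL statement joins a shared dimension to two fact tables, each fact row is repeated once for every matching row in the other fact table, which inflates both sums. The safer pattern shown here aggregates each fact table separately to a common grain and then combines the small results, often called drilling across.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim   (product_key INTEGER PRIMARY KEY, brand TEXT);
CREATE TABLE sales_fact    (product_key INTEGER, sales_dollars REAL);
CREATE TABLE shipment_fact (product_key INTEGER, shipped_units INTEGER);
INSERT INTO product_dim   VALUES (1, 'Acme');
INSERT INTO sales_fact    VALUES (1, 10.0), (1, 20.0);     -- 2 rows
INSERT INTO shipment_fact VALUES (1, 5), (1, 5), (1, 5);   -- 3 rows
""")

# Wrong: joining both fact tables through the shared dimension in one query
# yields 2 x 3 = 6 rows for the product, so both sums are inflated.
wrong = conn.execute("""
    SELECT SUM(s.sales_dollars), SUM(sh.shipped_units)
    FROM product_dim p
    JOIN sales_fact s     ON s.product_key  = p.product_key
    JOIN shipment_fact sh ON sh.product_key = p.product_key
""").fetchone()

# Safer: aggregate each fact table separately to the shared grain (brand),
# then combine the two small answer sets.
right = conn.execute("""
    SELECT s.brand, s.sales, sh.units
    FROM (SELECT p.brand, SUM(f.sales_dollars) AS sales
          FROM sales_fact f JOIN product_dim p USING (product_key) GROUP BY p.brand) s
    JOIN (SELECT p.brand, SUM(f.shipped_units) AS units
          FROM shipment_fact f JOIN product_dim p USING (product_key) GROUP BY p.brand) sh
      ON s.brand = sh.brand
""").fetchone()

print(wrong)  # (90.0, 30)
print(right)  # ('Acme', 30.0, 15)
```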
In general, to-date totals should be calculated, not stored in the fact table. (P177)
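A minimal sketch of deriving year-to-date totals at query time from stored period facts. The monthly rows and the `year_to_date` helper are hypothetical. The helper assumes rows are sorted by month in `YYYY-MM` form and restarts the running total at each calendar year. Because nothing cumulative is stored, a correction to one month's fact flows into every later to-date figure automatically.

```python
from itertools import accumulate, groupby

# Hypothetical monthly fact rows: only the period amounts are stored.
monthly_sales = [("2024-01", 100.0), ("2024-02", 80.0), ("2024-03", 120.0)]

def year_to_date(rows):
    """Running totals that restart each calendar year; rows sorted by month."""
    result = []
    for _, group in groupby(rows, key=lambda r: r[0][:4]):  # group by year
        group = list(group)
        running = accumulate(amount for _, amount in group)
        result.extend((month, total) for (month, _), total in zip(group, running))
    return result

print(year_to_date(monthly_sales))
# [('2024-01', 100.0), ('2024-02', 180.0), ('2024-03', 300.0)]
```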
We'll drive stakes in the ground regarding the goals of the data warehouse while observing the uncanny similarities between the
responsibilities of a data warehouse manager and those of a publisher.
We need to slice and dice the data every which way.
We also can identify items that should be nongoals for the magazine editor-in-chief. These would include such things as building
the magazine around the technology of a particular printing press, putting management's energy into operational efficiencies
exclusively, imposing a technical writing style that readers don't easily understand, or creating an intricate and crowded layout
that is difficult to peruse or read.