Excerpted from: Data Mining - Concepts and Techniques
According to William H. Inmon, a leading architect in the construction of data warehouse
systems, “A data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of management’s decision making process”
[Inm96]. This short but comprehensive definition presents the major features of a data
warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile,
distinguish data warehouses from other data repository systems, such as relational
database systems, transaction processing systems, and file systems. Let’s take a closer
look at each of these key features.
Subject-oriented: A data warehouse is organized around major subjects, such as customer, supplier, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5–10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.
Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.
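The integrated, time-variant, and nonvolatile features can be illustrated with a small sketch. It is only a toy model under stated assumptions: the two source systems (crm_rows, pos_rows), their field names, the SalesFact record, and the Warehouse class are all hypothetical and are not taken from the text or from any real warehouse product.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical operational sources that encode the same kind of fact
# differently (field names, gender codes, currency units).
crm_rows = [{"cust_id": 1, "sex": "F", "amount_usd": 120.0, "day": date(2023, 1, 5)}]
pos_rows = [{"customer": 1, "gender": "female", "amount_cents": 4500, "date": date(2024, 2, 9)}]


@dataclass(frozen=True)  # frozen: a loaded fact cannot be updated in place
class SalesFact:
    customer_id: int
    gender: str        # unified encoding: "F" or "M"
    amount_usd: float  # unified measure: US dollars
    sale_date: date    # explicit time element carried by every record


def from_crm(row):
    """Map a CRM row onto the warehouse's naming conventions."""
    return SalesFact(row["cust_id"], row["sex"], row["amount_usd"], row["day"])


def from_pos(row):
    """Map a point-of-sale row, converting its encoding and units."""
    gender = {"female": "F", "male": "M"}[row["gender"]]
    return SalesFact(row["customer"], gender, row["amount_cents"] / 100, row["date"])


class Warehouse:
    """Offers only the two operations named above: loading and access."""

    def __init__(self):
        self._facts = []

    def load(self, facts):
        # Loading appends new facts; existing ones are never modified.
        self._facts.extend(facts)

    def sales_by_year(self):
        # Read-only access from a historical perspective.
        totals = {}
        for fact in self._facts:
            year = fact.sale_date.year
            totals[year] = totals.get(year, 0.0) + fact.amount_usd
        return totals


wh = Warehouse()
wh.load([from_crm(r) for r in crm_rows])
wh.load([from_pos(r) for r in pos_rows])
totals = wh.sales_by_year()
```

Here the two mapping functions stand in for data cleaning and integration, the sale_date field is the explicit time element, and the absence of any update or delete method reflects the nonvolatile property. A real warehouse would of course perform these steps with dedicated loading tools rather than in-memory Python objects.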