The Most Complete 2019 Book List on Data Warehousing, BI and Data Science


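Because this article is hosted on wordpress.com, the original may not be accessible to everyone, for reasons we all understand. So I have reposted it here word for word, including the author's introduction to his own beginner-level data warehousing book.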



Disappointed with the Google search results for “data warehousing books”, I have tried to put every data warehousing book I know into this page. It is totally understandable that Google’s search results don’t include books on ETL or Dimensional Modeling, for example. The same goes for Amazon; see Note 1 below. Even a data warehouse book as important as Inmon’s DW 2.0 was missed because its title doesn’t contain the word “Warehouse”.

For data modelling, my all-time favorite is Kimball’s Toolkit (#1 in the list). Inmon’s, Imhoff’s and Devlin’s classics (#3, #4 and #5 in the list) have broadened my horizon on the basic principles of DW design. For ODS design it’s #17, and the newest model is in #6. If you are building a DW on the SQL Server platform, Mundy’s Toolkit (#2) is a treasure. On Oracle, it’s Hobbs (#54), and on Teradata it’s Coffing’s series (#58 to #63). #7 to #11 explain Kimball’s theory in more detail. Some of them are about dimensional modelling (Adamson’s #8 is excellent), and some are about ETL (Kimball’s #7 is a jewel). For methodology/project management, #11 is the classic, #27 is a proven treasure and #83 covers the iterative approach.

  1. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling by Ralph Kimball and Margy Ross

  2. Microsoft Data Warehouse Toolkit: With SQL Server 2005 and the Microsoft Business Intelligence Toolset by Joy Mundy, Warren Thornthwaite, and Ralph Kimball

  3. Building the Data Warehouse by W. H. Inmon

  4. Mastering Data Warehouse Design: Relational and Dimensional Techniques by Claudia Imhoff, Nicholas Galemmo, and Jonathan G. Geiger

  5. Data Warehouse: From Architecture to Implementation by Barry Devlin

  6. DW 2.0: The Architecture for the Next Generation of Data Warehousing by William H. Inmon

  7. The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data by Ralph Kimball and Joe Caserta

  8. The Star Schema Handbook: The Complete Reference to Dimensional Data Warehouse Design by Christopher Adamson

  9. The Data Webhouse Toolkit: Building the Web-enabled Data Warehouse by Ralph Kimball and Richard Merz

  10. Data Warehouse Design Solutions by Christopher Adamson and Michael Venerable

  11. The Data Warehouse Lifecycle Toolkit by Ralph Kimball, Margy Ross, Warren Thornthwaite, and Joy Mundy

  12. Building a Data Warehouse: with Examples on SQL Server by Vincent Rainardi

  13. Oracle Data Warehousing and Business Intelligence Solutions: With Business Intelligence Solutions by Robert Stackowiak, Joseph Rayman, and Rick Greenwald

  14. Impossible Data Warehouse Situations: Solutions from the Experts (Information Technology) by Sid Adelman, Joyce Bischoff, Jill Dyché, and Douglas Hackney

  15. Mastering Data Warehouse Aggregates: Solutions for Star Schema Performance by Christopher Adamson

  16. Data Warehouse Performance by W. H. Inmon, Ken Rudin, Christopher K. Buss, and Ryan Sousa

  17. Building the Operational Data Store by W. H. Inmon, Claudia Imhoff, and Greg Battas

  18. Rapid Data Warehouse Design: User-Focused Techniques for Designing Dimensional Data Warehouses by Lawrence Corr

  19. Data Warehouse Design: Modern Principles and Methodologies by Matteo Golfarelli and Stefano Rizzi

  20. Advanced Data Warehouse Design: From Conventional to Spatial and Temporal Applications (Data-centric Systems and Applications) by Elzbieta Malinowski and Esteban Zimányi

  21. Designing a Data Warehouse – Supporting Customer Relationship Management by Chris Todman

  22. Data Warehouses and OLAP: Concepts, Architectures and Solutions by Robert Wrembel and Christian Koncilia

  23. Implementing a Data Warehouse: A Methodology That Worked by Bruce Russel Ullrey

  24. Data Warehousing for Dummies by Thomas C. Hammergren

  25. Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits by Larry P. English

  26. Data Warehouse 100 Success Secrets – 100 Most Asked Questions on Data Warehouse Design, Projects, Business Intelligence, Architecture, Software and Models by Richard Martin

  27. Data Warehouse Project Management by Sid Adelman and Larissa T. Moss

  28. Data Warehouse Management Handbook by Kachur

  29. Data Warehouse: Extract, Transform, Load, Metadata, Data Integration, Data Mining, Data Warehouse Appliance, Database Management System, Decision Support System by Frederic P. Miller, Agnes F. Vandome, and John McBrewster

  30. Oracle Data Warehouse Tuning for 10g by Gavin JT Powell

  31. Using the Data Warehouse by W. H. Inmon and Richard D. Hackathorn

  32. Entity-attribute-value model: Data model, Data warehouse, Denormalization, Attribute-value system, Linked Data, Resource Description Framework, Semantic Web, Inner-platform effect by Frederic P. Miller, Agnes F. Vandome, and John McBrewster

  33. Index Structures for Data Warehouses: v. 1859 (Lecture Notes in Computer Science) by Marcus Jürgens

  34. Tivoli Data Warehouse Version 1.3: Planning And Implementation by IBM Redbooks and Vasfi Gucer

  35. Data Warehouse Implementations: Critical Implementation Factors Study by Joe Ganczarski

  36. The Enterprise Data Warehouse: Planning, Building and Implementation v. 1 by Eric Sperley and Hewlett-Packard

  37. Data Warehousing in the Real World: A Step-by-step Guide for Building Decision Support Data Warehouses by S. Anahory and D. Murray

  38. Filtering the Web to Feed Data Warehouses by Witold Abramowicz, Pawel J. Kalczynski, and Krzysztof Wecel

  39. Data Warehouse: Practical Advice from the Experts by Joyce Bischoff and Ted Alexander

  40. Leveraging DB2 Data Warehouse Edition for Business Intelligence by IBM Redbooks

  41. Fundamentals of Data Warehouses by Matthias Jarke, Maurizio Lenzerini, Yannis Vassiliou, and P. Vassiliadis

  42. Web-enabled Data Warehouse by William A. Giovinazzo

  43. Decision Support and Data Warehouse Systems by Efrem G Mallach

  44. Planning and Designing the Data Warehouse (The Data Warehousing Institute series) by Ramon Barquin and Herb Edelstein

  45. Data Warehouse Design by William A. Giovinazzo

  46. Building, Using and Managing the Data Warehouse (Data Warehousing Institute) by Ramon Barquin and Herb Edelstein

  47. Building a Data Warehouse for Decision Support by Vidette Poe and Laura L. Reeves

  48. Parallel Systems in the Data Warehouse (Data Warehousing Institute) by Steve Morse and David Isaac

  49. Decision Support in the Data Warehouse (The Data Warehousing Institute series) by Hugh J. Watson and Paul Gray

  50. Building a Better Data Warehouse by Don Meyer and Casey E. Cannon

  51. The Data Model Resource Book: A Library of Logical Data and Data Warehouse Models by Len Silverston, W. H. Inmon, and Kent Graziano

  52. Managing the Data Warehouse: Practical Techniques for Monitoring Operations and Performances Administering Data and Tools by W. H. Inmon, J. D. Welch, and Katherine L. Glassey

  53. The Intranet Data Warehouse: Tools and Techniques for Building Intranet-enabled Data Warehouse by Richard Tanler

  54. Oracle 10g Data Warehousing by Lilian Hobbs, Susan Hillson, Shilpa Lawande, and Pete Smith

  55. Oracle9iR2 Data Warehousing by Lilian Hobbs, Susan Hillson, and Shilpa Lawande

  56. Oracle8i Data Warehousing by Lilian Hobbs and Susan Hillson

  57. Oracle8i Data Warehousing by Michael J. Corey, Michael Abbey, Ben Taub, and Ian Abramson

  58. Tera-Tom on Teradata Basics by Tom Coffing and Gareth Walter

  59. Tera-Tom on Teradata Physical Implementation by W. Coffing and Mark Ferguson

  60. Tera-Tom on Teradata SQL by Tom Coffing and Robert Hines

  61. Tera-Tom on Teradata Database Administrator by Tom Coffing and Steve Wilmes

  62. Tera-Tom on Teradata Designer by Tom Coffing and Todd Wilson

  63. Tera-Tom on Teradata Application Development by Tom Coffing and Scott Smith

  64. Tera-Tom on Teradata E-Business by Randy Volters and Tom Coffing

  65. Teradata SQL Unleash the Power V2R6 by Thomas L. Coffing and Michael Larkins

  66. Teradata Utilities – Breaking the Barriers by Tom Coffing, Morgan Jones, Mike Larkins, Steve Wilmes, and Randy Volters

  67. Netezza SQL – Harness the Power by Mike Larkins and Tom Coffing

  68. Netezza Underground: The unauthorized tales of derring-do and adventures in resilient data warehousing solutions by David Birmingham

  69. Teradata Users Guide: The Ultimate Companion by Tom Coffing, Leona Coffing, Chris Coffing, and Robert Hines

  70. Teradata SQL Quick Reference Guide – Simplicity By Design by Tom Coffing, Todd Carroll, Robert Hines, and Mike Larkins

  71. Secrets of Best Data Warehouses in the World by Rob Armstrong, Tom Coffing, and Rolf Hanusa

  72. Common Warehouse Metamodel: An Introduction to the Standard for Data Warehouse Integration (OMG) by John Poole, Dan Chang, Douglas Tolbert, and David Mellor

  73. 50 TB Data Warehouse Benchmark on IBM System z by IBM Redbooks

  74. E-Business Intelligence Front-End Tool Access to OS/390 Data Warehouse by IBM Redbooks

  75. Rdb/VMS: Developing a Data Warehouse by William H. Inmon and Chuck Kelley

  76. Data Warehouses: More Than Just Mining by Barbara J. Bashein and M. Lynne Markus

  77. Corporate Information with SAP®-EIS: Building a Data Warehouse and MIS-Application (Efficient Business Computing) by Bernd-Ulrich Kaiser

  78. Dimensional Data Warehousing with MySQL: A Tutorial by Djoni Darmawikarta

  79. Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals by Paulraj Ponniah

  80. Data Warehousing, Data Mining, and OLAP (Data Warehousing/Data Management) by Alex Berson and Stephen J. Smith

  81. Data Warehousing: Architecture and Implementation by Mark W. Humphries, Michael W. Hawkins, and Michelle C. Dy

  82. Data Warehousing 101: Concepts and Implementation by Arshad Khan

  83. Agile Data Warehousing: Delivering World-Class Business Intelligence Systems Using Scrum and XP by Ralph Hughes

  84. e-Data: Turning Data Into Information With Data Warehousing by Jill Dyché

  85. Pentaho Solutions: Business Intelligence and Data Warehousing with Pentaho and MySQL by Roland Bouman and Jos van Dongen

  86. A Manager’s Guide to Data Warehousing by Laura Reeves

  87. Data Warehousing with SAP BW7 BI in SAP NetWeaver 2004s: Architecture, Concepts, and Implementation by Christian Mehrwald and Sabine Morlock

  88. Data Warehousing: Using the Wal-Mart Model (The Morgan Kaufmann Series in Data Management Systems) by Paul Westerman

  89. Oracle DBA Guide to Data Warehousing and Star Schemas by Bert Scalzo

  90. Building and Maintaining a Data Warehouse by Fon Silvers

  91. Evolving Application Domains of Data Warehousing and Mining: Trends and Solutions by Pedro Nuno San-Banto Furtado

  92. Data Warehousing And Business Intelligence For e-Commerce (The Morgan Kaufmann Series in Data Management Systems) by Alan R. Simon and Steven L. Shaffer

  93. Data Warehousing with Informix: Best Practices by Angela Sanchez

  94. Data Warehousing: Concepts, Technologies, Implementations, and Management by Harry Singh

  95. Data Warehousing in Action by Sean Kelly

  96. High Performance Oracle Data Warehousing: All You Need to Master Professional Database Development Using Oracle by Donald K. Burleson

  97. Implementing Enterprise Data Warehousing: A Guide for Executives by Alan Schlukbier

  98. Complex Data Warehousing and Knowledge Discovery for Advanced Retrieval Development: Innovative Methods and Applications (Advances in Data Warehousing and Mining (ADWM) Book Series) by Tho Manh Nguyen

  99. New Trends in Data Warehousing and Data Analysis (Annals of Information Systems) by Stanislaw Kozielski and Robert Wrembel

  100. Data Warehousing with Service-oriented Architecture: Designing and Implementing Prototype Models For an Integration of Near-Real-Time Data Warehousing Architecture with Service-oriented Architecture by Ronnie Abrahiem

  101. Encyclopedia of Data Warehousing and Mining, Second Edition by John Wang

  102. IBM Data Warehousing: With IBM Business Intelligence Tools by Michael L. Gonzales

  103. Clickstream Data Warehousing by Mark Sweiger, Mark R. Madsen, Jimmy Langston, and Howard Lombard

  104. Intelligent Data Warehousing: From Data Preparation to Data Mining by Zhengxin Chen

  105. Data Stores, Data Warehousing, and the Zachman Framework: Managing Enterprise Knowledge (McGraw-Hill Series on Data Warehousing and Data Management) by William H. Inmon, John A. Zachman, and Jonathan G. Geiger

  106. Progressive Methods in Data Warehousing and Business Intelligence: Concepts and Competitive Analytics (Advances in Data Warehousing and Mining) by David Taniar

  107. Modern Data Warehousing, Mining, and Visualization: Core Concepts by George M. Marakas

  108. AS/400 Data Warehousing: The Complete Guide to Implementation by Brian W. Kelly

  109. Data Warehousing and Data Mining for Telecommunications (Artech House Computer Science Library) by Rob Mattison

  110. Data Warehousing: Design, Development and Best Practices by Soumendra Mohanty

  111. Exploration Warehousing: Turning Business Information into Business Opportunity by William H. Inmon, R. H. Terdeman, and Claudia Imhoff

  112. The Data Model Resource Book: A Library of Logical Data and Data Warehouse Designs by Len Silverston, William H. Inmon, and Kent Graziano

  113. Data Warehousing in the Real World (A Practical Guide for Building Decision Support Systems) by Dennis Murray and Sam Anahory

  114. Parallel Processing Techniques for Data Warehousing and Mining: Application and Challenges by Satchidananda Dehuri

  115. Essential Oracle8i Data Warehousing: Designing, Building, and Managing Oracle Data Warehouses by Gary Dodge and Tim Gorman

  116. The Essential Guide to Data Warehousing by Lou Agosta

  117. Data Warehousing OLAP and Data Mining by S. Nagabhushana

  118. Building the Customer-Centric Enterprise: Data Warehousing Techniques for Supporting Customer Relationship Management by Claudia Imhoff, Lisa Loftis, and Jonathan G. Geiger

  119. Data Warehousing: The Ultimate Guide to Building Corporate Business Intelligence (HOTT Guide) by SCN Education B.V.

  120. Data Warehousing and Knowledge Discovery: 9th International Conference, DaWaK 2007, Regensburg, Germany, September 3-7, 2007, Proceedings (Lecture Notes … Applications, incl. Internet/Web, and HCI) by Il Yeol Song, Johann Eder, and Tho Manh Nguyen

  121. Clinical Data Mining and Warehousing, An Issue of Clinics in Laboratory Medicine (The Clinics: Internal Medicine) by James Harrison Jr. MD PhD

  122. Using data warehousing to deliver integrated management information: Case studies of customer data integration using sales and marketing data marts by Shana Ponelis

  123. Data Warehousing and Knowledge Discovery: 6th International Conference, DaWaK 2004, Zaragoza, Spain, September 1-3, 2004, Proceedings (Lecture Notes in Computer Science) by Yahiko Kambayashi, Mukesh Mohania, and Wolfram Wöß

  124. Strategic Data Warehousing: Achieving Alignment with Business by Neera Bhansali

  125. Strategic Data Warehousing Principles Using SAS Software by Peter R. Welbrock

  126. Data Warehousing: The Route to Mass Communication by Sean Kelly

  127. Data Warehousing for E-Business by R. H. Terdeman, Joyce Norris-Montanari, Dan Meers, and William H. Inmon

  128. Data Warehousing and Knowledge Discovery: 10th International Conference, DaWak 2008 Turin, Italy, September 1-5, 2008, Proceedings (Lecture Notes in Computer … Applications, incl. Internet/Web, and HCI) by Il-Yeol Song, Johann Eder, and Tho Manh Nguyen

  129. Data Warehousing and Data Mining Techniques for Cyber Security (Advances in Information Security) by Anoop Singhal

  130. Data Warehousing and Decision Support: The State of the Art, Volume 1 by Pam Roth (Volume 2 is also available)

  131. Advances in Database Technologies: ER ’98 Workshops on Data Warehousing and Data Mining, Mobile Data Access, and Collaborative Work Support and Spatio-Temporal … (Lecture Notes in Computer Science) by Yahiko Kambayashi, Dik Lun Lee, Ee-Peng Lim, and Mukesh Kumar Mohania

  132. Data Warehousing and Web Engineering by Shirley A. Becker

  133. ERP and Data Warehousing in Organizations: Issues and Challenges by Gerald G. Grant

  134. Data Warehousing and Knowledge Discovery: 8th International Conference, DaWaK 2006, Krakow, Poland, September 4-8, 2006, Proceedings (Lecture Notes in … Applications, incl. Internet/Web, and HCI) by A Min Tjoa and Juan Trujillo

  135. Data Warehousing Advice for Managers by Patricia L. Ferdinandi

  136. Data Warehousing and the Management Accountant (CIMA Research) by Ian Cobb

  137. Data Warehousing and Knowledge Discovery: 4th International Conference, DaWaK 2002, Aix-en-Provence, France, September 4-6, 2002. Proceedings (Lecture Notes in Computer Science) by Yahiko Kambayashi, Werner Winiwarter, and Masatoshi Arikawa

  138. Oracle Data Warehousing Unleashed by Michael Schrader, John Dakin, Kieron Hardy, and Matthew Townsend

  139. Journal of Healthcare Information Management, E-Healthcare Data Warehousing Journal of Healthcare Information Management, No. 2: Journal of Healthcare … Health Care Information Mgmt) by Julie Foreman

  140. Worldwide Data Warehousing Tools 2004 Vendor Shares by Dan Vesset

  141. Constructing Data Warehouses with Metadata-driven Generic Operators by Dr Bin Jiang

  142. Testing the Data Warehouse Practicum by Doug Vucevic and Wayne Yaddow

Notes

  1. You may think that a “data warehouse” search in Amazon would also include “data warehousing”. That was what I was thinking. But sadly, no. Nor do I expect Amazon search to be smart enough to recognize that the terms “ETL” and “Dimensional Model” have a lot to do with data warehousing, hence my motive for creating this list. The same goes for the terms “ODS” and “data mart”.

  2. A data warehouse book as important as Inmon’s DW 2.0 was missed because its title doesn’t contain “Warehous*”. Sad. And Data Warehousing 101: Concepts and Implementation by Arshad Khan was missed when searching for “Data Warehouse” in Amazon.

  3. I don’t limit myself to SQL Server. As you can see, I also include Oracle books. We can learn a lot about data warehousing from other platforms, particularly about ETL. In fact, I learnt a lot from a book called “Oracle 8i Data Warehousing” (Corey et al., not Hobbs & Hillson). Informix, DB2, MySQL, AS/400 and SAS books are all in there now.

  4. I don’t include general data modelling books in the list; I only include a data modelling book if it’s about dimensional modelling.

  5. I don’t include “bundles”, i.e. several books packaged and sold as one, such as Kimball’s Toolkit bundle, because I have already included the components individually.

  6. I don’t include a data mining book if it’s only about data mining. But if it covers data warehousing as well, then I include it; see Alex Berson’s (#80), for example. Ditto for MDM, BI, OLAP, DQ and Text Analytics. I do include Decision Support, though (well, of course).

  7. Can you believe it: 142 books on data warehousing! That’s a lot of books for one area of study/work. And that excludes the things I mentioned above.

  8. If a book has several editions (like Inmon’s classic), I only include the latest one. A first edition is sometimes an absolute treasure, like Kimball’s 1996 one, but there you go. When a new edition is a rewrite for a different version of the software, I include each of them, for example Oracle 8i, 9i and 10g Data Warehousing.

  9. I do include conference proceedings and lecture notes, even though some people say they are not ‘real books’. I don’t care about the physical form (thin, thick, non-paper, etc.), as long as the content is about warehousing.

  10. Apologies: there are many DW books in German that I don’t include here, primarily because this is an English blog and I can’t write in German. Perhaps somebody else could make a list of these German DW books (there really are a lot of them; check Amazon).

  11. I knew there was a data warehousing book on MySQL because I know the author, Djoni Darmawikarta, who is also from Indonesia like me but now lives in Canada. It is now in the list (#78).

  12. I own Barry Devlin’s warehousing book. It’s very old and the binding is almost falling apart, but the content is illuminating, primarily because it was written free from Inmon’s and Kimball’s influence and hence defines its own principles of design. It is now in the list (#5).

  13. Intelligent Solution composed a comprehensive list of data warehousing articles, from 1993 to 2006.

My Book

I was sometimes asked by people who wanted to learn data warehousing to recommend a book. Some of them are database administrators or data architects (on various platforms), and some are developers (application developers and database developers). They know how to write SQL, create tables and query data. They are looking for a basic data warehousing book that is practical and aimed at beginners: a book that new starters can use to build their first data warehouse, and the BI on top of it; a book that covers all the essential topics, such as methodology, architecture, data modelling, ETL, data quality, reports, cubes and BI; and a book with easy-to-understand examples and illustrations from real projects. For this reason I wrote a data warehousing book: Building a Data Warehouse: with Examples on SQL Server (#12).

It has 17 chapters:

  • Chapter 1 is about what a data warehouse is

  • Chapter 2 is about data warehouse architecture

  • Chapter 3 is about methodology / project management

  • Chapter 4 is about gathering requirements

  • Chapter 5 is about designing the data model, both dimensional and normalised

  • Chapter 6 is about the system architecture/servers and configuring the databases

  • Chapter 7 is about ETL (extracting data from source systems)

  • Chapter 8 is also about ETL (loading data into the warehouse)

  • Chapter 9 is about data quality

  • Chapter 10 is about metadata

  • Chapter 11 is about reports

  • Chapter 12 is about OLAP cubes

  • Chapter 13 is about BI (Business Intelligence)

  • Chapter 14 is about using a data warehouse for CRM

  • Chapter 15 is about unstructured data and data warehousing search

  • Chapter 16 is about testing

  • Chapter 17 is about operation and administration


It contains all the essential topics in data warehousing. For this book to be usable for building the reader’s first data warehouse, and the BI on top of it, I needed to provide a case study: one with examples spanning all those chapters, from designing the architecture to building the cubes and reports. For this purpose I had to choose a platform, and I chose SQL Server. Not only does it have an excellent database engine, it also comes with ETL, reporting, OLAP cube and data mining tools built in. SQL Server 2005/2008 is a complete end-to-end data warehousing solution. So in chapter 6 I use the SQL Server database engine to create the databases. In chapters 7 and 8 I use SSIS for data extraction and data loading (ETL). In chapter 10 I use a SQL Server database for metadata. In chapter 11 I use SSRS for reports. In chapter 12 I use SSAS for OLAP cubes. And in chapter 13 I use SSAS for data mining. I hope this book will serve its purpose as a basic data warehousing book that is practical and aimed at beginners.

