Codd's 12 Rules
Dr. E.F. Codd, an IBM researcher, first developed the relational data model in 1970. In 1985, Dr. Codd published a list of 12 rules that concisely define an ideal relational database, which have provided a guideline for the design of all relational database systems ever since.
I use the term "guideline" because, to date, no commercial relational database system fully conforms to all 12 rules. They do represent the relational ideal, though. For a few years, scorecards were kept that rated each commercial product's conformity to Codd's rules. Today, the rules are not talked about as much but remain a goal for relational database design.
Following is a list of Codd's 12 rules, including his original name for each rule and a simplified description. I also have included a note where certain rules are problematic to implement. Don't worry if some of these items are confusing to you, as we move further through this newsletter series we will fill in the details.
Rule 1: The Information Rule
All data should be presented to the user in table form. Last week's newsletter already discussed the basics of this rule.
Rule 2: Guaranteed Access Rule
All data should be accessible without ambiguity. This can be accomplished through a combination of the table name, primary key, and column name.
Rule 3: Systematic Treatment of Null Values
A field should be allowed to remain empty. This involves the support of a null value, which is distinct from an empty string or a number with a value of zero. Of course, this can't apply to primary keys. In addition, most database implementations support the concept of a nun- null field constraint that prevents null values in a specific table column.
Rule 4: Dynamic On-Line Catalog Based on the Relational Model A relational database must provide access to its structure through the same tools that are used to access the data. This is usually accomplished by storing the structure definition within special system tables.
Rule 5: Comprehensive Data Sublanguage Rule
The database must support at least one clearly defined language that includes functionality for data definition, data manipulation, data integrity, and database transaction control. All commercial relational databases use forms of the standard SQL (Structured Query Language) as their supported comprehensive language.
Rule 6: View Updating Rule
Data can be presented to the user in different logical combinations, called views. Each view should support the same full range of data manipulation that direct-access to a table has available. In practice, providing update and delete access to logical views is difficult and is not fully supported by any current database.
Rule 7: High-level Insert, Update, and Delete
Data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.
Rule 8: Physical Data Independence
The user is isolated from the physical method of storing and retrieving information from the database. Changes can be made to the underlying architecture ( hardware, disk storage methods ) without affecting how the user accesses it.
Rule 9: Logical Data Independence
How a user views data should not change when the logical structure (tables structure) of the database changes. This rule is particularly difficult to satisfy. Most databases rely on strong ties between the user view of the data and the actual structure of the underlying tables.
Rule 10: Integrity Independence
The database language (like SQL) should support constraints on user input that maintain database integrity. This rule is not fully implemented by most major vendors. At a minimum, all databases do preserve two constraints through SQL. No component of a primary key can have a null value. (see rule 3)
If a foreign key is defined in one table, any value in it must exist as a primary key in another table.
Rule 11: Distribution Independence
A user should be totally unaware of whether or not the database is distributed (whether parts of the database exist in multiple locations). A variety of reasons make this rule difficult to implement; I will spend time addressing these reasons when we discuss distributed databases.
Rule 12: Nonsubversion Rule
There should be no way to modify the database structure other than through the multiple row database language (like SQL). Most databases today support administrative tools that allow some direct manipulation of the datastructure. Over the life of this newsletter, I will be expanding on the concepts covered by each of Codd's rules. I will use the relational query language of choice, SQL, to illustrate these concepts and explain relational database structure in detail.