Data Model Guideline 数据模型指南
The Data Model captures the design of the persistent data stores used by the system. This guideline introduces Data Modeling and its use within RUP.
Data Models are used to design the structure of the persistent data stores used by the system. The Unified Modeling Language (UML) profile for database design provides database designers with a set of modeling elements that can be used to develop the detailed design of tables in the database and model the physical storage layout of the database. The UML database profile also provides constructs for modeling referential integrity (constraints and triggers), as well as stored procedures used to manage access to the database.
Data Models might be constructed at the enterprise, departmental, or individual application level. Enterprise and departmental level Data Models can be used to provide standard definitions for key business entities (such as customer and employee) that will be used by all applications within a business or a business unit. These types of Data Models can also be used to define which system in the enterprise is the "owner" of the data for a specific business entity and what other systems are users of (subscribers to) the data.
This guideline describes the model elements of the UML profile for database modeling used to construct a Data Model for a relational database. Because there are numerous existing publications on general database theory, it does not cover this area. For background information on relational Data Models and Object Models see Concept: Relational Databases and Object Orientation.
Note: The data modeling representations contained in this guideline are based on the UML 1.3. At the time that this guideline was developed, the UML 1.4 data-modeling profile was not available.
As described in [NBG01 ], there are three general stages in the development of a Data Model: conceptual, logical, and physical. These stages of data modeling reflect the different levels of detail in the design of the persistent data storage and retrieval mechanisms of the application. A discussion of conceptual data modeling is provided in Concepts Conceptual Data Modeling . Summaries of logical and physical data modeling are provided in the next two sections of this guideline.
In logical data modeling, the Database Designer is concerned with identifying the key entities and relationships that capture the critical information that the application needs to persist in the database. During the Use-Case Analysis , Use-Case Design , and Class Design tasks, the Database Designer and the Designer must work together to ensure that the evolving designs of the analysis and design classes for the application will adequately support the development of the database. During the Class Design task, the database designer and the designer must identify the set of classes in the Design Model that will need to persist data in the database.
This set of persistent classes in the Design Model provides a Design Model View that, although different from the traditional Logical Data Model, meets many of the same needs. The persistent classes used in the Design Model function in the same manner as the traditional entities in the Logical Data Model. These design classes accurately reflect the data that must be persisted, including all of the data columns (attributes) that must be persisted and key relationships. This makes these design classes an excellent starting point for the physical database design.
Creating a separate Logical Data Model is an option. However, in the best case it would end up capturing the same information in a different form. In the worst case it would not, and thus in the end might not meet the business needs of the application. In particular, if the database is intended to service a single application, then the application's view of the data might be the best starting point. The database designer creates tables from this set of persistent design classes to form an initial Physical Data Model.
Still, situations might exist that would require the database designer to create an idealized design of the database that is independent from the application design. In this case, the logical database design is represented in a separate Logical Data Model that is part of the overall Artifact: Data Model . This Logical Data Model depicts the key logical entities and their relationships that are necessary to satisfy the system requirements for persisting data consistent with the overall architecture of the application. The Logical Data Model might be constructed using the modeling elements of the UML profile for database design described in later sections of this guideline. For projects that use this approach, close collaboration between the application designers and the database designers is absolutely critical to the successful development of the database design.
The Logical Data Model might be refined by applying the standard rules for normalization as defined in Concept: Normalization prior to evolving the elements of the Logical Data Model to create the physical design of the database.
The figure below depicts the primary approach of using the Design Model classes as the source of logical database design information for creating an initial Physical Data Model. It also illustrates the alternative approach of using a separate Logical Data Model.
Logical Data Modeling Approaches
Physical data modeling is the final stage of development in the design of the database. The Physical Data Model consists of the detailed database table designs and their relationships created initially from the persistent design classes and their relationships. The mechanics of performing the transformation of the Design Model classes to tables is discussed in Guideline: Forward-Engineering Relational Databases . The Physical Data Model is part of the Data Model ; it is not a separate artifact.
The tables in the Physical Data Model have well-defined columns, as well as keys and indexes as needed. The tables might also have triggers defined as necessary to support the database functionality and referential integrity of the system. In addition to the tables, stored procedures have been created, documented, and associated with the database in which the stored procedure will reside.
The diagram below shows an example of some of the elements of the Physical Data Model. This example model is a part of the Physical Data Model of a fictional online auction application. It depicts four tables (Auction, Bid, Item, and AuctionCategory), along with one stored procedure (sp_Auction) and its container class (AuctionManagement). The figure also depicts the columns of each table, the primary key and foreign key constraints, and the indexes defined for the tables.
Example (Physical) Data Model Elements
The Physical Data Model also contains mappings of the tables to physical storage units (tablespaces) in the database. The figure below shows an example of this mapping. In this example, the tables Auction and OrderStatus are mapped to a tablespace called PRIMARY. The diagram also illustrates modeling the realization of the tables to the database (named PearlCircle in this example).
Example Data Storage Model Elements
On projects in which a database already exists, the database designer can reverse-engineer the existing database to populate the Physical Data Model. See Guideline: Reverse-engineering Relational Databases for more information.
This section describes the general modeling guidelines for each major element of the Data Model based on the UML profile for database modeling. A brief description of each model element is followed by an example illustration of the UML model element. The Relationships section of this guideline includes a description of the usage of the model elements.
Standard UML packages are used to group and organize elements of the Data Model. For example, packages might be defined to organize the Data Model into separate Logical and Physical Data Models. Packages might also be used to identify logically related groups of tables in the Data Model that constitute the major data "subject areas" of importance to the business domain of the application being developed. The figure below shows an example of two subject area packages (Auction Management and UserAccount Management) used to organize views and tables in the Data Model.
Subject Area Packages Example
In the UML profile for database modeling, a table is modeled as a class with a stereotype of < Table Example Tables might be related to other tables through the following types of relationships: identifying (composite aggregation) non-identifying (association) The Relationships section of this guideline provides examples of how these relationships are used. Information on how these types of relationships can be mapped to Design Model elements appears in Guideline: Reverse-engineering Relational Databases . A trigger is a procedural function designed to run as a result of some action on the table in which the trigger resides. A trigger is defined to execute when a row in the table is inserted, updated, or deleted. Additionally, a trigger is designated to execute either before or after the table command executes. Triggers are defined as operations in a table. The operations are stereotyped < Trigger Example Indexes are used as mechanisms for enabling faster access of information when specific columns are used to search the table. An index is modeled as an operation in the table with a stereotype of < Index Example A view is a virtual table with no independent persistent storage. A view has the characteristics and behaviors of a table and accesses the data in the columns from the table(s) with which the view has defined relationships. Views are used for providing more efficient access to information in one or more tables and also can be used to enforce business rules for restricting access to data in the tables. In the example below, an AuctionView has been defined as a "view" of information in the Auction table shown in the physical data modeling section of this guideline. Views are modeled as classes with the stereotype of < View Example A domain is a mechanism used to create user-defined datatypes that can be applied to columns across multiple tables. A domain is modeled as a class with the stereotype < Domain Example A stored procedure container is a grouping of stored procedures within the Data Model. A stored procedure container is created as a UML class that is stereotyped < A stored procedure is an independent procedure that typically resides on the database server. Stored procedures are documented as operations that are grouped into classes stereotyped as < Stored Procedure Container and Stored Procedure Example A tablespace represents the amount of storage space to be allocated to such items as tables, stored procedures and indexes. Tablespaces are linked to a specific database through a dependency relationship. The number of tablespaces and how the individual tables will be mapped to them depends on the complexity of the Data Model. Tables that will be accessed frequently might need to be partitioned into multiple tablespaces. Tables that do not contain large amounts of frequently accessed data might be grouped into a single tablespace. A tablespace container is defined for each tablespace. The tablespace container is the physical storage device for the tablespace. Although multiple tablespace containers can exist for a single tablespace, it is recommended that a tablespace container be assigned to only a single tablespace. Tablespace containers are defined as attributes to the tablespace; they are not explicitly modeled. Tablespace Example A schema documents the organization or structure of the database. A schema is represented as a package that is stereotyped < Schema Example A database is a collection of data that is organized such that the information in it can be accessed and managed. The management and access of information in the database is performed through the use of a commercial database management system (DBMS). A database is represented in the Data Model as a component that is stereotyped < Database Example The UML profile for database modeling defines the valid relationships between the major elements of the Data Model. The following sections provide examples of the different relationship types. A non-identifying relationship is a relationship between two tables that independently exist within the database. A non-identifying relationship is documented by using an association between the tables. The association is stereotyped < Non-Identifying Relationship Example An identifying relationship is a relationship between two tables in which the child table must coexist with the parent table. An identifying relationship is documented by using a composite aggregation between two tables. The composite aggregation is stereotyped as < Identifying Relationship Example For both the association and composite aggregation, multiplicity should be defined to document the number of rows in the relationship. In the example above, for each row in the UserAccount table, there can be 0 or more CreditCard rows in the CreditCard table. For each row in the CreditCard table, there is exactly one row in the UserAccount table. Multiplicity is also known as cardinality. When defining a database view's relationship with a table, a dependency relationship is used, drawn from the view to the table. The stereotype of the dependency is < View and Table Dependency Relationship Example A dependency relationship is used to link a tablespace to a specific database. As shown in the figure below, the relationship is drawn to show that the database has the dependency on the tablespace. Multiple tablespaces can be related to a single database in the model. Tablespace and Database Dependency Relationship Example A dependency relationship is used to document the relationships between tablespaces and the tables within a tablespace. One or many tables can be related to a single tablespace, and a single table can be related to multiple tablespaces. The example below shows that the table Auction is assigned to a single tablespace named PRIMARY. Table and Tablespace Dependency Relationship Example Realizations are used to establish the relationship between a database and the tables that exist within it. A table can be realized by multiple databases in the Data Model. Table and Database Realization Relationship Example A dependency relationship is used to document the relationship between the stored procedure container and the tables that the stored procedures within the stored procedure containers act upon. The example below depicts this type of relationship by showing that the stored procedure SP_Auction will be used to access information in the Auction table. Stored Procedure Container and Table Dependency Relationship Example In the inception phase , initial data modeling tasks might be performed in conjunction with the development of any proof-of-concept prototypes as part of the"Perform architectural synthesis activity" tasks. On projects in which a database already exists, the database designer might reverse-engineer the existing database to develop an initial Physical Data Model based on the structure of the existing database. See Guideline: Reverse-engineering Relational Databases for more information. Elements of the Physical Data Model might be transformed into Design Model elements as needed to support any proof-of-concept prototyping tasks. The goal of the elaboration phase is to eliminate technical risk and to produce a stable (baselined) architecture for the system. In large-scale systems, poor performance resulting from a badly designed Data Model is a major architectural concern. As a result, both data modeling and the development of an architectural prototype that allows the performance of the database to be evaluated are essential to achieving a stable architecture. As the architecturally significant use cases are detailed and analyzed in each iteration, Data Model elements are defined based on the development of the persistent class designs from the use cases. As the class designs stabilize, the database designer might periodically transform the class designs into tables in the Data Model and define the appropriate data storage model elements. By the end of the elaboration phase, the major database structures (tables, indexes, and primary and foreign key columns) must be put in place to support the execution of the defined architecturally significant scenarios for the application. In addition, representative data volumes must be loaded into the database to support architectural performance testing. Based on the results of performance testing, the Data Model might need to be adjusted with optimization techniques, including but not limited to de-normalizing, optimizing physical storage attributes or distribution, and indexing. Major restructuring of the Data Model must not occur during the construction phase . Additional tables and data storage elements might be defined during the construction phase iterations based on the detailed design of the set of use cases and approved change requests allocated to the iteration. A primary focus of database design during the construction phase is to continually monitor the performance of the database and optimize the database design as needed through de-normalizing, defining indexes, creating database views, and other optimization techniques. The Physical Data Model is the design artifact that the database designer maintains during the construction phase. It can be maintained by either making direct updates in the model or as a result of a tool reading updates that have been made directly on the database. The Data Model, like the Design Model, is maintained during the transition phase in response to approved change requests. The database designer must keep the Data Model synchronized with the database as the application goes through final acceptance test and is deployed into production. If a development team is using modern visual modeling tools that have the ability to convert classes to tables (and vice versa) and/or has the ability to reverse and forward engineer databases, then the team needs to establish guidelines for managing the transformation and engineering processes. The guidelines are primarily needed for large projects in which a team is working in parallel on the database and application design. The development team must define the points in the development of the application (build/release cycle) at which it will be appropriate to perform the class-to-table transformations and to forward-engineer the database. Once the initial database is created, the development team must define guidelines for the team to manage the synchronization of the Data Model and database as the design and code of the system evolve throughout the project. >. The columns in the table are modeled as attributes with the stereotype of <
Trigger
Index
View
Domain
Stored Procedure Container
Stored Procedure
Tablespace
Schema
Database
Relationships
Non-Identifying
Identifying
Database Views
Tablespace
Realizations
Stored Procedures
Evolution of the Data Model
Inception Phase
Elaboration Phase
Construction Phase
Transition Phase
Round-trip Engineering Considerations
你可能感兴趣的:(IBM,SDP,应用生命周期管理(ALM),企业管理软件)