Agile Database Refactoring with Hibernate

Agile Database Refactoring with Hibernate
by Gilad Buzi, Kelley Glenn, Jonathan Novich
06/11/2007

Your data model was near perfect when your application was first written. Since then, it has evolved. You've hacked, you've denormalized, and, as a result, you've spent countless hours in meetings ranting about the fixes you need to put in place.

Yet, you're ambivalent. Despite your cogent arguments, you're loath to putting together the "change-all-your-data-all-at-once" plan. It's just too risky. There are countless applications that directly read from and write to your database--you can't change all of them at once! If only you could only fix your data model one piece at a time, and one application at a time.

It's a typical scenario, really. Over time, IT organizations at small, medium, and large enterprises create disparate applications that access vital data stored in a centralized database. And slowly, moderately ill-designed data models start dragging down performance, scalability, and the overall efficiency of an organization.

In this article, we will show readers how to upgrade their faulty schemas and data models without affecting existing applications or processes. By using the latest technology from Hibernate (version 3.0 and up)--along with a combination of database views, stored procedures, and standard design patterns--application developers and data architects can repair a faulty data model, one piece at a time.

Steps to Follow

Here's how we'll do it:

  1. Dream up an improved data model: Agree on what's wrong with the current model and how you could fix it
  2. Develop database views: Based on the current (faulty) model, these views reflect how you would like your data model to be
  3. Develop stored procedures or "instead of" triggers: These will replace a standard table insert
  4. Develop POJOs, Hibernate mappings, and DAOs: Represent your new data model and tie it to your views
  5. Test, test, test: Prove your data model right

Now let's dive into the details of each one of these steps. But first, let's present the example at hand.

Our example is an overly denormalized order system. Instead of dividing the orders into an ORDER table and an ORDER_ITEM table, the original data designer decided to put all order information into one table, CUST_ORDER. We'd love to split this table into two, but how?

Figure 1 shows the original design.


Figure 1. Our data model before the DMA solution

Ok, let's get fixin'!

Dream Up an Improved Data Model

We decided that we could really split this up fairly easily. It'd be great to achieve something like in Figure 2.

Agile Database Refactoring with Hibernate_第1张图片
Figure 2. Our data model including the views that improve the overall design

By dividing the order data into two tables, we avoid data repetition and have a generally more sustainable data model. But how can we arrive at this model given our existing data structure?

Communication is key. Although this has little to do with coding and testing, it is an important point. Make sure to involve all stakeholders in the new design of your data model. This includes developers working on other applications that access this data, database administrators who will have to maintain the new data model, and finally technical managers and technical business analysts who may have their own ideas about where the data model should go. We really can't say enough about how important communication is.

Develop Database Views

To achieve our desired structure, we can define database views on top of our existing schema that use the current data in our overly denormalized table(s). Our views, however, will present this data in a normalized way. The ORDER_V view is really just a grouped and simplified version of the CUST_ORDER table (removing specific order item information and grouping by the order_id). Here's the definition:

CREATE VIEW dma_example.order_v
AS select
dma_example.cust_order.order_id AS order_id,
dma_example.cust_order.order_cust AS order_cust,
max(dma_example.cust_order.order_date) AS order_date
from dma_example.cust_order
group by dma_example.cust_order.order_id;

The ORDER_ITEM_V view captures only the order item details, ignoring the customer id and the date (information that can be obtained from the ORDER_V view). Here's the ORDER_ITEM_V's definition:

CREATE VIEW dma_example.order_item_v
AS select
dma_example.cust_order.order_id AS oi_order,
dma_example.cust_order.order_item AS oi_item,
dma_example.cust_order.order_item_price AS oi_price,
dma_example.cust_order.order_item_qty AS oi_qty
from dma_example.cust_order
where (dma_example.cust_order.order_item is not null);

So what we've basically done is split one table into two.

Stored Procedures or INSTEAD OF Triggers

We now want to be able to treat our new views as if they were tables--inserting, updating, and deleting to our hearts' content without actually worrying about what is going on behind the scenes. Although some views may be directly updatable without any further intervention on the part of the database designer, our views are a little more complex, and we want to make sure we control exactly how the database will affect the underlying (CUST_ORDER) table. The best way to do this is to define code on the database that will execute every time we try to execute one of these CUD operations against our views.


In most databases (MS SQL Server, Sybase, Oracle, DB2), we can define INSTEAD OF triggers (PostreSQL uses "rules" that behave similarly) that will be responsible for inserting, updating, and deleting records from the underlying table from which the view is defined. MySQL, however, does not currently support INSTEAD OF triggers. In their place, we can create stored procedures and, through careful configuration of Hibernate mapping files, call these stored procedures every time a CUD operation is triggered in our code (and persisted by Hibernate). Be it stored procedures or instead of triggers, the code is very similar.

Since our example uses MySQL, we will demonstrate our solution using stored procedures.

The code

Our stored procedures for inserting, updating, and deleting into our denormalized table must take into account all aspects of the denormalization: repetitive rows, additional fields, superfluous values, etc. When we use these stored procedures, the data model we created with the definition of nice, normalized views is turned back into the flawed, denormalized structure. Why? Because the rest of our applications are expecting the data to be presented in this way. In addition, our view definitions rely on the data to exist in the current structure.

So what does one of these procedures look like? Here's an example of inserting an item to the order:

create procedure insert_order_item
(in itemprice FLOAT, in itemqty INT, in orderid INT, in itemid INT)
LANGUAGE SQL
BEGIN
DECLARE p_order_id INT;
DECLARE p_cust_id INT;
DECLARE max_order_id INT;
DECLARE p_itemprice FLOAT;
-- apply the current price to the line item
if itemprice is null then
select prod_price into p_itemprice from product where prod_id=itemid;
else
set p_itemprice = itemprice;
end if;
-- get the customer id.
select order_cust into p_cust_id
from cust_order where order_id=orderid limit 1;
insert into cust_order
(order_id, order_cust, order_date,
order_item, order_item_price, order_item_qty)
values
(orderid, p_cust_id, now(), itemid, p_itemprice, itemqty);
END

Notice that whatever data is usually missing from the ORDER_ITEM_V view has to be sought out and inserted in the underlying CUST_ORDER table. This procedure, if successful in inserting into the CUST_ORDER table, will return the number of rows affected as 1. It is important to note that Hibernate expects either 1 or 0 as a result of these stored procedures, since it treats them as single rows in tables (even though they are really views). To ensure that this happens, we might have to throw little tricks into our stored procedures. For instance, the stored procedure to update an order may affect various rows in the CUST_ORDER table (one row for every order item). If we were to simply update all the rows with the given order ID, the rows' affected value returned would be greater than 1. Since this would present a problem for Hibernate, we use a small table and update it after the update to the CUST_ORDER table. This causes the stored procedure to return 1 as the number of affected rows (since the update we executed only affects one row). Here is what the stored procedure looks like:

create procedure update_order
(in ordercust INT, in orderdate DATETIME, in orderid INT)
LANGUAGE SQL
BEGIN
update cust_order set order_cust=ordercust,
order_date=orderdate
where order_id=orderid;
if row_count() > 0 then
update help_table set i=i+1;
end if;
END

POJOs, Hibernate Mappings, and DAOs

Creating the POJOs and Hibernate mappings for your new, view-based data model is fairly straightforward. There are, however, a couple of gotchas to keep in mind here.

Virtual foreign and primary keys

Although database views do not have foreign and primary keys, you should still map these in your solution's mapping file. This allows other developers to treat this new data model as if it were a true physical model. Furthermore, mapping these elements will ensure an almost seamless transition when you move on to a final solution based on real tables.

Overriding insert, update, and delete

When using stored procedures (you do not need to do this if your solution is implementing instead of triggers), you must override the insert, update, and delete calls with calls to your stored procedures. This is done by adding <sql-insert>, <sql-update> and <sql-delete> elements to the mapping. These elements tell Hibernate to call the given procedures instead of inserting, updating, and deleting directly to the database. Here is the ORDER_V mapping:

<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
"http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping>
<class name="org.onjava.shared.vo.Order" table="order_v" catalog="dma_example">
<id name="orderId" type="int" column="order_id" />
<property name="orderCust" type="int" column="order_cust" />
<property name="orderDate" type="timestamp" column="order_date" length="19"/>
<set name="items" inverse="true" cascade="all-delete-orphan" lazy="true">
<key column="oi_order"/>
<one-to-many class="org.onjava.shared.vo.OrderItem" />
</set>
<sql-insert callable="true">{call insert_order(?, ?, ?)}</sql-insert>
<sql-update callable="true">{call update_order(?, ?, ?)}</sql-update>
<sql-delete callable="true">{call delete_order(?)}</sql-delete>
</class>
</hibernate-mapping>

The parameter order is important here. Refer to the custom SQL reference of the Hibernate manual to determine the parameter order in your stored procedures.

Data access objects

Once the right mapping is in place, the data access objects for the view-based data model are identical to table-based models. Hibernate takes care of executing the stored procedures and treats the views much like tables. See this article's sample DMA solution for complete data access classes for the ORDER_V and ORDER_ITEM_V views.


Test, Test, Test

Extensive testing is one of the most important activities during the creation of a DMA solution. Only thorough testing can ensure a correctly functioning view-based (logical) data model. All aspects of the new data model must be explored in tests. And, of course, it is imperative to test both working cases and failing cases.


A great aid in automating testing is DBUnit. Although we won't go into great detail on how DBUnit works (a great OnJava article by Andrew Glover already does that) a couple of important pointers should be noted:

  • Data diversity: Make sure your test data clearly reflects all sorts of different data scenarios, including foreign key relationships and null values.
  • Dataset size: Although it is important to maintain a large enough data set to support all of your tests, keep in mind that DBUnit empties and reloads your data during every test method. Large data sets may lead to slow testing.

As far as the tests themselves, look at them as a way to exercise your DAOs and value objects. Here are some of the kinds of tests we recommend. For a closer look at the implementation, look at the sample code included with this article.

  • Find All: Make sure the DAO returns the expected number of rows.
  • Find one: Look up a record using the virtual primary key and make sure the correct record is returned with all the expected (column) values in place.
  • Insert: Insert a record and verify it has been inserted.
  • Insert multiple records: Make sure inserting is working for more than one skill at a time.
  • Insert duplicates: Attempt to violate the virtual primary key constraint.
  • Delete: Delete a record and verify that it has really been deleted.
  • Delete multiples: Do so only if your DAO design supports doing this.
  • Update: Update and make sure the update has been persisted to the DB.
  • Violate constraints: Make sure all possible virtual constraint conditions are tested.
  • Optimistic locking: There are several ways to generate optimistic locking exceptions. All of these should be attempted in order to verify the correct functioning of the optimistic locking exception mechanism. Four different kinds of optimistic locking conditions should be tested:
    • Delete a record when a record has already been deleted.
    • Delete a record when it has been updated since the last fetch (only possible if you are maintaining a version column).
    • Update a record that has been deleted.
    • Update a record that has been updated since the last fetch (only possible if you are maintaining a version column).

Once you have completed all of these tests, you can be confident that your new data model is fairly bomb-proof.

Peaceful Coexistence

Now that you have upgraded your application to use a sane data model, keep in mind that other applications will be accessing the same data using slightly different points of contact. This shouldn't worry you, it's just something to keep in mind. Figure 3 demonstrates how your new and improved application lives peacefully alongside the existing legacy applications.


Figure 3. A legacy application and the DMA solution coexisting peacefully and manipulating the same data set albeit through different models

Now What? Migration to a Permanent Data Model

So you've implemented this fancy solution to fix your data model. As the months go by, developers update their applications and begin to use this new view-based data model. But the underlying denormalization (or whatever faulty design exists) is still there, and you want to get rid of it. But how? It's actually simpler than you might think. Here is the step-by-step guide:

  1. Develop tables: These will look much like your views, but will have real foreign keys and indexes. Make sure to maintain the same column names as the views themselves.
  2. Load tables: You will load the tables with data from your already existing views. Select from view into table. It's really that simple.
  3. Modify your mappings: Change your Hibernate mappings to reflect the table names instead of the views and, if you used stored procedures, get rid of the <sql-insert>, <sql-update> and <sql-delete> elements (since now you won't need them).
  4. Test, test, test: Your original tests should work with absolutely no modification. Run them over and over again to prove that.

Voila! If you've done things right, not a single line of Java code needs to be modified, and your applications will behave exactly the same. This is where the true beauty of this kind of solution is evident. By abstracting the data model through Hibernate and database procedures, you can achieve an impressive change with little effort. Of course, this doesn't mean that you shouldn't retest everything thoroughly--the good news is that your tests are still totally valid as well (if you are using XML data sets, make sure to replace the view name with the table name).

Conclusion

Using some of the latest and greatest Hibernate technologies, Java testing methodologies, and smart use of your databases resources, we have shown you that iterative change is possible. What is magical about solving data model problems this way is that the solution is mutually inclusive. This means that while you have solved the problem for yourself (in your application), other applications accessing the same data can continue to operate fine until they wise up and jump on your corrected bandwagon. It's a really friendly approach to data model migration.

As a final note, we would like to remind you to keep the following in mind when implementing your solution:

  • Zero impact: DMA solutions must not affect current applications or processes. This means that while records are modified in the new (virtual) data model, the existing set of data (in the faulty data model) must reflect the change.
  • Maintain functionality: While a new data model may eliminate the importance of certain data columns, it is important to simultaneously maintain the old behavior. Calculated fields are a perfect example of this. If a legacy application calculates a field before insertion, but the new data model doesn't even include the field, the DMA solution must respect the old behavior and calculate (and insert) the field.
  • Testing for complete confidence: DMA solutions that are created must be thoroughly tested in order to provide confidence that the new schema is as solid as a direct database schema is. The DMA pattern may not be a straightforward mapping of tables and columns, and is therefore vulnerable to bugs. The challenge then, is in testing the solution and making sure it adheres to complete ACID principles.

Resources

  • Sample code for this article
  • Hibernate documentation with further information about Hibernate
  • MySQL documentation to guide you in writing stored procedures for MySQL

Gilad Buzi has been involved in data driven application development for over ten years. He is currently a Principal Software Engineer with The Code Works Inc.

Kelley Glenn has worked in the software development industry for more than 10 years, with experience in telephony billing and enterprise application integration.

Jonathan Novich is co-founder and partner at The Code Works, Inc. with more than 10 years of experience in software consulting and development.

你可能感兴趣的:(Agile Database Refactoring with Hibernate)