Last time I talked briefly about lazy associations in Hibernate, and how they could be applied to minimize unnecessary database requests. We learned how Hibernate introduces and manages lazy associations, and how you can develop to ensure that the details of these lazy connections don't trip you up. Today I want to expand on those ideas, and learn how we can optimize the lazy fetching model. You should read this previous article: Hibernate: Understanding Lazy Fetching, otherwise today's tip won't make a dollar's worth of sense.
As I mentioned last time, the best bet whenever data is known ahead of time to be necessary is to use joins. Joining across tables (or really, in Hibernate's case, joining across objects) can dramatically improve performance in database selects. By joining, you can perform one select, as opposed to n+1, to get data from two tables. For those that aren't familiar, the n+1 selects come from the one select on the base table, and then one for each joining record in the next table. If you have ever put any timing on JDBC code, you have probably learned that the bulk of the time in database access is not, in fact, related to the amount of data, but rather to the entire processing sequence of preparing and sending the select statement itself, as well as the database processing each select individually 'in a vacuum'. Even though using a join will result in the same amount of data being brought back (assuming all columns are selected), the database only has to parse a single select statement, and in addition can (potentially) optimize based on the awareness of wanting to select from multiple tables.
While database joins are probably the optimal solution for performance whenever possible (and believe me, Hibernate can do joins, quite well I might add), it isn't always worth the complexity that may be required in your database code; the performance impact may be minimal given the context that you are working with, so you choose lazy fetching. However, it can still be beneficial to find ways to tune lazy fetching to kind of get the best of both worlds.
Say you have this database structure for a veterinarian's office:
*----------------*                   *-----------------*
|      pet       |                   |      owner      |
|----------------| *               1 |-----------------|
| - id           |-------------------| - id            |
| - name         |                   | - name          |
| - owner_id     |                   *-----------------*
*----------------*
Simple (overly simple perhaps), but it works for today's discussion.
Let's say you had a page that was to show all of the pets, as well as their owners' information. While it is true that this case is *begging* for a join to be performed, remember that we are trying to see how far we can get without having to ripple the knowledge required for joins all throughout our application. When using lazy fetching (in its default form), Hibernate will run n+1 selects to give us all of the pets and owners, where n is the number of pets. So, assuming we have 3 pets and 3 owners:
*--------------------------*
| id | name     | owner_id |
|----+----------+----------|
| 1  | Snoopy   | 2        |
| 2  | Garfield | 3        |
| 3  | Satchel  | 1        |
*--------------------------*

*---------------*
| id | name     |
|----+----------|
| 2  | Rick     |
| 3  | Matt     |
| 1  | R.J.     |
*---------------*
The selects that would be eventually fired by Hibernate would look like this:
-- get all of the pets first
select * from pet
-- get the owner for each pet returned
select * from owner where id=2
select * from owner where id=3
select * from owner where id=1
In fact, I have run this solution locally using this test class:
package org.javalobby.tnt.hibernate.lazy;

import java.util.List;

import org.hibernate.*;

import com.javalobby.tnt.hibernate.*;

public class LazyTest {
    public static void main(String[] args) {
        Session s = HibernateSupport.currentSession();
        try {
            // Loads only the pets; the owners are not fetched yet
            Query q = s.createQuery("from Pet");
            List<Pet> l = q.list();
            for (Pet p : l) {
                System.out.println("Pet: " + p.getName());
                // Touching the lazy owner is what triggers its select
                System.out.println("Owner: " + p.getOwner().getName());
            }
        } finally {
            HibernateSupport.closeSession(s);
        }
    }
}
...and here is the output with some silly data on my local test class scenario (sprinkled with my log statements so you can see the order and timing of the SQL execution):
Hibernate: select pet0_.id as id, pet0_.name as name0_, pet0_.owner_id as owner3_0_ from Pet pet0_
Pet: Snoopy
Hibernate: select owner0_.id as id0_, owner0_.name as name1_0_ from Owner owner0_ where owner0_.id=?
Owner: Rick
Pet: Garfield
Hibernate: select owner0_.id as id0_, owner0_.name as name1_0_ from Owner owner0_ where owner0_.id=?
Owner: Matt
Pet: Satchel
Hibernate: select owner0_.id as id0_, owner0_.name as name1_0_ from Owner owner0_ where owner0_.id=?
Owner: R.J.
This is 4 (3+1, n=3) select statements. This is certainly not optimal. The biggest problem is that this application isn't going to scale. Before you know it, you'll have fifty registered pets, and you're executing fifty-one select statements, taking up a very noticeable amount of time. Wouldn't it be nice if we could do something more like this:
-- get all of the pets first
select * from pet
-- get all owners in a single select
select * from owner where id in (1, 2, 3)
Now we only have two selects, and the second one can scale much better than linearly. This is great; but how can we achieve this through Hibernate? Cases like this are often the scenarios that people attack O/R mappers over, saying they aren't smart enough and flexible enough to meet the performance demands. It turns out Hibernate provides all kinds of options in this case.
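To make the two-select idea concrete, here is a small sketch of how that second statement could be put together by hand in plain Java. The class and method names (InClauseSketch, ownersByIdSql, selectsPerRow) are made up for illustration and are not part of Hibernate; I am also assuming the lookup is keyed on the owner's id.

```java
import java.util.Collections;

public class InClauseSketch {

    // Hypothetical helper: builds one select that fetches idCount owners at once.
    public static String ownersByIdSql(int idCount) {
        if (idCount < 1) {
            throw new IllegalArgumentException("need at least one id");
        }
        // One '?' placeholder per owner id, e.g. "?, ?, ?" for three ids
        String placeholders = String.join(", ", Collections.nCopies(idCount, "?"));
        return "select * from owner where id in (" + placeholders + ")";
    }

    // Statements issued when every pet triggers its own owner lookup: 1 + n.
    public static int selectsPerRow(int pets) {
        return 1 + pets;
    }

    public static void main(String[] args) {
        System.out.println(ownersByIdSql(3));
        System.out.println(selectsPerRow(50) + " selects one row at a time, versus 2 with a single 'in' lookup");
    }
}
```

The point of the sketch is only the shape of the SQL: the number of statements stays at two no matter how many pets come back, as long as all of the ids fit into one 'in' clause.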
Batching Selects
The way to tell Hibernate to use the latter solution is to tell it that a certain class is batch-able. You do this by adding the batch-size attribute to either a.) the entity definition for the association being fetched (e.g. the definition for the Owner class) or b.) the collection definition on a class with a collection mapping. Here is the mapping declaration for the example above:
<class name="Owner" ... batch-size="50">
Note the batch size, which I have manually set to fifty. The batch size is the number of sub-elements that will be loaded at one time (the number of parameters to the 'in' clause of the SQL). If you set this number to 10, for instance, and you had 34 records to load the association for, it would load ten, ten, ten, and then four: four association selects which, together with the original select, make 5 total select statements.
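If the arithmetic is easier to follow in code, here is a rough sketch of it. This only models the counting described above; BatchPlanSketch, batchSizes and totalSelects are names I invented for the example, and Hibernate's own loader may well pick the chunk boundaries differently depending on version and configuration.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchPlanSketch {

    // Sizes of the successive 'in (...)' lookups needed to load 'pending' associations.
    public static List<Integer> batchSizes(int pending, int batchSize) {
        if (pending < 0 || batchSize < 1) {
            throw new IllegalArgumentException("pending must be >= 0 and batchSize >= 1");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = pending; remaining > 0; remaining -= batchSize) {
            // A full batch while enough records remain, then whatever is left over
            sizes.add(Math.min(remaining, batchSize));
        }
        return sizes;
    }

    // The original select on the base table plus one select per batch.
    public static int totalSelects(int pending, int batchSize) {
        return 1 + batchSizes(pending, batchSize).size();
    }

    public static void main(String[] args) {
        System.out.println(batchSizes(34, 10));   // [10, 10, 10, 4]
        System.out.println(totalSelects(34, 10)); // 5
    }
}
```

With a batch size of 50 and only three pets, the same calculation gives a single batch of three, which lines up with the single 'in (?, ?, ?)' select in the output.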
Here is the finished SQL emitted by Hibernate (sprinkled with my log statements so you can see when they were triggered again):
Hibernate: select pet0_.id as id, pet0_.name as name0_, pet0_.owner_id as owner3_0_ from Pet pet0_
Pet: Snoopy
Hibernate: select owner0_.id as id0_, owner0_.name as name1_0_ from Owner owner0_ where owner0_.id in (?, ?, ?)
Owner: Rick
Pet: Garfield
Owner: Matt
Pet: Satchel
Owner: R.J.
Let's say now that this example gets turned on its head, and we want to look at owners rather than pets. Owners (as our diagram above implies) are allowed to have multiple pets. We want to be able to select all owners, and then iterate over each of their pets. Let's see what Hibernate does in this scenario. Here is our new class:
package org.javalobby.tnt.hibernate.lazy;

import java.util.List;

import org.hibernate.*;

import com.javalobby.tnt.hibernate.*;

public class LazyTest {
    public static void main(String[] args) {
        Session s = HibernateSupport.currentSession();
        try {
            // Loads only the owners; each pets collection is still uninitialized
            Query q = s.createQuery("from Owner");
            List<Owner> l = q.list();
            for (Owner owner : l) {
                System.out.println("Owner: " + owner.getName());
                // Iterating the lazy collection is what triggers its select
                for (Pet pet : owner.getPets()) {
                    System.out.println("\tPet: " + pet.getName());
                }
            }
        } finally {
            HibernateSupport.closeSession(s);
        }
    }
}
Here is our new mapping declaration:
... here is some additional data just to exercise the one-to-many relationship:
*--------------------------*
| id | name     | owner_id |
|----+----------+----------|
| 1  | Snoopy   | 2        |
| 2  | Garfield | 3        |
| 3  | Satchel  | 1        |
| 4  | Bucky    | 1        |
| 5  | Odie     | 3        |
*--------------------------*

*---------------*
| id | name     |
|----+----------|
| 2  | Rick     |
| 3  | Matt     |
| 1  | R.J.     |
*---------------*
... and here is the output:
Hibernate: select owner0_.id as id, owner0_.name as name1_ from Owner owner0_
Owner: R.J.
Hibernate: select pets0_.owner_id as owner3___, pets0_.id as id__, pets0_.id as id0_, pets0_.name as name0_0_, pets0_.owner_id as owner3_0_0_ from Pet pets0_ where pets0_.owner_id=?
    Pet: Satchel
    Pet: Bucky
Owner: Rick
Hibernate: select pets0_.owner_id as owner3___, pets0_.id as id__, pets0_.id as id0_, pets0_.name as name0_0_, pets0_.owner_id as owner3_0_0_ from Pet pets0_ where pets0_.owner_id=?
    Pet: Snoopy
Owner: Matt
Hibernate: select pets0_.owner_id as owner3___, pets0_.id as id__, pets0_.id as id0_, pets0_.name as name0_0_, pets0_.owner_id as owner3_0_0_ from Pet pets0_ where pets0_.owner_id=?
    Pet: Garfield
    Pet: Odie
As we can see, we are back to a slow linear situation - it is running a select for each owner it gets back; that's really not optimal. Thankfully, collections can be batched as well - here is our new mapping declaration:
batch-size="50" >
... and here is our new output:
Hibernate: select owner0_.id as id, owner0_.name as name1_ from Owner owner0_
Owner: R.J.
Hibernate: select pets0_.owner_id as owner3___, pets0_.id as id__, pets0_.id as id0_, pets0_.name as name0_0_, pets0_.owner_id as owner3_0_0_ from Pet pets0_ where pets0_.owner_id in (?, ?, ?)
    Pet: Bucky
    Pet: Satchel
Owner: Rick
    Pet: Snoopy
Owner: Matt
    Pet: Garfield
    Pet: Odie
Much better! Keep in mind that the 'batch-size' parameter has *no* bearing on how many elements inside the collection are loaded. Instead, it defines how many collections should be loaded in a single select. No matter what setting you provide, it will always retrieve 'Bucky and Satchel' in a single select statement as defined above, because they are part of the same collection. I repeat - batch size in collections defines *how many collections* will be retrieved at once.
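Since this is the part that trips people up, here is a toy model of it using the sample data from above. It is not Hibernate code: CollectionBatchSketch and collectionBatches are invented names, and the batch size of 2 is chosen only so that the three owners split into more than one select.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CollectionBatchSketch {

    // Groups owner ids into the sets whose pet collections would share one select.
    public static List<List<Integer>> collectionBatches(List<Integer> ownerIds, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<Integer>> batches = new ArrayList<>();
        for (int i = 0; i < ownerIds.size(); i += batchSize) {
            int end = Math.min(i + batchSize, ownerIds.size());
            batches.add(new ArrayList<>(ownerIds.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Sample data from the tables above: owner id -> names of that owner's pets
        Map<Integer, List<String>> petsByOwner = new LinkedHashMap<>();
        petsByOwner.put(1, List.of("Satchel", "Bucky"));
        petsByOwner.put(2, List.of("Snoopy"));
        petsByOwner.put(3, List.of("Garfield", "Odie"));

        List<Integer> ownerIds = new ArrayList<>(petsByOwner.keySet());
        for (List<Integer> batch : collectionBatches(ownerIds, 2)) {
            // One select per batch of owners; every pet of those owners comes along
            List<String> loaded = new ArrayList<>();
            for (Integer ownerId : batch) {
                loaded.addAll(petsByOwner.get(ownerId));
            }
            System.out.println("owner_id in " + batch + " loads " + loaded);
        }
    }
}
```

Notice that the batch size limits how many owners go into each 'in' clause, never how many pets come back: both of R.J.'s pets arrive in the first select regardless.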
Subselect Selection
The last form of fetching I want to cover is subselect fetching. Subselect fetching is very similar to batch size controlled fetching, which I just described, but takes the 'numerical complications' out of the equation. Subselect fetching is actually a different type of fetching strategy that is applied to collection style associations. Unlike join style fetching, however, subselect fetching is still compatible with lazy associations. The difference is that subselect fetching just gets "the whole shootin' match" as a co-worker of mine would say, rather than just a batch. In other words, it uses subselect execution to pass the ID set of the main entity set into the select off of the association table:
select * from owner
select * from pet where owner_id in (select id from owner)
This is very similar to the previous examples, but all of the burden is now put on the database; and the batch size is effectively infinity.
Here is the new mapping declaration:
... and here is the output:
Hibernate: select owner0_.id as id, owner0_.name as name1_ from Owner owner0_
Owner: R.J.
Hibernate: select pets0_.owner_id as owner3_1_, pets0_.id as id1_, pets0_.id as id0_, pets0_.name as name0_0_, pets0_.owner_id as owner3_0_0_ from Pet pets0_ where pets0_.owner_id in (select owner0_.id from Owner owner0_)
    Pet: Satchel
    Pet: Bucky
Owner: Rick
    Pet: Snoopy
Owner: Matt
    Pet: Garfield
    Pet: Odie
Not too shabby! As you can see, even without explicitly using joins, Hibernate is able to optimize our query set quite well. Note, however, that subselect fetching is only available when processing a collection style association, and not for single-point associations.
Lazy fetching, while usually not as performant as joins, can be optimized quite well, and potentially allows for more reusability and expressiveness in your application code.