In my last article, I introduced NHibernate, an open-source object/relational mapping framework newly available to .NET developers. I have been rightly criticized (both here and in other forums) for failing to emphasize that NHibernate is but one of many ORM solutions available; in fact, the number of open-source and commercial ORM frameworks for .NET currently exceeds 50. As I had to pick one, though, NHibernate was an obvious choice for me, since I’ve been doing so much work with Hibernate in my life as a consultant. I wanted to explore how that framework made the jump over to .NET and talk about its benefits.
In this follow-up article, I want to introduce some more advanced techniques for using NHibernate. Last time, I just walked through getting it set up and introduced a simple application as a demonstration (the University Registration application). This month, the goal is to give you some strategies for making NHibernate work with real-world constraints and requirements: high-throughput, high-concurrency applications with special data querying needs.
To do that, we’ll talk about four major areas: better session management, direct database queries using HQL, lazy loading of object collections and direct lifecycle management.
In my previous article, I showed a rather simple data manager class called RegMgr that implemented a very straightforward version of SessionFactory and Session management. Namely, the class created a SessionFactory in the constructor from configuration settings and then used that SessionFactory to create a Session for each method. This strategy has several problems with it.
First, each instance of RegMgr requires a new SessionFactory to be built, which is slow and expensive. A SessionFactory is meant to model the entirety of a conversation between your application and a single database; the only reason to have more than one instance of SessionFactory is if your application communicates with more than one database. My example application does not, so this is needless overhead.
Second, every method has to construct a new Session from the factory. This provides the maximum concurrency protection, as every method finishes by closing the open Session and releasing all locks in the database. However, closing the Session also detaches every object it loaded, so any state that had not yet been fetched when it closed (a lazily loaded collection, for example) can no longer be loaded through it. Each new Session also carries some setup cost, though far less than building a SessionFactory.
The answer to the first problem is to make sure you only create your SessionFactory once. The original version of RegMgr looked like this:
public class RegMgr : DBMgr
{
    Configuration config;
    ISessionFactory factory;

    public RegMgr()
    {
        try
        {
            config = new Configuration();
            config.AddClass(typeof(nhRegistration.Department));
            config.AddClass(typeof(nhRegistration.Person));
            config.AddClass(typeof(nhRegistration.UniversityClass));
            factory = config.BuildSessionFactory();
        }
        catch (Exception ex)
        {
            // handle exception
        }
    }

    // etc...
}
The simple fix is to make the factory-related objects static and move the configuration into a static constructor.
public class RegMgr : DBMgr
{
    static Configuration config;
    static ISessionFactory factory;

    static RegMgr()
    {
        try
        {
            config = new Configuration();
            config.AddClass(typeof(nhRegistration.Department));
            config.AddClass(typeof(nhRegistration.Person));
            config.AddClass(typeof(nhRegistration.UniversityClass));
            factory = config.BuildSessionFactory();
        }
        catch (Exception ex)
        {
            // handle exception
        }
    }

    // etc...
}
Now, all methods of all instances of RegMgr share the same SessionFactory, saving your application lots of needless overhead.
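To see why moving the build into a static constructor fixes the first problem, here is a minimal, self-contained sketch that needs no NHibernate at all. `ExpensiveFactory` is a hypothetical stand-in for `ISessionFactory` and `DataManager` plays the role of `RegMgr`; the counter simply records how many times the expensive build step runs.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for ISessionFactory: counts how many times it is built.
public class ExpensiveFactory
{
    private static int buildCount;

    public static int BuildCount
    {
        get { return Volatile.Read(ref buildCount); }
    }

    public ExpensiveFactory()
    {
        // Simulates the slow work done by Configuration.BuildSessionFactory().
        Interlocked.Increment(ref buildCount);
    }
}

// Hypothetical stand-in for RegMgr: the factory lives in a static field.
public class DataManager
{
    private static readonly ExpensiveFactory factory;

    // The CLR runs a static constructor at most once per AppDomain, before the
    // first instance is created or any static member is used, and it handles
    // the thread safety of that one-time initialization.
    static DataManager()
    {
        factory = new ExpensiveFactory();
    }

    public ExpensiveFactory Factory
    {
        get { return factory; }
    }
}
```

However many `DataManager` instances you create, the build step runs once and every instance sees the same factory object. One caveat carries over to the real `RegMgr`: because its static constructor swallows exceptions, a failed build leaves `factory` null; letting the exception escape would instead surface as a `TypeInitializationException` the first time the class is used.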
This solution might look like it will work for Session management as well. Why not just add a static Session to the class, and create and open it just once in the static constructor? Then all the methods could use the same Session. There are two reasons not to. First, ISession is not thread-safe, and a statically shared Session could be accessed by multiple threads concurrently, interfering with transactional semantics. Second, a statically shared Session will spend most of its life sitting idle, and an idle Session still holds open a physical database connection. Database connections are expensive, scarce and valuable. Every data access strategy since server-side cursors has aimed to minimize the time an application holds a connection open, and NHibernate is no different. Therefore, keeping a static Session open and available is a poor strategy.
Instead, you can use disconnected Sessions. You can still create a shared Session object (either instance-scoped or truly static) for use in your methods. However, instead of leaving it open, or closing it in every method, you can use this pattern:
public IList getClasses()
{
    IList classes = null;
    ITransaction tx = null;
    try
    {
        // session is declared in instance scope
        session.Reconnect();
        tx = session.BeginTransaction();
        classes = session.CreateCriteria(typeof(UniversityClass)).List();
        tx.Commit();  // also flushes any pending changes
    }
    catch (Exception ex)
    {
        if (tx != null) tx.Rollback();
        // handle exception
    }
    finally
    {
        // Return the physical connection to the pool,
        // but keep the Session object for the next call.
        session.Disconnect();
    }
    return classes;
}
Using this pattern, you can reuse the same Session object while releasing the physical connection back to the pool between method calls, which maximizes the concurrency potential of your database while minimizing the overhead to your application.
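The heart of this pattern is that the Session object (with everything it has already loaded) outlives the physical connection it uses. The sketch below models that idea in plain C#. `ConnectionPool`, `ReusableSession` and `Manager` are hypothetical types invented for illustration; `Reconnect` and `Disconnect` are named after the ISession methods used above, but this is a model of the pattern, not NHibernate's implementation.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool of physical connections (stand-in for the ADO.NET pool).
public class ConnectionPool
{
    private int available;
    public ConnectionPool(int size) { available = size; }
    public int Available => available;

    public void Acquire()
    {
        if (available == 0) throw new InvalidOperationException("Pool exhausted");
        available--;
    }

    public void Release() { available++; }
}

// Hypothetical session: keeps its cached objects between calls,
// but holds a physical connection only while connected.
public class ReusableSession
{
    private readonly ConnectionPool pool;
    private readonly Dictionary<int, string> cache = new Dictionary<int, string>();
    public bool IsConnected { get; private set; }
    public int CachedCount => cache.Count;

    public ReusableSession(ConnectionPool pool) { this.pool = pool; }

    public void Reconnect()
    {
        if (IsConnected) return;
        pool.Acquire();
        IsConnected = true;
    }

    public void Disconnect()
    {
        if (!IsConnected) return;
        pool.Release();
        IsConnected = false;
    }

    public string Load(int id)
    {
        if (!IsConnected) throw new InvalidOperationException("Session is disconnected");
        string value;
        if (!cache.TryGetValue(id, out value))
        {
            value = "row " + id;   // pretend this came from the database
            cache[id] = value;
        }
        return value;
    }
}

// A data-manager method following the Reconnect / try / finally / Disconnect shape.
public static class Manager
{
    public static string Get(ReusableSession session, int id)
    {
        session.Reconnect();
        try
        {
            return session.Load(id);
        }
        finally
        {
            session.Disconnect();  // the connection goes back to the pool
        }
    }
}
```

With a pool of only one connection, successive calls still succeed because each call hands its connection back, while the session keeps the rows it has already loaded. Calling `Reconnect` before entering `try` also means `Disconnect` is only reached when a connection was actually obtained.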
Sometimes, you are going to need to make more specific requests of NHibernate than simply “give me all the Departments” or “give me this Student”. Your application will need to execute arbitrary database queries to provide sophisticated searching and filtering of your data. NHibernate provides a facility for this called HQL (the Hibernate Query Language), which looks a lot like SQL. The biggest difference between HQL and SQL can be seen in the FROM clause: instead of specifying one or more table names from which to pull data, in HQL you specify one or more classes. The point of NHibernate is to provide a transparent data layer via your domain objects, so it would not make sense to surface table names in your code; and because your domain objects are already mapped to tables, NHibernate doesn’t need to be told the table names again.
For example, in our University registration application, it would be nice to know which classes have been scheduled but do not yet have professors assigned to teach them (we’ll need to go round up some first-year grad students to teach those). We’ll create an entry point in our RegMgr interface for looking them up:
public IList getClassesNoProf() {}
We could implement this completely using what we already know about NHibernate:
IList classes = this.getClasses();
IList results = new ArrayList();
foreach (UniversityClass c in classes)
{
    if (c.Prof == null)
        results.Add(c);
}
return results;
We can piggyback on the existing method for retrieving all classes in the database, then iterate through them, returning only those that meet our criteria. While this works, it is unconscionably slow. We pull every row from the universityclasses table, instantiating an object for each, only to throw some, and probably most, of them away. The larger the table, the more egregiously bad this choice becomes.
Instead, we want to do the filtering in the database, which is a platform optimized for this kind of activity and minimizes the amount of bandwidth required to pipe the results back to our application (since the database is probably on its own server elsewhere on the network).
public IList getClassesNoProf()
{
    IList results = null;
    ISession session = null;
    try
    {
        session = factory.OpenSession();
        results = session.Find(
            "from nhRegistration.UniversityClass as uc where personid is null");
    }
    catch (Exception ex)
    {
        // handle exception
    }
    finally
    {
        // always release the session, even if the query fails
        if (session != null) session.Close();
    }
    return results;
}
The primary access point for executing an HQL query is the ISession.Find() method. Use this method to execute arbitrary queries. HQL supports the major SQL clauses that follow WHERE, such as ORDER BY and GROUP BY. To return the same list of UniversityClass instances but sorted by the name of the class, we would just change the query slightly:
results = session.Find("from nhRegistration.UniversityClass as uc where personid is null order by uc.Name asc");
Special note: look at the syntax of the ordering clause. We are sorting by the Name property of UniversityClass which, if you refer back to the mapping files from the last article, is mapped to a column named “classname”. When crafting the ordering clause, you can refer to this column either by the property name from your domain object or the column name from the database schema, but if you choose the property name (sticking with the domain-object-only style) you have to prefix the name of the property with the alias you gave to the class name (in this case, “uc”). If you choose to use the column name, you must leave the prefix out (“order by classname asc”). I prefer to use the class alias plus property name style, as it maintains the separation between my code and the database.
By leaving out the normal SQL SELECT clause at the beginning of the query, we have implicitly told NHibernate to return instances of the class whose name appears in the FROM clause. Sometimes you don’t need whole objects; in this case, all we really want to do is display the names of the classes with no professors as a warning to the administrators. Loading and initializing all those instances of UniversityClass would be overkill. Instead, just include a SELECT clause that names the property you are looking for; you will get back an IList containing instances of that property’s type. In our case, that is an IList of strings:
public IList getClassNamesNoProf()
{
    IList results = null;
    ISession session = null;
    try
    {
        session = factory.OpenSession();
        results = session.Find(
            "select uc.Name from nhRegistration.UniversityClass as uc " +
            "where personid is null order by uc.Name asc");
    }
    catch (Exception ex)
    {
        // handle exception
    }
    finally
    {
        if (session != null) session.Close();
    }
    return results;
}
Similarly, we can return aggregate scalar results. Instead of returning the names of the classes that are missing professors, we could just return the number of such classes.
results = session.Find("select count(uc.Name) from nhRegistration.UniversityClass as uc where personid is null");
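Find() returns an IList even for a projection or an aggregate, so the caller has to unwrap the values. Here is a minimal sketch, assuming `results` holds the IList returned by one of the two queries above: the names query yields one string per row, and the count query yields a single row holding one number. The numeric type of a `count()` result has varied between NHibernate versions (Int32 in some, Int64 in others), so the sketch converts with `Convert.ToInt64` rather than casting. `ProjectionResults` is a hypothetical helper class.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical helper for unwrapping HQL projection results.
public static class ProjectionResults
{
    // "select uc.Name ..." : each element of the list is one string.
    public static List<string> ToNames(IList results)
    {
        var names = new List<string>();
        foreach (object o in results)
        {
            names.Add((string)o);
        }
        return names;
    }

    // "select count(...) ..." : a single row containing one number.
    // Convert.ToInt64 accepts a boxed Int32 or Int64, so the caller does not
    // depend on which numeric type the query produced.
    public static long ToCount(IList results)
    {
        if (results == null || results.Count != 1)
            throw new ArgumentException("Expected exactly one aggregate row");
        return Convert.ToInt64(results[0]);
    }
}
```

For example, `ProjectionResults.ToCount(results)` works whether the count comes back boxed as an `int` or a `long`.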
As you can see, HQL closely mimics SQL in most ways, except that it refers to classes and properties where SQL refers to tables and columns. For most regular object persistence tasks, you can safely ignore HQL, but for more data-centric tasks that can be accomplished faster in the database, HQL provides the right toolset.
One of the best things about NHibernate is the way it makes loading related objects utterly transparent: you just ask for the parent object, and NHibernate knows enough to load all of its child objects. This is true whether the relationship is one-to-one or one-to-many. It could also be one of the worst things about NHibernate, if it were the only option.
The problem is that this default behavior, while making database usage completely transparent in your code, can make that usage anything but transparent at runtime: any object hierarchy of even moderate size will load extremely slowly because of all the relationships that have to be navigated.
Just look at the University Registration domain model as an example. We have only four classes, but loading a single one (an instance of Department) causes two separate lists of dependent objects to be loaded: a list of UniversityClasses and a list of Professors. Mind you, each UniversityClass also loads an instance of Professor and a further list of Students. With enough data in the database, our four-class model would already be unwieldy just loading a single Department instance. Now imagine the RegMgr.getDepartments() method, which loads every Department, and the full scope of the problem becomes manifest.
The solution is lazy loading of collections. Declaring a collection to be lazily initialized is simple: you need only add the attribute lazy="true" to the collection’s mapping element. For instance, to make the Classes collection of our Department be lazily loaded, our mapping file goes from this:
<set name="Classes" cascade="all">
  <key column="deptid"/>
  <one-to-many class="nhRegistration.UniversityClass, nhRegistration"/>
</set>
To this:
<set name="Classes" cascade="all" lazy="true">
  <key column="deptid"/>
  <one-to-many class="nhRegistration.UniversityClass, nhRegistration"/>
</set>
Lazy loading means that the elements in the collection are not populated until your code first accesses the collection, so the cost of loading them is deferred until it can no longer be avoided. Not only do you defer the resource consumption, you avoid it altogether if your execution path never touches the lazy collection. Perhaps the method you are working on only needs to access a Department’s classes if the Department has more than one Professor associated with it. For Departments that do not meet your criteria, you never load the dependent UniversityClass instances at all, which is an enormous time saving for your application (both at the client and at the database).
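The deferral described above can be modelled in plain C# with `Lazy<T>`. This is only an analogy: NHibernate implements lazy collections with its own collection wrapper classes, not `Lazy<T>`, and the `Department` class, `LoadClasses` method and `LoadCount` counter here are hypothetical. What the sketch shows is that the loader runs only when the collection is first touched, and never if it is not.

```csharp
using System;
using System.Collections.Generic;

public class Department
{
    // How many times the simulated query has run.
    public static int LoadCount;

    private readonly Lazy<List<string>> classes;

    public Department(string name)
    {
        Name = name;
        // The delegate is NOT invoked here; Lazy<T> runs it on first access to .Value.
        classes = new Lazy<List<string>>(() => LoadClasses(name));
    }

    public string Name { get; }
    public int ProfessorCount { get; set; }

    // First access triggers the load; later accesses reuse the loaded list.
    public IList<string> Classes => classes.Value;

    private static List<string> LoadClasses(string dept)
    {
        LoadCount++;   // stands in for a SELECT against the universityclasses table
        return new List<string> { dept + " 101", dept + " 201" };
    }
}
```

Iterating over two departments and reading `Classes` only where `ProfessorCount` is greater than one runs the loader once, for the department that qualifies.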
The downside to this powerful technique is that a lazy collection can only be loaded through the session that loaded its parent object, and that session must still be open when the collection is first accessed. If your code follows the pattern we have used for the University Registration application, namely separating all data management into a single data manager class, then you will only see the benefits of lazy loading inside that data manager class. Look at RegMgr.getDepartments() again:
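public IList getDepartments()
{
    IList depts = null;
    ISession session = null;
    try
    {
        session = factory.OpenSession();
        ITransaction tx = session.BeginTransaction();
        depts = session.CreateCriteria(typeof(Department)).List();
        tx.Commit();
    }
    catch (Exception ex)
    {
        // handle exceptions
    }
    finally
    {
        // Closing here is what makes any lazy collection on the
        // returned Departments impossible to load later.
        if (session != null) session.Close();
    }
    return depts;
}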
Immediately after we load the Departments, we close the session. With the Classes list marked as lazy, if you attempt to access that collection in your business code, you’ll get a LazyInitializationException. This means you can neither access any individual member of the collection, nor query any aggregate data, like the total number of members in the collection, because it doesn’t exist yet. And, since the session is closed, it never will, unless you attach the parent object to a different session (a different discussion altogether).
Therefore, you can either move any business logic that would benefit from lazy loading into the data manager, or you can choose not to close your session before returning from the method. I suggest that you do not follow the latter course unless you have spent a lot of time thinking about the implications, or at least bulletproofing your application through extensive exception handling code.
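The failure described above can also be modelled in plain C#. In this hypothetical sketch, a lazy collection remembers the session that loaded its owner and refuses to load once that session is closed. `FakeSession` and `LazyList` are illustrative names only, and `InvalidOperationException` stands in for NHibernate’s LazyInitializationException.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical session: only tracks whether it is still open.
public class FakeSession
{
    public bool IsOpen { get; private set; } = true;
    public void Close() { IsOpen = false; }
}

// Hypothetical lazy collection bound to the session that loaded its owner.
public class LazyList
{
    private readonly FakeSession owner;
    private readonly Func<List<string>> loader;
    private List<string> items;   // null until first initialized

    public LazyList(FakeSession owner, Func<List<string>> loader)
    {
        this.owner = owner;
        this.loader = loader;
    }

    // Even an aggregate such as Count needs the collection to be loaded.
    public int Count => Items.Count;

    private List<string> Items
    {
        get
        {
            if (items == null)
            {
                // Loading requires the original session; a closed one cannot help.
                if (!owner.IsOpen)
                    throw new InvalidOperationException(
                        "Failed to lazily initialize a collection: session is closed");
                items = loader();
            }
            return items;
        }
    }
}
```

A collection first touched while its session was open stays usable after `Close()`; one first touched after `Close()` cannot be loaded at all, which is the situation getDepartments() creates for its callers.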
There are four major moments in the life of a persistent object. That object can be loaded, updated, saved, or deleted. (Remember, save is what happens when a new object is persisted; update happens when you commit changes to an existing object.) Using the code we’ve written so far, the only way you can take action at these moments is in the methods of the data manager class that correspond to the different activities (getDepartment() or saveStudent(), for example). The problem is that these methods live in your data manager, and if they need to do anything substantial using the persistent object in question, you have more than likely created too tight a coupling between the persistent object and the data manager.
It would be much cleaner if your objects could handle these lifecycle moments on their own. NHibernate provides an interface for exactly this: implementing it tells NHibernate that instances of the class want to be notified of those lifecycle events. That interface is called ILifecycle, and it looks like this:
public interface ILifecycle
{
    NHibernate.LifecycleVeto OnUpdate(ISession s);
    void OnLoad(ISession s, object id);
    NHibernate.LifecycleVeto OnSave(ISession s);
    NHibernate.LifecycleVeto OnDelete(ISession s);
}
Three of the four methods give your object a chance to halt further processing of the given lifecycle event by returning a LifecycleVeto. Using this mechanism, you can interrogate the internal state of an object prior to saving, updating or deleting it, and if that state doesn’t meet some criteria of your application, you can veto the completion of the event. Returning the veto causes the rest of the event to fail silently, which means the object isn’t saved, updated or deleted, but no notification of that fact is available elsewhere in the application. Alternatively, you can throw an exception from one of these methods, which makes the operation fail very loudly.
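To make the veto semantics concrete, here is a plain C# model. The `LifecycleVeto` enum is defined locally, modelled on NHibernate’s type of the same name; `ISaveLifecycle`, `MiniSession` and `Student` are hypothetical types covering only the save event. The point is that a vetoed save simply does not happen, and the caller gets no signal unless it goes looking.

```csharp
using System;
using System.Collections.Generic;

// Modelled on NHibernate.LifecycleVeto; defined here so the sketch is self-contained.
public enum LifecycleVeto { Veto, NoVeto }

// Hypothetical, trimmed-down lifecycle contract (save event only).
public interface ISaveLifecycle
{
    LifecycleVeto OnSave();
}

public class Student : ISaveLifecycle
{
    public string Name { get; set; }

    // Class-specific rule: refuse to persist a student without a name.
    public LifecycleVeto OnSave() =>
        string.IsNullOrEmpty(Name) ? LifecycleVeto.Veto : LifecycleVeto.NoVeto;
}

// Hypothetical session that honours the veto as described in the text:
// a vetoed save is skipped silently, with no exception and no return value.
public class MiniSession
{
    public List<object> Persisted { get; } = new List<object>();

    public void Save(object entity)
    {
        var lifecycle = entity as ISaveLifecycle;
        if (lifecycle != null && lifecycle.OnSave() == LifecycleVeto.Veto)
            return;   // silently abandoned
        Persisted.Add(entity);
    }
}
```

Saving a `Student` with a name persists it; saving one with an empty name is dropped without any error. Throwing an exception from `OnSave` instead would propagate out of `Save()`, which is the loud alternative.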
The original intent of the three veto-able methods was to provide application programmers an alternate means of cascading events to dependent objects instead of relying on NHibernate to handle them for you (you get greater control over the details of what events get cascaded to which dependent objects). You should NOT use the OnLoad method to load dependent objects, though, as you can interfere with NHibernate’s default actions.
You might be tempted to use these lifecycle events for something like logging, or security. For instance, you could implement this interface on every persistent class in your application and, in each method, you could log the event and information about the object to a log file. The problem is that you are introducing a lot of repetitive code into your application. The ILifecycle interface is for class-specific behavior only. For true cross-cutting concerns, you should create an interceptor instead.
Interceptors are classes that implement the IInterceptor interface. They provide the same kinds of lifecycle methods (with some additions), but an interceptor’s methods are called for the corresponding lifecycle event on each and every domain object in the session. However, an interceptor can’t veto the events. Instead, it can modify the property values in the state array passed to the method and return a Boolean indicating whether or not it did. Plus, IInterceptor defines more events:
public interface IInterceptor
{
    int[] FindDirty(object entity, object id, object[] currentState,
        object[] previousState, string[] propertyNames,
        NHibernate.Type.IType[] types);
    object Instantiate(Type type, object id);
    bool OnFlushDirty(object entity, object id, object[] currentState,
        object[] previousState, string[] propertyNames,
        NHibernate.Type.IType[] types);
    object IsUnsaved(object entity);
    bool OnLoad(object entity, object id, object[] state,
        string[] propertyNames, NHibernate.Type.IType[] types);
    bool OnSave(object entity, object id, object[] state,
        string[] propertyNames, NHibernate.Type.IType[] types);
    void OnDelete(object entity, object id, object[] state,
        string[] propertyNames, NHibernate.Type.IType[] types);
    void PreFlush(System.Collections.ICollection entities);
    void PostFlush(System.Collections.ICollection entities);
}
For our University Registration application, I wanted to implement a consistent logging mechanism. To do that, I created a class called LoggingInterceptor, which writes out a message to the log whenever an object is loaded or persisted (other unused methods excluded for clarity).
public bool OnLoad(object entity, object id, object[] state,
    string[] propertyNames, NHibernate.Type.IType[] types)
{
    string msg = string.Format("Instance of {0}, ID: {1} loaded.",
        entity.GetType().Name, id);
    log(msg);
    return false;  // the state array was not modified
}

public bool OnSave(object entity, object id, object[] state,
    string[] propertyNames, NHibernate.Type.IType[] types)
{
    string msg = string.Format("Instance of {0}, ID: {1} saved.",
        entity.GetType().Name, id);
    log(msg);
    return false;  // the state array was not modified
}

private void log(string msg)
{
    // app-specific logging behavior
}
The last step to hooking this up is to specify the interceptor when creating the Session that loads the intercepted instances.
ISession session = factory.OpenSession(new LoggingInterceptor());
Every load and save performed through this session will now be logged by the interceptor, whatever the type of the domain object involved.
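LoggingInterceptor returns false because it never changes anything. For contrast, here is a sketch of an interceptor that does change state, and therefore returns true: it writes the current time into a hypothetical `LastModified` property whenever an entity that has one is saved. To stay self-contained it uses a hypothetical `ISimpleInterceptor` interface with a trimmed-down `OnSave` signature (no `NHibernate.Type.IType[]` parameter), so treat it as a model of the contract rather than a drop-in IInterceptor implementation.

```csharp
using System;

// Hypothetical, simplified version of the IInterceptor.OnSave hook.
public interface ISimpleInterceptor
{
    // Returns true if the state array was modified.
    bool OnSave(object entity, object id, object[] state, string[] propertyNames);
}

public class TimestampInterceptor : ISimpleInterceptor
{
    private readonly Func<DateTime> clock;

    public TimestampInterceptor(Func<DateTime> clock) { this.clock = clock; }

    public bool OnSave(object entity, object id, object[] state, string[] propertyNames)
    {
        // Called for every entity in the session, so act only on entities
        // that actually expose a LastModified property.
        int index = Array.IndexOf(propertyNames, "LastModified");
        if (index < 0)
            return false;   // nothing changed
        state[index] = clock();
        return true;        // tell the caller the state was modified
    }
}
```

Injecting the clock as a `Func<DateTime>` is a design choice for testability; a production version could simply read `DateTime.Now`.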
As you can see, there is much more to NHibernate than a few mapping files and a few calls to Session.Find(). NHibernate is enormously flexible. For small, simple applications, you can rely on NHibernate to manage the complexity and redundancy of data access code for you, without worrying much about the extended features I’ve covered here. As your application grows in size, complexity and user base, though, you will be happy to know that NHibernate provides features that allow you to customize almost everything about the way it manages your persistence. We have still barely scratched the surface of what is available.
Justin Gehtland is a founding member of Relevance, LLC, a consultant group dedicated to elevating the practice of software development. He is the co-author of Windows Forms Programming in Visual Basic .NET (Addison Wesley, 2003) and Effective Visual Basic (Addison Wesley, 2001). Justin is an industry speaker, and instructor with DevelopMentor in the .NET curriculum.