With this post, I’m going to try to explain why I recommend against using Dynamic VHD in production.
What is Dynamic VHD?
There are two types of VHD you may use in production: Fixed VHD, where the full size of the file is allocated on the physical storage at creation time, and Dynamic VHD, which starts out as a small file and grows as data is written to it.
With Windows Server 2008 we knew that Dynamic VHD was just too slow for production. The VHD would grow in very small increments, so when a lot of growth was required at once, the file had to be extended over and over, creating storage write latency.
Windows Server 2008 R2
We were told that was all fixed when Windows Server 2008 R2 was announced. Trustworthy names stood in front of large crowds and told us how Dynamic VHD would nearly match Fixed VHD in performance. The solution was to increase the size of the chunks that were added to the Dynamic VHD. After RTM there were performance reports that showed us how good Dynamic VHD was. And sure enough, this was all true … in the perfect, clean, short-lived, lab.
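To make the chunk-size argument concrete, here’s a quick Python sketch that counts how many grow operations a burst of writes would trigger. The chunk sizes in it are illustrative assumptions to show the shape of the problem, not the actual block sizes used by either release:

```python
import math

# A sketch of why chunk size matters for Dynamic VHD growth. The chunk
# sizes below are illustrative assumptions, not the real VHD block sizes.

def grow_operations(new_data_mb: int, chunk_mb: float) -> int:
    """Each grow operation extends the file on disk by one chunk."""
    return math.ceil(new_data_mb / chunk_mb)

burst_mb = 10_240  # 10 GB of writes landing in unallocated regions of the VHD

print(grow_operations(burst_mb, chunk_mb=0.5))  # small chunks: 20,480 grows
print(grow_operations(burst_mb, chunk_mb=2.0))  # larger chunks: 5,120 grows
```

Every grow operation is a file extension plus a metadata update, so bigger chunks mean fewer pauses for expansion. But they do nothing about how the file ends up laid out on disk.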
For now, let’s assume that the W2008 R2 Dynamic VHD can grow fast enough to meet write activity demand, and focus on the other performance negatives.
Fragmentation
Let’s imagine a CSV with 2 Dynamic VHDs on it. Both start out as small files.
Over time, both VHDs will grow. Because they are grabbing free space on the same volume as they expand, the growth fragments both VHDs. That’s going to impact reads and overwrites.
And over the long term, it doesn’t get any better.
Now imagine that with dozens of VMs, all with one or more Dynamic VHDs, all getting fragmented.
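A toy allocation model shows why this happens. In this Python sketch, a simplistic allocator hands the next free chunk on the volume to whichever VHD grows next; real NTFS allocation is smarter than this, but the interleaving effect is the same:

```python
# Two Dynamic VHDs grow on one volume; a naive allocator hands out
# chunks in request order, so the two files end up interleaved on disk.

def count_fragments(growth_order):
    """A fragment is a run of consecutive chunks owned by the same file."""
    fragments = {}
    prev = None
    for owner in growth_order:
        if owner != prev:
            fragments[owner] = fragments.get(owner, 0) + 1
        prev = owner
    return fragments

# The two VHDs take turns growing, 100 grow operations each.
order = ["vhd1", "vhd2"] * 100

print(count_fragments(order))  # {'vhd1': 100, 'vhd2': 100}
# Each VHD ends up in 100 pieces instead of being one contiguous file.
```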
The only thing you can do to combat this is to run a defrag operation on the CSV volume. Realistically, you’d have to run that defrag at least once per day. Defrag is an example of an operation that kicks in Redirected Mode (or Redirected Access). And unlike backup, it cannot make use of a Hardware VSS Provider to limit the impact of that operation. Big and busy CSVs will take quite a while to defrag, and you’re going to impact the performance of production systems. You really need to be aware of what that impact would be on multi-site clusters, especially those that are active(site)-active(site).
Odds are you probably should be doing the occasional CSV defrag even if you use Fixed VHD. Stuff gets messed up over time on any file system.
Storage Controllers
I am not a storage expert. But I talked with some Hyper-V engineers yesterday who are. They told me that they’re seeing SAN storage controllers that really aren’t dealing well with Dynamic VHD, especially if LUN thin provisioning is enabled. Storage operations are being queued up, leading to latency issues. Sure, Dynamic VHD and thin provisioning may reduce the amount of disk you need, but at what cost to the performance/stability of your LOB applications, operations, and processes?
CSV and Dynamic VHD
I became aware of this one a while back thanks to my fellow Hyper-V MVPs. It never occurred to me at all – but it does make sense.
In scenario 1, the CSV1 coordinator role is on Host1. A VM is running on Host1, and it has Dynamic VHDs on CSV1. When that Dynamic VHD needs to expand, Host1 can take care of it without any fuss.
In scenario 2, things are a little different. The CSV1 coordinator role is still on Host1, but the VM is now on Host3. Now when the Dynamic VHD needs to expand, we see something different happen.
Redirected Mode/Access kicks in so the CSV coordinator for CSV1 (Host1) can expand the Dynamic VHD of the VM running on Host3. That means all storage operations for that CSV on Hosts 2 and 3 must traverse the CSV network (maybe 1 Gbps) to Host1, and then go through Host1’s iSCSI or fibre channel link. This may be a very brief operation, but it’s still something that has a cumulative effect on latency, with potential storage I/O bottlenecks in the CSV network, on Host1, on Host1’s HBA, or on Host1’s SAN connection.
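Some back-of-the-envelope arithmetic shows why that path hurts. The link speeds in this Python sketch are assumptions for the sake of illustration (a 1 Gbps CSV network and a 4 Gbps fibre channel link per host):

```python
# Rough I/O ceilings in Redirected Mode. The link speeds and the ~90%
# usable-efficiency figure are assumptions for illustration only.

def usable_mb_per_sec(link_gbps: float, efficiency: float = 0.9) -> float:
    return link_gbps * 1000 / 8 * efficiency

direct = usable_mb_per_sec(4.0)      # each host's own FC link
redirected = usable_mb_per_sec(1.0)  # the shared CSV network

print(f"Direct I/O, per host:             ~{direct:.0f} MB/s")
print(f"Redirected I/O, all hosts shared: ~{redirected:.0f} MB/s")
# ~450 MB/s per host collapses to ~112 MB/s shared between every host
# using that CSV while Redirected Mode is active.
```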
Now take a moment to think bigger: scale that out to dozens of VMs, multiple CSVs, and many hosts, with Dynamic VHDs expanding all over the cluster throughout the day.
Is your mind boggled yet? OK, now add the usual backup operations, and the defrag operations (to handle Dynamic VHD fragmentation), into that thought!
You could try to keep the VMs on CSV1 running on Host1. That’ll eliminate the need for Redirected Mode. But things like PRO and Dynamic Optimization in SCVMM 2012 will play havoc with that, moving VMs all over the place if they are enabled. And I’d argue that they should be enabled, because they increase service uptime, reliability, and performance.
We need an alternative!
Sometimes Mentioned Solution
I’ve seen some say that they use Fixed VHD for the data drives, where there will be the most impact. That’s a good start, but I’d argue that you need to think about those system VHDs (the ones with the OS) too. Those VMs will get patched. Odds are that will happen at the same time, and you could have a sustained level of Redirected Mode while Dynamic VHDs expand to handle the new files. And think of the fragmentation! Applications will be installed/upgraded, often during production hours. And what about Dynamic Memory? The VM’s paging file will increase, thus expanding the size of the VHD: more Redirected I/O and more fragmentation. Fixed VHD seems to be the way to go for me.
My Experience
Not long after the release of Windows Server 2008 R2, a friend of mine deployed a Hyper-V cluster for a business here in Ireland. They had a LOB application based on SQL Server. The performance of that application went through the floor. After some analysis, it was found that the W2008 R2 Dynamic VHDs were to blame. They were converted to Fixed VHD and the problem went away.
I also went through a similar thing in a hosting environment. A customer complained about poor performance of a SQL VM. This was for read activity – fragmentation would cause the disk heads to bounce and increase latency. I converted the VHDs to fixed and the run time for reports was immediately improved by 25%.
SCVMM Doesn’t Help
I love the role of the library in SCVMM. It makes life so much easier when it comes to deploying VMs, and SCVMM 2012 expands that exponentially with the deployment of a service.
If you are running a larger environment, or a public/private cloud, with SCVMM, then you will need to maintain a large number of VM templates (VHDs in MSFT lingo, but the rest of the world has been calling them templates for quite a long time). You may have Windows Server 2008 R2 with SP1 Datacenter, Enterprise, and Standard. You may have Windows Server 2008 R2 Datacenter, Enterprise, and Standard. You may have W2008 with SP1 x64 Datacenter, Enterprise, and Standard. You may have W2008 with SP1 x86 Datacenter, Enterprise, and Standard. You get the idea. Lots of VHDs.
Now you get that I prefer Fixed VHDs. If I build a VM with a Fixed VHD and then create a template from it, I’m going to eat up disk space in the library. It appears that some believe disk is cheap. Yes, I can get a 1 TB disk for €80. But that’s a dumb, slow, USB 2.0 drive. That’s not exactly the sort of thing I’d use for my SCVMM library, let alone put in a server or a datacenter. Server/SAN storage is expensive, and it’s hard to justify 40+ GB for each template that I’ll store in the library.
The alternative is to store Dynamic VHDs in the library. But SCVMM does not convert them to Fixed VHD on deployment. That’s a manual process, and one that is not suitable for the self-service nature of a cloud. The same applies to storing a VM in the library; it seems pointless to store Fixed VHDs for an offline VM, but converting the stored VMs to Dynamic VHD is, again, a manual process.
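To put rough numbers on the space trade-off that’s driving all of this, here’s a quick calculation. The template count and the dynamic file size are hypothetical; the 40 GB figure is the one quoted above:

```python
# Library sizing, back of the envelope. The template count and the
# dynamic VHD size are hypothetical; the 40 GB figure is from the text.

templates = 12   # hypothetical: an OS/edition matrix like the one above
fixed_gb = 40    # space per fixed-size template VHD
dynamic_gb = 8   # hypothetical: a freshly installed OS in a Dynamic VHD

print(f"Fixed VHD templates:   {templates * fixed_gb} GB")    # 480 GB
print(f"Dynamic VHD templates: {templates * dynamic_gb} GB")  # 96 GB
```

That gap is why Dynamic VHDs are so tempting for the library, and why the lack of automatic conversion on deployment stings.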
It seems to me that SCVMM should give us the option to convert Dynamic VHDs to Fixed VHD when deploying a template, and to convert Fixed VHDs to Dynamic when storing a VM in the library.
What Do The Microsoft Product Groups Say?
Exchange: “Virtual disks that dynamically expand are not supported by Exchange”.
Dynamics CRM: “Create separate fixed-size virtual disks for Microsoft Dynamics CRM databases and log files”.
SQL Server: “Dynamic VHDs are not recommended for performance reasons”.
That seems to cover most of the foundations for LOB applications in a Microsoft-centric network.
Recommendation
Don’t use Dynamic VHD in production environments. Use Fixed VHD instead (and pass-through disks on those rare occasions where they’re required). Yes, you will use more disk with Fixed VHD because of all that white space, but you’ll get the best possible performance while still using flexible and manageable virtual disks.
If you have implemented Dynamic VHD: plan to convert those disks to Fixed VHD during a maintenance window, and run a defrag on the CSV afterwards to clean up the fragmentation that the growth left behind.