FREE WHITE PAPER: Configuring a VEEAM Off Host Backup Proxy Server for backing up a Windows Server 2012 R2 Hyper-V cluster with a DELL Compellent SAN (Fiber Channel)

Whilst I’m attending TechEd North America 2014, learning and networking again with the community at large, I think this is a good moment to share. So here’s a little contribution to that community: a white paper on how to configure a VEEAM Off Host Backup Proxy server for backing up a Windows Server 2012 R2 Hyper-V cluster with a DELL Compellent SAN (Fiber Channel).

VEEAM Backup & Replication is currently under an extensive test before we make the decision. So far it is going (very) well. And no, VEEAM or DELL did not sponsor this. It’s sharing with the community. A prosperous, successful community makes my professional life better too!

I have to applaud VEEAM for allowing such easy access to their software for trials, to their engineers for assistance and to their support forum and resources even without yet being a paying customer. This is how it should be: vendors having faith in their products both in quality and ease of use. It’s a refreshing experience as some vendors don’t want you to get your hands on new versions of their products even as an existing paying customer “because due to its complexity we might get the wrong impression”. With some of them it’s even near impossible to get a test license for the lab of the version you currently use. Not so with VEEAM and that’s great.

I hope you enjoy it. As you might realize I don’t have this kind of infrastructure in my home lab so some of the screenshots have been edited / blurred. I’m sure you can live with that. Otherwise feel free to provide me with the gear in a paid for data center.

Some Insights Into How Windows 2012 R2 Hyper-V Backups Work

How Windows Server 2012 R2 backups differ from Windows Server 2012 and earlier

You’ll remember our previous blog about an error when backing up a virtual machine on Windows Server 2012 R2, throwing this error:

Dealing With Event ID 10103 “The virtual machine ‘VM001′ cannot be hot backed up since it has no SCSI controllers attached. Please add one or more SCSI controllers to the virtual machine before performing a backup. (Virtual machine ID DCFE14D3-7E08-845F-9CEE-21E0605817DC)” In Windows Server 2012 R2

The fix was easy enough, adding a virtual SCSI controller to the virtual machine. But why does it need that now?

Well, this all has to do with the changed way Windows Server 2012 R2 backups work. Before Windows Server 2012 R2 the VSS provider created a VSS snapshot inside the guest virtual machine. That snapshot was exposed to the host, to create a volume snapshot for backup purposes. Right after the volume snapshot had been taken, this VSS snapshot inside the guest virtual machine needed to be reverted. The backup then ran against that volume snapshot and was consistent thanks to both host & guest VSS capabilities.

For an overview of the VSS based backup process in general take a peek at Overview of Processing a Backup Under VSS

Now it is the “Hyper-V Integration Services Shadow Copy Provider” that is being used. When the host initiates a volume snapshot (Microsoft or hardware VSS provider) the host VSS writer goes into freeze. This process leverages the Hyper-V Integration Services Shadow Copy Provider to create the virtual machine checkpoint. After that the volume/LUN/CSV snapshot is taken. When that is done the host VSS writer goes into thaw and the virtual machine checkpoint is deleted. After that the backup runs against the volume snapshot and at the end that is also deleted. You can follow this process quite nicely in the GUI of your Hyper-V host and of your SAN (if you use a hardware VSS provider).
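To keep the order of events straight, here is a minimal Python sketch of the sequence described above. It only models the ordering as I understand it; the phase names are made up for illustration and nothing here calls into VSS or Hyper-V.

```python
# Ordered phases of a Windows Server 2012 R2 host level Hyper-V backup,
# as described in the text. The short keys are hypothetical labels.
BACKUP_PHASES = [
    ("request_snapshot", "host initiates a volume snapshot (Microsoft or hardware VSS provider)"),
    ("writer_freeze", "host Hyper-V VSS writer goes into freeze"),
    ("create_vm_checkpoint", "Integration Services Shadow Copy Provider creates the VM checkpoint"),
    ("take_volume_snapshot", "volume/LUN/CSV snapshot is taken"),
    ("writer_thaw", "host Hyper-V VSS writer goes into thaw"),
    ("delete_vm_checkpoint", "VM checkpoint is deleted (merged)"),
    ("run_backup", "backup runs against the volume snapshot"),
    ("delete_volume_snapshot", "volume snapshot is deleted"),
]


def happens_before(first, second):
    """Return True if phase `first` comes before phase `second`."""
    order = [key for key, _ in BACKUP_PHASES]
    return order.index(first) < order.index(second)


for number, (_, description) in enumerate(BACKUP_PHASES, start=1):
    print(f"{number}. {description}")
```

The point to take away is that the checkpoint exists only for the short window in which the volume snapshot is taken; the backup itself runs later, against the snapshot.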

Dear storage vendors: a great, reliable, fast VSS Hardware Provider is paramount to success in a Microsoft environment. You need to get this absolutely right and out of the door before spending any more time and money on achieving yet more IOPS. Keep scalability in mind when doing this.

Dear backup software vendors: think about the scalability when designing your products. If we have 200 or 500 or a thousand VMs … can we leverage CSV based backups to protect every VM on the LUN or do we need to snap the LUN for every VM backed up? Choice there is good for both data protection schemes and scalability.

At this stage the hardware VSS snapshot is being taken …


Contrary to common belief this means that the backup will indeed be application consistent to the time of the checkpoint, as the CSV snapshot being taken is of a consistent checkpoint. It’s the delta in the active avhdx that is only crash consistent, like any running VM by the way. Now pay attention to the screenshot below. The two red arrows point to two ntfs source events: two volumes seem to be exposed to the next free drive letters, E: and F: here, as C: is the virtual machine OS and D: the DVD.


Look at the detail. Indeed two. Well, in the previous screenshot we only saw one in the CSV path, but there are indeed two avhdx files.


Exposing a snapshot on the SAN to a server actually shows us this much better … look here at the avhdx with the GUID and the one with “AutoRecovery” in the name. So that makes for two ntfs events … and as the backup needs to do this live it requires a vSCSI controller to be present in the virtual machine … a vIDE controller can’t do this.


Anyway, enough under the hood detective work for now. In VEEAM that stage looks like this:


And on the Compellent it looks like this. The screenshots are from different backups at different times so don’t get confused about the time stamps here. It’s just an illustration of what you can expect to see.


Now when the CSV snapshot has been taken the virtual machine checkpoint is removed. At that time the backup runs against the CSV snapshot. In our case (hardware VSS provider) this is a snapshot on the SAN that gets exposed in a view and mapped to the off host backup proxy VEEAM server. On the DELL Compellent it looks like this.


This takes a while to o… but after a while the backup will kick off. Do note that the checkpoint has merged and is no longer visible at this time.


Once the backup is complete, the mapping is removed, the view deleted and the snapshot expired. So your SAN is left as the backup found it.

There you go. I hope this helped clarify certain things on how Hyper-V guest backups work in Windows Server 2012 R2. So your backups are still application consistent, just not when you’re running Linux or DOS or NT4.0, as there is no support / VSS for those. Even then they are based on a consistent virtual machine snapshot, which explains why Hyper-V backups can protect Linux guests very adequately!

Dealing With Event ID 10103 “The virtual machine ‘VM001′ cannot be hot backed up since it has no SCSI controllers attached. Please add one or more SCSI controllers to the virtual machine before performing a backup. (Virtual machine ID DCFE14D3-7E08-845F-9CEE-21E0605817DC)” In Windows Server 2012 R2

I was doing backups of a Windows 2012 R2 Hyper-V cluster recently and it runs only Windows Server 2012 R2 virtual machines. It’s a small but very modern and up to date cluster.

Using VEEAM as backup software I have high expectations and VEEAM did deliver. All went well except for one virtual machine.


VEEAM states "Processing Error. Guest processing skipped (check guest OS VSS state and integration components version)". Well, all virtual machines are W2K12R2, as are the cluster hosts, all IC components are up to date and backup (volume checkpoint) is enabled.


I dove into the Hyper-V log and sure enough I found the following event:

The virtual machine ‘VM001′ cannot be hot backed up since it has no SCSI controllers attached. Please add one or more SCSI controllers to the virtual machine before performing a backup. (Virtual machine ID DCFE14D3-7E08-845F-9CEE-21E0605817DC).

As it turns out, in Windows Server 2012 R2 the VM requires a SCSI controller for the backup to function. It doesn’t need to have any storage attached. It just needs one to be there (default). So the fix is easy, just add one.



Click “Apply” and “OK”. You can now start the virtual machine and that’s it. Once we fixed that it was a squeaky clean backup run.

But why does it need to be there?

Well, when we monitor the event logs inside a virtual machine we are backing up, we see that during the backup process a VHDX very briefly gets mounted inside the guest.


To answer this question we need to dive into how Windows Server 2012 R2 backups work as that is different from how it used to be. You can read about that over here when it’s published.

Hyper-V Cluster Node Pause & Drain fails – Live Migrations fail with “The requested operation cannot be completed because a resource has locked status”

One night I was doing some maintenance on a Hyper-V cluster and I wanted to Pause and drain one of the nodes that was up next for some tender loving care. But I was greeted by some messages:


[Window Title]
Resource Status

[Main Instruction]
The requested operation cannot be completed because a resource has locked status.

The requested operation cannot be completed because a resource has locked status.


Strange, the cluster is up and running, none of the other nodes had issues and operations wise all VMs are happy as can be. So what’s up? Not too much in the error logs except for one entry related to a backup. Aha … We fire up diskpart and see some extra LUNs mounted, and using “vssadmin list writers” we find:




Writer name: ‘Microsoft Hyper-V VSS Writer’
…Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
…Writer Instance Id: {2fa6f9ba-b613-4740-9bf3-e01eb4320a01}
…State: [5] Waiting for completion
…Last error: Unexpected error

Bingo! Hello old “friend”, I know you! The Microsoft Hyper-V VSS Writer goes into an error state during the making of hardware snapshots of the LUNs due to almost or completely full partitions inside the virtual machines. Take a look at this blog post on what causes this and how to fix it. As a result we can’t do live migrations anymore or Pause/Drain the node on which the hardware snapshots are being taken.

And yes, after fixing the disk space issue on the VM (a SDT who’s pumped the VM disks 99.999% full) the Hyper-V VSS writer gets out of the error state and the hardware provider can do its thing. After the snapshots had completed everything was fine and I could continue with my maintenance.
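If you want to check for this condition without eyeballing the output, here is a minimal Python sketch that parses “vssadmin list writers” style text and flags writers that look unhealthy. It is a sketch under assumptions: the sample is the output shown above, and I’m assuming a healthy writer reports a state containing “Stable” and a last error of “No error”. The function names are my own.

```python
import re
import subprocess

# Sample taken from the output shown above.
SAMPLE = """\
Writer name: 'Microsoft Hyper-V VSS Writer'
   Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
   Writer Instance Id: {2fa6f9ba-b613-4740-9bf3-e01eb4320a01}
   State: [5] Waiting for completion
   Last error: Unexpected error
"""

# Leading \W* tolerates indentation or the "…" prefix seen in pasted output.
FIELD = re.compile(r"^\W*(Writer name|State|Last error):\s*(.+?)\s*$")


def parse_writers(text):
    """Return a list of dicts with name, state and last_error per writer."""
    writers, current = [], None
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        key, value = match.groups()
        if key == "Writer name":
            current = {"name": value.strip("'‘’′"), "state": None, "last_error": None}
            writers.append(current)
        elif current is not None:
            current["state" if key == "State" else "last_error"] = value
    return writers


def unhealthy(writers):
    """Writers whose state is not Stable or whose last error is not 'No error' (assumed healthy values)."""
    return [
        w for w in writers
        if "Stable" not in (w["state"] or "") or w["last_error"] != "No error"
    ]


def read_live_writers():
    """Run vssadmin on a Windows host (needs an elevated prompt). Not called here."""
    result = subprocess.run(["vssadmin", "list", "writers"], capture_output=True, text=True)
    return parse_writers(result.stdout)


for writer in unhealthy(parse_writers(SAMPLE)):
    print(f"{writer['name']}: {writer['state']} / {writer['last_error']}")
```

Running something like this before a planned Pause/Drain would have told me up front that the Hyper-V VSS writer was stuck.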

Money Saving Hero of 2012: Windows 2012 In Box Deduplication Delivers Big Value

To wave goodbye to 2012 I’m posting the latest screenshot of the easiest and most effective money saving feature you got in Windows Server 2012, which went RTM in August. Below you’ll find the status report of a backup LUN in a small environment. Yes, those are real numbers in a production environment.

If you are not using it, you’re really throwing away vast amounts of money on storage right this moment. If you’re in the market for a practical, economical and effective backup solution, my advice to you is the following: scrap any backup vendor or product that prevents its files or LUNs from being deduplicated by Windows Server 2012. They might as well be robbing you at gun point.

You can pay for a very nice company New Year’s party with these savings.

I wish you all a great end of 2012 and a magnificent 2013 ahead. In 2013 we’ll push Windows Server 2012 into service where we couldn’t before (waiting for 3rd party vendor support, and if they keep straggling they are out of the door) and work at making our infrastructure ever more resilient and protected. With System Center SP1 some products of that suite will make a comeback in our environment. 10Gbps is bound to become the standard all over our little data center network and not just for our most important workloads.

You Got To Love Windows Server 2012 Deduplication for Backups

I’ve discussed this before in Windows Server 2012 Deduplication Results In A Small Environment but here’s a little updated screenshot of a backup volume:


Not too shabby I’d say, and 100% free in box portable deduplication … What are you waiting for?

Altaro Backup for Hyper-V Has Gifts for the Festive Season

Here’s an early X-Mas gift from Altaro. They are giving away 50 free licenses of their desktop backup solution to all Hyper-V admins until December 24th 2012. Altaro is better known for their cost effective and good Hyper-V Backup product.

There is no catch. Now there is no such thing as a free lunch in life but there are some very decent meals to be gotten at very democratic pricing. This is one such case. All you need to do is send them a screenshot of Hyper-V in your environment that proves that you’re really using Hyper-V. I guess that means I qualify due to the amount of Hyper-V related screenshots on my blog. I’m going to check it out for sure.

What do you get? 50 licenses of their desktop backup solution ($2,000 worth of software). You’re free to use them in your company, at home or as a gift to friends and family. 50 licenses is something that a lot of companies using Hyper-V in the SMB market can leverage to protect their desktops, so that’s a pretty nice gift.

If you’re interested you can go to

There is more information about Altaro Hyper-V Backup at and if you’re a SMB shop in need of easy to use, affordable backup software for Hyper-V and want one that has full support for all features in Windows Server 2012 you should try them out. In that respect they were very fast to market, beating most or all competitors I know (a lot of them still don’t have that support). They are also a non-aggressive vendor, which is something I appreciate.

Windows Server 2012 Deduplication Results In A Small Environment

There is a small environment that provides web presence and services. In total there are about 20 production virtual machines. These are all backed up to a Transparent Failover File Share on a Windows Server 2012 cluster that is used to host all the infrastructure and management services.

On the LUN/volume for the backups about 5.5 TB of storage is available. The folder layout is shown in the screenshot below. The backups are run “in guest” using native Windows Backup, which has the WindowsImageBackup subfolder as target. Those backups are archived to an “Archives” folder. That archive folder is the one that gets deduplicated, as the WindowsImageBackup folder is excluded.


This means that basically the most recent version is not deduplicated guaranteeing the fastest possible restore times at the cost of some disk space. All older (> 1 day) backup files are deduplicated. We achieve the following with this approach:

  • It provides us with enough disk space savings to keep archived backups around for longer in case we need them.
  • It also provides enough storage to back up more virtual machines while still being able to maintain a satisfactory number of archived backups.
  • Any combination of the above two benefits can be balanced versus the business needs
  • It’s a free, zero cost solution

The Results

About 20 virtual machines are backed up every week (small delta and lots of stateless applications). As the optimization runs we see the savings grow. That’s perfectly logical. The more backups we make of virtual machines with a small delta the more deduplication can shine. So let’s look at the results using Get-DedupStatus | fl


A couple of weeks later it looks like this.


Give it some more months, with more retained backups, and I think we’ll keep this around 88%-90%. From tests we have done (ddpeval.exe) we think we’ll max out at around 80% savings rate. But it’s a bit less here overall because we excluded the most recent backups. Guess what, that’s good enough for us. It beats buying extra storage or paying a wad of money for disk deduplication licenses from some backup vendor or appliance. Just using the built-in deduplication mechanisms in Windows Server 2012 saved us a bunch of money.

The next step is to also convert the production Hyper-V cluster to Windows Server 2012 so we can do host based backups with the native Windows Backup that now supports Cluster Shared Volumes (another place where that 64TB VHDX size can come in handy as Windows Backup now writes to VHDX).

Some interesting screen shots


The volume reports we’re using 3TB in data. So 2.4TB is free.


Looking at the backup folder you see 10.9TB of data stored on 1.99 TB of disk.
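For the folder numbers above, the savings work out as follows. This is just a small Python sketch of the arithmetic; the function name is my own, and I’m not claiming it is exactly how Get-DedupStatus computes its own savings rate for the whole volume.

```python
def dedup_savings_percent(logical_tb, on_disk_tb):
    """Percentage of space saved: logical data size versus actual size on disk."""
    if logical_tb <= 0:
        raise ValueError("logical size must be positive")
    return (1 - on_disk_tb / logical_tb) * 100


# Numbers from the backup folder properties quoted above.
savings = dedup_savings_percent(10.9, 1.99)
print(f"{savings:.1f}% saved")  # prints "81.7% saved"
```

Note this covers the deduplicated folder only; the volume as a whole also holds the most recent, non-deduplicated backups.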

So the properties of the volume report more disk space used than the actual folder containing the data. Let’s use WinDirStat to have a look.


So the above agrees with the volume properties. In the details of this volume we again see about 2TB of consumed space.


Could it be that the volume is reserving some space to ensure proper functioning?

When you dive deeper you get some cool views of the storage space used. Where Windows Explorer is aware of deduplication and shows the non-deduplicated size for the vhd file, WinDirStat is not: it always shows the size on disk, which is a whole lot less.


This is the same as when you ask for the properties of a file in Windows Explorer.



Is it the best solution for everyone? Not always, no. The deduplication is done on the target after the data is copied there. So in environments where bandwidth is seriously constrained and there is absolutely no technical and/or economical way to provide the needed throughput this might not be a viable solution. But don’t dismiss this option too fast. In a lot of scenarios it is a very good and cost effective feature. Technically & functionally it might be wiser to do it on the target as you don’t consume too much memory (deduplication is a memory hog) and CPU cycles on the source hosts. Also nice is that these dedupe files are portable across systems. VEEAM has demonstrated some nice examples of combining their deduplication with Windows dedupe by the way. So this might also be an interesting scenario.

Financially the cost of deduplication functionality with hardware appliances or backup software hurts like the kick of a horse straight onto the head. So even if you have to invest a little in bandwidth and cabling you might be a lot better off. Perhaps, as you’re replacing older switches by new 1Gbps or 10Gbps gear, you might be able to recuperate the old ones as dedicated backup switches. We’re using mostly recuperated switch ports and native Windows NIC teaming, and it works brilliantly. I’ve said this before: saving money whilst improving operations rarely gets you fired. The sweet thing about this is that it is achieved by building good & reliable solutions, which means they are efficient even if it costs some money to achieve. Some managers focus way too much on efficiency from the start, as to them it means nothing more than a euphemism for saving every € possible. Penny wise and pound foolish. Bad move. Efficiency, unless it is the goal itself, is a side effect of a well designed and optimized solution. A very nice and welcome one for that matter, but it’s not the end all be all of a solution or you’ll have the wrong outcome.

Disk to Disk Backup Solution with Windows Server 2012 & Commodity DELL Hardware – Part II

As I blogged in a previous post we’ve been building a Disk2Disk based backup solution with commodity hardware, as all the appliances on the market are either too small in capacity for our needs, ridiculously expensive, or sometimes just suck, or a combination of the above (Virtual Library Systems or Virtual Tape Libraries come to mind: one of my biggest technology mistakes ever, at least the ones I had, in my humble opinion).

Here’s a logical drawing of what we’re talking about. We are using just two backup media agent building blocks (server + storage)  in our setup for now so we can scale out.


In a future post I hope to be discussing Storage Spaces & Windows deduplication thrown into the mix.

So what do we get?

Not too shabby … > 1TB/Hour


To great …


In close up you are seeing just 2 Windows 2012 Hyper-V cluster nodes, each being backed up over a native LBFO team of 2*1Gbps NIC ports to one Windows Server 2012 Backup Media Agent with a 10Gbps pipe. Look at the max throughput we got  …


Sure this is under optimal conditions, but guess what? When doing backup from multiple hosts to dual backup media servers or more we’re getting very fast backups at very low cost compared to some other solutions out there. This is our backup beast. More bandwidth needed at the backup media server? It has dual port 10Gbps that can be teamed and/or leverage SMB 3.0 multichannel. High volume hosts can use 10Gbps at the source as well.
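To put those link speeds in perspective, here’s a back of the envelope calculation in Python. It gives theoretical ceilings only (decimal units, no protocol overhead, no disk limits), and the function names are just mine for illustration.

```python
import math


def tb_per_hour(link_gbps):
    """Theoretical ceiling in decimal TB per hour for a link of `link_gbps` Gbps."""
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_second * 3600 / 1e12


def hosts_needed(target_tb_per_hour, link_gbps_per_host):
    """How many source hosts with that link speed are needed to reach the target rate."""
    return math.ceil(target_tb_per_hour / tb_per_hour(link_gbps_per_host))


for label, gbps in [("team of 2*1Gbps", 2), ("10Gbps pipe", 10)]:
    print(f"{label}: at most {tb_per_hour(gbps):.2f} TB/hour")

print("hosts with a 2*1Gbps team needed for 1 TB/hour:", hosts_needed(1.0, 2))
```

So a single host on a 2*1Gbps team tops out just under 1 TB/hour; it’s the combination of multiple source hosts feeding one 10Gbps target that gets you past that mark.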

Lessons learned

  • The Windows 2012 networking improvements rock. Upgrade and benefit from it! We’re seeing great results thanks to Multichannel leveraging RSS and in box NIC teaming (LBFO).
  • A couple of 1Gbps NICS teamed on Windows Server 2012 work really well. Don’t worry about not having 10Gbps on all your hosts.
  • Having 10Gbps on your backup media hosts (target) is great as you’ll be pushing a lot of data to them from multiple (source) hosts.
  • Make sure your backup software supports enough streams before it keels over under the load you’re pushing through. More streams means more concurrent files (read VHDs/VMs) and thus more throughput and allows multichannel to shine over RSS capable NICs.
  • Find the sweet spot for number of disks per node and total IOPS versus the throughput you can send to the backup media agents. 4 nodes of 50TB might be better than 2 nodes of 100TB. If you can, experiment a bit to find your optimal backup block size.
  • Isolate your backup network traffic from data traffic either physically or by other means (QOS) and don’t route it all over the place to end up where it needs to be.
  • We’re doing this using Dell PowerConnect 5424 (end of life) /5524 switches … no need for the real  high end very expensive gear to do this. The 10Gbps switch, well yes that’s always high end at the moment.
  • Use JBODs with SAS/Storage Spaces & you’ll be fine. Select them carefully for performance. You can use bays like the MD3X00 if you want to replicate the backups somewhere; otherwise MD12x0 will do, or any other decent JBOD => even cheaper. You can also mix: some building blocks that can replicate & others on Storage Spaces/JBOD. Mix and match with different backup needs means you have flexibility. Note that at TechEd Europe (June 2012), in a session by DELL, they mentioned the need for a firmware update with the MD1200 to optimize performance with Storage Spaces.

It’s all about the money in a smart way!

As I said before, you will not get fired for:

  • Increasing backup throughput at least 4 fold (without dedupe)
  • Increasing backup capacity 3.5 fold (without deduplication)
  • Doing the above for 20% of the cost of the systems being replaced or of new offerings with specialized appliances (even at hilarious discount rates). That’s CAPEX reduction.
  • This helps pay for the primary storage, DRC site & extra SAN for data replication in case of disaster
  • Making backups faster, more reliable & reducing OPEX (the difference for us is huge and important)
  • Putting an affordable scale up & scale out Disk2Disk backup solution into place so the business can safely handle future backup loads at very acceptable costs.
  • It’s a modular solution which we like. On top of that it’s about as zero vendor lock in as it gets. You can mix servers, bays, switches. Use what you like best for your needs. Only the bays have to remain the same within an individual “building block”.

Cost reduction is one thing but look at what we get whilst saving money… wow!

What am I missing? Specialized dedupe. Yes, but we’re going for the poor man’s workaround there. More on that later. As long as we get enough throughput that doesn’t hurt us. And given the cost we cannot justify it + it’s way too much vendor lock in. If you can’t get backups done without, sure, you might need to walk that route. But for now I’m using a different path. Our money is better spent in different places.

Now how to get the same economic improvements from the backup software? Disk capacity licensing sucks. But we need a solution that can handle this throughput & load, is reliable, has great support & product information, gets support for new releases fast after RTM (come on CommVault, get a move on) and is simple to use ==> even more money and time saved.

Spin off huge file server project?

Why is support for new releases in backup software important? Because the lack of it is causing me delays. Delays cost me time, money & opportunities. I’m really interested to convert our large LUN file servers to Windows Server 2012 Hyper-V virtual machines, which I can now do rather smoothly thanks to those big VHDX sizes that are possible now, and slash the backup times of those millions of small files to pieces by backing them up per VHDX over this setup. But I’ll have to wait and see when CommVault will support VHDX files and GPT disks in guests, because they are not moving as fast as a leading (and high cost) product should. Altaro & Veeam have them beaten solid there and I don’t like to be held back.

Trouble Shooting Windows Server 2012 host based CommVault Backups with DELL Compellent hardware VSS provider of Hyper-V guests: ‘Microsoft Hyper-V VSS Writer’ State: [5] Waiting for completion

We have been running CommVault Simpana 9.0 R2 SP7 in combination with the DELL Compellent Hardware VSS provider to do host based backups of the virtual machines on our Windows Server 2012 Hyper-V cluster hosts with great success and speed.

We’ve run into two issues so far. One, which I blogged about in DELL Compellent Hardware VSS Provider & Commvault on Windows Server 2012 Hyper-V nodes – Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface. hr = 0x80070005, Access is denied, was due to some missing permissions for the domain account we configured the Compellent Replay Manager Service to run with. The solution for that issue can be found in that same blog post.

The other one was that sometimes during the backup of a Hyper-V host we got an error from CommVault that put the job in a “pending” status, kept trying and failing. The error is:

Error Code: [91:9], Description: Volume Shadow Copy Service (VSS) error. VSS service or writers may be in a bad state. Please check vsbkp.log and Windows Event Viewer for VSS related messages. Or run vssadmin list writers from command prompt to check state of the VSS writers.


When we look at the Compellent controller we see the following things happen:

  • The snapshots get made
  • They are mounted briefly and then dismounted.
  • They are deleted

The result at the CommVault end is that the job goes into a pending state with the above error. When we look at the state of the Microsoft Hyper-V VSS Writer by running “vssadmin list writers” …


… from an elevated command prompt we see:

Writer name: ‘Microsoft Hyper-V VSS Writer’
…Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
…Writer Instance Id: {2fa6f9ba-b613-4740-9bf3-e01eb4320a01}
…State: [5] Waiting for completion
…Last error: Retryable error

Note at this stage:

  1. Resuming the job doesn’t help (it actually keeps trying by itself but no joy).
  2. Killing the job and restarting brings no joy. On top of that our friendly error “Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface. hr = 0x80070005, Access is denied.” is back, but this time related to the error state of the ‘Microsoft Hyper-V VSS Writer’. The error now has changed a little and has become:




Writer name: ‘Microsoft Hyper-V VSS Writer’
…Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
…Writer Instance Id: {2fa6f9ba-b613-4740-9bf3-e01eb4320a01}
…State: [5] Waiting for completion
…Last error: Unexpected error

To get rid of this one we can restart the host or, less drastically, restart the Hyper-V Virtual Machine Management Service (VMMS.exe), which will do the trick as well. Before you do this, pause the node and drain the roles; afterwards resume it with the option to fail the roles back. Windows 2012 makes it a breeze to do this without service interruption.




The Cause: Almost or completely full partitions inside the virtual machines

Looking for solutions when CommVault is involved can be tedious as their consultancy driven sales model isn’t focused on making information widely available. Troubleshooting VSS issues can also be considered a form of black art at times. Since this is Windows 2012 RTM and the date is September 20th 2012 at the moment of writing, there are not yet any hotfixes related to host level backups of virtual machines and such. CommVault Simpana 9.0 R2 SP7 is also fully patched.

This, combined with the fact that we did not see anything like this during testing (and we did a fair amount), makes us look at the guests. That’s the big difference on a large production cluster: all those unique guests with their own history. We also know from the past years with VSS snapshots in Windows 2008 (R2) that these tend to fail due to issues in the guests. Take a peek at Troubleshoot VSS issues that occur with Windows Server Backup (WBADMIN) in Windows Server 2008 and Windows Server 2008 R2 just for starters. As an example, we had already seen one guest (dev/test server) that had 5 users logged in doing all kinds of reconfigurations and installs go into a saved state during a backup, so it could be due to something rotten in certain guests. There is very much to consider when doing these kinds of backups.

By comparing successful & failed backups it really looked as if it was related to certain virtual machines. A lot of issues are caused by the VSS service not running or not being able to do snapshots because of lack of space, so perhaps this was the case here as well?

We poked around a bit. First let’s see what we can find in the Hyper-V specific logs like the Microsoft-Windows-Hyper-V-VMMS-Admin event log. Ah, lots of errors relating to a number of guests!


Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          19/09/2012 22:14:37
Event ID:      10102
Task Category: None
Level:         Error
User:          SYSTEM
Computer:      undisclosed server
Failed to create the volume shadow copy inside of virtual machine ‘undisclosedserver’. (Virtual machine ID 84521EG0G-8B7A-54ED-2F24-392A1761ED11)

Well people, that is called a clue. So we did some Live Migrations to isolate suspect VMs to a single node, ran backups and saw them fail, then did the same with a new and clean VM and it all worked. And indeed … looking at the guests involved when the CommVault backup fails, we see that the VSS service is running and healthy, but we do see all kinds of badness related to disk space:

  • Large SQL Server backup files put aside on the system partition or other disks
  • Application & service pack installers left behind
  • Log and tempdb volumes running out of space.
  • Application Logs running out of control

That latter one left 0MB of disk space on the system (Test Controller TFS shitting itself), but we managed to clear just enough to get to just over 1GB of free space, which was enough to make the backup succeed.



Servers, virtual or physical ones, should be locked down to prevent such abuse. I know, I know. Did I already tell you I do not reside in a perfect world? We cannot protect against dev and test server admins who act without much care on their servers. We’ll just keep hammering at it to raise their awareness I guess. For end users and production servers we monitor those well enough to proactively avoid issues. With dev & test servers we don’t do so, or the response team would have a day’s work reacting to all the alerts that daily dev & test usage on those servers generates.

The fix

  • Clear at least 1GB or a bit more inside each partition in the guest running on the host that has a failing backup. I prefer to have at least a couple of GB free (10% to 15% => give yourself some head room people).
  • Then you can resume the backup job manually or let CommVault do that for you if it’s still in a pending state.
  • If you’ve killed the job make sure you restore the Microsoft Hyper-V VSS Writer to a healthy state as described above. Thanks to Live Migration this can be achieved without any down time.
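To catch guests that are about to trip over this, a minimal Python sketch like the one below could be run inside a guest. It’s an illustration, not what we run in production: the thresholds come from the numbers above (at least 1GB free, preferably 10% or more), and the function names and the example path are my own.

```python
import shutil

GIB = 1024 ** 3


def low_on_space(total_bytes, free_bytes, min_free_bytes=1 * GIB, min_free_fraction=0.10):
    """True if a partition has less than ~1GB free or less than 10% free."""
    return free_bytes < min_free_bytes or free_bytes / total_bytes < min_free_fraction


def check_paths(paths):
    """Map each path (e.g. a drive root like 'C:\\\\') to True when it is low on space."""
    report = {}
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        report[path] = low_on_space(usage.total, usage.free)
    return report


# "." is used here only so the sketch runs anywhere; in a guest you'd pass the drive roots.
for path, is_low in check_paths(["."]).items():
    print(f"{path}: {'LOW on space' if is_low else 'ok'}")
```

Hooking something like this into whatever monitoring you already have for dev & test guests is cheaper than chasing failed backups afterwards.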


There is experimenting, testing, production testing, production and finally real life environments where not all is done as it should be. Yes, really, the world isn’t perfect. Managers sometimes think it’s click, click, Next, click and voila, we’ve got a complex multisite system running. Well, it isn’t like that and you need some time and skills to make it all work. Yes, even in today’s “cheap, fast, easy to run your business from your smartphone” ecosystem of the private, hybrid and public cloud, where all is bliss and world peace reigns.

The DELL Compellent Hardware VSS provider & Replay Manager service handle all this without missing a beat, which is very comforting, as previous experiences with hardware VSS providers of other vendors make us think that these would probably have blown up by now.