I’ve discussed this before in Windows Server 2012 Deduplication Results In A Small Environment but here’s a little updated screenshot of a backup volume:
Not to shabby I’d say and 100% free in box portable deduplication … What are you waiting for ![]()
I’ve discussed this before in Windows Server 2012 Deduplication Results In A Small Environment but here’s a little updated screenshot of a backup volume:
Not to shabby I’d say and 100% free in box portable deduplication … What are you waiting for ![]()
There is a small environment that provides web presence and services. In total there a bout 20 production virtual machines. These are all backed up to a Transparent Failover File Share on a Windows Server 2012 cluster that is used to host all the infrastructure and management services.
The LUN/Volume for the backups is about 5.5 TB of storage is available. The folder layout is shown in the screenshot below. The backups are run “in guest” using native Windows Backup which has the WindowsImageBackup subfolder as target. Those backups are archived to an “Archives” folder. That archive folder is the one that gets deduplicated, as the WindowsImageBackup folder is excluded.
This means that basically the most recent version is not deduplicated guaranteeing the fastest possible restore times at the cost of some disk space. All older (> 1 day) backup files are deduplicated. We achieve the following with this approach:
About 20 virtual machines are backed up every week (small delta and lots of stateless applications).As the optimization runs we see the savings grow. That’s perfectly logical. The more backups we make of virtual machines with a small delta the more deduplication can shine. So let’s look at the results using Get-DedupStatus | fl
A couple of weeks later it looks like this.
Give it some more months, with more retained backusp, and I think we’ll keep this around 88%-90% .From tests we have done (ddpeval.exe) we think we’ll max out at around 80% savings rate. But it’s a bit less here overall because we excluded the most recent backups. Guess what, that’s good enough for us
. It beats buying extra storage of paying a wad of money for disk deduplication licenses from some backup vendor or appliance. Just using the build in deduplication mechanisms in Windows Server 2012 Server saved us a bunch of money.
The next step is to also convert the production Hyper-V cluster to Windows Server 2012 so we can do host based backups with the native Windows Backup that now supports Cluster Shared Volumes (another place where that 64TB VHDX size can come in handy as Windows backup now writes to VHDX).
The volume reports we’re using 3TB in data. So 2.4TB is free.
Looking at the backup folder you see 10.9TB of data stored on 1.99 TB of disk .
So the properties of the volume reports more disk space used that the actual folder containing the data. Let’s use WinDirStat to have a look.
So the above agrees with the volume properties. In the details of this volumes we again see about 2TB of consumed space.
Could it be that the volume might is reserving some space ensure proper functioning?
When you dive deeper things we get some cool view of storage space used.. Where Windows Explorer is aware of deduplication and shows the non deduplicates size for the vhd file, WinDirStat does not, it always shows shows the size on disk, which is a whole lot less.
This is the same as when you ask for the properties of a file in Windows Explorer.
Is it the best solution for everyone? Not always no. The deduplication is done on the target after the data is copied there. So in environments where bandwidth is seriously constrained and there is absolutely no technical and/or economical way to provide the needed throughput this might not be viable solution. But don’t dismiss this option to fast. In a lot of scenarios is it is very good and cost effective feature. Technically & functionally it might be wiser to do it on the target as you don’t consumes to much memory (deduplication is a memory hog) an CPU cycles on the source hosts. Also nice is that these dedupe files are portable across systems. VEEAM has demonstrated some nice examples of combing their deduplication with Windows dedupe by the way. So this might also be an interesting scenario.
Financially the the cost of deduplication functionality with hardware appliances or backup software hurts like the kick of a horse straight onto the head. So even if you have to invest a little in bandwidth and cabling you might be a lot better of. Perhaps, as you’re replacing older switches by new 1Gbps or 10Gbps gear, you might be able to recuperate the old ones as dedicated backup switches. We’re using mostly recuperated switch ports and native Windows NIC teaming, it works brilliantly. I’ve said this before, saving money whilst improving operations rarely gets you fired. The sweet thing about this that this is achieved by building good & reliable solutions, which means they are efficient even if it costs some money to achieve. Some managers focus way to much on efficiency from the start as to them means nothing more than a euphemism for saving every € possible. Penny wise and Pound foolish. Bad move. Efficiency, unless it is the goal itself, is a side effect of a well designed and optimized solution. A very nice and welcome one for that matter, but it’s not the end all be all of a solution or you’ll have the wrong outcome.
As I blogged in a previous post we’ve been building a Disk2Disk based backup solution with commodity hardware as all the appliances on the market are either to small in capacity for our needs, ridiculously expensive or sometimes just suck or a combination of the above (Virtual Library Systems or Virtual Tape Libraries come to mind, one of my biggest technology mistakes ever, at least the ones I had and in my humble opinion
) .
Here’s a logical drawing of what we’re talking about. We are using just two backup media agent building blocks (server + storage) in our setup for now so we can scale out.
Now in future post I hope to be discussing storage spaces & Windows deduplication thrown into the mix.
Not to shabby … > 1TB/Hour
To great …
In close up you are seeing just 2 Windows 2012 Hyper-V cluster nodes, each being backed up over a native LBFO team of 2*1Gbps NIC ports to one Windows Server 2012 Backup Media Agent with a 10Gbps pipe. Look at the max throughput we got …
Sure this is under optimal conditions, but guess what? When doing backup from multiple hosts to dual backup media servers or more we’re getting very fast backups at very low cost compared to some other solutions out there. This is our backup beast
. More bandwidth needed at the backup media server? It has dual port 10Gbps that can be teamed and/or leverage SMB 3.0 multichannel. High volume hosts can use 10Gbps at the source as well.
As I said before, you will not get fired for:
Cost reduction is one thing but look at what we get whilst saving money… wow!
What am I missing? Specialized dedupe. Yes, but we’re going for the poor mans workaround there. More on that later. As long as we get enough throughput that doesn’t hurt us. And give the cost we cannot justify it + it’s way to much vendor lock in. If you can’t get backups done without, sure you might need to walk that route. But for now I’m using a different path. Our money is better spend in different places.
Now how to get the same economic improvements from the backup software? Disk capacity licensing sucks. But we need a solution that can handle this throughput & load , is reliable, has great support & product information, get’s support for new release fast after RTM (come on CommVault, get a move on) and is simple to use ==> even more money and time saved.
Why is support for new releases in backup software important. Because the lack of it is causing me delays. Delays cost me, time, money & opportunities. I’m really interested to covert our large LUN file servers to Windows Server 2012 Hyper-V virtual machines, which I now can rather smoothly thanks to those big VDHX sizes that are possible now and slash the backup times of those millions of small files to pieces by backing them up per VHDX over this setup. But I’ll have to wait and see when CommVault will support VHDX files and GPT disks in guests because they are not moving as fast as a leading (and high cost) product should. Altaro & Veeam have them beaten solid there and I don’t like to be held back.
We have been running CommVault Simpana 9.0 R2 SP7 in combination with the DELL Compellent Hardware VSS provider to do host based backups of the virtual machines on our Windows Server 2012 Hyper-V clusters host with great success and speed.
We’ve run into two issues so far. One, I blogged about in DELL Compellent Hardware VSS Provider & Commvault on Windows Server 2012 Hyper-V nodes – Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface. hr = 0×80070005, Access is denied was an due to some missing permissions for the domain account we configured the Compellent Replay manager Service to run with. The solution for that issue can be found in that same blog post.
The other one was that sometimes during the backup of a Hyper-V host we got an error from CommVault that put the job in a “pending” status, kept trying and failing. The error is:
Error Code: [91:9], Description: Volume Shadow Copy Service (VSS) error. VSS service or writers may be in a bad state. Please check vsbkp.log and Windows Event Viewer for VSS related messages. Or run vssadmin list writers from command prompt to check state of the VSS writers.
When we look at the Compellent controller we see the following things happen:
The result at the CommVault end is that the job goes into a pending state with the above error. When we look at the state of the Microsoft Hyper-V VSS Writer by running “vssadmin list writer” …
… from an elevated command prompt we see:
Writer name: ‘Microsoft Hyper-V VSS Writer’
…Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
…Writer Instance Id: {2fa6f9ba-b613-4740-9bf3-e01eb4320a01}
…State: [5] Waiting for completion
…Last error: Retryable error
Note at this stage:
Writer name: ‘Microsoft Hyper-V VSS Writer’
…Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
…Writer Instance Id: {2fa6f9ba-b613-4740-9bf3-e01eb4320a01}
…State: [5] Waiting for completion
…Last error: Unexpected error
To get rid of this one we can restart the host or, less drastic, restart the Hyper-V Virtual Machine management Service (VMMS.exe) which will do the trick as well. Before you do this , drain the node when you pause it, then resume it with the option failing back the roles. Windows 2012 makes it a breeze to do this without service interruption ![]()
Looking for solutions when CommVault is involved can be tedious as their consultancy driven sales model isn’t focused on making information widely available. Trouble shooting VSS issues can also be considered a form of black art at times. Since this is Windows 2012 RTM an the date is September 20th 2012 as the moment of writing, there are not yet any hotfixes related to host level backups of Virtual machines and such. CommVault Simpana 9.0 R2 SP7 is also fully patched.
This,combined with the fact that we did not see anything like this during testing (and we did a fair amount) makes us look at the guests. That’s the big difference on a large production cluster. All those unique guests with their own history. We also know from the past years with VSS snapshots in Windows 2008(R2) that these tend to fail due to issues in the guests. Take a peak at Troubleshoot VSS issues that occur with Windows Server Backup (WBADMIN) in Windows Server 2008 and Windows Server 2008 R2 just for starters As an example we already had seen one guest (dev/test server) that had 5 user logged in doing all kinds of reconfigurations and installs go into save mode during a backup, so it could be due to something rotten in certain guests. There is very much to consider when doing these kinds of backups.
By doing some comparing of successful & failed backups it really looks as if it was related to certain virtual machines. A lot of issues are caused by the VSS service, not running or not being able to do snapshots because of lack of space so perhaps this was the case here as well?
We poked around a bit. First let’s see what we can find in the Hyper-V specific logs like the Microsoft-Windows-Hyper-V-VMMS-Admin event log. Ah lot’s of errors relating to a number of guests!
Log Name: Microsoft-Windows-Hyper-V-VMMS-Admin
Source: Microsoft-Windows-Hyper-V-VMMS
Date: 19/09/2012 22:14:37
Event ID: 10102
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: undisclosed server
Description:
Failed to create the volume shadow copy inside of virtual machine ‘undisclosedserver’. (Virtual machine ID 84521EG0G-8B7A-54ED-2F24-392A1761ED11)
Well people, that is called a clue
. So we did some Live Migration to isolate suspect VMs to a single node, run backups, see them fail, do the the same with a new and clean VM an it all works. and indeed … looking at the guest involved when the CommVault backup fails we that the VSS service is running and healthy but we do see all kind of badness related to disk space:
That later one left 0MB of disk space on the system (Test Controller TFS shitting itself), but we managed to clear just enough to get to just over 1GB of free space which was enough to make the backup succeed.
Servers, virtual or physical ones, should to be locked down to prevent such abuse. I know, I know. Did I already tell you I do not reside in a perfect world? We cannot protect against dev and test server admins who act without much care on their servers. We’ll just keep hammering at it to raise their awareness I guess. For end users and production servers we monitor those well enough to proactively avoid issues. With dev & test servers we don’t do so, or the response team would have a day’s work reacting to all alerts that daily dev & test usage on those servers generate.
Microsoft Hyper-V VSS Writer to a healthy state as described above. Thanks to Live Migration this can be achieved without any down time.
There is experimenting, testing, production testing, production and finally real life environments where not all is done as it should be. Yes, really the world isn’t perfect. Managers sometimes think it’s click, click, Next, click and voila we’ve got a complex multisite system running. Well it isn’t like that and you need some time and skills to make it all work. Yes even in todays “cheap, fast, easy to run your business form your smartphone” ecosystem of the private, hybrid and public cloud, where all is bliss and world peace reigns.
The DELL Compellent Hardware VSS provider & replay manager service handle all this without missing a beat, which is very comforting. As previous experiences with hardware VSS provides of other vendors make us think that these would probably have blown up by now.
As you know by now I’ve been building a high throughput, large volume Disk2Disk backup solution and that has been rather successful. At optimal speed we get 2TB/Hour per backup media server. As we currently have two, we can get to 4TB/hour at maximum throughput
. Currently that is, we’ll see if more is possible. The solution, which I’ll blog about later, is based on the Dell Compellent hardware VSS provider (Dell Compellent Replay Manager 6.2.0.9), Windows Server 2012, CommVault 9.0 R2 and PowerVault storage a the target disks and as it’s working now is saving us many hundred of thousands of Euros compared to dedicated D2D backup appliances or solutions.
The entire process has been a fairly smooth one, which was a relief as decent hardware VSS providers are not easy to come by, based on our experience and that of many colleagues. So, once again the DELL Compellent choice is turning out to be a good one. We did have to fix one small issue along the way.
Whilst using the DELL Compellent Hardware VSS Provider with Commvault Simpana 9.0 R2 to do host level back of the virtual machines on Windows Server 2012 Hyper-V cluster nodes. We ran into a small issue. The backup run fine but we saw this error being logged:
Volume Shadow Copy Service error: Unexpected error querying for the IVssWriterCallback interface. hr = 0×80070005, Access is denied.
. This is often caused by incorrect security settings in either the writer or requestor process.Operation:
Gathering Writer DataContext:
Writer Class Id: {e8132975-6f93-4464-a53e-1050253ae220}
Writer Name: System Writer
Writer Instance ID: {3f4965d8-10ac-411b-bf6d-6a607f237775}
The description found here was not helpful in resolving this. This error is found all over the internet with just about any backup product. The possible causes and solutions that are suggested are as follows:
This is not a solution in our case, which would have surprised me if it had been.
As the domain account used for this service is a member of the local administrators group that already has the permissions this would also have surprised me but we tested it anyway. It turns out this is not the cause either.
But for your information this is how it’s done:
You can eliminate the error condition by adding the access permissions for the Network Service account to the COM Security of the affected server. To add the access permissions for the Network Service account, do the following:
According to Microsoft support this issue occurs when using a 3rd party backup program that utilizes Windows VSS (Volume Shadow Service) and has its own requestor. This is indeed the case here. “It looks like the requestor (the backup application) does not allow system writer to call back into their process and hence generates the error in the application log.” This sounds very plausible ![]()
The fix:
And voila: the error has gone when running backups ![]()
I assume no responsibility if you do this in your environment but I can say that all this works perfectly in our CommVault Simpana 9.0 R2 setup whist backing up the virtual machines on our Hyper-V cluster nodes at the host level using the DELL Compellent hardware VSS provider. And yes this is Windows Server 2012, careful testing, planning helps when being an early adaptor.
As Brad Anderson suggested a few times this year we are leading our businesses to a better IT future. Here’s a quick screen shot of one of our backups media agent servers hooked up to bunch of NL-SAS disks (150TB). One host (older model, no 10Gbps yet, running Windows 2008 R2) is sending backups over its dedicated 1Gbps interface to this media agent server who has a 10Gbps interface. Performance and traffic looks great
. I love it when a plan comes together!
This is what I looks like at host being backed up
Now we’ll see what magic Data Deduplication can do for us with the RTM bits!