Golden Nuggets: Windows Server 2012 R2 Failover Cluster CSV Placement Policy


Some enhancements only become truly evident to people when they see them in action. For many features this means something needs to go wrong before they kick in. Others are more visible during normal operations. That is the case with the CSV enhancements in Windows Server 2012 R2 Failover Clustering.

One golden nugget here is the CSV placement policy (which really shines in combination with SOFS/Storage Spaces). It spreads ownership of the CSVs amongst the cluster nodes to ensure a balanced distribution. In a failover cluster, one node is the “coordinator node” (owner) for a CSV. The coordinator node owns the physical disk resource that is associated with a logical unit (LUN). File system metadata operations for that LUN go through the coordinator node. In previous versions there was no automatic rebalancing of coordinator node assignment, which means that all LUNs could potentially be owned by the same node. In Storage Spaces & SOFS scenarios this becomes even more important.

The benefits

  • It helps all nodes carry their share of the workload as it load balances the disk I/O.
  • Failovers of CSV owners are potentially quicker and more predictable/consistent as an even distribution ensures that no one node owns a disproportionate number of CSVs.
  • When a node loses storage access, the number of CSVs that go into redirected mode is potentially smaller because they are evenly distributed. In an unbalanced cluster it could, in a worst case scenario, be all of them.
  • When using SOFS with Storage Spaces it makes sure the Storage Spaces Ownership is distributed fairly.

When does it happen

  • Each time a node leaves or joins the cluster. This means you don’t need to intervene manually or via PowerShell to get an even distribution. This goes both for nodes leaving the cluster and for newly added nodes. A new node will get a CSV assigned if one of the existing nodes owns a surplus.
  • The process also runs when you start a failover cluster after it has been shut down.

When customers see this in action (it’s most obvious when they add a node, because that’s when they are normally watching) they generally smile as the cluster does its job, getting the best possible results out of their hardware.
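If you want to see the distribution for yourself, the PowerShell sketch below (a minimal example, assuming the FailoverClusters module is available and run on a cluster node) lists the coordinator node of every CSV; the Move-ClusterSharedVolume line shows how you could move one manually, should you ever want to override the automatic placement. The CSV and node names used here are hypothetical.

    # List every CSV and its current coordinator (owner) node
    Import-Module FailoverClusters
    Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State

    # Count CSVs per node to eyeball the balance
    Get-ClusterSharedVolume | Group-Object OwnerNode | Select-Object Name, Count

    # Manually move a CSV to another node (normally not needed in 2012 R2)
    Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node "Node2"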

Manually Merging Hyper-V Checkpoints


A Last-Ditch Effort

First of all you need to realize this might not work. It’s a last-ditch effort. There is a reason why you have backups (with tested restores) and why you should monitor your environment for things that are not as they should be. Early intervention can pay off.

Also see my blog post on a couple of more preferable approaches.

If you have lost checkpoints, you have basically lost data, and corruption/data inconsistencies are very much a possibility. If the files have been copied around and the information about which file is whose parent has been lost, the dates/timestamps are what you have to go by. You might not even know for sure whether you have them all.

Setting up the demo

For demo purposes we take a test VM and add files to indicate what checkpoint we’re at.

We start with ORIGINAL.TXT on the desktop and we create a checkpoint, which we rename accordingly.

We add a file called CHECK01.TXT and we create a checkpoint, which we rename accordingly.

We add a file called CHECK02.TXT and we create a checkpoint, which we rename accordingly.

We add a file called NOW.TXT; no more checkpoints are taken after this.

The file names represent the content you’d see disappear if you applied that checkpoint, and we have reflected this in the names of the checkpoints.

As we want to merge all the snapshots and end up with a usable VHDX, we’ll work back from the most recent differencing disk until everything is merged. As you can see this is a straightforward situation, and I hope you’ll never have to deal with a vast collection of subtrees.

Finding out what the parents of avhdx files are

In this demo it’s pretty obvious what snapshots exist and what avhdx files they represent. We’ve even shown you the single tree visualized in Hyper-V Manager. In reality bad things have happened and you don’t see this information anymore, so you might have to find out for yourself. This is done via Inspect Disk in Hyper-V Manager. If you’re confused about what the parent of an (a)vhdx file is, this tool will help you find out, or at least show you what the last known parent was.
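You can also do this inspection from PowerShell. The sketch below is a minimal example, assuming the Hyper-V module is installed and using hypothetical file paths; Get-VHD exposes the parent path of a differencing disk, so you can walk the chain from the most recent avhdx back to the base vhdx.

    # Show the parent of a single differencing disk
    Get-VHD -Path "D:\VMs\Demo\Virtual Hard Disks\Demo_1A2B3C.avhdx" |
        Select-Object Path, VhdType, ParentPath

    # Walk the whole chain from the most recent avhdx down to the base vhdx
    $current = "D:\VMs\Demo\Virtual Hard Disks\Demo_1A2B3C.avhdx"
    while ($current) {
        $vhd = Get-VHD -Path $current
        "{0}  ->  {1}" -f $vhd.Path, $vhd.ParentPath
        $current = $vhd.ParentPath
    }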

Sometimes the original files have been renamed or moved; in that case it will show you the last known valid parent.

Manually Merging the checkpoints

Remember to make a copy of all files as a backup! Also make sure you have enough free disk space … you need working space, and you might need another shot at this. As we want to merge all the snapshots and end up with a usable VHDX, we’ll work back from the most recent differencing disk until everything is merged into the oldest one, which is the vhdx. You can look at the last-modified timestamps to find the correct order in which to work. The most recent avhdx is the one referenced in the virtual machine configuration file; locate the virtual hard disk information there.

The configuration file’s avhdx is the one containing the “NOW” running state of the VM.

Note: You might find advice telling you to rename the extension avhdx to vhdx (or avhd to vhd). The reason for this was that in Windows Server 2008, Hyper-V Manager did not show avhd files in the Edit Virtual Hard Disk wizard. You can still do this and it will still work, but you do not need to. Ever since Windows Server 2008 R2, avhd (and since Windows Server 2012, avhdx) files do show up in Hyper-V Manager’s Edit Disk wizard.

For some insight as to why the order is important, read this blog post by Ben Armstrong: What happens when a snapshot is being merged? [Hyper-V]

WARNING: If you did not start with the most recent one and work your way down, which is the easiest and least confusing way, all is not lost. But you will have to reconnect the more recent (a)vhdx to its new parent. This is needed because, by merging a snapshot out of order, the more recent one will have lost its original parent.

Here’s how to do this: select Reconnect. This is the page you’ll get when you start the Edit Disk wizard, as all other options are unavailable due to the missing parent.

The wizard will tell you what used to be the parent and allow you to select a new one. Make sure to tick the check box for Ignore ID mismatch or the reconnect will fail, as your previous out-of-order merge has created a new (a)vhdx. If you’re in this pickle because of renamed or copied (a)vhdx files, this isn’t needed, by the way.

Follow the wizard after that, and when you’re done you can launch the Edit Disk wizard again and perform a merge. It’s paramount that you do not mix up the order and that you reconnect to the correct parent, or you’ll end up in a right mess. There are many permutations; keep it simple and do it in order. If you start having multiple checkpoint trees/subtrees, things can get confusing very fast.

You might also have to reconnect if the checkpoints have lost their connection to what they know to be their last parent for other reasons. In that case you reconnect first and, when that’s done, you merge. Rinse and repeat. The walk-through below assumes you have no reconnects to be done; if you do, the wizard will tell you, as in the example above.
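The same reconnect can be done from PowerShell. Below is a minimal sketch using hypothetical file names; Set-VHD with -IgnoreIdMismatch is the equivalent of ticking the Ignore ID mismatch check box in the wizard, so only use it when you know the chosen parent really is the right one.

    # Point a differencing disk at a new parent, ignoring the disk identifier mismatch
    Set-VHD -Path "D:\VMs\Demo\Virtual Hard Disks\Demo_Child.avhdx" `
            -ParentPath "D:\VMs\Demo\Virtual Hard Disks\Demo.vhdx" `
            -IgnoreIdMismatch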

Walk-through:

Open the Edit Disk Wizard.

Select the most recent avhdx & click “Next”

We choose to merge the avhdx

In our case into its parent disk

Verify the options are correct and click “Finish”

Let the wizard complete

That’s it. You’ve merged the most recent snapshot into its parent. That means you have not lost the most recent state of the virtual machine as it was before you shut it down. This can be verified by mounting the now most recent avhdx and looking at the desktop of my user profile. You can see the NOW.TXT text file is there!

OK, dismount the avhdx and now it’s rinse and repeat.
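If you prefer to do that verification from PowerShell, here is a minimal sketch (hypothetical path, Hyper-V module assumed); mounting read-only keeps you from accidentally changing the differencing disk you still need to merge.

    $avhdx = "D:\VMs\Demo\Virtual Hard Disks\Demo_1A2B3C.avhdx"

    # Mount the differencing disk read-only so nothing gets changed
    Mount-VHD -Path $avhdx -ReadOnly

    # Find the drive letter(s) the mounted disk exposes
    $diskNumber = (Get-VHD -Path $avhdx).DiskNumber
    Get-Partition -DiskNumber $diskNumber | Select-Object DriveLetter, Size

    # Browse the files, then dismount before you continue merging
    Dismount-VHD -Path $avhdx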

You do this over and over again until you merge the last avhdx into the vhdx.

Then you have the vhdx you will use to create a new virtual machine.

Make sure you get the generation right.

Assign memory

Connect to the appropriate virtual switch, or don’t if you’re not ready to do that yet.

Use the vhdx disk that is the end result of your merging efforts.

When you boot that virtual machine you’ll see that all the text files are there. It’s as if you’ve deleted the checkpoints in the GUI and retained “NOW” in the vhdx.
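If you’d rather create that new virtual machine from PowerShell, a minimal sketch could look like the following (all names, paths and the generation are hypothetical; match them to the documented settings of the original VM):

    # Create a new VM that boots from the merged vhdx
    New-VM -Name "Demo-Recovered" `
           -Generation 1 `
           -MemoryStartupBytes 4GB `
           -VHDPath "D:\VMs\Demo\Virtual Hard Disks\Demo.vhdx" `
           -SwitchName "vSwitch-LAN"

    Start-VM -Name "Demo-Recovered"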

Last but not least, you can use PowerShell or even DiskPart for this, but I have found that most people in this pickle value a GUI. Use what you feel most comfortable with.
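For completeness, here is what the PowerShell route could look like, as a minimal sketch with hypothetical paths; Merge-VHD merges a child differencing disk into its parent, and just like in the GUI you work from the most recent avhdx down the chain, one merge at a time.

    # Merge the most recent differencing disk into its parent
    Merge-VHD -Path "D:\VMs\Demo\Virtual Hard Disks\Demo_Child.avhdx" `
              -DestinationPath "D:\VMs\Demo\Virtual Hard Disks\Demo_Parent.avhdx"

    # Repeat for each remaining avhdx until only the base vhdx is left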

Thanks for reading, and I hope this helps someone. Do remember “big boy” rules apply. This is not safe, easy or obvious in each and every situation, so you are responsible for everything you do in your environment. If you’re in too deep, way over your head, etc., call in some expert help.

3 Ways To Deal With Lingering Hyper-V Checkpoints Formerly Known as Snapshots


Lingering or phantom Hyper-V checkpoints or snapshots

Once in a while the merging of checkpoints, previously known as snapshots, in Hyper-V goes south. An example of this is when checkpoints are not cleaned up and the most recent avhdx, or several of them, remains in use as the active virtual disk even though you don’t see them anymore in the Hyper-V Manager UI, for example. When that happens you can look at the situation via PowerShell to see whether it shows the same thing. Whatever the cause, once in a while I come across virtual machines that have one or more avhdx (or avhd) files active that aren’t supposed to be there anymore. In that case you have to do some manual housekeeping.
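A minimal PowerShell sketch for that inspection could look like this (the VM name is hypothetical); if the active disk path ends in .avhdx while Hyper-V Manager shows no checkpoints, you are probably looking at a lingering checkpoint.

    # What virtual disk files is the VM actually using right now?
    Get-VMHardDiskDrive -VMName "DemoVM" | Select-Object ControllerType, ControllerNumber, Path

    # What checkpoints does Hyper-V itself still know about?
    Get-VMSnapshot -VMName "DemoVM" | Select-Object Name, CreationTime, ParentSnapshotName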

Now please, do note that in Windows Server 2012 (R2) Hyper-V Replica uses checkpoints, and since Windows Server 2012 R2 backups also rely on them. Just because you see a snapshot you didn’t create intentionally, don’t automatically think they’re all phantoms. They might exist temporarily for a good reason. We’re talking about dealing with real lingering checkpoints here.

Housekeeping

Housekeeping comes in a couple of variants, from simply dusting off to industrial cleaning. Beware that the latter should never be considered a routine operation. It’s not a normal situation. It’s a last-ditch resort, and perhaps you want to call support to make sure that you didn’t miss anything else.

Basically you have three options. In order of the easiest & safest to try first, these are:

  1. Create a new checkpoint and delete it. Often that process will take care of merging the other (older) lingering avhd/avhdx files as well. This is the easiest way to deal with it and it’s as safe as it gets. Hyper-V cleans up for you; you just had to give it a kick start, so to speak.
  2. Shut down the VM and create a new checkpoint. Export that newly created checkpoint. Yes, you can do that. This will create a nicely exported virtual machine that only has the relevant vhd/vhdx files and no more checkpoints (avhd/avhdx). Do note that this vhd/vhdx is a dynamically expanding one. If that is not to your liking you’ll need to convert it to fixed. But other than that you can dump the old VM (don’t delete everything yet) and replace it by importing the one you just exported. For added security you could first copy the files for safekeeping before you attempt this.
  3. Do manual merges. This is a riskier process & prone to mistakes, so please do this only on a copy of the files. That way you’ll give Microsoft Support Services a fighting chance if things don’t work out or you make a mistake. Also note that in this case you end up with one or more final VHDX files which you’ll use to create a new virtual machine to boot from. It’s very hands-on.

So that’s the preferred order of things to try in regards to safety. The 3rd option is the last resort. Don’t do it before you’ve tried options 1 and 2. And as said above, if you do need to go for option 3, do it on copies. If you’re unsure how to proceed with any of this, get an expert involved.
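As a rough illustration of options 1 and 2, the sketch below shows what they could look like in PowerShell (the VM name and export path are hypothetical); option 1 simply gives Hyper-V a checkpoint to clean up, option 2 exports the freshly taken checkpoint of the shut-down VM.

    # Option 1: create a checkpoint and remove it so Hyper-V merges the chain
    Checkpoint-VM -Name "DemoVM" -SnapshotName "Cleanup kick start"
    Remove-VMSnapshot -VMName "DemoVM" -Name "Cleanup kick start"

    # Option 2: shut the VM down, checkpoint it and export that checkpoint
    Stop-VM -Name "DemoVM"
    Checkpoint-VM -Name "DemoVM" -SnapshotName "Export me"
    Export-VMSnapshot -VMName "DemoVM" -Name "Export me" -Path "E:\Exports"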

There’s actually another option which is very safe but not native to Hyper-V. In the running virtual machine whose current state you want to preserve, do a V2V using Disk2vhd v2.01. Easy and sort of idiot-proof, if such a thing exists.

In a next blog post I’ll walk you through the procedure for the 3rd option, so if this is your last resort you can have practiced it before you have to use it in anger. But please, if needed, and do make sure it’s really needed as discussed above, try option 1 first. If that doesn’t do it, then try option 2. If that also fails, try option 3. Do note that for options 2 and 3 you will have to create a new virtual machine with the resulting VHDX; having the required settings documented will help in this case.

Virtualizing Intensive Workloads on Hyper-V, Can It Be Done?


Can it be done?

All I can say is that, yes, absolutely, you can virtualize resource-intensive workloads. Done right, you’ll gain all the benefits associated with virtualization and you won’t lose performance & scalability.

Now, I have to stress “done right”. There are a couple of major causes of problems with virtualization, so let’s look at those and see how a few well-placed torpedoes can sink your project fast & effectively.

Common Sense

One of them is the lack of common sense. If you currently have 10 SQL Servers with 12 15K RPM SAS disks in RAID 1 and RAID 10 for the OS, TempDB, log & data files, 64 GB of memory, dual quad core sockets and teamed 1Gbps for resilience and throughput, and you want to virtualize them, you should expect to deliver the same resources to the virtualized servers. It’s technology, people. Hoping that a hypervisor will magically create resources out of thin air is setting yourself up for failure. You cannot imagine how often people use cheap controllers, fewer or slower disks, less bandwidth or fewer CPU cycles and then dump their workload on it. Dynamic memory, NUMA awareness, Storage QoS, etc. cannot rescue an undersized, ill-conceived solution. I realize you have read that most physical servers are sitting there idle, letting their resources go to waste. If you don’t measure this, you can get bitten. You can get ripped to pieces when you’re virtualizing intensive workloads on Hyper-V based on assumptions.

Consider the entire stack

The second torpedo is not understanding the technology stack; the integration part of things, or the holistic approach in management-consulting speak. The time when one could think purely as a storage admin, network admin, server admin, virtualization admin, SQL DBA or Exchange engineer is long gone. Really, long gone. You need to think about the entire stack. Know your bottlenecks, SPOFs, weaknesses, capabilities and how these interact. If you’re still 100% on premises, that means you have to be a datacenter admin, not forgetting you might have multiple datacenters. And you’d better communicate a bit through DevOps to make sure the developers know that all those resources are not magically super redundant, are not continuously available without any limitation and do not have infinite scalability.

Drivers, firmware & bugs can sink your project

Hardware, VAR & ISV support is also a frequent cause of problems. They’ll all tell you that everything is supported. You can learn very fast and very painfully that this is too often not the case, or that serious bugs are wreaking havoc on your beautiful design. So I live by one of my mantras: “Trust but verify”. However sad it may be, you cannot in good faith blindly trust OEMs, VARs and ISVs. I’m not saying they are doing this willfully, but their experience and knowledge aren’t perfect & complete either. You have to do your due diligence. There are too many large-scale examples of this right now with the Emulex NIC issues around DVMQ. This is a prime example of how slow acknowledgement of a real issue can ruin your virtualization project for intensive workloads; it has been doing so for 9 months and might very well take a year to resolve. Due diligence could have saved you here. A VAR should protect its customers from that, but in reality they often find out when it’s too late. Another example is bugs in storage vendors’ implementations of ODX causing corruption, or extremely slow support for a new version of Windows effectively blocking its use in production when you need it for the performance & scalability. I have long learned that losing customers, and as such revenue, is the only real language vendors understand. So do not be afraid to make hard decisions when you need to.

Knowledge & Due Diligence

Know your hypervisor and core technologies well. Don’t think it’s the same as hardware-based deployments, don’t think all options and features work everywhere for everything, and don’t think all hypervisors work the same. They do not. Know about Exchange and the rules/limits around virtualization. The same goes for SQL Server and any resource-intensive workload you virtualize. Don’t think that the same rules apply to all workloads. There is no substitute for knowledge, experience and hands-on testing, the verification part of trust but verify, remember? It goes for you as well!

It can be done

Yes, we can! If you want to see some high-level examples to stimulate your appetite, just browse my blog. Here are some pointers to get you started.

Unmap

Live migration at the speed of light

Remember, don’t just say “Damn those torpedoes, full speed ahead” but figure out why, where, when and how you’ll get the job done.

Live Migration Speed Check List – Take It Easy To Speed It Up


When configuring live migration it’s easy to go all out on all the features and capabilities we have in Windows Server 2012 R2.

There is no one stopping you from configuring 50 simultaneous live migrations. When you have only one, two or even four 1Gbps NICs at your disposal, you might stick to 1 or 2 VMs per available 1Gbps. But why limit yourself if you have one or multiple 10Gbps pipes or bigger ready to roll? Well, let’s discuss a little what happens when you do a live migration on a Hyper-V cluster with CSV storage. Initiating a live migration kicks off a slew of activities.

  1. First it is established from where (the source host) to where (the target host) we are migrating.
  2. Permissions are checked, are we allowed to do this?
  3. Do we have enough memory on the target to do this? If so allocate that memory.
  4. Set up a skeleton VM on the target host that is a perfect copy of the source VM’s specifications and configure dependencies on the target host.
  5. Let’s see if we can get a network connection set up and running. If that works, we’re cool and can now transfer the memory.
  6. A bitmap is created to track changes to the source VM’s memory pages. Each memory page is copied from the source host to the target host VM, and during this copy the memory page is marked clean.
  7. As long as the source VM is running, memory keeps changing; this continues to be tracked in the bitmap and such a page is marked as dirty. In an iterative process this dirty memory is copied over again, and so on. This continues until the remaining dirty memory is minimal. It will take longer if the VM is very memory intensive.
  8. The tiniest amount of not-yet-copied dirty memory is the part of the VM’s state that is copied during the “blackout”. For this to happen the VM on the source host is paused and the remaining state is copied.
  9. A final check is done to confirm all is well and then the virtual machine is resumed on the target host.
  10. Any remains of the VM on the source host are cleaned up.

That’s actually a lot of work and, as you can see, copying the state is just part of the process. The more bandwidth & the lower the latency we throw at it, the smaller a share of the total live migration time this part of the process becomes.

If you can just about fill the bandwidth of your 10Gbps (or bigger) pipe or pipes & you operate at line speed, what’s left as overhead? Everything that’s not the actual copy of VM state. The trick is to keep the host busy so you minimize idle time of the network copies. I.e. we want to fill up that bandwidth just right but not go overboard, otherwise the work to manage a large number of simultaneous live migrations might actually slow you down. Compare it to juggling balls. You might be very good and fast at it, but when you have too many balls to attend to you’ll get into trouble because you have to spread your attention too wide, i.e. you’re doing more context switching than is optimal.

So tweaking the number of simultaneous live migrations to your environment is the last step in making sure a node is drained as fast as possible. Slowing things down can actually speed things up. So when you get your 10Gbps or better pipes in production, it pays off to test a bit and find the best settings for your environment.

Let’s recap all of the live migration optimization tips I have given over the years and add a final word of advice. Those who have been reading my blog for a while know I enjoy testing to find what works best, and I do tweak settings to get the best performance and results. However, you have to learn and accept that it makes no sense in real life to hunt for a 1% or 2% reduction in live migration times. You’ll get one-off hiccups that slow you down more than that.

So what you need to do is tweak the things that matter the most and will get you 99% of the results:

  • Get the biggest pipe you need & can afford. Bigger pipes are always better than lots of aggregated smaller pipes when it comes to low latency & high throughput.
  • Choose the best performance settings Hyper-V offers you. You can choose from TCP/IP, Compression and SMB. Ben Armstrong has a blog post on this: Faster Live Migration–Which Option Should You Choose? I’d like to add that you can use NIC teaming for live migration as well, and prior to Windows Server 2012 R2 that was the only way to aggregate bandwidth. Now you have more options. I prefer SMB, but when I don’t have 10Gbps at my disposal I have found that compression really makes a difference. In my home lab, where I have only 1Gbps (the horror), it stopped me from going crazy (being addicted to 10Gbps).

  • Optimize the power settings in your server BIOS if you want extra speed & smoothness with 10Gbps (less so with 1Gbps). Look here: An Early Look At Live Migration Over TCP/IP & Multichannel In Windows Server 2012 R2 Preview; the network traffic is a lot more stable, i.e. a flat line! In Windows 2008 R2 this was a real need for 10Gbps or you’d be stuck at 16% max.
  • Enable Jumbo Frames for another 15-20%. Thanks to Multi Channel I can visualize this now. See also this blog post Live Migration Can Benefit From Jumbo Frames. The pictures say it all!
  • Figure out the best number of simultaneous live migrations in your environment. Well, you just read this blog, so now you know. Start at 4 and experiment upwards. Tune it back down if the speed deteriorates. The “best” number depends on your environment. A PowerShell sketch of these settings follows this list.
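As a minimal, hedged example of how these knobs could be set from PowerShell (run on each Hyper-V host; the values shown are just a starting point, not a recommendation):

    # Use SMB for live migration traffic and start with 4 simultaneous live migrations
    Set-VMHost -VirtualMachineMigrationPerformanceOption SMB `
               -MaximumVirtualMachineMigrations 4 `
               -MaximumStorageMigrations 2

    # Check the current settings
    Get-VMHost | Select-Object VirtualMachineMigrationPerformanceOption,
        MaximumVirtualMachineMigrations, MaximumStorageMigrations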

If you do these 5 things you’ll have gotten the best performance out of your infrastructure that’s possible for live migration. Bar compression, which is not magic either but just reduces the GB you need to transport at the cost of CPU cycles, you simply cannot push more than 1.25GB/s through a single 10Gbps pipe, and so on. You might keep looking to grab another 1% or 2% improvement left and right, but might I suggest you have more pressing issues to attend to that, when fixed, are a lot more rewarding? Knocking 1 or 2 seconds off a 100-second host evacuation is not going to matter; it’s a glitch. Stop, don’t over-engineer it, don’t IBM it, just move on. If you don’t get top performance after tweaking these 5 settings you should look at all the moving parts involved between the hosts (drivers, firmware, cables, switch configurations, …), as you have a mistake or problem somewhere along the way.

ODX Doesn’t Support IDE But Works With Both VHDX And VHD Virtual Disk Format


This question came up recently, once again, and deserves a little blog post. If you want to see the benefits of ODX you’ll need to connect your virtual disks to a vSCSI controller or another supported controller option. These are iSCSI, vFC, an SMB 3 file share or a pass-through disk. But unless you have a really good reason to use pass-through disks, don’t. They limit you in too many ways.

Basically, in generation 1 virtual machines, which boot from a vIDE controller, this rules out the system disk. So the tip here is to store the data that’s moved around in or between virtual machines in vSCSI-attached VHD or (preferably) VHDX virtual disks. If you can use generation 2 virtual machines, you’ll be able to leverage ODX on the system partition as well, as it boots from vSCSI.
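As a minimal sketch of that tip (the VM name and path are hypothetical), this is how you could add a data VHDX to the virtual SCSI controller of an existing VM from PowerShell:

    # Create a data disk and attach it to the vSCSI controller so ODX can be used for it
    New-VHD -Path "C:\ClusterStorage\Volume1\DemoVM\Data01.vhdx" -SizeBytes 200GB -Dynamic
    Add-VMHardDiskDrive -VMName "DemoVM" -ControllerType SCSI -Path "C:\ClusterStorage\Volume1\DemoVM\Data01.vhdx"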

It goes without saying that you need to store any virtual disks involved on ODX-capable LUNs, presented via iSCSI, FC, FCoE, an SMB 3 file share or SAS, for ODX to be available to the virtual machine.

Also beware that ODX only works on NTFS-formatted volumes. The files cannot be compressed or encrypted. Sparse files are not supported either. And finally, the volume cannot be BitLocker protected.

The first test was a copy of 30GB worth of ISO files to a VHDX attached to a vSCSI controller.

The second was the same copy of 30GB worth of ISO files to a VHDX attached to a vIDE controller.

You’ll notice quite a difference. Depending on the load on the controllers/SAN, the vIDE copy is on average 3 times slower than the same action to a VHDX disk on a vSCSI controller.

Hyper-V UNMAP Does Work With SAN Snapshots And Checkpoints But Not Always As You First Expect


Recently I was asked to take a look at why UNMAP was not working predictably in a Windows Server 2012 R2 Hyper-V environment. No, this is not a horror story about bugs or bad storage solutions. Fortunately, once the horror option was off the table, I had a pretty good idea what might be the cause.

SAN snapshots are in play

As it turned out, everything was indeed working just fine. The unexpected behavior that made it seem UNMAP wasn’t working, or at least not at the moments they expected it to, was caused by the SAN snapshots. Once you know how this works you’ll find that UNMAP does indeed work predictably.

Snapshots on SANs are used for automatic data tiering, data protection and various other use cases. As long as those snapshots live, and as such the data in them, UNMAP/TRIM will not free up space on thinly provisioned LUNs on the SAN. This is logical: the data is still stored on the SAN for those snapshots, so hard deleting it from the VM or host has no impact on the storage the SAN uses until those snapshots are deleted or expire. Only what happens in the active portion of the LUN is directly impacted.

An example

  • Take a VM with a dynamically expanding VHDX that’s empty and mapped to drive letter D. Note the file size of the VHDX and the space consumed on the thinly provisioned SAN LUN where it resides.
  • Create 30GB of data in that dynamically expanding virtual hard disk of the virtual machine.
  • Create a SAN snapshot.
  • Shift + Delete that 30GB of data from the dynamically expanding virtual hard disk in the virtual machine. Watch the dynamically expanding VHDX grow in size, just like the space consumed on the SAN.
  • Run Optimize-Volume -DriveLetter D -ReTrim to force UNMAP and watch the space consumed on the SAN LUN: it remains +/- the same.
  • Shut down the VM and look at the size of the dynamic VHDX file. It shrinks to the size before you copied the data into it.
  • Boot the VM again and copy 30GB of data to the dynamically expanding VHDX in the VM again.
  • See the size of the VHDX grow and notice that the space consumed on the SAN for that LUN goes up as well.
  • Shift + Delete that 30GB of data from the dynamically expanding virtual hard disk in the virtual machine.
  • Run Optimize-Volume -DriveLetter D -ReTrim to force UNMAP and watch the space consumed on the SAN LUN: it drops, as the data you deleted is in the active part of the LUN (the second 30GB you copied), but it will not drop any further than that, as the data kept safe in the frozen snapshot of the LUN remains there (the first 30GB you copied).
  • When you expire/delete that snapshot on the SAN, you’ll see the space consumed on the thinly provisioned SAN LUN drop to the initial size of this exercise.

I hope this example gave you some insight into the behavior.
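If you want to reproduce the forced UNMAP step from the example, a minimal sketch could look like this, run inside the VM against the volume that holds the test data (drive letter D assumed):

    # Confirm delete notifications (TRIM/UNMAP) are enabled; 0 means enabled
    fsutil behavior query DisableDeleteNotify

    # Force a retrim pass so the deleted blocks are reported to the thinly provisioned storage
    Optimize-Volume -DriveLetter D -ReTrim -Verbose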

Conclusion

So people who have snapshot-based automatic data tiering, data protection etc. active in their Hyper-V environment and don’t see any results at all should check those snapshot schedules & lifetimes. When you take them into consideration you’ll see that UNMAP does work predictably, albeit in a “delayed” fashion.

The same goes for Hyper-V checkpoints (formerly known as snapshots). When you create a checkpoint, the VHDX is kept and you are writing to an avhdx (differencing disk), meaning that any UNMAP activity will only affect data in the active avhdx file and not in the “frozen” parent file.