In Defense of Switch Independent Teaming With Hyper-V


For many old timers (heck, that includes me) NIC teaming with LACP mode was the best of the best, at least when it comes to teaming options. Other modes often led to passive/active, less than optimal receiving network traffic aggregation. Basically, and perhaps over simplified, I could say the other options were only used if you had no other choice to get things to work. Which we did a lot … I used Intel’s different teaming modes for various reasons in the past (before we had MLAG, VLT, VPC, …). Trying to use LACP where possible was a good approach in the past in physical deployments and early virtualized environments when 1Gbps networking dominated the datacenter realm and Windows did not have native support for LBFO.

But even LACP, even in those days, had some drawbacks. It’s the most demanding form of teaming. For one it required switch stacking. This demands the same brand and type of switches and that means you have no redundancy during firmware upgrades. That’s bad, as the only way to work around that is to move all workload to another rack unit … if you even had the capability to do that! So even in days past we chose different models if teaming out of need or because of the above limitations for high availability. But the superiority of NIC teaming with LACP still stands for many and as modern switches support MLAG, VLT, etc. the drawback of stacking can be avoided. So does that mean LACP for NIC teaming is always the superior choice today?

Some argue it is and now they have found support in the documentation about Microsoft CPS system documentation about Microsoft CPS system. Look, even if Microsoft chose to use LACP in their solutions it’s based on their particular design and the needs of that design I do not concur that this is the best overall. It is however a valid & probably the choice for their specific setup. While I applaud the use of MLAG (when available to you a no or very low cost) to have all bases covered but it does not mean that LACP is the best choice for the majority of use cases with Hyper-V deployments. Microsoft actually agrees with me on this in their Windows Server 2012 R2 NIC Teaming (LBFO) Deployment and Management guide. They state that Switch Independent configuration / Dynamic distribution (or Hyper-V Port if on Hyper-V and if not on W2K12R2)  is the best possible default choice is for teaming in both native and Hyper-V environments. I concur, even if perhaps not that strong for native workloads (it depends). Exceptions to this:

  • Teaming is being performed in a VM (which should be rare),
  • Switch dependent teaming (e.g., LACP) is required by policy, or
  • Operation of a two-member Active/Standby team is required by policy.

In other words in 2 out of 3 cases the reason is a policy, not a technical superior solution …

Note that there are differences between Address Hash, Hyper-V Port mode & the new dynamic distribution modes and the latter has made things better in W2K12R2 in regards to bandwidth but you’ll need the read the white papers. Use dynamic as default, it is the best. Also note that LACP/Switch Dependent doesn’t mean you can send & receive to and from a VM over the aggregated bandwidth of all team members. Life is more complicated than that. So if that’s you’re main reason for switch dependent, and think you’re done => be ware Winking smile.

Switch Independent is also way better for optimization of VMQ. You have more queues available (sum-of-queues) and the IO path is very predictable & optimized.

If you don’t control the switches there’s a lot more cross team communication involved to set up teaming for your hosts. There’s more complexity in these configurations so more possibilities for errors or bugs. Operational ease is also a factor.

The biggest draw back could be that for receiving traffic you cannot get more than the bandwidth a single team member can deliver. That’s true but optimizing receiving traffic has it’s own demands and might not always be that great if the switch configuration isn’t that smart & capable. Do I ever miss the potential ability to aggregate incoming traffic. In real life I do not (yet) but in some configurations it could do a great job to optimize that when needed.

When using 10Gbps or higher you’ll rarely be in a situation where receiving traffic is higher than 10Gbps or higher and if you want to get that amount of traffic you really need to leverage DVMQ. And a as said switch independent teaming with port of dynamic mode gives you the most bang for the buck. as you have more queues available. This drawback is mitigated a bit by the fact that modern NICs have way larger number of queues available than they used to have. But if you have more than one VM that is eating close to 10Gbps in a non lab environment and you planning to have more than 2 of those on a host you need to start thinking about 40Gbps instead of aggregating a fistful of 10Gbps cables. Remember the golden rules a single bigger pipe is always better than a bunch of small pipes.

When using 1Gbps you’ll be at that point sooner and as 1Gbps isn’t a great fit for (Dynamic) VMQ anyway I’d say, sure give LACP a spin to try and get a bit more bandwidth but will it really matter? In native workloads it might but with a vSwith?  Modern CPUs eat 1Gbps NICs for breakfast, so I would not bother with VMQ. But when you’re tied to 1Gbps it’s probably due to budget constraints and you might not even have stackable, MLAG, VLT or other capable switches. But the arguments can be made, it depends (see Don’t tell me “It depends”! But it does!). But in any case I start saving for 10Gbps Smile

Today as the PC8100 series and the N4000 Series (budget 10Gbps switches, yes I know “budget” is relative but in the 10Gbps world, but they offer outstanding value for money), I tend to set up MLAG with two of these per rack. This means we have all options and needs covered at no extra cost and without sacrificing redundancy under any condition. However look at the needs of your VMs and the capability of your NICs before using LACP for teaming by default. The fact that switch independent works with any combination of budget switches to get redundancy doesn’t mean it’s only to be used in such scenarios. That’s a perk for those without more advanced gear, not a consolation price.

My best advise: do not over engineer it. Engineer it for the best possible solution for the environment at hand. When choosing a default it’s not about the best possible redundancy and bandwidth under certain conditions. It’s about the best possible redundancy and bandwidth under most conditions. It’s there that switch independent comes into it’s own, today more than ever!

There is one other very good, but luckily also a very rare case where LACP/Switch dependent will save you and switch independent won’t: dead switch ports, where the port becomes dysfunctional. So while switch independent protects against NIC, Switch, cable failures, here it doesn’t help you as it doesn’t know (it’s about link failures, not logical issues on a port).

For the majority of my Hyper-V deployments I do not use switch dependent / LACP. The situation where I did had to do with Windows NLB in combination with ICMP Multicast.

Note: You can do VLT, MLAG, stacking and still leverage switch independent teaming, LACP or static switch dependent is NOT mandatory even when possible.

Hyper-V did not find virtual machines to import from the location . The operation failed with error code ‘32784’.


I got contacted by some people how ran into some issues importing VMs from W2K12R2 Hyper-V into W2K12 Hyper-V. They got bitten by this “little” issue: Importing a VM that is exported from Windows Server 2012 R2 into Windows Server 2012 is not supported

This means you get greeted by

Hyper-V did not find virtual machines to import from the location <folder location>.
The operation failed with error code ‘32784’.

image

No the trick of not exporting the VM but doing an “in place” registration doesn’t cut it. That’s great for W2K8R2 to W2K12 or W2K12 to W2K12R2 but not from W2K12R2 to a lower version. In that way the title of the KB article could be seen as a bit misleading or incomplete, but the contents is pretty clear.

And that’s it. Woeps! What you have 200 VMs on the LUNs form the old cluster you already blew away to build the new one? You do have a tested exit plan for this right? Uh no?

Facepalm Combo

Oh MAN, NOOOOO!

Now if it’s only one or two VMs you can always work around this by creating new VMs using the old VHDXs. This will leave you to deal with networking cleanup inside of the VMs and configuring TCP/IP. PowerShell can help here but in large volumes this remains as serious effort. This is also the time that documentation pays!

Now what if this happens to you when you’re trying to roll back a migration of a hyper-V cluster (revert W2K12R2 to W2K12 for example). Well for one you should have know as you did test all this right? Right?!

What are your other options to roll back other than  the above? From the top of my head and without details?

  • Move back to your old cluster Smile You didn’t already nuke it, I hope.
  • If you have a SAN take a snapshot of the LUNs before you move them to Windows Server 2012 R2 for faster fall back. But beware, if you’re running applications that require some tender loving care in relation to snapshots like Exchange  or Active Directory in those VMs … shutting all VMs down before you create the can help snapshot mitigates issues but is not a full proof approach! “Know thy apps”!
  • A great backup & RESTORE solution to get you back up and running also comes in handy but don’t forget that it requires you to know your apps as well here. Yes, it’s not always just “CLICKEDYCLICKCLICKDONE”
  • Perhaps it’s now time to activate your paused replicas on the DRC cluster or hosts?  You did test this didn’t you?

Now for anyone involved in a migration to Windows Server 2012 R2 there is no excuse not to know this in advance and to test out the new cluster hardware as much as you can. This minimizes the chance you’ll need to fall back. And please test your exit scenarios, really, I mean it.  Also please, you can migrate one LUN/CSV at the time. Try to run the VMs on the first migrated LUN/CSV before you do all the others. That way you can do some damage control.

Now, this is not great but it is what it is and at least now you know before your migrate Winking smile. We’ve also asked MSFT to make falling back a bit less “"involved” in future versions. Perhaps they’ll do that, I’m pretty sure they’ll consider it. And by what we’ve seen in the recently available Technical Preview they did!

Microsoft Hyper-V S3 Cap warning when upgrading a Hyper-V Virtual Machine


When you do an in place upgrade of a Hyper-V virtual machine you’ll get a warning that Microsoft Hyper-V S3 Cap may not work after the upgrade and that you need to update the driver prior to the upgrade.This warning is logged to the Windows Compatibility Report.htm.

image

Microsoft Hyper-V S3 Cap is an old S3 Trio 765 emulated video device and the driver isn’t included anymore so you’ll get this particular warning. This will never give you an issues, all drivers needed are indeed in the install bits. You can safely ignore this and successfully upgrade.

Some people uninstall the device via device manager but basically that’s pure cosmetics & doesn’t really serve a purpose.

This warning is an artifact of the generation 1 virtual machines who still have this device on a PCI bus.  Below is a screenshot of a VM with W2K12R2, generation 1. As you can see the Microsoft Hyper-V S3 Cap is perfectly fine. No worries.image

As a matter of fact you will not even see this device on a generation 2 virtual machine and we should not see this with an upgrade of those.

image

I will have to wait on a public preview of Windows vNext to test an upgrade of a generation 2 machine to prove my thinking that this cosmetic error won’t be there anymore.

Fixing “Windows cannot find the Microsoft Software License Terms. Make sure the installation sources are valid and restart the installation” Or "Windows installation encountered an unexpected error. Verify the installation sources are accessible, and restart the installation. Error code: 0xE0000100"


When trying to install a Windows 2012 (R2) or Windows 8(8.1) VM you can encounter the following error:

"Windows cannot find the Microsoft Software License Terms.  Make sure the installation sources are valid and restart the installation."

Right after selecting the operating system.image

or perhaps even this error

"Windows installation encountered an unexpected error. Verify the installation sources are accessible, and restart the installation.

Error code: 0xE0000100"

image

The main reason for this on Hyper-V is that you have been to conservative on memory allocation and it could pass some checks. You can hit these errors when you did not assign enough memory to the virtual machine or accepted the default. The default is 512MB and I’ve noticed that on Windows Server  2012 (R2) Hyper-V this can be to little.

image

So the fix is a easy as upping the assigned amount of memory. I went for 1024MBimage

Now start the VM again, hit any key to boot form the virtual DVD to start the setup. After selecting the OS version to install you’re now greeted by the screen to accept the license terms instead of a warning.

image

So click next and install your VM.

Manually Merging Hyper-V Checkpoints


A Last ditch Effort

Fist of all you need to realize this might not work. It’s a last ditch effort. There is a reason why you have backups (with tested restores) and why you should monitor your environment for things that are not as they should be. Early intervention can pay off.

Also see blog post on a couple of more preferred actions.

If you have lost checkpoints, you have basically lost data and corruption/data inconsistencies are very much a possibility and reality. If the files have been copied and information about what file is the parent the dates/timestamps are what you have to go by. You might not know for sure if you have them all.

Setting up the demo

For demo purposes we take a test VM and ad files to indicate what checkpoint we’re at.

We start with ORGINAL.TXT on the desktop and we create a checkpoint, which we rename accordingly.

image

We add a file called CHECK01.TXT and we create a checkpoint, which we rename accordingly.

image

We add a file called CHECK02.TXT and we create a checkpoint, which we rename accordingly.

image

We add a file called NOW.TXT no more checkpoints are taken.

image

The file names represent the content you’d see disappear if you applied the checkpoint and we have reflected this in the name for the checkpoints.

image

As we want to merge all the snapshots and and up with a usable VHDX we’ll work back from the most recent differencing disk until all is merged. As you can see this is a straight forward situation and I hope you’ll never be running having to deal with a vast collection of sub trees Smile.

Finding out what are the parents of avhdx files

In this demo it’s pretty obvious what snapshot exist and what avhdx files they represent. We’ve even shown you the single tree visualized in Hyper-V Manager. In reality bad things  have happened and you don’t see this information anymore. So you might have to find out yourself. This is done via inspect disk in Hyper-V manager. I you’re confused about what the parent is of (a)vhdx files this tool will help you find out or show you what the most recent one was.

image

Sometimes the original files have been renamed or moved and that it will show you’re the last known valid parent.

image

Manually Merging the checkpoints

Remember to make a copy of all files as a backup! Also make sure you have enough free diskspace … you need working space! You might need another shot at this. As we want to merge all the snapshots and and up with a usable VHDX we’ll work back from the most recent differencing disk until all is merged in the oldest one which is the vhdx. You can look at the last modified time stamps to find out the correct order in which to work. The most recent avdx is the one used in the virtual machine configuration file and locate the information for the virtual hard disk.

image

The configuration file’s avhdx is the one containing the “NOW” running state of the VM.

Note: You might find some information that you need to rename the extension avhdx to vhdx (or avhd to vhd). The reason for this was that in Windows 2008 Hyper-V Manager did not show avhd files in the Edit virtual disk wizard. You can still do this and it will still works, but you do not need to. Ever since Windows Server 2008 R2 avhd (and with since Windows Server 2012 avhdx) files do show up in Hyper-V Managers Disk edit.

For some insights as to why the order is important read this blog by Ben Armstrong What happens when a snapshot is being merged? [Hyper-V]

WARNING: If you did not start with the most recent one and work your way down, which is the easiest and least confusing way all is not lost. But you will have to reconnect the first more recent (a)vhdx to one to it’s new parent. This is needed as by merging a snapshot out of order more recent one will have lost it’s will have lost it’s original parent.

Here’s how to do this: Select reconnect. This is the page you’ll get if you’d start edit disk wizard as all other option are unavailable due to the missing parent.

image

The wizard will tell you what used to be the parent and allow you to select a new one. Make sure to tick the check box for Ignore ID mismatch or the reconnect will fail as you’re previous out of order merge has created a new a(vhdx). If your in this pickle by renaming (a)vhdx files or after a copy this isn’t needed by the way.

image
Follow the wizard after that and when your done you can launch the edit disk wizard again and perform a merge. It’s paramount that you do not mix up orders when doing so that you reconnect to the parent this or you’ll end up in a right mess. There are many permutations, keep it simple!. Do it in order Smile. If you start having multiple checkpoint trees/subtrees things can get confusing very fast.

You might also have to reconnect if the checkpoints have lost their connection the what they know to be their last parent for other reasons. In that case you do this and when that’s done, you merge. Rinse and repeat. The below walk through assumes you have no reconnects to be done. If so it will tell you like in the example above.

Walk trough:

Open the Edit Disk Wizardimage

Select the most recent avhdx & click “Next”

image

image

We choose to merge the avhdx

image

In our case into its parent disk

image
Verify the options are correct and click “Finish”

image

Let the wizard complete

image

That’s it. You’ve merged the most recent snapshot into it’s parent. That means that you have not lost the most recent state of the virtual machine as when it was running before you shut it down. This can be verified by mounting the now most recent avhdx and looking at the desktop for my user profile. You can see the NOW.txt text file is there!

OK, dismount the avhdx and now it’s rinse and repeat.

image

image

You do this over an over again until your merge the last avhdx into the vhdx.

image

Than you have the vhdx you will use to create a new virtual machine.

image

Make sure you get the generation right.

image

Assign memory

image

Connect to the appropriate virtual switch or not if you’re not ready to do this yet

image

Use your vhdx disk that’s the remaining result of your merging efforts

image

image

When you boot that virtual machine you’ll see that all the text files are there. It’s as if you’ve deleted the checkpoints in the GUI and retained “NOW” in the vhdx.

image

image

Last but not least, you can use PowerShell or even DiskPart for this but I found that most people in this pickle value a GUI. Use what you feel most comfortable with.

Thanks for reading and hope this helps someone. Do remember “big boy” rules apply. This is not safe, easy or obvious in each and every situation so you are responsible for everything you do in your environment. If your in to deep, way over your head, etc. call in some expert help.

3 Ways To Deal With Lingering Hyper-V Checkpoints Formerly Known as Snapshots


Lingering or phantom Hyper-V checkpoints or snapshots

Once in a while the merging of checkpoints, previously known as snapshots, in Hyper-V goes south. An example of this is when checkpoints are not cleaned up and the most recent avhdx or multiple of these remains in use as active virtual disk/still even as you don’t see them anymore as existing in the Hyper-V Manager UI for example. When that happens you can try looking at the situation via PowerShell to see if that show the same situation. Whatever the cause, once in while I come across virtual machines that have one or more avhdx (or avdh) active that aren’t supposed to be there anymore. In that case you have to do some manual housekeeping.

Now please, do not that in Windows Server 2012(R2) Hyper-V replica is using checkpoints and since Windows Server 2012 R2 backups also rely on this. Just because you see a snapshot you didn’t create intentionally, don’t automatically think they’re all phantoms. They might exits temporarily for good reason Winking smile. We’re talking about dealing with real lingering checkpoints.

Housekeeping

Housekeeping comes in a couple of variants form simply dusting of to industrial cleaning. Beware of the fact that the latter should never be a considered a routine operation. It’s not a normal situation. It’s a last ditch resort and perhaps you want to call support to make sure that you didn’t miss anything else.

Basically you have tree options. In order of the easiest & safest to do first these are:

  1. Create a new checkpoint and delete it. Often that process will take care of merging the other (older) lingering avhd/avhdx files as well. This is the easiest way to deal with it and it’s as safe as it gets. Hyper-V cleans up for you, you just had to give it a kick start so to speak.
  2. Shut down the VM and create a new checkpoint. Export that newly created checkpoint. Yes you can do that. This will create a nicely exported virtual machine that only has the relevant vhd/vhdx files and no more checkpoints (avhd/avhdx). Do note that this vhd/vhdx is dynamically expanding one. If that is not to your liking you’ll need to convert it to fixed. But other than that you can dump the old VM (don’t delete everything yet) and replace it by importing the one you just exported. For added security you could first copy the files for save guarding before you attempt this. image
  3. Do manual mergers. This is a more risky process & prone to mistakes. So please do this only on a copy of the files. That way you’ll give Microsoft Support Services a fighting change if things don’t work out or you make a mistake. Also note that in this case you get one or more final VHDX files which you’ll use to create a new virtual machine with to boot from. It’s very hands on.

So that’s the preferred order of things to try/do in regards to safety. The 3rd option, is the last resort. Don’t do it before you’ve tried options 1 and 2. And as said above, if you do need to go for option 3, do it on copies.If you’re unsure on how to proceed with any of this, get an expert involved.

There’s actually another option which is very save but not native to Hyper-V. In the running virtual machine which current state you want to preserve do a V2V using Disk2vhd v2.01. Easy and sort of idiot proof if such a thing exists.

In a next blog post I’ll walk you through the procedure for the 3rd option. So if this is your last resort you can have practiced it before you have to use it in anger. Bit please, if needed, and do make sure it’s really needed as discussed above, try 1 first. If that doesn’t do it. Then try option 2. If that also fails try option 3. Do not that for option 2 and 3 you will have to create a new virtual machine with the resulting VHDX, having the required settings documented will help in this case.

Defragmenting your CSV Windows 2012 R2 Style with Raxco Perfect Disk 13 SP2


When it comes to defragmenting CSV it seemed we took a step back when it comes to support from 3rd party vendors. While Windows provides for a great toolset to defragment a CSV it seemed to have disappeared form 3r party vendor software. Even from the really good Raxco Perfect disk. They did have support for this with Windows 2008 R2 and I even mentioned that in a blog.

If you need information on how to defragment a CSV in Windows 2012 R2, look no further.There is an absolutely fantastic blog post on the subject How to Run ChkDsk and Defrag on Cluster Shared Volumes in Windows Server 2012 R2, by Subhasish Bhattacharya one of the program managers in the Clustering and High Availability product group. He’s a great guy to talk shop to by the way if you ever get the opportunity to do so. One bizarre thing is that this must be the only place where PowerShell (Repair-ClusterSharedVolume cmdlet) is depreciated in lieu of chkdsk.

3rd party wise the release of Raxco Perfect Disk 13 SP2 brought back support for defragmenting CSV.

image

I don’t know why it took them so long but the support is here now. It looks like they struggled to get the CSVFS (the way CSV are now done since Windows Server 2012) supported. Whilst add it, they threw in support for ReFS by the way. This is the first time I’ve ever seen this. Any way it’s here and that’s good because I have a hard time accepting that any product (whatever it does) supports Hyper-V if it can’t handle CSV, not if you want to be taken seriously anyway. No CSV support equals = do not buy list in my book.

Here’s a screenshot of Perfect disk defragmenting away. One of the CSV LUNs in my lab is a SSD and the other a HDD.

image

Notice that in Global Settings you can tweak the behavior when defragmenting optimization of various drive types, including CSVFS but you just have to leave the default on unless you like manual labor or love PowerShell that much you can’t forgo any opportunity to use it Winking smile

image

Perfect disk cannot detect what kind of disks you have behind the CSV LUN so you might want to change the optimization method if you’re running SSD instead of HHD.

image

I’d love for Raxco to comment on this or point to some guidance.

What would also be beneficial to a lot of customers is guidance on defragmentation on the different auto-tiering storage arrays. That would make for a fine discussion I think.