I Can’t Afford 10GBps For Hyper-V And Other Lies


You’re wrong

There, I said it. Sure you can. Don’t think you need to be a big data center to make this happen. You just need to think and work outside the box a bit and when you’re not a large enterprise, that’s a bit more easy to do. Don’t do it like a big name brand, traditionalist partner would do it (strip & refit the entire structural cabling in the server room, high end gear with big margins everywhere). You’re going for maximum results & value, not sales margins and bonuses.

I would even say you can’t afford to stay on 1Gbps much longer or you’ll be dealing with the fall out of being stuck in the past. Really some of us are already look at > 10Gbps connections to the servers, actually. You need to move from 1Gbps or you’ll be micro managing a way around issues sucking all the fun out of your work with ever diminishing results and rising costs for both you and the business.

Give your Windows Server 2012R2 Hyper-V environment the bandwidth it needs to shine and make the company some money. If all you want to do is to spent as little money as possible I’m not quite sure what your goal is? Either you need it or you don’t.  I’m convinced we need it. So we must get it. Do what it takes. Let me show you one way to get what you need.

Sounds great what do I do?

Take heart, be brave and of good courage! Combine it with skills, knowledge & experience to deliver a 10Gbps infrastructure as part of ongoing maintenance & projects. I just have to emphasize that some skills are indeed needed, pure guts alone won’t do it.

First of all you need to realize that you do not need to rip and replace your existing network infrastructure. That’s very hard to get approval for, takes too much time and rapidly becomes very expensive in both dollars and efforts. Also, to be honest, quiet often you don’t have that kind of pull. I for one certainly do not. And if I’d try to do that way it takes way too many meetings, diplomacy, politics, ITIL, ITML & Change Approval Board actions to make it happen. This adds to the cost even more, both in time and money. So leave what you have in place, for this exercise we assume it’s working fine but you can’t afford to have wait for many hours while all host drains in 6 node cluster and you need to drain all of them to add memory. So we have a need (OK you’ll need a better business case than this but don’t make to big a deal of it or you’ll draw unwanted attention) and we’ve taking away the fear factor of fork lift replacing the existing network which is a big risk & cost.

So how do I go about it?

Start out as part of regular upgrades, replacement or new deployments. The money is their for those projects. Make sure to add some networking budget and leverage other projects need to support the networking needs.

Get a starter budget for a POC of some sort, it will get your started to acquire some more essential missing  bits.

By reasonably cheap switches of reasonable port count that do all you need. If they’re readily available in a frame work contract, great. You can get it as part of the normal procedures. But if you want to nock another 6% to 8% of the cost order them directly from the vendor. Cut out the middle man.

Buy some gear as part of your normal refresh cycle. Adapt that cycle life time a bit to suit your needs where possible. Funding for operation maintenance & replacement should already be in place right?

Negotiate hard with your vendor. Listen, just like in the storage world, the network world has arrived at a point where they’re not going to be making tons of money just because they are essential. They have lots of competition and it’s only increasing. There are deals to be made and if you chose the right hardware it’s gear that won’t lock you into proprietary cabling, SPF+ modules and such. Or not to much anyway Smile.

Design options and choices

Small but effective

If you’re really on minimal budget just introduce redundant (independent) stand alone 10Gbps switches for the East-West traffic that only runs between the nodes in the data center. CSV, Live Migration, backup. You don’t even need to hook it up to the network for data traffic, you only need to be able to remotely manage it and that’s what they invented Out Off Band (OOB) ports for. See also an old post of mine Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4). In the smallest cheapest scenario I use just 2 independent switches. In the other scenario build a 2 node spine and the leaf. In my examples I use DELL network gear. But use whatever works best for your needs and your environment. Just don’t go the “nobody ever got fired for buying XXX” route, that’s fear, not courage! Use cheaper NetGear switches if that fits your needs. Your call, see my  recent blog post on this 10Gbps Cheap & Without Risk In Even The Smallest Environments.

Medium sized excellence

First of all a disclaimer: medium sized isn’t a standardized way of measuring businesses and their IT needs. There will be large differences depending on you neck of the woods Smile.

Build your 10Gbps infrastructure the way you want it and aim it to grow to where it might evolve. Keep it simple and shallow. Go wide where you need to. Use the Spine/Leaf design as a basis, even if what you’re building is smaller than what it’s normally used for. Borrow the concept. All 10Gbps traffic, will be moving within that Spine/Leaf setup. Only client server traffic will be going out side of it and it’s a small part of all traffic. This is how you get VM mobility, great network speeds in the server room avoiding the existing core to become a bandwidth bottleneck.

You might even consider doing Infiniband where the cost/Gbps is very attractive and it will serve you well for a long time. But it can be a hard sell as it’s “another technology”.

Don’t panic, you don’t need to buy a bunch of Nexus 7000’s  or Force10 Z9000 to do this in your moderately sized server room. In medium sized environment I try to follow the “Spine/Leaf” concept even if it’s not true ECMP/CLOSS, it’s the principle. For the spine choose the switches that fit your size, environment & growth. I’ve used the Force10 S4810 with great success and you can negotiate hard on the price. The reasons I went for the higher priced Force10 S4810 are:

  • It’s the spine so I need best performance in that layer so that’s where I spend my money.
  • I wanted VLT, stacking is a big no no here. With VLT I can do firmware upgrades without down time.
  • It scales out reasonably by leveraging eVLT if ever needed.

For the ToR switches I normally go with PowerConnect 81XX F series or the N40XXF series, which is the current model. These provide great value for money and I can negotiate hard on price here while still getting 10Gbps with the features I need. I don’t need VLT as we do switch independent NIC teaming with Windows. That gives me the best scalability wit DVMQ & vRSS and allows for firmware upgrades without any network down time in the rack. I do sacrifice true redundant LACP within the rack but for the few times I might really need to have that I could go cross racks & still maintain a rack a failure domain as the ToRs are redundant. I avoid stacking, it’s a single point of failure during firmware upgrades and I don’t like that. Sure I can could leverage the rack a domain of failure to work around that but that’s not very practical for ordinary routine maintenance. The N40XXF also give me the DCB capabilities I need for SMB Direct.

Hook it up to the normal core switch of the existing network, for just the client/server.(North/South) traffic. I make sure that any VLANs used for CSV, live migration, can’t even reach that part of the network.  Even data traffic (between virtual machines, physical servers) goes East-West within your Spine/Leave and never goes out anyway unless you did something really weird and bad.

As said, you can scale out VLT using eVLT that creates a port channel between 2 VLT domains. That’s nice. So in a medium sized business you’re pretty save in growth. If you grow beyond this, we’ll be talking about a way larger deployment anyway and true ECMP/CLOS and that’s not the scale I’m dealing with where. For most medium sized business or small ones with bigger needs this will do the job. ECMP/CLOS Spine/leaf actually requires layer 3 in the design and as you might have noticed I kind if avoid that. Again, to get to a good solution today instead of a real good solution next year which won’t happen because real good is risky and expensive. Words they don’t like to hear above your pay grade.

The picture below is just for illustration of the concept. Basically I normally have only one VLT domain and have two 10Gbps switches per rack. This gives me racks as failure domains and it allows me to forgo a lot of extra structural cabling work to neatly provide connectivity form the switches  to the server racks .image

You have a  scalable, capable & affordable 10Gbps or better infrastructure that will run any workload in style.. After testing you simply start new deployments in the Spine/Leaf and slowly mover over existing workloads. If you do all this as part of upgrades it won’t cause any downtime due to the network being renewed. Just by upgrading or replacing current workloads.

The layer 3 core in the picture above is the uplink to your existing network and you don’t touch that. Just let if run until there nothing left in there and you can clean it up or take it out. Easy transition. The core can be left in place or replaces when needed due to age or capabilities.

To keep things extra affordable

While today the issues with (structural) 10Gbps copper CAT6A and NICs/Switches seem solved, when I started doing 10Gbps fibre cabling of Copper Twinax Direct Attach was the only way to go. 10GBaseT wasn’t an option yet and I still love the flexibility of fibre, it consumes less space and weighs less then CAT6A. Fibre also fits easily in existing cable infrastructure. Less hassle. But CAT6A will work fine today, no worries.

If you decide to do fibre, buy OM3, you can get decent, affordable cabling on line. Order it as consumable supplies.

Spend some time on the internet and find the SFP+ that works with your switches to save a significant amount of money. Yup some vendor switches work with compatible non OEM branded SPF+ modules. Order them as consumable supplies, but buy some first to TEST! Save money but do it smart, don’t be silly.

For patch cabling 10Gbps Copper Twinax Direct Attach works great for short ranges and isn’t expensive, but the length is limited and they get thicker & more sturdy and thus unwieldy by length. It does have it’s place and I use them where appropriate.

Isn’t this dangerous?

Nope. Technology wise is perfectly sound and nothing new. Project wise it delivers results, fast, effective and without breaking the bank. Functionally you now have all the bandwidth you need to stop worrying and micromanaging stuff to work around those pesky bandwidth issues and focus on better ways of doing things. You’ve given yourself options & possibilities. Yay!

Perhaps the approach to achieve this isn’t very conventional. I disagree. Look, anyone who’s been running projects & delivering results knows the world isn’t that black and white. We’ve been doing 10Gbps for 4 years now this way and with (repeated) great success while others have to wait for the 1Gbps structural cabling to be replaced some day in the future … probably by 10Gbps copper in a 100Gbps world by the time it happens. You have to get the job done. Do you want results, improvements, progress and success or just avoid risk and cover your ass? Well then, choose & just make it happen. Remember the business demands everything at the speed of light, delivered yesterday at no cost with 99.999% uptime.  So this approach is what they want, albeit perhaps not what they say.

Advertisements

Live Migration Speed Check List – Take It Easy To Speed It Up


When configuring live migrations it’s easy to go scrounge on all the features and capabilities we have in Windows Server 2012 R2.

There is no one stopping you configuring 50 simultaneous live migrations. When you have only one, two or even four 1Gbps NICs at your disposal,  you might stick to 1 or 2 VMs per available 1Gbps. But why limit yourself if you have one or multiple 10Gbps pipes or bigger ready to roll? Well let’s discuss a little what happens when you do a live migration on a Hyper-V cluster with CSV storage. Initiating a live migrations kicks of a slew of activities.

  1. First it is establish form where (aka the source host) to where we are migrating (aka the target host).
  2. Permissions are checked, are we allowed to do this?
  3. Do we have enough memory on the target to do this? If so allocate that memory.
  4. Set up a skeleton VM on the target host that is a perfect copy of the source VM’s  specifications and configure dependencies on the target host.
  5. Let’s see if we can get a network connection set up and running. If that works, we’re cool and can now transfer the memory.
  6. A bitmap is created to track the changes to the memory pages of the source VM’s pages. Each memory page is copied from the source host to the target host VM during which the memory page is marked clean.
  7. As long as the source VM is running memory is changing, which continues to be tracked in the bitmap and as such that page is mapped as dirty over there. In an iterative process this dirty memory is copied over again and so on. This continues until the remaining dirty memory is minimal. This will take longer if the VM is very memory intensive.
  8. The tiniest amount of not yet copied dirty memory is that part of a VMs state that is copied during “black out”. For this to happen the VM on the source host is paused, the remaining state is copied.
  9. A final check is done to confirm all is well and then the virtual machine is resumed on the target host.
  10. Any remains of the VM on the source host are cleaned up.

That’s actually a lot of work and as you can see copying the state is just part of the process. The more bandwidth & the lower the latency we throw at this part of the process becomes less of the total time spent during live migration.

If you can’t fill of just fill the bandwidth of your 10/40/46Gbps pipe or pipes & you operate at line speed, what’s left as overhead? Everything that’s not actual the copy of VM state. The trick is to keep the host busy so you minimize idle time of the network copies. I.e we want to fill up that bandwidth just right but  not go overboard otherwise  the work to manage a large number of multiple live migrations might actually slow you down. Compare it to juggling with balls. You might be very good and fast at it but when you have to many balls to attend to you’ll get into trouble because you have to spread you attention to wide, i.e. you’re doing more context switching that is optimal.

So tweaking the number of simultaneous live migrations to your environment is the last step in making sure a node is drained as fast as possible. Slowing things down can actually speed things up.  So when you get your 10Gbps or better pipes in production it pays of to test a bit and find the best settings for your environment.

Let’s recap all of the live migration optimization tips I have given over the years and add a final word of advice.  Those who have been reading my blog for a while know I enjoy testing to find what works best and I do tweak settings to get best performance and results. However you have to learn and accept that it makes no sense in real life to hunt for 1% or 2% reduction in live migration speeds. You’ll get one off  hiccups that slow you down more than that.

So what you need to do is tweak the things that matter the most and will get you 99% results?

  • Get the biggest pipe you need & can afford. Bigger pipes are always better than lots of aggregated smaller pipes when it come to low latency & high throughput.
  • Choose the best performance settings Hyper-V offers you. You can choose from TCP/IP,Compression, SMB. Ben Armstrong has a blog post on this Faster Live Migration–Which Option Should You Choose? I’d like to add that you can use NIC teaming for live migration as well and prior to Windows Server 2012 R2 that was the only way to aggregate bandwidth. Now you have more options. I prefer SMB but when I don’t have 10Gbps at my disposal I have found that compression really makes a difference. In my home  lab where I have only 1Gbps, the horror, it stopped me from going crazy Smile (being addicted to 10Gbps).

image

  • Optimize the power settings for your server BIOS if you want an extra speed & smoothness with 10Gbps (less so with 1Gbps). Look here An Early Look At Live Migration Over TCP/IP & Multichannel In Windows Server 2012 R2 Preview, the network traffic is a lot more stable, i.e. a flat line!  In Windows 2008 R2 this was a real need for 10Gbps or you’d be stuck at 16% max.
  • Enable Jumbo Frames for another 15-20%. Thanks to Multi Channel I can visualize this now. See also this blog post Live Migration Can Benefit From Jumbo Frames. The pictures say it all!
  • Figure out the best number of simultaneous live migrations in your environments. Well you just read this blog, so now you know.  Start at 4 and experiment upwards. Tune it back down if the speed deteriorates. The “best” number depends on your environment.

If you do these 5 things you’ll have really gotten the best performance out of your infrastructure that’s possible for live migration. Bar compression, which is not magic either but reducing the GB you need to transport at the cost of CPU cycles, you just cannot push more than 1.25GB/s trough a single 10Gbps pipe and so on. You might keep looking to grab another 1% or 2% improvement left and right  but might I suggest you have more pressing issues to attend to that, when fixed are a lot more rewarding? Knocking 1 or 2 seconds of a 100 second host evacuation is not going to matter, it’s a glitch. Stop, don’t over engineer it, don’t IBM it, just move on. If you don’t get top performance after tweaking these 5 settings you should look at all the moving parts involved between the host as the issue is there (drivers, firmware, cables, switch configurations, …) as you have a mistake or problem somewhere along the way.

Windows NLB On Windows Server 2012 R2 Hyper-V: A Personal Preferred Configuration Using IGMP With Multicast


To know and see the issues we are dealing with in this demo, you need to read this blog post first: Windows NLB Nodes “Misconfigured” after Simultaneous Live Migration on Windows Server 2012 (R2).

We were dealing with some issues on on several WNLB clusters running on a Windows Server 2012 R2 Hyper-V cluster after a migration from an older cluster. So go read that and come back Smile.

Are you back? Good.

Let’s look at the situation we’ll use to show case one possible solution to the issues. If you have  a 2 node Hyper-V cluster, are using NIC Teaming for the switch and depending on how teaming is set up you’ll might run into these issues. Here we’ll use a single switch to mimic a stacked one (the model available to me is non stackable and I have only one anyway).

  • Make sure you enable MAC Spoofing on the appropriate vNIC or vNICs in the advanced settings

image

  • Note that there is no need to use a static MAC address or copy your VIP mac into the settings of your VM  with Windows Server 2012 (R2) Hyper-V
  • Set up WNLB with IGMP multicast as option. While chancing this there will be some advice warnings thrown at you Smile

image

image

I’m not going in to the fact that since W2K8 the network default configurations are all about security. You might have to do some configuration work to get the network flow to do what it needs to do. Lots on this  weak host/strong host model behavior on the internet . Even wild messy ramblings by myself here.

On to the switch itself!

Why IGMP multicast? Unicast isn’t the best option and multicast might not cut it or be the best option for your environment and IGMP is less talked about yet it’s a nice solution with Windows NLB, bar replacing into with a hardware load balancer. For this demo I have a DELL PowerConnect 5424 at my disposal. Great little switch, many of them are still serving us well after 6 years on the job.

What MAC address do I feed my switch configurations?

Ah! You are a smart cookie, aren’t you. A mere ipconfig reveals only the unicast MAC address of the NIC. The GUI on WNLB shows you the MAC address of the VIP. Is that the correct one for my chosen option, unicast, multicast or IGMP multicast?  No worries, the GUI indeed shows the one you need based on the WNLB option you configure. Also, take a peak at nlb.exe /? and you’ll find a very useful option called ip2mac.

Let’s run that against our VIP:

image

And compare it to what we see in the GUI, you’ll notice that show the MAC to use with IGMP multicast as well.

image

You might want to get the MAC address before you configure WNLB from unicast to IGMP multicast. That’s where the ip2mac option comes in handy.

Configuring your switch(es)

We have a multicast IP address that we’ll convert into the one we need to use. Most switches like the PowerConnect 5424 in the example will do that for you by the way.

I’m not letting the joining of the members to the Bridge Multicast Group happen automatically so I need to configure this. I actually have to VLANs, each Hyper-V host has 2 LACP NIC team with Dynamic load balancing connected to an LACP LAGs on this switch (it’s a demo, yes, I know no switch redundancy).  I have tow as some WNLB nodes have multiple clusters and some of these are on another VLAN.

I create a Bridge Multicast Group. For this I need the VLAN, the IGMP multicast MAC address and cluster IP address

When I specify the IGMP multicast MAC I take care to format it correctly with “:” instead of “–“ or similar.

You can type in the VIP IP address or convert is per this KB yourself. If you don’t the switch will sort you out.

The address range of the multicast group that is used is 239.255.x.y, where x.y corresponds to the last two octets of the Network Load Balancing virtual IP address.

For us this means that our VIP of 172.31.3.232 becomes 239.255.3.232. The switch handles typing in either the VIP or the converted VIP equally well.

image

This is what is looks like, here there are two WNLB clusters in ICMP multicast mode configured. There are more on the other VLAN.

image

We leave Bridge Multicast Forwarding here for what it is, no need in this small setup. Same for IGMP Snooping. It’s enable globally and we’ve set the members statically.

We make unregistered multicast is set to forwarding (default).

image

Basically, we’re good to go now. Looking at the counters of the interfaces & LAGs you should see that the multicast traffic is targeted at the members of the LAGs/LAGs and not all interfaces of the switch. The difference should be clear when you compare the counters adding up before and after you configured IGMP.

image

The Results

No over the top switch flooding, I can simultaneously live migrate multiple WLNB nodes and have them land on the same switch without duplicate IP address warning. Will this work for you. I don’t know. There are some many permutations that I can’t tell you what you should do in your particular situation to make it work well. I’ll just quote myself from my previous blog post on this subject:

“"If you insist you want my support on this I’ll charge a least a thousand Euro per hour, effort based only. Really. And chances are I’ll spend 10 hours on it for you. Which means you could have bought 2 (redundancy) KEMP hardware NLB appliances and still have money left to fly business class to the USA and tour some national parks. Get the message?”

But you have seen some examples on how to address issues & get a decent configuration to keep WNLB humming along for a few more years. I really hope it helps out some of you struggling with it.

Wait, you forgot the duplicate IP Address Warning!

No, I didn’t. We’ll address that here. There are some causes for this:

  • There is a duplicate IP address. If so, you need to address this.
  • A duplicate IP address warning is to be expected when you switch between unicast and multicast NLB cluster modes (http://support.microsoft.com/kb/264645). Follow the advice in the KB article and clearing the ARP tables on the switches can help and you should get rid of it, it’s transient.
  • There are other cause that are described here Troubleshooting Network Load Balancing Clusters. All come down to the fact that somehow you’re getting multiple MAC address associated with the same IP address. One possible cause can be that you migrated form an old cluster to a new cluster, meaning that the pool of dynamic IP addresses is different and hence the generated VIP MAC … aha!
  • Another reason, and again associated multiple MAC address associated with the same IP address is that you have an old static ARP entry for that IP address somewhere on your switches. Do some house cleaning.
  • If all the above is perfectly fine and you’re certain this is due to some Hyper-V live migration, vSwitch, firmware, driver bug you can get rid of the warning by disabling ARP checks on the cluster members. Under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters, create a DWORD value with as name “ArpRetryCount” and set the value to 0. Reboot the server for this to take effect. In general this is not a great idea to do. But if you manage your IP addresses well and are sure no static entries are set on the switch it can help avoid this issue. But please, don’t just disable “ArpRetryCount” and ignore the root causes.

Conclusion

You can still get WNLB to work for you properly, even today in 2014. But it’s time to start saying goodbye to Windows NLB. The way the advanced networking features are moving towards layer 3 means that “useful hacks” like MAC spoofing for Windows NLB are going no longer going to work.  But until you have implement hardware load balancing I hope this blog has given you some ideas & tips to keep Windows NLB running smoothly for now. I’ve done quite few and while it takes some detective work & testing, so far I have come out victorious. Eat that Windows NLB! I have always enjoyed making it work where people said it couldn’t be done. But with the growing important of network virtualization and layer 3 in our networks, this nice hack, has had it’s time.

For some reasons developers like Windows NLB as  “it’s easy and they are in control as it runs on their servers”. Well … as you have seen nothing comes free and perhaps our time is better spend in some advanced health checking and failover in hardware load balancing. DevOps anyone?

Windows NLB Nodes Misconfigured after Simultaneous Live Migration on Windows Server 2012 (R2)


Here’s the deal. While Windows NLB on Hyper-V guests might seem to work OK you can run into issues. Our biggest challenge was to keep the WNLB cluster functional when all or multiple node of the cluster are live migrated simultaneously. The live migration goes blazingly fast via SMB over RDMA nut afterwards we have a node or nodes in an problematic state and clients being send to them are having connectivity issues.

After live migrating multiple or all nodes of the Windows NLB cluster simultaneously the cluster ends up in this state:

image

A misconfigured interface. If you click on the error for details you’ll see

image

Not good, and no we did not add those IP addresses manually or so, we let the WNLB cluster handle that as it’s supposed to do. We saw this with both fixed MAC addresses (old school WNLB configuration of early Hyper-V deployments) and with dynamic MAC addresses. On all the nodes MAC spoofing is enabled on the appropriate vNICs.

The temporary fix is rather easy. However it’s a manual intervention and as such not a good solution. Open up the properties of the offending node or nodes (for every NLB cluster that running on that node, you might have multiple).

image

Click “OK” to close it …

image

… and you’re back in business.

image

image

Scripting this out somehow with nlb.exe or PowerShell after a guest gets live migrated is not the way to go either.

But that’s not all. In some case you’ll get an extra error you can ignore if it’s not due to a real duplicate IP address on your network:

image

We tried rebooting the guest, dumping and recreating the WNLB cluster configuration from scratch. Clearing the switches ARP tables. Nothing gave us a solid result.

No you might say, Who live migrates multiple WNLB nodes at the same time? Well any two node Hyper-V cluster that uses Cluster Aware Updating get’s into this situation and possibly bigger clusters as well when anti affinity is not configured or chose to keep guest on line over enforcing said anti affinity, during a drain for an intervention on a cluster perhaps etc. It happens. Now whether you’ll hit this issue depends on how you configure and use your switches and what configuration of LBFO you use for the vSwitches in Hyper-V.

How do we fix this?

First we need some back ground and there is way to much for one blog actually. So many permutations of vendors, switches, configurations, firmware & drivers …

Unicast

This is the default and Thomas Shinder has an aging but  great blog post on how it works and what the challenges are here. Read it. It you least good option and if you can you shouldn’t use it. With Hyper-V we and the inner workings and challenges of a vSwitch to the mix. Basically in virtualization Unicast is the least good option. Only use it if your network team won’t do it and you can’t get to the switch yourself. Or when the switch doesn’t support mapping a unicast IP to a multicast MAC address. Some tips if you want to use it:

  1. Don’t use NIC teaming for the virtual switch.
  2. If you do use NIC teaming for the virtual switch you should (must):
    • use switch independent teaming on two different switches.
    • If you have a stack or just one switch use multicast or even better IGMP with multicast to avoid issues.

I know, don’t shout at me, teaming on the same switch, but it does happen. At least it protects against NIC issues which are more common than switch or switch port failures.

Multicast

Again, read Thomas Shinder his great blog post on how it works and what the challenges are here.

It’s an OK option but I’ll only use it if I have a switch where I can’t do IGMP and even then I do hope I can do two things:

  1. Add a static entry for the cluster IP address  / MAC address on your switch if it doesn’t support IGMP multicast:
    • arp [ip] [cluster multicast mac*] ARPA  > arp 172.31.1.232  03bf.bc1f.0164 ARPA
  2. To prevent switch flooding occurs, as with the unicast configure your switch which ports to use for multicast traffic:
    • mac-address-table static [cluster multicast mac] [vlan id] [interface]  > mac-address-table static 03bf.bc1f.0164 vlan 10 interface Gi1/0/1

The big rotten thing here is that this is great when you’re dealing with physical servers. They don’t tend to jump form switch port to switch port and switch to switch on the fly like a virtual machine live migrating. You just can’t hardcode all the vSwitch ports into the physical switches, one they move and depending on the teaming choice there are multiple ports, switches etc …it’s not allowed and not possible. So when using multicast in a Hyper-V environment stick to 1). But here’s an interesting fact. Many switches that don’t support 1) do support 2). Fun fact is that most commodity switches do seems to support IGMP … and that’s your best choice anyway! Some high end switches don’t support WNLB well but in that category a hardware load balancer shouldn’t be an issue. But let’s move on to my preferred option.

  • IGMP With Multicast (see IGMP Support for Network Load Balancing)

    This is your best option and even on older, commodity switches like a DELL PowerConnect 5424 or 5448 you can configure this. It was introduced in Windows Server 2003 (did not exist in NT4.0 or W2K). It’s my favorite (well, I’d rather use hardware load balancing) in a virtual environment. It works well with live migration, prevents switch flooding and with some ingenuity and good management we can get rid of other quirks.

    So Didier, tell us, how to we get our cookie and eat it to?

    Well, I will share the IGMP with Multicast solution with you in a next blog. Do note that as stated above there are some many permutations of Windows, teaming, WNL, switches  & firmware/drivers out there I give no support and no guarantees. Also, I want to avoid writing a  100 white paper on this subject?. If you insist you want my support on this I’ll charge at least a thousand Euro per hour, effort based only. Really. And chances are I’ll spend 10 hours on it for you. Which means you could have bought 2 (redundancy) KEMP hardware NLB appliances and still have money left to fly business class to the USA and tour some national parks. Get the message?

    But don’t be sad. In the next blog we’ll discuss some NIC teaming for the vSwitch, NLB configuration with IGMP with Multicast and show you a simple DELL PowerConnect 5424 switch example that make WNLB work on a W2K12R2 Hyper-V cluster with NIC teaming for the vSwitch and avoids following issues:

    • Messed up WNLB configuration after the simultaneous live migration of all or multiple NLB Nodes.
    • You avoid “false” duplicate IP address goof ups (at the cost of  IP address hygiene management).
    • You prevent switch port flooding.

    I’d show you on redundant Force10 S4810 but for that I need someone to ship me some of those with SFP+ modules for the lab, free of cost for me to keep Winking smile

    Conclusion

    It’s time to start saying goodbye to Windows NLB. The way the advanced networking features are moving towards layer 3 means that “useful hacks” like MAC spoofing for Windows NLB are going no longer going to work.  But until you have implement hardware load balancing I hope this blog has given you some ideas & tips to keep Windows NLB running smoothly for now. I’ve done quite few and while it takes some detective work & testing, so far I have come out victorious. Eat that Windows NLB!

  • Don’t Let Live Migration Settings Behavior Confuse You


    Configuring live migration settings on a cluster

    In the cluster under Networks, Live Migration Settings you can select what networks are available for live migration and their order of preference.image

    image

    Basically this setting determines what NICs can be used. It also determines and in what order of preference the available networks can be used by Live Migration. It does not determine bandwidth aggregation or failover. All it does is provide the order in which the redundant networks will be used. It’s up to the cluster service NETFT, Multichannel or SMB Direct to provide the bandwidth aggregation if possible As you can see we use LM/CSV over SMB and as our two NICS are RDMA capable 10Gbps NIC, multichannel will discover RDMA capabilities & leverage SMB Direct if it can be establish otherwise it will just stick with multichannel. If you would  team that NIC shows up as just one network. Also not that if you lose a NIC during live migration it might fail for some VMs under certain scenarios, but you cluster nodes will maintain the capability & recover. The  names of the network reflect this: LM1+CSV2 & CSV1+LM2 will be used both but if for some reason multichannel goes completely south the names reflect the metrics of these networks. The lowest is CSV1+LM2 and the second lowest is LM1+CSV2, reflect on how NETFT will select to use which automatically based on the metrics. I.e. It’s “self documentation” for human consumption.

    image

    Sometimes you might get surprising results. An example. If you’ve selected SMB for Live Migration and you have selected only one of the NICs here.image Still when you look at perform you might see both being used! What’s happening is that multichannel will kick in (and use two or more similar NICs when it finds them and if applicable move to RDMA.

    image

    So here we select SMB for the live migration type and the two equally capable 10GBps NICs available for live migration it will use them, even if you selected only one of them in the cluster network settings for live migration.

    Why is that? Well, there is still another location where live migration settings are defined and that is in Hyper-V Manager. Take a look at the screenshot below.

    image

    The Incoming live migration settings is set to “Use any available network for live migration”. If you have this on it will still leverage both as when one is used multichannel drag the other one into action anyway, no matter what you set in network settings for live migration in the Cluster GUI (it set to use only one and dimmed out).

    Do note that on Hyper-V Manager the settings for live migrations specify “Incoming live migrations”. That leads us to believe that it’s the target, the node where the VMs are migrated to that determines what NICs get used. But let’s put this to the test.

    Testing A Hyper-V cluster with two nodes – Node A and Node B

    On the cluster network settings you select only one network.

    image

    On  Hyper-V cluster Node A you have configured the following for live migration via Hyper-V Manager to “Use these IP addresses for live migration”. You cannot add or remove networks, the networks used are defined by the cluster.

    image

    On  Hyper-V cluster Node B you have configured the following for live migration via Hyper-V Manager to “Use any available network for live migration”.

    As we now kick of a live migration from node A to node B we’ll see both NICs being used. Why well because Node B is the target and Node B has the setting “Use any available network for live migration”. So why then only these 2, why not pick up any other suitable NICs? Well we’ve configured the live migration on both nodes to use SMB. As this cluster is RDMA capable that means it will leverage multichannel/SMB direct. The auto configuration will select the best, equally capable NICs for this and that’s these two in our scenario. Remember the capabilities of the NICs have to match. So no mixtures of 1 * 1Gbps and 1 *10Gbps or 1 * multichannel and 1 SMB Direct.

    The confusion really sets in that even if live migrate from Node B to A it will also use both NICs! Hum, that is “Incoming Live Migrations” is not always “correct” it seems, not when using SMB as a performance option at least. Apparently multichannel will kick in in both directions.

    If you set both to Node A and Node B to “Use these IP addresses for live migration” and leave the cluster network setting with only one network it does only use one, even with SMB as a performance option for live migration. This is expected.

    Note: I had one interesting hiccup while testing this configuration: when doing the latter one of the VMs failed in live migration of the entire host. I ran it again and that one VM still used both networks. All others went well during host migration with just one being used. That was a bit of a huh Confused smile moment and it sure tripped me up & kept me busy for a while. I blame RDMA and the position of the planets & constellations.

    Things aren’t always what they seem at first and it’s good to keep that in mind. The moment you think you got if figured out, you’re wrong Winking smile. So look again & investigate.

    Failed Live Migrations with Event ID 21502 Planned virtual machine creation failed for virtual machine ‘VM Name’: An existing connection was forcibly closed by the remote host. (0x80072746) Caused By Wrong Jumbo Frame Settings


    OK so Live Migration fails and you get the following error in the System even log with event id 21502:

    image

    Planned virtual machine creation failed for virtual machine ‘DidierTest01’: An existing connection was forcibly closed by the remote host. (0x80072746). (Virtual Machine ID 41EF2DB-0C0A-12FE-25CB-C3330D937F27).

    Failed to receive data for a Virtual Machine migration: An existing connection was forcibly closed by the remote host. (0x80072746).

    There are some threads on the TechNet forums on this like here http://social.technet.microsoft.com/Forums/en-US/805466e8-f874-4851-953f-59cdbd4f3d9f/windows-2012-hyperv-live-migration-failed-with-an-existing-connection-was-forcibly-closed-by-the and some blog post pointing to TCP/IP Chimney settings causing this but those causes stem back to the Windows Server 2003 / 2008 era.

    In the Hyper-V event log Microsoft-Windows-Hyper-V-VMMS-Admin you also see a series of entries related to the failed live migration point to the same issue: image

      
    Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
    Source:        Microsoft-Windows-Hyper-V-VMMS
    Date:          10/8/2013 10:06:15 AM
    Event ID:      20413
    Task Category: None
    Level:         Information
    Keywords:     
    User:          SYSTEM
    Computer:      SRV1.BLOG.COM
    Description:
    The Virtual Machine Management service initiated the live migration of virtual machine  ‘DidierTest01’ to destination host ‘SRV2’ (VMID 41EF2DB-0C0A-12FE-25CB-C3330D937F27).
     
    Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
    Source:        Microsoft-Windows-Hyper-V-VMMS
    Date:          10/8/2013 10:06:26 AM
    Event ID:      22038
    Task Category: None
    Level:         Error
    Keywords:     
    User:          SYSTEM
    Computer:      SRV1.BLOG.COM
    Description:
    Failed to send data for a Virtual Machine migration: An existing connection was forcibly closed by the remote host. (0x80072746).
     
    Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
    Source:        Microsoft-Windows-Hyper-V-VMMS
    Date:          10/8/2013 10:06:26 AM
    Event ID:      21018
    Task Category: None
    Level:         Error
    Keywords:     
    User:          SYSTEM
    Computer:      SRV1.BLOG.COM
    Description:
    Planned virtual machine creation failed for virtual machine ‘DidierTest01’: An existing connection was forcibly closed by the remote host. (0x80072746). (Virtual Machine ID 41EF2DB-0C0A-12FE-25CB-C3330D937F27).
     
    Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
    Source:        Microsoft-Windows-Hyper-V-VMMS
    Date:          10/8/2013 10:06:26 AM
    Event ID:      22040
    Task Category: None
    Level:         Error
    Keywords:     
    User:          SYSTEM
    Computer:      SRV1.BLOG.COM
    Description:
    Failed to receive data for a Virtual Machine migration: An existing connection was forcibly closed by the remote host. (0x80072746).
    Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
    Source:        Microsoft-Windows-Hyper-V-VMMS
    Date:          10/8/2013 10:06:26 AM
    Event ID:      21024
    Task Category: None
    Level:         Error
    Keywords:     
    User:          SYSTEM
    Computer:      srv1.blog.com
    Description:
    Virtual machine migration operation for ‘DidierTest01’ failed at migration source ‘SRV1’. (Virtual machine ID 41EF2DB-0C0A-12FE-25CB-C3330D937F27)

    There is something wrong with the network and if all checks out on your cluster & hosts it’s time to look beyond that. Well as it turns out it was the Jumbo Frame setting on the CSV and LM NICs.

    Those servers had been connected to a couple of DELL Force10  S4810 switches. These can handle an MTU size up to 12000. And that’s how they are configured. The Mellanox NICs allow for MTU Sizes up to 9614 in their Jumbo Frame property.  Now super sized jumbo frames are all cool until you attach the network cables to another switch like a PowerConnect 8132 that has a max MTU size of 9216. That moment your network won’t do what it’s supposed to and you see errors like those above. If you test via an SMB share things seem OK & standard pings don’t show the issue. But some ping tests with different mtu sizes & the –f (do no fragment) switch will unmask the issue soon. Setting the Jumbo Frame size on the CSV & LM NICs to 9014 resolved the issue.

    Now if on the server side everything matches up but not on the switches you’ll also get an event id 21502 but with a different error message:

    Event ID: 21502 The Virtual Machine Management Service failed to establish a connection for a Virtual machine migration with host XXXX. A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because connected host has failed to respond (0X8007274C)

    image

    This is the same message you’ll get for a known cause of shared nothing live migration failing as described in this blog post by Microsoft Shared Nothing Migration fails (0x8007274C).

    So there you go. Keep an eye on those Jumbo Frame setting especially in a mixed switch environment. They all have their own capabilities, rules & peculiarities. Make sure to test end to end and you’ll be just fine.

    Live Migration Can Benefit From Jumbo Frames


    Does live migration benefit from Jumbo frames? This question always comes back so I’d just blog it hear again even if I have mentioned it as part of other blog posts. Yes it does! How do I know. Because I’ve tested and used it with Windows Server 2008 R2, 2012 & 2012 R2. Why? because I have a couple of mantra’s:

    • Assumption are the mother of all fuckups
    • Assume makes an ASS out of U and ME
    • Trust but verify

    What can I say. I have been doing 10Gbps since for Live Migration with Hyper-V. And let me tell you my experiences with an otherwise completely optimized server (mainly BIOS performance settings): It will help you with up to 20% more bandwidth use.

    And thanks to Windows Server 2012 R2 supporting SMB for live migration we can very nicely visualize this with 2*10Gbps NICS, not teamed, used by live migration leveraging SMB Multichannel. On one of the 10Gbps we enable Jumbo Frames on the other one we do not. We than live migrate a large memory VM back and forth. Now you tell me which one is which.

    image

    Now enable Jumbo frames on both 10Gbps NICs and again we live migrate the large memory VM back and forth. More bandwidth used, faster live migration.

    image

    I can’t make it any more clear. No jumbo frames will not kill your performance unless you have it messed up end to end. Don’t worry if you have a cheaper switch where you can only enable it switch wide instead op port per port. The switch is a pass through. So unless you set messed up sizes on sender/receiving host that the switch in between can’t handle, it will work even without jumbo frames and without heaven falling down on your head Smile. Configure it correctly, test it, and you’ll see.