Still Need To Optimize Power Settings On DELL 12th Generation Servers For Lightning Fast Hyper-V Live Migrations?


Do you remember my blog from 2011 on optimizing some system settings to get way better Live Migration performance with 10Gbps NICs? It’s over here: Optimizing Live Migrations with a 10Gbps Network in a Hyper-V Cluster. This advice still holds true, but the power optimization settings & the interaction between DELL Generation 12 servers and Windows Server 2012 have improved significantly. Where with Windows Server 2008 R2 we could hardly get above 16% bandwidth consumption out of the box with Live Migration over a 10Gbps NIC, today this just works fine.

Don’t believe me? You do now? Cool.

For overall peak system performance you might want to set the Windows power plan to High Performance, if that’s needed.
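If you prefer to script this rather than click through the Control Panel, the built-in powercfg tool does the job from an elevated PowerShell prompt. A minimal sketch (SCHEME_MIN is the built-in alias for the High Performance plan):

# List the power plans; the asterisk marks the active one
powercfg /list

# Activate the High Performance plan (alias SCHEME_MIN = minimum power savings)
powercfg /setactive SCHEME_MIN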

You might no longer need to dive into the BIOS. Of course, if you have issues because your hardware isn’t that intelligent and/or you are still running Windows Server 2008 R2, you do want to go there. And when it comes to speed we want it all and we want it now, so you may still want to dive into the BIOS and tweak it even on DELL 12th Generation hardware. Test & confirm I’d say, but you should notice a difference, albeit a smaller one than with Windows Server 2008 R2.

Well, let’s revisit this as we are no longer working with Generation 10 or 11 servers with an “aged” BIOS. We have decommissioned the Generation 10 servers, upgraded the BIOS of our Generation 11 servers and acquired Generation 12 servers. We also now use UEFI for our Hyper-V host installations. The time has come to become familiar with those settings and the benefits they bring. It also future-proofs our host installations.

So where and how do I change the power configuration settings now? Let’s walk through it together. Reboot your server and during the boot cycle hit F2 to enter System Setup.

Select System BIOS.

Click on System Profile Settings.

The settings you want to adapt are:

  • CPU Power Management: set to Maximum Performance
  • Memory Frequency: set to Maximum Performance
  • C1E: Disabled
  • C States: Disabled

image

That’s it. This configuration optimizes the power settings on a DELL Generation 12 server like the R720.
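If you have a rack full of these to do, you probably don’t want to sit through System Setup on every host. The same System Profile settings can also be pushed through the iDRAC7 with racadm; the attribute names and values below are what I’d expect on 12th Generation firmware, so treat them as an assumption and verify them with a get first. A BIOS configuration job and a reboot are still needed before they take effect.

# Verify the attribute names & current values on your own firmware first (assumption: iDRAC7-era racadm)
racadm get BIOS.SysProfileSettings

# Assumed attribute names/values for a performance profile with C states & C1E disabled
racadm set BIOS.SysProfileSettings.SysProfile Custom
racadm set BIOS.SysProfileSettings.ProcPwrPerf MaxPerf
racadm set BIOS.SysProfileSettings.MemFrequency MaxPerf
racadm set BIOS.SysProfileSettings.ProcC1E Disabled
racadm set BIOS.SysProfileSettings.ProcCStates Disabled
# Create a BIOS configuration job and reboot the host to apply the pending changes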

When done, click “Back” and then Finish. A warning will pop up asking you to confirm that you want to save your changes. Click “Yes” if you indeed want to do this.

You’ll get a nice confirmation that your settings have been saved. Click “OK” and then click Finish.

Confirm that you want to exit and reboot by clicking “Yes” and voilà, when the server comes back up it will be running at full speed, at the cost of more power consumption, extra heat and extra cooling.

Remember, if you don’t need to run at full power, don’t. And do consider using Dynamic Optimization and Power Optimization in System Center Virtual Machine Manager 2012. Save a penguin!

Complete VM Mobility Across The Data Center with SMB 3.0, RDMA, Multichannel & Windows Server 2012 (R2)


Introduction

The moment I figured out that Storage Live Migration (in certain scenarios) and Shared Nothing Live Migration leverage SMB 3.0, and as such Multichannel and RDMA, in Windows Server 2012, I was hooked. I just couldn’t let go of the concept of leveraging RDMA for those scenarios. Let me show you the value of my current favorite network design for some demanding Hyper-V environments. I was challenged a couple of times on the cost/port of this design, which is, when you really think about it, a very myopic way of calculating TCO/ROI. Really it is. And this week at TechEd North America 2013 Microsoft announced that all types of Live Migrations support Multichannel & RDMA (next to compression) in Windows Server 2012 R2. Watch that in action at minute 39 over here at Understanding the Hyper-V over SMB Scenario, Configurations, and End-to-End Performance. You should have seen the smile on my face when I heard that one! Yes, standard Live Migration now uses multiple NICs (no teaming) and RDMA for lightning fast VM mobility & storage traffic. People, you will hit the speed boundaries of DDR3 memory with this! The TCO/ROI of our plans just became even better; just watch the session.
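If you want to try this on your own Windows Server 2012 R2 hosts, the Live Migration performance option is a per-host setting you can flip with PowerShell. A minimal sketch, with placeholder host names:

# Let Live Migration traffic use SMB, and with it Multichannel & SMB Direct (RDMA)
Set-VMHost -ComputerName HOST1, HOST2 -VirtualMachineMigrationPerformanceOption SMB

# Verify the setting on both hosts
Get-VMHost -ComputerName HOST1, HOST2 | Select-Object Name, VirtualMachineMigrationPerformanceOption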

So why might I use more than two 10Gbps NIC ports in a team with converged networking for Hyper-V in Windows Server 2012? That’s a great solution for sure: a combined bandwidth of 2 x 10Gbps is more than what a lot of people have right now and it can handle a serious workload. So don’t get me wrong, I like that solution. But sometimes more is asked for and warranted, depending on your environment.

The reason for this is shown in the picture below. Today there is no more limit on the VM mobility within a data center. This will only become more common in the future.

image

This is not just a pipe dream of virtualization engineers, it serves some very real needs. Of course it does. Otherwise I would not spend the money. It consumes extra 10Gbps ports on the network switches, which need to be redundant as well, and you need to have 10Gbps RDMA capable cards and DCB capable switches. So why this investment? Well, I’m designing for very flexible and dynamic environments that have certain demands laid down by the business. Let’s have a look at those.

The Road to Continuous Availability

All maintenance operations, troubleshooting and even upgrades/migrations should be done with minimal impact to the business. This means that we need to build for high to continuous availability where practical and make sure performance doesn’t suffer too much, not noticeably anyway. That’s where the capability to live migrate virtual machines off a host, clustered or not, rapidly and efficiently, with minimal impact to the workload on the hosts involved, comes into play.
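On a Windows Server 2012 (R2) failover cluster that boils down to draining a node before maintenance and resuming it afterwards; the node name below is hypothetical.

# Pause the node and live migrate all roles (VMs) off it before maintenance
Suspend-ClusterNode -Name "NODE1" -Drain -Wait

# When the maintenance is done, resume the node and bring the roles back
Resume-ClusterNode -Name "NODE1" -Failback Immediate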

Dynamic environments won’t tolerate downtime

We also want to leverage our resources where and when they are needed the most. And the infrastructure for the above can also be leveraged for that. Storage Live Migration and even Shared Nothing Live Migration can be used to place virtual machine workloads where they get the resources they need. You could see this as (dynamically) optimizing the workload both within and across clusters or amongst standalone Hyper-V nodes. This could be a move to an SSD-only storage array, or to a smaller but very powerful node or cluster in terms of CPU, memory and disk IO. This can be useful in those scenarios where scientific applications, number crunching or IOPS intensive software need those resources, but only at certain times and not permanently.
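Both of those moves are one-liners in PowerShell on Windows Server 2012; the VM, host and path names below are made up for the example.

# Storage live migration: move a running VM's files to faster storage
Move-VMStorage -VMName "SQLVM01" -DestinationStoragePath "S:\FastStorage\SQLVM01"

# Shared nothing live migration: move the VM and its storage to another host
Move-VM -Name "SQLVM01" -DestinationHost "HV-FAST-01" -IncludeStorage -DestinationStoragePath "D:\VMs\SQLVM01"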

Future proofing for new storage designs

Maybe you’re an old-time Fibre Channel user or iSCSI rules your current data center, and Windows Server 2012 has not changed that. But that doesn’t mean it will not come. The option of using a Scale Out File Server and leveraging SMB 3.0 file shares to provide storage for Hyper-V deployments is a very attractive one in many aspects. And if you build the network as I’m doing, you’re ready to switch to SMB 3.0 without missing a heartbeat. If you were to deplete the bandwidth that x number of 10Gbps ports can offer, no worries: you’ll either move to 40Gbps and up or InfiniBand. If you don’t want to go there … well, since you just dumped iSCSI or FC you have room for some more 10Gbps ports.
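To illustrate how little changes on the Hyper-V side once you go down that road: creating a VM on an SMB 3.0 share is just a matter of pointing the paths at a UNC path. The share and names below are placeholders.

# Create a VM whose configuration and VHDX live on a Scale Out File Server share
New-VM -Name "FILE-TEST-01" -MemoryStartupBytes 4GB `
       -Path "\\SOFS\VMStore\FILE-TEST-01" `
       -NewVHDPath "\\SOFS\VMStore\FILE-TEST-01\FILE-TEST-01.vhdx" `
       -NewVHDSizeBytes 60GB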

Future proofing performance demands

Solutions tend to stay in place longer than envisioned, and if you need some longevity and a stable, standard way of doing networking, here it is. It’s not the most economical way of doing things but it’s not as cost prohibitive as you think. Recently I was confronted again with some of the insanities of enterprise IT. A couple of network architects costing a hefty daily rate stated that 1Gbps is only for the data center and not the desktop, while even arguing about the cost of some fiber cable versus RJ45 (CAT5E). Well, let’s look beyond the North-South traffic and the cost of aggregating bandwidth all the way up the stack, shall we? Let me tell you that the money spent on such advisers can buy you 10Gbps capabilities in the server room or data center (and some 1Gbps for the desktops to go) if you shop around and negotiate well. The one-size-fits-all and the ridiculous economies of scale “to make it affordable” arguments in big central IT are not always the best fit for helping the customers. Think a little bit outside of the box please and don’t say no out of habit or laziness!

Conclusion

In some future blog post(s) we’ll take a look at what such a network design might look like and why. There is no one size fits all, but there are not too many permutations either. In our latest efforts we have been specifically looking into making sure that a single rack failure would not bring down a cluster. So when thinking of the rack as a failure domain we need to spread the cluster nodes across multiple racks in different rows. That means we need the network to provide the connectivity & capability to support this, but more on that later.

SMB Direct RoCE Does Not Work Without DCB/PFC


Introduction

SMB Direct RoCE Does Not Work Without DCB/PFC. “Yes”, you say, “we know, this is well documented. Thank you.” But before you sign off, hear me out.

Recently I plugged two RoCE cards into some test servers and linked them to a couple of 10Gbps switches. I did some quick large file copy testing and to my big surprise RDMA kicked in with stellar performance even before I had installed the DCB feature, let alone configured it. So what’s the deal here? Does it work without DCB? Does the card fall back to iWARP? Highly unlikely. I was expecting it to fall back to plain vanilla 10Gbps, with RDMA not being used at all, but it was being used. A quick shout-out to Jose Barreto to discuss this helped clarify things.

DCB/PFC is a requirement for RoCE

The busier the network gets, the faster performance will drop. Now, in our test scenario we had two servers for a total of 4 RoCE ports on a network consisting of beefy 48-port 10Gbps switches, so we didn’t see the negative results of this here.

DCB (Data Center Bridging) and Priority Flow Control are considered a requirement for any kind of RoCE deployment. RDMA with RoCE operates at the Ethernet layer. That means there is no overhead from TCP/IP, which is great for performance. This is the reason you want to use RDMA, actually. It also means it’s left on its own to deal with Ethernet-level collisions and errors. For that it needs DCB/PFC, otherwise you’ll run into performance issues due to a ton of retries at the higher network layers.

The reason that iWARP doesn’t require DCB/PFC is that it works at the TCP/IP level, also offloaded by using a TCP/IP stack on the NIC instead of in the OS. So errors are handled by TCP/IP, at a cost: iWARP gives the same benefits as RoCE but it doesn’t scale as well. Not that iWARP performance is lousy, far from it! Mind you, for bandwidth management reasons you’d be better off using DCB or some form of QoS as well.

Conclusion

So no, not configuring DCB on your servers and the switches isn’t an option, but apparently it isn’t blocked either, so beware of this. It might appear to be working fine but it’s a bad idea. Also don’t think it falls back to iWARP mode; it doesn’t, as one card does one thing, not both. There is no shortcut. RoCE RDMA does not work error free out of the box, so you do have to install the DCB feature and configure it together with the switches.
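For reference, this is roughly what the server side of that configuration looks like in PowerShell on Windows Server 2012. The priority value (3) and the adapter aliases are assumptions for the example, and the switch ports must be configured to match.

# Install the DCB feature
Install-WindowsFeature Data-Center-Bridging

# Tag SMB Direct traffic (TCP port 445) with priority 3 and enable PFC for that priority only
New-NetQosPolicy "SMB" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7

# Apply DCB/QoS on the RoCE adapters (aliases are examples)
Enable-NetAdapterQos -Name "RoCE-1","RoCE-2"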

SMB 3.0 Multichannel Auto Configuration In Action With RDMA / SMB Direct


Most of you might remember this slide by Jose Barreto on SMB Multichannel Auto Configuration in one of his many presentations:

  • Auto configuration looks at NIC type/speed => Same NICs are used for RDMA/Multichannel (doesn’t mix 10Gbps/1Gbps, RDMA/non-RDMA)
  • Let the algorithms work before you decide to intervene
  • Choose adapters wisely for their function

You can fine tune things if and when needed (only do this when this is really the case) but let’s look at this feature in action.

So let’s look at this in real life. For this test we have 2 * X520 DA 10Gbps ports using 10.10.180.8X/24 IP addresses and 2 * Mellanox 10Gbps RDMA adapters with 10.10.180.9X/24 IP addresses. No teaming involved, just multiple NIC ports. Do note that these IP addresses are on a different subnet than the LAN of the servers. Basically only the servers can communicate over them; they don’t have a gateway, no DNS servers, and as such are not registered in DNS either (life is easy for simple file sharing).

image

Let’s try and copy a 50GB fixed VHDX file from server1 to server2 using the DNS name of the target host (pixelated), meaning it will resolve to that host via DNS and use the LAN IP address 10.10.100.92/16 (the host name is greyed out). In the below screenshot you see that the two RDMA capable cards are put into action. The servers are not using the 1Gbps LAN connection. Multichannel looked at the options:

  • A 1Gbps RSS capable Link
  • Two 10Gbps RSS capable Links
  • Two 10Gbps RDMA capable links

Multichannel concluded that the RDMA cards are the best ones available, and as we have two of those it uses both. In other words, it works just as described.
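You don’t have to rely on the performance counters alone; the SMB PowerShell cmdlets will show you exactly which interfaces Multichannel selected. Run these on the client while the copy is going on:

# Show the client/server interface pairs SMB Multichannel is actually using
Get-SmbMultichannelConnection

# List the RSS & RDMA capabilities SMB sees on the client and the server side
Get-SmbClientNetworkInterface
Get-SmbServerNetworkInterface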

image

Even if we try to bypass DNS and copy the file explicitly via the IP address (10.10.180.84) assigned to one of the Intel X520 DA cards, Multichannel detects that it has two better, RDMA capable cards available and, as you can see, it uses the same NICs as in the demo before. Nifty, isn’t it?

 image

If you want to see the other NICs in action, you can disable the Mellanox cards and then Multichannel will choose the two X520 DA cards. That’s fine for testing, but in real life you need a better solution when you need to manually define which NICs can be used. This is done using PowerShell (take a look at Jose Barreto’s blog The basics of SMB PowerShell, a feature of Windows Server 2012 and SMB 3.0 for more info).

New-SmbMultichannelConstraint -ServerName SERVER2 -InterfaceAlias "SLOT 6 Port 1", "SLOT 6 Port 2"

This tells the client it can only use these two NICs, which in this example are the two Intel X520 DA 10Gbps cards, to access SERVER2. So basically you configure/tell the client what to use for SMB 3.0 traffic to a certain server. Note the difference in send/receive traffic between RDMA and native 10Gbps.

On Server1 (the client) you see this:

image

On Server2 (the server) you see this:

image

This is indeed the constraint we set up, as we can verify with:

Get-SmbMultichannelConstraint

image

We’re done playing so let’s clean up all the constraints:

Get-SmbMultichannelConstraint | Remove-SmbMultichannelConstraint

image

Seeing this technology, it’s now up to the storage industry to provide the needed capacity and IOPS in a lot more affordable way. Storage Spaces has knocked on your door; that was the wake-up call. In an environment where we throw lots of data around, we just love SMB 3.0.

DELL PowerConnect 8024F Is Now Stackable


A colleague pointed me to the latest firmware update (4.2.0.4) for the DELL PowerConnect 8024F switches. As I was reading the release notes one item in particular caught my attention: the PowerConnect 8024/8024F/M8024-k switches are now stackable. You can put up to 6 switches in one stack using the regular front ports (SFP+). You might remember from a previous blog post on 10Gbps, Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4), where I mentioned that we got a great deal on those switches. I also mentioned that the only thing lacking in these switches, and what would make them the best 10Gbps switch when comparing value for money, is the ability to stack them. I quote myself:

“They could make that 8024F an unbeatable price/quality deal if they would make them stackable.”

I’ve been called a visionary before, but I won’t go into that insider joke right now. Now, it’s for sure not just my little blog post that made this update happen, but it is a nice New Year’s gift. More features & options with hardware you already own are always nice. So I guess a lot of people made the same observation, both customers & DELL themselves. You could just “smell” from the available commands & configuration that this switch could be made stackable, and they did it.

Is Ethernet based stacking perfect? No (there is very little perfection in this world). The biggest drawback, if you need that feature, is the fact that you can’t hot plug the stacking links. But for all other practical purposes it’s a nice deal. Why? Well, now that these switches support Ethernet based stacking you will be able to choose more types of NIC teaming to use for your servers. That means those teaming configurations that are dependent on stacking, such as active-active NIC teaming across two switches to be more precise, become an option. I find this pretty good news.

You all know I’m very enthusiastic about using the NIC teaming built into Windows 8 and I will use it where and when I can. But for many years to come there will be a lot of Windows 2008 R2 systems to support and install. So it’s always good to see your hardware vendors improving their gear to give you more options. For the pricing I got on the 8024F in the last project, and the needs of the solution, we could deal with not being able to stack. Switches that stack via Ethernet were way more expensive, not to mention the ones using stacking modules (real premium pricing). So we got the best deal for our needs.
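For those already on Windows Server 2012, that built-in teaming is a one-liner; the team and member names here are just examples. With stacked switches you could even use LACP across the two physical switches.

# Create a switch independent team out of two physical NICs for the virtual machine traffic
New-NetLbfoTeam -Name "VMTeam" -TeamMembers "NIC1","NIC2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort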

For 10Gbps switches, stacking over Ethernet gives you up to 80Gbps with a maximum of 8 uplinks, so bandwidth is not as much of a concern. With 1Gbps switches it is, which makes stacking modules the only way to go there I think, if you need massive bandwidth (and you probably do). The drawback, as with all forms of inter switch links (a LAG for example), is that this method means you’re losing ports for other purposes. But you need to look at your needs and do the math. I think buying with investment protection is good, but don’t always buy in preparation for the time you’ll become a Fortune 500 company. That takes a while and in the meanwhile you’ll be very well served anyway.

Another related feature that’s new is Nonstop Forwarding (NSF). NSF allows the forwarding plane of stack units to continue forwarding packets even while the control and management planes restart. This could be due to a power failure, some hardware or software error, or even an upgrade. This feature is common to all stackable switches as far as I know, and it is needed. Not that I’m saying the redundant loop in a stack is bad or overkill, far from it, but that takes care of other scenarios than the ones NSF is designed to handle.

Carsten Rachfahl interviews me on 10Gbps networking with Hyper-V


I was in Frankfurt, Germany last week at The Experts Conference 2011 Europe (http://www.theexpertsconference.com/europe/2011/). I met up with a lot of great IT professionals from the online community, as you can read in a previous blog post. The video interview that was mentioned in that post is now online. It’s a very good quality one and Carsten Rachfahl made an excellent interviewer who manages to make the entire process both educational and enjoyable.

The results are available for all to view right here on Carsten’s site (in German) where you can also find his other blog posts http://www.hyper-v-server.de/videos/videointerview-mit-didier-van-hoye-zu-hyper-v-und-10gbit/

Optimizing Live Migrations with a 10Gbps Network in a Hyper-V Cluster


Introduction

You’ll find the following recommendations online about optimizing Live Migrations:

  1. Use bigger pipes (10Gbps is better than 1Gbps)
  2. Enable Jumbo Frames
  3. Up the Receive Buffer to 8192 (Exchange 2010 virtualization recommendation for Live Migration)

As we’ve been building Hyper-V clusters since the early betas, let me share some experiences with this. For the curious: I used Intel® Ethernet X520 SFP+ Direct Attach Server Adapters & DELL PowerConnect 8024F 10Gbps switches for my testing. See my blog posts on considerations about the use of 10Gbps in Hyper-V clusters here:

  1. Introducing 10Gbps Networking In Your Hyper-V Failover Cluster Environment (Part 1/4)
  2. Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4)
  3. Introducing 10Gbps & Thoughts On Network High Availability For Hyper-V (Part 3/4)
  4. Introducing 10Gbps & Integrating It Into Your Network Infrastructure (Part 4/4)

Bigger pipes are better

On bigger pipes I can only say that if you can afford them and need them you should get them. End of discussion.

Jumbo frames rock

Jumbo frames help out a lot (+/- 20 %), especially with the larger memory virtual machines.
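On Windows Server 2008 R2 you set jumbo frames in the NIC driver’s advanced properties; from Windows Server 2012 onwards you can script it. Keep in mind that the exact registry keyword and value (9014 versus 9000) depend on the NIC driver, so the lines below are an assumption to verify against your own adapters, with a placeholder NIC name.

# Enable jumbo frames on the Live Migration NIC (keyword/value vary per driver)
Set-NetAdapterAdvancedProperty -Name "LM-NIC" -RegistryKeyword "*JumboPacket" -RegistryValue 9014

# Check what the driver actually accepted
Get-NetAdapterAdvancedProperty -Name "LM-NIC" -RegistryKeyword "*JumboPacket"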

The golden nugget

So far so good, but there is one golden nugget of information I want to share. There is a little trip wire that can prevent you from getting optimal performance: the advanced power settings in the BIOS. If you read my blogs you might have come across the post Consider CPU Power Optimization Versus Performance When Virtualizing, and I encourage you to go and read it as it holds a lot of good info and is very relevant to this one, because here we have yet another reason to make sure your BIOS is set right to achieve a decent return on the investment in quality hardware.

In our experience those power saving settings, the C states and the C1E state, are also not very helpful when it comes to Live Migration & such. I went from a meager 20% bandwidth use all the way up to 35-45% at best with jumbo frames enabled and the power settings set to “Full Power”. A lot better, but still not very impressive.

clip_image002

Now go ahead and disable the C states AND the C1E state to achieve 55% to 65%.

clip_image004

Now, the speed of a live migration varies greatly between virtual machines that are idle and those running a full load, both CPU & memory wise. It also depends on the load on the hosts you’re migrating from and to, but this impact is smaller when you disable those advanced CPU power settings.

Look at the following screenshots:

clip_image005

A SQL Server with 50GB of RAM being live migrated over 10Gbps. Jumbo frames enabled, Power Settings optimized but with C1E & C States enabled.

clip_image006

A SQL Server with 50GB of RAM being live migrated over 10Gbps. Jumbo frames enabled, Power Settings optimized but with C1E & C States disabled.

The live migration of this virtual SQL Server takes between 74-78 seconds. Not bad!

image

By the way, these settings also help with 1Gbps, but there the gain isn’t as spectacular. You use 99% instead of 75-80% of your bandwidth. An improvement, yes, but not on the same scale as with 10Gbps for speeding up Live Migrations.

As you can see in this post on the TechNet support forums, this seems to be a common occurrence. It’s not just me who’s seeing things: Live Migration on 10GbE only 16%. Even DELL chimed in there, confirming these findings in their labs.

Receive Buffer

There is one setting that’s been advised for Exchange 2010 virtualization with Hyper-V that I have not seen improve speeds and that’s upping the Receive Buffer to 8192. You can read this in Best Practices for Virtualizing Exchange Server 2010 with Windows Server® 2008 R2 Hyper V™. In some cases I tested, this even reduced the results, especially when you have C1E & C states enabled. It is also a confusing recommendation as they state to set the Receive Buffer to 8192. This value however is dependent on the NIC type and driver, so you might only be able to set it to 4096 or so. The guidance should state to set it as high as possible, but I have not seen any benefits. Do mind that I did not test this with a Hyper-V cluster running a virtualized Exchange 2010 guest. Your mileage may vary. Trust but verify is the age-old adage. Also keep in mind I’m running 10Gbps, so the effect of this setting might not be what it could be for a 1Gbps network, but on the whole I’m not convinced. If you implement all the other recommendations you’ll saturate a 1Gbps pipe already.
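If you do want to test it yourself, on Windows Server 2012 or later the advanced property can be set from PowerShell; again, the keyword and the maximum accepted value depend on the NIC driver, so treat this as a sketch with a placeholder NIC name.

# Raise the receive buffers on the Live Migration NIC (maximum value is driver dependent)
Set-NetAdapterAdvancedProperty -Name "LM-NIC" -RegistryKeyword "*ReceiveBuffers" -RegistryValue 4096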

What does this mean?

The sad news is that in virtual environments or other high performance configurations the penguins have to give way to performance. I wish it was different but unfortunately it isn’t.

By the way, this is vendor agnostic. You’ll see this with HP, DELL and CISCO in all form factors, whether they are tower, rack or blade servers. The main thing you need to make sure of is that the BIOS allows you to disable the C states and tweak the power settings. Not all vendors/BIOS versions allow for this, I read, so make sure you check this. Some CISCO blades have been annoying on this front, ruining the performance of VDI projects with less than optimal CPU performance, but an updated BIOS has since been released to fix this.

Look, it makes no sense saving on power if it means you’ll buy more servers to compensate for the lack of performance per unit. In my honest opinion a lot of the hardware power optimizations are awesome, but they still have a long way to go in making sure they don’t incur such a hit on performance. Right-sizing servers in number & type of CPU, power supplies etc. still seems the best way to avoid wasting energy and money. Buying more power than needed and counting on the power consumption optimizations to reduce operating cost can be a good idea to protect your investment against expected future increases in resource demand within the service life of your hardware. On average that is 3 to 5 years, depending on the environment & needs.

Conclusion

Three things are needed for lightning fast Live Migrations, plus one caveat:

  1. Bandwidth. Hence the 10Gbps network. There is no substitute for bigger pipes.
  2. Jumbo Frames. Configure them right & you’ll reap the benefits.
  3. Disable C1E & C states. Also configure your server’s power options for maximum performance.
  4. I have not been able to confirm that the receive buffer has a big impact on Live Migration speed or does any good at all. Test this to find out if it works for you.

Remember that you’ll be able to do multiple Live Migrations in parallel with Windows 8, so a 10Gbps pipe will be used at full capacity then. Being able to use more networks for Live Migration will only increase the capability to evacuate a host fast or to move virtual machines for load balancing across a cluster. If you look at the RDMA, InfiniBand and 40/100Gbps evolutions becoming available in the next 12 to 36 months, 10Gbps will become a lot more mainstream while at the same time the options for network connectivity will become more diversified. 10Gbps prices are dropping, but for the moment they do remain high enough to keep people away.
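For what it’s worth, on Windows Server 2012 the number of simultaneous live and storage migrations is a per-host setting; the values below are just an example, not a recommendation.

# Allow more simultaneous live & storage migrations per host
Set-VMHost -MaximumVirtualMachineMigrations 4 -MaximumStorageMigrations 2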

Introducing 10Gbps & Integrating It into Your Network Infrastructure (Part 4/4)


This is the 4th post in a series of 4. Here’s a list of all parts:

  1. Introducing 10Gbps Networking In Your Hyper-V Failover Cluster Environment (Part 1/4)
  2. Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4)
  3. Introducing 10Gbps & Thoughts On Network High Availability For Hyper-V (Part 3/4)
  4. Introducing 10Gbps & Integrating It Into Your Network Infrastructure (Part 4/4)

In my blog post “Introducing 10Gbps & Thoughts On Network High Availability For Hyper-V (Part 3/4)”, in a series of thoughts on 10Gbps and Hyper-V networking, a discussion on NIC teaming brought up the subject of 10Gbps for virtual machine networks. This means our switches will probably no longer exist in isolation, unless those virtual machines never need to talk to anything outside of what’s connected to those switches, which is very unlikely. That means we need to start thinking and talking about integrating the 10Gbps switches into our network infrastructure. So we’re entering the network engineers’ turf again and we’ll need to address some of their concerns. But this is not bad news, as they’ll help us prevent some bad scenarios.

Optimizing the use of your 10Gbps switches

Not everyone runs clusters big enough, or enough smaller clusters, to warrant an isolated network approach just for cluster networking. As a result you might want to put some of the remaining 10Gbps ports to work for virtual machine traffic. We’ve already pointed out that your virtual machines will not only want to talk amongst themselves (it’s a cluster, and private/internal networks tend to defeat the purpose of a cluster; it just doesn’t make any sense as then they are limited to a single node) but also need to talk to other servers on the network, both physical and virtual ones. So you have to hook up your 10Gbps switches from the previous example to the rest of the network. Now there are some scenarios where you can keep the virtual machine networks isolated as well within a cluster, for example in your POC lab where you are running a small 100% virtualized test domain on a cluster in a separate management domain, but these are not the predominant use cases.

But you don’t only have to integrate with the rest of your network, you may very well want to! You’ve seen 10Gbps in action for CSV and Live Migration, you’ve got a taste for 10Gbps now, you’re hooked and you dream of moving each and every VM network to 10Gbps as well. And while you’re at it, your management network and such as well. This is no different from the time you first got hold of 1Gbps networking kit in a 100Mbps world. Speed is addictive; once you’re hooked you crave more.

How do you achieve this? You could do it by replacing the existing 1Gbps switches. That takes money, no question about it. But think ahead: 10Gbps will be commonplace in a couple of years’ time (read: prices will drop even more). Servers with 10Gbps LOM cards are here or will be here very soon from any major vendor. For DELL this means that the LOM NICs will be like mezzanine cards and you decide whether to plug in 10Gbps SFP+ or Ethernet jacks. When you opt to replace some current 1Gbps switches with 10Gbps ones you don’t have to throw them away. What we did at one location is recuperate the 1Gbps switches for out-of-band remote access (iLO/DRAC cards), which in today’s servers also run at 1Gbps speeds. Their older 100Mbps switches were taken out of service. No emotional attachment here. You could also use them to give some departments or branch offices 1Gbps to the desktop if they don’t have that yet.

When you have ports left over on the now isolated 10Gbps switches and you don’t have any additional hosts arriving in the near future requiring CSV & Live Migration networking, you might as well use those free ports. If you still need extra ports you can always add more 10Gbps switches. But whatever the case, this means uplinking those cluster network 10Gbps switches to the rest of the network. We already mentioned in a previous post that the network people might have some concerns that need to be addressed, and rightly so.

Protect the Network against Loops & Storms

The last thing you want to do is bring down your entire production network with a loop and a resulting broadcast storm. You also don’t want the otherwise rather useful Spanning Tree Protocol locking out part of your network and ruining your sweet cluster setup, or traffic specifically intended for your 10Gbps network being routed over a 1Gbps network instead.

So let us discuss some of the ways in which we can prevent all these bad things from happening. Now mind you, I’m far from an expert network engineer so to all CCIE holders stumbling on to this blog post, please forgive me my prosaic network insights. Also keep in mind that this is not a networking or switch configuration course. That would lead us astray a bit too far and it is very dependent on your exact network layout, needs, brand and model of switches etc.

As discussed in blog post Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4), you need a LAG between your switches, as the traffic for the VLANs serving heartbeat, CSV, Live Migration and virtual machines, but now perhaps also the host management and optional backup networks, must flow between the switches. As long as you have only two switches that have a LAG between them or that are stacked, you don’t have much risk of creating a loop on the network. Unless you uplink two ports directly with a network cable. Yes, that happens; I once witnessed a loop/broadcast storm caused by someone who was “tidying up” spare CAT5E cables by plugging all the loose ends into free switch ports. Don’t ask. Lesson learned: disable every switch port not in use.

Now once you uplink those two or more 10Gbps switches to your other switches in a redundant way you have a loop. That’s where the Spanning Tree Protocol comes in. Without going into detail, this prevents loops by blocking the redundant paths. If the operational path becomes unavailable a new path is established to keep network traffic flowing. There are some variations on STP. One of them is the Rapid Spanning Tree Protocol (RSTP), which does the same job as STP but a lot faster. Think a couple of seconds to establish a path versus 30 seconds or so. That’s a nice improvement over the early days. Another one that is very handy is the Multiple Spanning Tree Protocol (MSTP). The sweet thing about the latter is that you have blocking per VLAN, and in the case of Hyper-V or storage networks this can come in quite handy.

Think about it. Apart from preventing loops, which are very, very bad, you also like to make sure that the network traffic doesn’t travel along unnecessary long paths or over links that are not suited for its needs. Imagine the Live Migration traffic between two nodes on different 10Gbps switches travelling over the 1Gbps uplinks to the 1Gbps switches because the STP blocked the 10Gbps LAG to prevent a loop. You might be saturating the 1Gbps infrastructure and that’s not good.

I said MSTP could be very handy, so let’s address this. You only need the uplink to the rest of the network for the host management and virtual machine traffic. The heartbeat, CSV and Live Migration traffic also stops flowing when the LAG between the two 10Gbps switches is blocked by RSTP. This is because RSTP works at the LAG level for all VLANs travelling across that LAG and doesn’t discriminate between VLANs. MSTP is smarter and only blocks the required VLANs, in this case the host management and virtual machine VLANs, as these are the only ones travelling across the link to the rest of the network.

We’ll illustrate this with some pictures based on our previous scenarios. In this example we have the cluster networks going to the 10Gbps switches over non-teamed NICs. The same goes for the virtual machine traffic and the host management traffic, but those NICs are teamed. Let’s first show the normal situation.

 clip_image002

Now look at a situation where RSTP blocks the purple LAG. Please do note that if the other network switches are not 10Gbps, the traffic for the virtual machines would be travelling over more hops and at 1Gbps. This should be avoided, but if it does happen, MSTP would prevent an even worse scenario. Now, if you were to define the VLANs for cluster network traffic on those (orange) uplink LAGs you could use RSTP with a high cost, but in the event that RSTP blocks the purple LAG you’d be sending all heartbeat, CSV and Live Migration traffic over those main switches. That could saturate them. It’s your choice.

clip_image004

In the picture below MSTP saves the day, providing loop-free network connectivity even if spanning tree for some reason needs to block the LAG between the two 10Gbps switches. MSTP would save your cluster network traffic connectivity as those VLANs are not defined on the orange uplink LAGs, and MSTP prevents loops by blocking VLAN IDs on LAGs, not by blocking entire LAGs.

clip_image006

To conclude, I’ll also mention a more “star like” approach to uplinking switches. This has a couple of benefits, especially when you use stackable switches to link up to. They provide the best bandwidth available for upstream connections and they provide good redundancy because you can uplink the LAG to separate switches in the stack. There is no possibility for a loop this way and you have great performance on top. What’s not to like?

clip_image008

Well we’ve shown that each network setup has optimal, preferred network traffic paths. We can enforce these by proper LAG & STP configuration. Other, less optimal, paths can become active to provide resiliency of our network. Such a situation must be addressed as soon as possible and should be considered running on “emergency backup”. You can avoid such events except for the most extreme situations by configuring the RSTP/MSTP costs for the LAG correctly and by using multiple inter switch links in every LAG. This does not only provide for extra bandwidth but also protects against cable or port failure.

Conclusion

And there you have it: over a couple of blog posts I’ve taken you on a journey through considerations about not only using 10Gbps in your Hyper-V cluster environments, but also about cluster networks as a whole. Some notes from the field, so to speak. As I told you, this was not a deployment or best practices guide. The major aim was to think out loud, share thoughts and ideas. There are many ways to get the job done and it all depends on your needs and existing environment. If you don’t have a network engineer on hand and you can’t do this yourself, you might be ready by now to get one of those business ready configurations for your Hyper-V clustering. Things can get pretty complex quite fast. And we haven’t even touched on storage design, management etc. The purpose of these blog posts was to think about how Hyper-V cluster networks function and behave and to investigate what is possible. When you’re new to all this but need to make the jump into virtualization with both feet (and you really do), a lot of help is available. Most hardware vendors have fast tracks, reference architectures that have a list of components to order to build a Hyper-V cluster, and more often than not they or a partner will come set it all up for you. This reduces both risk and time to production. I hope that if you don’t have a greenfield scenario but want to start taking advantage of 10Gbps networking, this has given you some food for thought.

I’ll try to share some real-life experiences, and what improvements we actually see, with 10Gbps speeds in a future blog post.

Introducing 10Gbps & Thoughts On Network High Availability For Hyper-V (Part 3/4)


This is the 3rd post in a series of 4. Here’s a list of all parts:

  1. Introducing 10Gbps Networking In Your Hyper-V Failover Cluster Environment (Part 1/4)
  2. Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4)
  3. Introducing 10Gbps & Thoughts On Network High Availability For Hyper-V (Part 3/4)
  4. Introducing 10Gbps & Integrating It Into Your Network Infrastructure (Part 4/4)

As you saw in my previous blog post “Introducing 10Gbps With A Dedicated CSV & Live Migration Network (Part 2/4)”, we created an isolated network for the Hyper-V cluster networking needs, i.e. heartbeat, Cluster Shared Volume and Live Migration traffic. When you set up failover clustering you’re doing so to achieve some level of high availability. We did this by using 2 switches and setting up redundant paths to them, making use of the fault tolerance the cluster networks offer us. The dark side of high availability is that it always exposes the next single point of failure, and when it comes to networking that means you’ll need redundant NICs, NIC ports, cabling and switches. That’s what we’ll discuss in this blog post. All the options below are just that: options. There is never an obligation to use them everywhere and they might not be needed, depending on the type of network and the business needs we’re talking about. But one thing I have learned is to build options into your solutions. You want ways and opportunities to work around issues while you fix them.

Redundant Switches

The first thing you’ll need to address is the loss of a switch. The better ones have redundant power supplies but that’s about it. So you’ll need to have (at least) two switches and make sure you have redundant connections to both switches. That implies both switches can talk to each other as they form one functional unit even when it is an isolated network as in our example.

One of the ways we can achieve this is by setting up a Link Aggregation Group (LAG) over Inter Switch Links (ISL). The LAG makes all the connections available between the switches for the VLANs you define. There are different types of LAG but one of the better ones is a LAG with LACP.

Stacking your switches might also be a solution if they support that. You might need stacking modules for that. Basically this turns two or more switches into one big switch. One switch in the stack acts as the master switch that maintains the entire stack and provides a single configuration and monitoring point. If a switch in the stack fails, the remaining switches will bypass the failed switch via the stacking modules. Depending on the quality of your network equipment you can have some disruption during the failure of the master switch, as another switch then needs to take on that role, and this can take anything from 3 seconds to a minute depending on vendor, type, firmware, etc. Network people like this. And as each switch contains the entire stack configuration, it’s very easy to replace a dead switch in a stack. Just rip out the dead one, plug in the replacement one and the stack will do the rest.

Note that more people have access to switches that can handle LAGs than to stackable ones. The reason for this is that the latter tend to be pricier.

Redundant Network Cards & Ports

Now, whether you’re using LAGs or stacking, the idea is that you connect your NICs to different switches for redundancy. The question is: do we need to do something with the NIC configuration or not to benefit from this? Do we have redundancy via a cluster-wide virtual switch or not? If not, can we use NIC teaming? Is NIC teaming always needed or a good idea? OK, let’s address some of these questions.

First of all, Hyper-V in the current Windows Server 2008 R2 SP1 version has no cluster-wide virtual switch that can provide redundancy for your virtual machine network(s). But please allow me to dream about Hyper-V 3.0. To achieve redundancy for the virtual machine networks you’ll need to turn to NIC teaming. NIC teaming has various possible configurations depending on vendor and the capabilities of the switches in use. You might be familiar with terminology like Switch Fault Tolerance (SFT), Adaptive Fault Tolerance (AFT), Link Aggregation Control Protocol (LACP), etc. Apart from all that, the biggest thing to remember is that NIC teaming support has to come from the hardware vendor(s). Microsoft doesn’t support it directly for Hyper-V, and Hyper-V gets access to a teamed NIC via the Windows operating system.

On NIC Teaming

I’m going to make a controversial statement. NIC teaming can be, and often is, a cause of issues, and it can be expensive in time to both set up and to fix when it fails. Apart from a lot of misconceptions and terminology confusion with all the possible configurations, we have another issue. NIC teaming introduces complexity with drivers & software that is at least a hundredfold more likely to cause failures than today’s high quality network cards. On top of that, sometimes people forget about the proper switch configuration. Ouch!

Do a search on Hyper-V and NIC teaming and you’ll see the headaches it causes so many people. Do you need to stay away from it? Is it evil? No, I’m not saying that. Far from it, NIC teaming is great. You need to decide carefully where and when to use it and in what form. Remember, when you can handle & manage the complexity needed to achieve high availability, generally speaking you’re good to go. If complexity becomes a risk in itself, you’re on the wrong track.

Where do I stand on NIC teaming? Use it when it really provides the benefits you seek. Make sure you have the proper NICs, Switches and software/drivers for what you’re planning to do. Do your research and test. I’ve done NIC Teaming that went so smooth I never would have realized the headaches it can give people. I’ve done NIC teaming where buggy software and drivers drove me crazy.

I’d like to mention security here. Some people tend to do a lot of funky, tedious configurations with VLANs in an attempt to enhance security. VLANs are not security mechanisms. They can be used in a secure implementation, but by themselves they achieve nothing. If you’re doing this via NIC teaming/VLANs, I’d like to note that once someone has access to your Hyper-V management console and/or the switches, you’re toast. Logical and physical security cannot be replaced or ignored.

NIC Teaming To Enhance Throughput

You can use NIC teaming to enhance bandwidth/throughput. If this is your major or only goal, you might not even be worried about using multiple switches. Now, NIC teaming does help to provide better bandwidth, sure, but nothing beats buying 10Gbps switches & NICs. Really, switches with LAGs or stacks and NIC teaming are great, but bigger pipes are always better for raw throughput. If you need twice or quadruple the ports only for extra bandwidth this gets expensive very fast. And if, on top of that, you need consultants because you don’t have a network engineer to set it all up just for that purpose, save your money and invest it in hardware.

NIC Teaming For Redundancy

Do you use NIC teaming for redundancy? Yes, this is a very good reason when it fits the needs. Do you do this for all networks? No, it depends. Just for heartbeat, CSV & Live migration traffic it might be overkill. The nature of these networks in a Hyper-V Cluster is such that you don’t really need it as they can mutually provide redundancy for each other. But what if a NIC port fails when I’m doing a live migration? Won’t that mean the live migration will fail? Yes. But once the NIC is out of the picture Live Migration will just work over the CSV network if you set it up that way. And you’re back in business while you fix the issue. Have I seen live migration fail? Yes, sure. But it never left the VM messed up, that kept running. So you fix the issue and Live Migrate it again.

The same goes for the other networks. CSV should not give you worries. That traffic gets queued and sent over the next network available for CSV. Heartbeat is also not an issue. You can afford the little “downtime” until it is sent over the next available network for cluster communication. Really, a properly set up cluster doesn’t go down when a cluster network fails if you have multiple of them.

But NIC teaming could/would prevent even this ever so slight interruption, you say. It can, yes, depending on how you set it up, so not always by definition. But it’s not needed. You’re preventing something benign at great cost. Have you tested it? Is it always a lossless, completely transparent failover? Not a single packet dropped? Not one ping failed? If so, well done! At what cost and for what profit did you do it? How often do your NICs and switch ports fail? Not very often. Also remember the extra complexity and the risk of (human) configuration errors. As always, trust but verify; testing is your friend.

Paranoia Is Your Friend

If you set up NIC teaming without separate NIC cards (not ports) and the PCIe slot goes bad NIC teaming won’t save you. So you need multiple network cards. On top of that, if you decide to run all networks over that team you put all your eggs in that one basket. So perhaps you might need 2 teams distributed over multiple NIC cards. Oh boy redundancy and high availability do make for expensive setups.

Combine NIC Teaming & VLANs To Work Around Limited NIC Ports

This can be a good idea. As you’ll be pushing multiple networks (VLANs) over the same pipe, you want redundancy, so NIC teaming here can definitely help out. You’ll need to consider the amount of network traffic in this case as well. If you use load balancing NIC teaming you can get some extra bandwidth, but don’t expect miracles. Think about the potential for bottlenecks and QoS, and try to separate bandwidth hogs onto separate teams. And remember, bigger pipes are always better, so consider 10Gbps when you are in a bandwidth crunch.
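With the native teaming that arrived later in Windows Server 2012 this combination becomes very straightforward; a hedged sketch with made-up names, one team carrying several tagged team interfaces:

# One LACP team, multiple tagged VLAN interfaces on top of it
New-NetLbfoTeam -Name "ConvergedTeam" -TeamMembers "NIC1","NIC2" -TeamingMode Lacp
Add-NetLbfoTeamNic -Team "ConvergedTeam" -VlanID 10   # management
Add-NetLbfoTeamNic -Team "ConvergedTeam" -VlanID 20   # backup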

Don’t Forget About The Switches

As a friendly reminder about what we already mentioned above, don’t forget to use different switches for uplinking the NIC ports. If you do forget, your switch is the single point of failure (SPOF). Welcome to high availability: always hitting the next SPOF and figuring out how big the risk is versus the cost in money and complexity to deal with it. Switches don’t often fail, but I’ve seen sysadmins pull out the wrong PDU cables. Yes, human error lurks in all corners in all possible variations. I know this would never happen to you, and certainly not twice, but other people are not so skillful. And for those who’d rather be lucky than good I have bad news: luck runs out. Inevitably bad things happen to all of our systems.

Some Closing Thoughts

One rule of thumb I have is not to use NIC teaming to save money by reducing NIC cards, NIC ports, the number of switches or switch ports. Use it when it serves your needs and procure adequate hardware to achieve your goals. You should do it because you have a real need to provide the absolute best availability, and then you put down the money to achieve it. If you talk the talk, you need to walk the walk. And while not the subject of this post, your Active Directory or other core infrastructure services are not single points of failure, are they?

If you do want to use it to save money or to work around a lack of NIC ports, there is nothing wrong with that, but say so and accept the risk. It’s a valid decision when you have your needs covered and are happy with what that solution provides.

When you take all of these options into consideration, where do you end up with NIC teaming and network solutions for Hyper-V clusters? You end up with the “business ready” or “reference architecture” configurations offered by DELL or HP. They weigh all the pros and cons against each other and make a choice based on providing the best possible solution for the largest number of customers at acceptable costs. Is this the best for you? That could very well be. It all depends. They make pretty good configurations.

I tend to use NIC teaming only for the virtual machine networks. That’s where the biggest potential service interruption exists. In certain environments where NIC teaming was not chosen, I have mitigated that risk by providing 2 or 3 single NICs for 2 or 3 virtual networks in Hyper-V. That reduced the impact to 1/3 of the virtual machines. And the fix for a broken NIC is easy: just attach the VMs to a different virtual network. You can do this while the virtual machines are running, so no shutdown is required. As an added benefit you balance the network traffic over multiple NICs.

10Gbps with NIC teaming and VLANs provides for some very nice scenarios. This is especially true if you have bandwidth-hungry applications running in a boatload of VMs. This all means that we need to start thinking and talking about integrating the 10Gbps switches into our network infrastructure. So that means we’re entering the network engineers’ turf and we’ll need to address some of their concerns. But this is not bad news, as they’ll help us prevent some bad scenarios. That will be discussed in the next blog post.

10Gbps Bragging Rights, Nothing More


Well, when you get to play with some 10Gbps network gear and experiment a little, you see some pretty nice file copy transfer times. But things are not very consistent. “It all depends”. So do experiment a lot & test things out. At a given moment you get rewarded with this:

image

That’s right. The most successful experiment used 61% of a 10Gbps pipe and the file transfer speed for a 25GB VHD file was, in one word, amazing. We used jumbo frames and we disabled the usual power saving settings (same old story). The only downside is that this kind of throughput turns the receiving server into a shell-shocked piece of hardware: during those 35 seconds it could hardly get anything else done, bar serving those poor disks. Perhaps you’ll want to throttle it down a little in a production environment.