Design Considerations For Converged Networking On A Budget With Switch Independent Teaming In Windows Server 2012 Hyper-V


Last Friday I was working on some Windows Server 2012 Hyper-V networking designs and investigating the benefits & drawbacks of each. Some fellow MVPs were also working on designs in that area and some interesting questions & answers came up (thank you Hans Vredevoort for starting the discussion!).

You might have read that for low cost, high value 10Gbps network solutions I find the switch independent scenarios very interesting, as they keep complexity and costs low while optimizing value & flexibility in many scenarios. Talk about great ROI!

So now let’s apply this scenario to one of my (current) favorite converged networking designs for Windows Server 2012 Hyper-V: two dual NIC LBFO teams. One is used for virtual machine traffic and one for other network traffic such as Cluster/CSV/Management/Backup traffic. You could even add storage traffic to that, but in this particular design storage connectivity is provided by Fibre Channel HBAs. Also note that with teaming we forgo RDMA/SR-IOV.

For the VM traffic the decision is rather easy. We go for Switch Independent with Hyper-V Port mode. Read Windows Server 2012 NIC Teaming (LBFO) Deployment and Management to learn why. The exceptions mentioned there do not come into play here and we get great virtual machine density this way. With lower density, 2-4 teamed 1Gbps ports will also do.
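As a rough sketch of what such a VM traffic team could look like in PowerShell: the adapter names "NIC1" and "NIC2" and the team and switch names are placeholders I made up for illustration, so substitute your own.

```powershell
# Assumption: "NIC1" and "NIC2" are placeholder adapter names; list yours with Get-NetAdapter.
# Create a switch independent team that uses the Hyper-V Port load balancing algorithm.
New-NetLbfoTeam -Name "VM-Team" -TeamMembers "NIC1","NIC2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

# Bind a Hyper-V virtual switch for the virtual machines to the team interface.
# By default the team interface gets the same name as the team.
New-VMSwitch -Name "VM-Switch" -NetAdapterName "VM-Team" -AllowManagementOS $false
```

Nothing needs to be configured on the physical switches for this mode beyond the usual VLAN settings on the ports.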

But what about the team we use for the other network traffic? Do we use Address Hash or Hyper-V Port mode? Or better put, do we use native teaming with tNICs as shown below, where we can use DCB or Windows QoS?

image

Well, one drawback of Address Hash here is that only one team member will be used for incoming traffic in a switch independent setup. QoS with DCB and policies isn’t that easy for a system admin and the hardware is more expensive.

So could we use a virtual switch here as well with QoS defined on the Hyper-V switch?

image

Well as it turns out in this scenario we might be better off using a Hyper-V Switch with Hyper-V Port mode on this Switch independent team as well. This reaps some real nice benefits compared to using a native NIC team with address hash mode:

  • You get a nice load distribution, as each vNIC’s send/receive traffic is handled by a single member of the NIC team. This way we don’t get into a scenario where we only use one NIC of the team for incoming traffic. The result is a better balance between incoming and outgoing traffic, as long as none of those vNICs exceeds the capability of one team member.
  • Easy to define QoS via the Hyper-V Switch even when you don’t have network gear that supports QoS via DCB etc.
  • Simplicity of switch configuration (complexity can be an enemy of high availability & your budget).
  • Compared to a single team of dual 10Gbps ports you can get a much higher VM density, even if the VMs have rather intensive network traffic, and the non VM traffic gets lots of bandwidth as well.
  • Works with the cheaper line of 10Gbps switches
  • Great TCO & ROI
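To make the second team a bit more concrete, here is a minimal sketch of a Hyper-V switch on a switch independent team with QoS weights defined on the management OS vNICs. The adapter names, the team and switch names, the vNIC names and every weight are assumptions for illustration only, not recommended values.

```powershell
# Assumption: "NIC3"/"NIC4", the team/switch names and all weights are illustrative placeholders.
New-NetLbfoTeam -Name "Infra-Team" -TeamMembers "NIC3","NIC4" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

# The minimum bandwidth mode can only be chosen when the switch is created.
New-VMSwitch -Name "Infra-Switch" -NetAdapterName "Infra-Team" `
    -MinimumBandwidthMode Weight -AllowManagementOS $false

# One management OS vNIC per traffic type, each with a relative minimum bandwidth weight.
$weights = [ordered]@{ Management = 10; Cluster = 20; LiveMigration = 40; Backup = 20 }
foreach ($name in $weights.Keys) {
    Add-VMNetworkAdapter -ManagementOS -Name $name -SwitchName "Infra-Switch"
    Set-VMNetworkAdapter -ManagementOS -Name $name -MinimumBandwidthWeight $weights[$name]
}

# Give the remaining weight to traffic that has no explicit weight assigned.
Set-VMSwitch -Name "Infra-Switch" -DefaultFlowMinimumBandwidthWeight 10
```

The weights are relative shares that only kick in under contention, so keeping the total at 100 makes them easy to read as percentages.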

With a dual 10Gbps team you’re ready to roll. It’s all software defined, which makes the switches just easy to use providers of connectivity. For smaller environments this is all that’s needed. More complex configurations might be needed higher up the stack in larger networks, but for the Hyper-V / cloud admin things can stay very easy and under their control. The network guys need only deal with their realm of responsibility and don’t have to deal with the demands of virtualization administration directly.

I’m not saying DCB, LACP or Switch Dependent is bad, far from it. But the cost and complexity scare some people while they might not even need it. With the concept above they could benefit tremendously from moving to 10Gbps in a really cheap and easy fashion. That’s hard (and silly) to ignore. Don’t over engineer it, don’t IBM it and don’t go for a server rack PhD in complex configurations. Don’t think you need to use DCB, SR-IOV, etc. in every environment just because you can or because you want to look awesome. Unless you have a real need for the benefits those offer, you can get simplicity, performance, redundancy and QoS in a very cost effective way. What’s not to like? If you worry about LACP etc., consider this: switch independent mode allows for firmware upgrades with nearly no service down time, compared to stacking. It’s been working very well for us and avoids the expense & complexity of vPC, VLT and the like. Life is good.

Windows Hyper-V Server 2012 Live Migration DOES support pass-through disks–KB2834898 is Wrong


See the inline update below (April 11th 2013)

I recently saw KB2834898 (pulled) appear and it’s an important one. This fast publish statement is important as until recently it was accepted that Live Migration with pass through disks was supported with Windows Server 2012 Hyper-V Live Migration (just like with Windows Server 2008 R2 Hyper-V) as long as the live migration is managed by the Hyper-V cluster, i.e. the pass through disk is a clustered resource => see http://social.technet.microsoft.com/wiki/contents/articles/440.hyper-v-how-to-add-a-pass-through-disk-on-a-failover-cluster.aspx

UPDATE April 11th 2013: After consulting some very knowledgeable people at Microsoft (like Jeff Woolsey and Ben Armstrong) it turns out this KB article is not factually correct and leaves much to be desired. It’s wrong, as pass-through disks are still supported with Live Migration in Windows Server 2012 Hyper-V when managed by the cluster, just like before in Windows Server 2008 R2. The KB article has been pulled in the meantime.

Mind you that Shared Nothing Live Migration with pass through disks has never been supported, as there is no way to move the pass through disk between hosts. Storage Live Migration is not really relevant in this scenario either, as there are no VHDX files to copy apart from the OS VHDX. Live migrations between stand alone hosts are equally irrelevant. Hence it’s a Hyper-V Cluster game only for pass through disks.

I have never been a fan of pass through disks and we have never used them in production. Not in the Windows Server 2008 R2 era let alone in the Windows Server 2012 time frame. No really we never used them, not even in our SQL Server virtualization efforts as we just don’t like the loss of flexibility of VHDX files and due to the fact that they tend to complicate things (i.e. things fail like live migration).

I advise people to strongly reconsider if they think they need them and only to use them if they are really sure they actually do have a valid use case. I know some people had various reasons to use them in the past but I have always found them to be a bit of over engineering. One of the better reasons might have been that you needed disks larger than 2TB, but then I would advise iSCSI and now, with Windows Server 2012, also virtual Fibre Channel (vFC). Neither is needed for size anymore, as VHDX now supports up to 64TB. Both these options support Live Migration and are useful for in guest clustering, but not as much for size or performance issues in Windows Server 2012 Hyper-V. On the performance side of things we might have eaten a small IO hit before in exchange for the nice benefits of using VHDs. But even a MSFT health check of our virtualized SQL Server environment didn’t show any performance issues. Sure, your needs may be different from ours, but the performance argument with Windows Server 2012 and VHDX can be laid to rest. I refer you to my blog Hyper-V Guest Storage Performance: Above & Beyond 1 Million IOPS for more information on VHDX performance improvements and to Windows Server 2012 with Hyper-V & The New VHDX Format Leads The Way for VHDX capabilities in general (size, unmap, …).

I see only one valid reason why you might have to use them today: you have > 2TB disks in the VM and your backup vendor doesn’t support the VHDX format. That’s still a reality today, unfortunately. But that can be fixed by changing to another one.
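If you want to know whether you have any pass through disks to reconsider, something like the following sketch can list them on a host. It relies on my assumption that pass through disks report a DiskNumber while VHD/VHDX backed disks do not, so verify the output against what you know is in your environment.

```powershell
# Sketch: list VM disks that are pass-through (attached physical disks) on this host.
# Assumption: pass-through disks report a DiskNumber, while VHD/VHDX-backed disks do not.
Get-VM | Get-VMHardDiskDrive |
    Where-Object { $_.DiskNumber -ne $null } |
    Select-Object VMName, ControllerType, ControllerNumber, ControllerLocation, DiskNumber
```

Run it on every cluster node, as Get-VM without parameters only returns the virtual machines on the local host.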

Belgian TechDays 2013 Sessions Are On Line


Just a short heads up to let you all know that the sessions of TechDays 2013 in Belgium are available on the TechNet site. The slide decks can be found on http://www.slideshare.net/technetbelux

In case you want to see my two sessions you can follow these links:

There are plenty more good sessions, so I encourage you to browse and have a look. Kurt Roggen’s session on PowerShell is a great one to start with.

Windows Server 2012 NIC Teaming Mode “Independent” Offers Great Value


There, I said it. In switching, just like in real life, being independent often beats the alternatives. In switching the alternative would be stacking. Windows Server 2012 NIC teaming in switch independent, active-active mode makes this possible. And if you do want or need link aggregation (i.e. more bandwidth) you might go the extra mile and opt for vPC (Virtual Port Channel a la Cisco) or VLT (Virtual Link Trunking a la Force10 – DELL) instead of stacking.

What, have you gone nuts? Nope. Windows Server 2012 NIC teaming gives us great redundancy with even cheaper 10Gbps switches.

What I hate about stacking is that during a firmware upgrade the whole stack goes down, so there is no redundancy at that time. Also, on the cheaper switches it often costs a lot of 10Gbps ports (no dedicated stacking ports). The only way to work around this is by designing your infrastructure so you can evacuate the nodes in that rack, so that upgrading the stack doesn’t affect the services. That’s nice if you can do this but also rather labor intensive. If you can’t evacuate a rack (which has effectively become your “unit of upgrade”) and you can’t afford the vPC or VLT kind of redundant switch configuration, you might be better off running your 10Gbps switches independently and leveraging Windows Server 2012 NIC teaming in switch independent, active-active mode. The only reason not to do so would be the need for bandwidth aggregation in all possible scenarios, which only LACP/Static Teaming can provide, but in that case I really prefer vPC or VLT.

Independent 10Gbps Switches

Benefits:

  • Cheaper 10Gbps switches
  • No potential loss of 10Gbps ports for stacking
  • Switch redundancy in all scenarios if cluster networking is set up correctly
  • Switch configuration is very simple

Drawbacks:

  • You won’t get > 10 Gbps aggregated bandwidth in any possible NIC teaming scenario

Stacked 10Gbps Switches

Benefits:

  • Stacking is available with cheaper 10Gbps switches (often at the cost of 10Gbps ports)
  • Switch redundancy (but not during firmware upgrades)
  • Get 20Gbps aggregated bandwidth in any scenario

Drawbacks:

  • Potential loss of 10Gbps ports
  • Firmware upgrades bring down the stack
  • Potentially more “complex” switch configuration

vPC or VLT 10Gbps Switches

Benefits:

  • 100% Switch redundancy
  • Get > 10Gbps aggregated bandwidth in any possible NIC team scenario

Drawbacks:

  • More expensive switches
  • More “complex” switch configuration

So all in all, if you come to the conclusion that 10Gbps is a big pipe that will serve your needs and aggregation of those via teaming is not needed, you might be better off with cheaper 10Gbps switches and leveraging Windows Server 2012 NIC teaming in switch independent, active-active mode. You optimize 10Gbps port count as well. It’s cheap, it reduces complexity and it doesn’t stop you from leveraging Multichannel/RDMA.
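With independent switches it pays to verify that both team members really are active before and after you take one switch down for maintenance. A quick check could look like this, where "Team1" is a placeholder team name:

```powershell
# Assumption: "Team1" is a placeholder team name; list your teams with Get-NetLbfoTeam.
Get-NetLbfoTeam -Name "Team1" |
    Format-List Name, TeamingMode, LoadBalancingAlgorithm, Status

# Each member should report an OperationalStatus of Active when both switches are up.
Get-NetLbfoTeamMember -Team "Team1" |
    Format-Table Name, AdministrativeMode, OperationalStatus, FailureReason -AutoSize
```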

So right now I’m either in favor of switch independent 10Gbps networking or I go full out for a vPC (Virtual Port Channel a la Cisco) or VLT (Virtual Link Trunking a la Force10 – DELL) like setup and forgo stacking altogether. As said, if you’re willing and able to evacuate all the nodes on a stack/rack you can work around the drawback, as in the figure below, where racks with the same color belong to the same cluster. That’s not always possible and while it sounds like a great idea, I’m not convinced.

image

When the shit hits the fan … you need as little to worry about as possible. And yes, I know firmware upgrades are supposed to be easy and planned events. But then there is reality and sometimes it bites, especially when you cannot evacuate the workload until you’ve resolved a networking issue with a firmware upgrade. Choose your poison wisely.

Hyper-V Cluster Node Pause & Drain fails – Live Migrations fail with “The requested operation cannot be completed because a resource has locked status”


One night I was doing some maintenance on a Hyper-V cluster and I wanted to Pause and drain one of the nodes that was up next for some tender loving care. But I was greeted by some messages:

image

[Window Title]
Resource Status

[Main Instruction]
The requested operation cannot be completed because a resource has locked status.

[Content]
The requested operation cannot be completed because a resource has locked status.

[OK]

Strange, the cluster is up and running, none of the other nodes had issues and operations wise all VMs are happy as can be. So what’s up? Not too much in the error logs except for one entry related to a backup. Aha … We fire up diskpart and see some extra LUNs mounted, and using “vssadmin list writers” we find:

Writer name: ‘Microsoft Hyper-V VSS Writer’
…Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
…Writer Instance Id: {2fa6f9ba-b613-4740-9bf3-e01eb4320a01}
…State: [5] Waiting for completion
…Last error: Unexpected error

Bingo! Hello old “friend”, I know you! The Microsoft Hyper-V VSS Writer goes into an error state during the making of hardware snapshots of the LUNs due to almost or completely full partitions inside the virtual machines. Take a look at this blog post on what causes this and how to fix it. As a result we can’t do live migrations anymore or Pause/Drain the node on which the hardware snapshots are being taken.
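To spot a writer in trouble quickly you can trim the vssadmin output down to the lines that matter. This small sketch assumes the English output format shown above:

```powershell
# Show only the name, state and last error of each VSS writer.
# Run this from an elevated prompt on the Hyper-V host.
vssadmin list writers |
    Select-String -Pattern 'Writer name:', 'State:', 'Last error:'
```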

And yes, after fixing the disk space issue on the VM (an SDT who’d pumped the VM disks 99.999% full) the Hyper-V VSS writer gets out of the error state and the hardware provider can do its thing. After the snapshots had completed everything was fine and I could continue with my maintenance.
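Catching nearly full guest volumes before the backup runs avoids the whole episode. A minimal sketch, assuming the guests are reachable over WMI; the computer names and the 10% threshold are placeholders of mine, not values from this incident:

```powershell
# Assumption: $guests holds placeholder guest computer names reachable over WMI.
$guests = "vm01", "vm02"

# Report fixed disks (DriveType 3) with less than 10% free space; the threshold is arbitrary.
Get-WmiObject -Class Win32_LogicalDisk -Filter "DriveType = 3" -ComputerName $guests |
    Where-Object { $_.Size -gt 0 -and ($_.FreeSpace / $_.Size) -lt 0.10 } |
    Select-Object PSComputerName, DeviceID,
        @{ Name = "FreePct"; Expression = { [math]::Round(100 * $_.FreeSpace / $_.Size, 1) } }
```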

PowerShell: Monitoring DrainStatus of a Hyper-V Host & The Time Limited Value of Information In Beta & RC Era Blogs


I was writing some small PowerShell scripts to pause and resume Hyper-V cluster hosts and I wanted to monitor the progress of draining the virtual machines off the node when pausing it. I found this nice blog about Draining Nodes for Planned Maintenance with Windows Server 2012 discussing this subject and providing us with the properties to do just that.

It seems we have two common properties at our disposal: NodeDrainStatus and NodeDrainTarget.

image

So I set to work, but I just couldn’t manage to read those properties. It was like they didn’t exist. So I pinged Jeff Wouters, who happens to use PowerShell for just about anything, and asked him if it was me being stupid and missing the obvious. Well, it turned out I was missing the obvious for sure, as those properties do not exist. Jeff told me to double check using:

Get-ClusterNode MyNode -cluster MyCluster | Select-Object -Property *

Guess what, it’s not NodeDrainStatus and NodeDrainTarget but DrainStatus and DrainTarget.

image

What put me off here was the following example in the same blog post:

Get-ClusterResourceType "Virtual Machine" | Get-ClusterParameter NodeDrainMoveTypeThreshold

That should have been a dead giveaway, as we’ve been using MoveTypeThreshold a lot in recent months and there is no NodeDrain in that name either. But it just didn’t register. By the way, you don’t need to create the property either, it already exists. I guess this code was valid with some version (Beta?) but not anymore. You can just get and set the property like this:

Get-ClusterResourceType "Virtual Machine" -Cluster MyCluster | Get-ClusterParameter MoveTypeThreshold

Get-ClusterResourceType "Virtual Machine" -Cluster MyCluster | Set-ClusterParameter MoveTypeThreshold 2000

So, lessons learned: trust but verify. Don’t forget that a lot of things in IT have a time limited value. Make sure to look at the date of what you’re reading and at which pre-RTM version of the product the information applies to.

To conclude here’s the PowerShell snippet I used to monitor the draining process.


# Pause the node and drain its roles to the other cluster nodes
Suspend-ClusterNode -Name crusader -Cluster warrior -Drain

# Poll the drain status every second until it is no longer in progress
Do
{
    $status = (Get-ClusterNode -Name "crusader" -Cluster warrior).DrainStatus
    Write-Host $status -ForegroundColor Magenta
    Start-Sleep -Seconds 1
}
Until ($status -ne "InProgress")

# Report the final status in green once the drain has completed
If ($status -eq "Completed")
{
    Write-Host $status -ForegroundColor Green
}

Which outputs

image
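If you want to reuse this in maintenance scripts, the polling loop can be wrapped in a function with a timeout so a stuck drain doesn’t hang the script forever. This is only a sketch: the function name Wait-NodeDrain and the 600 second default are my own inventions, not something the cluster module provides.

```powershell
# Sketch of a reusable wrapper; the function name and the timeout are assumptions for illustration.
function Wait-NodeDrain {
    param(
        [Parameter(Mandatory = $true)][string]$Name,
        [Parameter(Mandatory = $true)][string]$Cluster,
        [int]$TimeoutSeconds = 600
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        # Read the current drain status and stop polling as soon as it is no longer in progress
        $status = (Get-ClusterNode -Name $Name -Cluster $Cluster).DrainStatus
        Write-Host $status -ForegroundColor Magenta
        if ($status -ne "InProgress") { break }
        Start-Sleep -Seconds 1
    } while ((Get-Date) -lt $deadline)
    return $status
}

Suspend-ClusterNode -Name crusader -Cluster warrior -Drain
if ((Wait-NodeDrain -Name crusader -Cluster warrior) -eq "Completed") {
    # ... do the maintenance here, then bring the node back into service
    Resume-ClusterNode -Name crusader -Cluster warrior
}
```

If the timeout expires the function returns the last status it saw, so the caller can decide whether to wait longer or investigate.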

Ben Armstrong Interviewed by Carsten Rachfahl on Windows Server 2012 Hyper-V


During the 2013 Global MVP Summit Carsten Rachfahl (@hypervserver) interviewed Ben Armstrong (@VirtualPCGuy), who is the senior program manager for Hyper-V and as such the guy who has the honorable job of herding us cats during the MVP Summit (he does an excellent job). Click the picture below to view the interview or visit Videointerview mit Ben Armstrong über Hyper-V.

image

This interview took place at the CenturyLink Field stadium in Seattle (home of the Seahawks), where we had our little summit party. It turned out to be a real gem and we have to thank Carsten for his efforts and Ben for giving the interview. Watch it all the way to the end to see that we’re all pretty convinced about the qualities of Windows Server 2012 Hyper-V!

Heading Home after the 2013 MVP Global Summit


The 2013 MVP Global Summit has come and gone already. I’m very happy to have attended and I was once again immersed in a culture of sharing knowledge and helping out our fellow MVPs and friends. Thank you Carsten!

image

We shared a lot of experiences we had running Windows Server 2012 Hyper-V in production. We met up with new MVPs and veteran attendees. To all my fellow MVPs and the people at Microsoft I’d like to say that it has been an honor and a privilege to have been able to talk shop with so many highly skilled, intelligent and engaged people. Whatever their background, they all share a level of commitment to be all they can be in their expertise. Thank you all for taking the time and putting in the effort. I hope to see you all next time!

My Impressions on Windows Server 2012 Hyper-V Cookbook


Having read Windows Server 2012 Hyper-V Cookbook I can safely say that if you need to get up to speed with Hyper-V in Windows Server 2012, this is a great book for that purpose.

Having met Leandro Carvalho, that’s not a surprise. What is pretty impressive is how he managed to get all you need to know to get going inside one book that you can still lift with one hand. Now this is not going to make you a veteran Hyper-V enterprise architect over the weekend, but it will help you get a well set up and functional Hyper-V environment running, monitored and protected. If you are already familiar with Hyper-V from previous Windows versions this book will also get you up to speed on a lot of the most important new features and improvements.

Windows Server 2012 Hyper-V Cookbook

Now a mere 305 pages are not enough to go into depth on every subject, but this book will make a fine learning tool to set up a lab and take your first Windows Server 2012 Hyper-V servers / clusters into production. It also tackles some of the stuff that is more intimidating to some people, like in place upgrades of Hyper-V clusters and disaster recovery. Details like CSV cache, Port ACLs and their significance in the new Hyper-V version are not forgotten. I like that attention to detail. Knowing the vastness of what’s new in Windows Server 2012 Hyper-V, I’m impressed at how well organized and effectively the information is presented. So if you need to get started with Hyper-V, do it here with this book. It will make for a fine foundation to build on and move on to investigate the numerous network configurations, the VHDX format, SMB 3.0 goodness etc.

Heading Towards The 2013 Global MVP Summit


Hello people, I’m making my way to Seattle at the moment to attend the 2013 Global MVP Summit. I’m really looking forward to this as I have a lot of feedback and questions on using Windows Server 2012 and Hyper-V in real life. That and the fact that we’ll get to discuss all this amongst each other and with the product teams. There are not many opportunities where you get to meet up with so many enthusiastic subject matter experts from all over the world.

Last month I checked my Electronic System for Travel Authorization (ESTA) papers and made sure my passport was valid. So after packing my bag it’s now traveling time, as I need to get myself to SEATAC. One of these below works just fine for that purpose, I know from experience. While awaiting boarding time I’ve parked myself in LHR. If it’s anything like last year this could be considered a long-haul MVP flight.

image

I’m eager to meet up with friends and acquaintances again to talk shop and have some fun. So Hyper-V, Cluster, Storage, Network PMs … my fellow MVPs and I are on our way. See you all very soon!