Windows Server 2012 Hyper-V Supports IPsec Task Offloading


IPsec has been around for a while now. In an ever more security conscious & regulated world you want and/or are required to protect your network communication by
authenticating and encrypting the contents of at least some of your network traffic. Think about SOX and HIPPA and you’ll see that trade or government security requirements are not going anywhere but up for us all. This is not just restricted to military of intelligence organizations.

We’ve seen the ability to offload IPsec traffic to the NIC for a while now. This is great as the IPsec processing is a very CPU intensive workload. Unfortunately it didn’t work for virtual machines . Until now IPsec offloads was only available to host/parent workloads in using Windows Server 2008 R2. The virtualization of high volume network traffic workloads that require encryption means a serious hit on the resources on the host. If you’re willing to pay you might get by by throwing extra host & CPU power at the issue. But what if the load means a single virtual machine with 4 vCPUs can’t hack it? Game over. Sure Windows Server 2012 Hyper-V allows for 32 vCPUs now,  but that is very costly, so this is not a very cost effective solution. So in some cases this lead to those workloads being marked as “unsuited for virtualization”.

But with Windows Server 2012 Hyper-V we get a very welcome improvement, that is the fact that a virtual machine can now also offload the IPsec processing to the physical NIC on the host. That frees up a lot of CPU cycles to perform more application-level work, resulting in better virtualization densities, which means less costs etc.

Let’s take a look where you can set this in the Hyper-V GUI where you’ll find it under the network adaptor /Hardware Acceleration.

image

IPsec offload is also managed by the Hyper-V switch, this controls whether the offloading will be active or not. This is to prevent that the IPsec offload stopping the services if insufficient resources are available. Please do note that IPsec when required in the guest will be done anyway creating an extra CPU burden. So this does not disable IPsec, just the offloading of it. On top of this and in the gravest extreme you can guarantee that IPsec servers can get the resources they need by sacrificing less important guest if needed. by using virtual machine prioritization. The fact that you can configure the number of security associations helps balancing the needs of multiple virtual machines requiring IPsec offload.

To conclude, this wouldn’t be Windows Server 2012 if you couldn’t do all this with PowerShell. Take a look at  Set-VMNetworkAdapter and notice the following parameter:

-IPsecOffloadMaximumSecurityAssociation<UInt32>

This specifies the maximum number of security associations that can be offloaded to the physical network adapter that is bound to the virtual switch and that supports IPSec Task Offload. The thing to notice here is that specify a zero value is used to disable the IPsec Offload feature.

image

Windows 8 Hyper-V Improved Integration Services Setup


In Windows 8 Beta there is a nice and functional improvement in Hyper-V Manager when you want to install or upgrade the Integration Services. It shows you what version (if any) is installed and if an upgrade is needed or not. Until now it just “mentioned” that “a previous” (no version, could be the latest one) were installed and happily let you reinstall them needed or not. Begs the questions how does this all deal with “corrupted” integration services if such a thing exists. I, personally, have never seen it. Uninstall/reinstall I guess when you come across it as I don’t know of a forced/repair install option.

Walkthrough of The Improved Integration Services Setup

In the Virtual Machine console navigate to Action and select “Insert Integration Services Setup Disk”

image

In the Virtual Machine console you’ll see that inserting the integration services disk succeeded.

image

Like before, if the setup process doesn’t start automatically just navigate to the DVD and kick start it yourself.

10

 

As you can see below it now shows what version (if any) of the integration services is already installed and asks you if you want to update. In the example below you can see it has the Windows 2008 R2 SP1 version of the integration services. This is as expected as this machine (a W2K3R2SP2 guest) was imported from a Hyper-V cluster running that Windows 2008 R2 SP1.

Integration Comopnents

 

You click OK and the installation process for the integration services will start.

02

03

 

When the installation is done you’ll be notified that the virtual machines needs to restart.

image

 

The server will reboot and if you then try to install the integration services again it will notify you that it has already the correct version of the integration tools running.

09

 

Remarks

If you hit an error in the Beta of Windows 8 Hyper-V I advise two things I have experienced myself in the labs.

  1. Make sure you have enough disk space. I had one test server that had only a few MB left on the C partition and that bit me Smile
  2. Make sure you do it after a clean reboot. Just to make sure you have no pending hardware detection/installs lingering around. I experienced this one on a Windows 2003 R2 SP2 guest. Error code 1618, yup that means Another installation is already in progress.

04

Windows 8 Hyper-V Cluster Beta Teaser


What does an MVP do after a day of traveling back home from the MVP Summit 2012 in Redmond? He goes to bed and gets up early next morning to upgrade his Windows 2008 R2 SP1 Hyper-V Cluster to Windows 8. That means when I boot the lab nodes these days I get greeted by the “beta fish” we knew from Windows 2008 R /Windows 7 but it’s “metro-ized”

image

Here is a teaser screenshot from concurrent Live Migrations in action on a new Windows Server 8 Beta Hyper-V cluster in the lab. As you can see this 2 node cluster is handling 2 concurrent Live Migrations at the time. The other guests are queued. The number of Live Migrations you can do concurrently is dictated by how much bandwidth you want to pay for. In the lab that isn’t very much as you can see Winking smile.

ConcurrentLiveMigrations

In Hyper-V 3.0 you can choose the networks to use for Live Migrations with a preference order. Just like it was in W2K8R2. So if you want more bandwidth you’ll have to team some NIC ports together or put more NICs in and you should be fine. It does not use multichannel. You have to keep in mind that each live migration only utilizes a single network connection, even if multiple interfaces are provided or network teaming is enabled.  If there are multiple simultaneous live migrations, different migrations will be put on different network connections.

If the Live Migration network should become unavailable the CSV network in this example will take over. The CSV & the Live Migration network serve as each others redundant backup network.

LiveMigNetworks

There is more to come but I have only 24 hours in a day and they are packed. Catch you later!

Windows 8 introduces SR-IOV to Hyper-V


We dive a bit deeper into SR-IOV today. I’m not a hardware of software network engineer but this is my perspective on what it is and why it’s valuable addition to the toolbox of Hyper-V in Windows 8.

What is SR-IOV?

SR-IOV stands for Single Root I/O Virtualization. The “Single Root” part means that the PCIe device can only be shared with one system. The Multi Root I/O Virtualization (MR-IOV) is a specification where it can be shared by multiple systems. This is beyond the scope of this blog but you can imagine this being used in future high density blade server topologies and such to share connectivity among systems.

What does SR-IOV do?

Basically SR-IOV allows a single PCIe device to emulate multiple instances of that physical PCIe device on the PCI bus. So it’s a sort of PCIe virtualization. SR-IOV achieves this by using NICs that support this (hardware dependent) by use physical functions (PFs) and virtual functions (VFs). The physical device (think of this a port on a NIC)  is known as a Physical Function (PF) . The virtualized instances of that physical device (that port on our NIC that gets emulated x times) are the Virtual Functions (VF). A PF acts like a full blown PCIe device and is configurable, it acts and functions like a physical device. There is only one PF per port on a physical NIC. VF are only capable of data transfers in and out of devices and can’t be configured or act like real PCIe devices. However you can have many of them tied to one PF but they share the configuration of the PF.

It’s up to the hypervisor (software dependency)  to  assign one or more of these VFs to a virtual Machine (VM) directly. The guest can then use the VF NIC ports via VF driver (so there need to be VF drivers in the integration components) and traffic is send directly (via DMA) in and out of the guest to the physical NIC bypassing the virtual switch of the hyper visor completely. This reduces overhead on CPU load and increases performance of the host and as such also helps with network I/O to and from the guests, it’s as if the virtual machine uses the physical NIC in the host directly. The hyper visor needs to support SR-IOV because it needs to know what PFs and VFs are en how they work.

image

So SR-IOV depends on both hardware (NIC) and software (hypervisor) that supports it. It’s not just the NIC by the way, SR-IOV also needs a modern BIOS with virtualization support. Now most decent to high end server CPUs today support it, so that’s not an issue. Likewise for the NIC.  A modern quality NIC targeted at the virtualization market supports this.  And of cause SR-IOV also needs to be supported by the hypervisor. Until Windows 8, Hyper-V did not support SR-IOV but now it does.

I’ve read in an HP document that you can have 1 to 6 PFs per device (NIC port) and up to 256 “virtual devices” or VF per NIC today. But in reality that might not viable due to the overhead in hardware resources associated with this. So 64 or 32 VFs might be about the maximum but still, 64*2=128 virtual devices from a dual port 10Gbps NIC is already pretty impressive to me. I don’t know what they are for Hyper-V 3.0 but there will be limits to the number of SR-IOV NIC is a server and the number of VFs per core and host but I think they won’t matter to much for most of us in reality. And as technology advances we’ll only see these limits go up as the SR-IOV standard itself allows for more VFs.

So where does SR-IOV fit in when compared to VMQ?

Well it does away with some overhead that still remains with VMQ. VMQ took away the overload of a single core in the host have to be involved in handle all the incoming traffic. But still the hypervisor still has to touch every packet coming in and out. With SR-IOV that issue is addressed as it allows moving data in and out of a virtual machine to the physical NIC via Direct memory Access (DMA). So with this the CPU bottle neck is removed entirely from the process of moving data in and out of virtual machines. The virtual switch never touches it. To see a nice explanation of SR-IOV take a look at the Intel SR-IOV Explanation video on YouTube.

Intel SR-IOV Explanation

VMQ Coalescing tried to address some of the pain of the next bottle neck of using VMQ, which is the large number of interrupts needed to handle traffic if you have a lot of queues. But as we discussed already this functionality is highly under documented and it’s a bit of black art. Especially when NIC teaming and some NIC advanced software issues come in to play. Dynamic VMQ is supposed to take care of that black art and make it more reliable and easier.

Now in contrast to VMQ & RSS that don’t mix together in a Hyper-V environment you can combine SR-IOV with RSS, they work together.

Benefits Versus The Competition

One of the benefits That Hyper-V 3.0 in Windows 8 has over the competition is that you can live migrate to an node that’s not using SR-IOV. That’s quite impressive.

Potential Drawback Of Using SR-IOV

A draw back is that by bypassing the Extensible Virtual Switch you might lose some features and extensions. Whether this is  very important to you depends on your environment and needs. It would take me to far for this blog post but CISCO seems to have enough aces up it’s sleeve to have an integrated management & configuration interface to manage both the networking done in the extensible virtual switch as the SR-IOV NICs. You can read more on this over here Cisco Virtual Networking: Extend Advanced Networking for Microsoft Hyper-V Environments. Basically they:

  1. Extend enterprise-class networking functions to the hypervisor layer with Cisco Nexus 1000V Series Switches.
  2. Extend physical network to the virtual machine with Cisco UCS VM-FEX.

Interesting times are indeed ahead. Only time will tell what many vendors have to offer in those areas & for what type customer profiles (needs/budgets).

A Possible Usage Scenario

You can send data traffic over SR-IOV if that suits your needs. But perhaps you’ll want to keep that data traffic flowing over the extensible Hyper-V virtual switch. But if you’re using iSCSI to the guest why not send that over the SR-IOV virtual function to reduce the load to the host? There is still a lot to learn and investigate on this subject As a little side note. How are the HBAs in Hyper-V 3.0 made available to the virtual machines? SR-IOV, but the PCIe device here is a Fibre HBA not a NIC. I don’t know any details but I think it’s similar.

Know What Receive Side Scaling (RSS) Is For Better Decisions With Windows 8


Introduction

As I mentioned in an introduction post Thinking About Windows 8 Server & Hyper-V 3.0 Network Performance there will be a lot of options and design decisions to be made in the networking area, especially with Hyper-V 3.0. When we’ll be discussing DVMQ (see DMVQ In Windows 8 Hyper-V), SR-IOV in Windows 8 (or VMQ/VMDq in Windows 2008 R2) and other network features with their benefits, drawbacks and requirements it helps to know what Receive Side Scaling (RSS) is. Chances are you know it better than the other mentioned optimizations. After all it’s been around longer than VMQ or SR-IOV and it’s beneficial to other workloads than virtualization. So even if you’re a “hardware only for my servers” die hard kind of person you can already be familiar with it. Perhaps you even "dislike” it because when the Scalable Networking Pack came out for Windows  2003 it wasn’t such a trouble free & happy experience. This was due to incompatibilities with a lot of the NIC drivers and it wasn’t fixed very fast. This means the internet is loaded with posts on how to disable RSS & the offload settings on which it depends. This was done to get stability or performance back for application servers like Exchange and others applications or services.

The Case for RSS

But since Windows 2008 these days are over. RSS is a great technology that gets you a lot better usage of out of your network bandwidth and your server. Not using RSS means that you’ll buy extra servers to handle the same workload. That wastes both energy and money. So how does RSS achieve this? Well without RSS all the interrupt from a NIC go to the same CPU/Core in multicore processors (Core 0).  In Task Manager that looks not unlike the picture below:

image

Now for a while the increase in CPU power kept the negative effects at bay for a lot of us in the 1Gbps era. But now, with 10Gbps becoming more common every day, that’s no longer the case. That core will become the bottle neck as that poor logical CPU will be running at 100%, processing as much network interrupts in can handle, while the other logical CPU only have to deal with the other workloads. You might never see more than 3.5Gbps of bandwidth being used if you don’t use RSS. The CPU core just can’t keep up. When you use RSS the processing of those interrupts is distributed across al cores.

With Windows 2008 and Windows 2008 R2 and Windows 8 RSS is enabled by default in the operating system. Your NIC needs to support it and in that case you’ll be able to disable or enable it. Often you’ll get some advanced features (illustrated below) with the better NICs on the market. You’ll be able to set the base processor, the number of processors to use, the number of queues etc. That way you can divide the cores up amongst multiple NICs and/or tie NICs to specific cores.

image

image

So you can get fancy if needed and tweak the settings if needed for multi NIC systems. You can experiment with the best setting for your needs, follow the vendors defaults (Intel for example has different workload profiles for their NICs) or read up on what particular applications require for best performance.

Information On How To Make It Work

For more information on tweaking RSS you take a look at the following document http://msdn.microsoft.com/en-us/windows/hardware/gg463392. It holds a lot more information than just RSS in various scenarios so it’s a useful document for more than just this.

Another good guide is the "Networking Deployment Guide: Deploying High-Speed Networking Features". Those docs are about Windows 2008 R2 but they do have good information on RSS.

If you notice that RSS is correctly configured but it doesn’t seem to work for you it’ might be time to check up on the other adaptor offloads like TCP Checksum Offload, Large Send Offload etc. These also get turned of a lot when trouble shooting performance or reliability issues but RSS depends on them to work. If turned off, this could be the reason RSS is not working for you..

Thinking About Windows 8 Server & Hyper-V 3.0 Network Performance


Introduction

The main purpose of this post is, as mentioned in the title, to think. This is not a design or a technical reference. When it comes to designing a virtualization solution for your private cloud their are a lot of components to consider. Storage, networking, CPU, memory all come in to play and there is no one size fits all. It all depends on your needs, budget in combination with how good your  insight into your future plans & requirements are. This is not and easy task. Virtualizing 40 webservers is very different from virtualizing SQL Server, Exchange or SharePoint.  Server virtualization is different from VDI and VDI itself comes in many different flavors.

  • So what workloads are you hosting? Is it a homogeneous or a heterogeneous environment?
  • What kind of applications are you supporting? Client-Server, SOA/Web Services, Cloud apps?
  • What storage performance & features do you need to support all that?
  • What kind of network traffic does this require?
  • What does your business demand? Do you know, do they even know? Can you even know in a private cloud environment?
  • Do you have one customers or many (multi tenancy) and how are they alike or different in both IT needs and business requirements.

The needs of true public cloud builders are different from those running their own private clouds in their own data centers or in a mix of those with infrastructure at a hosting provider. On top of that an SMB environment is different from large enterprises and companies of the same size will differ in their requirements enormously due to the nature of their business.

I’ve written about virtualization and CPU considerations before (NUMA, Power Save settings for both OS & 10Gbps network performance) before. I’ve also discussed a number of posts about 10Gbps networking and different approaches on how to introduce it with out breaking the bank. In 2012 I intend to blog some more on networking and storage options with Windows 8 and Hyper-V 3.0. But I still need to get my hands on the betas and release candidates of Windows 8 to do so. You’ll notice I don’t talk about Infiniband. Well I just don’t circulate in the ecosystems where absolute top notch performance is so important that they can justify and get that kind of budget to throw at those needs.

To set the scene for these blog posts I’ll introduce some considerations around networking options with Hyper-V. There are many features and options both in hardware, technologies, protocols, file systems. Even when everything is intended to make live simpler people might get lost in all the options and choices available.

Windows Server 8 NIC features – The Alphabet Soup

  • Data Center Bridging (DCB)
  • Receive Segment Coalescing (RSC)
  • Receive Side Scaling (RSS)
  • Remote Direct Memory Access (RDMA)
  • Single Root I/O Virtualization (SR-IOV)
  • Virtual Machine Queue (VMQ)
  • IPsec offload (IPsecTO)

A lot of this stuff has to do with converged networks. These offer a lot of flexibility and the potential for cost savings along the way. But convergence & cost savings are not a goal. They are means to an end. Perhaps you can have better, cheaper and more effective solutions leveraging you existing network infrastructure by adding some 10Gbps switches & NICs where they provide the best bang for the buck. Chances are you don’t need to throw it all out and do a fork lift replacement. Use what you need from the options and features. Be smart about it. Remember my post on A Fool With A Tool Is Still A Fool, don’t be that guy!

Now let’s focus on couple of the features here that have to do with network I/O performance and not as much convergence or QOS. As an example of this I like to use Live migration of virtual machines with 10Gbps. Right now with one 10Gbps NIC I can use 75% of the bandwidth of a dedicated NIC for live migration. When running 20 or more virtual machines per host with 4Gbps to 8Gbps of memory and with Windows 8 giving me multiple concurrent Live Migrations I can really use that bandwidth. Why would I want to cut it up to 2 or 3Gbps in that case. Remember the goal. All the features and concepts are just tools, means to and end. Think about what you need.

But wait, in Windows 8 we have some new tricks up our sleeve. Let’s team two 10Gbps NICs put all traffic over that team and than divide the bandwidth up and use QOS to assure Live Migration gets 10Gbps when needed but without taking it away from others network I/O when it’s not needed. That’s nice! Sounds rather cool doesn’t it and I certainly see a use for it. It might not be right if you can’t afford to loose that bandwidth when Live Migration kicks in but if you can … more power & cost savings to you. But there are other reasons not to put everything on one NIC or team.

RSS, VMQ, SR-IOV

One thing all these have in common is that they are used to reduce the CPU load / bottleneck on the host and allow to optimize the network I/O and bandwidth usage of your expensive 10Gbps NICs. Both avoiding having a CPU bottleneck and optimizing the use of the available bandwidth mean you get more out of your servers. That translates in avoiding buying more of them to get the same workload done.

RSS is targeted at the host network traffic. VMQ and SR-IOV are targeted at the virtual machine network traffic but in the end the both result in the same benefits as stated above. RSS & VMQ integrate well with other advanced windows features. VMQ for examples can be used with the extensible Hyper-V switch while RSS can be combined with QOS & DCB in storage & cluster host networking scenarios. So these give you a lot of options and flexibility. SR-IOV or RDMA is more focused on raw performance and doesn’t integrate so well with the more advanced features for flexibility & scalability. I’ll talk some more on this in future blog posts.

Now with all these features that have there own requirements and compatibilities you might want to reconsider putting all traffic over one pair of teamed NICs. You can’t optimize them all in such a scenario and that might hurt you. Perhaps you’ll be fine, perhaps you won’t.

image

So what to use where and when depends on how many NICs you’ll use in your servers and for what purpose. For example even in a private cloud for lightweight virtual machines running web services you might want to separate the host management & cluster traffic from the virtual machine network traffic. You see RSS & VMQ are mutually exclusive. That way you can use RSS for the host/cluster traffic and DVMQ for the virtual machine network. Now if you need redundancy you might see that you’ll already use 2*2NIC with Windows 8 NLB in combination with two switches to avoid a single point of failure. Do your really need that bandwidth for the guest servers? Perhaps not but, you might find that it helps improve density because of better better host & NIC performance helping you avoiding the cost of buying extra servers. If you virtualize SQL servers you’d be even more interested in all this. The picture below is just an illustration, just to get you to think, it’s not a design.

image

I’m sure a lot of matrices will be produced showing what features are compatible under what conditions, perhaps even with some decision charts to help you decide what to use where and when.

Data Protection & Disaster Recovery in Windows 8 Server Hyper-V 3.0


The news coming in from the Build Windows conference is awesome. The speculation of the last months is being validated by what is being told and on top of that more goodness is thrown at us Hyper-V techies.

On the data protection and disaster recovery front some great new weapons are at our disposal. Let’s take a look at some of them.

Live Migration & Storage Live Migration.

Among the goodies are the improvements in Live Migration and the introduction of Storage Live Migration.  Hyper-V 3.0 supports multiple concurrent Live Migrations now, which combined with adequate bandwidth will provide for fast evacuation of problematic hosts. Storage Live Migration means you can move a VM (configuration, VHD & snapshots) to different storage while the guest remains on line so the users are not hindered by this. I’m trying to find out if they will support multiple networks / NICs  with this.

Now to make this shine even more MSFT has another ace up it’s sleeve. You can do Live Migration and Storage Live Migration without the requirement of shared storage on the backend. This combination is a big one. This is means “shared nothing” high availability. Even now when prices for entry level shard storage has plummeted we see SMB being weary of SAN technology. It’s foreign to them and the fact they haven’t yet gained any confidence with the technology makes them hesitant. Also the real or perceived complexity might hold ‘m back. For that segment of the market it is now possible to have high availability anyway with the combo Live Migration / Storage Migration.  Add to this that Hyper-V now supports running virtual machines on a file share and you can see the possibilities of NAS appliances in this space of the market for achieving some very nice solutions.

Replication to complete the picture

To top this of you have replication built in, meaning we have the possibility to provide reasonably fast disaster recovery. It might not be real time data center fail over but a lot of clients don’t need that. However, they do need easy recoverability and here it is. To give you even more options, especially  if you only have one location, you can replicate to the cloud.

So now I start dreaming Smile We have shared nothing Live & Storage Live  Migration, we have replication. What could achieve with this? Do synchronous replication locally over a 10Gbps for example and use that to build something like continuous availability. There we go, we already have requirements for “Windows 8 Server R2”!

NIC Teaming in the OS

No more worries about third party NIC teaming woes. It has arrived in the OS (finally!) and it will support load balancing & failover. I welcome this, again it makes this a lot more feasible for the SMB shops.

IP Virtualization / Address Mobility

Another thing that will aid with any kind of of site  disaster recovery / high availability is IP address Mobility. You have an IP for the hosting of the VM and one for internal use by VM. That means you can migrate to other environments (cloud, remote site) with other addresses as the VM can change the hosted IP address, while the internal IP address remains the same.  Just imagine the flexibility this gives us during maintenance, recovery, trouble shooting network infrastructure issues and all this without impacting the users who depend on the VM to get their job done.

Conclusion

Everything we described is out of the box with Windows 8 Server Hyper-V. To a lot of business this can  mean a  huge improvement in their current  availability and disaster recovery situation. More than ever there is now no more reason for any company to go down or even out of business due to catastrophic data loss as all this technology is available on site, in hybrid scenarios and in the cloud with the providers.