The Hyper-V Amigos Showcast Episode 4: TechEd North America 2014


In episode 4 the original Hyper-V amigos (also 4) get together for a chat. Yes, learn about the history of the name and about the what happened at TechEd North America 2014. How Aidan won speaker idol. How I got to be on stage.

image

Hans is a bit tired but extremely happy due to a certain soccer game outcome Smile. The orange shirt is not by accident. We discuss the keynote, the content, Azure announcements … we jump into one of our favorite topics storage and storage spaces and speculate a bit about vNext timing.

Enjoy!

Live Migration Speed Check List – Take It Easy To Speed It Up


When configuring live migrations it’s easy to go scrounge on all the features and capabilities we have in Windows Server 2012 R2.

There is no one stopping you configuring 50 simultaneous live migrations. When you have only one, two or even four 1Gbps NICs at your disposal,  you might stick to 1 or 2 VMs per available 1Gbps. But why limit yourself if you have one or multiple 10Gbps pipes or bigger ready to roll? Well let’s discuss a little what happens when you do a live migration on a Hyper-V cluster with CSV storage. Initiating a live migrations kicks of a slew of activities.

  1. First it is establish form where (aka the source host) to where we are migrating (aka the target host).
  2. Permissions are checked, are we allowed to do this?
  3. Do we have enough memory on the target to do this? If so allocate that memory.
  4. Set up a skeleton VM on the target host that is a perfect copy of the source VM’s  specifications and configure dependencies on the target host.
  5. Let’s see if we can get a network connection set up and running. If that works, we’re cool and can now transfer the memory.
  6. A bitmap is created to track the changes to the memory pages of the source VM’s pages. Each memory page is copied from the source host to the target host VM during which the memory page is marked clean.
  7. As long as the source VM is running memory is changing, which continues to be tracked in the bitmap and as such that page is mapped as dirty over there. In an iterative process this dirty memory is copied over again and so on. This continues until the remaining dirty memory is minimal. This will take longer if the VM is very memory intensive.
  8. The tiniest amount of not yet copied dirty memory is that part of a VMs state that is copied during “black out”. For this to happen the VM on the source host is paused, the remaining state is copied.
  9. A final check is done to confirm all is well and then the virtual machine is resumed on the target host.
  10. Any remains of the VM on the source host are cleaned up.

That’s actually a lot of work and as you can see copying the state is just part of the process. The more bandwidth & the lower the latency we throw at this part of the process becomes less of the total time spent during live migration.

If you can’t fill of just fill the bandwidth of your 10/40/46Gbps pipe or pipes & you operate at line speed, what’s left as overhead? Everything that’s not actual the copy of VM state. The trick is to keep the host busy so you minimize idle time of the network copies. I.e we want to fill up that bandwidth just right but  not go overboard otherwise  the work to manage a large number of multiple live migrations might actually slow you down. Compare it to juggling with balls. You might be very good and fast at it but when you have to many balls to attend to you’ll get into trouble because you have to spread you attention to wide, i.e. you’re doing more context switching that is optimal.

So tweaking the number of simultaneous live migrations to your environment is the last step in making sure a node is drained as fast as possible. Slowing things down can actually speed things up.  So when you get your 10Gbps or better pipes in production it pays of to test a bit and find the best settings for your environment.

Let’s recap all of the live migration optimization tips I have given over the years and add a final word of advice.  Those who have been reading my blog for a while know I enjoy testing to find what works best and I do tweak settings to get best performance and results. However you have to learn and accept that it makes no sense in real life to hunt for 1% or 2% reduction in live migration speeds. You’ll get one off  hiccups that slow you down more than that.

So what you need to do is tweak the things that matter the most and will get you 99% results?

  • Get the biggest pipe you need & can afford. Bigger pipes are always better than lots of aggregated smaller pipes when it come to low latency & high throughput.
  • Choose the best performance settings Hyper-V offers you. You can choose from TCP/IP,Compression, SMB. Ben Armstrong has a blog post on this Faster Live Migration–Which Option Should You Choose? I’d like to add that you can use NIC teaming for live migration as well and prior to Windows Server 2012 R2 that was the only way to aggregate bandwidth. Now you have more options. I prefer SMB but when I don’t have 10Gbps at my disposal I have found that compression really makes a difference. In my home  lab where I have only 1Gbps, the horror, it stopped me from going crazy Smile (being addicted to 10Gbps).

image

  • Optimize the power settings for your server BIOS if you want an extra speed & smoothness with 10Gbps (less so with 1Gbps). Look here An Early Look At Live Migration Over TCP/IP & Multichannel In Windows Server 2012 R2 Preview, the network traffic is a lot more stable, i.e. a flat line!  In Windows 2008 R2 this was a real need for 10Gbps or you’d be stuck at 16% max.
  • Enable Jumbo Frames for another 15-20%. Thanks to Multi Channel I can visualize this now. See also this blog post Live Migration Can Benefit From Jumbo Frames. The pictures say it all!
  • Figure out the best number of simultaneous live migrations in your environments. Well you just read this blog, so now you know.  Start at 4 and experiment upwards. Tune it back down if the speed deteriorates. The “best” number depends on your environment.

If you do these 5 things you’ll have really gotten the best performance out of your infrastructure that’s possible for live migration. Bar compression, which is not magic either but reducing the GB you need to transport at the cost of CPU cycles, you just cannot push more than 1.25GB/s trough a single 10Gbps pipe and so on. You might keep looking to grab another 1% or 2% improvement left and right  but might I suggest you have more pressing issues to attend to that, when fixed are a lot more rewarding? Knocking 1 or 2 seconds of a 100 second host evacuation is not going to matter, it’s a glitch. Stop, don’t over engineer it, don’t IBM it, just move on. If you don’t get top performance after tweaking these 5 settings you should look at all the moving parts involved between the host as the issue is there (drivers, firmware, cables, switch configurations, …) as you have a mistake or problem somewhere along the way.

DELL Has Great Windows Server 2012 R2 Feature Support – Consistent Device Naming–Which They Help Develop


The issue

Plug ‘n Play enumeration of devices has been very useful for loading device drivers automatically but isn’t deterministic. As devices are enumerated in the order they are received it will be different from server to server but also within the system. Meaning that enumeration and order of the NIC ports in the operating system may vary and “Local Area Connection 2” doesn’t always map to port 2 on the  on board NIC. It’s random. This means that scripting is “rather hard” and even finding out what NIC matches what port is a game of unplugging cables.

Consistent Device Naming is the solution

A mechanism that has to be supported by the BIOS was devised to deal with this and enable consistent naming of the NIC port numbering on the chassis and in the operating system.

But it’s even better. This doesn’t just work with on board NICs. It also works with add on cards as you can see. In the name column it identifies the slot in which the card sits and numbers the ports consistently.

In the DELL 12th Generation PowerEdge Servers this feature is enabled by default. It is not in HP servers for some reason, you need to turn in it on manually.

I first heard about this feature even before Windows Server 2012 Beta was released but as it turns out Dell has been involved with the development of this feature. It was Dell BIOS team members that developed the solution to consistently name network ports and had it standardized via PCI SIG.  They also collaborated with Microsoft to ensure that Windows Server 2012 would support all this.

Here’s a screen shot of a DELL R720 (12th Generation PowerEdge Server) of ours. As you can see the Consistent Device Naming doesn’t only work for the on broad NIC card. It also does a fine job with add on cards of which we have quite a few in this server.image

It clearly shows the support for Consistent Device Naming for the add on cards present in this server. This is a test server of ours (until we have to take it into production) and it has a quad 1Gbps Intel card, a dual Intel X520 DA card and a dual port Mellanox 10Gbps RoCE card. We use it to test out our assumptions & ideas. We still need a Chelsio iWarp card for more testing mind you Winking smile

A closer look

This solution is illustrated the in the “Device Name column” in the screen shot below. It’s clear that the PnP enumerated name (the friendly name via the driver INF file) and the enumerated number value are very different from the number in Name column ( NIC1, NIC2, NIC2, NIC4) even if in this case where by change the order is correct. If the operating system is reinstalled, or drivers changed and the devices re-enumerated, these numbers may change as they did with previous operating systems.

image

The “Name” column is where the Consistent Device Naming magic comes to live. As you can see you are able to easily identify port names as they are numbered consistently, regardless of the “Device Name” column numbering and in accordance with the numbering on the chassis or add on card. This column name will NEVER differ between identical servers of after reinstalling a server because it is not dependent on PnP. Pretty cool isn’t it! Also note that we can rename the Name column and if we choose we can keep the original name in that one to preserve the mapping to the physical hardware location.

In the example below thing map perfectly between the Name column and the Device Name column but that’s pure luck.image

On of the other add on cards demonstrates this perfectly.image

Fixing Two Small DELL Compellent Hardware Hiccups


Here’s two little tips to solve some small hardware issues you might run into with a Compellent SAN. But first, you’re never on your own with CoPilot support. They are just one phone call away so I suggest if you see these to minor issues you give them a call. I speak from experience that CoPilot rocks. They are really good and go the extra mile. Best storage support I have ever experienced.

Notes

  • Always notify CoPilot as they will see the alerts come in and will contact you for sure Smile. Afterwards they’ll almost certainly will do a quick health check for you. But even better during the entire process they keep an eye on things to make sure you SAN is doing just fine. And if you feel you’d like them to tackle this, they will send out an engineer I’m sure.
  • Note that we’re talking about the SC40 controllers & disk bays here. The newer genuine DELL hardware is better than the super micro ones.

The audible alert without any issues what so ever

We kept getting an audible alert after we had long solved any issues on one of the SANs. The system had been checked a couple of times and everything was in perfect working order. Except for that audible alarm that just didn’t want to quit. A low priority issue I know but every time we walk into the data center we were going “oh oh” for a false alert. That’s not the kind of conditioning you want. Alerts are only to be made when needed and than they do need to be acted upon!

Working on this with CoPilot support we got rid of it by reseating the upper I/O module. You can do this on the fly – without pulling SAS-cables out or so, they are redundant, as long as you do it one by one and the cabling is done right (they can verify that remotely for you if needed).

image

But we got lucky after the first one. After the “Swap Clear” was requested  every warning condition was cleared and we got rid of the audible alert beep!  Copilot was on the line with us and made sure all paths are up and running so no bad things could happen. That’s what you have a copilot for.

Front panel display dimming out on a Compellent Disk Bay

We have multiple Compellent SANs and on one of those we had a disk bay with a info panel that didn’t light up anymore. A silly issue but an annoying one as this one also show you the disk bay ID.

image

Do we really replace the disk bay to solve this one? As that light had come on and of a couple of time it could just be a bad contact so my colleague decided to take a look. First  he removed the protective cover and then, using some short & curved screw drivers, he took of the body part. The red arrow indicates the little latch that holds the small ribbon cable in place.

image

That was standing right open. After locking that down the info appeared again on the panel. The covers was screwed on again and voila. Solved.

ODX Doesn’t Support IDE But Works With Both VHDX And VHD Virtual Disk Format


This question came up recently, once again, and deserves it a little blog post. If you want to see the benefits of ODX you’ll need to connect your virtual disks to a vSCSI controller or other supported controller option. These are iSCSI, vFC, a SMB 3 File Share or a pass-through disk. But unless you have really good reason to use pass-through disks, don’t. It’s limiting you in to many ways.

Basically in generation 1 virtual machines that boot from a vIDE this rules out the system disk. So the tip here is to store your data that’s moved around in or between virtual machines in vSCSI attached VDH or (preferably) VHDX  virtual disks. If you can use generation 2 virtual machines, you’ll be able to leveraged ODX on the system partition as well as it boots from vSCSI Smile.

It goes without saying you need to store any virtual disks  involved on ODX capable LUNs via iSCSI, FC, FCoE, SMB 3 File Share or SAS for ODX to be available to the virtual machine.

Also beware that ODX only works on NTFS partitioned disks. The files cannot be compressed or encrypted.  Sparse files are not supported either. And finally, the volume cannot be BitLocker protected.

Here’s a screenshot of a copy of 30GB worth of ISO files to a VHDX attached to a vSCSI controller:image

Here’s a screenshot of a copy of 30GB worth of ISO files to a VHDX attached to a vIDE controller.

image

You’ll notice quite a difference. Depending on the load on the controllers/SAN it’s on average 3 times slower than the same action to a VHDX disk on a vSCSI controller.

Hyper-V UNMAP Does Work With SAN Snapshots And Checkpoints But Not Always As You First Expect


Recently I was asked to take a look at why UNMAP was not working predictably  in a Windows Server 2012 R2 Hyper-V environment. No, this is not a horror story about bugs or bad storage solutions. Fortunately, once the horror option was of the table I had a pretty good idea what might be the cause.

San snapshots are in play

As it turned out everything was indeed working just fine. The unexpected behavior that made it seem that UNMAP wasn’t working well or at least at moments they didn’t expected it was caused by the SAN snapshots. Once you know how this works you’ll find that UNMAP does indeed work predictably.

Snapshots on SANs are used for automatic data tiering, data protection and various other use cases. As long as those snapshots live, and as such the data in them, UNMAP/Trim will not free up space on the SAN with thinly provisioned LUNs. This is logical, as the data is still stored on the SAN for those snapshots, hard deleting it form the VM or host has no impact on the storage the SAN uses until those snapshots are deleted or expire. Only what happens in the active portion is directly impacted.

An example

  • Take a VM with a dynamically expanding VHDX that’s empty and mapped to drive letter D. Note the file size of the VHDX and the space consumed on the thinly provisioned SAN LUN where it resides.
  • Create 30GB of data in that dynamically expanding  virtual hard disk of the virtual machine
  • Create a SAN snapshot
  • Shift + Delete that 30GB of data from the dynamically expanding virtual hard disk in the virtual machine. Watch the dynamically expanding VHDX  grow in size, just like the space consumed on the SAN
  • Run Optimize-Volume D –retrim to force UNMAP and watch the space consumed of the Size of the LUN on the SAN: it remains +/- the same.
  • Shut down the VM and look at the size of the dynamic VHDX file. It shrinks to the size before you copied the data into it.
  • Boot the VM again and copy 30GB of data to the dynamically expanding VHDX in the VM again.
  • See the size of the VHDX grow and notice that the space consumed on the SAN for that LUN goes up as well.
  • Shift + Delete that 30GB of data from the dynamically expanding  virtual hard disk in the virtual machine
  • Run Optimize-Volume D –retrim to force UNMAP and watch the space consumed of the Size of the LUN on the SAN: It drops, as the data you delete is in the active part of your LUN (the second 30GB you copied), but it will not drop any more than this as the data kept safe in the frozen snapshot of the LUN is remains there (the first 30GB you copied)
  • When you expire/delete that snapshot on the SAN  we’ll see the size on the thinly provisioned SAN LUN  drop to the initial size of this exercise.

I hope this example gave you some insights into the behavior

Conclusion

So people who have snapshot based automatic data tiering, data protection etc. active in their Hyper-V environment and don’t see any results at all should check those snapshot schedules & live times. When you take them into consideration you’ll see that UNMAP does work predictably, all be it in a “delayed” fashion Smile.

The same goes for Hyper-V checkpoints (formerly known as snapshots). When you create a checkpoint the VHDX is kept and you are writing to a avhdx (differencing disk) meaning that any UNMAP activity will only reflect on data in the active avhdx file and not in the “frozen” parent file.

TechEd North America 2014 Session


There is something extremely rewarding about seeing your name on the intro slide of a TechEd USA presentation. I helped deliver What’s New in Windows Server 2012 R2 Hyper-V together with Ben Armstrong yesterday and it was quite the experience.

DSCN2817_280

image

A big thank you to Ben and Microsoft for the confidence they have shown in me and the opportunity to do this. A mention to our CEO who has the ability to look beyond the daily needs and facilitates his and encourages his employees to get out of the village to learn, grow and prosper. This is the principle one of my high school teachers lived and worked by, help people be all they can be.

The IT community around the Microsoft ecosystem is both a local and a global one. In this day and age knowledge gets shared and flows freely. People work with people and with organizations. No one gets anywhere in isolation.I’m happy to see so may of my buddies do so well. It’s great to see people succeed, grow, enjoy their work and reap the fruits of their efforts. Look at Benedict Berger who was presenting in the room next to ours or Aidan Finn, a long time community member and experienced speaker who won speaker idol and by doing so secured a speaker slot for next year. This has many reasons and one of them is people believing in you and giving you the chance to grab opportunities. To those I say, thank you very much!