Copy Cluster Roles Hyper-V Cluster Migration Fails at Final Step with error Virtual Machine Configuration ‘VM01′ failed to register the virtual machine with the virtual machine service


I was working on a migration of a nice two node Windows Server 2012 Hyper-V cluster to Windows Server 2012 R2. The cluster consist out of 2 DELL R610 servers and a DELL  MD3200 shared SAS disk array for the shared storage. It runs all the virtual machines with infrastructure roles etc. It’s a Cluster In A Box like set up. This has been doing just fine for 18 months but the need for features in Windows Server 2012 R2 became too much to resists. As the hardware needs to be recuperated and we have a maintenance windows we use the copy cluster roles scenario that we have used so many times before with great success. It’s the Perform an in-place migration involving only two servers scenario documented on TechNet and as described in one of my previous blogs Migrating a Hyper-V Cluster to Windows 2012 R2 for your convenience.

Virtual Machine Configuration ‘VM01′ failed to register the virtual machine with the virtual machine service

As the source host was running on Windows Server 2012 we could have done the live migration scenario but the down time would be minimal and there is a maintenance window. So we chose this path.

So we performed a good health check. of the source cluster and made sure we had no snapshots left hanging around. Yes it’s supported now for this migration scenario but I like to have as few moving parts as possible during a migration.

It all went smooth like silk. After shutting down the VMs on the source cluster node, bringing the CSV off line (and un-presenting the LUN from the source node for good measure), we present that LUN to the target host. We brought the CSV on line and when that was completed successfully we were ready to bring the virtual machines on line and that failed …

Log Name:      Microsoft-Windows-Hyper-V-High-Availability-Admin
Source:        Microsoft-Windows-Hyper-V-High-Availability
Date:          4/02/2014 19:26:41
Event ID:      21102
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      VM01.domain.be
Description:
‘Virtual Machine Configuration VM01′ failed to register the virtual machine with the virtual machine management service.

image

image

 

Let’s dive into the other event logs. On the host the application security and system event log are squeaky clean. The Hyper-V event logs are pretty empty or clean to except for these events in the Hyper-V-VMMS Admin log.

Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
Source:        Microsoft-Windows-Hyper-V-VMMS
Date:          4/02/2014 19:26:40
Event ID:      13000
Task Category: None
Level:         Error
Keywords:     
User:          SYSTEM
Computer:      VM01.domain.be
Description:
User ‘NT AUTHORITY\SYSTEM’ failed to create external configuration store at ‘C:\ClusterStorage\HyperVStorage\VM01′: The trust relationship between this workstation and the primary domain failed.. (0x800706FD)

 

image

Bingo. It must be the fact that no domain controller is available. It’s completely self contained cluster and both domain controller virtual machines are highly available and reside on the CSV. Now the CSV does come on line without a DC since Windows Server 2012 so that’s not the issue. it’s the process of registering the VMs that fails without a DC in an Active Directory environment.

Getting passed this issue

There are multiple ways to resolve this and move ahead with our cluster migration. As the environment is still fully functional on the source cluster I just removed a DC virtual machine from high availability on the cluster. I shut it down and exported it. I than copied it over to the node of the new cluster  (we’re going to nuke the source host afterwards and install W2K12R2, so we moved it to the new host where it could stay) where I put it on local storage and imported it. For this is used the “Register the virtual machine in-place option”. I did not make it high available.

image

After verifying that we could ping the DC and it was up and running well we tried the final phase of the migration again. It went as smooth as we have come to expect!

Other options would have been to host the DC virtual machine on a laptop or other server. If you could no longer get to the the DC for export & import or heck even a shared nothing migration depending on your environment can help you out of this pickle. A restore from backup would also work. But here in that 2 node all in one cluster our approach was fast and efficient.

So there you go. Tip to remember. Virtualizing domain controllers is fully supported, no worries there but you need to make sure that if you have a dependency on a DC you don’t have the DC depending on that dependency. It’s chicken an egg thing.

Conferences On My Roadmap For 2014


Here’s a little roadmap of conferences on my radar screen for 2014. Some I can’t attend because of conflicts in my schedule and other priorities, but I list ‘m here for your consideration.

DELL Enterprise Forum

If you’re working with Dell technologies, either hardware or software this is for you. It very interactive and you get provide feedback to the product teams as well as briefings on what coming. Some are under NDA, some not.

image

TechEd North America 2014

I’m attending, everything has been arranged. So if you’re a blog reader/twitter follower give us a ping.

image

E2EVC 2014 Brussels

This is a non marketing event by experts in virtualization. So these people design, implement and support virtualization solutions for a living.  E2EVC Virtualization Conference is a non-commercial, it does not run a profit for the organizers or speakers. Everybody volunteers. The attendance fee covers the costs of the conference rooms, coffee breaks and such. The value is in the knowledge sharing and the networking.

image

See http://workinghardinit.wordpress.com/2014/02/03/e2evc-2014-brussels/

OSCON 2014

I love watching the OSCON presentations on line and it’s one of my never attended that on my must attend list. Whether that will happen this year remains to be seen.image

TechEd Europe 2014

It runs from October 27th to October 31st 2014 in Barcelona. I hope to meet you there and I hope they are ready at that time to talk about vNext Winking smile. It’s a great opportunity to network and talk shop with so many of my peers I’ll most definitely try to be there. Any vNext information would most certainly make it a not to miss event. They announced it a bit late and they already have lost some of the potential attendees to TechEd North America. Not their best move ever I must say.

image

E2EVC 2014 Barcelona

Details have yet to be announced. But if my schedule allows I will attend and present!

image

Dell World 2014

No details as of yet but for partners & customers this is a valuable opportunity to talk to the product teams, directors, marketing mangers and engage in some serious conversation about technology and where DELL fits in your road map.

image

Microsoft MVP Summit 2014

I will NOT miss the MVP Summit Smile. No details as it’s all NDA.

Hot Iron, Cold Steel & Cables Are Still Paramount In The Era Of The Cloud


Cloud, virtualized, on premise, hosted … the people in the field offices need to connect to them and as such hardware is not dead yet Winking smile. Commodities don’t mean obsolete or “in the cloud” only.

clip_image002

Some nice DELL PowerConnect 5548P switches. We’ve been using this line of switches (since the 53XX series) for many years now and with great success for in the datacenter (before we switched to 10Gbps) and campus/client access. They’ve never let us down at a price/value point that make the economies of using them to good to ignore.

Once in a while, we’re out in the field making sure the people can access their apps, services, servers in the cloud, the data center or at a hosting provider. Meaning we get to play with some hardware and we all enjoy that still Smile. Whilst at work at several sites I’m once again confronted with commodities being treated like specialties with the following results:

  • Overly expensive
  • Very little value & capabilities (under delivery)
  • Slow delivery
  • Churning

To avoid wasting you money or allowing it to be wasted you need to use common sense. If you use advisors get a consigliore, not a racketeer.

1Gbps to the desktop and get some extra ports

I’ve talked about getting affordable 10Gbps without compromising capabilities before so here I’ll look at the access/campus side of the story. I still find many organizations rolling out 100Mbps to the desktop for cost reasons and counting ports in orders of one. Two things to keep in mind. Buy 1Gbps and buy some extra.

Buying vast quantities of something you don’t use but does it power is not a good idea. But being a complete scrooge and not having some extra ports is ridiculous. I have seen many thousands of € wasted in meetings about 10 to 40 switch ports too few in new building projects that have > 5000 outlets. The only real saving I see in in electricity used, if that is a major concern where you are at. Organizations spend tens of thousands of  € discussing something that would be fixed by spending a few thousand which would give extra benefits on top. That’s churning people. Creating work and billable hours by overinflating issues & crying wolf to justify the expenditure that’s supposedly needed to stave of disaster.

On top of that when you do ask those architects to do some modern designs like SMB Direct  & DCB they freak out & repeat the above ritual. Chances are you’ll spend 20.000 to 30.00 euro on a 6 month study that says it can’t be done because of cost & the probability the sky will fall on your head, leaving you empty handed an poorer. You should have taken the money and just done it. Their scams defer responsibilities to untraceable entities, lines the pocket of consulting houses and, as no one is going to take responsibility to stop this madness, it just goes on forever whilst on paper everything is done by the book and compliancy to the rules is achieved.  Until the day some joker, frustrated at the lack of a few ports, attached a cheapo 8 ports switch to the outlet, creates a loop and brings down the buildings network affecting many thousands. Because the design didn’t handle that to well … been there, seen it.

I also disagree with the practice of dropping in 100Mbps unless you have really good reasons. Structural cabling is being put in at Cat6A specifications nowadays and CAT5E has been put I for many years. 1Gbps is not a luxury if you do lots of data transfers within an office and have image intensive needs (more and more that is all of us with video, images, all in high res). Google fiber is coming to residential homes … guess what that could mean to services that can be delivered … Heaven forbid you buy 100Mbps because those fancy overpriced VOIP phones only do 100Mbps & you can’t afford the replace them.

With QoS for VOIP and other use cases some extra bandwidth comes in handy as well for. Also don’t forget software installations & automated rollouts of desktops & laptops. Last but not least it helps deal with crappy network behavior of way to many software packets.

On the number of ports and the price per port. We buy the most minimal support on switches possible. They hardly ever die on you and if something goes bad it’s a port perhaps, and even that is rare. So don’t waste money on support contracts. Buy some extra ports. For one you need some wiggle room and you have spare capacity to deal with port or even switch failures. If you need 400 ports, by 10*48 port switches. You have spare capacity and can even afford to lose a switch. If one really fails you most have a “lifetime warranty”. You finance 1Gbps to the desktop by dumping support you won’t need, buying value commodity switches and avoiding the racketeers mentioned above. If you need a network engineer, hire one, a good one.

Than inevitably the cry comes: “you’ll saturate the uplinks”! Not a big issue for the small office (+/- 60 people) setup we did recently but what about a bit larger environments? Todays commodity switches had dual fiber uplink port,10Gbps capable, for a redundant lag. If you build a star design and not a cascade to a more capable core/top switch & you’re golden. It’s also great future proofing as we use access switches for a long time, over 7 years is not an exception, so give yourself some wiggle room.

Cost you say? Again, forgo the expensive market leaders and you’ll get better value for less money that get the job done very well. Cables, even OM3 fiber, is affordable compared to the labor, construction and maintenance of a  > 1000 employee building. Put in enough cabling to allow for 21st century network traffic and make sure working on it is easy. Good principles used at the wrong place in the wrong way are no good to anyone except for the ones making money of this scam.

Some Insights Into How Windows 2012 R2 Hyper-V Backups Work


How Windows Server 2012 R2 backups differ from Windows Server 2012 and earlier

You’ll remember our previous blog about an error when backing up a virtual machine on Windows Server 2012 R2, throwing this error:

Dealing With Event ID 10103 “The virtual machine ‘VM001′ cannot be hot backed up since it has no SCSI controllers attached. Please add one or more SCSI controllers to the virtual machine before performing a backup. (Virtual machine ID DCFE14D3-7E08-845F-9CEE-21E0605817DC)” In Windows Server 2012 R2

The fix was easy enough, adding a virtual SCSI controller to the virtual machine. But why does it need that now?

Well, this all has to do with the changed way Windows Server 2012 R2 backups work. Before Windows Server 20012 R2 the VSS provider created a VSS snapshot inside the guest virtual machine. That snapshot was exposed to the host, to create a volume snapshot for backup purposes. Right after the volume snapshot has been taken this VSS snapshot inside the guest virtual machine needed to be reverted. The backups then run against that volume snapshot and is consistent thanks to both host & guest VSS capabilities.

For an overview of VSS based backup process in general take a peak at Overview of Processing a Backup Under VSS

Now it is the “Hyper-V Integration Services Shadow Copy Provider” that is being used. When the the host initiates a volume snapshot (Microsoft or hardware VSS provider) the host VSS writer goes in to freeze. This process leverages the Hyper-V Integration Services Shadow Copy Provider  to create the virtual machine checkpoint. After that the volume/LUN/CSV snapshot is taken. When that is done the host VSS writes goes into thaw and the virtual machine checkpoint is deleted. After that the backup runs against the Volume snapshot and at the end that is also deleted. You can follow this process quite nicely in the GUI of your Hyper-V host, you SAN (if you use a Hardware VSS provider).

Dear storage vendors: a great, reliable, fast VSS Hardware Provider is paramount to success in a Microsoft environment. You need to get this absolutely right and out of the door before spending any more time and money on achieving yet more IOPS. Keep scalability in mind when doing this.

Dear backup software vendors: think about the scalability when designing your products. If we have 200 or 500 or a thousand VMs … can we leverage CSV based backups to protect every VM on the LUN or do we need to snap the LUN for every VM backed up? Choice there is good for both data protection schemes and scalability.

At this stage the hardware VSS snapshot is being taken …

image_thumb3

Contrary to common belief this means that the backup will indeed application consistent to the time of the checkpoint as the CSV snapshot being taken is of a consistent checkpoint. It’s the delta in the active avhdx that is only crash consistent, like any running VM by the way. Now pay attention to the screenshot below. The two red arrows are indicating to ntfs source events, two volumes seem to be exposed to the next free drive letters. E: and F: here as C: is the virtual machine OS and D: the DVD.

image_thumb5

Look at the detail. Indeed two. Well it the previous screenshot we only saw one in the CSV path but there are two avhdx files indeed.

image_thumb[1]

Exposing a snapshot on the SAN to a server actually shows us this much better … look here at the avhdx with the GUID and one with “AutoRecovery” in the name. So that makes for two nfts events … and as the backup needs to do this life it requires a vSCSI controller to be present in the virtual machine … and vIDE controller can’t do this.

image_thumb[3]

Anyway, enough under the hood detective work for now, In VEEAM that stage looks like this:

image_thumb7

And on the Compellent it looks like this. The screenshots are from different backups at different times so don’t get confused about the time stamps here. It’s just as illustration of what you can expect to see.

image_thumb12

Now when the CSV snapshot has been taken the virtual machine checkpoint is removed. At that time the backup runs against the CSV snapshot. In our case (hardware VSS provider) this is a snapshot on the SAN that gets exposed in a view and mapped to the off host backup proxy VEEAM server. On the DELL Compellent it looks like this.

image_thumb16

This takes a while to o…but after a while the backup will kick off. Do not that the checkpoint has merged and is no longer visible at this time.

image_thumb18

Once the backup is complete, the mapping is removed, the view deleted and the snapshot expired. So your SAN is left as the backup found it.

There you go. I hope this helped clarify certain things on how Hyper-V guest backups work in Windows 2012 R2. So your backups are still application consistent, just not when you’re running Linux or DOS or NT4.0 as there is no support / VSS for that. However they are based on a  consistent virtual machine snapshot which explains why Hyper-V backups can protect Linux guests very adequately!

Dealing With Event ID 10103 “The virtual machine ‘VM001′ cannot be hot backed up since it has no SCSI controllers attached. Please add one or more SCSI controllers to the virtual machine before performing a backup. (Virtual machine ID DCFE14D3-7E08-845F-9CEE-21E0605817DC)” In Windows Server 2012 R2


I was doing backups of a Windows 2012 R2 Hype-V cluster recently and it runs only Windows Server 2012 R2 virtual machines. It’s a small but very modern and up to date cluster Smile.

Using VEEAM as backup software I have high expectations and VEEAM did deliver. All went well except for one virtual machine.

image

VEEAM states "Processing Error. Guest processing skipped (check guest OS VSS state and integration components version)". Well all  virtual machines  are W2K12R2 as are the cluster host and all IC components are up to date and backup (volume checkpoint) is enabled.

image

I dove into the Hyper-V log and sure enough I found following event:

The virtual machine ‘VM001′ cannot be hot backed up since it has no SCSI controllers attached. Please add one or more SCSI controllers to the virtual machine before performing a backup. (Virtual machine ID DCFE14D3-7E08-845F-9CEE-21E0605817DC).

As it turns out in in Windows Server 2012 R2 the VM requires a SCSI controller for the backup to function. It doesn’t need to have any storage attached. It just needs one to be there (default). So the fix is easy, just add one.

image

image

Click “Apply” and “OK”. You can now start the virtual machine and that’s it. Once we fixed that it was a squeaky clean backup run.

But why does it need to be there?

Well when we monitor the event logs inside a virtual machine we are backing up we see that during the backup process, very briefly a VHDX get’s mounted inside the guest.

image

To answer this question we need to dive into how Windows Server 2012 R2 backups work as that is different from how it used to be. You can read about that over here when it’s published.

Hyper V Amigos Showcast Episode 1: who we are


The very first episode of the Hyper-V Amigos Showcast has gone live. We’ll try to discuss & showcase Hyper-V and related technologies for benefit of all mankind.

In this first episode we introduce ourselves and talk a bit about on the state of our industry and how that relates to our jobs.

Carsten Rachfahl & Didier Van Hoye introduce themselves

Enjoy!

RDMA Over RoCE With DCB Requires Tagged Non Default VLANs


It’s DCB That Requires This

For those of you who are experimenting with the RoCE variant of RDMA for SMB Direct in Windows Server 2012 (R2), make sure you have a VLAN tag in your configuration if this is more than a simple RDMA over two NICs. The moment you get DBC with PFC & ETS involved you’ll need non default tagged VLANs. Do note that PFC alone is good enough, ETS is strictly speaking not a requirement, but I’d consider doing it if you can.

With Enhanced Transmission Selection (ETS) the network traffic type is classified using the priority value in the VLAN tag of the Ethernet frame. The priority value is the Priority Code Point (PCP), which is described in the IEEE 802.1Q specification and uses a 3-bit field in the VLAN tag with eight possible priority values (0 to 7).

Priority-based Flow Control (PFC) allows to individually pause priorities of tagged traffic and helps to provide lossless or “no drop” behavior for a certain priority at the receiving port. As  above, each frame transmitted by a sending port is tagged with a priority value (0 to 7) in the VLAN tag. So for the traffic pause and resume functionality to work we need a VLAN tag to carry the priority value.

Does It Work Without?

But you’ll tell me that, as you may be lacking a DCB capable switch for lab purposes, you used a direct cable between your two RoCE NICs. And guess what RoCE, might have indeed worked for you without a VLAN tag. You can test & get a feel for what RoCE/RDMA can do for you with just the NICs. But as there is no switch involved you’re not using DCB for PFC/ETS and without that the need for the tagged VLAN isn’t there. Also see http://workinghardinit.wordpress.com/2013/05/03/smb-direct-roce-does-not-work-without-dcbpfc/.

So there you go. Design your RoCE/RDMA network based on DCB with PFC( and ETS) and not just on the tests with an direct cable or you might miss a few details that are quite important. Happy testing!