Fixing Hiccups in The SCVMM2008R2 GUI & Database


As you might well know from experience, the System Center Virtual Machine Manager GUI and database sometimes get out of sync with what is really going on in the cluster. I've blogged about this before in SCVMM 2008 R2 Phantom VM guests after Blue Screen and in System Center Virtual Machine Manager 2008 R2 Error 12711 & The cluster group could not be found (0x1395).

The Issue

Recently I had to troubleshoot the "Missing" status of some virtual machines on a Hyper-V cluster in SCVMM 2008 R2. Rebooting the hosts and guests, restarting agents, … none of the usual tricks for this behavior seemed to do the trick. The SCVMM 2008 R2 installation was also fully up to date with service packs & patches, so the issue did not originate there.

Repair was greyed out and of no use. We could have removed the host from SCVMM and added it again. That resets the database entries for that host and can help fix such issues, but it is not guaranteed to work and you don't learn what the root cause or solution is. We could have deleted the VMs from the database as in the phantom VM post mentioned above, but we didn't have duplicates. Sure, that doesn't delete any files or the VM itself, so it should show up again afterwards, but why risk it not showing up again and having to go through fixing that?

The Cause

The VMs were in a "Missing" state after an attempted live migration during a manual patching cycle where the host was restarted before "start maintenance mode" had completed. A couple of those VMs were also live migrated at the same time with the Failover Cluster GUI. A bit of confusion all around, so to speak, but luckily all VMs were fully operational and servicing applications & users, so no crisis there.

The Fix

DISCLAIMER

I'm not telling you to use this method to fix this issue, but you can at your own risk. As always, please make sure you have good and verified backups of anything that's of value to you :-)

We had to investigate. The good news was that all VMs were up and running, there was no downtime at the moment and the cluster seemed perfectly happy :-)

But there we see the first clue. The virtual machines on the cluster are not running on the node SCVMM thinks they are running on, hence the "Missing" status.

First of all, let's find out which host the VM is really running on in the cluster and which host SCVMM thinks it is running on. We run this little query against the VMM database, which gives us all hosts known to SCVMM.

SELECT [HostID],[ComputerName] FROM [VMM].[dbo].[tbl_ADHC_Host]

HostID                                                                        ComputerName

559D0C84-59C3-4A0A-8446-3A6C43ABF618          node1.test.lab

540C2477-00C3-4388-9F1B-31DBADAD1D8C        node2.test.lab

40B109A2-9E6B-47BC-8FB5-748688BFC0DF         node3.test.lab

C2DA03CE-011D-45E3-A389-200A3E3ED62E        node4.test.lab

6FA4ABBA-6599-4C7A-B632-80449DB3C54C         node5.test.lab

C0CF479F-F742-4851-B340-ED33C25E2013          node6.test.lab

D2639875-603F-4F49-B498-F7183444120A             node7.test.lab

CE119AAC-CF7E-4207-BE0B-03AAE0371165         node8.test.lab

AB07E1C2-B123-4AF5-922B-82F77C5885A2           node9.test.lab

(9 row(s) affected)

Voilà, and now the fun starts. The SCVMM GUI tells us "MissingVM" is missing on node4.

We check this in the database to confirm:

SELECT Name, ObjectState, HostId
FROM VMM.dbo.tbl_WLC_VObject
WHERE Name = 'MissingVM'
GO

Which is indeed node4:

Name       ObjectState  HostId
---------  -----------  ------------------------------------
MissingVM  220          C2DA03CE-011D-45E3-A389-200A3E3ED62E

(1 row(s) affected)


In SCVMM we see that the move of the VM between node 4 and node 6 failed.

image

Now let's take a look at what the cluster thinks … yes, there it is, running happily on node 6 and not on node 4. There's the mismatch causing the issue.

So we need to fix this. We can live migrate the VM with the Failover Cluster GUI to the node SCVMM thinks the VM still resides on and see if that fixes it. If it does, great! You just have to give SCVMM some time to detect everything and update its records.

But what to do if that doesn't work out? We can get the HostId of the node where the VM is really running in the cluster (visible in the Failover Cluster GUI) from the query we ran above and then update the record:

UPDATE VMM.dbo.tbl_WLC_VObject
SET HostId  = 'C0CF479F-F742-4851-B340-ED33C25E2013'
WHERE Name = 'MissingVM'
GO

We then reset the ObjectState to 0 to get rid of the "Missing" status. SCVMM would do this automatically, but it takes a while.

UPDATE VMM.dbo.tbl_WLC_VObject
SET ObjectState = '0'
WHERE Name = 'MissingVM'
GO
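If you prefer not to run the two updates separately, they can be combined in one transaction so a typo can't leave the record half-fixed. This is just a sketch using the same example VM and HostId as above; verify the row count before committing:

```sql
-- Sketch: apply both fixes to the example VM in a single transaction.
BEGIN TRANSACTION;

UPDATE VMM.dbo.tbl_WLC_VObject
SET HostId = 'C0CF479F-F742-4851-B340-ED33C25E2013', -- node6, where the cluster really runs the VM
    ObjectState = 0                                  -- clear the "Missing" state
WHERE Name = 'MissingVM';

-- Only commit if exactly one row was touched.
IF @@ROWCOUNT = 1
    COMMIT TRANSACTION;
ELSE
    ROLLBACK TRANSACTION;
```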

After some patience & refreshing, all is well again, and a test with live migrations proves that everything works again.

As I said before, people get creative in how to achieve things; inconsistencies and differences in functionality between Hyper-V Manager, Failover Cluster Manager and SCVMM 2008 R2 can lead to some confusing situations. I'm happy to see that in Windows 8 the actions you should perform using the Failover Cluster GUI or PowerShell are blocked in Hyper-V Manager. But SCVMM really needs a "reset" button that makes it check & validate that what it thinks matches reality.

System Center Virtual Machine Manager 2008 R2 Error 12711 & The cluster group could not be found (0x1395)


The Issues

I recently had to go and fix some issues with a couple of virtual machines in SCVMM 2008 R2. There was one that failed to live migrate with the following error:

Error (12711)
VMM cannot complete the WMI operation on server HopelessVm.test.lab because of
error: [MSCluster_ResourceGroup.Name=" df43bf60-7216-47ed-9560-7561d24c7dc8"] The cluster group could not be found.

(The cluster group could not be found (0x1395))
 
Recommended Action
Resolve the issue and then try the operation again

Other than that it looked fine and could be managed with SCVMM 2008 R2. Another one seemed totally wrecked. It was in a failed state after an attempted live migration and you couldn't do anything with it anymore. Repair was "available" but every option there failed, so basically that was the end of the game with that VM. Both issues can be resolved with the approach I'll describe below.

The Cause

After some investigation, the cause turned out to be the fact that this virtual machine had been removed from the failover cluster as a resource, then exported & imported using Hyper-V Manager on one of the cluster nodes. It was then added back to the failover cluster again to make it highly available. All this was done without removing it from SCVMM 2008 R2. By the way, as mentioned above in "The Issues", this can get even worse than just failing live migrations. The same scenario can lead to virtual machines going into a failed state that you can't repair (retry or undo fail) or ignore, and basically you're stuck at that point. You can't even stop, start or shut down the virtual machine anymore; not one single operation works in SCVMM, while in the Failover Cluster GUI and in Hyper-V Manager everything is fully operational. This is important to note, as the services are fully online and functional. It's just in SCVMM that you're in trouble.

Why did they do it this way? They did it to move the VM to a new CSV. The fact that you delete the VM files when deleting a VM with SCVMM 2008 R2 made them use Hyper-V Manager instead. Now this approach (whatever you think of it) can work, but then you need to delete the VM in SCVMM 2008 R2 after exporting the virtual machine AND before proceeding with the import and making the virtual machine highly available.

People get creative in how to achieve things due to inconsistencies. Differences in functionality between Hyper-V Manager and SCVMM 2008 R2 (in the latter especially the lack of complete control over naming, files & folders, and export/migration behavior), as well as the needs of the failover cluster, can lead to some confusing scenarios.

The Supported Fix

Now the easy way to fix this is to export the virtual machine again and delete it in SCVMM 2008 R2. That removes the virtual machine object from SCVMM, the failover cluster and Hyper-V. However, this virtual machine was so large (50 GB + a 750 GB data disk) that there was no room for an export to be made. Secondly, an export of such a large VM takes considerable time and it has to be offline for this operation. This is annoying as, while SCVMM might be uncooperative at this point, the virtual machine is online and performing its duties for the business. So this presented us with a bit of a problem. Stopping the virtual machine and exporting it using Hyper-V Manager will cause it to go missing in SCVMM, after which you can delete it; importing the virtual machine again and adding it to the failover cluster causes downtime.

The Root Cause

Why does this happen? Well, when you import a virtual machine into a failover cluster it creates a new unique ID for the virtual machine resource group. This always happens. Choosing to reuse an existing ID during import in Hyper-V Manager has nothing to do with this. But VMM uses IDs/names to identify a VM, independent of the cluster. So when you did not remove the VM from SCVMM before adding the VM back to the cluster, you get a different cluster group ID in the cluster than you have in SCVMM. They both have the same name, but there is a disconnect leading to the issues described above.

By the way, exporting & importing a VM without first removing the virtual machine from the failover cluster leads to some issues in the failover cluster, so don't do that either :-)

The “No Down Time” Fix

This is not the first time we've needed to dive into the SCVMM database to fix issues. One of my main beefs about SCVMM, other than its inconsistency with the other tools and its lack of control & options in some scenarios, is the fact that it doesn't have enough self-maintenance intelligence & functionality. This leads to workarounds like the one above, which are slow and rather annoying, or consist of messing around in the SCVMM database, which isn't exactly supported. Mind you, Microsoft has published some T-SQL to clean up such issues themselves. See You cannot delete a missing VM in SCVMM 2008 or in SCVMM 2008 R2 and RemoveMissingVMs. See also my blog post SCVMM 2008 R2 Phantom VM guests after Blue Screen on this subject.

The usual tricks of the trade, like refreshing the virtual machine configuration in the Failover Cluster GUI, don't work here. Neither does the solution to this error described in Migrating a System Center Virtual Machine Manager 2008 VM from one cluster to another fails with error 12711. The error is the same but not the cause. Forcing a refresh of the VM with the VMM PowerShell cmdlets doesn't help either:

# Add the VMM cmdlets
Add-PSSnapin Microsoft.SystemCenter.VirtualMachineManager

# Connect to the VMM server
Get-VMMServer -ComputerName MySCVMMServer.test.lab

# Grab the problematic VM and put it into the object $vm
$vm = Get-VM -Name "HopelessVM"

# Force a refresh
Refresh-VM -Force $vm

In the end we have to fix the mismatch between the VMResourceGroupID in failover cluster and SCVMM by editing the database.

First you navigate to the registry key HKEY_LOCAL_MACHINE\Cluster\Groups\ on one of the cluster nodes, do a find for the problematic VM's name and grab the name of its key; this is the VMResourceGroupID the cluster knows and works with. So now we have the correct VMResourceGroupID: 0f8cabe4-f773-4ae4-b431-ada5a3c9926c

clip_image002
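Instead of eyeballing the registry, you can also pull the group ID out with PowerShell on a cluster node. This is a hedged sketch; it assumes each key under HKLM\Cluster\Groups carries a Name value holding the group's display name, and the VM name is the hypothetical one from this example:

```powershell
# Sketch: find the cluster's VMResourceGroupID for a VM by searching HKLM\Cluster\Groups.
$vmName = "HopelessVM"   # hypothetical VM name, as in the example above

Get-ChildItem HKLM:\Cluster\Groups | ForEach-Object {
    $group = Get-ItemProperty $_.PSPath
    if ($group.Name -like "*$vmName*") {
        # The key name itself is the GUID the cluster uses for this group.
        $_.PSChildName
    }
}
```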

Now you connect to the SCVMM database and run the following query to find the VMResourceGroupID that SCVMM thinks the VM has and that it uses, causing the issues:

SELECT  VMResourceGroupID  FROM tbl_WLC_VMInstance WHERE ComputerName = 'hopelessVM.test.lab'
GO 

The results:

VMResourceGroupID

------------------------------------

df43bf60-7216-47ed-9560-7561d24c7dc8

(1 row(s) affected)

The trick then is to simply update that value to the one you just got from the registry by running:

UPDATE tbl_WLC_VMInstance SET VMResourceGroupID = '0f8cabe4-f773-4ae4-b431-ada5a3c9926c' WHERE VMResourceGroupID = 'df43bf60-7216-47ed-9560-7561d24c7dc8'
GO 

Then you need some patience & to refresh the GUI a few times. Things will turn back to normal, but in between you might see some "missing" statuses appear for your problematic VM. These go away fast, however. If not, you can always use the Microsoft-provided script to remove missing VMs, as mentioned above in RemoveMissingVMs.

Warning

What I described above is something you can do to fix these issues fast and effectively when needed. But I'm not telling you this is the way to go, let alone that this is supported. Make sure you have backups of your VMs, hosts, SCVMM database etc. It only takes one mistake or misinterpretation to royally shoot yourself in the foot ;-) It hurts like hell; recovery is long and seldom complete. On top of that it might generate a vacancy in your company whilst you're escorted out of the building. Be careful out there.

Assigning Large Memory To Virtual Machine Fails: Event ID 3320 & 3050


We had a kind reminder recently that we shouldn't forget to complete all steps in a Hyper-V cluster node upgrade process. The proof of a plan lies in the execution :-) We needed to configure a virtual machine with a whopping 50 GB of memory for an experiment. No sweat, we have plenty of memory in those new cluster nodes. But when trying to do so, it failed with a rather obscure error in System Center Virtual Machine Manager 2008 R2:

Error (12711)

VMM cannot complete the WMI operation on server hypervhost01.lab.test because of error: [MSCluster_Resource.Name="Virtual Machine MYSERVER"] The group or resource is not in the correct state to perform the requested operation.

(The group or resource is not in the correct state to perform the requested operation (0x139F))

Recommended Action

Resolve the issue and then try the operation again.

image

One option we considered was that SCVMM 2008 R2 didn't want to assign that much memory, as one of the old hosts was still a member of the cluster and "only" had 48 GB of RAM. But nothing that advanced was going on here. Looking at the logs, we found the culprit pretty fast: lack of disk space.

We saw the following errors in the Microsoft-Windows-Hyper-V-Worker-Admin event log:

Log Name:      Microsoft-Windows-Hyper-V-Worker-Admin
Source:        Microsoft-Windows-Hyper-V-Worker
Date:          17/08/2011 10:30:36
Event ID:      3050
Task Category: None
Level:         Error
Keywords:     
User:          NETWORK SERVICE
Computer:      hypervhost01.lab.test
Description:
‘MYSERVER’ could not initialize memory: There is not enough space on the disk. (0x80070070). (Virtual machine ID DEDEFFD1-7A32-4654-835D-ACE32EEB60EE)

Log Name:      Microsoft-Windows-Hyper-V-Worker-Admin
Source:        Microsoft-Windows-Hyper-V-Worker
Date:          17/08/2011 10:30:36
Event ID:      3320
Task Category: None
Level:         Error
Keywords:     
User:          NETWORK SERVICE
Computer:      hypervhost01.lab.test
Description:
‘MYSERVER’ failed to create memory contents file ‘C:\ClusterStorage\Volume1\MYSERVER\Virtual Machines\DEDEFFD1-7A32-4654-835D-ACE32EEB60EE\DEDEFFD1-7A32-4654-835D-ACE32EEB60EE.bin’ of size 50003 MB. (Virtual machine ID DEDEFFD1-7A32-4654-835D-ACE32EEB60EE)

Sure enough, a smaller amount of memory, 40 GB, which fit within the remaining disk space on the CSV, did work. That made me remember we still needed to expand the LUNs on the SAN to provide the storage space for the large BIN files associated with these kinds of large memory configurations. Can you say "luxury problems"? The BIN file contains the memory of a virtual machine or snapshot that is in a saved state. Now you need to know that the BIN file requires the same disk space as the amount of physical memory assigned to a virtual machine. That means it can require a lot of room. Under "normal" conditions these don't get this big, and we provide a reasonable buffer of free space on the LUNs anyway for performance reasons, growth etc. But this was a bit more than that buffer could absorb.
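Because the BIN file is as large as the memory you assign, a quick pre-check of free space on the CSV can save you this error. A hedged sketch; the volume path C:\ClusterStorage\Volume1 and the 50 GB figure are just the values from this example:

```powershell
# Sketch: verify the CSV can hold the saved-state BIN file for the planned memory size.
$plannedMemoryGB = 50
$csvPath = 'C:\ClusterStorage\Volume1\'   # example CSV mount point

# CSV mount points show up as volumes in WMI on the cluster nodes.
$vol = Get-WmiObject Win32_Volume | Where-Object { $_.Name -eq $csvPath }
$freeGB = [math]::Round($vol.FreeSpace / 1GB, 1)

if ($freeGB -lt $plannedMemoryGB) {
    Write-Warning "Only $freeGB GB free on $csvPath; a $plannedMemoryGB GB BIN file will not fit."
}
```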

As it was stated in the planning that we needed to expand the LUNs a bit to be able to deal with these kinds of memory hogs, the storage to do so was available and the LUN wasn't maxed out yet. If not, we would have been in a bit of a pickle.

So there you go, a real-life example of what Aidan Finn warns about when using dynamic memory. Also see KB 2504962, "Dynamic Memory allocation in a Virtual Machine does not change although there is available memory on the host", which discusses a scenario where dynamic memory allocation seems not to work due to lack of disk space. Don't forget about your disk space requirements for the BIN files when using virtual machines with this much memory assigned. They tend to consume considerable chunks of your storage space. And even if you don't forget about it in your planning, please don't forget to execute every step of the plan ;-)

Hyper-V Cluster Nodes Upgrade: Zero Down Time With Intel VT FlexMigration


Well, the oldest Hyper-V cluster nodes are 3+ years old. They've been running Hyper-V clusters since the RTM of Hyper-V for Windows 2008 RTM. Yes, you needed to update the "beta" version to the RTM version of Hyper-V that came later :-) Bit of a messy decision back then, but all in all that experience was painless.

These nodes/clusters were upgraded to W2K8R2 Hyper-V clusters very soon after that SKU went RTM, but now they have reached the end of their "Tier 1" production life. The need for more capacity (CPU, memory) was felt. Scaling out was not really an option. The cost of fiber channel cards is big enough, but fiber channel switch ports need activation licenses and the cost for those borders on legalized extortion.

So upgrading to more capable nodes was the standing order. Those nodes became DELL R810 servers. The entire node upgrade process itself is actually quite easy. You just live migrate the virtual machines over to clear a host, which you then evict from the cluster. You recuperate the fiber channel HBAs for use in the new node, which you then add to the cluster. You just rinse and repeat until you're done with all nodes. Thank you Microsoft for the easy clustering experience in Windows 2008 (R2)! Those nodes now also have 10Gbps networking kit to work with (Intel X520 DA SFP+).
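On W2K8R2 the rinse-and-repeat part can be scripted with the FailoverClusters module. A hedged sketch of draining one node before evicting it; the node name is hypothetical, and the cluster picks the destination node for each live migration:

```powershell
Import-Module FailoverClusters

$node = "node1.test.lab"   # hypothetical: the node about to be replaced

# Live migrate every virtual machine group the node currently owns.
Get-ClusterGroup | Where-Object {
    $_.OwnerNode.Name -eq $node -and
    ($_ | Get-ClusterResource | Where-Object { $_.ResourceType.Name -eq "Virtual Machine" })
} | ForEach-Object {
    Move-ClusterVirtualMachineRole -Name $_.Name -MigrationType Live
}

# With the node empty, evict it from the cluster.
Remove-ClusterNode -Name $node -Force
```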

If you do your homework this process works very well. The cool thing is that there is not much to do on the SAN/HBA/fiber switch configuration side, as you recuperate the HBAs with their World Wide Names. You just need to update some names/descriptions to represent the new nodes. The only thing to note is that the cluster validation wizard nags about inconsistencies in node configuration regarding service packs. That's because the new nodes were installed with SP1 integrated, as opposed to the original ones having been upgraded to SP1.

The beauty is that by sticking to Intel CPUs we could live migrate the virtual machines between nodes having Intel E5430 2.66GHz CPUs (5400-series "Harpertown") and those having the new X7560 2.27GHz CPUs (Nehalem EX "Beckton"). There was no need to use the "Allow migration to a virtual machine with a different processor" option. Intel's investment (and ours) in VT FlexMigration is paying off, as we had a zero downtime upgrade process thanks to this.

image

You can read more about Intel VT FlexMigration here.

And in case you're wondering: those PE2950 III servers are getting a second life. Believe it or not, there are software vendors that don't have application lifecycle management, virtualization support or roadmaps. So some hardware comes in handy to transplant those servers onto when needed. Yes, it's 2011 and we're still dealing with that crap in the cloud era. I do hope the vendors of those applications get the message, or management cuts the rope and lets them fall.

System Center Virtual Machine Manager 2008 R2 SP1 Upgrade Walkthrough


Some people downloading System Center Virtual Machine Manager 2008 R2 SP1 seem to be confused that it is the entire product ISO. It's a big download, but the upgrade itself, when you have a healthy environment, is fast and easy. To my knowledge there is no SP1-only upgrade file; you get one package for all needs. I've provided a screenshot walkthrough of the process below and it really only takes a couple of minutes on the servers I deployed it on. There is both an evaluation version available and a licensed version via the licensing site or the TechNet subscriber downloads.

Do note that the process below is for those who are upgrading from System Center Virtual Machine Manager 2008 R2 to System Center Virtual Machine Manager 2008 R2 SP1. If you have the RC installed, take a look at the following blog post by Maarten Wijsman to see how to upgrade the SQL database used by the SCVMM 2008 R2 SP1 Release Candidate with the UpgradeVMMR2SP1RC.exe tool. The download is here at the Microsoft Connect site (Live ID).

Run setup.exe and click Setup VMM Server, or any other component you need to upgrade. If you click VMM Server it will detect other components as well.

1

 

The installation files are extracted …

3

 

Accept the license agreement and click next

2

 

As you can see, it detected that I'm also running the Virtual Machine Manager Administration Console. Click "Upgrade" to continue.

5

 

If the account you’re using doesn’t have the needed SQL Server permissions you can provide alternate credentials that do have those. Click “Next” to continue.

6

 

It will then upgrade all detected components one by one ….

7

 

11

… until you reach the completion form. That's it, you're done.

 

12

 

You have to go through this process on all servers where you have Virtual Machine Manager components installed to complete the entire upgrade. When you have, you can configure Dynamic Memory from your SCVMM Administrator Console. Nice :-)

image

System Center Virtual Machine Manager 2008 R2 SP1 & 2012 Beta Available


Good news: today, March 22nd 2011, System Center Virtual Machine Manager 2008 R2 SP1 went RTM. I'll update this short post with the download link when it becomes available ==> UPDATE: download it here from Microsoft (Evaluation) or from your TechNet subscription or licensing site. Seems like we got all host & guest updates to W2K8R2 SP1 done exactly on time to get this one installed and have a state-of-the-art, up-to-date infrastructure :-) Today the 2012 Beta version also became available for download (here) and the documentation site went live (here). Things are moving in the System Center space. Busy times ahead! Yet two more VMs to test with in the lab … and then Denali is coming :-D

New Functionality in Virtual Machine Manager Self Service Portal 2.0 SP1 Beta


It's play time! Via the Microsoft Connect site (you can get an invite to join the Connect site from the TechNet page What's New in System Center Virtual Machine Manager Self Service Portal 2.0 SP1 Beta; it's listed there under "Join the Beta") you can get your hands on the Virtual Machine Manager Self Service Portal 2.0 SP1 Beta. Some new features to highlight are the ability to import virtual machines (VMs that were removed from the self service portal, or VMs that are managed by SCVMM but were not created in and available to the self service portal). Notifications are now available for events using SQL mail, so you can keep an eye on what is happening. Virtual machine templates can now be added to infrastructures (fast, with no need to go through the entire request/provision process). You also get the option to move infrastructure between business units while in maintenance mode, and even delete business units when they don't own infrastructure. But the one I like best is the fact that we can expire virtual machines.

No, the last one doesn't mean I want to be the Bastard Operator From Hell who hates everyone and is so out of touch with the reality of "enabled" or "empowered" users or customers that he wants to seek & destroy them all. But when you've been in infrastructure for a while, you've probably come across situations where orphaned, abandoned, forgotten virtual & yes, even physical servers are working very hard on becoming a majority instead of an exception. This puts an enormous burden on the infrastructure and workload, and it drives up cost very fast while it isn't providing any ROI or other benefits to the business. Unless you're getting paid to maintain an infrastructure by the VM (congrats!) and you just smile when you find 50% superfluous guests :-) as this means the sound of your cash register ringing in your ears.

The thing is, "horror vacui" comes into play, along with the inevitable desire of the universe for maximum entropy. So any environment will need some managing, and this will be a welcome tool to help automate that management & enforce some decisions. You can even delegate all this via roles, so people can be empowered to set or change expiration dates. That way you can try to go for self-regulation. This can work ;-)

Upgrading a Hyper-V R2 Cluster to Windows 2008 R2 SP1


For all you people waiting to roll out Windows 2008 R2 SP1 to your Hyper-V cluster, here's a quick, screenshot-rich run-through of the process. Some people claim you need to shut down the cluster services and shut down the guests, but this is not the case. You can do a rolling upgrade, and your guests can stay online on the other nodes; just use live migration to get them there. Now I do recommend upgrading all the nodes to SP1 as soon as possible and not staying in a mixed Windows 2008 R2 / Windows 2008 R2 SP1 situation in your cluster. But this mixed situation makes upgrades of the nodes in the cluster possible without any downtime for the guests (if you have live migration), which is the aim of having a high availability cluster.

Walk Through

Live migrate all the guests off the node you wish to upgrade to SP1. Make sure the host is fully patched, and disable any antivirus services if you are running any. I always reboot the node before a major upgrade to make sure we have the server in a clean state, with no lingering pending reboots or processes that can cause issues.

Navigate to the Service Pack 1 file for Windows 2008 R2, which is called windows6.1-KB976932-X64.exe, and start it up:

SP1

 

You’ll have to accept the license terms:

SP1-2

 

And then the preparation process starts:

SP1-3

 

It is now ready to start the upgrade and yes we want it to reboot automatically when needed:

SP1-4

The upgrade process takes a while (about 17 minutes on my lab servers):

SP1-6  SP1-6(2)  SP1-6(3)

 

When it’s done it will reboot and bring you back to the logon screen. Multiple reboots might be needed to complete the upgrade process depending on what’s running on your server. In this case we are dealing with dedicated Hyper-V nodes.

View when connected to the console

image

View when connected via RDP

image

 

After logging on you are greeted with this window:

SP1-7

 

And yes this is indeed the case

SP1-8

Reboot included, the entire process took about 22 to 23 minutes. In the Setup event log you'll find these messages:

  • Initiating changes for package KB976932. Current state is Absent. Target state is Installed. Client id: SP Coordinater Engine.
  • Package KB976932 was successfully changed to the Installed state.

Note: if an extra reboot is required you'll see an extra entry in between these, stating: "A reboot is necessary before package KB976932 can be changed to the Installed state."
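On dedicated Hyper-V nodes you can also run the service pack installer unattended. A hedged sketch; I believe the standalone SP1 package accepts the standard package switches, but confirm with windows6.1-KB976932-X64.exe /? on your build:

```powershell
# Sketch: unattended SP1 install; the reboot is suppressed so you control when the node restarts.
.\windows6.1-KB976932-X64.exe /quiet /norestart

# Restart manually once you're ready (the node should already be drained of guests).
Restart-Computer
```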

When you have a cluster with nodes running both W2K8R2 RTM and W2K8R2 SP1, a mixed situation so to speak, you'll see the following notification in the cluster events:

SP1-9

 

You can live migrate the guests from the next node to the node already upgraded to SP1 and then repeat the process. You keep doing this until all your nodes are upgraded.

SP1-10

As a final recommendation, I would suggest waiting until you get the SCVMM 2008 R2 SP1 bits if you use this product before you upgrade your clusters, especially when using it with SCOM 2007 R2 PRO Tips. Otherwise you don't need to wait; just realize that until you have SP1 for SCVMM 2008 R2 you won't be able to use the new Hyper-V functionality. In production I would not recommend using the RC1 for this.

Please do not forget to update your guests with the new SP1 version of the Hyper-V Integration Components. This is needed to be able to use the new features like Dynamic Memory & RemoteFX. The Windows 2008 R2 RTM version of the Integration Components is 6.1.7600.16385:

image

 

You can do this using Hyper-V Manager by selecting "Insert Integration Services Setup Disk" and running the setup; this will require a reboot.

image

 

Click to start the upgrade process:

image

 

It will ask to remove the previous version:

image

 

Work in progress:

image

 

Done and asking for a reboot:

image

 

SCVMM 2008 R2 can also be used; here you shut down the guest before updating the "virtual guest services", as the Integration Components are called in SCVMM 2008 R2. It can be annoying that the nomenclature differs. The good thing here is that you can upgrade multiple guests using VMM 2008 R2. Hans Vredevoort did a blog post on this here: http://www.hyper-v.nu/blogs/hans/?tag=windows-server-2008-r2-sp1. After the upgrade you can see that the version of the Integration Components for Windows 2008 R2 SP1 is 6.1.7601.17514:

image
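A quick way to verify which Integration Components level a guest ended up on is to check the file version of one of the IC drivers inside the guest. This is a hedged sketch; vmbus.sys is used as a representative driver:

```powershell
# Sketch: run inside the guest; after the SP1 IC upgrade this should report 6.1.7601.17514.
(Get-Item "$env:windir\System32\drivers\vmbus.sys").VersionInfo.ProductVersion
```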

SCVMM 2008 R2 Phantom VM guests after Blue Screen


UPDATE: Microsoft posted a SQL clean-up script to deal with this issue. Not exactly a fix, and let's hope it gets integrated into SCVMM vNext :-) Look at the script here: http://blogs.technet.com/b/m2/archive/2010/04/16/removing-missing-vms-from-the-vmm-administrator-console.aspx. There is a link to this and another related blog post in the newsgroup link at the bottom of this article as well.

I've seen an annoying hiccup in SCVMM 2008 R2 (November 2009) in combination with Hyper-V R2 live migration two times now. In both cases a blue screen (due to the "Nehalem" bug http://support.microsoft.com/kb/975530) was the cause. Basically, when a node in the Hyper-V cluster blue screens, you can end up with some (I've never seen all) VMs on that node being in a failed/missing state. The VMs however did fail over to another node and are actually running happily. They will even fail back to the original node without an issue. So, as a matter of fact, all things are up and running. Basically you have a running VM and a phantom one. There are just multiple entries in different states for the same VM. Refreshing SCVMM doesn't help, and a repair of the VM is not working.

While it isn't a show stopper, it is very annoying and confusing to see VM guests in a missing state, especially since the VM is actually up and running. You're just seeing a phantom entry. However, be careful when deleting the phantom VM, as you'll throw away the running VM as well; they point to the same files.

Removing the failed/orphaned VM in SCVMM is a no-go when you use shared storage like, for example, CSV, as it points to the same files as the running one and is visible to both the good VM and the phantom one, meaning it will ruin your good VM as well.

Snooping around in the SCVMM database tables revealed multiple VMs with the same name but with separate GUIDs. In production it's really a NO GO to mess around with those records, not even as a last resort, because we don't know enough about the database schema and its dependencies. So I have found two workarounds that do work (I've used them both):

  1. Export the good VM for safekeeping, delete the missing/orphaned VM entry in SCVMM (this takes the good one with it, which is why you exported it first) and import the exported VM again. This means downtime for the VM guest.
  2. Remove the Hyper-V cluster from VMM and re-add it. This has the benefit of no downtime for the good VM, and the bad/orphaned one is gone.

Searching the net didn’t reveal much info but I did find this thread that discusses the issue as well http://social.technet.microsoft.com/Forums/en-US/virtualmachinemanager/thread/1ea739ec-306c-4036-9a5d-ecce22a7ab85 and this one http://social.technet.microsoft.com/Forums/en/virtualmachinemgrclustering/thread/a3b7a8d0-28dd-406a-8ccb-cf0cd613f666

I've also contacted some Hyper-V people about this, but it's a rare and not well-known issue. I'll post more on this when I find out.