Hyper-V Cluster Node Pause & Drain fails – Live Migrations fail with “The requested operation cannot be completed because a resource has locked status”


One night I was doing some maintenance on a Hyper-V cluster and I wanted to Pause and drain one of the nodes that was up next for some tender loving care. But I was greeted by some messages:


[Window Title]
Resource Status

[Main Instruction]
The requested operation cannot be completed because a resource has locked status.

[Content]
The requested operation cannot be completed because a resource has locked status.

[OK]

Strange. The cluster is up and running, none of the other nodes had issues and operationally all VMs are as happy as can be. So what’s up? Not too much in the error logs except for one entry related to a backup. Aha … We fire up diskpart and see some extra LUNs mounted, and using “vssadmin list writers” we find:


Writer name: 'Microsoft Hyper-V VSS Writer'
   Writer Id: {66841cd4-6ded-4f4b-8f17-fd23f8ddc3de}
   Writer Instance Id: {2fa6f9ba-b613-4740-9bf3-e01eb4320a01}
   State: [5] Waiting for completion
   Last error: Unexpected error

Bingo! Hello old “friend”, I know you! The Microsoft Hyper-V VSS Writer goes into an error state while hardware snapshots of the LUNs are being made, due to almost or completely full partitions inside the virtual machines. Take a look at this blog post on what causes this and how to fix it. As a result we can’t do live migrations anymore or Pause/Drain the node on which the hardware snapshots are being taken.

And yes, after fixing the disk space issue on the VM (a SDT who’s pumped the VM disks 99.999% full) the Hyper-V VSS writer gets out of the error state and the hardware provider can do its thing. After the snapshots had completed everything was fine and I could continue with my maintenance.
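
If you want to spot this state without eyeballing the output, the check can be scripted. What follows is a minimal sketch, not something from the original troubleshooting session: Get-UnstableVssWriter is a hypothetical helper name, and it assumes the English output of vssadmin list writers, where each writer block ends with a "Last error" line and a healthy writer reports "[1] Stable".

```powershell
# Hypothetical helper: returns the writers whose state is not "[1] Stable".
function Get-UnstableVssWriter {
    param(
        # Raw output lines, for example from: vssadmin list writers
        [string[]]$VssAdminOutput
    )

    $writer = $null
    foreach ($line in $VssAdminOutput) {
        if ($line -match "^\s*Writer name:\s*'(.+)'") {
            # A new writer block starts here
            $writer = [pscustomobject]@{ Name = $Matches[1]; State = $null; LastError = $null }
        }
        elseif ($writer -and $line -match '^\s*State:\s*(.+)$') {
            $writer.State = $Matches[1].Trim()
        }
        elseif ($writer -and $line -match '^\s*Last error:\s*(.+)$') {
            $writer.LastError = $Matches[1].Trim()
            # Assumption: "Last error" is the final line of a writer block
            if ($writer.State -notmatch '^\[1\]') { $writer }
            $writer = $null
        }
    }
}
```

On a node you would feed it the live output from an elevated prompt, for example Get-UnstableVssWriter -VssAdminOutput (vssadmin list writers).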

PowerShell: Monitoring DrainStatus of a Hyper-V Host & The Time Limited Value of Information In Beta & RC Era Blogs


I was writing some small PowerShell scripts to pause and resume Hyper-V cluster hosts and I wanted to monitor the progress of draining the virtual machines from the node when pausing it. I found this nice blog post about Draining Nodes for Planned Maintenance with Windows Server 2012, which discusses this subject and provides the properties to do just that.

It seems we have two common properties at our disposal: NodeDrainStatus and NodeDrainTarget.

image

So I set to work but I just didn’t manage to read those properties. It was like they didn’t exist. So I pinged Jeff Wouters, who happens to use PowerShell for just about anything, and asked him if it was me being stupid and missing the obvious. Well, I was missing the obvious for sure, as those properties do not exist. Jeff told me to double check using:

Get-ClusterNode MyNode -cluster MyCluster | Select-Object -Property *

Guess what, it’s not NodeDrainStatus and NodeDrainTarget but DrainStatus and DrainTarget.

image

What put me off here was the following example in the same blog post:

Get-ClusterResourceType "Virtual Machine" | Get-ClusterParameter NodeDrainMoveTypeThreshold

That should have been a dead giveaway, as we’ve been using MoveTypeThreshold a lot in recent months and there is no NodeDrain in that name either. But it just didn’t register. By the way, you don’t need to create the property either, it already exists. I guess this code was valid in some version (Beta?) but not anymore. You can just get and set the property like this:

Get-ClusterResourceType "Virtual Machine" -Cluster MyCluster | Get-ClusterParameter MoveTypeThreshold

Get-ClusterResourceType "Virtual Machine" -Cluster MyCluster | Set-ClusterParameter MoveTypeThreshold 2000

So, lessons learned: trust but verify. Don’t forget that a lot of things in IT have a time-limited value. Make sure to look at the date of what you’re reading and at which pre-RTM version of the product the information applies to.

To conclude here’s the PowerShell snippet I used to monitor the draining process.


# Pause the node and drain all roles off it
Suspend-ClusterNode -Name "crusader" -Cluster warrior -Drain

# Poll the drain status every second until it is no longer in progress
Do
{
    Write-Host (Get-ClusterNode -Name "crusader" -Cluster warrior).DrainStatus -ForegroundColor Magenta
    Start-Sleep -Seconds 1
}
Until ((Get-ClusterNode -Name "crusader" -Cluster warrior).DrainStatus -ne "InProgress")

# Report the final status when the drain completed
If ((Get-ClusterNode -Name "crusader" -Cluster warrior).DrainStatus -eq "Completed")
{
    Write-Host (Get-ClusterNode -Name "crusader" -Cluster warrior).DrainStatus -ForegroundColor Green
}

Which outputs

image

Understanding Virtual Machine Priority and Preemption Behavior


Introduction

By reading Aidan Finn’s blog post You Pause A Clustered Hyper-V Host And Low Priority VMs are QUICK MIGRATED! you will learn something about how virtual machine priorities work during the pausing and draining of a clustered Hyper-V host. They are either live or quick migrated depending on the value of the MoveTypeThreshold cluster parameter for resources of the type “Virtual Machine”. By default it’s at 2000, which happens to be the value of the virtual machine priority “Medium”, so VMs with priority “Low” fall below it and are quick migrated.

Changing this value can alter the default behavior. For example, setting the MoveTypeThreshold value to 1000 using PowerShell

Get-ClusterResourceType "Virtual Machine" | Set-ClusterParameter MoveTypeThreshold 1000

makes sure that only VMs with a priority set to “No Auto Restart” are quick migrated. The low priority machines would then also live migrate, where by default they quick migrate.

  • Virtual Machines with Priority equal to or higher than the value specified in MoveTypeThreshold will be moved using Live Migration.
  • Virtual Machines with Priority lower than the value specified in MoveTypeThreshold will be moved using Quick Migration.

Virtual Machine Priorities
3000 = High
2000 = Medium
1000 = Low
0 = Virtual machine does not restart automatically.
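
To make the rule concrete, here is a small sketch of the comparison described above. Get-DrainMoveType is a hypothetical helper written for illustration only. It does not query the cluster, it just applies the "priority equal to or higher than MoveTypeThreshold" rule to the four priority values.

```powershell
# Hypothetical helper: which migration type a VM gets when its node is drained
function Get-DrainMoveType {
    param(
        [int]$Priority,
        # Assumed default, as stated in the text above
        [int]$MoveTypeThreshold = 2000
    )

    if ($Priority -ge $MoveTypeThreshold) { 'Live' } else { 'Quick' }
}

$priorities = [ordered]@{ High = 3000; Medium = 2000; Low = 1000; NoAutoRestart = 0 }

foreach ($name in $priorities.Keys) {
    $default = Get-DrainMoveType -Priority $priorities[$name]
    $lowered = Get-DrainMoveType -Priority $priorities[$name] -MoveTypeThreshold 1000
    '{0,-14} threshold 2000: {1,-6} threshold 1000: {2}' -f $name, $default, $lowered
}
```

With the default threshold of 2000 only High and Medium come out as Live. With the threshold at 1000, Low joins them and only the "No Auto Restart" VMs are still quick migrated.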

Another Scenario to be aware of to avoid surprises

Note that all this also comes into play in other scenarios. One of them is when you attempt to start a guest that requires more resources than are available on the host. Preemption kicks in and the lower priority virtual machines go into a saved state. If you didn’t plan for this it could be a bit of a surprise, causing service interruption. What’s also important to know is that preemption kicks in even when there is no chance that putting lower priority virtual machines into a saved state will free enough resources for (all) the VMs you’re trying to start. So that service interruption might do you no good. If this is the case the Low priority VMs come back up when there are sufficient resources left. Do note however that the ones set to “No Auto Restart” remain in a saved state. Look below for an example of how this could happen.

How does this happen?

Let’s say you have a brand new VM that has been given 16GB of RAM as requested by the business. When that large memory guest starts it will fail because there are not enough memory resources available on the host, which only has 16GB available. But as it attempts to start, the need for memory resources is detected and preemption comes into play. The guests with “Low” and “No Auto Restart” priorities are put into a saved state as the large memory VM has the default Medium priority and the MoveTypeThreshold is at the default of 2000. You need to be aware of this behavior. Preemption kicks in and the machines are still saving when starting the large memory VM has already failed, as they couldn’t free enough resources anyway.

image

The good news, as you can see below, is that the low priority guest starts again after starting the large memory guest has failed. No use keeping it saved as it can run and service customers. So the service interruption for this VM is limited, but it does happen. Please also note that the guest set to No Auto Restart doesn’t come up again, as its priority status says exactly that. So this one becomes collateral damage.

image

As you can see it’s important to know how priorities and preemption work together and behave. It’s also good to know that changing the threshold comes into play in more situations than just pausing & draining a host or during a failover. While the cluster will try its best to keep as many VMs up and running as possible, you might see some unintended consequences under certain conditions. A good understanding of this can prevent you from being bitten here. So build a small, cheap lab so you can play with stuff. This helps to gain a better understanding of how features work and behave. If you want to play some more, set the priority of the memory hungry VM to High and you’ll see even more interesting things happen.

KB2803748 Failover Cluster Management snap-in crashes after you install update 2750149 on a Windows Server 2012-based failover cluster


When you install KB2750149 (An update is available for the .NET Framework 4.5 in Windows 8, Windows RT and Windows Server 2012) you’ll have an issue with the Cluster GUI.

Basically it shows an error message. The issue is caused by installing update 2750149 on a Windows Server 2012-based failover cluster node or on a management station running the Failover Cluster Management snap-in. In this situation, the Failover Cluster Management snap-in crashes. Do NOT worry, the entire cluster is fine. This is just a GUI bug that, after you close the error screen, leaves the GUI work/results pane blank and basically unusable.

clip_image002

The only known workaround was to uninstall the hotfix, or not install it at all, on any machine where you need to use the Cluster GUI (Windows 8 with RSAT for example). But now there is a fix released with KB2803748.

The update requires no reboot unless you have the Cluster GUI running, as that locks the files that need replacing. So keep it closed and you’re good to go. It’s also a great opportunity to use Cluster Aware Updating (CAU) with the hotfix plug-in to install the hotfix in an orchestrated fashion.

UPDATE: This update is now also available via WSUS. So updating is possible via the CAU Windows Update plug-in.

image

Remote File Browsing Issue In Windows Server 2012 Hyper-V Leaves Results Pane Empty Workaround


In Windows Server 2012 the Remote File Browsing functionality for Hyper-V acts up on some nodes, indicating a problem.

You can read what “Remote File Browsing” is on TechNet here. You use it to browse the file system on a remote Hyper-V server, when creating a new VM there for example.

Remote File Browsing is a shell namespace extension implemented by Hyper-V. It provides a way to browse the folders/files on a remote Hyper-V server without requiring the server to open an extra shell over the network.

The path "::{0907616E-F5E6-48D8-9D61-A91C3D28106D}\HYPER-V-TEST" tells the shell (Explorer or the common file dialog) that it is hosting/pointing to the RemoteFileBrowsing shell namespace extension on HYPER-V-TEST. The GUID is the Hyper-V RemoteFileBrowsing shell namespace extension GUID. However, due to a limitation of the common file browser, it cannot be translated into "Hyper-V Remote File Browsing".

Now in Windows Server 2012 we sometimes see the following when we use it:

image

It seems to work but the results pane remains empty. The cluster is healthy, the nodes are healthy, all nodes are identically configured. Some nodes have it, others don’t. We also can’t find any errors logged anywhere.

If you try to work around it using the UNC path, that will fail later due to security issues, so don’t even go there.

Basically we were a bit baffled (we could not reproduce it in the lab either) until we saw some posts on the forums, indicating we’re not the only ones seeing this.

http://social.technet.microsoft.com/Forums/en-US/winserverhyperv/thread/608d0c3b-0a7b-4ad9-9843-5e5051dcd526

http://social.technet.microsoft.com/Forums/en-US/winserverhyperv/thread/7a34f5e1-76bc-493a-8a7a-e9f420bf6a79#d7dd4db7-d7bd-419d-aa72-b12e43cd7a5d

If you know your cluster is perfectly healthy forget all the security settings stuff and go straight to testing this “fix” or rather workaround: Toggle Audit Object Access on and off.

In our case I can confirm that these nodes had been under a group policy that audited registry entries during a period in which we were troubleshooting network card settings change behavior. We had removed that policy by first reverting the settings to Not Configured and, after some days, removing the GPO. But that didn’t help. Even with no audit policy configured we had to go to all nodes showing this behavior, open the local Group Policy, toggle Audit Object Access on for Success, apply this and then revert it to No auditing again.

So fire up an MMC, add a snap-in

image

Select Group Policy Object

image

Accept the defaults

image

image

When done, navigate to Computer Configuration -> Windows Settings -> Security Settings -> Local Policies -> Audit Policy -> Audit Object Access

image

Now try to use Remote File Browsing again (close & reopen all wizard windows and start over anew) to see the results:

image

Success! All is well again.

Notes:

  • We only see this on systems remotely connecting to Windows Server 2012 Hyper-V nodes that are running Windows Server 2012 or Windows 8 themselves, not on Windows 2008 R2 or Windows 7 with the RSAT for W2K12 installed.
  • This is not limited to Server Core installations, so it isn’t caused by missing GUI components or something like that.

Logging Cluster Aware Updating Hotfix Plug-in Installations To A File Share


As an early adopter of Windows Server 2012 it’s not about being the first, it’s about using the great new features. When you leverage the Cluster Aware Updating (CAU) Plug-in to deploy hardware vendor updates, like those from DELL which are called DUPs (Dell Update Packages), you have the option to log the process via the parameter /L.

This looks like this in the config XML file for the CAU (I’ll address this XML file in more details later).

<Folder name="Optiplex980DUPS" alwaysReboot="false">
    <Template path="$update$" parameters="/S /L=\\zulu\CAULogging\CAULog.log"/>
</Folder>

As you can see I use a file share, as I don’t want to log locally because this would mean I’d have to collect the logs on all nodes of a cluster. Now if you log to a file share you need to do two things, which we’ll discuss below.

1. Set up a share where you can write the log or logs to

Please note that you cannot and should not use the CAU file share for this. First of all, only a few accounts are allowed to have write permissions to the CAU file share. This is documented in How CAU Plug-ins Work:

Only certain security principals are permitted (but are not required) to have Write or Modify permission. The allowed principals are the local Administrators group, SYSTEM, CREATOR OWNER, and TrustedInstaller. Other accounts or groups are not permitted to have Write or Modify permission on the hotfix root folder.

This makes sense. SMB Signing and Encryption are used to protect against tampering with the files in transit and to make sure you talk to the one and only real CAU file share. To protect the actual content of that share you need to make sure no one but some trusted accounts and a select group of trusted administrators can add installers to the share. If not, you might be installing malicious content on your cluster nodes without ever realizing it. Perhaps some auditing on that folder structure might be a good idea?

image_thumb61

This means that you need a separate file share so you can add modify or at least write permissions to the necessary accounts on the folder. Which brings us to the second thing you need to do.

2. Set up Write or Modify permissions on the log share

You’ll need to set up Write or Modify permissions on the log share for all cluster node computer accounts. To make this more practical with larger clusters you can add the computer accounts to an AD group, which makes for easier administration.

image_thumb61

The two nodes here have permissions to write to the location

image

As you can see the first node to create the log file is the owner:

image

Some extra tips

The log can grow quite large if used a lot. Keep an eye on it to avoid space issues and so it doesn’t get too big to handle and be useful. And for clarity’s sake you might use a different log per cluster or even per folder type. You can customize this to your needs.

Cluster Aware Updating – Cluster CNO Name 15 Characters (NETBIOS name length) GUI Issue


There seems to be a small bug in the Cluster Aware Updating GUI when the cluster name exceeds 15 characters. In our example we’ll look at a cluster with the name XXXCLUSSQLSERVERS or xxxclussqlservers.test.lab. We’ll try to connect to that cluster to do some cluster aware updating.

Click on the dropdown arrow and select our cluster

image

 

Once selected, click “Connect”

image

 

Now we’re greeted by this little message

image

No, you didn’t make a typo, as you selected the cluster from the drop down list. You also know that your cluster is up and running. So what happened? Well, the GUI queries AD and returns the CNOs it finds. Those are limited to the NetBIOS name and as such are at most 15 characters long. In this case the name is XXXCLUSSQLSERVERS and this gives a CNO of XXXCLUSSQLSERVE, which is not found as a cluster.

The fix is easy and simple. Just type in the full cluster name, XXXCLUSSQLSERVERS, and voila. You can connect and are on your way.

image

Let’s see if the FQDN is accepted as well, shall we? And yes, the below screenshot proves this.

http://workinghardinit.files.wordpress.com/2012/12/image43.png?w=584

Conclusion

So this is not a problem once you know this. The CAU GUI returns the cluster CNO name and that’s the NetBIOS name, which can be only 15 characters long. Selecting it in CAU to connect to the cluster doesn’t work. You need to fill out the complete name. As we demonstrated, the CAU GUI also accepts an FQDN. To prevent running into this issue, consider not making your cluster names longer than 15 characters, as then the CNO and the cluster name will be identical. That is a smart thing to do anyway, as you’ll avoid possible duplicate CNOs trying (and failing) to be created, or other bugs.

In PowerShell you always submit the cluster name so you don’t hit this issue. Perhaps the GUI drop down list could translate the CNOs into the actual cluster names?
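
The truncation is easy to check up front. The sketch below assumes, as described above, that the name shown in the drop down list is simply the first 15 characters of the cluster name. Get-CnoNetBiosName is a hypothetical helper and does not talk to AD or the cluster.

```powershell
# Hypothetical helper: the 15 character NetBIOS form of a cluster name
function Get-CnoNetBiosName {
    param([string]$ClusterName)

    # Drop a DNS suffix if an FQDN was passed in
    $shortName = ($ClusterName -split '\.')[0].ToUpper()

    # NetBIOS computer names hold at most 15 characters
    if ($shortName.Length -gt 15) { $shortName.Substring(0, 15) } else { $shortName }
}

$name = 'xxxclussqlservers.test.lab'
$netBios = Get-CnoNetBiosName -ClusterName $name

# -ne is case-insensitive, so only a real truncation triggers the warning
if ($netBios -ne ($name -split '\.')[0]) {
    Write-Host "The list will show $netBios, so type the full cluster name instead."
}
```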

Checking Host Integration Services Version on all Nodes of A Windows Server 2012 Hyper-V Cluster With PowerShell


It’s important to keep our Hyper-V cluster hosts and the virtual machines running on them up to date. Whilst we have great and free solutions to achieve this, there are some things missing, like centralized reporting on the Integration Services component version running on all of the nodes in a cluster and a way to upgrade all the virtual machines to the version running on the host. This post deals with the first issue.

Before we upgrade the Integration Services components on the virtual machines we always check if all nodes in the cluster are on the same version themselves. Sure, this should not happen if you manage them right, but my world isn’t perfect. So trust but verify. With cluster sizes now up to 64 nodes it’s ever more important to keep an eye on them. But even for smaller clusters the task of determining the Integration Services component version manually via the GUI, event viewer and/or registry is rather tedious. Out of sync Integration Services components can be troublesome and cause many issues, and if you have out of sync virtual machines, imagine the extra mess you’ll be in when even the cluster nodes are running different versions.

To make life easier I threw a little PowerShell script together to check the host Integration Services component version on all nodes of a Windows Server 2012 Hyper-V cluster. I’m far from a PowerShell guru, but you’ll see that you can get a lot of things done even if you’re not. I’m sharing it here for you to use, adapt to your own needs and get some inspiration from. It basically allows you to optionally pass an expected version of the IS components and a cluster name like this:

CheckHyperVClusterHostsICVersion -ExpectedISCVersion 6.2.9200.16433 -Cluster "MyClusterName"

It does the following:

  • It will list, per Integration Services component version found, which cluster nodes are running that version. This gives you a nice overview. I hope this never becomes too much of a list in your clusters.
  • If you don’t specify a cluster it will try to connect to the cluster to which the host you’re running on belongs, if any.
  • If the host does not belong to a cluster it will just provide feedback on the IS version of that Hyper-V host you’re running the script on.
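
The grouping in the first bullet boils down to a few lines. This is an illustrative sketch only: the node names and the older version number are made-up sample data, and Compare-IcVersion is a hypothetical helper, not part of the script further down. Casting to [Version] matters here, because comparing the values as plain strings would sort 9999 above 16433.

```powershell
# Hypothetical helper: compares two versions numerically, part by part
function Compare-IcVersion {
    param([Version]$Found, [Version]$Expected)

    if ($Found -eq $Expected) { 'equal to' }
    elseif ($Found -gt $Expected) { 'higher than' }
    else { 'lower than' }
}

# Made-up sample data: node name -> IS component version
$hostVersions = @{
    'node1' = '6.2.9200.16433'
    'node2' = '6.2.9200.16433'
    'node3' = '6.2.9200.16384'
}
$expected = '6.2.9200.16433'

# One output line per distinct version, listing the nodes that run it
$hostVersions.GetEnumerator() | Group-Object -Property Value | ForEach-Object {
    $nodes = ($_.Group | ForEach-Object { $_.Key } | Sort-Object) -join ', '
    $verdict = Compare-IcVersion -Found $_.Name -Expected $expected
    "Version $($_.Name) is $verdict the expected version and runs on: $nodes"
}
```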

Here’s a screen shot of when you run this on a non-clustered host, without Hyper-V installed:

image

This is the result of running it against a well maintained cluster without any parameters that has been updated with KB2770917:

image

The same but now with the expected version and cluster name passed as parameters

image

So, there you go, I hope you find it useful.

#===========================================================
# # Microsoft PowerShell Source File 
# 
# NAME:    CheckISCOnNodesOfHyperVCluster.ps1
# VERSION:    1.0.0.0

# AUTHOR:    Didier Van Hoye
# DATE :    17/11/2012
# 
# COMMENT:     This script is intended to be run 
#              against Windows Server 2012 and assumes
#            the use of PowerShell 3.0
#            The parameters are optional but if you
#            leave out some the remainder should be named.
# # =======================================================
 
cls
$ErrorActionPreference = "Stop"

 
function CheckHyperVClusterHostsICVersion
{
    Param
    (
        #Param help description
        [Version]
        $ExpectedISCVersion,
        #Param help description
        [String]
        $Cluster
    )

    Write-Host "This script will check the IS components on all nodes of a cluster." -ForegroundColor Green
 
    If ($ExpectedISCVersion) {Write-Host "You specified the expected IS component version to be $ExpectedISCVersion" -ForegroundColor Green}
    Else {Write-host "You did not specify an expected IS component version." -ForegroundColor Green}
    
    If ($Cluster)
    {
        Try
        {
            $ClusterObject= Get-Cluster -Name $Cluster
        }
        Catch
        {     
            Write-Host "We cannot contact the cluster you specified"
        }
    }
    Else
    {    
        write-Host "`n`n"
        Write-host "You did not specify a cluster to connect to. We'll use the cluster to which the node this script is running on belongs if any." -ForegroundColor Yellow
        write-Host "`n`n"
  
        Try
        {
            $ClusterObject = Get-Cluster
        }
  
        Catch
        {
            $LocalHost = $env:computername
            Write-Host
            Write-Host "The current node ($LocalHost) is not a member of a cluster. As a courtesy to you we'll check the IS components for the current host" -foregroundcolor Magenta
            Write-Host
        }
 
    }
  
    If ($ClusterObject) {$ToCheck= "the nodes of cluster $ClusterObject"} Else { $ToCheck = "server $env:computername"}
 
    write-Host "Attempting to run the Integration Components version check on" $ToCheck -ForegroundColor Green
    Write-Host


    If ($ClusterObject)
    {

        $ClusterNodes = Get-Clusternode -cluster $ClusterObject.Name
        
        #Declare an hashtable to hold all host/IS version values. The hosts are the key here.
        $HostISVersions = @{}
 
        foreach ($ClusterNode in $ClusterNodes)
        {
            Try
            {
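                 # Query each node remotely; reading the registry locally would report this host's version for every node
                 $HostISVersions[$ClusterNode.Name]=Invoke-Command -ComputerName $ClusterNode.Name -ScriptBlock {Get-ItemProperty "HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization\GuestInstaller\Version" | select -ExpandProperty Microsoft-Hyper-V-Guest-Installer}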
            }
            Catch
            {
            Write-Host "We could not determine the version of the Integration Services on this host, probably due to this not being a Hyper-V host" -ForegroundColor Yellow
            Write-Host "We'll check this for you right now" -ForegroundColor Yellow
            $HyperVFeature = Get-WindowsFeature Hyper-V -ComputerName $ClusterNode.Name
            If ($HyperVFeature.Installstate -eq "Installed")
            {
              Write-Host "Hyper-V seems to be installed on this node. Something else is wrong." -ForegroundColor Red
            }
            Else
            {
                Write-Host "Hyper-V is indeed not installed on this node." -ForegroundColor Yellow
            }
            }
        }
         #Use GetEnumerator or this sorting thing doesn't work out well on a hash table :-)
        $UniqueIcVersions = $HostISVersions.GetEnumerator() | Sort-Object -Property Value -Unique
 
        Write-Host "We've found " $UniqueIcVersions.count "versions on the" $HostISVersions.count "nodes of your cluster" $ClusterObject.Name
 
        ForEach ($IcVersion in $UniqueIcVersions )
        {
            $Counter = 1
            $IcVersionValue = $IcVersion.value
            "IC version " + $IcVersion.value + " is found in:"
            foreach ($Key in ($HostISVersions.GetEnumerator()| Where-Object { $_.value -eq $IcVersionValue}))
            {
                "`t" + "$Counter : " + $Key.Name
                $Counter= $Counter + 1
            }
 
            If ($ExpectedISCVersion)
            {
               
                $CompareVersions = ([Version]$IcVersion.Value).CompareTo([Version]$ExpectedISCVersion)
                        
                switch ($CompareVersions)
                {
                    0 {Write-Host "This version ($IcVersionValue) is equal to the expected version ($ExpectedISCVersion)." -ForegroundColor Green}
                    1 {Write-Host "This version ($IcVersionValue) is higher than the expected version ($ExpectedISCVersion). Please ensure all hosts run the same IC version level." -ForegroundColor Yellow}
                    -1 {Write-Host "This version ($IcVersionValue) is lower than the expected version ($ExpectedISCVersion). Please ensure all hosts run the same IC version level." -ForegroundColor Red}
                }
            }

        }
    }

    Else
    {
        Try
        {
            $HostIcVersion = Get-ItemProperty "HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Virtualization\GuestInstaller\Version" | select -ExpandProperty Microsoft-Hyper-V-Guest-Installer
            Write-Host "The IS component version on server $env:computername is $HostIcVersion"
            If ($ExpectedISCVersion)
            {
               
                   $CompareVersions = ([Version]$HostIcVersion).CompareTo([Version]$ExpectedISCVersion)
                        
                switch ($CompareVersions)
                {
                0 {Write-Host "This version ($HostIcVersion) is equal to the expected version ($ExpectedISCVersion)." -ForegroundColor Green}
                1 {Write-Host "This version ($HostIcVersion) is higher than the expected version ($ExpectedISCVersion). Please check if you need to downgrade your host or if the expected version is correct." -ForegroundColor Yellow}
                -1 {Write-Host "This version ($HostIcVersion) is lower than the expected version ($ExpectedISCVersion). Please check if you need to upgrade your host or if the expected version is correct." -ForegroundColor Red}
                }
            }
        }
        Catch
        {
            Write-Host "We could not determine the version of the Integration Services on this host, probably due to this not being a Hyper-V host" -ForegroundColor yellow
            Write-Host "We'll check this for you right now" -ForegroundColor yellow
            $HyperVFeature = Get-WindowsFeature Hyper-V
            If ($HyperVFeature.Installstate -eq "Installed")
            {
                Write-Host "Hyper-V seems to be installed on this node. Something else is wrong." -ForegroundColor Red
            }
            Else
            {
                Write-Host "Hyper-V is indeed not installed on this node." -ForegroundColor yellow
            }
        }
    }
}
 
CheckHyperVClusterHostsICVersion -ExpectedISCVersion 6.2.9200.16433 -Cluster "MyClusterName"

Cluster Validation Failure while setting up a Windows 2012 Continuous Available File Share: The password does not meet the password policy requirements


We were installing a Windows Server 2012 cluster in a W2K8R2 domain and while we were checking out our work by running the cluster validation we got one warning we’ve never seen before:

Validate CSV Settings

Description: Validate that settings and configuration required by Cluster Shared Volumes are present. This test can only be run with an administrative account, and it only tests servers that are cluster nodes.

Start: 9/24/2012 5:01:18 PM.

Validating Server Message Block (SMB) share access through the IP address of the fault tolerant network driver for failover clustering (NetFT), and connecting with the user account associated with validation.

Begin Cluster Shared Volumes support testing on node server1.test.lab.

Failure while setting up to run Cluster Shared Volumes support testing on node server1.test.lab: The password does not meet the password policy requirements. Check the minimum password length, password complexity and password history requirements.

Begin Cluster Shared Volumes support testing on node server2.test.lab.

Failure while setting up to run Cluster Shared Volumes support testing on node server2.test.lab: The password does not meet the password policy requirements. Check the minimum password length, password complexity and password history requirements.

This test requires more than one node. If your cluster contains more than one node, please run validation tests again with more than one node specified.

Now as it turns out this Active Directory domain does enforce some lengthy and complex passwords. By doing so they are basically driving the admins to use pass sentences, which are a lot more secure. That also means that the account we are using to run the validation has a password of adequate length & complexity.

So, what if we tune down the password length requirements and then run GPUPDATE from an elevated command prompt on all nodes of the cluster? Bingo! The cluster validation now passes with flying colors.

I’m guessing that perhaps the local cluster account (CLIUSR) doesn’t have a strong enough password to meet the requirements. But this is just guessing. This is the account that is involved in reducing the cluster’s dependency on Active Directory, so that CSV for example can come online even if there is no domain controller to contact. Hence my guess that this is related. This did not happen in a lab environment, so I’m not going to change the password on all nodes to a more complex one. That is for a lab.

image

Continuously Available File Shares Don’t Support Short File Names – "The request is not supported" & “CA failure – Failed to set continuously available property on a new or existing file share as Resume Key filter is not started.”


You might get the following error while trying to create a Continuously Available File Share in Windows Server 2012: "The request is not supported".

On top of that you may find this entry in the Microsoft-Windows-SmbServer/Operational event log:

Log Name:      Microsoft-Windows-SmbServer/Operational
Source:        Microsoft-Windows-SmbServer
Date:          24/09/2012 17:56:59
Event ID:      1801
Task Category: (1801)
Level:         Error
Keywords:      (8)
User:          SYSTEM
Computer:      server1.lab.test
Description:
CA failure – Failed to set continuously available property on a new or existing file share as Resume Key filter is not started.

image

First of all, check with fsutil whether short file names are enabled on the volumes on which you are trying to create the continuously available file share:

  • Log on to the node running the File role and open an elevated command prompt to run the following on the volume/partition in play, F: in this example.

fsutil 8dot3name query F:
The volume state is: 0 (8dot3 name creation is enabled).
The registry state is: 2 (Per volume setting – the default).
Based on the above two settings, 8dot3 name creation is enabled on F:

  • I chose to manage short file names per volume, so I set the system-wide registry state to 2

fsutil 8dot3name set 2
The registry state is now: 2 (Per volume setting – the default).

  • Disable short file names on the volume at hand

fsutil 8dot3name set f: 1
Successfully disabled 8dot3name generation on f:

  • Remove any short file names present on this volume

fsutil 8dot3name strip f:
Scanning registry…
Total affected registry keys:                   0
Stripping 8dot3 names…
Total files and directories scanned:            6
Total 8dot3 names found:                        3
Total 8dot3 names stripped:                     3
For details on the operations performed please see the log:
"C:\Users\USER~1\AppData\Local\Temp\2\8dot3_removal_log @(GMT 2012-09-24 18-40-05).log"

  • Now, move the role over to the next node to rinse & repeat

fsutil 8dot3name set 2
The registry state is now: 2 (Per volume setting – the default).

fsutil 8dot3name set f: 1
Successfully disabled 8dot3name generation on f:

fsutil 8dot3name query f:
The volume state is: 1 (8dot3 name creation is disabled).
The registry state is: 2 (Per volume setting – the default).
Based on the above two settings, 8dot3 name creation is disabled on f:

fsutil 8dot3name strip f:
Scanning registry…
Total affected registry keys:                   0
Stripping 8dot3 names…
Total files and directories scanned:            6
Total 8dot3 names found:                        0
Total 8dot3 names stripped:                     0
For details on the operations performed please see the log:
"C:\Users\USER~1\AppData\Local\Temp\3\8dot3_removal_log @(GMT 2012-09-24 18-44-36).log"

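If you have more than a couple of volumes or nodes to check, eyeballing the query output gets old fast. Here is a minimal Python sketch that parses the text printed by fsutil 8dot3name query, using the two outputs shown above as samples. The function name parse_8dot3_query and the dictionary keys are my own invention for illustration, and the sketch assumes the English output format shown in this post.

```python
import re


def parse_8dot3_query(output: str) -> dict:
    """Parse the text printed by `fsutil 8dot3name query <volume>`.

    Hypothetical helper: it only understands the English output
    format shown in this post.
    """
    volume_state = re.search(r"The volume state is:\s*(\d)", output)
    registry_state = re.search(r"The registry state is:\s*(\d)", output)
    # Only the final summary line contains "... is enabled|disabled on <volume>".
    summary = re.search(
        r"8dot3 name creation is (enabled|disabled) on (\S+)", output
    )
    if not (volume_state and registry_state and summary):
        raise ValueError("unrecognised fsutil 8dot3name query output")
    return {
        "volume": summary.group(2),
        "volume_state": int(volume_state.group(1)),
        "registry_state": int(registry_state.group(1)),
        "short_names_enabled": summary.group(1) == "enabled",
    }


# Sample outputs copied from the steps above (before and after the fix).
BEFORE = """The volume state is: 0 (8dot3 name creation is enabled).
The registry state is: 2 (Per volume setting - the default).
Based on the above two settings, 8dot3 name creation is enabled on F:"""

AFTER = """The volume state is: 1 (8dot3 name creation is disabled).
The registry state is: 2 (Per volume setting - the default).
Based on the above two settings, 8dot3 name creation is disabled on f:"""

if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        result = parse_8dot3_query(text)
        verdict = "NOT ready" if result["short_names_enabled"] else "ready"
        print(f"{label}: {result['volume']} is {verdict} for a CA share")
```

On a real node you would feed it the captured output of fsutil 8dot3name query F: from an elevated prompt (for example via subprocess.run). I kept that out of the sketch so it runs anywhere.
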
I know this now because I hit the wall on this one and Claus Joergensen at Microsoft pointed me to the solution. He actually blogged about this as well, but it never really registered with me until today.

Disable 8.3 name generation

SMB Transparent Failover does not support cluster disks with 8.3 name generation enabled. In Windows Server 2012 8.3 name generation is disabled by default on any data volumes created. However, if you import volumes created on down-level versions of Windows or by accident create the volume with 8.3 name generation enabled, SMB Transparent Failover will not work. An event will be logged in (Applications and Services Log – Microsoft – Windows – ResumeKeyFilter – Operational) notifying that it failed to attach to the volume because 8.3 name generation is enabled.

You can use fsutil to query and set the state of 8.3 name generation system-wide and on individual volumes. You can also use fsutil to remove previously generated short names from a volume.
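
To make the interplay between the system-wide registry state and the per-volume flag a bit more concrete, here is a small Python sketch of how I understand the two settings to combine. The function short_names_effective and its parameters are hypothetical names for illustration. The meaning of registry states 0 to 3 is taken from the fsutil 8dot3name help text as I understand it, so verify it against your own build.

```python
def short_names_effective(registry_state: int,
                          volume_flag: int,
                          is_system_volume: bool = False) -> bool:
    """Return True when 8.3 name creation is effectively enabled on a volume.

    registry_state: system-wide setting (0-3) reported by fsutil.
    volume_flag:    per-volume setting, 0 = enabled, 1 = disabled.
    Hypothetical helper, a sketch of my reading of the fsutil help text.
    """
    if registry_state == 0:      # enabled on all volumes
        return True
    if registry_state == 1:      # disabled on all volumes
        return False
    if registry_state == 2:      # per volume: the volume's own flag decides
        return volume_flag == 0
    if registry_state == 3:      # disabled on all volumes except the system volume
        return is_system_volume
    raise ValueError(f"unknown registry state: {registry_state}")


if __name__ == "__main__":
    # The situation from this post: registry state 2, volume flag 0 on F:
    print(short_names_effective(2, 0))
    # After 'fsutil 8dot3name set f: 1'
    print(short_names_effective(2, 1))
```

With the registry state at 2, as in my case, only the per-volume flag matters, which is why disabling it on F: was enough.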

There’s also a little note here: http://support.microsoft.com/kb/2709568

SMB Transparent Failover

Both the SMB client and SMB server must support SMB 3.0 to take advantage of the SMB Transparent Failover functionality.
SMB 1.0- and SMB 2.x-capable clients will be able to connect to, and access, shares that are configured to use the Continuously Available property. However, SMB 1.0 and SMB 2.x clients will not benefit from the SMB Transparent Failover feature. If the currently accessed cluster node becomes unavailable, or if the administrator makes administrative changes to the clustered file server, the SMB 1.0 or SMB 2.x client will lose the active SMB session and any open handles to the clustered file server. The user or application on the SMB client computer must take corrective action to reestablish connectivity to the clustered file share.
Note SMB Transparent Failover is incompatible with volumes enabled for short file name (8.3 file name) support or with compressed files (such as NTFS-compressed files).

Frankly, all my testing of Continuously Available shares, from the BUILD conference till the RTM setups, has been green field, meaning squeaky clean, brand new LUNs. So this time, in real life, with a LUN that has a history in a Windows 2008 R2 environment, I got bitten.

So, read, read and then read some more :-) is my advice, and be grateful for the help of patient and knowledgeable people.

Anyway, it’s full steam ahead here once again, getting the most out of our Software Assurance by leveraging everything we can in Windows Server 2012.