Concluding My Summit, Conference & Community Engagements for 2014


After Redmond (MVP Global Summit 2014), which was a great experience, I flew to Berlin to attend and speak at the Microsoft Technical Summit 2014 on “What’s New In Windows Server 2012 R2 Clustering”. Germany has a seriously engaged ITPro & Dev scene, that’s for sure, and the session room was packed! Afterwards some interesting questions popped up in the hallways. That’s great, as questions really make us think about technologies and solutions from other viewpoints and perspectives.

image

After Berlin I was off to Experts Live 2014 in Ede (The Netherlands) where I presented on “The Capable & Scalable Cloud OS”. The talk went well and I had a great crowd attending, with whom I had some good chats after the session.

image

That concluded the third leg of my international road tour where I invest in myself, the community & the people I work with. Never ever stop learning Smile. Normally this also concludes my traveling schedule for 2014, unless I’m needed/requested somewhere to help out. Being an MVP is about sharing in the community. The only way to prosper is to share the knowledge, the experience and the wealth. It provides for a healthy ecosystem from which we all reap the benefits. This should be promoted and facilitated. There is too much expertise & knowledge not being leveraged because it’s economically unfeasible, and that’s a waste when people are screaming for IT skills. In a war for talent, any waste is surely very counterproductive?

Golden Nuggets: Windows Server 2012 R2 Failover Cluster CSV Placement Policy


Some enhancements only become truly evident to people when they see them in action. For many features this means something needs to go wrong before they kick in. Others are more visible during normal operations. This is the case with the CSV enhancements in Windows Server 2012 R2 Failover Clustering.

One golden nugget here is the CSV placement policy (which really shines in combination with SOFS/Storage Spaces). This will spread ownership of the CSVs amongst the cluster nodes to ensure a balanced distribution. In a failover cluster, one node is the “coordinator node” (owner) for a CSV. The coordinator node owns the physical disk resource that is associated with a logical unit (LUN). All I/O operations for the file system on that LUN go through the coordinator node. In previous versions there was no automatic rebalancing of coordinator node assignment. This means that all LUNs could potentially be owned by the same node. In Storage Spaces & SOFS scenarios this becomes even more important.

The benefits

  • It helps all nodes carry their share of the workload as it load balances the disk I/O.
  • Failovers of CSV owners are potentially quicker and more predictable/consistent as an even distribution ensures that no one node owns a disproportionate number of CSVs.
  • When losing storage access the number of CSVs that are in redirected mode is potentially smaller as they are evenly distributed. In an unbalanced cluster it could be all of them in a worst-case scenario.
  • When using SOFS with Storage Spaces it makes sure the Storage Spaces Ownership is distributed fairly.

When does it happen

  • Each time a node leaves or joins the cluster. This means you don’t need to intervene manually or via PowerShell to get an even distribution. This goes both for exiting nodes and for newly added ones. A new node will get a CSV assigned if there is a surplus on one of the existing nodes.
  • The process also works when you start a failover cluster after it has been shut down.

When customers see this in action (it’s most obvious when they add a node, as that’s when they are normally watching) they generally smile as the cluster does its job, getting the best possible results out of their hardware.
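If you want to see the distribution for yourself, a quick look at the coordinator nodes is all it takes. A minimal sketch (the disk resource and node names are just placeholders from my lab; Move-ClusterSharedVolume is only there should you ever want to rebalance by hand):

# List every CSV with its current coordinator (owner) node
Get-ClusterSharedVolume | Select-Object Name, OwnerNode, State

# Move ownership of a CSV manually, which you normally no longer need to do
Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node NodeB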

Windows Server 2012 R2 Clustering brings improved CSV diagnosability


Cluster Shared Volumes (CSV) can go into different redirected access modes for several reasons. Now, a lot of people get (or got) worried about seeing “redirected access” in the GUI. Most of the time however this is due to normal operations such as backups or maintenance (defragmentation), not just the loss of disk access.

To remediate unneeded troubleshooting, sometimes leading to real issues, calls to MSFT support and so forth, it was hidden from the Failover Clustering GUI in Windows Server 2012 R2. OK, so goal achieved, but how do we now troubleshoot and view redirected access that might indicate the presence of real issues? The answer to that is the Get-ClusterSharedVolumeState PowerShell cmdlet. It displays the state of the CSVs on a per node basis for a cluster. You’ll see the type of the I/O (Direct, File System Redirected and Block Redirected), whether it’s completely unavailable, as well as the reason.
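Here’s a hedged example of how I tend to use it in the lab; the exact property names you’ll want to select may differ a bit between builds, so treat this as a sketch:

# Show, per node, how each CSV is being accessed and why it is redirected (if it is)
Get-ClusterSharedVolumeState |
    Select-Object Name, Node, StateInfo, FileSystemRedirectedIOReason, BlockRedirectedIOReason |
    Format-Table -AutoSize

# Or zoom in on a single CSV ("Cluster Disk 2" is a placeholder name)
Get-ClusterSharedVolumeState -Name "Cluster Disk 2" | Format-List *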

This is what the output looks like on a two node cluster where node A has lost its storage path or paths (MPIO) to the CSV. You’ll see that both CSVs are in redirected access. Not only that, but you can see what type (block redirected) and why (no disk connectivity).

image

Pretty neat and clear. I love this functionality by the way, and it’s why I’m leveraging 10Gbps Ethernet extensively to make sure that CSV traffic gets the bandwidth & latency to handle what it has to. If you realize it leverages SMB3, which provides SMB Multichannel and SMB Direct, you know it will get the job done for you in your time of need.

While this is happening, in the GUI you’ll see this:

image

Nothing is going on … or so it would seem, so a bit of monitoring and alerting would be of use here. The good news is that finding out what’s up is very straightforward now.

Now there is still a case where you’ll see that the CSV is in redirected access mode, and that’s when you’ve put it there yourself via the GUI

image

or via PowerShell for maintenance reasons.

image

As you can see the icon has changed to a networked disk one and it states “Redirected Access”. With Get-ClusterSharedVolumeState the output looks like this.

image

You’ll always see warnings in the event logs.

image

So monitor those with SCOM or another tool that suits your taste and you’ll be in good shape to react when it’s needed, and you now know how to find out what’s going on.

First experiences with a rolling cluster upgrade of a lab Hyper-V Cluster (Technical Preview)


Introduction

In vNext we have gotten a long awaited & very welcome new capability: rolling cluster upgrades, which for the Hyper-V role is a 100% zero downtime experience. The only step that will require some downtime is the upgrade of the virtual machine configuration files to vNext (version 5 to 6), as the VM has to be shut down for this.

How to

The process for a rolling upgrade is so straightforward I’ll just give you a quick bullet list of the first part of the process (a rough PowerShell sketch follows the list):

  • Evacuate the workload from the cluster node you’re going to upgrade
  • Evict the node to upgrade to vNext from the cluster
  • Upgrade (no in place upgrade supported but in your lab you can get away with it)
  • Add the upgraded node to the cluster
  • Rinse & repeat until all nodes have been upgraded (that can take a while with larger clusters)
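In PowerShell that loop looks roughly like this (a hedged sketch; NODE2 is a placeholder for whatever node you’re working on):

# Drain the workload off the node and pause it
Suspend-ClusterNode -Name NODE2 -Drain

# Evict it from the cluster, rebuild it with the vNext bits, then add it back
Remove-ClusterNode -Name NODE2
# ... reinstall the OS, add the Hyper-V & clustering roles, reconfigure networking/storage ...
Add-ClusterNode -Name NODE2

# Rinse & repeat for the remaining nodes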

Please note that all administration you do on a cluster in mixed mode should be done from a node running vNext or a system running Windows 10 with the vNext RSAT installed.

Once you’ve upgraded all nodes in the cluster, the situation you’re in is basically that you’re running a Windows Server vNext Hyper-V cluster at cluster functional level 8 (W2K12R2) and the next step is to upgrade to 9, which is vNext. No, there’s no 10 yet in server Winking smile

You do this by executing the Update-ClusterFunctionalLevel cmdlet. This is an online process. Again, do this from a node running vNext or a system running Windows 10 with the vNext RSAT installed. Note that this is where you commit to the vNext level for the cluster. That’s where you want to go, but you get to decide when. Once you’ve done this you can’t go back to W2K12R2. As a matter of fact, as long as you’re running cluster functional level 8, you can reverse the entire process. Talk about having options! I like having options, just ask Carsten Rachfahl (@hypervserver), he’ll tell you it’s one of my mantras.

image

When this goes well you can easily check the cluster functional level as follows:

image
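In plain text, those two steps look roughly like this (a hedged sketch from my lab notes):

# Commit the cluster to the vNext functional level; this is the point of no return
Update-ClusterFunctionalLevel

# Verify the result: 8 = Windows Server 2012 R2, 9 = vNext
(Get-Cluster).ClusterFunctionalLevel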

When this is done you can do the upgrade of the VM configuration by running the Update-VMConfigurationVersion cmdlet. This is an offline process, where the VMs you’re updating have to be shut down. You can do this for just one VM, all of them or anything in between. This is when you decide you’re committing to all the goodness vNext brings you. But the fact that you have some time before you need to do it means you can easily get those machines to run smoothly on a W2K12R2 cluster in case you need to roll back. Remember, options are good!
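For a single VM that looks more or less like this (DemoVM is a placeholder name, and I’m assuming the cmdlet follows the usual -Name parameter pattern of the other Hyper-V cmdlets in the preview bits):

# The VM must be off before its configuration version can be upgraded
Stop-VM -Name DemoVM
Update-VMConfigurationVersion -Name DemoVM
Start-VM -Name DemoVM

# Check the new configuration version
Get-VM -Name DemoVM | Select-Object Name, Version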

Doing so updates the VM version from 5 to 6 and enables new Hyper-V features (hit F5 a lot or reopen Hyper-V Manager to see the value change).

image

image

  • Note: if in the lab you’re running some VMs on a cluster node that are not highly available (i.e. they’re not clustered), they cannot be updated until the cluster functional level has been upgraded to version 9.

Defragmenting your CSV Windows 2012 R2 Style with Raxco Perfect Disk 13 SP2


When it comes to defragmenting CSVs, it seemed we took a step back in terms of support from 3rd party vendors. While Windows provides a great toolset to defragment a CSV, it seemed to have disappeared from 3rd party vendor software. Even from the really good Raxco Perfect Disk. They did have support for this with Windows 2008 R2 and I even mentioned that in a blog post.

If you need information on how to defragment a CSV in Windows 2012 R2, look no further. There is an absolutely fantastic blog post on the subject, How to Run ChkDsk and Defrag on Cluster Shared Volumes in Windows Server 2012 R2, by Subhasish Bhattacharya, one of the program managers in the Clustering and High Availability product group. He’s a great guy to talk shop with by the way, if you ever get the opportunity to do so. One bizarre thing is that this must be the only place where PowerShell (the Repair-ClusterSharedVolume cmdlet) is deprecated in favor of chkdsk.
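From memory, the flow described there boils down to something like this (a hedged sketch; the resource name and mount point are placeholders from my lab, do read Subhasish’s post for the authoritative steps):

# Put the CSV in redirected access so defrag can do its work
Suspend-ClusterResource -Name "Cluster Disk 1" -RedirectedAccess

# Defragment the CSV via its mount point
Defrag.exe C:\ClusterStorage\Volume1 /U /V

# Back to direct I/O when done
Resume-ClusterResource -Name "Cluster Disk 1"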

3rd party wise the release of Raxco Perfect Disk 13 SP2 brought back support for defragmenting CSV.

image

I don’t know why it took them so long, but the support is here now. It looks like they struggled to get CSVFS (the way CSVs are done since Windows Server 2012) supported. While at it, they threw in support for ReFS by the way. This is the first time I’ve ever seen this. Anyway, it’s here and that’s good, because I have a hard time accepting that any product (whatever it does) supports Hyper-V if it can’t handle CSVs, not if you want to be taken seriously anyway. No CSV support equals the do-not-buy list in my book.

Here’s a screenshot of Perfect Disk defragmenting away. One of the CSV LUNs in my lab is an SSD and the other an HDD.

image

Notice that in Global Settings you can tweak the optimization behavior for various drive types, including CSVFS, but you should just leave the defaults on unless you like manual labor or love PowerShell so much you can’t forgo any opportunity to use it Winking smile

image

Perfect Disk cannot detect what kind of disks you have behind the CSV LUN, so you might want to change the optimization method if you’re running SSD instead of HDD.

image

I’d love for Raxco to comment on this or point to some guidance.

What would also be beneficial to a lot of customers is guidance on defragmentation on the different auto-tiering storage arrays. That would make for a fine discussion I think.

Migrate an old file server to a transparent failover file server with continuous availability


This is not a step by step “How to” but we’ll address some things you need to do and the tips and tricks that might make things a bit smoother for you.

1) Disable Short file names & Strip existing old file names

Never mind that this is needed to be able to do continuous availability on a file share cluster. You should have done this a long time ago. For one, it enhances performance significantly. It also makes sure that no crappy apps that require short file names to function can be introduced into the environment. While I’m an advocate for mutual agreements, there are many cases where you need to protect users and the business against themselves. Being too much of a politician as a technologist can be very bad for the company, as it allows bad workarounds and technology debt to be introduced. Stand tall!

Read up on this here: Windows Server 2012 File Server Tip: Disable 8.3 Naming (and strip those short names too). Next to Jose’s great blog, read Fsutil 8dot3name on how to do this.

If you still have applications that depend on short file names you need to isolate and virtualize them immediately. I feel sorry for you that this situation exists in your environment and I hope you get the necessary means to deal with it swiftly and decisively by getting rid of these applications. Please see The Zombie ISV® to be reminded why.

Some tips (a command sketch follows the list):

  • Only use the /F switch if it’s a non-system disk and you can afford to do so, as you’re moving the data LUN to a new server anyway. Otherwise you might run into issues. See the example below.
image
  • If you stumble on paths that are too long, intervene. Talk to the owners. We got people to reduce “Human Resources Planning And Evaluations” subfolder & file names to HRMPlanEval. You get the gist, trim them down.
  • You’ll have great success on most files & folders, but not if they are open. Schedule a maintenance window to make sure you can run without anyone connected to the shares (stop LanManServer during that maintenance window).
image
  • Also verify no other processes are locking any files or folders (antivirus, backups, sync tools etc.)
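Putting the above together, the commands look more or less like this (a sketch; L: is the data LUN being migrated and the paths are examples):

fsutil 8dot3name query L:                                    # check the current 8.3 name setting for the volume
fsutil 8dot3name set L: 1                                    # disable creation of new short names on L:
fsutil 8dot3name strip /s /f /l C:\Temp\StripL.log /v L:\    # strip existing short names recursively and log what was touched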

2) Convert MBR disks to GPT if you can

With ever growing amounts of data to store and protect this makes sense. I’m not saying you need to start doing 64TB disks today, but making sure you can grow beyond 2TB is smart. It doesn’t cost anything when you start out with GPT disks from the beginning. If you have older LUNs you might want to use the migration as an opportunity to convert MBR LUNs to GPT. That means copying the data and all NTFS permissions.

Please see NTFS Permissions On A File Server From Hell Saved By SetACL.exe & SetACL Studio for some tools that might help you out when you run into NTFS/ACL permission issues and for parsing logs during this operation.

Here’s a useful Robocopy command to start out with (it mirrors the folder tree including the NTFS security, uses 16 copy threads, skips hidden & system files, excludes the System Volume Information and recycle bin folders and logs everything):

ROBOCOPY L:\ V:\ /MIR /SEC /Z /R:2 /W:1 /V /TS /FP /NP /XA:SH /MT:16 /XD "System Volume Information" *RECYCLE* /LOG:"D:\RoboCopyLogs\MBR2GPTLUNL2V.txt"

3) Dump the existing shares on the old file server into a text file for documentation and use on the new file server

Pre Windows Server 2012 the new SMB cmdlets don’t work, but no fear, we have some other tools to use. Using NET SHARE does work, and with it you can also show the hidden and system shares, but the layout is a bit of a mess. I prefer to use:

Get-WmiObject -Class Win32_Share > C:\temp\OldFileServerShares

It’s fast, complete and the layout is very user friendly, which is what I need for easy use with PowerShell on the W2K12R2 file server. Some of you might say, what about the share security settings? 1) We’re going to cluster, so exporting these from the registry doesn’t work, and 2) you should have kept this plain vanilla and done security via the NTFS permissions on the folder structure only. But hey, I’m a nice guy, so here’s a link to a community PowerShell script if you need to find out the share permissions: http://gallery.technet.microsoft.com/scriptcenter/List-Share-Permissions-83f8c419 I do however encourage you to use this time to consider just using security on NTFS.

4) Create the clustered file shares

Amongst the many gems in Windows Server 2012 R2 are the new SMB PowerShell cmdlets. They are a simple and great way to create clustered file shares. Read up on these SMB Share Cmdlets and especially New-SmbShare.

When you’ve unmapped the LUNs from the old file server and exposed them to the new file server cluster you’re ready to go. You can even reorganize the shares, consolidate to fewer but bigger LUNs and, by just adapting the path to the share in the script, make sure the users are not confused or need to learn new shares and change how & what they connect to. Here it goes:

New-SmbShare -Name "TEST2" -path "T:\Shares\TEST2" -fullaccess Everyone -EncryptData $True -FolderEnumerationMode AccessBased -ConcurrentUserLimit 0 -ScopeName TF-FS-MIG

First and foremost, this is where the good practice of not micromanaging file share permissions will pay back big time. If you have done security via NTFS permissions with the AG(U)DLP principle on your folder structure, granting access should be a breeze, right?

Before you ask, no, you can’t do the old trick of importing the registry export of the shares and their security settings from the old file server when you’re going to cluster the file shares. That might sound bad, but with some preparation and the PowerShell I demonstrated above you’ll find it easy enough.
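That preparation could be as simple as this hedged sketch: dump the shares on the old server as a CSV instead of plain text, then loop over it on the cluster to recreate them. The T:\Shares target, the L: to T: path translation and the TF-FS-MIG scope name are just the examples from above, adapt them to your own consolidation plan.

# On the old file server: export the plain disk shares (Type 0) with their paths
Get-WmiObject -Class Win32_Share |
    Where-Object { $_.Type -eq 0 } |
    Select-Object Name, Path |
    Export-Csv C:\Temp\OldFileServerShares.csv -NoTypeInformation

# On the new cluster: recreate each share under the new LUN layout
Import-Csv C:\Temp\OldFileServerShares.csv | ForEach-Object {
    New-SmbShare -Name $_.Name -Path ($_.Path -replace '^L:', 'T:\Shares') `
        -FullAccess Everyone -FolderEnumerationMode AccessBased -ScopeName TF-FS-MIG
}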

5) Recuperate old file server name (Optional)

After you have decommissioned the old file server you could use a cluster alias to keep the old file server UNC path. This has the drawback that you will fall back to connecting to the SMB shares via NTLM, as aliases don’t support Kerberos authentication. But there is another trick. Once you’ve gotten rid of the old server object in AD you can rename the clustered file server role. If you can do this you’ll be able to keep Kerberos for authentication.

So after you’ve gotten rid of the old server in Active Directory, go to the file server role. Select properties and rename it to recuperate the old file server name.

image

Now look at the resources tab. Right click and select the properties tab of “Server Name”. Rename the DNS Name. That will update the server name and the DNS record. This will cause the role to go down temporarily.

image

Right click and select the properties tab of “File Server”. Rename the UNC path to reflect the older file server name.

image

For good measure and to test that everything works: stop and restart the cluster role, connect to the shares and voila, life should be good. Users can access the transparent failover file server like they used to do with the old non-clustered file server and they don’t sacrifice Kerberos to be able to do so!

image

Conclusion

I hope you enjoyed the tips and pointers on migrating an old file server to a Windows Server 2012 R2 file share cluster. Remember that these tips apply for various permutations of P2V, V2V as well as P2P migrations.

SMB 3, ODX, Windows Server 2012 R2 & Windows 8.1 perform magic in file sharing for both corporate & branch offices


SMB 3 for Transparent Failover File Shares

SMB 3 gives us lots of goodies and one of them is Transparent Failover which allows us to make file shares continuously available on a cluster. I have talked about this before in Transparent Failover & Node Fault Tolerance With SMB 2.2 Tested (yes, that was with the developer preview bits after BUILD 2011, I was hooked fast and early) and here Continuously Available File Shares Don’t Support Short File Names – "The request is not supported" & “CA failure – Failed to set continuously available property on a new or existing file share as Resume Key filter is not started.”

image

This is an awesome capability to have. This also made me decide to deploy Windows 8 and now 8.1 as the default client OS. The fact that maintenance (it’s the Resume Key filter that makes this possible) can now happen during day time, and patches can be done via Cluster Aware Updating, is such a win-win for everyone it’s a no brainer. Just do it. Even better, it’s continuous availability thanks to the Witness service!

When the node running the file share crashes, the clients will experience a somewhat long delay in responsiveness, but after 10 seconds they continue where they left off once the role has resumed on the other node. Awesome! Learn more about this here: Continuously Available File Server: Under the Hood and SMB Transparent Failover – making file shares continuously available.

Windows Clients also benefits from ODX

But there is more: it’s SMB 3 & ODX that bring us even more goodness, the offloading of reads & writes to the SAN, saving CPU cycles and bandwidth. Especially in the case of branch offices this rocks. SMB 3 clients that copy data between file shares on Windows Server 2012 (R2) servers that have their storage on an ODX-capable SAN benefit because the transfer request is translated to ODX by the server, which gets a token that represents the data. This token is used by Windows to do the copying and is handed to the storage array, which internally does all the heavy lifting and tells the client the job is done. No more reading data from disk, translating it into TCP/IP, moving it across the wire to reassemble it on the other side and writing it to disk.

image

To make ODX happen we need a decent SAN that supports this well. A DELL Compellent shines here. Next to that, you can’t have any filter drivers on the volumes that don’t support offloaded read and write. This means that we need to make sure that features like data deduplication support this, but also that 3rd party vendors for anti-virus and backup don’t ruin the party.

image

In the screenshot above you can see that Windows data deduplication supports ODX. And if you run antivirus on the host you have to make sure that the filter driver supports ODX. In our case McAfee Enterprise does, so we’re good. Do make sure to exclude the cluster related folders & subfolders from on-access scans and scheduled scans.
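If you want to check a volume yourself, fltmc shows you which minifilter drivers are attached. As far as I recall the SprtFtrs column indicates offload support (3 meaning both offloaded read and write), but do verify that against the documentation:

# List the filter drivers attached to a CSV volume and their supported features
fltmc instances -v C:\ClusterStorage\Volume1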

Do not run DFS Namespace servers on the cluster nodes. The DfsDriver does not support ODX!

image

The solution is easy, run your DFS Namespaces servers separate from your cluster hosts, somewhere else. That’s not a show stopper.

The user experience

What does it look like to a user? Totally normal, except for the speed at which the file copies happen.

Here’s me copying an ISO file from a file share on server A to a file share on server B from my Windows 8.1 workstation at the branch office in another city, 65 KM away from our data center and connected via a 200Mbps pipe (MPLS).

image

On average we get about 300 MB/s or 2.4 Gbps, which “over” a 200Mbps WAN is a kind of magic. I assure you that they’re not complaining and get used to this quite (too) fast Winking smile.

The IT Pro experience

Leveraging SMB 3 and ODX means we avoid having people consume tons of bandwidth over the WAN and we make copying large data sets a lot faster. On top of that, the CPU cycles and bandwidth on the server are conserved for other needs as well. All this while we can fail over the cluster nodes without our business users being impacted. Continuous to high availability, speed, less bandwidth & fewer CPU cycles needed. What’s not to like?

Pretty cool huh! These improvements help out a lot and we’ve paid for them via software assurance so why not leverage them? Light up your IT infrastructure and make it shine.

What’s stopping you?

So what are your plans to leverage your software assurance benefits? What’s stopping you? When I asked that I got a couple of answers:

  • I don’t have money for new hardware. Well, my SAN is also pre-Windows 2012 (DELL Compellent SC40 controllers). I just chose based on my own research, not on what VARs like to sell to get maximal kickbacks Winking smile. The servers I use are almost 4 years old but fully up to date DELL PowerEdge R710’s, recuperated from their duty as Hyper-V hosts. These servers easily last us 6 years and over time we collected some spare servers for parts or replacement after the support expires. DELL doesn’t take away your access to firmware & drivers like some do and their servers aren’t artificially crippled in feature set.
  • Skills? Study, learn, test! I mean it, no excuse!
  • Bad support from ISVs and OEMs for recent Windows versions holding you back? Buy other brands, vote with your money and do not accept their excuses. You pay them to deliver.

As IT professionals we must and can deliver. This is only possible as the result of sustained effort & planning. All the labs, testing and studying help out when I’m designing and deploying solutions. As I take the entire stack into account in designs and we do our due diligence, I know it will work. Being active in the community also helps me know early on which vendors & products have issues, so we can avoid the “marchitecture” solutions that don’t deliver when deployed. You can achieve this as well, you just have to make it happen. That’s not too expensive or time consuming, at least a lot less than being stuck after you’ve spent your money.