The Hyper-V Amigos Showcast Episode 6: Storage Spaces


Everybody is very busy and I’m a bit tired, but here’s the 6th episode of the Hyper-V Amigos Showcast. In this episode we get to play a bit with Storage Spaces in Carsten’s lab.

As always we had a lot of fun doing so, and thanks to Carsten Rachfahl and the assistance of Kerstin (his charming wife, herself an Office 365 MVP) we could simulate hardware failures & film them for you!

 

Carsten & I discuss several scenarios and what’s happening during failovers. Carsten assists customers with this a lot, so he has some of the most varied experience with Storage Spaces and SOFS out there! Interesting stuff, and for people who haven’t even looked at Windows Server 2012 or later yet, a wake-up call to start, as the world is not limited to what we once knew. It’s not your daddy’s Windows anymore Winking smile

I hope you enjoy it and we’re already planning for the next one!

Microsoft Keeps Investing In Storage Big Time


Disclaimer: These are my musings on the limited info available about Windows Server vNext, based on the Technical Preview bits at the time of writing. So it’s not set in stone & has a time-limited value.

Reading the documentation that’s already available on vNext of Windows, it’s clear that Microsoft is continuing its push towards the software-defined data center. They are also pushing high-to-continuous availability ever more towards the “continuous” side of things.

It’s early days yet and we’ve only just downloaded the Technical Preview, but what do we read in What’s New in Storage Services in Windows Server Technical Preview?

Storage Quality of Service

  • They are giving us more Storage Quality of Service, tied into the use of SOFS as storage over SMB3. As way too many NAS solutions don’t support SMB3, or only partially (in a restricted way), it’s clear to me that a self-built SOFS solution on a couple of servers is and remains the best SMB3 implementation on the market, and it has just gotten storage QoS.

Little rant here: to the people who claim that this is not capable of high performance, I usually laugh. Have you actually built a SOFS or TFFS with 10Gbps networking on modern enterprise-grade servers like the DELL R720 or R730? Did you look at the results from that relatively low-cost investment? I think not, really. And if you did and found it lacking, I’ll be very impressed by the workload you’re running. You’ll force your storage to its knees earlier than your Windows file server nowadays.

  • It’s in the SOFS layer, so this does not tie you into Storage Spaces if you’re not ready for that yet but would like the benefits of SOFS. As long as you have shared storage behind the SOFS you’re good.
  • It’s policy based and can apply to virtual machines, groups of virtual machines, a service or a tenant.
  • The virtual disk is the level where the policy is set & enforced.
  • Storage performance will dynamically adjust to meet the policies, and when policies are tied, performance will be fairly distributed.
  • You can monitor all this.

It’s right there in the OS.
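Purely as an illustration, and going only by the preview documentation (so the cmdlets, parameters, policy name and VM name below are assumptions that may well change before RTM), a policy-based setup could look something like this in PowerShell:

# On the Scale-Out File Server cluster: define a QoS policy (values are just examples).
$policy = New-StorageQosPolicy -Name "GoldTenant" -MinimumIops 100 -MaximumIops 5000

# On the Hyper-V host: tie that policy to the virtual disk(s) of a VM ("SQLVM01" is hypothetical).
Get-VM -Name "SQLVM01" | Get-VMHardDiskDrive | Set-VMHardDiskDrive -QoSPolicyID $policy.PolicyId

# And monitor what the flows are actually doing.
Get-StorageQosFlow | Sort-Object InitiatorName | Format-Table -AutoSize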

Storage Replica

This gives us “storage-agnostic, block-level, synchronous replication between servers for disaster recovery, as well as stretching of a failover cluster for high availability. Synchronous replication enables mirroring of data in physical sites with crash-consistent volumes ensuring zero data loss at the file system level. Asynchronous replication allows site extension beyond metropolitan ranges with the possibility of data loss.”

Look, for Hyper-V we already had Hyper-V Replica (which is also being improved), but for other workloads we still relied on the storage vendors or 3rd party solutions. Now I can have my storage replicas for service protection and continuity out of the box with Windows. WOW!

And as we read on …

  • Provide an all-Microsoft disaster recovery solution for planned and unplanned outages of mission-critical workloads.
  • Use SMB3 transport with proven reliability, scalability, and performance.
  • Stretch clusters to metropolitan distances.
  • Use Microsoft software end to end for storage and clustering, such as Hyper-V, Storage Replica, Storage Spaces, Cluster, Scale-Out File Server, SMB3, Deduplication, and ReFS/NTFS.
  • Help reduce cost and complexity as follows:
      • Hardware agnostic, with no requirement to immediately abandon legacy storage such as SANs.
      • Allows commodity storage and networking technologies.
      • Features ease of graphical management for individual nodes and clusters through Failover Cluster Manager and Microsoft Azure Site Recovery.
      • Includes comprehensive, large-scale scripting options through Windows PowerShell.
  • Helps reduce downtime, and increase reliability and productivity intrinsic to Windows.
  • Provides supportability, performance metrics, and diagnostic capabilities.

I have gotten this to work in the lab with some trial and error, but this is the Technical Preview, not a finished product. If they continue along this path I’m pretty confident we’ll have a functionally & operationally viable solution by RTM. Just think about the possibilities this brings!
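For what it’s worth, here’s a minimal sketch of what a server-to-server replica looks like with the Storage Replica cmdlets in the preview bits. The server names, volumes, log volumes and replication group names are purely illustrative and the syntax may well change before RTM:

# Validate the proposed topology first; this produces an HTML report in the result path.
Test-SRTopology -SourceComputerName "SR-SRV01" -SourceVolumeName "D:" -SourceLogVolumeName "L:" `
    -DestinationComputerName "SR-SRV02" -DestinationVolumeName "D:" -DestinationLogVolumeName "L:" `
    -DurationInMinutes 30 -ResultPath "C:\Temp"

# Create the partnership: synchronous, block-level replication of D: from SR-SRV01 to SR-SRV02.
New-SRPartnership -SourceComputerName "SR-SRV01" -SourceRGName "RG01" `
    -SourceVolumeName "D:" -SourceLogVolumeName "L:" `
    -DestinationComputerName "SR-SRV02" -DestinationRGName "RG02" `
    -DestinationVolumeName "D:" -DestinationLogVolumeName "L:" `
    -ReplicationMode Synchronous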

Storage Spaces

Now I have not read much on Storage Spaces in vNext yet, but I think it’s safe to assume we’ll see major improvements there as well. Which leads me to reaffirm my blog post here: TechEd 2013 Revelations for Storage Vendors as the Future of Storage lies With Windows 2012 R2.

Microsoft is delivering more and more great software-defined storage in box. This means cost-effective yet very functional storage solutions. On top of that, it puts pressure on the market to deliver more value if vendors want to stay competitive. As a customer, whatever solution fits my needs best, I welcome that. And as a consumer of large amounts of storage in a world where we need to spend the money where it matters most, I like what I’m seeing.

Tip for Microsoft: configurability, reliability and EASY diagnostics and remediation are paramount to success. Sure, some storage vendors’ solutions aren’t too great on that front either, but some are awesome. Make sure you’re in the awesome category. Make it a great user experience from start to finish, in both deployment and operations.

Tip for you: if you’re not ready for prime time with Storage Spaces, SMB Direct etc., do what I’ve done. Use it where it doesn’t kill you if you hit some learning curves. What about Storage Spaces as a backup target, from where you can now replicate the backups off to your disaster recovery site?

Defragmenting your CSV Windows 2012 R2 Style with Raxco Perfect Disk 13 SP2


When it comes to defragmenting CSVs it seemed we took a step back in support from 3rd party vendors. While Windows provides a great toolset to defragment a CSV, it seemed to have disappeared from 3rd party vendor software, even from the really good Raxco Perfect Disk. They did have support for this with Windows 2008 R2 and I even mentioned that in a blog.

If you need information on how to defragment a CSV in Windows 2012 R2, look no further. There is an absolutely fantastic blog post on the subject, How to Run ChkDsk and Defrag on Cluster Shared Volumes in Windows Server 2012 R2, by Subhasish Bhattacharya, one of the program managers in the Clustering and High Availability product group. He’s a great guy to talk shop with, by the way, if you ever get the opportunity to do so. One bizarre thing is that this must be the only place where PowerShell (the Repair-ClusterSharedVolume cmdlet) is deprecated in favor of chkdsk.
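Going from memory of that post (do read it for the authoritative steps), the in-box flow looks roughly like this; the cluster disk name and CSV mount point below are just examples from a lab:

# Analyze first to see whether a defrag is even worth running.
Defrag.exe C:\ClusterStorage\Volume1 /A /U /V

# Put the CSV into redirected access so the defragmentation can run against the volume.
Suspend-ClusterResource -Name "Cluster Disk 1" -RedirectedAccess -Force

# Defragment the CSV mount point, then return it to direct I/O.
Defrag.exe C:\ClusterStorage\Volume1 /U /V
Resume-ClusterResource -Name "Cluster Disk 1"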

3rd party wise, the release of Raxco Perfect Disk 13 SP2 brought back support for defragmenting CSVs.

image

I don’t know why it took them so long, but the support is here now. It looks like they struggled to get CSVFS (the way CSVs are done since Windows Server 2012) supported. Whilst at it, they threw in support for ReFS by the way. This is the first time I’ve ever seen this. Anyway, it’s here and that’s good, because I have a hard time accepting that any product (whatever it does) supports Hyper-V if it can’t handle CSVs, not if you want to be taken seriously anyway. No CSV support equals the do-not-buy list in my book.

Here’s a screenshot of Perfect Disk defragmenting away. One of the CSV LUNs in my lab is an SSD and the other an HDD.

image

Notice that in Global Settings you can tweak the optimization behavior when defragmenting various drive types, including CSVFS, but you can just leave the defaults on unless you like manual labor or love PowerShell so much you can’t forgo any opportunity to use it Winking smile

image

Perfect Disk cannot detect what kind of disks you have behind the CSV LUN, so you might want to change the optimization method if you’re running SSDs instead of HDDs.

image

I’d love for Raxco to comment on this or point to some guidance.

What would also be beneficial to a lot of customers is guidance on defragmentation on the different auto-tiering storage arrays. That would make for a fine discussion I think.

Windows 2012 R2 Data Deduplication Leverages Shadow Copies: “LastOptimizationResultMessage : A volume shadow copy could not be created or was unexpectedly deleted”.


When you’re investigating and planning large repositories for data (backups, archives, file servers, ISO/VHD stores, …) and you’d like to leverage Windows Data Deduplication, you have to keep in mind that the maximum supported size for an NTFS volume is 64TB. They can be a lot bigger, but that’s the maximum supported. Why? Well, they guarantee everything will perform & scale up to that size and that all NTFS functionality will be available. Functionality like volume shadow copies or snapshots: NTFS volumes cannot be larger than 64TB or you cannot create a snapshot. And guess what data deduplication seems to depend on?

Here’s the output of Get-DedupStatus for a > 150TB volume:

image

Note “LastOptimizationResultMessage      : A volume shadow copy could not be created or was unexpectedly deleted”.

Looking in the Deduplication event log we find more evidence of this.

image

Data Deduplication was unable to create or access the shadow copy for volumes mounted at "T:" ("0x80042306"). Possible causes include an improper Shadow Copy configuration, insufficient disk space, or extreme memory, I/O or CPU load of the system. To find out more information about the root cause for this error please consult the Application/System event log for other Deduplication service, VSS or VOLSNAP errors related with these volumes. Also, you might want to make sure that you can create shadow copies on these volumes by using the VSSADMIN command like this: VSSADMIN CREATE SHADOW /For=C:

Operation:

   Creating shadow copy set.

   Running the deduplication job.

Context:

   Volume name: T: (\\?\Volume{4930c926-a1bf-4253-b5c7-4beac6f689e3}\)

Now there are multiple possible issues that might cause this, but if you’ve got a serious amount of data to back up, please check the size of your LUN, especially if it’s larger than 64TB or flirting with that size. It’s tempting, I know, especially when you only focus on dedup efficiencies. But you’ll never get any dedupe results on a > 64TB volume. You don’t get any warning for this when you configure deduplication, so if you don’t know this you can easily run into this issue. So next to making sure you have enough free space, CPU cycles and memory, keep the partitions you want to dedupe a reasonable size. I’m sticking to +/- 50TB max.
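If you want a quick way to spot this, the sketch below (my own quick check, not official guidance) lists dedup-enabled volumes, their size and the last optimization result, so oversized volumes stand out immediately:

# Flag dedup-enabled volumes that are over (or flirting with) the 64TB VSS limit.
$limit = 64TB
Get-DedupVolume | ForEach-Object {
    $vol = Get-Volume -DriveLetter $_.Volume.Substring(0,1)
    [pscustomobject]@{
        Volume       = $_.Volume
        SizeTB       = [math]::Round($vol.Size / 1TB, 1)
        OverVssLimit = ($vol.Size -gt $limit)
        LastResult   = (Get-DedupStatus -Volume $_.Volume).LastOptimizationResultMessage
    }
} | Format-Table -AutoSize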

I have blogged before on the maximum supported LUN size and the fact that VSS can’t handle anything bigger than 64TB here: Windows Server 2012 64TB Volumes And The New Check Disk Approach. So while you can create volumes of many hundreds of TB, you’ll need a hardware provider that supports bigger LUNs if you need snapshots, and the software needing these snapshots must be able to leverage that hardware VSS provider. For backups and data protection this is a common scenario. In case you ask, I’ve done a quick, crazy test where I tried to leverage a hardware VSS provider in combination with Windows Server data deduplication. A LUN of 50TB worked just fine, but I saw no usage of any hardware VSS provider here. Even if you have a hardware VSS provider, it’s not being used for data deduplication (not that I could establish with a quick test anyway) and to the best of my knowledge I don’t think it’s possible, as these have not exactly been written with this use case in mind. Comments on this are welcome, as I had no more time to dig in deeper.
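By the way, if you want to see which VSS providers (software and hardware) are registered on a host, and verify that a plain shadow copy can be created on the volume as the dedup error message itself suggests, the classic commands are (T: being the example volume from above):

vssadmin list providers
vssadmin create shadow /For=T: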

Fixing Two Small DELL Compellent Hardware Hiccups


Here are two little tips to solve some small hardware issues you might run into with a Compellent SAN. But first: you’re never on your own with CoPilot support. They are just one phone call away, so I suggest that if you see these two minor issues you give them a call. I speak from experience that CoPilot rocks. They are really good and go the extra mile. Best storage support I have ever experienced.

Notes

  • Always notify CoPilot, as they will see the alerts come in and will contact you for sure Smile. Afterwards they’ll almost certainly do a quick health check for you. But even better, during the entire process they keep an eye on things to make sure your SAN is doing just fine. And if you feel you’d like them to tackle this, they will send out an engineer I’m sure.
  • Note that we’re talking about the SC40 controllers & disk bays here. The newer genuine DELL hardware is better than the Super Micro ones.

The audible alert without any issues whatsoever

We kept getting an audible alert long after we had solved any issues on one of the SANs. The system had been checked a couple of times and everything was in perfect working order, except for that audible alarm that just didn’t want to quit. A low priority issue, I know, but every time we walked into the data center we were going “oh oh” for a false alert. That’s not the kind of conditioning you want. Alerts are only to be made when needed and then they do need to be acted upon!

Working on this with CoPilot support we got rid of it by reseating the upper I/O module. You can do this on the fly, without pulling SAS cables out, as they are redundant, as long as you do it one by one and the cabling is done right (they can verify that remotely for you if needed).

image

But we got lucky after the first one: after the “Swap Clear” was requested, every warning condition was cleared and we got rid of the audible alert beep! CoPilot was on the line with us and made sure all paths were up and running so no bad things could happen. That’s what you have a copilot for.

Front panel display dimming out on a Compellent Disk Bay

We have multiple Compellent SANs and on one of those we had a disk bay with an info panel that didn’t light up anymore. A silly issue, but an annoying one, as this panel also shows you the disk bay ID.

image

Do we really replace the disk bay to solve this one? As that light had come on and off a couple of times, it could just be a bad contact, so my colleague decided to take a look. First he removed the protective cover and then, using some short & curved screwdrivers, he took off the body part. The red arrow indicates the little latch that holds the small ribbon cable in place.

image

That latch was standing wide open. After locking it down, the info appeared on the panel again. The covers were screwed back on and voila. Solved.

ODX Doesn’t Support IDE But Works With Both VHDX And VHD Virtual Disk Format


This question came up recently, once again, and it deserves a little blog post. If you want to see the benefits of ODX you’ll need to connect your virtual disks to a vSCSI controller or another supported controller option. These are iSCSI, vFC, an SMB 3 file share or a pass-through disk. But unless you have a really good reason to use pass-through disks, don’t. It’s limiting you in too many ways.

Basically, in generation 1 virtual machines that boot from vIDE this rules out the system disk. So the tip here is to store the data that’s moved around in or between virtual machines in vSCSI-attached VHD or (preferably) VHDX virtual disks. If you can use generation 2 virtual machines, you’ll be able to leverage ODX on the system partition as well, as it boots from vSCSI Smile.
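A quick way to verify which controller type your virtual disks hang off (the VM name below is just an example):

# ControllerType will show IDE or SCSI; only the vSCSI-attached disks will benefit from ODX.
Get-VM -Name "MyVM" | Get-VMHardDiskDrive |
    Select-Object VMName, ControllerType, ControllerNumber, ControllerLocation, Path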

It goes without saying that you need to store any virtual disks involved on ODX-capable LUNs via iSCSI, FC, FCoE, an SMB 3 file share or SAS for ODX to be available to the virtual machine.

Also beware that ODX only works on NTFS-partitioned disks. The files cannot be compressed or encrypted. Sparse files are not supported either. And finally, the volume cannot be BitLocker protected.
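Two quick sanity checks I tend to run before expecting ODX to kick in (the ISO path is just an example):

# Is ODX enabled on the host? 0 = enabled, 1 = disabled.
# If the value doesn't exist at all, ODX support is on by default.
Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem" -Name FilterSupportedFeaturesMode

# Are the files compressed, encrypted or sparse? Any of these makes the copy fall back to a normal transfer.
(Get-Item "D:\ISO\WindowsServer2012R2.iso").Attributes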

Here’s a screenshot of a copy of 30GB worth of ISO files to a VHDX attached to a vSCSI controller:

image

Here’s a screenshot of a copy of 30GB worth of ISO files to a VHDX attached to a vIDE controller.

image

You’ll notice quite a difference. Depending on the load on the controllers/SAN, it’s on average 3 times slower than the same action to a VHDX disk on a vSCSI controller.

Hyper-V UNMAP Does Work With SAN Snapshots And Checkpoints But Not Always As You First Expect


Recently I was asked to take a look at why UNMAP was not working predictably in a Windows Server 2012 R2 Hyper-V environment. No, this is not a horror story about bugs or bad storage solutions. Fortunately, once the horror option was off the table I had a pretty good idea what might be the cause.

SAN snapshots are in play

As it turned out, everything was indeed working just fine. The unexpected behavior that made it seem that UNMAP wasn’t working well, or at least not at the moments they expected it, was caused by the SAN snapshots. Once you know how this works you’ll find that UNMAP does indeed work predictably.

Snapshots on SANs are used for automatic data tiering, data protection and various other use cases. As long as those snapshots live, and as such the data in them, UNMAP/TRIM will not free up space on the SAN with thinly provisioned LUNs. This is logical, as the data is still stored on the SAN for those snapshots; hard deleting it from the VM or host has no impact on the storage the SAN uses until those snapshots are deleted or expire. Only what happens in the active portion is directly impacted.

An example

  • Take a VM with a dynamically expanding VHDX that’s empty and mapped to drive letter D. Note the file size of the VHDX and the space consumed on the thinly provisioned SAN LUN where it resides.
  • Create 30GB of data in that dynamically expanding virtual hard disk of the virtual machine.
  • Create a SAN snapshot.
  • Shift + Delete that 30GB of data from the dynamically expanding virtual hard disk in the virtual machine. Watch the dynamically expanding VHDX grow in size, just like the space consumed on the SAN.
  • Run Optimize-Volume D -ReTrim to force UNMAP and watch the space consumed on the SAN LUN: it remains +/- the same.
  • Shut down the VM and look at the size of the dynamic VHDX file. It shrinks to the size before you copied the data into it.
  • Boot the VM again and copy 30GB of data to the dynamically expanding VHDX in the VM again.
  • See the size of the VHDX grow and notice that the space consumed on the SAN for that LUN goes up as well.
  • Shift + Delete that 30GB of data from the dynamically expanding virtual hard disk in the virtual machine.
  • Run Optimize-Volume D -ReTrim to force UNMAP and watch the space consumed on the SAN LUN: it drops, as the data you deleted is in the active part of your LUN (the second 30GB you copied), but it will not drop any more than this, as the data kept safe in the frozen snapshot of the LUN remains there (the first 30GB you copied).
  • When you expire/delete that snapshot on the SAN, you’ll see the size consumed on the thinly provisioned SAN LUN drop to the initial size of this exercise.

I hope this example gave you some insights into the behavior.
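If you want to script the test, a rough version of the steps above could look like this; the paths, VM name and 30GB test set are hypothetical:

# Inside the VM: write ~30GB of test data to D:, hard delete it, then force UNMAP/TRIM.
New-Item -Path "D:\testdata" -ItemType Directory -Force | Out-Null
1..30 | ForEach-Object {
    [System.IO.File]::WriteAllBytes("D:\testdata\file$_.bin", (New-Object byte[] (1GB)))
}
Remove-Item "D:\testdata" -Recurse -Force     # the Shift + Delete step
Optimize-Volume -DriveLetter D -ReTrim -Verbose

# On the Hyper-V host: compare the size of the dynamically expanding VHDX (and the
# space consumed on the SAN LUN) before and after each step.
(Get-Item "C:\VMs\TestVM\Virtual Hard Disks\TestVM-Data.vhdx").Length / 1GB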

Conclusion

So people who have snapshot-based automatic data tiering, data protection etc. active in their Hyper-V environment and don’t see any results at all should check those snapshot schedules & lifetimes. When you take them into consideration you’ll see that UNMAP does work predictably, albeit in a “delayed” fashion Smile.

The same goes for Hyper-V checkpoints (formerly known as snapshots). When you create a checkpoint the VHDX is kept and you are writing to an AVHDX (differencing disk), meaning that any UNMAP activity will only reflect on data in the active AVHDX file and not in the “frozen” parent file.
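So if UNMAP seems to do nothing, it’s worth checking whether the VM has checkpoints and is currently writing to differencing disks. A quick peek (the VM name is just an example):

# List any checkpoints and show which virtual disks are currently .avhdx differencing files.
Get-VMSnapshot -VMName "MyVM" | Select-Object VMName, Name, CreationTime
Get-VMHardDiskDrive -VMName "MyVM" | Where-Object { $_.Path -like "*.avhdx" } | Select-Object VMName, Path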