VEEAM Invests in Faster & More Efficient Data Protection With Backup & Replication 8


Ever more data to protect without breaking the systems or the bank

One of my major concerns today in IT, weather it is on premises or in the cloud, is the cost, time, reliability and feasibility of backup and restores. This true for most of us. Due to the environments in which I deliver my services my main issue with backups is the quantity of data. The amount of data is staggering and growth is not showing a downward trend.

The big four: CPU, Memory, Network & Storage

Over the years we have seen a vast increase in compute, memory, network and storage capabilities and pricing. CPUs are up to 18 cores per socket as I write this. DDR4 memory is here and the cost is relatively low. We have affordable 10Gbps networking to throw at the problem as well or in some case 8 to 16Gbps Fibre Channel. So when it comes to CPU, memory and network we’re pretty well served.

Storage is evolving as well and we’re getting ever bigger and, if you have the budget that is, faster storage arrays in different flavors. But it remains a challenge. First of all to get the right amount of IOPS and storage capacity at an affordable price point is a balancing act. Secondly when dealing with backups we need to manage the source IOPS & latency against the target. But that’s not all, while you might want to squeeze every last IOPS & 1ms latency out of your backup target you can’t carelessly do that to your source storage. If you do, this might constitute a Denial Of Service attack against your applications and services. Even today storage QoS is either non existent, in it’s infancy or at best limited to particular workloads on storage solutions.

The force multiplier: Backup software capabilities & approaches

If you’ve made sure the above 4 resources are not your killer bottle neck the backup software, methods algorithms and the approach used will be either your biggest problem or you best friends. You need your backup software to be:

  • Capable
  • Scalable
  • Fast
  • Configurable
  • Scale Out

There are some challenging environments out there. To deal with this backup software should be able to leverage the wealth of capabilities compute, network, memory & storage are offering to protect large amounts of data reliable and fast. This should be done smart and in an operationally supportable manner. VEEAM has been working on this for a long time and they keep getting better at this with every release and it allows for scale out designs in regards to backups targets.

VEEAM Backup & Replication 8.0

There are many improvements in v8 but a couple stand out.

image

Consistency groups (Hyper-V)

Backup jobs can execute more than one VM backup task simultaneously from the same volume snapshot with “Allow Processing of Multiple VMs with a single volume snapshot”.

image

This means you can reduce the number of snapshots significantly where in the past you needed a volume snapshot per VM. VEEAM limits the the maximum amount of VMs you can backup per snapshot to 4 when using software VSS and to eight with hardware VSS. They do this because under heavy load VSS/CSV sometimes has issues. This number can be tweaked to fit your needs (no all environments are created equally) with 2 registry values under HKLM\SOFTWARE\Veeam\Veeam Backup and Replication key:

  • MaxVmCountOnHvSoftSnapshot (DWORD)
  • MaxVmCountOnHvHardSnapshot (DWORD) registry values

Reducing the number of snapshots to be taken is good as it saves resources, speeds up things & as VSS can be finicky, not needing more than absolutely necessary is a good thing.

Backup I/O Control.

Another improvement is backup I/O Control which delivers capability to dynamically adjust the number of backup tasks based on IOPS latency. Under Options you’ll find a new Tabbed sheet, I/O Control. It contains the parallel processing option that used to be under “Advanced” tab in Veeam B&R 7.

image

The idea is to move to a more “policy driven” approach for handling the load backups can put on the storage. Until now we’d configure a number of X amounts of tasks to run against the source storage in order to keep IOPS/Latency in check. But this is very static and in a dynamic / elastic “cloud” world this isn’t very flexible nor is it feasible to keep tuned to the best number for the current workload.

I/O Control let’s you set limits on how much latency is acceptable for your data stores. Removing or adding VMs to the data store won’t invalidate your carefully set number of tasks allowed as it’s now the latency that’s used to dynamically tune that number for you.

I/O control has two settings:

 “Stop assigning new tasks to datastore at: X ms” :VEEAM looks at the latency (IOPS) before assigning a proxy (backup target) to a virtual disk or won’t launch the task until the load has dropped.  This prevents the depletion of IOPS by launching to many backups.

“Throttle I/O of existing tasks at: Y ms”: This will throttle the IO of already running  backup jobs when needed due to some application workloads in the VMs running on the source storage kicking in. The backups will be throttled so they’ll take longer but they won’t kill the performance of the applications while they are running.

These two setting allow for the dynamic and on the fly tweaking of the number of backups tasks running as well as their impact on the storage performance. Once you have determined what latency values are acceptable to you you’re done, VEEAM handles the tweaking for you. The default values seems to reflect industry best practices (sustained > 20 ms is considered problematic)

The below screenshot is for the backup job log and shows latency being monitoredclip_image002

With VEEMA B&R v8 Enterprise + You can even do this per data store, meaning you can optimize this per backup source. This recognizes that is no “one sizes fits all perfectly” and allows for differentiation. Yet it does so in a way that does not compromise on the simplicity of use that VEEAM offers. This sounds easy but from experience I know this isn’t. VEEAM manages to offer a great balance between simplicity and functionality for companies of all sizes.

Select “Configure”

image

In the “Datastore Latency Settings” you can add one, more or all data store you are protecting with VEEAM. This allows for differentiation when you have CSV that are used for SQL Server VMs versus stateless web servers of or other workloads that are not storage I/O intensive.

image

Select the datastore (in our case the CSV volumes in Hyper-V Cluster)

image

By selecting the desired datastore and clicking “Edit”  you can individually adjust the settings for that datastore.

image

Conclusion

It looks like we have some great additional capabilities in an already very good solution. I’ll be using these new capabilities in real life scenarios to see how these work out for us and optimize the backups of the virtualized environment under my care. Hardware VSS Providers, SANs, CSV’s normally need some tweaking and care to make them run well, so that’s what we’ll be doing.

Workshop Datacenter Modernization -Microsoft Technical Summit 2014 Germany (Berlin)


While speaking (What’s new in Failover Clustering in Windows Server 2012 R2) and attending the Microsoft Technical Summit 2014 I’m taking the opportunity to see how Microsoft Germany and partners are doing a workshop which is based on the IT Camps they have been delivering over the past year. There is a lot of content to be delivered and both trainers Carsten Rachfahl (Rachfahl IT-Solutions GmbH) and Bernhard Frank (Partner Technology Strategist (Hosting), Microsoft) are doing that magnificently.

One thing I note is that they sure do put in a lot of effort. The one I’m attending requires some server infrastructure, a couple of switches, cabling for over 50 laptops etc. These have been neatly packed into road cases and the 50+ laptops had been placed, cabled and deployed using PXE boot /WDS the night before. Yes even in the era of cloud you need hardware especially if you’re doing an IT Camp on “Datacenter Modernization” (think private & hybrid infrastructure design and deployment).

image

Not bypassing this aspect of private cloud building adds value to the workshop and is made possible with the help of Wortmann AG. Yes the attendees get to deploy storage spaces, Scale Out File Server, networking etc. They don’t abstract any of the underlying technologies away, I like that a lot, it adds value and realism.

I’m happy to see that they leverage the real world experience of experts (fellow Hyper-V MVP Carsten Rachfahl) who helps hosting companies and enterprises deploy these technologies. Storage, Scale Out File Server, Hyper-V clusters, System Center and self service (Azure Pack) are the technologies used to achieve the goals of the workshop.

image

The smart use of PowerShell (workflows, PDT) allows to automate the process and frees up time to discuss and explain the technologies and design decisions. They take great care to explain the steps and tools used so the attendees can use these later in their own environments. Talking about their own experiences and mistakes helps the attendees avoid common mishaps and move along faster.

image

The fact that they have added workshops like this to the summit adds value. I think it’s a great idea that they are held on the last day as this means that attendees can put the information they gathered from 2 days of sessions into practice. This helps understanding the technologies better.

There is very little criticism to be given on the content and the way they deliver it. I have to say that it’s all very well done. Perhaps they make private cloud look a bit too easy Winking smile. Bernard, Carsten, well done guys, I’m impressed. If you’re based in Germany and you or your team members need to get up to speed on how these technologies can be leveraged to modernize your data center I can highly recommend these guys and their workshops/IT Camps.

Hyper-V Guest Protected Network Testing Tip


I’ve been pinged a few times over the years with people saying that the new protected network feature does not work for them. This setting is set per vNIC of the virtual machine.

image

The issue lies in how & what people test, bar any number of other reasons why a live migration might not start or complete.  What people tend to do is disable a NIC to which the vSwitch is connected. But a Protected Network is about media sense loss detection of network disconnects and this requires the NIC to be actually there and enabled. Remember, we’re talking about the NIC on the host connected to the virtual switch. A physical link failure here, meaning that the virtual switch the protected virtual network adapter no longer has network connectivity, will lead to all the VMs with  the protected network enabled do be live migrated to another node in the cluster that still has a connected virtual switch for the same network.  The latter is to avoid  senseless virtual machine migrations to other nodes that might also have lost connectivity due to a failed physical switch.

So the point is that testing by disabling the NIC in the OS will not do. You need to unplug the cables to the virtual switch or disable the port on the switch or even shutdown the switch (a bit drastic).

Do note that it can take a little time for the live migration to kick in,  it varies a bit, but it beats having to wait for the issue to be resolved. You’ll see event id 1255 logged when the VMs lose network connectivity:image

In this day and age with NIC teaming to redundant switches & the fact that you might be using converged networking these tests aren’t as simple as you might think. Also don’t pull out all if the cables used for clustering if you want the cluster to be able to help you out here with a live migration. Because when the other cluster nodes can’t talk to the node your testing in any way it will be kicked out of the cluster, the VMs will go down, be moved to another node and started. This might seem obvious but if you a are using a teamed 10Gbps solution in a converged setup this might cause exactly that.

Another thing to note is that if you have a virtual switch with a dedicated backup network exposed to hosts & VMs that can tolerate down time you might want to disable protected networks on that vNIC as you don’t want to live migrate the VMs of when that network has an issue. It all depends on your needs & tastes.

Last but not least please behave, and don’t do anything silly in production when testing this. Be careful in your testing.

Golden Nuggets: Windows Server 2012 R2 Failover Cluster CSV Placement Policy


Some enhancements only become truly evident to people when they see them in action. For many features this means something need to go wrong before they kick in. Others are more visible during normal operations. This is the case with the CSV enhancements in Windows Server 2012 R2 Failover Clustering.

One golden nugget here is the CSV placement policy (which really shines in combination with SOFS/Storage Spaces). This will spread ownership of the CSV amongst the cluster nodes to ensure a balanced distribution. In a failover cluster, one node is the “coordinator node” (owner) for a CSV. The coordinator node owns the physical disk resource that is associated with a logical unit (LUN). All I/O operations for the File System on that LUN are are through the coordinator node. In previous versions there is no automatic rebalancing of coordinator node assignment. This means that all LUNs could potentially be owned by the same node. In storage spaces & SOFS scenarios becomes even more important.

The benefits

  • It helps all nodes carry their share of the workload as it load balances the disk I/O.
  • Failovers of CSV owners are potentially quicker and more predictable/consistent as an even distribution ensures that no one node owns a disproportionate number of CSVs.
  • When losing storage access the number of CSVs that are in redirected mode is potentially less as they are evenly distributed. In an unbalanced cluster it could be for all of them in a worse case scenario.
  • When using SOFS with Storage Spaces it makes sure the Storage Spaces Ownership is distributed fairly.

When does it happen

  • Each time a node leaves or joins the cluster. This means you don’t need to intervene manually or via PowerShell to get an even distribution. This goes for both exiting nodes as when adding a new node. The new node will get a CSV assigned if there is any on surplus on one of the existing nodes.
  • The process also works when you start a failover cluster when it has shut down.

When customers see this in action (it’s most obvious when then add a node as then they are normally watching) they generally smile as the cluster does it job getting  the best possible results out of their hardware.

Hot add/remove of network adapters and enabling device naming in Windows Server Hyper-V


One of the cool new features in Window Server vNext Hyper-V (in Technical Preview at the moment of writing) is that you gain the ability to hot add and remove NICs.  That might sound not to important to the non initiated in the fine art of virtualization & clouds. But it is. You see anything you can do to a VM configuration wise that does not require downtime is good. That’s what helps shift the needle of high availability to that holey grail of continuous availability.

On top of that the names of the network adapters are now exposed to the guest. Which is also great. It’s become lot easier to automate the VM network configuration.

Hot adding NICs can be done via the GUI and PoSh.

image

But naming the network adapter seems a PowerShell only game for now (nothing hard, no sweat). This can be done during creation of the network adapter. Here I add a NIC to VM RAGNAR connected to the ISCSI-GUEST switch & named ISCSI.

Add-VMNetworkAdapter –VMName RAGNAR –SwitchName ISCSI-GUEST –Name ISCSI

Now I want this name to be reflected into the VM’s NCI configuration properties. This is done by enabling device naming. You can do this via the GUI or PoSh.

Set-VMNetworkAdapter –VMName RAGNAR –Name ISCSI –Devicenaming On

That’s it.

image

So now let’s play with our existing network adapter “Network Adapter” which connects our Hyper-V guests to the LAN via the HYPER-V-GUESTS switch? Can you rename it?  Yes, you can. In PoSh run this:

Rename-VMNetworkAdapter –VMName RAGNAR –Name “Network Adapter” –NewName “LAN”

And that’s it. If you refresh the setting of your VM or reopen it you’ll see the name change.

image

The one thing that I see in the Tech Preview is that I need to reboot the VM to see the Name change reflected inside the VM in the NIC configuration under advance properties, called “Hyper-V Network Adapter Name”. Existing one show their old name and new one are empty until then.

image

 

Two important characteristics to note about enabling device naming

You notice that one can edit this field in NIC configuration of the VM but it doesn’t move up the stack into the settings of the VM. Security wise this seems logical to me and it’s not intended to work. It’s a GUI limitation that the field cannot be disabled for editing but no one can try and  be “funny” by renaming the ethernet adapter in the VMs settings via the guest Winking smile

Do note that this is not exactly the same a Consistent Device Naming in Windows 2012 or later. It’s not reflected in the name of the NIC in the GUI, these are still Ethernet, Ethernet 2, … Enable device naming is mainly meant to enable identifying the adapter assigned to the VM inside the VM, mainly for automation. You can name the NIC in the Guest whatever works best for you and you’ll never lose the correlations between the Network adapter in your VM settings and the Hyper-V Network Adapter name in the NIC configuration properties. In that respect is a bit more solid/permanent even if some one found it funny to rename all vNICs to random names you’re still OK with this feature.

That’s it off, you go! Download the Technical Preview bits from MSDN, start exploring and learning. Knowledge is seldom a bad thing Winking smile

The Hyper V Amigos Showcast Episode 6: Storage Spaces


Everybody is very busy and I’m a bit tires but here’s the 6th episode of the Hyper-V Amigos show cast. In this episode we get to play a bit with storage spaces in Carsten’s lab.

As always we had a lot of fun doing so and thanks to Carsten Rachfahl and the assistance of Kerstin (his charming wife, also an MVP, in Office 365) we could simulate hardware failures & film them for you!

 

Carsten & I discuss several scenarios and what’s happening during failovers. Carsten is assisting customers with this a lot so he has some of the most varied experience with storage spaces and SOFS out there!  Interesting stuff and for people who haven’t even looked at Windows Server 2012 or later yet a wake up call to start as the world is not limited to what we once knew. It’s not your daddy’s Windows anymore Winking smile

I hope you enjoy it and we’re already planning for the next one!

Microsoft Keeps Investing In Storage Big Time


Disclaimer: These are my musing on the limited info available about Windows Server vNext and based on the Technical Preview bits at the time of writing. So it’s not set in stone & has a time limited value.

Reading the documentation that’s already available on vNext of Windows it’s clear that Microsoft is continuing it’s push towards the software defined data center. They are also pushing high to continuous availability ever more towards the  “continuous” side of things.

It’s early days yet and we just only downloaded the Technical Preview but what do we read in What’s New in Storage Services in Windows Server Technical Preview

Storage Quality of Service

  • They are giving us more Storage Quality of Service tied into the use of SOFS as storage over SMB3. As way to many NAS solutions don’t support SMB3 or only partially (in a restricted way) it’s clear too me that self build SOFS solution on a couple of servers is and remains the best SMB3 implementation on the market and has just gotten storage QoS.

Little Rant here: To the people that claim that this is not capable of high performance, I usually laugh. Have you actually build a SOFS or TFFS with 10Gbps networking on modern enterprise grade servers line the DELL R720 or 730? Did you look at the results form that relative low cost investment? I think not, really. And if you did and found it lacking, I’ll be very impressed of the workload you’re running.  You’ll force your storage to the knees earlier than your Windows file server nowadays.

  • It’s in the SOFS layer, so this does not tie you into to Storage Space if you’re not ready for that yet but would like the benefits of SOFS. As long as you have shared storage behind the SOFS you’re good.
  • It’s policy based and can apply to virtual machines, groups of virtual machines a service or a tenant
  • The virtual disk is the level where the policy is set & enforced.
  • Storage performance will dynamically adjust to meet the policies & when tied the performance will be fairly distributed.
  • You can monitor all this.

It’s right there in the OS.

Storage Replica

This gives us “storage-agnostic, block-level, synchronous replication between servers for disaster recovery, as well as stretching of a failover cluster for high availability. Synchronous replication enables mirroring of data in physical sites with crash-consistent volumes ensuring zero data loss at the file system level. Asynchronous replication allows site extension beyond metropolitan ranges with the possibility of data loss.”

Look for Hyper-V we already had Hyper-V replica (which is also being improved), but for other workloads we still rely on the storage vendors or 3rd party solutions. But now I can have my storage replicas for service protection and continuity out of the box with Windows.  WOW!

and as we read on ..

  • Provide an all-Microsoft disaster recovery solution for planned and unplanned outages of mission-critical workloads.
  • Use SMB3 transport with proven reliability, scalability, and performance.
  • Stretch clusters to metropolitan distances.
    Use Microsoft software end to end for storage and clustering, such as Hyper-V, Storage Replica, Storage Spaces, Cluster, Scale-Out File Server, SMB3, Deduplication, and ReFS/NTFS.
  • Help reduce cost and complexity as follows:

Hardware agnostic, with no requirement to immediately abandon legacy storage such as SANs.

Allows commodity storage and networking technologies.
Features ease of graphical management for individual nodes and clusters through Failover Cluster Manager and Microsoft Azure Site Recovery.

Includes comprehensive, large-scale scripting options through Windows PowerShell.

  • Helps reduce downtime, and increase reliability and productivity intrinsic to Windows.
  • Provide supportability, performance metrics, and diagnostic capabilities.

I have gotten this to work in the lab with some trial and error but this is the Technical Preview, not a finish product. If they continue along this path I’m pretty confident we’ll have functional & operational viable solution by RTM. Just think about the possibilities this brings!

Storage Spaces

Now I have not read much on Storage Space in vNext yet but I think its safe to assume we’ll see major improvements there as well. Which leads me to reaffirm my blog posy here: TechEd 2013 Revelations for Storage Vendors as the Future of Storage lies With Windows 2012 R2

Microsoft is delivering more & great software defined storage inbox. This means cost effective yet very functional storage solutions. On top of that they put pressure on the market to deliver more value if they want to stay competitive. As a customer, whatever solution fits my needs the best, I welcome that. And as a consumer of large amounts of storage in a world where we need to spend the money where it matters most I like what I’m seeing.

Tip for Microsoft: configurability, reliability and EASY diagnostics and remediation are paramount to success. Sure some storage vendor solution aren’t to great on that front either but some are awesome. Make sure your in the awesome category. Make it a great user experience from start to finish in both deployment and operations.

Tip for you: If you’re not ready for prime time with Storage Spaces , SMB Direct etc … do what I’ve done. Use it where it doesn’t kill you if you hit some learning curves. What about storage spaces as a backup target where you can now replicate the backups of to your disaster recovery site?