Microsoft Keeps Investing In Storage Big Time


Disclaimer: These are my musings on the limited info available about Windows Server vNext and based on the Technical Preview bits at the time of writing. So it’s not set in stone & has a time limited value.

Reading the documentation that’s already available on vNext of Windows it’s clear that Microsoft is continuing its push towards the software defined data center. They are also pushing high to continuous availability ever more towards the “continuous” side of things.

It’s early days yet and we have only just downloaded the Technical Preview, but what do we read in What’s New in Storage Services in Windows Server Technical Preview?

Storage Quality of Service

  • They are giving us more Storage Quality of Service tied into the use of SOFS as storage over SMB3. As way too many NAS solutions don’t support SMB3, or only partially (in a restricted way), it’s clear to me that a self-built SOFS solution on a couple of servers is and remains the best SMB3 implementation on the market, and it has just gotten storage QoS.

Little rant here: To the people that claim that this is not capable of high performance, I usually laugh. Have you actually built a SOFS or TFFS with 10Gbps networking on modern enterprise grade servers like the DELL R720 or 730? Did you look at the results from that relatively low cost investment? I think not, really. And if you did and found it lacking, I’ll be very impressed by the workload you’re running. You’ll bring your storage to its knees earlier than your Windows file server nowadays.

  • It’s in the SOFS layer, so this does not tie you to Storage Spaces if you’re not ready for that yet but would like the benefits of SOFS. As long as you have shared storage behind the SOFS you’re good.
  • It’s policy based and can apply to virtual machines, groups of virtual machines, a service or a tenant.
  • The virtual disk is the level where the policy is set & enforced.
  • Storage performance will dynamically adjust to meet the policies & when tied the performance will be fairly distributed.
  • You can monitor all this.

It’s right there in the OS.

Storage Replica

This gives us “storage-agnostic, block-level, synchronous replication between servers for disaster recovery, as well as stretching of a failover cluster for high availability. Synchronous replication enables mirroring of data in physical sites with crash-consistent volumes ensuring zero data loss at the file system level. Asynchronous replication allows site extension beyond metropolitan ranges with the possibility of data loss.”

Look, for Hyper-V we already had Hyper-V Replica (which is also being improved), but for other workloads we still rely on the storage vendors or 3rd party solutions. But now I can have my storage replicas for service protection and continuity out of the box with Windows. WOW!

And as we read on ...

  • Provide an all-Microsoft disaster recovery solution for planned and unplanned outages of mission-critical workloads.
  • Use SMB3 transport with proven reliability, scalability, and performance.
  • Stretch clusters to metropolitan distances.
  • Use Microsoft software end to end for storage and clustering, such as Hyper-V, Storage Replica, Storage Spaces, Cluster, Scale-Out File Server, SMB3, Deduplication, and ReFS/NTFS.
  • Help reduce cost and complexity as follows:
      • Hardware agnostic, with no requirement to immediately abandon legacy storage such as SANs.
      • Allows commodity storage and networking technologies.
      • Features ease of graphical management for individual nodes and clusters through Failover Cluster Manager and Microsoft Azure Site Recovery.
      • Includes comprehensive, large-scale scripting options through Windows PowerShell.

  • Helps reduce downtime, and increase reliability and productivity intrinsic to Windows.
  • Provide supportability, performance metrics, and diagnostic capabilities.

I have gotten this to work in the lab with some trial and error, but this is the Technical Preview, not a finished product. If they continue along this path I’m pretty confident we’ll have a functional & operationally viable solution by RTM. Just think about the possibilities this brings!
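
The difference between the synchronous and asynchronous modes quoted above can be illustrated with a toy model. This is only a sketch in Python: the class and method names are made up for illustration, and it says nothing about how Storage Replica is actually implemented. It just shows why synchronous replication loses nothing on failover while asynchronous replication can lose whatever has not been shipped yet.

```python
from collections import deque


class ReplicatedVolume:
    """Hypothetical toy model of a volume replicated to a second site."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = []        # blocks written at the primary site
        self.secondary = []      # blocks that reached the secondary site
        self._pending = deque()  # async only: written but not yet shipped

    def write(self, block: str) -> None:
        self.primary.append(block)
        if self.synchronous:
            # The write is only acknowledged once the remote copy exists.
            self.secondary.append(block)
        else:
            # The write is acknowledged right away and shipped later.
            self._pending.append(block)

    def ship(self) -> None:
        """Async only: send everything still pending to the secondary."""
        while self._pending:
            self.secondary.append(self._pending.popleft())

    def fail_over(self) -> list:
        """Primary site is lost: only what reached the secondary survives."""
        return list(self.secondary)


sync_volume = ReplicatedVolume(synchronous=True)
sync_volume.write("a")
sync_volume.write("b")

async_volume = ReplicatedVolume(synchronous=False)
async_volume.write("a")
async_volume.ship()
async_volume.write("b")  # the site fails before this block is shipped
```

In this model the synchronous volume fails over with both blocks, while the asynchronous one only has the block that was shipped before the failure.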

Storage Spaces

Now I have not read much on Storage Spaces in vNext yet, but I think it’s safe to assume we’ll see major improvements there as well. Which leads me to reaffirm my blog post here: TechEd 2013 Revelations for Storage Vendors as the Future of Storage lies With Windows 2012 R2

Microsoft is delivering more & great software defined storage inbox. This means cost effective yet very functional storage solutions. On top of that they put pressure on the market to deliver more value if they want to stay competitive. As a customer, whatever solution fits my needs the best, I welcome that. And as a consumer of large amounts of storage in a world where we need to spend the money where it matters most I like what I’m seeing.

Tip for Microsoft: configurability, reliability and EASY diagnostics and remediation are paramount to success. Sure, some storage vendor solutions aren’t too great on that front either, but some are awesome. Make sure you’re in the awesome category. Make it a great user experience from start to finish in both deployment and operations.

Tip for you: If you’re not ready for prime time with Storage Spaces, SMB Direct etc., do what I’ve done. Use it where it doesn’t kill you if you hit some learning curves. What about Storage Spaces as a backup target, where you can now replicate the backups off to your disaster recovery site?

What You Need To Hear, Not What You Want To Hear


The usual disclaimer covers this blog. The Dilbert® Life series are humorous posts on corporate culture from hell and dysfunctional organizations running wild. This can be quite shocking and sobering to those who take themselves too seriously. So these blog posts need to be read with a healthy dose of humor and be put into perspective. If you can’t do that, leave now. If it hits home too hard, you have other problems. It could be that you don’t like what you see in this mirror. Or perhaps …

You’re so vain, you probably think this blog is about you
You’re so vain, I’ll bet you think this blog is about you
Don’t you? Don’t you?

Many thanks to Carly Simon’s “You’re so vain” :-)

Shopaholic Organizations

There is a shocking addiction to trying to buy one’s way out of problems. If the service desk process sucks then you buy a CRM package. If this doesn’t do what you hoped out of the box, have it customized. You don’t have 100% IT automation? You need to buy a CMDB! Need to track changes? Go ITIL & do ITLM/ITSM all over the board. Projects don’t respect their boundaries? Hire some PRINCE expertise. Can’t keep up with all the project & resource management? Buy an ERP and integrate it with the project management software you’ve been abusing. You have no clue what to do next? Hire management consultants! We have one for every flavor of management. Your employees suck? Hire consultants. Slow applications? Buy flash only storage and 40Gbps switches. Your employees are disengaged? Get a coach, buy a team building experience and a $5 pizza discount coupon as an “atta boy”. Maybe you could even gamify the company to success? And if you feel all alone and misunderstood you can join all the peer groups & professional organizations you can find to play that same broken record to each other over and over again whilst hoping you catch a break to a better gig.

Whatever the problem you’re facing, there is a product to buy and help to be hired. Like a true addict you keep using more of the same in the hope it will work. Nice twist on what Einstein called the definition of insanity. Yet why do so many people think it will help, all evidence to the contrary?

The obsessive and compulsive need to buy stuff to fix or even solve problems, needs, lack of skills, knowledge and insights is staggering. Sure, the world is full of people and companies that will gladly take your money. Why? Well, that’s their business model. The only aim is to separate you from your money. They’ll tell you they understand you, that they’ve helped hundreds of people and businesses like you. So they’ll sell you whatever it is they sell and they couldn’t care less if you’re still around next year. Until perhaps the moment in 18 months when they know they can sucker you again. The only line of defense you have against that is your own good judgment. It’s not that their products or services have no value at all. The better vendors will even walk away from an engagement when it’s not mutually beneficial. But the core of the problem is that you are having issues, and that’s your inability to deal with problems that cannot be solved by buying something. It’s very much like a shopaholic.

It’s a business model for someone

The idea that there is an easy fix to solve the issues you’re facing and make sure you can shine as a successful leader, instead of being stuck in your current mess, is a very tempting one. There is always someone who understands this. Who’s ready to step up and deliver. Which would be great if it was not for a few simple rules:

  • A fool and his money are easily separated. And if not, as long as the money is good enough they’ll put in more effort.
  • Your problems are internal, they are caused by you and need to be fixed by you. Any addiction to whatever (products, services, consulting, coaching) is actually keeping you away from the solution.

  • You as a manager, perhaps even a leader, will have to step up. Be all you can be and if that is not enough step aside. Do the latter yourself before it’s done to you, it’s less messy that way.

Listen, when the money is gone, all that is left are your internal resources, if you’re lucky. Acting as if they don’t matter means they won’t be very engaged. All budgets are limited, but that doesn’t mean that you need to be a scrooge. It means you need to create and build a capable organization, even when budgets are plentiful, that can stand on its own feet. One that is able to analyze and decide independently what it needs to do and act on that. Spend your money there. Otherwise as soon as you run out, you lose all your capabilities to act. It’s like a ship without power, on top of not even having a rudder. You’re adrift, floating between the sharks that bled you dry.

Also, if all your organization knows how to do is hire & buy everything from others, it can easily be replaced with a cheaper one that’s optimized for that model, needing 40% to 50% fewer employees & managers. Pure substitution play. Game over. Economics 101.

You need to get a clue, make it happen, you and your team, no one else.  But it has to start with you. If you need coaches, consultants, products just to get started you’re not going to make it.

Ouch, that hurt!

Deep down you know the painful truth. While it would indeed be great if you were able to hire a coach or consultant, or buy a service or product that can take away your pains, it doesn’t work that way. You cannot purchase those magical bottles of pixie dust or unicorn tears that can put the struggles and headaches behind you, allowing you to solely focus on enjoying a successful business and be forever blissful.

I could tell you that you’re in luck as I have a nice stash of pixie dust bottles I can use in a pinch and for a price. But that’s not it. It’s experience, knowledge, having to work and live with solutions, seeing the good, the bad and the ugly of both marketing and “marchitecture”, in combination with grand and hopefully realistic visions from analysts & architects of what’s needed. The only thing this has in common with pixie dust is that it doesn’t come cheap or easy either, but it does work ;-)

Too many times solutions are nothing but rehashed marketing & sales pitches that succeed due to a lack of skill on both sides. All kinds of schemes are used to justify them. They don’t achieve much at all. These are often self-serving “quick fixes” to something that is a structural & often over-hyped, overcomplicated problem serving some people’s agendas.

So you spend your money and for a little while you experience the illusion that you’ve solved something. But like any addict, you, the shopaholic, will return hard and fast to reality. Poorer and sadly none the wiser. You coast from purchase to purchase never breaking this destructive pattern. You like to fool yourself into believing that you’re investing instead of spending money because you see so many successful companies buy the same products or services. It’s kind of painful and sad to watch. Some of you will blame the market, incompetent employees or dishonest vendors, lack of commitment, disobedience. While all these factors do exist and play their role it’s not the real cause of your woes. The environment you operate in is no different for you or competitors. Sure there might be a hobby business around, run by the son of a super-rich business tycoon but that’s a minority. No, the playing field is the same, so could it, however painful that thought, be you, that’s not made of the right stuff?

What if despite all your best efforts and even some pixie dust you still have issues that are killing your performance? You can suck it up and BS your way out. Say that what you did is the best in the world and nothing more can be done. Hire consultants to audit whatever it is you want to audit (or whoever you want to put in their place if you’re really political), blame your predecessor, the lack of (upper) management vision or the current sun spots cycle. You can also really dive in and pinpoint where the issues are. But that’s hard, very hard. A lot harder than buying a vial of unicorn tears, which seems to be the missing ingredient in any unrealistic project, overly ambitious architecture or design. It’s horribly difficult to obtain because it is scarce beyond imagination.

I’ll make you a deal. While I possess some flasks, they are the most expensive substance ever to come by. So if you require the tears of a unicorn, you’re going to need truckloads of money of the large denomination kind.

But there are no unicorn tears. YOU will need to fix your problems. Forget about buying products, that’s in essence automation and optimization. If you do that to a problem you only make it bigger and worse faster. Forget about coaches and consultants, they’ll only enable you to move faster and more targeted if you know the goal, that is. They will not solve your problems. That’s your job.

Don’t try to improve things with tools and services until you really know what’s wrong. Look very deep, hard and honest at your company, your managerial results and your actions. If you only find you do things to save your own behind, cover your back and hopefully move ahead, you’re not fit to lead anything at all and you’re as much a strategist as my hamster. But in defense of my hamster: he lacks any ambition. As a leader / manager you should care a bit more. Action is needed, from you. Lip service is useless. Talk is cheap. Fear kills. Deflecting decisions and responsibility makes you lose all credibility. If you care, act like it. If you don’t care no one else will for sure. If you can’t be bothered to do the hard work, no one will. You can’t lead from behind.

So what needs to be done?

Stop what you’re doing right now. Observe, orient, decide, and act (OODA) and see the progress of intelligent decisions and watch how money invested differs in results so much from money spent. There is no substitute. You don’t need tools, coaches, taskforces, committees and services. Those are only for amplification, they are force multipliers and that’s great as long as you don’t apply them to your problems. Hard as it may sound, it’s (free) advice that you won’t get from a sales person. You cannot avoid your responsibilities.

The eyes of the world are upon you

You brought this on yourself. You stepped up to the plate as a leader. So yes, your employees are watching and they don’t miss much of what affects them. I know employees can act very entitled and be a major pain in the proverbial behind, but this discussion isn’t about that. Do you want to know why they doubt you, don’t follow you, ignore or possibly even oppose you? Because you show no leadership and do not portray any sign of competence or insight. For the good of the company and themselves they do what they need to, with or without you. No one goes over the top anymore at the blow of the whistle. So don’t pull rank, instead try to become credible.

Setting Up An Uplink (Trunk/General) With A Dell PowerConnect 2808 or 28XX


Introduction

I was deploying a bunch of PowerConnect 2808 switches that needed to provide connectivity to multiple VLANs (Training, Guest, …) in classrooms. I should have figured it out before I got there with my “assumption” based quick configuration loaded on the switches, if I had just refreshed my insights into how the PowerConnect family of switches works.

image

So before we go on, here are the basics on switch port (or LAG) modes in the PowerConnect family. Please realize that switch behavior (especially for trunk mode in this context) has changed over time with more recent switches/firmware. But the current state of affairs is as follows (depending on what model & firmware you have, behavior differs a bit). You can put your port or LAG in the following 3 (main) modes:

Access: The port belongs to a single untagged VLAN. When a port is in Access mode, the packet types which are accepted on the port cannot be designated. Ingress filtering cannot be enabled/disabled on an access port. So only untagged received traffic is allowed and all transmitted traffic is untagged. The setting of the port determines the VLAN of traffic. Tagged received traffic is dropped. Basically, this is what you set your ports for client devices to (printer, PC, laptop, NAS).

Trunk: In older versions this means that ALL transmitted traffic is tagged. That’s easy. Tagged received traffic is dropped if it doesn’t belong to one of the defined VLANs on the trunk. In more recent switches/firmware untagged received traffic is dropped except for one VLAN, which can be untagged and still be received. This is nice for the default VLAN and makes for better compatibility with other switches.

General: You determine what the rules are. You can configure it to transmit tagged or untagged traffic per VLAN. Untagged received traffic is accepted and the PVID determines the VLAN it is tagged with. Tagged received traffic is dropped if it doesn’t belong to one of the defined VLANs.

Also see this DELL link PowerConnect Common Questions Between Access, General and Trunk mode
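
To make the receive rules of the three modes easier to compare, here is a minimal sketch in Python. It is a simplified model of the descriptions above, not switch firmware: the function names are hypothetical, a frame is represented only by its VLAN tag (None for untagged), and each function returns the VLAN the frame ends up in, or None when the frame is dropped.

```python
from typing import Optional, Set


def access_ingress(tag: Optional[int], access_vlan: int) -> Optional[int]:
    """Access: untagged frames join the port's VLAN, tagged frames are dropped."""
    return access_vlan if tag is None else None


def trunk_ingress(tag: Optional[int], allowed: Set[int],
                  native: Optional[int] = None) -> Optional[int]:
    """Trunk: tagged frames must belong to an allowed VLAN.

    Untagged frames are dropped unless a native (untagged) VLAN is set,
    which models the behavior of more recent switches/firmware.
    """
    if tag is None:
        return native
    return tag if tag in allowed else None


def general_ingress(tag: Optional[int], members: Set[int],
                    pvid: int) -> Optional[int]:
    """General: untagged frames get the PVID, tagged frames must be members."""
    if tag is None:
        return pvid
    return tag if tag in members else None
```

For example, `trunk_ingress(None, {9, 20})` drops the untagged frame on an older style trunk, while `trunk_ingress(None, {9, 20}, native=1)` places it in VLAN 1.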

The PowerConnect 28XX Series

These are good switches for their price point & use cases. Just make sure you buy them for the right use case. There is only one thing I find unforgivable in this day and age: the lack of SSH/HTTPS support for management.

Go ahead, fire up a 2808 and take a look at the web interface and see what you can configure. In contrast with the PC54XX/55XX etc. series you cannot set the port mode, it seems. So how can this switch accommodate trunk/general/access modes at all? Well, it’s implied in the configuration of ports, which seem to be set in general mode by default, and you cannot change that. The good news is that with the right settings a port in general mode behaves like a port in access or trunk mode. How? Well, we follow the rules above.

So we assume here that a port is in general mode (can’t be changed). But we want trunk mode, so how do we get the same behavior? Let’s look at some examples in pseudo CLI (it’s a web GUI only device).

Example 1: Classic Trunk = only defined tagged traffic is accepted. All untagged traffic is dropped

switchport mode trunk
switchport trunk allowed vlan add 9, 20

So we can have the same behavior in general mode using

switchport mode general
switchport general allowed vlan add 9, 20 tagged
switchport general pvid 4095   

A PVID of 4095 is the industry standard discard VLAN: it assigns this VLAN to all untagged traffic, which is then dropped. Ergo this is the same as the trunk config above!

Example 2: Modern Trunk = only defined tagged traffic and one untagged VLAN is accepted

switchport mode trunk
switchport trunk allowed vlan add 9, 20
switchport trunk allowed vlan add 1 untagged

So we can have the same behavior in general mode using

switchport mode general
switchport general allowed vlan add 9, 20 tagged
switchport general pvid 1  

This example is what we needed in the classroom, and it is basically what you set with the GUI. So far so good. But we ran into an issue with connectivity to the access ports in VLAN 9 and VLAN 20. Let’s look at that in the next example.

Example 3: Access port mode = only one untagged VLAN is accepted

switchport mode access
switchport access vlan 9

So we can have the same behavior in general mode using

switchport mode general
switchport general allowed vlan add 9 untagged
switchport general pvid 9

If you’re accustomed to the higher end PC switches you define the port in access mode and add the VLAN of your choice untagged. That’s it. Here the mode is general and can’t be changed, meaning we need to set the PVID to 9 so all untagged traffic is indeed tagged with VLAN 9 on the port.
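
The pattern in the three examples can be summed up in a small helper that derives the general mode settings from the role you want the port to play. This is a sketch in Python that only generates the same pseudo CLI used above; the function name is made up, and since the 2808 is web GUI only, the output is a checklist for what to click rather than something you can paste into the switch.

```python
from typing import Iterable, List, Optional

DISCARD_PVID = 4095  # PVID used in Example 1 to drop all untagged traffic


def general_mode_lines(tagged: Iterable[int] = (),
                       untagged: Optional[int] = None) -> List[str]:
    """Pseudo CLI for a general mode port that mimics a trunk or access port.

    tagged:   VLANs accepted and sent tagged (the trunk VLANs).
    untagged: the single VLAN for untagged traffic, or None to drop it.
    """
    lines = ["switchport mode general"]
    tagged_vlans = sorted(tagged)
    if tagged_vlans:
        vlan_list = ", ".join(str(v) for v in tagged_vlans)
        lines.append(f"switchport general allowed vlan add {vlan_list} tagged")
    if untagged is not None:
        lines.append(f"switchport general allowed vlan add {untagged} untagged")
    # Untagged traffic follows the PVID, so it must match the untagged VLAN.
    pvid = untagged if untagged is not None else DISCARD_PVID
    lines.append(f"switchport general pvid {pvid}")
    return lines


classic_trunk = general_mode_lines(tagged=[9, 20])             # Example 1
modern_trunk = general_mode_lines(tagged=[9, 20], untagged=1)  # Example 2
access_port = general_mode_lines(untagged=9)                   # Example 3
```

Note that for the modern trunk this helper also emits an explicit untagged membership line for VLAN 1. Example 2 above leaves that line out, presumably because the port is already an untagged member of the default VLAN.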

Setting Up an uplink between a PowerConnect 5548 and a 2808

Here’s the normal deal with higher range series of PowerConnect switches: you normally use the port mode to define the behavior and in our case we could go with a trunk or general mode. We use trunk, leave the native VLAN for the one untagged VLAN and add 9 and 20 as tagged VLANs.

The “trunk” port or LAG is left on the default PVID

image

So an “access” port for VLAN 9 is achieved by setting the PVID to 9

image

And an “access” port for VLAN 20 is achieved by setting the PVID to 20

image

While the VLAN  membership settings are what you’d expect them to be like on the higher end PowerConnect models:

VLAN 1 (native)

image

VLAN 9 (Corp)

image

VLAN 20 (Guest)

image

If it’s the first time you’re configuring a PC2808 you might totally miss the fact that you need to do some extra work to make traffic flow. So to recap what you need to do: as described above there is no selection of access/general/trunk … on a PowerConnect 2808. The port or the LAG is “implicitly” set to general, and the extra settings of the PVID and adding tagged/untagged VLANs will make it behave as general, trunk or access.

  • The trick is to set any other VLAN than the default 1 to tagged on the port or LAG you’ll use as uplink. So far things are quite “standard PowerConnect”.
  • You set the VLAN membership of your “access” ports to untagged for the VLAN you want them to belong to.
  • After that, on the “access” ports you set the PVID to the VLAN you want the port to belong to. If you do not do this the port still behaves as if it’s a VLAN 1 port. It will not get a DHCP address for that VLAN but for the one on VLAN 1, if there is one. Or, if you use a static IP address for the subnet of a VLAN on that port, you won’t have connectivity as it’s not set to the right VLAN.
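
The forgotten PVID is easy to miss in the GUI, so it can help to write down the intended settings per port and check them before you go on site. Below is a minimal sketch in Python of such a check. The port names (g1, g2, g8) and the dictionary layout are hypothetical; nothing here talks to the switch.

```python
from typing import Dict, List

DISCARD_PVID = 4095  # untagged traffic is dropped on purpose


def find_pvid_mismatches(ports: Dict[str, dict]) -> List[str]:
    """Report ports whose PVID is not one of their untagged VLANs."""
    problems = []
    for name in sorted(ports):
        settings = ports[name]
        pvid = settings.get("pvid", 1)  # a port you never touched stays on PVID 1
        untagged = set(settings.get("untagged", ()))
        if pvid == DISCARD_PVID:
            continue
        if pvid not in untagged:
            problems.append(
                f"{name}: PVID {pvid} but untagged member of {sorted(untagged)}"
            )
    return problems


planned = {
    "g1": {"untagged": {9}, "pvid": 9},  # "access" port for VLAN 9, done right
    "g2": {"untagged": {20}},            # VLAN 20 membership set, PVID forgotten
    "g8": {"untagged": {1}, "pvid": 1, "tagged": {9, 20}},  # uplink
}
issues = find_pvid_mismatches(planned)
```

With these planned settings only g2 is reported, which is exactly the port that would keep behaving like a VLAN 1 port.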

The reason we used the PowerConnect 2808 series here is that we needed silent ones (passive cooling) and we needed multiple ones in the training rooms to avoid too many cables running around the place. That was the “2 minutes at the desk of the project manager” quick fix to a changed requirement. The real solution of course would have been to get 24+ outlets to the room in the correct places and add 24+ ports to the normal switch count in the hardware analysis for the building solution. But after the fact you have to roll with the flow.

DELL Has Great Windows Server 2012 R2 Feature Support – Consistent Device Naming – Which They Helped Develop


The issue

Plug ‘n Play enumeration of devices has been very useful for loading device drivers automatically but isn’t deterministic. As devices are enumerated in the order they are received, the result will differ from server to server but also within the same system. This means that the enumeration and order of the NIC ports in the operating system may vary and “Local Area Connection 2” doesn’t always map to port 2 on the on board NIC. It’s random. This makes scripting “rather hard” and even finding out what NIC matches what port is a game of unplugging cables.

Consistent Device Naming is the solution

A mechanism that has to be supported by the BIOS was devised to deal with this and enable consistent naming of the NIC port numbering on the chassis and in the operating system.

But it’s even better. This doesn’t just work with on board NICs. It also works with add on cards as you can see. In the name column it identifies the slot in which the card sits and numbers the ports consistently.

In the DELL 12th Generation PowerEdge Servers this feature is enabled by default. It is not in HP servers for some reason; you need to turn it on manually.

I first heard about this feature even before Windows Server 2012 Beta was released but as it turns out Dell has been involved with the development of this feature. It was Dell BIOS team members that developed the solution to consistently name network ports and had it standardized via PCI SIG.  They also collaborated with Microsoft to ensure that Windows Server 2012 would support all this.

Here’s a screen shot of a DELL R720 (12th Generation PowerEdge Server) of ours. As you can see the Consistent Device Naming doesn’t only work for the on board NIC. It also does a fine job with add on cards, of which we have quite a few in this server.

image

It clearly shows the support for Consistent Device Naming for the add on cards present in this server. This is a test server of ours (until we have to take it into production) and it has a quad 1Gbps Intel card, a dual Intel X520 DA card and a dual port Mellanox 10Gbps RoCE card. We use it to test out our assumptions & ideas. We still need a Chelsio iWarp card for more testing mind you ;-)

A closer look

This is illustrated in the “Device Name” column in the screen shot below. It’s clear that the PnP enumerated name (the friendly name via the driver INF file) and the enumerated number value are very different from the number in the Name column (NIC1, NIC2, NIC3, NIC4), even if in this case the order happens to be correct by chance. If the operating system is reinstalled, or drivers are changed and the devices re-enumerated, these numbers may change as they did with previous operating systems.

image

The “Name” column is where the Consistent Device Naming magic comes to life. As you can see you are able to easily identify port names as they are numbered consistently, regardless of the “Device Name” column numbering and in accordance with the numbering on the chassis or add on card. This column name will NEVER differ between identical servers or after reinstalling a server because it is not dependent on PnP. Pretty cool, isn’t it! Also note that we can rename the Name column, and if we choose we can keep the original name in that one to preserve the mapping to the physical hardware location.

In the example below things map perfectly between the Name column and the Device Name column but that’s pure luck.

image

One of the other add on cards demonstrates this perfectly.

image
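
The difference between the two columns can be mimicked in a few lines of Python. This is purely a conceptual sketch: the function names are made up, the “Local Area Connection” numbering is simplified, and the “SLOT 4 Port 1” style name for add on cards is an assumption about the format. The point is only that one naming scheme depends on discovery order and the other on physical location.

```python
from typing import Dict, List, Tuple

Port = Tuple[str, int]  # (location on the chassis, port number)


def pnp_names(discovery_order: List[Port]) -> Dict[Port, str]:
    """Name ports purely by the order in which they happened to be found."""
    names = {}
    for index, port in enumerate(discovery_order, start=1):
        suffix = "" if index == 1 else f" {index}"
        names[port] = f"Local Area Connection{suffix}"
    return names


def consistent_names(ports: List[Port]) -> Dict[Port, str]:
    """Name ports from their physical location, ignoring discovery order."""
    names = {}
    for location, number in ports:
        if location == "NIC":  # on board ports
            names[(location, number)] = f"NIC{number}"
        else:                  # add on card in a slot (name format assumed)
            names[(location, number)] = f"{location} Port {number}"
    return names


# Two identical servers whose ports were enumerated in a different order.
server_a = [("NIC", 1), ("NIC", 2), ("SLOT 4", 1), ("SLOT 4", 2)]
server_b = [("SLOT 4", 2), ("NIC", 2), ("NIC", 1), ("SLOT 4", 1)]
```

Here `pnp_names` gives on board port 1 a different name on each server, while `consistent_names` returns the same mapping for both.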

Where Does Storage QoS Live In Windows Server 2012 R2 Hyper-V


Back to basics to explain where storage QoS lives and how it works

In Windows Server 2012 R2 Hyper-V (and earlier) we have Hyper-V components called Virtualization Service Providers (VSP) and Virtualization Service Clients (VSC). In combination with the VMBus, the VSP and VSC components are what make virtualization perform well on Hyper-V. The Stor VSP/VSC are where the maximum IOPS functionality lives, aka the QoS Limit.

In a hosted hypervisor like Virtual PC, or in a bare metal hypervisor without any “enlightenment”, the operating system inside a virtual machine is blissfully unaware of the fact that it is virtualized. Basically it sends hardware access requests using native drivers, but the requests are received by the virtual layer that intercepts them on behalf of the host OS by emulating hardware devices. This comes at a cost, namely performance, latency and losing device specific functionality.

In Hyper-V Microsoft provides the Integration Services (IS) for virtual machines running on Hyper-V which, in combination with the VMBus, avoid this overhead. So you should always use them where and when possible. Two of the components in the IS are VSP and VSC. They are responsible for the communication between the Host OS or Parent Partition (where the VSP lives) and the Guest OS or Child Partition (where the VSC lives).

image

There are 4 VSP & VSC components: Network, Video, HID and Storage. As you probably guessed we’re interested in the storage VSP & VSC (storVSP.sys & storvsc.sys) for the discussion at hand. The Stor VSP lives in the host OS and the Stor VSC in the guest OS of every VM running on the host. They communicate over the VMBus we mentioned, which is designed to make communications as fast as possible (it’s a communication protocol that runs in memory, i.e. it’s very fast).

image

The Minimum IOPS, also known as the Reserve, is set per virtual disk but the threshold alerts for it are generated by the VHDMP. This is the VHD/VHDX parser and dependency property provider and it knows all about the VHD/VHDX format, which in itself is again a file on storage (DAS, CSV, SMB 3.0 File Share). This also happens to be where the Storage IO Balancer lives, with which it collaborates; more on that below. You now see why QoS is not available for pass-through disks or iSCSI/FC storage in a VM: it requires a VHDX and is implemented at the virtual disk layer.

The QoS Limit (Maximum IOPS) is set at the virtual disk level via the Stor VSC and the QoS Limiter lives in the Stor VSP.

image

So what do we know:

QoS Limit (Maximum IOPS) and QoS reserve (Minimum IOPS) are implemented at the virtual disk layer. So per VHDX in a particular VM.  It’s not available yet for shared VHDX, whether on the same host or not.

Unlike the QoS Limit (Maximum IOPS), which is a hard cap, the QoS reserve (Minimum IOPS) is a best effort, not a hard minimum. It’s used to warn us, not as an enforcement. This works at the host level, where it will detect whether the VHDX can get the minimum IOPS configured or not and can generate alerts if it can’t. This is tied to the QoS IO Balancer, which is improved in R2, but it will still only spread IOPS across multiple VMs on the same host, making sure they all get a fair share.

The key point here is that this process doesn’t work across multiple hosts in a cluster, over multiple clusters and stand alone member servers that might all be attached to the same storage system. Meaning that on shared, multi purpose storage we might have an issue. What if some VMs in a 4 node Hyper-V cluster dedicated to SQL Server virtualization are eating away all the IOPS? The QoS IO Balancer will give each SQL Server VM a fair share of the IOPS, but only within its host in that cluster. If a VM on another host is consuming all IOPS, that’s out of its scope. That’s where the max cap comes to the rescue (at the virtual disk level) if you need it. Nice but not perfect. You can see now why the storage QoS minimum is implemented at the VHDMP layer, as this is where the IO Balancer also lives. The fairness that the IO Balancer provides gives you a better chance that the minimal reserve might be met, and if it isn’t you’ll get notified (you need to listen and react, I hope that’s obvious).
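
To illustrate the interplay between the hard maximum, the best effort minimum and the per host fair share, here is a toy model in Python. It is emphatically not the IO Balancer algorithm, which is not described here. The function name, the even split and the sample numbers are all assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def balance_host(host_iops: int,
                 disks: Dict[str, dict]) -> Tuple[Dict[str, int], List[str]]:
    """Split the IOPS of ONE host evenly over its virtual disks.

    A disk's "max" is a hard cap; what a capped disk cannot use is shared
    among the others. A disk's "min" is never enforced: disks that end up
    below it are only reported, mimicking the alert behavior.
    """
    shares: Dict[str, int] = {}
    remaining = host_iops
    active = sorted(disks)
    while active:
        fair = remaining // len(active)
        capped = [d for d in active
                  if disks[d].get("max") is not None and disks[d]["max"] <= fair]
        if not capped:
            for d in active:
                shares[d] = fair
            break
        for d in capped:
            shares[d] = disks[d]["max"]
            remaining -= disks[d]["max"]
            active.remove(d)
    alerts = [d for d in sorted(disks) if shares[d] < disks[d].get("min", 0)]
    return shares, alerts


# Hypothetical VHDX files on a single host.
host_disks = {
    "sql1.vhdx": {"min": 4000},
    "sql2.vhdx": {"max": 1000},
    "sql3.vhdx": {},
}
plenty, no_alerts = balance_host(10000, host_disks)
scarce, alerts = balance_host(6000, host_disks)
```

The function only ever sees the disks of one host. If a VM on another host eats the IOPS of the shared storage, all this model could do is see less `host_iops` and raise an alert for sql1.vhdx, which is the limitation described above.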

Also don’t forget that if you still have other physical servers that run file services, SQL Server or some data crunching apps you will find that those are blissfully ignorant of your QoS IO Balancer at the Hyper-V host level and of your QoS at the Hyper-V virtual disk level.

There is no multiple host QoS, there is no cluster wide QoS and there is no storage wide QoS in Windows. Perhaps you have some QoS on your SAN, but most of the time this has no knowledge of Hyper-V, the cluster and the virtual machines.

So the above gives you an idea of where Microsoft might focus its attention with regard to storage IOPS management (there are many more storage capabilities on my wish list) in vNext.

Any other options available today?

Other options are storage that is smart and has knowledge about the workload. This is nice but it comes at a cost. For the moment GridStore with its virtual controller seems to be one of the better ones out there. Now I have heard people say Microsoft doesn’t get it and that they’re doing a bad job, but I do not agree. I have spoken to many people in the community and at MSFT and they have stated, even publicly, on stage, that they will keep investing in storage features to enhance them in the versions to come. Take a look at TechEd 2013 Session MDC-B345: Windows Server 2012 Hyper-V Storage Performance.

Why would I like Microsoft to keep improving storage

When talking to storage vendors serving our needs, I always have some feedback. A lot of the advanced storage features don’t always work well in real life, especially if you combine a few. Don’t believe me? Talk to some experienced Windows engineers about the sorry state of many hardware VSS providers. Or how federation across storage systems falls apart the moment you combine it with application consistent snapshots or put a real heavy load on it. Not too cool when you paid for all those licenses which have turned into “lab only” toys. Yes, sometimes as a Windows user you feel like a second class citizen in storage land. A lot of storage systems are still very much a silo. Attempts to do storage federation without a hit on performance, making it load balance across SAN building blocks whilst making all the advanced features that have knowledge of the OS and hypervisor work reliably, are not moving as fast as the race for ever more IOPS.

Sure I love the notion of 2 million IOPS, especially if you can get them with random write/read IO at super low latencies :-). But there are other, sometimes more urgent needs and those seem to fall between the cracks as the storage vendors compete with each other and forget about the needs of their customers. If some storage vendors would shut up long enough to listen to customers they might be less surprised as to why those customers are interested in Storage Spaces.

So it would be kind of nice if Microsoft can work on this and include more evolved storage QoS capabilities in the box. I also like that approach for other reasons. Basically we will do everything we can with what Windows offers us in box. It’s cost effective as long as you keep the KISS principle in mind and design it consciously. I assure you that often too much money is spent on 3rd party software because people don’t leverage what they have in box and drop the 20/80 rule. We do, and we get the best possible TCO/ROI for our licenses. We don’t spend extra money on licenses, integration and support of third party products so we can spend it where it matters the most. It also makes upgrades easier as the complexity and the number of dependencies are lower with a pure in box solution. On top of that we minimize the distinct possibility that one or more 3rd party products will hold us hostage in an older infrastructure because they don’t support new versions of Windows quickly, well and completely enough for us to upgrade.

Adventures In RDMA – The RoCE Path Over DCB To Windows Server 2012 R2 SMB 3.0 Glory


Prologue

On a gloomy day, dark, grey and cold, we gave battle with RoCE & DCB (PFC/ETS). The fight was a long one, the battlefield uncharted and we had only our veteran attitude towards adversity to guide us through the switch configurations. It seemed that no man had gone that far to the edges of the Windows Server 2012 empire. And when it came to RoCE & DCB meets Didier, I needed to show it that it had been conquered and was reminded of a quote from Gladiator:

Quintus: People (RoCE/DCB configs) should know when they are conquered.
Maximus: Would you, Quintus? Would I?

After many, many lonely & unsuccessful hours dealing with Performance monitor, switch configurations, reloads, firmware, drivers & Windows we got results:

… “it’s working” … “holy s* look at those numbers” …

On that dark day in a scarcely illuminated room, in the faint glare of the monitors, even the CLI of the switches in PuTTY felt like a grim cold place. But all that changed as the impressive results brightened up the day and made all efforts seem worthwhile. “Didier Victor” I thought as I looked away from the screen, “Once more”.

But it has been a hard won victory. And should you fight this battle? Well, let’s discuss this a bit now we’ve got your attention. RDMA is a learning process for many of us and neither Infiniband, iWARP nor RoCE is the one that needs to win at this game. It’s you, via the knowledge you’ll gain working with RDMA technologies.

SMB Direct or SMB over RDMA comes in flavors

Infiniband (Mellanox)

That’s been here for a while. Has high cost associated (depends on where you come from) and also has a psychological barrier to it. Try discussing buying 10Gbps versus Infiniband with semi technical managerial types. You’ll know what I mean.

Deploying Windows Server 2012 with SMB Direct (SMB over RDMA) and the Mellanox ConnectX-2/ConnectX-3 using InfiniBand – Step by Step

iWARP (Chelsio / Intel)

RDMA but it’s TCP/IP offloaded to the card. It can leverage DCB but doesn’t require it.

Deploying Windows Server 2012 with SMB Direct (SMB over RDMA) and the Chelsio T4 cards using iWARP – Step by Step

RoCE (Mellanox)

“Infiniband over Ethernet”, so you “NEED” (no, not a real hard requirement) DCB with PFC/ETS (DCBx can be handy) for it to work best. No need for Congestion Notification as it’s for TCP/IP but could be nice with iWARP (see above). Do note that you’ll need to configure your switches for DCB & that’s highly dependent on the vendor & even type of switches.

Deploying Windows Server 2012 with SMB Direct (SMB over RDMA) and the Mellanox ConnectX-3 using 10GbE/40GbE RoCE – Step by Step

Here’s an older overview of the pros & cons of the RDMA flavors: [image]

Please see Jose Barreto’s excellent work on explaining SMB 3.0 over RDMA in his presentations at SNIA, TechEd and on his blog.

While I have heard of two people in my network working with Infiniband for SMB Direct and Windows Server 2012 (R2), most of us are doing 10Gbps. Pricing for Infiniband has a bad reputation. Not because Infiniband is super costly compared to 10/40Gbps (I’m told most people who ask for quotes are positively surprised) but when you can’t afford a Porsche you’re not shopping for a Ferrari either. Especially not when a mid size sedan will serve all of your needs above and beyond the call of duty. On top of that you might have bought all that nice “converged network ready” 10Gbps gear some years ago. Some of us may be working towards 40Gbps but most are 10Gbps shops. My 40Gbps is “limited” to the inter links & uplinks. Meaning that we go for either iWarp or RoCE.

RoCE or iWARP

Which one of those two is best? Well, the line is drawn between vendors. RoCE today equals Mellanox (yes, the Infiniband vendor; RoCE is sometimes called “Infiniband on layer 4 over Ethernet layer 2”) and iWarp means Chelsio or Intel (their cards look a bit long in the tooth however).

You’ll find comparisons by both vendors claiming superiority for varied reasons. Here’s the Mellanox side http://www.mellanox.com/pdf/whitepapers/WP_RoCE_vs_iWARP.pdf & here’s Chelsio’s take http://www.chelsio.com/roce/ & http://www.moderntech.com.hk/sites/default/files/whitepaper/V09_iWAR_Summary_WP_0.pdf. It’s good to look at your needs and map them. But I cannot declare a winner. I did notice that at least one vendor of SOFS/CiB uses iWarp. Is that a statement? And if so about what? Price? Ease of use? Performance/Cost?

What I do find is that Chelsio is really hacking into RoCE as you can see here http://www.chelsio.com/wp-content/uploads/2011/05/RoCE-The-Grand-Experiment1.pdf, http://www.chelsio.com/roce-whitepaper/, http://www.chelsio.com/wp-content/uploads/2011/05/RoCE-FAQ-1204121.pdf. So that raises the question: are they right or are they scared of RoCE, as the Infiniband boys are out to eat their lunch?

My take on this for now

iWarp is way easier to get started with. That’s for sure. RoCE is firmware sensitive (NIC, switches) and driver sensitive (NIC). Configuring DCB on your switches is usually followed by a reboot of that switch, so you might not do that so easily in production. Depending on where in the stack those switches live you really need Force10 VLT, Cisco vPC, Arista MLAG or independent redundant switches to get away with it. RoCE loves green field. Stacking I hear you say? I don’t like stacking at that spot in the stack as firmware updates will make you suffer through a single point of failure.

Disclaimer: RoCE in itself does not DEMAND/REQUIRE DCB but the consensus is that it will work better with it, especially under heavy load. Whether SMB Direct over RoCE requires DCB is another question. For all practical purposes I’m working from the premise that it does for a production environment. But as you can do RoCE RDMA between two NICs with no DCB switch in between, this indicates that the hard requirement for DCB is not there. Mind you, not using DCB might not be smart with regard to QoS & error handling (no TCP/IP goodness handling this for you). But I’m no expert on this subject. Paul Grun however is and he’s involved with RoCE at https://www.openfabrics.org/component/search/?searchword=Paul+grun&ordering=&searchphrase=all They tend to know their stuff. Read some of the comments below this article and you’ll know a lot http://www.hpcwire.com/hpcwire/2010-04-22/roce_an_ethernet-infiniband_love_story.html But PFC isn’t Valhalla either and some claim you can just forget about it and build non blocking networks. I guess you could if your pockets are deep enough :-). And you might go a very long way without the need for RDMA. Many do … and when you talk to some network people & vendors they can’t agree either, as everyone is on the same learning curve but from a different perspective. There is no one size fits all & it all depends.

iWarp doesn’t require DCB so you can get away with cheaper switches. Or, not so cheap switches that don’t support DCB (choose wisely). So cheaper switches is probably true on the low end. But, even very economically priced switches from DELL have good DCB support. Some other vendors who are more expensive don’t.

DCB is uncharted terrain for SMB Direct purposes & new to many of us. So if you want to do RDMA the easy way … go iWARP. As said, the use of DCB for PFC/ETS is not mandatory in that case, you’ll get great results and it’s easy. Mind you, you’ll still be dabbling with DCB if you want to do lossless magic in the switches :-). Why, you say? Well, that “converged network” story makes it kind of interesting to do so and PFC, DCBx/TLV is generic and can be leveraged for other things than iSCSI or FCoE. And for all practical purposes SMB 3.0 with SMB Direct is a storage protocol since Windows Server 2012 made it so (CSV). Or do you do DCB for iSCSI/FCoE & iWarp for SMB Direct? After all there are only 2 lossless queues to be had. But hey, how many do you need? Choices, choices and no vast pool of experienced practitioners yet.
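If you want to reason about that “only 2 lossless queues” budget before you touch a switch, a small sanity check of your traffic class plan can help. The Python sketch below is purely illustrative: TrafficClass and validate_plan are made-up names, the priorities and percentages are example values (not a recommendation or anything a vendor prescribes), and the limit of 2 lossless classes is a parameter because it depends on your hardware. It uses a simplified model in which every class gets an ETS share and all shares must add up to 100%.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficClass:
    name: str
    priority: int      # 802.1p priority, 0-7
    ets_percent: int   # minimum bandwidth share under ETS
    lossless: bool     # True when PFC is enabled for this priority


def validate_plan(classes, max_lossless=2):
    """Return a list of problems in a DCB plan; an empty list means it passes."""
    problems = []
    priorities = [c.priority for c in classes]
    if len(set(priorities)) != len(priorities):
        problems.append("a priority is used by more than one traffic class")
    if any(p < 0 or p > 7 for p in priorities):
        problems.append("802.1p priorities must be in the range 0-7")
    total = sum(c.ets_percent for c in classes)
    if total != 100:
        problems.append(f"ETS shares add up to {total}%, expected 100%")
    lossless = [c.name for c in classes if c.lossless]
    if len(lossless) > max_lossless:
        problems.append(
            f"{len(lossless)} lossless classes requested, only {max_lossless} available"
        )
    return problems


# Example values only: SMB Direct and iSCSI each take a lossless queue.
plan = [
    TrafficClass("SMB Direct", priority=3, ets_percent=50, lossless=True),
    TrafficClass("iSCSI", priority=4, ets_percent=30, lossless=True),
    TrafficClass("Everything else", priority=0, ets_percent=20, lossless=False),
]
print(validate_plan(plan))
```

The example plan passes. Add a third lossless class for FCoE and the check complains twice, once about the ETS shares no longer adding up to 100% and once about the lossless budget, which is exactly the “choices, choices” trade-off.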

iWARP routes, it’s not bound by a single Ethernet broadcast domain. That could be useful info depending on your environment & needs. I’ll note that I leverage RDMA for East-West traffic, not North-South, & as such this might not be an issue. The time that I do “Shared Nothing Live Migration” from on premises to the cloud has not arrived yet.

The Mellanox cards in my neck of the woods were 35% cheaper than Chelsio (SFP+)

What about the scalability? “iWarp doesn’t scale that well” is stated left and right but I think that might often be based on older information. Chelsio makes a strong case for iWARP scalability. Especially when it comes to long distances, multiple hops & routing.

Again, your mileage may vary. But for “the smaller environments” who want to leverage RDMA with SMB 3.0 I’d say that iWarp is the easiest path to go & will do just fine. Now if you’re already into lossless Ethernet for iSCSI or working with FCoE you might have all the hardware you need & the experience to deal with DCB. The latter might not always be true however. Most people have Lossless Ethernet for iSCSI or FCoE set up by the vendor or consultants who’ll use well defined step by step guides. These do not exist for the RoCE variant of SMB 3.0 over RDMA.

The case for RoCE can be made as well. Some claim that a high volume of connections consumes memory when using iWARP and that TCP’s flow and reliability controls are less suited for large-scale datacenters & cloud deployments due to performance issues. Where iWARP does not support multicast, RoCE does and that could be important to you.

So why did I, and why do I still, do RoCE?

So why did I walk the walk? Basically because just talking the talk isn’t enough. We considered it an investment in our education. DCB is not going away (the abstraction isn’t there yet and won’t be for a while) and we need to gain knowledge of it to both handle it and make informed decisions. By the way, once you go lossless you might leverage DCB/PFC with iWarp as well, just like you do for iSCSI to make it lossless (leveraging DCBx/TLV). Keep in mind that DCB is key in converged networking and as such deserves your attention. That’s why I chose not to avoid it but gave battle. DCB is all over the place when it comes to converged networking (iSCSI, FCoE), so we need to learn the good, the bad and the ugly. Until that day when, perhaps, the hardware stack is that good, that powerful & has so much bandwidth that TCP/IP never needs its built-in protection against packet loss. Hmmmmmm, I remember people saying that about 10Gbps, but then they wanted to send everything over 2*10Gbps pipes and it became an issue again.

It’s early days yet but you have to give credit to Microsoft for getting RDMA/DCB on the radar screen of the world’s virtualization & storage admins more than ever before. It’s not a well established segment yet and it will be interesting to see how this all turns out. I do know that now that I’ve figured out a thing or two about RoCE, I won’t be intimidated & won’t make choices out of fear. And do remember that if you have plenty of idle CPU cycles & 10Gbps you might not even need RDMA. The value for me and my employers is the knowledge gained. DCB has its role to play but we’ll leverage iWARP or RoCE without a preference. Today you have 2 choices. RoCE is the newer one while iWarp has been around longer and both have avid proponents it seems.

I know one thing. If you need or want RDMA in any existing 10Gbps environment with minimal effort & no risk to existing switch infrastructure, you’ll use iWarp it seems.

Epilogue

You sit there staring at a truckload of VMs with 120GB of memory assigned in total being evacuated in +/- 70 seconds, while doing a Shared Nothing Live Migration between the same hosts and without putting load on the CPU … and have DCB for SMB 3.0 running on your switches … Yes!
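For a feel of what that number means, here’s a back-of-the-envelope calculation in Python. It assumes that all 120GB of assigned memory actually crosses the wire exactly once, uses decimal gigabytes and ignores protocol overhead, re-sent dirty pages and the storage traffic of the concurrent Shared Nothing Live Migration. The function names are made up for the example.

```python
import math


def average_gbps(gigabytes, seconds):
    """Average throughput in gigabits per second, using decimal gigabytes."""
    return gigabytes * 8 / seconds


def links_needed(gbps, link_speed_gbps=10):
    """Smallest number of links whose combined line rate covers the throughput."""
    return math.ceil(gbps / link_speed_gbps)


rate = average_gbps(120, 70)
print(f"{rate:.1f} Gbps on average, needs at least {links_needed(rate)} x 10Gbps")
```

Under those assumptions the evacuation averages roughly 13.7 Gbps, which is more than a single 10Gbps link can carry at line rate.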

Remember, “What we do in life, echoes in eternity” ;-) You might think now that I’m a bit nutty, but I assure you that in my quest to find someone who had hands on experience configuring DCB on switches for SMB Direct with RoCE I had to turn to myself as no one seems to have done it. I’ll be sharing more info on our setup and configurations in the future. Once you wrap your head around the concepts, you understand why things are done and how. Therein lies the value for me.

Future Proofing Storage Acquisitions Without A Crystal Ball


Dealing with an unknown future without a crystal ball

I’ve said it before and I’ll say it again. Storage Spaces in Windows Server 2012 (R2) is the first step by MSFT to really make a difference (or put a dent) in the storage world. See TechEd 2013 Revelations for Storage Vendors as the Future of Storage lies With Windows 2012 R2 (that was a nice blog, by the way, to find out which resellers & vendors have no sense of humor & perspective). It’s not just Microsoft who’s doing so. There are many interesting initiatives at smaller companies to do the same. The question is not if these offerings can match the feature sets, capabilities and scenarios of the established storage vendors’ offerings. The real question is if the established vendors offer enough value for money to maintain themselves in a “good enough is good enough” world, which in itself is a moving target due to the speed at which technology & business needs evolve. The balance of cost versus value becomes critical for selecting storage. You need it now and you know you’ll run it for 3 to 5 years. Perhaps longer, which is fine if it serves your needs, but you just don’t know. Due to the speed of change you can’t invest in a solution that will last you for the long term. You need a good fit now at reasonable cost with some headroom for scale up / scale out. The ROI/TCO has to be good within 6 months or a year. If possible get a modular solution. One where you can replace the parts that are the bottleneck without having to do a fork lift upgrade. That allows for smaller, incremental, affordable improvements until you have either morphed into a new system altogether over a period of time or have gotten out of the current solution what’s possible and the time has arrived to replace it. Never would I invest in an expensive, long term, fork lift, ultra scalable solution. Why not? Too expensive and as such too high risk. The risk is due to the fact I don’t have one of these:

http://trustbite.co.nz/wp-content/uploads/2010/01/Crystal-Ball.jpg

So storage vendors need to perform a delicate balancing act. It’s about price, value, technology evolution, rapid adoption, diversification, integration, assimilation & licensing models in a good enough is good enough world where the solution needs to deliver from day one.

I for one will be very interested to see if all storage vendors can deliver enough value to retain the mid market or if they’ll become top feeders only. The push to the cloud and the advancements in data replication & protection in the application and platform layer are shaking up the traditional storage world. Combine that with the fast pace at which SSD & Flash storage are evolving, together with Windows Server 2012 that has morphed into a very capable storage platform, and the landscape looks very volatile for the years to come. Think about ever more solutions at the application (Exchange, SQL Server) and platform layer (Hyper-V Replica) with orchestration on premises and/or in the cloud and the pressure is really on.

So how do you choose a solution in this environment?

Whenever you are buying storage the following will happen. Vendors, resellers & sales people are going to start pulling at you. Now, some are way better than others at this, some are even downright good at this whole process and proceed very intelligently.

Sometimes it involves FUD, doom & gloom combined with predictions of data loss & corruption by what seem to be prophets of disaster. Good thing is that when you buy whatever they are selling that day, they can save you from that. The thing is this changes with the profit margin and kickbacks they are getting. Sometimes you can attribute this to the time limited value of technology, things evolve and today’s best is not tomorrow’s best. But some of them are chasing the proverbial $ so hard they portray themselves as untrustworthy fools.

That’s why I’m not too fond of the real big $ projects. Too much politics & sales. Sure, you can have people take care of that for you, but you are the only one there to look out for your own interests. To do that all you need to do is your own due diligence and be brave. Look, a lot of SAN resellers have never ever run a SAN, servers, Hyper-V clusters, virtualized SQL Server environments or VDI solutions in real life production environments like yours for a sustained period of time. You have. You are the one whose needs it’s all about as you will have to live and work with the solution for years to come. We did this exercise and it was worthwhile. We got the best value for money looking out for our own interests.

Try this with a reseller or vendor. Ask them how their hardware VSS providers & snapshot software deal with the intricacies of CSV 2.0 in a Hyper-V cluster. Ask them how it works and tell them you need references to speak to who are running this in production. Also make sure you find your own references. You can, it’s a big world out there and it’s a fun exercise to watch their reactions ;-)

As Aidan remarked in his blog on ODX–Not All SANs Are Created Equally

These comparisons reaffirm what you should probably know: don’t trust the whitepapers, brochures, or sales-speak from a manufacturer.  Evidently not all features are created equally.

You really have to do your own due diligence. Some companies can afford the time, expense & personnel to have the shortlisted vendors deliver a system for them to test. Costs & effort rise fast if you need to get a setup that’s comparable to the production environment. You need to devise tests that mimic real life scenarios in storage capacity, IOPS and read/write patterns, and make sure you don’t have a bottleneck outside of the storage system in the lab.

Even for those that can, this is a hard thing to do. Some vendors also offer labs at their Tech Centers or Solution Centers where customers or potential customers can try out scenarios. No matter what options you have, you’ll realize that this takes a lot of effort. So what do I do? I always start early. You won’t have all the information, questions & answers available after a few hours of browsing the internet & reading some brochures. You’ll also notice that there’s always something else to deal with or do, so give yourself time, but don’t procrastinate. I did visit the Tech Centers & Solution Centers in Europe of the shortlisted vendors. Next to that I did a lot of reading, asked questions and talked to a lot of people about their views on and experiences with storage. Don’t just talk to the vendors or resellers. I talked a lot with people in my network, at conferences and in the community. I even tracked down owners of the shortlisted systems and asked to talk to them. All this was part of my litmus test of the offered storage solutions. While perfection is not of this world, there is a significant difference between vendors’ claims and the reality in the field. Our goal was to find the best solution for our needs based on price/value, one whose capabilities, usability & support excellence materialized with the biggest possible majority of customers in the field.

Friendly Advice To Vendors

So while the entire marketing and sales process is important for a vendor, I’d like to remind all of them of a simple fact. Delivering what you sell makes for very happy customers whose simple stories of their experiences with the products will sell it by word of mouth. Those people can afford to talk about the imperfections & some vNext wishes they have. That’s great as those might be important to you but you’ll be able to see if they are happy with their choice and they’ll tell you why.