Where SMB Direct, RoCE, RDMA & DCB fit into the stack


I’m assuming most of you are at least familiar with the concepts of converged networking, SMB Multichannel and SMB Direct. This is not going to be a lesson on these subjects. We’re just setting the stage here for our simple demo configuration and its relation to real world scenarios. This is to remind you of the why and where of what we’ll do and demo in our next blog posts on SMB Direct over RoCE with two DCB features: Priority Flow Control (PFC) and Enhanced Transmission Selection (ETS).

Generalized and simplified a modern virtualized data center network looks a lot like this:

image

It’s more or less converged, which means all kinds of traffic move over the same infrastructure. That is great for standardization and your budget, unless you get into performance issues. That’s where QoS can help. As we’re doing SMB Direct over RoCE we’ll use DCB to handle QoS. Mind you, QoS is an aid and it will not help if you try to do too much over too little bandwidth. Let’s zoom in a bit on the Hyper-V & storage side of things. In general the network of the RDMA capable variant of a modern SOFS / Hyper-V environment looks as below, in a bit more detail:

image

The RDMA capable traffic is SMB Direct over RoCE in this use case. This is used for Live Migration, CSV Traffic & storage traffic to the SOFS Server.

DCB cannot distinguish between these SMB traffic use cases. It’s all RDMA traffic over port 445, so the DCB configuration treats it all the same. That’s why on top of DCB we leverage SMB Bandwidth Limit (see https://blog.workinghardinit.work/2013/09/03/preventing-live-migration-over-smb-starving-csv-traffic-in-windows-server-2012-r2-with-set-smbbandwidthlimit/). This prevents the live migration traffic from pushing aside the storage traffic. This is a feature configured in Windows and it does not rely on DCB or other forms of QoS.
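To make that point concrete, here is a tiny Python sketch of port-based classification. It is a toy model, not how Windows or a NIC implements tagging: the `classify` function, the flow dictionaries and the priority value 3 are all assumptions made up for illustration.

```python
# Toy model of port-based traffic classification.
# Assumption: SMB Direct traffic (port 445) is tagged with priority 3,
# everything else gets priority 0. Both values are illustrative only.
SMB_PORT = 445
SMB_DIRECT_PRIORITY = 3
DEFAULT_PRIORITY = 0


def classify(flow):
    """Return the priority for a flow, looking only at the destination port."""
    if flow["dst_port"] == SMB_PORT:
        return SMB_DIRECT_PRIORITY
    return DEFAULT_PRIORITY


# Three different SMB use cases, all on port 445 (hypothetical flows).
flows = [
    {"use": "live migration", "dst_port": 445},
    {"use": "CSV traffic", "dst_port": 445},
    {"use": "storage to SOFS", "dst_port": 445},
]

for flow in flows:
    print(flow["use"], "-> priority", classify(flow))
```

All three flows end up in the same priority, which is why something other than DCB (here SMB Bandwidth Limit) has to keep them from starving each other.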

To make sure cluster traffic itself, backups, data copies, management etc. don’t starve each other we implement QoS leveraging DCB (the ETS part). As we need to use DCB with RoCE in real world scenarios to make it lossless (the PFC part), and as you do not mix different QoS approaches on the same network stack, we stick with DCB for the other workloads on the same network stack.

image

Mind you this does not prevent scenarios where management and backups are done over vNICs on the Hyper-V switch and where we leverage Hyper-V QoS as that’s on another network stack.

In our lab demos we’ll keep things simple: we’ll do live migration over SMB Direct (RoCE) and we’ll simulate intense backup traffic over the same pair of NICs. This illustrates a RoCE configuration that guarantees a minimum bandwidth for both and keeps the RDMA traffic lossless (PFC). To make it very clear, we’ll do a demo setup where we use two 10GbE NICs per host, allocate a minimum bandwidth of 90% to live migration and allocate the remaining 10% minimum bandwidth to all other traffic (which includes our intense backup traffic). Read more about the configuration in SMB Direct over RoCE Demo – Hosts & Switches Configuration Example.
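To put numbers on that 90/10 split, here is a small Python sketch of the arithmetic for a single 10GbE link. It is a simplified model, not a DCB configuration: the function name `ets_share`, the class names and the proportional way leftover bandwidth is handed out are assumptions for illustration. What it does capture is that an ETS minimum is a guarantee under contention, not a cap.

```python
def ets_share(link_gbps, min_percent, demand_gbps):
    """Per-class bandwidth on one link under a simplified ETS-style model.

    min_percent: minimum bandwidth per traffic class, in percent.
    demand_gbps: what each class would like to send, in Gbps.
    """
    # Step 1: each class gets its demand, up to its guaranteed minimum.
    guaranteed = {c: link_gbps * p / 100 for c, p in min_percent.items()}
    granted = {c: min(demand_gbps[c], guaranteed[c]) for c in min_percent}

    # Step 2: bandwidth nobody claimed goes to classes that still want more,
    # in proportion to their configured percentages (an assumption).
    leftover = link_gbps - sum(granted.values())
    hungry = {c for c in granted if demand_gbps[c] > granted[c]}
    while leftover > 1e-9 and hungry:
        total_weight = sum(min_percent[c] for c in hungry)
        if total_weight == 0:
            break
        used = 0.0
        still_hungry = set()
        for c in hungry:
            offer = leftover * min_percent[c] / total_weight
            take = min(offer, demand_gbps[c] - granted[c])
            granted[c] += take
            used += take
            if demand_gbps[c] - granted[c] > 1e-9:
                still_hungry.add(c)
        leftover -= used
        hungry = still_hungry
        if used <= 1e-9:
            break
    return granted


minimums = {"live_migration": 90, "other": 10}

# Both classes try to fill the link: each falls back to its minimum.
print(ets_share(10, minimums, {"live_migration": 10, "other": 10}))

# Live migration is quiet: the backup traffic may use what is left over.
print(ets_share(10, minimums, {"live_migration": 2, "other": 10}))
```

Under contention live migration gets 9 Gbps and everything else 1 Gbps on that link. When live migration only needs 2 Gbps, the other traffic can grow to 8 Gbps.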

SMB Direct with DCB, PFC, ETS … How do I know it works?!


A question that comes up over time, again and again, is how do you know SMB Direct is working. The question stems from a nagging feeling that configuring DCB is a bit of playing wizard’s apprentice and we might not completely know what we’re doing, i.e. lack of experience.

image

Many have suspected me of brewing up DCB configurations in a dark corner of the data center where no one else dares venture. Those are unsubstantiated rumors. In coming blog posts we’ll address how to configure it end to end and we’ll show how to find out if it’s really working and how to test that.

Finding out if it really works, testing and monitoring isn’t magic. It boils down to using tools you know. Performance counters for RDMA Activity and SMB Direct are natively available in Windows. Use them! The NIC vendors also provide very detailed counters. Those are excellent and of great value when testing and confirming things work as they should. The latter is very important, because after people are satisfied SMB Direct works they want to know if DCB is configured correctly. Does PFC work, are pause frames being sent and received? Is it really lossless? Does ETS really kick in when needed, do I get the minimum bandwidth I configured? These are very valid questions people struggle with. But the answer eludes many, almost like the question of whether the refrigerator light really goes out when you close the door.

It’s hard to dig deep down into the network packets … that often requires a very specialized skillset and experience with packet analyzers etc. It’s nothing most of you can’t learn, but often this is not a priority. But with some creativity, the performance counters on Windows provided by the NIC vendors and the statistics counters on the switches, you can demonstrate that both PFC & ETS do work and kick in.
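The basic idea behind using counters this way is simple: take a snapshot before the test, take one after, and look at what moved. Here is a minimal Python sketch of that comparison. The counter names (`pause_frames_sent`, `pause_frames_received`, `packets_dropped`) and the sample values are hypothetical; real names differ per NIC vendor and switch, and the numbers below are made up, not measurements.

```python
def counter_deltas(before, after):
    """Difference per counter between two snapshots (dicts of name -> value)."""
    return {name: after[name] - before[name] for name in before if name in after}


def pfc_evidence(before, after,
                 pause_keys=("pause_frames_sent", "pause_frames_received"),
                 drop_key="packets_dropped"):
    """Summarize whether pause frames moved and whether drops stayed flat.

    All counter names are placeholders for whatever your NIC or switch exposes.
    """
    deltas = counter_deltas(before, after)
    return {
        "pause_seen": any(deltas.get(k, 0) > 0 for k in pause_keys),
        "no_drops": deltas.get(drop_key, 0) == 0,
        "deltas": deltas,
    }


# Made-up snapshots taken before and after a live migration under load.
before = {"pause_frames_sent": 100, "pause_frames_received": 40, "packets_dropped": 0}
after = {"pause_frames_sent": 250, "pause_frames_received": 90, "packets_dropped": 0}

print(pfc_evidence(before, after))
```

If the pause frame counters go up while the drop counters stay flat during a congested test, that is a good hint PFC is doing its job. It does not replace checking the same thing on the switch side.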

So in upcoming blogs & videos I’ll demonstrate configuring SMB Direct over RoCE leveraging 2 parts of DCB:

  • PFC (Priority Flow Control) – mandatory for SMB Direct over RoCE
  • ETS (Enhanced Transmission Selection) – optional but I advise you to leverage it for SMB Direct over RoCE

Actually, when doing true converged, no matter what route you go, QoS is not really optional any more.

The biggest challenge is to get people to wrap their heads around the concepts and their behavior. Once you do that you’ll understand how and why to configure it. It took me time and effort, there’s no way around it, but it’s well worth the effort.

Look, DCB is not 100% fully matured or perfect, especially in large scale environments with more than 2 or 3 hops. Frak, while I love tinkering, testing and playing with this stuff I have never been a “QoS first person”. If I can, I throw resources at the problem (CPU cycles, memory, bandwidth, …). QoS is like a gun. You only draw it when you must use it, and then you’d better do it right. Otherwise you don’t touch it, bar for practice, training and education. While perfection is not of this world and improvements are being worked on (ECN), it does work and deliver. How many of you had a large scale deployment of more than 2 hops and more than 20 switches with FC, FCoE or iSCSI to worry about? So can it deliver what you need today in most scenarios? Yes! Can I fix the shortcomings of any random technology? No. Can I leverage current technologies with great success despite this? Yes! So can you. There is a reason I get hired and paid. Trust me, it’s not my looks, my bedside manner or charismatic appearance ;-).

Side note 1: I cannot possibly provide a switch configuration guide in a step by step fashion, as the details vary by vendor, they can also be switch model/type specific and it all depends on your environment & needs. So no, I cannot and will not attempt to write a bunch of these. This would be way too much work and way too expensive (time, hardware etc.), so unless I’m paid very generously to do so, you’re out of luck. It might be cheaper to hire me or to come to the free community sessions, presentations and ATE evenings and study up.

Free VEEAM Endpoint Backup Goes RTM – First Upgrade Experiences!


VEEAM Endpoint backup has gone RTM and that’s great news. I’ve been using it since the beta version with great results. I moved to the release candidate when that became available and now I’m running RTM. The version number of the RTM bits is 1.0.0.1954.

image

You can download it here and put it into action straight away!

Quick Tips & Findings

There is no supported upgrade path from the beta release. As a matter of fact the RTM version cannot read the backup files made by the beta. When trying to upgrade from beta to RTM you’ll be greeted with this message:

image

Now that’s OK. You should have been on the RC already and there things are better :-). Mind you, there’s no way to do an in place upgrade from the RC either, but RTM can read the backups made by the RC version!

image

With a clean install (green field or after uninstalling the beta or RC version) the installation will kick off.

image

Now in the case of our RC backups we tested 2 things:

  • Can we restore the existing backups? Yes we can!

image

  • How are the backups made by the RTM version handled in regard to the already present ones? We just reconfigured the backups to the same repository and kicked off a backup. A new backup job folder was created and the backup was made there. So our DBA’s great self service SQL Server backup offloading repository made with the RC is still available for restores, while RTM backs up to its own new folder.

image

Well there you go, VEEAM Endpoint Backup just got launched in production. We still have to wait for the production ready update for integration with VEEAM Backup & Replication v8 but that will arrive soon enough. The future looks bright.

Help! Active Directory cannot read forest and domain functional level anymore or much ado about nothing


The other day I got a very worried request for support. Apparently the forest and domain functional levels of an Active Directory deployment could no longer be read. Nothing else was wrong, everything was working just fine. Could I have a quick look? So they shared the screen with me and this is what I saw.

image

And this …

image

Well that didn’t surprise me, as they are supposed to be on domain and forest functional level 2012 R2 already. All their domain controllers had been on Windows 2012 R2 for over a year. That error message is not right! I was puzzled for a moment and then it hit me. This was a Windows 2008 R2 host without updated tools!

Once they used the correct version of RSAT or checked on a Windows 2012 R2 host, all was shown to be well and the scare was over.

image

Nothing to see here, move along.

Windows Server Technical Preview and Hyper-V Server Technical Preview Expiration Extension


Great news for all of us who are running Windows Server Technical Preview v1 in our labs. It was due to expire on April 15th but Microsoft announced they were working on a fix to extend that deadline. They did not mention an ETA for it, but it’s here now, see http://www.microsoft.com/en-us/download/details.aspx?id=46447

image

So download, install, reboot and you’re good to go until we get our hands on Technical Preview v2! We’ve been saved by the cavalry and life is good!
