Essential to my Modern Datacenter Lab: Azure Site 2 Site VPN with a DELL SonicWALL Firewall


If you’re a serious operator in the part of IT that is considered the tip of the spear, i.e. you’re the one getting things done, you need a lab. I have had one (well, I upgraded it a couple of times) for a long time. When you’re dealing with cloud as an IT Pro, mostly Microsoft Azure in my case, that need has not changed. It enables you to gain the knowledge and insights that you can only acquire by experimenting and hands-on work; there is no substitute. Sometimes people ask me how I learn. A lab and lots of hands-on experimenting is a major component of my self-education and training. Yes, I put in a lot of time and some money.

Perhaps you have a lab at work, perhaps not, but you do need one. A lab is a highly valuable investment in education for both your employer and yourself. It takes a lot of time and effort and it costs a bit of money. The benefits however are huge and I encourage any employer who has IT staff to sponsor this, as the ROI is huge for a relatively small TCO.

I love the fact that in a lab you have (and want) complete control over the entire stack, so you can experiment at will and learn about the solutions you build end to end. You do need to deal with it all, but that’s all good: you learn even more, even when at times it’s tough going. Note that a home lab, even with the associated costs, has the added benefit of still being available to you even if you move between employers or between clients.

You can set up a site-to-site VPN using Windows Server 2012 R2 RRAS (see Site-to-Site VPN in Azure Virtual Network using Windows Server 2012 Routing and Remote Access Service (RRAS)) and that works. But for long term lab work and real life implementations you’ll be using other devices. In the SOHO lab I run everything virtualized and I need internet access for other use cases than the on premises lab. I also like to minimize the hosts/VMs/appliances I need to have running to save on electricity costs. For enterprise grade solutions you leverage solutions from CISCO, JUNIPER, CheckPoint etc. There is no need for “enterprise grade” solutions in a SOHO or small branch office environment. Those are out of budget and overkill, so I needed something else. There are some options out there, but I’m using a DELL SonicWALL NSA 220. This is a quality product for one, and I could get my hands on one in a very budget friendly manner. UTMs and the like are not exactly cheap, even without all the subscriptions, and they don’t exactly cater to the home user normally. You can go higher or lower, but I would not go below a TZ-205 (Wireless), which is great value for money and more than up to the task of providing you with the capabilities you need in a home lab.

SonicWALL NSA 220 Wireless-N Appliance

I consider this the minimum level as I want 1Gbps (no, I do not buy 100Mbps equipment in 2015) and I want wireless to make sure I don’t need to have too many hardware devices in the lab. As said, the benefits over the RRAS solution are that it serves other purposes (UTM) and that it can remain running cheaply, so you can connect to the lab remotely to fire up your hosts and VMs, which you normally power down to save power.

Microsoft only supports dynamic routing with a limited number of vendors/devices, but that doesn’t mean all others are off limits. You can use them, but you’ll have to research the configurations that work instead of downloading the configuration manual or templates from the Microsoft web site. Those are still very useful to look at as example configurations, even if they’re for another product than the one you use.

Getting it to work is a multistep process:

    1. Set up your Azure virtual network.
    2. Configure your S2S VPN on the SonicWALL.
    3. Test connectivity between an on premises VM and one in the cloud.
    4. Build out your hybrid or public cloud.

Here’s a reference to get you started: Tutorial: Create a Cross-Premises Virtual Network for Site-to-Site Connectivity. I will be sharing my setup for the SonicWALL in a later blog post so you can use it as a reference. For now, here’s a schematic overview of my home lab setup to Azure (the IP addresses are fake). At home I use VDSL with a dynamic IP address, so every now and then I need to deal with it changing. I’d love to have a couple of static IP addresses to play with but that’s not within my budget. I wrote a little Azure scheduled runbook that takes care of updating the dynamic IP address in my Azure site-to-site VPN setup. It’s also published on the TechNet Gallery.

image

You can build this with Windows RRAS, any UTM, firewall etc. … any device that is a bit more capable than a consumer grade (wireless) router. The nice thing is that I have multiple subnets on premises and the 10 tunnels in a standard Azure site-to-site VPN accommodate that nicely. The subnets I don’t want to see in a tunnel to Azure I just leave out of the configuration.

A tip to save money in your Azure lab for newbies: shut down everything you can when you’re done. Automate it with PowerShell (see the sketch below). I just make sure my hybrid infra is online and the VPN active enough to make sure we don’t run into out of sync issues with AD etc.
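As an illustration only, here’s a minimal sketch of that idea using the classic Azure (Service Management) module of that era. The cloud service and VM names are hypothetical and I’m assuming authentication and subscription selection have already been done.

# Stop (and deallocate) every lab VM in a cloud service except the one I want to keep running
Get-AzureVM -ServiceName "MyLabCloudService" |
    Where-Object { $_.Name -ne "MyLabDC" } |
    ForEach-Object { Stop-AzureVM -ServiceName $_.ServiceName -Name $_.Name -Force }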

Azure Automation Scheduled Runbook PowerShell Script to automatically update site-to-site VPN Local Network VPN Gateway Address with dynamic public IP


You can download the script at the end of the article. When you’re connecting a home (or perhaps even an office) lab to Azure with a site-2-site VPN you’ll probably have to deal with the fact that you have a dynamic IP assigned by your ISP. This means that unless you update the VPN Gateway Address of your Azure local network in some automated way, your connection will be down very often and you’re faced with this in Azure …

image

which on my DELL SonicWALL NSA 220 looks like this …

image

A fellow MVP of mine (Christopher Keyaert) wrote a PowerShell script a few years back that updated the VPN gateway address of your Azure local network via a scheduled task inside of his Windows RRAS VM. Any VM, either in Azure or in your lab, will do. Good stuff! If you need inspiration for a script, you have a link. But I never liked the fact that keeping my Azure site-to-site VPN up and running was tied to a VM being online in Azure or in my lab, which is also why I switched to a SonicWALL device. Since we have Azure Automation runbooks at our disposal, I decided to automate the updating of the VPN gateway address to the dynamic IP address of my ISP using a runbook.

Finding out your dynamic IP address from anywhere in the world

For this to work you need a way to find out what your currently assigned dynamic IP is. For that I subscribe to a free service providing dynamic DNS updates. I use https://www.changeip.com/. That means that by looking up the FQDN I can find out my current dynamic IP address from wherever I have internet access. As my SonicWALL supports dynamic DNS service providers I can configure it there, no need for an update client running in a VM or so.
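Looking up that FQDN from PowerShell is trivial; a tiny sketch (the FQDN below is a made-up placeholder for my real dynamic DNS name):

# Resolve the dynamic DNS name to the currently assigned public IP
$currentIp = ([System.Net.Dns]::GetHostAddresses("mylab.dynamic.example.com")[0]).IPAddressToString
$currentIp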

image

The runbook to update the VPN Gateway Address of your Azure local network

I will not deal with how to set up Azure Automation, just follow this link. I will share a little hurdle I needed to take. At least for me it was a hurdle. That hurdle was that the Set-AzureVNetConfig cmdlet which we need has a mandatory parameter -ConfigurationPath which reads the configuration to set from an XML file (see Azure Virtual Network Configuration Schema).

You cannot just use a file path in an Azure runbook to dump a file in c:\temp for example. Using an Azure file share seemed overly complicated for this job. After pinging some fellow MVPs at Inovativ Belgium who are deep into Azure Automation on a daily basis, Stijn Callebaut gave me the tip to use [System.IO.Path]::GetTempFileName() and that got my script working. Thank you Stijn Winking smile!
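To give you an idea of the approach, here’s a condensed, hedged sketch (not my full runbook: authentication to the subscription is omitted, and “HomeLab” and the FQDN are placeholder names):

# Resolve the current dynamic IP, export the vnet config to a temp file, update the
# VPNGatewayAddress of the local network site and push the configuration back to Azure
$newIp    = ([System.Net.Dns]::GetHostAddresses("mylab.dynamic.example.com")[0]).IPAddressToString
$tempFile = [System.IO.Path]::GetTempFileName()
Get-AzureVNetConfig -ExportToFile $tempFile | Out-Null
[xml]$netCfg = Get-Content $tempFile
$site = $netCfg.NetworkConfiguration.VirtualNetworkConfiguration.LocalNetworkSites.LocalNetworkSite |
    Where-Object { $_.name -eq "HomeLab" }
if ($site.VPNGatewayAddress -ne $newIp) {
    $site.VPNGatewayAddress = $newIp
    $netCfg.Save($tempFile)
    Set-AzureVNetConfig -ConfigurationPath $tempFile
}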

So I now have a scheduled runbook that automatically updates my VPN gateway address to the dynamic IP address my ISP renews every so often, without needing to have a script running scheduled inside a VM. I don’t always need a VM running, but I do need that VPN to be there for other use cases. This is as elegant a solution as I could come up with.

I tested the script before publishing & scheduling it by setting the VPN Gateway Address of my Azure local network to a wrong IP address, in order to see whether the runbook changes it to the current one it got from my dynamic DNS entry. As you can see, it was successful.

image

Now publish it and have it run x times a day, depending on how aggressively your ISP renews your IP address and how long your lab can tolerate the Azure site-to-site VPN being down. I do it hourly. Not a production ready solution, but neither is a dynamic IP and this is just my home lab!

image

Now my VPN looks happy most of the time, automatically.

image

image

Download the runbook  here (zipped PowerShell script)

In Defense of Switch Independent Teaming With Hyper-V


For many old timers (heck, that includes me) NIC teaming with LACP mode was the best of the best, at least when it comes to teaming options. Other modes often led to active/passive setups or less than optimal aggregation of received network traffic. Basically, and perhaps over simplified, I could say the other options were only used if you had no other choice to get things to work. Which we did a lot … I used Intel’s different teaming modes for various reasons in the past (before we had MLAG, VLT, VPC, …). Trying to use LACP where possible was a good approach in the past in physical deployments and early virtualized environments, when 1Gbps networking dominated the datacenter realm and Windows did not have native support for LBFO.

But even LACP, even in those days, had some drawbacks. It’s the most demanding form of teaming. For one, it required switch stacking. This demands the same brand and type of switches and that means you have no redundancy during firmware upgrades. That’s bad, as the only way to work around that is to move all workloads to another rack unit … if you even had the capability to do that! So even in days past we chose different modes of teaming out of need or because of the above limitations for high availability. But the superiority of NIC teaming with LACP still stands for many, and as modern switches support MLAG, VLT, etc. the drawback of stacking can be avoided. So does that mean LACP for NIC teaming is always the superior choice today?

Some argue it is, and now they have found support for that in the documentation about the Microsoft CPS system. Look, even if Microsoft chose to use LACP in their solutions, that’s based on their particular design and the needs of that design. I do not concur that this is the best choice overall. It is however a valid and probably the best choice for their specific setup. While I applaud the use of MLAG (when available to you at no or very low cost) to have all bases covered, it does not mean that LACP is the best choice for the majority of use cases with Hyper-V deployments. Microsoft actually agrees with me on this in their Windows Server 2012 R2 NIC Teaming (LBFO) Deployment and Management guide. They state that Switch Independent configuration / Dynamic distribution (or Hyper-V Port if on Hyper-V and not on W2K12R2) is the best possible default choice for teaming in both native and Hyper-V environments. I concur, even if perhaps not that strongly for native workloads (it depends). Exceptions to this:

  • Teaming is being performed in a VM (which should be rare),
  • Switch dependent teaming (e.g., LACP) is required by policy, or
  • Operation of a two-member Active/Standby team is required by policy.

In other words, in 2 out of 3 cases the reason is a policy, not a technically superior solution …

Note that there are differences between Address Hash, Hyper-V Port and the new Dynamic distribution modes and that the latter has made things better in W2K12R2 in regards to bandwidth, but you’ll need to read the white papers. Use Dynamic as the default, it is the best. Also note that LACP/Switch Dependent doesn’t mean you can send & receive to and from a VM over the aggregated bandwidth of all team members. Life is more complicated than that. So if that’s your main reason for switch dependent, and you think you’re done => beware Winking smile.
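For reference, and purely as a sketch with made-up NIC names, this is what that recommended default looks like when you create the team in PowerShell on Windows Server 2012 R2:

# Switch independent teaming with the Dynamic load balancing algorithm
New-NetLbfoTeam -Name "ConvergedTeam" -TeamMembers "NIC1","NIC2" `
    -TeamingMode SwitchIndependent -LoadBalancingAlgorithm Dynamic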

Switch independent is also way better for the optimization of VMQ. You have more queues available (sum-of-queues) and the IO path is very predictable and optimized.
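A quick, hedged illustration of that sum-of-queues idea (adapter names and processor numbers are placeholders; check your own core and queue counts before copying anything):

# Give each team member its own, non-overlapping range of processors for its VMQ queues
Set-NetAdapterVmq -Name "NIC1" -BaseProcessorNumber 2 -MaxProcessors 8
Set-NetAdapterVmq -Name "NIC2" -BaseProcessorNumber 18 -MaxProcessors 8
Get-NetAdapterVmq | Format-Table Name, Enabled, MaxProcessors, NumberOfReceiveQueues -AutoSize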

If you don’t control the switches there’s a lot more cross team communication involved to set up teaming for your hosts. There’s more complexity in these configurations so more possibilities for errors or bugs. Operational ease is also a factor.

The biggest drawback could be that for receiving traffic you cannot get more than the bandwidth a single team member can deliver. That’s true, but optimizing receiving traffic has its own demands and might not always be that great if the switch configuration isn’t that smart and capable. Do I ever miss the potential ability to aggregate incoming traffic? In real life I do not (yet), but in some configurations it could do a great job of optimizing that when needed.

When using 10Gbps or higher you’ll rarely be in a situation where receiving traffic is higher than 10Gbps, and if you want to handle that amount of traffic you really need to leverage DVMQ. And as said, switch independent teaming with Hyper-V Port or Dynamic mode gives you the most bang for the buck, as you have more queues available. This drawback is mitigated a bit by the fact that modern NICs have a way larger number of queues available than they used to have. But if you have more than one VM that is eating close to 10Gbps in a non lab environment and you’re planning to have more than 2 of those on a host, you need to start thinking about 40Gbps instead of aggregating a fistful of 10Gbps cables. Remember the golden rule: a single bigger pipe is always better than a bunch of small pipes.

When using 1Gbps you’ll be at that point sooner, and as 1Gbps isn’t a great fit for (Dynamic) VMQ anyway I’d say: sure, give LACP a spin to try and get a bit more bandwidth, but will it really matter? In native workloads it might, but with a vSwitch? Modern CPUs eat 1Gbps NICs for breakfast, so I would not bother with VMQ. But when you’re tied to 1Gbps it’s probably due to budget constraints and you might not even have stackable, MLAG, VLT or otherwise capable switches. But the arguments can be made, it depends (see Don’t tell me “It depends”! But it does!). In any case I’d start saving for 10Gbps Smile

Today, with the PC8100 series and the N4000 series (budget 10Gbps switches; yes, I know “budget” is relative, but in the 10Gbps world they offer outstanding value for money), I tend to set up MLAG with two of these per rack. This means we have all options and needs covered at no extra cost and without sacrificing redundancy under any condition. However, look at the needs of your VMs and the capability of your NICs before using LACP for teaming by default. The fact that switch independent works with any combination of budget switches to get redundancy doesn’t mean it’s only to be used in such scenarios. That’s a perk for those without more advanced gear, not a consolation prize.

My best advice: do not over engineer it. Engineer it for the best possible solution for the environment at hand. When choosing a default it’s not about the best possible redundancy and bandwidth under certain conditions. It’s about the best possible redundancy and bandwidth under most conditions. It’s there that switch independent comes into its own, today more than ever!

There is one other very good, but luckily also very rare, case where LACP/switch dependent will save you and switch independent won’t: dead switch ports, where the port becomes dysfunctional. So while switch independent protects against NIC, switch and cable failures, here it doesn’t help you as it doesn’t detect the problem (it’s about link failures, not logical issues on a port).

For the majority of my Hyper-V deployments I do not use switch dependent / LACP. The situations where I did had to do with Windows NLB in combination with IGMP multicast.

Note: You can do VLT, MLAG, stacking and still leverage switch independent teaming, LACP or static switch dependent is NOT mandatory even when possible.

DELL SonicWALL Site-to-Site VPN Options With Azure Networking


The DELL SonicWALL product range supports both policy based and route based VPN configurations. Specifically for Azure they have a configuration guide out there that will help you configure either.

Technically, networking people prefer to use route based configurations. They’re more flexible to maintain in the long run. As life is not perfect and we do not control the universe, policy based is also used a lot. SonicWALL used to be on the supported list for both static and dynamically routed Azure VPN connections. According to this thread it was taken off because some people had reliability and performance issues. I hope this gets fixed soon in a firmware release. Having that support is good for DELL, as a lot of people watch that list to consider what they buy and there are not too many vendors on it in the more budget friendly range as it is. The reference in that thread to DELL stating that Route-Based VPN using Tunnel Interface is not supported for third party devices is true, but a bit silly, as that’s a blanket statement in the VPN industry where there is an unwritten rule that you use route based VPNs when the devices are of the same brand and you control both endpoints. When that isn’t the case, you go with a policy based VPN, even if that’s less flexible.

My advice is that you should test what works for you, make your choice and accept the consequences. In the end it only determines who’s going to have to fix the problem when it goes wrong. I’m also calling on DELL to sort this out fast and properly.

A lot of people get confused when starting out with VPNs. Add Azure into the equation, where we also get confused whilst climbing the learning curve, and things get mixed up. So here’s a small recap of the state of Azure VPN options:

  • There are two ways to create a site-to-site VPN between an Azure virtual network (and all the subnets it contains) and your on premises network (and the subnets it contains).
    1. Static Routing: this is the one that will work with just about any device that supports policy based VPNs in any reasonable way, which includes a VPN with Windows RRAS.
    2. Dynamic Routing: this one is supported by a lot fewer vendors, but that doesn’t mean it won’t work. Do your due diligence. This also works with Windows RRAS.

Note: Microsoft has now added a 3rd option to its Azure VPN Gateway offerings, the High Performance VPN gateway. For all practical purposes it’s dynamic routing, but a more scalable version. Note that this does NOT support static routing.
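The routing type is something you pick when you create the Azure gateway. A hedged sketch with the classic Service Management module of that era (“MyAzureVNet” is a placeholder virtual network name):

# Create the Azure VPN gateway for the virtual network; use StaticRouting instead for policy based devices
New-AzureVNetGateway -VNetName "MyAzureVNet" -GatewayType DynamicRouting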

The confusion is partially due to Microsoft Azure, network industry and vendor terminology differing from each other. So here’s the translation table for DELL SonicWALL & Azure

Dynamic Routing in Azure speak is a Route-Based VPN in SonicWALL terminology and is called Tunnel Interface in the “Policy Type” settings for a VPN.

image

Static Routing in Azure Speak is a Policy-Based VPN in SonicWALL terminology and is called Site-To-Site in the “Policy Type” settings for a VPN.

image

  • You can only use one type per connection. So you need to make sure both sides are configured the same way; mixing the two won’t work for sure.
  • Only a Pre-Shared Key (PSK) is currently supported for authentication. There is no support for certificate based authentication at the time of writing (a quick way to grab the PSK with PowerShell is sketched right below).
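If you need that PSK to punch into the SonicWALL, the classic Azure module can hand it to you; a hedged one-liner (virtual network and local network site names are placeholders):

# Retrieve the pre-shared key Azure generated for this local network site's connection
(Get-AzureVNetGatewayKey -VNetName "MyAzureVNet" -LocalNetworkSiteName "HomeLab").Value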

Also note that you can have 10 tunnels in a standard Azure site-to-site VPN, which should give you enough wiggle room for some interesting scenarios. If not, scale up to the high performance Azure site-to-site VPN or move to Express Route. In the screenshot below you can see I have 3 tunnels to Azure from my home lab.

image
I hope this clears up any confusion around that subject!

Azure Done Well Means Hybrid Done Right


If you think that a hybrid cloud means you need to deploy SCVMM & WAP you’re wrong. It does mean that you need to make sure that you give yourself the best possible conditions to make your cloud a success and an asset in the biggest possible number of all scenarios that might apply or come up.

DC1

Cool you say, I hear you, but what does that mean in real life? Well it means you should stop playing games and get serious. Which translates into the following.

Connectivity

A 200Mbps connection is the absolute minimum for the SMB market. You need at least that for the Office 365 suite, if you want happy customers that is. Scale based on the number of users and usage, but remember you’ll pinch at least 100Mbps of that for a VPN to Azure.

Get a VPN already!

Or better still, take the gloves off and go for Express Route. Extend your business network to your cloud and be done with all the hacks, workarounds, limitations, tedious & creative yet finicky "solutions" to get things done. I guess it beats living with the limitations, but it will only get you that far.

Any country or business that isn’t investing in fiber to the home & cheap, affordable data connectivity for businesses is actively destroying long term opportunity for some dubious short term gain.

So without further ado: life is too short to do hybrid cloud without it. It opens up great scenarios that will allow you to get all the comforts of on premises in your Azure data center, such as …

Extend AD  & ADFS into Azure

Get that AD & ADFS into the cloud people! What? Yes, do it. That’s what that good solid VPN between Azure and on premises, or better still Express Route, enables. Just turn it into another site of your business, but one with some fascinating capabilities. DirSync, or better Azure Active Directory Sync, will only get you that far and mostly in a SAAS (PAAS) ecosystem. Once you’ve done that the world is your oyster!

https://media.licdn.com/mpr/mpr/p/4/005/083/346/127f314.jpg

Conclusion

So don’t be afraid. Just do it! People, I have my home lab and its AD connected to my Azure cloud via VPN! That’s me, the guy that works for his money and pays his own bills. So what are you as a business waiting for?

But wait Didier, isn’t AD going away? Why would I not just wait for the cloud to be 100% perfect for all I do? Well, just get started today and take it from there. You’ll enjoy the journey if you do it smart and right!

“Your cloud, your terms”. Well that’s true.  But that’s not a given, you’ll need to put in some effort. You have to determine what your terms are and what your cloud should look like. If you don’t you’ll end up in a bad state. If you have good IT staff, you should be OK. If they could handle your development environment & run your data center chances are good they’ll be able to handle “cloud”. Really.

Consultants? Sure, but get really good ones or you’ll get sold to. There’s a lot of churning and selling going on. Don’t get taken for a ride. I know a bunch of really good ones. How do I determine this? One rule … would I hire them Winking smile

Options For A Highly Available Load Balanced RD Gateway Server Farm on Hyper-V


When you need to make the RD Gateway service highly available you have some options. On the RD Gateway side you have the capability of configuring a farm with multiple RD Gateway servers.

image

When it comes to the actual load balancing of the connections there are some changes in respect to load balancing compared to Windows Server 2008 R2 that you need to be aware of! With Windows 2008 R2 you could do:

  1. Load balancing appliances (KEMP Loadmaster for example, F5, A10, …) or Application Delivery Controllers, which can be hardware, OEM servers, virtual and even cloud based (see Load Balancing In An Ever More Demanding Virtualized & Cloudy World). KEMP has Hyper-V appliances, many others don’t. These support layer 4, layer 7, geo load balancing etc. Each has its use cases with benefits and drawbacks, but you have many options for the many situations you might encounter.
  2. Software load balancing. With this they mean Windows NLB. It works but it’s rather limited in regards to intelligence for failure detection & failover. It’s in no way an “Application Delivery Controller” as load balancers are positioned nowadays.
  3. DNS Round Robin load balancing. That sort of works but has the usual drawbacks for problem detection and failover.  Don’t get me wrong for some use cases it’s fine, but for many it isn’t.

I prefer the first, but all 3 will do the basic job of load balancing the end-user connections based on the traffic. I have done 2 when it was good enough or the only option, but I have never liked 3, bar where it’s all that’s needed, because it just doesn’t fit many of the use cases I dealt with. It’s just too limited for many apps.

In regards to RD Gateway in Windows Server 2012 (R2), you can no longer use DNS round robin for load balancing with the new HTTP transport. The reason is that it uses two HTTP channels (one for input and one for output) and DNS round robin cannot guarantee that both these connections will be routed through the same RD Gateway server, which is a requirement for it to work. Basically RRDNS will only work for legacy RPC-HTTP. RPC could reroute a channel to make sure everything flows over the same node, at the cost of performance & scalability. But that won’t work with HTTP, which provides scalability & performance. Another thing to note is that while you can work without UDP, you don’t want to. The UDP protocol is used to deliver graphics with a better user experience over even low quality networks, for graphics-heavy or RemoteFX experiences. TCP (HTTP) can be used without it (at the cost of a lesser experience) and is also used to maintain the sessions and actions. Do note that you CANNOT use UDP alone as these connections are established only after the main HTTP connection exists between the remote desktop client and the remote desktop server. See Don’t Forget To Leverage The Benefits of RD Gateway On Hyper-V & RDP 8/8.1 for more information.

So you will need at least Windows Network Load Balancing (WNLB) because that supports IP affinity to make sure all channels stick to the same node. UDP & HTTP can be on different nodes, by the way. Also please note that when using network virtualization WNLB isn’t a good choice. It’s time to move on.
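For completeness, a hedged sketch of what that WNLB option could look like in PowerShell (host names, interface names and IP values are placeholders):

# Build a two-node NLB cluster for the RD Gateway farm and force single (client IP) affinity
New-NlbCluster -InterfaceName "LAN" -ClusterName "RDGW-NLB" -ClusterPrimaryIP 192.168.2.50 -OperationMode Multicast
Add-NlbClusterNode -InterfaceName "LAN" -NewNodeName "RDGW02" -NewNodeInterface "LAN"
Get-NlbClusterPortRule | Set-NlbClusterPortRule -NewAffinity Single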

So the (or at least my) preferred method is via a real “hardware” load balancer.  These support a bunch of persistence options like IP affinity, cookie-based affinity, … just look at the screenshot below (KEMP Loadmaster)

image

But they also support layer 7 functionality for better health checking and failover.  So what’s not to like?

So we need to:

  1. Build a RD Gateway Farm with at least two servers
  2. Load balance HTTP/HTTPS for the RD Gateway farm
  3. Load balance UDP for the RD Gateway farm.

We’ll do this 100% virtualized on Hyper-V and we’ll also make the load balancer itself highly available. Remember, single points of failure are like bottlenecks: the moment you take one away you just hit the next one Smile.

KEMP has a great deployment guide for RDS on how to do this, but I should add that you could leverage SUB Virtual Services (SUBVS) to deal with the other workloads such as RD Web Access if they’re on the same server. They don’t mention this in the white paper, but it’s an option when using HTTP/HTTPS as the service type for both configurations. #1 & #2 are the SUB Virtual Services where I used this in a lab.

image

But for RD Gateway you can also leverage the Remote Terminal service type, and in this case you won’t leverage SUBVS as the service type is different between RD Gateway (Remote Terminal) and RD Web Access (HTTP/HTTPS). This is actually what’s used by the RDS template you can download from their support site.

image

Hope this helps some of you out there!

Quick Demo Video Of Site Failover With KEMP Loadmaster Global Balancing


Here’s a quick video that demonstrates how you can achieve site failover via the KEMP Loadmaster Global Balancing feature. As long as you know what this can do for you and realize that it’s about site failover and high availability, not continuous availability without a second of service interruption, you can deliver nice results with this technology across city campuses or between cities.

In our scenario we normally connect to the primary data center (weighted round robin) and fail over to the DRC when the primary site fails for some reason.

It’s very busy at the moment but I hope to address this topic a bit more in detail in the future. All of this runs virtualized on Hyper-V and performs just fine.

SMB Direct With RoCE in a Mixed Switches Environment


I’ve been setting up a number of Hyper-V clusters with  Mellanox ConnectX3 Pro dual port 10Gbps Ethernet cards. These Mellanox cards provide a nice amount of queues (128) for DVMQ and also give us RDMA/SMB Direct capabilities for CSV & live migration traffic.

Mixed Switches Environments

Now RoCE and DCB is a learning curve for all of us and not for the faint of heart. DCB configuration is non trivial, certainly not across multiple hops and different switches. Some say it’s to be avoided or can’t be done.

You can only get away with a single pair of (uniform) switches in smaller deployments. On top of that I’m seeing more and more different types of switches being used to optimize value, so it’s not just a lab exercise to do this. Combine this with the fact that DCB is an unavoidable technology in networking, unless it gets replaced with something better and easier, and you might as well try and learn. So I did.
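To make that a bit more concrete, here’s a hedged sketch of the host side of such a DCB/PFC configuration for SMB Direct. The adapter names are placeholders and the bandwidth reservation is just an example value; the switch side is a separate exercise entirely.

# Tag SMB Direct (port 445) traffic with priority 3, make only that priority lossless (PFC),
# give it an ETS bandwidth reservation and enable DCB on the RDMA-capable NICs
New-NetQosPolicy "SMBDirect" -NetDirectPortMatchCondition 445 -PriorityValue8021Action 3
Enable-NetQosFlowControl -Priority 3
Disable-NetQosFlowControl -Priority 0,1,2,4,5,6,7
New-NetQosTrafficClass "SMBDirect" -Priority 3 -BandwidthPercentage 50 -Algorithm ETS
Enable-NetAdapterQos -Name "SLOT 4 Port 1","SLOT 4 Port 2"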

Well, right now I’m successfully seeing RoCE traffic going across cluster nodes spread over different racks in different rows at excellent speeds. The core switches are DELL Force10 S4810s and the rack switches are PowerConnect 8132Fs. By borrowing an approach from spine/leaf designs this setup delivers bandwidth where they need it at a price point they can afford. They don’t need more expensive switches for the rack or the core as these do support DCB and give the port count needed at the best price point. This isn’t supposed to be the top in non-blocking network design. Nope, but what’s available & affordable today in your hands is better than perfection tomorrow. On top of that this is a functional learning experience for all involved.

We see some pause frames being sent once in a while and this doesn’t impact speed that very much. It does guarantee lossless traffic which is what we need for RoCE. When we live migrate 300GB worth of memory across the nodes in the different racks we get great results. It varies a bit depending on the load the switches & switch ports are under but that’s to be expected.

Now tests have shown us that we can live migrate just as fast with non-RDMA 10Gbps as we can with RDMA, leveraging “only” Multichannel. So why even bother? The name of the game is low latency and preserving CPU cycles for SQL Server or storage traffic over SMB3. Why? We can just buy more CPUs/cores. Great, easy & fast, right? But then SQL licensing comes into play and it becomes very expensive. Also, storage scenarios under heavy load are not where you want to drop packets.

Will this matter in your environment? Great question! It depends on your environment. Sometimes RDMA is needed/warranted, sometimes it isn’t. But the Mellanox cards are price competitive and why not test and learn right? That’s time well spent and prepares you for the future.

But what if it goes wrong … ah well, if the nodes fail to connect over RDMA you still have Multichannel, and if the DCB stuff turns out not to be what you need or can handle, turn it off and you’ll be good.
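A few quick checks I’d run to see which of the two you’re actually getting (standard SMB and RDMA cmdlets, nothing exotic assumed):

# Are the NICs exposing RDMA, does the SMB client see them as RDMA capable,
# and are the multichannel connections between the nodes actually RDMA capable?
Get-NetAdapterRdma | Format-Table Name, Enabled -AutoSize
Get-SmbClientNetworkInterface | Format-Table FriendlyName, RdmaCapable, RssCapable -AutoSize
Get-SmbMultichannelConnection | Format-Table ServerName, Selected, ClientRdmaCapable -AutoSize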

RoCE stuff to test: Routing

Some claim it can’t be done reliably. But hey, they said that for non-uniform switch environments too Winking smile. So will it all fall apart and will we need to standardize on iWarp in the future? Maybe, but isn’t DCB the technology used for lossless, high performance environments (FCoE but also iSCSI), so why would iWarp not need it? Sure, it works without it quite well. So does iSCSI, right, up to a point? I see these comments a lot more from virtualization admins that have a hard time doing DCB (I’m one, so I do sympathize) than I see them from hard core network engineers. As I have RoCE cards and they have become routable now with the latest firmware and drivers, I’d love to try and see if I can make RoCE v2 or Routable RoCE work over different types of switches, but unless someone is going to sponsor the hardware I can’t even start doing that. Anyway, lossless is the name of the game, whether it’s iWarp or RoCE. Who knows what we’ll be doing in 5 years? 100Gbps iWarp & iSCSI both covered by DCB vNext while FC, FCoE, Infiniband & RoCE have fallen into oblivion? We’ll see.

Hyper-V Guest Protected Network Testing Tip


I’ve been pinged a few times over the years with people saying that the new protected network feature does not work for them. This setting is set per vNIC of the virtual machine.

image

The issue lies in how & what people test, bar any number of other reasons why a live migration might not start or complete. What people tend to do is disable a NIC to which the vSwitch is connected. But Protected Network is about media sense loss detection of network disconnects and this requires the NIC to be actually there and enabled. Remember, we’re talking about the NIC on the host connected to the virtual switch. A physical link failure here, meaning that the virtual switch to which the protected virtual network adapter is connected no longer has network connectivity, will lead to all the VMs with the protected network setting enabled being live migrated to another node in the cluster that still has a connected virtual switch for the same network. The latter is to avoid senseless virtual machine migrations to other nodes that might also have lost connectivity due to a failed physical switch.

So the point is that testing by disabling the NIC in the OS will not do. You need to unplug the cables to the virtual switch or disable the port on the switch or even shutdown the switch (a bit drastic).

Do note that it can take a little time for the live migration to kick in, it varies a bit, but it beats having to wait for the issue to be resolved. You’ll see event id 1255 logged when the VMs lose network connectivity:

image

In this day and age, with NIC teaming to redundant switches & the fact that you might be using converged networking, these tests aren’t as simple as you might think. Also, don’t pull out all of the cables used for clustering if you want the cluster to be able to help you out here with a live migration. Because when the other cluster nodes can’t talk to the node you’re testing in any way, it will be kicked out of the cluster and the VMs will go down, be moved to another node and started. This might seem obvious, but if you are using a teamed 10Gbps solution in a converged setup this might cause exactly that.

Another thing to note is that if you have a virtual switch with a dedicated backup network exposed to hosts & VMs that can tolerate downtime, you might want to disable Protected Network on that vNIC, as you don’t want to live migrate the VMs off when that network has an issue. It all depends on your needs & tastes.

Last but not least please behave, and don’t do anything silly in production when testing this. Be careful in your testing.

Hot add/remove of network adapters and enabling device naming in Windows Server Hyper-V


One of the cool new features in Windows Server vNext Hyper-V (in Technical Preview at the moment of writing) is that you gain the ability to hot add and remove NICs. That might sound not too important to the non-initiated in the fine art of virtualization & clouds. But it is. You see, anything you can do to a VM configuration wise that does not require downtime is good. That’s what helps shift the needle of high availability to that holy grail of continuous availability.

On top of that the names of the network adapters are now exposed to the guest, which is also great. It’s become a lot easier to automate the VM network configuration.

Hot adding NICs can be done via the GUI and PoSh.

image

But naming the network adapter seems a PowerShell only game for now (nothing hard, no sweat). This can be done during creation of the network adapter. Here I add a NIC to VM RAGNAR connected to the ISCSI-GUEST switch & named ISCSI.

Add-VMNetworkAdapter -VMName RAGNAR -SwitchName ISCSI-GUEST -Name ISCSI

Now I want this name to be reflected in the VM’s NIC configuration properties. This is done by enabling device naming. You can do this via the GUI or PoSh.

Set-VMNetworkAdapter -VMName RAGNAR -Name ISCSI -DeviceNaming On

That’s it.

image
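To double check from the host, something along these lines should do; this assumes the preview exposes the setting as a DeviceNaming property on the vNIC object, matching the Set-VMNetworkAdapter parameter:

# Verify the vNIC name, the switch it's connected to and whether device naming is on
Get-VMNetworkAdapter -VMName RAGNAR -Name ISCSI | Format-List Name, SwitchName, DeviceNaming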

So now let’s play with our existing network adapter “Network Adapter”, which connects our Hyper-V guests to the LAN via the HYPER-V-GUESTS switch. Can you rename it? Yes, you can. In PoSh run this:

Rename-VMNetworkAdapter -VMName RAGNAR -Name "Network Adapter" -NewName "LAN"

And that’s it. If you refresh the settings of your VM or reopen it you’ll see the name change.

image

The one thing that I see in the Tech Preview is that I need to reboot the VM to see the name change reflected inside the VM in the NIC configuration under advanced properties, called “Hyper-V Network Adapter Name”. Existing ones show their old name and new ones are empty until then.

image

 

Two important characteristics to note about enabling device naming

You’ll notice that one can edit this field in the NIC configuration of the VM, but it doesn’t move up the stack into the settings of the VM. Security wise this seems logical to me and it’s not intended to work. It’s a GUI limitation that the field cannot be disabled for editing, but no one can be “funny” and rename the Ethernet adapter in the VM’s settings via the guest Winking smile

Do note that this is not exactly the same as Consistent Device Naming in Windows 2012 or later. It’s not reflected in the name of the NIC in the GUI; these are still Ethernet, Ethernet 2, … Enabling device naming is mainly meant to enable identifying the adapter assigned to the VM inside the VM, mainly for automation. You can name the NIC in the guest whatever works best for you and you’ll never lose the correlation between the network adapter in your VM settings and the Hyper-V Network Adapter Name in the NIC configuration properties. In that respect it’s a bit more solid/permanent: even if someone found it funny to rename all vNICs to random names, you’re still OK with this feature.

That’s it, off you go! Download the Technical Preview bits from MSDN, start exploring and learning. Knowledge is seldom a bad thing Winking smile