10Gbps Cheap & Without Risk In Even The Smallest Environments


Over the last 18 months, cheaper commodity 10Gbps switches with a small port count, but of high quality, have become available. NetGear is a prime example. This means 10Gbps networking is now within reach of even the smallest deployments.

Size is an often-used measure for technological needs like storage, networking and compute, but in many cases it’s far too blunt a tool. A lot of smaller environments in specialized niches need more storage and networking capacity than their size would lead you to believe. The “enterprise level” cost associated with the earlier SFP+ based switches was an obstacle, especially since the minimum port count lies around 24 ports, so with switch redundancy you’re already at 2 * 24 ports. Then there’s the cost of vendor branded SFP+ modules. That could be offset with copper Twinax Direct Attach cabling (which has its sweet spots) or by finding functional, cheaper, non-branded SFP+ modules. But all that isn’t an issue anymore. Today 10GBase-T cards & switches are readily available and ready for prime time. The issues with power consumption and heat have been dealt with.

While vendors like DELL have done some amazing work to bring affordable 10Gbps switches to the market, 10Gbps remained out of reach for many small environments. Now, with the cheaper copper based, low port count switches, it’s become a lot easier to introduce 10Gbps while taking away the biggest operational pains:

  • You can start with a lower number of 10Gbps ports (8-12) instead of a minimum of 24.
  • No need for expensive vendor branded SFP+ modules.
  • Copper cabling (CAT6A) is relatively cheap for use in a rack or between two racks, and for this kind of environment using patch cables isn’t an issue.
  • The power consumption and heat challenges of copper 10Gbps have been addressed.

8port10Gbps

So even for the smallest setups, where people would love to get 10Gbps for live migration, hypervisor host backups and/or the virtual network, it can be done now. If you introduce these switches just for CSV, live migration, storage or backup networks, you can even avoid having to integrate them into the data network. This makes it easier and non-disruptive, and the isolation helps put minds at ease about the potential impact of extra traffic and misconfigurations. You still take the heavy loads that might be disrupting your 1Gbps network off it, making things run well again without needing further investments.

So go ahead, take the step and enjoy the benefits that 10Gbps brings to your (virtual) environment. Even medium-sized shops can use this as a showcase while they prepare for a 10Gbps upgrade of the server room or data center in the years to come.

Setting Up An Uplink (Trunk/General) With A Dell PowerConnect 2808 or 28XX


Introduction

I was deploying a bunch of PowerConnect 2808 switches that needed to provide connectivity to multiple VLANs (Training, Guest, …) in classrooms. Had I just refreshed my insight into how the PowerConnect family of switches works, I would have figured it out before I got there with my “assumption” based quick configuration loaded on the switches.

image

So before we go on, here are the basics of switch port (or LAG) modes in the PowerConnect family. Please realize that switch behavior (especially for trunk mode in this context) has changed over time with more recent switches/firmware, so depending on the model & firmware you have, behavior differs a bit. But the current state of affairs is as follows. You can put your port or LAG in the following 3 (main) modes:

Access: The port belongs to a single untagged VLAN. When a port is in Access mode, the packet types which are accepted on the port cannot be designated. Ingress filtering cannot be enabled/disabled on an access port. So only untagged received traffic is allowed and all transmitted traffic is untagged. The setting of the port determines the VLAN of traffic. Tagged received traffic is dropped. Basically, this is what you set your ports for client devices to (printer, PC, laptop, NAS).

Trunk: In older versions this means that ALL transmitted traffic is tagged. That’s easy. Tagged received traffic is dropped if it doesn’t belong to one of the VLANs defined on the trunk. In more recent switches/firmware, untagged received traffic is dropped except for one VLAN, which can be untagged and still be received. That’s nice for the default VLAN and makes for better compatibility with other switches.

General: You determine the rules. You can configure it to transmit tagged or untagged traffic per VLAN. Untagged received traffic is accepted and the PVID determines the VLAN it is tagged with. Tagged received traffic is dropped if it doesn’t belong to one of the defined VLANs.
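To make these rules a bit more tangible, here’s a toy Python model of what happens to a received frame in each mode. It’s a simplified sketch under our own assumptions: the `Port` class and the `ingress_vlan` function are made up for illustration, trunk mode accepts untagged frames only when a native (untagged) VLAN is defined, as in newer firmware, and real switches differ per model and firmware as mentioned above.

```python
"""Toy model of PowerConnect-style ingress handling per port mode.

Simplified sketch of the access/trunk/general rules; real switches differ
per model and firmware (acceptable frame types, ingress filtering, ...).
"""
from dataclasses import dataclass, field
from typing import Optional

DISCARD_VLAN = 4095  # PVID used as the discard VLAN for untagged frames


@dataclass
class Port:
    mode: str                                   # "access", "trunk" or "general"
    pvid: int = 1                               # VLAN given to untagged ingress frames
    tagged: set = field(default_factory=set)    # VLANs carried tagged
    untagged: set = field(default_factory=set)  # VLANs carried untagged


def ingress_vlan(port: Port, frame_vlan: Optional[int]) -> Optional[int]:
    """Return the VLAN a received frame ends up in, or None if it is dropped.

    frame_vlan is None for an untagged frame, else its 802.1Q VLAN ID.
    """
    if port.mode == "access":
        # Only untagged frames are accepted; they land in the port's VLAN.
        return port.pvid if frame_vlan is None else None
    if port.mode == "trunk":
        if frame_vlan is None:
            # Newer firmware: untagged frames are accepted for the one native VLAN.
            return port.pvid if port.pvid in port.untagged else None
        return frame_vlan if frame_vlan in port.tagged else None
    if port.mode == "general":
        if frame_vlan is None:
            # Untagged frames get the PVID; the discard VLAN means they are dropped.
            return None if port.pvid == DISCARD_VLAN else port.pvid
        # Tagged frames must belong to one of the VLANs defined on the port.
        allowed = port.tagged | port.untagged
        return frame_vlan if frame_vlan in allowed else None
    raise ValueError(f"unknown mode: {port.mode}")
```

Running it shows that a general mode port with PVID 4095 and only tagged VLANs drops untagged frames just like a classic trunk, while a general mode port with PVID 9 and VLAN 9 untagged behaves like an access port.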

Also see the DELL article “PowerConnect Common Questions Between Access, General and Trunk mode”.

The PowerConnect 28XX Series

These are good switches for their price point & use cases. Just make sure you buy them for the right use case. There is only one thing I find unforgivable in this day and age: the lack of SSH/HTTPS support for management.

Go ahead, fire up a 2808, take a look at the web interface and see what you can configure. In contrast with the PC54XX/55XX etc. series, it seems you cannot set the port mode. So how can this switch accommodate trunk/general/access modes at all? Well, it’s implied in the configuration: ports seem to be set to general mode by default and you cannot change that. The good news is that, with the right settings, a port in general mode behaves like a port in access or trunk mode. How? By following the rules above.

So we assume here that a port is in general mode (which can’t be changed). But we want trunk mode, so how do we get the same behavior? Let’s look at some examples in pseudo CLI (it’s a web GUI only device).

Example 1: Classic Trunk = only defined tagged traffic is accepted. All untagged traffic is dropped

switchport mode trunk
switchport trunk allowed vlan add 9, 20

So we can get the same behavior in general mode using:

switchport mode general
switchport general allowed vlan add 9, 20 tagged
switchport general pvid 4095   

A PVID of 4095, the industry standard discard VLAN, assigns all untagged traffic to that VLAN, where it is dropped. Ergo, this is the same as the trunk config above!

Example 2: Modern Trunk = only defined tagged traffic and one untagged VLAN is accepted

switchport mode trunk
switchport trunk allowed vlan add 9, 20
switchport trunk allowed vlan add 1 untagged

So we can get the same behavior in general mode using:

switchport mode general
switchport general allowed vlan add 9, 20 tagged
switchport general allowed vlan add 1 untagged
switchport general pvid 1

This example is what we needed in the classroom, and it’s basically what you set with the GUI. So far so good. But we ran into an issue with connectivity on the access ports in VLAN 9 and VLAN 20. Let’s look at that in the next example.

Example 3: Access port mode = only one untagged VLAN is accepted

switchport mode access
switchport access vlan 9

switchport mode general
switchport general allowed vlan add 9 untagged
switchport general pvid 9

If you’re accustomed to the higher end PC switches, you define the port in access mode and add the VLAN of your choice untagged. That’s it. Here the mode is general and can’t be changed, which means we also need to set the PVID to 9 so that all untagged traffic on the port is indeed tagged with VLAN 9.
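Since the 2808 only offers general mode, the translation in the three examples above is mechanical enough to capture in a few lines. Here’s a hypothetical Python helper that does it. `general_mode_lines` is our own name, and its output is the same pseudo CLI used above, not commands you can paste into a 2808, which is configured through the web GUI.

```python
"""Emit general mode equivalents of trunk/access intents (pseudo CLI).

Hypothetical helper: it mirrors the pseudo CLI of Examples 1-3 and is not
a real PowerConnect 2808 CLI (the 2808 is configured via its web GUI).
"""
from typing import Iterable, List, Optional

DISCARD_VLAN = 4095


def general_mode_lines(tagged: Iterable[int] = (),
                       untagged: Optional[int] = None) -> List[str]:
    """Translate a port intent into general mode pseudo CLI.

    tagged:   VLANs carried tagged (trunk behavior).
    untagged: the single VLAN carried untagged, or None to drop all
              untagged ingress traffic (classic trunk behavior).
    """
    lines = ["switchport mode general"]
    vlans = sorted(set(tagged))
    if vlans:
        lines.append("switchport general allowed vlan add "
                     + ", ".join(str(v) for v in vlans) + " tagged")
    if untagged is None:
        # No untagged VLAN: park untagged frames in the discard VLAN.
        lines.append(f"switchport general pvid {DISCARD_VLAN}")
    else:
        # The PVID must match the untagged VLAN; otherwise untagged ingress
        # frames stay in whatever VLAN the PVID points to (VLAN 1 by default).
        lines.append(f"switchport general allowed vlan add {untagged} untagged")
        lines.append(f"switchport general pvid {untagged}")
    return lines
```

`general_mode_lines([9, 20])` gives Example 1, `general_mode_lines([9, 20], untagged=1)` gives Example 2 with VLAN 1 added explicitly as the untagged VLAN, and `general_mode_lines(untagged=9)` gives Example 3.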

Setting Up an Uplink Between a PowerConnect 5548 and a 2808

Here’s the normal deal with the higher end series of PowerConnect switches: you use the port mode to define the behavior, and in our case we could go with trunk or general mode. We use trunk, leave the native VLAN as the one untagged VLAN and add 9 and 20 as tagged VLANs.

The “trunk” port or LAG is left on the default PVID

image

So an “access” port for VLAN 9 is achieved by setting the PVID to 9

image

And an “access” port for VLAN 20 is achieved by setting the PVID to 20

image

The VLAN membership settings are what you’d expect them to be on the higher end PowerConnect models:

VLAN 1 (native)

image

VLAN 9 (Corp)

image

VLAN 20 (Guest)

image

If it’s your first time configuring a PC2808, you might totally overlook the fact that you need to do some extra work to make traffic flow. So, to recap: as described above, there is no selection of access/general/trunk … on a PowerConnect 2808. The port or the LAG is “implicitly” set to general, and the extra settings of the PVID and of tagged/untagged VLAN membership make it behave as general, trunk or access:

  • The trick is to set any VLAN other than the default VLAN 1 to tagged on the port or LAG you’ll use as uplink. So far things are quite “standard PowerConnect”.
  • Set the VLAN membership of your “access” ports to untagged for the VLAN you want them to belong to.
  • After that, set the PVID on the “access” ports to the VLAN you want the port to belong to. If you do not do this, the port still behaves as if it’s a VLAN 1 port. It will not get a DHCP address for its VLAN but for the one on VLAN 1, if there is one. If you use a static IP address in the subnet of the port’s VLAN, you won’t have connectivity, as the port isn’t set to the right VLAN.
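That last bullet is the one that bit us, so it can be worth checking the “access” ports before going on site. Below is a minimal Python sketch. It assumes you have written down each port’s PVID and untagged VLAN memberships yourself; the dictionary layout and `find_pvid_mismatches` are hypothetical, since the 2808 has no such export. It also assumes the default VLAN may remain an untagged member on a port, so it ignores VLAN 1 when deciding which VLAN an access port is meant for.

```python
"""Flag "access" ports whose PVID does not match their untagged VLAN.

Hypothetical input format: {port name: {"pvid": int, "untagged": set, ...}},
written down by hand from the web GUI.
"""
from typing import Dict, List, Set

DEFAULT_VLAN = 1


def find_pvid_mismatches(ports: Dict[str, dict]) -> List[str]:
    """Return one warning per port whose PVID is not its access VLAN."""
    warnings: List[str] = []
    for name, cfg in sorted(ports.items()):
        pvid: int = cfg["pvid"]
        # Untagged memberships other than the default VLAN, which may stay
        # an untagged member of the port by default.
        access_vlans: Set[int] = set(cfg.get("untagged", ())) - {DEFAULT_VLAN}
        if access_vlans and pvid not in access_vlans:
            warnings.append(
                f"{name}: untagged member of VLAN(s) {sorted(access_vlans)} "
                f"but PVID is {pvid}, so untagged ingress traffic lands in VLAN {pvid}"
            )
    return warnings
```

An uplink that is only an untagged member of VLAN 1 is not flagged. A port that is an untagged member of VLAN 20 but still has PVID 1 is flagged, which is exactly the “no DHCP address” symptom described above.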

The reason we used the PowerConnect 2808 series here is that we needed silent ones (passive cooling), and multiple ones were needed in the training rooms to avoid too many cables running around the place. That was the project managers’ two-minutes-at-the-desk quick fix for a changed requirement. The real solution, of course, would have been to get 24+ outlets into the room in the correct places and add 24+ ports to the normal switch count in the hardware analysis for the building. But after the fact you have to go with the flow.

DELL Has Great Windows Server 2012 R2 Feature Support – Consistent Device Naming – Which They Helped Develop


The issue

Plug ‘n Play enumeration of devices has been very useful for loading device drivers automatically, but it isn’t deterministic. As devices are enumerated in the order they are detected, the order differs from server to server and can even change within the same system. This means that the enumeration and order of the NIC ports in the operating system may vary, and “Local Area Connection 2” doesn’t always map to port 2 on the onboard NIC. It’s random. So scripting is “rather hard”, and even finding out which NIC matches which port becomes a game of unplugging cables.

Consistent Device Naming is the solution

A mechanism, which has to be supported by the BIOS, was devised to deal with this and to enable consistent naming, so that the NIC port numbering in the operating system matches the numbering on the chassis.

But it gets even better. This doesn’t just work with onboard NICs; it also works with add-on cards. In the Name column it identifies the slot in which the card sits and numbers the ports consistently.

In the DELL 12th Generation PowerEdge servers this feature is enabled by default. In HP servers it is not, for some reason; you need to turn it on manually.

I first heard about this feature even before the Windows Server 2012 Beta was released, and as it turns out Dell was involved in the development of this feature. It was Dell BIOS team members who developed the solution to consistently name network ports and had it standardized via the PCI-SIG. They also collaborated with Microsoft to ensure that Windows Server 2012 would support all this.

Here’s a screen shot of a DELL R720 (12th Generation PowerEdge Server) of ours. As you can see, Consistent Device Naming doesn’t only work for the onboard NIC. It also does a fine job with add-on cards, of which we have quite a few in this server.

image

It clearly shows the support for Consistent Device Naming for the add-on cards present in this server. This is a test server of ours (until we have to take it into production) and it has a quad port 1Gbps Intel card, a dual port Intel X520 DA card and a dual port Mellanox 10Gbps RoCE card. We use it to test our assumptions & ideas. We still need a Chelsio iWARP card for more testing, mind you ;-)

A closer look

This solution is illustrated in the “Device Name” column in the screen shot below. It’s clear that the PnP enumerated name (the friendly name from the driver INF file) and its enumerated number are very different from the number in the Name column (NIC1, NIC2, NIC3, NIC4), even though in this case the order happens to be correct by chance. If the operating system is reinstalled, or drivers are changed and the devices re-enumerated, these numbers may change, as they did with previous operating systems.

image

The “Name” column is where the Consistent Device Naming magic comes to life. As you can see, you can easily identify the ports, as they are numbered consistently, regardless of the “Device Name” column numbering and in accordance with the numbering on the chassis or add-on card. The names in this column will NEVER differ between identical servers or after reinstalling a server, because they do not depend on PnP. Pretty cool, isn’t it! Also note that we can rename the adapters in the Name column and, if we choose, keep the original name in the new one to preserve the mapping to the physical hardware location.
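Here’s a small Python sketch, with made-up adapter data, of why this matters for scripting. After re-enumeration, a lookup by the enumerated device name can silently point at a different physical port, while a lookup by the CDN name keeps pointing at the same one. The field names, adapter names and MAC addresses below are hypothetical (the MACs come from the documentation range). On a real server you’d pull this from the OS, for instance from the Name, InterfaceDescription and MacAddress properties that Get-NetAdapter returns.

```python
"""Why scripts should key on the CDN "Name", not the PnP device name.

Hypothetical data: two snapshots of the same server's adapters, before and
after a reinstall. The MAC address stands in for the physical port.
"""
from typing import Dict, List

Adapter = Dict[str, str]

BEFORE: List[Adapter] = [
    {"name": "NIC1", "device": "Example Gigabit Adapter", "mac": "00-00-5E-00-53-01"},
    {"name": "NIC2", "device": "Example Gigabit Adapter #2", "mac": "00-00-5E-00-53-02"},
]

# After re-enumeration the PnP instance numbers are handed out in a different
# order, but the CDN names still follow the chassis labels.
AFTER: List[Adapter] = [
    {"name": "NIC1", "device": "Example Gigabit Adapter #2", "mac": "00-00-5E-00-53-01"},
    {"name": "NIC2", "device": "Example Gigabit Adapter", "mac": "00-00-5E-00-53-02"},
]


def physical_port(adapters: List[Adapter], key: str, value: str) -> str:
    """Return the MAC (our stand-in for the physical port) of the adapter
    whose `key` field equals `value`."""
    for adapter in adapters:
        if adapter[key] == value:
            return adapter["mac"]
    raise KeyError(f"no adapter with {key}={value!r}")
```

A script that configures, say, the live migration network on “NIC2” keeps hitting the same port. One that looks for the “#2” device does not.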

In the example below, things map perfectly between the Name column and the Device Name column, but that’s pure luck.

image

One of the other add-on cards demonstrates this perfectly.

image

Windows NLB Nodes Misconfigured after Simultaneous Live Migration on Windows Server 2012 (R2)


Here’s the deal. While Windows NLB on Hyper-V guests might seem to work OK, you can run into issues. Our biggest challenge was keeping the WNLB cluster functional when all or multiple nodes of the cluster are live migrated simultaneously. The live migration goes blazingly fast via SMB over RDMA, but afterwards we have one or more nodes in a problematic state, and clients being sent to them have connectivity issues.

After live migrating multiple or all nodes of the Windows NLB cluster simultaneously the cluster ends up in this state:

image

A misconfigured interface. If you click on the error for details, you’ll see:

image

Not good, and no, we did not add those IP addresses manually; we let the WNLB cluster handle that as it’s supposed to. We saw this both with fixed MAC addresses (the old school WNLB configuration of early Hyper-V deployments) and with dynamic MAC addresses. MAC spoofing is enabled on the appropriate vNICs of all the nodes.

The temporary fix is rather easy. However, it’s a manual intervention and as such not a good solution. Open up the properties of the offending node or nodes (for every NLB cluster that’s running on that node; you might have multiple).

image

Click “OK” to close it …

image

… and you’re back in business.

image

image

Scripting this out somehow with nlb.exe or PowerShell after a guest gets live migrated is not the way to go either.

But that’s not all. In some cases you’ll get an extra error, which you can ignore if it’s not due to a real duplicate IP address on your network:

image

We tried rebooting the guest, dumping and recreating the WNLB cluster configuration from scratch, and clearing the switches’ ARP tables. Nothing gave us a solid result.

Now you might say: who live migrates multiple WNLB nodes at the same time? Well, any two node Hyper-V cluster that uses Cluster Aware Updating gets into this situation. Bigger clusters can too, when anti-affinity is not configured or when keeping guests online is chosen over enforcing said anti-affinity, perhaps during a drain for an intervention on a cluster, etc. It happens. Whether you’ll hit this issue depends on how you configure and use your switches and what LBFO configuration you use for the vSwitches in Hyper-V.

How do we fix this?

First we need some background, and there is actually way too much for one blog post. There are so many permutations of vendors, switches, configurations, firmware & drivers …

Unicast

This is the default, and Thomas Shinder has an aging but great blog post on how it works and what the challenges are. Read it. It’s your least good option, and if you can, you shouldn’t use it. With Hyper-V we add the inner workings and challenges of a vSwitch to the mix. Basically, in virtualization Unicast is the least good option. Only use it if your network team won’t do the switch configuration needed for multicast and you can’t get to the switch yourself, or when the switch doesn’t support mapping a unicast IP to a multicast MAC address. Some tips if you want to use it:

  1. Don’t use NIC teaming for the virtual switch.
  2. If you do use NIC teaming for the virtual switch you should (must):
    • Use switch independent teaming on two different switches.
    • If you have a stack or just one switch, use multicast or, even better, IGMP with multicast to avoid issues.

I know, don’t shout at me: teaming on the same switch. But it does happen, and at least it protects against NIC issues, which are more common than switch or switch port failures.

Multicast

Again, read Thomas Shinder’s great blog post on how it works and what the challenges are.

It’s an OK option, but I’ll only use it if I have a switch where I can’t do IGMP, and even then I do hope I can do two things:

  1. Add a static ARP entry for the cluster IP address / MAC address on your switch if it doesn’t support IGMP multicast:
    • arp [ip] [cluster multicast mac] ARPA  > arp 172.31.1.232 03bf.ac1f.01e8 ARPA
  2. To prevent switch flooding, which occurs just as with unicast, configure on your switch which ports to use for the multicast traffic:
    • mac-address-table static [cluster multicast mac] [vlan id] [interface]  > mac-address-table static 03bf.ac1f.01e8 vlan 10 interface Gi1/0/1
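The cluster MAC in these entries isn’t arbitrary: in multicast mode NLB uses 03-BF followed by the four octets of the cluster IP (unicast mode uses 02-BF). Here’s a minimal Python sketch that derives it and fills in the two entries. The function names are our own, the output uses the Cisco IOS style syntax of the examples, and some IOS releases spell the second command `mac address-table static`, so check your switch. For the cluster IP 172.31.1.232 used in the example above, the derivation gives 03bf.ac1f.01e8. Always compare it with the MAC that NLB Manager shows for your cluster before hardcoding anything.

```python
"""Derive the cluster MAC Windows NLB uses for a cluster IP and build the
static entries from the list above (Cisco IOS style syntax)."""
import ipaddress
from typing import List

# Unicast mode: 02-BF + IP octets. Multicast mode: 03-BF + IP octets.
NLB_MAC_PREFIX = {"unicast": "02bf", "multicast": "03bf"}


def nlb_cluster_mac(cluster_ip: str, mode: str) -> str:
    """Return the NLB cluster MAC in dotted notation (xxxx.xxxx.xxxx)."""
    if mode not in NLB_MAC_PREFIX:
        raise ValueError(f"unsupported mode: {mode}")
    octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 raw bytes
    hexstr = NLB_MAC_PREFIX[mode] + octets.hex()        # 12 hex digits
    return ".".join(hexstr[i:i + 4] for i in range(0, 12, 4))


def static_multicast_config(cluster_ip: str, vlan: int, interface: str) -> List[str]:
    """Static ARP entry plus static MAC table entry for multicast mode."""
    mac = nlb_cluster_mac(cluster_ip, "multicast")
    return [
        f"arp {cluster_ip} {mac} ARPA",
        f"mac-address-table static {mac} vlan {vlan} interface {interface}",
    ]
```

`static_multicast_config("172.31.1.232", 10, "Gi1/0/1")` returns the two lines from the list above. You’d repeat the second one for every physical port that needs the traffic, which is exactly what doesn’t scale once VMs start moving.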

The big rotten thing here is that this is great when you’re dealing with physical servers. They don’t tend to jump from switch port to switch port and from switch to switch on the fly like a live migrating virtual machine does. You just can’t hardcode all the vSwitch ports into the physical switches: VMs move and, depending on the teaming choice, there are multiple ports, switches etc. It’s not allowed and not possible. So when using multicast in a Hyper-V environment, stick to 1). But here’s an interesting fact: many switches that don’t support 1) do support 2). Another fun fact is that most commodity switches do seem to support IGMP … and that’s your best choice anyway! Some high end switches don’t support WNLB well, but in that category affording a hardware load balancer shouldn’t be an issue. But let’s move on to my preferred option.

IGMP With Multicast (see IGMP Support for Network Load Balancing)

This is your best option, and you can configure it even on older, commodity switches like a DELL PowerConnect 5424 or 5448. It was introduced in Windows Server 2003 (it did not exist in NT 4.0 or W2K). It’s my favorite in a virtual environment (well, I’d rather use hardware load balancing). It works well with live migration, prevents switch flooding and, with some ingenuity and good management, we can get rid of the other quirks.
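In IGMP multicast mode the addresses are derived differently. NLB uses the multicast group 239.255.x.y, where x.y are the last two octets of the cluster IP, and the cluster MAC follows from the standard IPv4 multicast to Ethernet mapping (01-00-5E plus the low 23 bits of the group address, per RFC 1112). The Python sketch below computes both; the function names are our own. With IGMP snooping enabled, the switch only forwards that group’s traffic to ports where it has seen an IGMP join. That helps explain why this mode copes better with VMs moving between ports than hardcoded static entries do.

```python
"""IGMP multicast mode: the group address and MAC NLB uses.

The group is 239.255.x.y (x.y = last two octets of the cluster IP). The MAC
is the standard RFC 1112 mapping: 01-00-5E + low 23 bits of the group.
"""
import ipaddress


def nlb_igmp_group(cluster_ip: str) -> ipaddress.IPv4Address:
    """Return the 239.255.x.y group NLB joins in IGMP multicast mode."""
    _, _, x, y = ipaddress.IPv4Address(cluster_ip).packed
    return ipaddress.IPv4Address(f"239.255.{x}.{y}")


def multicast_mac(group: ipaddress.IPv4Address) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC address."""
    if not group.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low23 = int(group) & 0x7FFFFF        # only 23 bits of the group survive
    mac = (0x01005E << 24) | low23       # 01-00-5E prefix + those 23 bits
    return "-".join(f"{b:02x}" for b in mac.to_bytes(6, "big"))
```

For a cluster IP of 172.31.1.232 this gives the group 239.255.1.232 and the MAC 01-00-5e-7f-01-e8, which is the 01-00-5E-7F-x-y pattern you’ll see NLB report in this mode.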

So Didier, tell us, how do we have our cake and eat it too?

Well, I will share the IGMP with Multicast solution with you in a next blog post. Do note that, as stated above, there are so many permutations of Windows, teaming, WNLB, switches & firmware/drivers out there that I give no support and no guarantees. Also, I want to avoid writing a 100 page white paper on this subject. If you insist you want my support on this, I’ll charge at least a thousand Euro per hour, effort based only. Really. And chances are I’ll spend 10 hours on it for you. Which means you could have bought 2 (redundancy) KEMP hardware NLB appliances and still have money left to fly business class to the USA and tour some national parks. Get the message?

But don’t be sad. In the next blog post we’ll discuss NIC teaming for the vSwitch and NLB configuration with IGMP with Multicast, and show you a simple DELL PowerConnect 5424 switch example that makes WNLB work on a W2K12R2 Hyper-V cluster with NIC teaming for the vSwitch and avoids the following issues:

  • Messed up WNLB configuration after the simultaneous live migration of all or multiple NLB nodes.
  • “False” duplicate IP address goof ups (at the cost of IP address hygiene management).
  • Switch port flooding.

I’d show you this on redundant Force10 S4810 switches, but for that I need someone to ship me some of those with SFP+ modules for the lab, free of cost for me to keep ;-)

Conclusion

It’s time to start saying goodbye to Windows NLB. The way the advanced networking features are moving towards layer 3 means that “useful hacks” like MAC spoofing for Windows NLB are no longer going to work. But until you have implemented hardware load balancing, I hope this blog post has given you some ideas & tips to keep Windows NLB running smoothly for now. I’ve done quite a few and, while it takes some detective work & testing, so far I have come out victorious. Eat that, Windows NLB!

Hot Iron, Cold Steel & Cables Are Still Paramount In The Era Of The Cloud


Cloud, virtualized, on premise, hosted … the people in the field offices need to connect to them, and as such hardware is not dead yet ;-). Commodities don’t mean obsolete or “in the cloud” only.

image

Some nice DELL PowerConnect 5548P switches. We’ve been using this line of switches (since the 53XX series) for many years now, with great success, in the datacenter (before we switched to 10Gbps) and for campus/client access. They’ve never let us down, at a price/value point that makes the economics of using them too good to ignore.

Once in a while, we’re out in the field making sure the people can access their apps, services and servers in the cloud, the data center or at a hosting provider. That means we get to play with some hardware, and we all still enjoy that :-). Whilst at work at several sites, I’m once again confronted with commodities being treated like specialties, with the following results:

  • Overly expensive
  • Very little value & capabilities (under delivery)
  • Slow delivery
  • Churning

To avoid wasting your money or allowing it to be wasted, you need to use common sense. If you use advisors, get a consigliere, not a racketeer.

1Gbps to the desktop and get some extra ports

I’ve talked about getting affordable 10Gbps without compromising capabilities before, so here I’ll look at the access/campus side of the story. I still find many organizations rolling out 100Mbps to the desktop for cost reasons and counting ports in units of one. Two things to keep in mind: buy 1Gbps and buy some extra.

Buying vast quantities of something you don’t use but still have to power is not a good idea. But being a complete scrooge and not having some extra ports is ridiculous. I have seen many thousands of € wasted in meetings about having 10 to 40 switch ports too few in new building projects with > 5000 outlets. The only real saving I see is in electricity used, if that is a major concern where you are. Organizations spend tens of thousands of € discussing something that would be fixed by spending a few thousand, which would give extra benefits on top. That’s churning, people: creating work and billable hours by overinflating issues & crying wolf to justify the expenditure that’s supposedly needed to stave off disaster.

On top of that, when you do ask those architects for some modern designs like SMB Direct & DCB, they freak out & repeat the above ritual. Chances are you’ll spend 20.000 to 30.000 euro on a 6 month study that says it can’t be done because of cost & the probability that the sky will fall on your head, leaving you empty handed and poorer. You should have taken the money and just done it. Their scams defer responsibilities to untraceable entities and line the pockets of consulting houses and, as no one is going to take responsibility to stop this madness, it just goes on forever, whilst on paper everything is done by the book and compliance with the rules is achieved. Until the day some joker, frustrated at the lack of a few ports, attaches a cheapo 8 port switch to the outlet, creates a loop and brings down the building’s network, affecting many thousands. Because the design didn’t handle that too well … been there, seen it.

I also disagree with the practice of dropping in 100Mbps unless you have really good reasons. Structural cabling is being put in at CAT6A specifications nowadays, and CAT5E has been put in for many years. 1Gbps is not a luxury if you do lots of data transfers within an office and have image intensive needs (more and more, that is all of us, with video and images, all in high res). Google Fiber is coming to residential homes … guess what that could mean for the services that can be delivered … Heaven forbid you buy 100Mbps because those fancy overpriced VOIP phones only do 100Mbps & you can’t afford to replace them.

With QoS for VOIP and other use cases, some extra bandwidth comes in handy as well. Also don’t forget software installations & automated rollouts of desktops & laptops. Last but not least, it helps deal with the crappy network behavior of way too many software packages.

On the number of ports and the price per port: we buy the most minimal support on switches possible. They hardly ever die on you, and if something goes bad it’s perhaps a port, and even that is rare. So don’t waste money on support contracts. Buy some extra ports. For one, you need some wiggle room, and you have spare capacity to deal with port or even switch failures. If you need 400 ports, buy 10 * 48 port switches. You have spare capacity and can even afford to lose a switch. If one really fails, most have a “lifetime warranty”. You finance 1Gbps to the desktop by dumping support you won’t need, buying value commodity switches and avoiding the racketeers mentioned above. If you need a network engineer, hire one, a good one.
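The 400 port rule of thumb can be checked with a few lines of Python. This is our own simple N+1 sizing sketch, not a formal capacity planning method; 48 port switches and one spare switch are the assumptions. Nine switches (432 ports) would cover 400 outlets but leave nothing if one fails, so you add one.

```python
"""Simple N+1 switch sizing: enough ports even after losing a switch."""
import math


def switches_needed(ports_needed: int, ports_per_switch: int = 48,
                    spare_switches: int = 1) -> int:
    """Switches required so demand is still met after `spare_switches` fail."""
    return math.ceil(ports_needed / ports_per_switch) + spare_switches


def capacity_after_failures(switches: int, ports_per_switch: int = 48,
                            failed: int = 1) -> int:
    """Ports still available when `failed` switches are down."""
    return (switches - failed) * ports_per_switch
```

For 400 ports, `switches_needed(400)` gives 10 switches, or 480 ports, and `capacity_after_failures(10)` still leaves 432 ports with one switch down.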

Then inevitably the cry comes: “you’ll saturate the uplinks”! Not a big issue for the small office (+/- 60 people) setup we did recently, but what about somewhat larger environments? Today’s commodity switches have dual 10Gbps capable fiber uplink ports for a redundant LAG. Build a star design to a more capable core/top switch instead of a cascade, and you’re golden. It’s also great future proofing, as we use access switches for a long time (over 7 years is not an exception), so give yourself some wiggle room.
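A quick way to put the uplink worry in numbers is the oversubscription ratio: total access bandwidth divided by total uplink bandwidth. Here’s a minimal Python sketch. The 48 x 1Gbps access ports and the 2 x 10Gbps LAG are our example values, not measurements from the office mentioned above. Whether the result is acceptable depends on your traffic, as desktop users rarely all push line rate at the same time.

```python
"""Oversubscription ratio of an access switch versus its uplinks."""


def oversubscription(access_ports: int, access_gbps: float,
                     uplinks: int, uplink_gbps: float) -> float:
    """Ratio of total access port bandwidth to total working uplink bandwidth."""
    if uplinks <= 0:
        raise ValueError("need at least one working uplink")
    return (access_ports * access_gbps) / (uplinks * uplink_gbps)
```

For 48 ports at 1Gbps behind a 2 x 10Gbps LAG this gives 2.4:1. It rises to 4.8:1 if one LAG member fails, which is still far better than a cascade where several switches share one uplink.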

Cost, you say? Again, forgo the expensive market leaders and you’ll get better value for less money with switches that get the job done very well. Cables, even OM3 fiber, are affordable compared to the labor, construction and maintenance of a > 1000 employee building. Put in enough cabling to allow for 21st century network traffic and make sure working on it is easy. Good principles used in the wrong place in the wrong way are no good to anyone, except for the ones making money off this scam.

RDMA Over RoCE With DCB Requires Tagged Non Default VLANs


It’s DCB That Requires This

For those of you who are experimenting with the RoCE variant of RDMA for SMB Direct in Windows Server 2012 (R2), make sure you have a VLAN tag in your configuration if this is more than a simple RDMA setup over two NICs. The moment you get DCB with PFC & ETS involved, you’ll need non default tagged VLANs. Do note that PFC alone is good enough; ETS is strictly speaking not a requirement, but I’d consider doing it if you can.

With Enhanced Transmission Selection (ETS) the network traffic type is classified using the priority value in the VLAN tag of the Ethernet frame. The priority value is the Priority Code Point (PCP), which is described in the IEEE 802.1Q specification and uses a 3-bit field in the VLAN tag with eight possible priority values (0 to 7).

Priority-based Flow Control (PFC) allows you to pause individual priorities of tagged traffic and helps to provide lossless or “no drop” behavior for a certain priority at the receiving port. As above, each frame transmitted by a sending port is tagged with a priority value (0 to 7) in the VLAN tag. So for the pause and resume functionality to work, we need a VLAN tag to carry the priority value.
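To see why PFC and ETS need the tag, it helps to look at where the priority actually lives. It sits in the 16-bit Tag Control Information field of the 802.1Q tag, next to the VLAN ID, so an untagged frame has no field to carry it at all. Here’s a minimal Python sketch of the tag layout. Priority 3 and VLAN 20 are just example values; which priority you assign to SMB Direct traffic is your own choice.

```python
"""Build and parse an IEEE 802.1Q VLAN tag to show where the PCP lives."""
import struct
from typing import Tuple

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tagged frame


def vlan_tag(pcp: int, vid: int, dei: int = 0) -> bytes:
    """Build the 4-byte tag: 16-bit TPID, then PCP (3) | DEI (1) | VID (12)."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP is a 3-bit field (0-7)")
    if dei not in (0, 1):
        raise ValueError("DEI is a single bit")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VID is a 12-bit field (0-4095)")
    tci = (pcp << 13) | (dei << 12) | vid
    return struct.pack("!HH", TPID_8021Q, tci)  # network byte order


def parse_vlan_tag(tag: bytes) -> Tuple[int, int, int]:
    """Return (pcp, dei, vid) from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError(f"not an 802.1Q tag (TPID 0x{tpid:04x})")
    return tci >> 13, (tci >> 12) & 0x1, tci & 0xFFF
```

`vlan_tag(3, 20)` yields the bytes 81 00 60 14. The priority 3 ends up in the top three bits of 0x6014, and that is the value the switch and the NICs look at when pausing (PFC) or scheduling (ETS) a traffic class.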

Does It Work Without?

But you’ll tell me that, as you may be lacking a DCB capable switch for lab purposes, you used a direct cable between your two RoCE NICs. And guess what: RoCE might indeed have worked for you without a VLAN tag. You can test & get a feel for what RoCE/RDMA can do for you with just the NICs. But as there is no switch involved, you’re not using DCB for PFC/ETS, and without that the need for the tagged VLAN isn’t there. Also see http://workinghardinit.wordpress.com/2013/05/03/smb-direct-roce-does-not-work-without-dcbpfc/.

So there you go. Design your RoCE/RDMA network based on DCB with PFC (and ETS), and not just on tests with a direct cable, or you might miss a few details that are quite important. Happy testing!

We Need Your Opinion On This Strategy, Vision, Management Issue …


Could you give us your opinion on this?

Lately people, managers, have asked me to give advice, or at least my opinion, on how to organize & manage IT in the broad sense of the term: infrastructure, software, services, support, on premise, cloud, data protection, security … “Just think about it a bit”.

That question, “Could you give us your opinion on this?”, is a hard one for me. I could say “read my blog”, the non technical posts. But my opinion is often too high level, and that’s not actually what they want. They want a solution. And it’s not that I don’t think about it or don’t have an opinion. But I can’t focus on areas outside of my expertise, my control and my priorities.

Basically, I cannot help them. Not because I’m that stupid or because the matter is beyond our control. It’s because the way managers and organizations think is becoming obsolete faster and faster by the day.

The Issue

Our world, both private and work related, is becoming more and more connected every day. That means there is a tremendous amount of input, leading to an ever increasing number of permutations of ever more variables that come into play. In short, complexity is rising at an enormous rate and will overwhelm us. Even worse, this complexity only shows itself after things have gone wrong. That’s bad, but it also means there are probably many more relationships of cause and effect that haven’t even shown themselves yet. That sounds a lot like a time bomb.

How do you deal with this? Not in the way so many are asking for. And I’m not here to tell my managers or customers what they want to hear. I’m in the business of telling them what they need to hear, as I deal in results, not services or studies. More often than not they are looking for processes and methodologies to keep central control over planning, execution, operations and change. All this while the rug is literally being pulled from under their feet. There’s the problem.

Situations, technologies, solutions, frameworks and processes all have a time limited value that’s becoming shorter. So the idea that you can plan and control for many years ahead is obsolete in many areas of our ecosystem. There are just too many moving parts that are changing too fast. So how do we manage this? What kind of leadership do you need? Well, there is no easy answer.

How do I deal with this?

Personally, I deal with this by working, collaborating & cooperating in a network, in “the community”. My insights, knowledge, help and support come from my network. Some of my colleagues and the contractors and consultants we hire are in that network. A lot of colleagues are not. Most managers are not. Why is that? They are stuck in a hierarchical world of centralized command and control that is failing them fast. At best they achieve good results, but very slowly and at a very high expense. We can only hope that the results don’t turn out bad as well. They want procedures & processes, predictability & consistency, but I deal with complexity across a wide area of expertise that cannot readily be put into manuals and documentation, not in a timely fashion anyway. I’m in a dogfight (insert “Top Gun” theme). The processes & logistics provide the platform. Learn where procedures & methodologies work and where they’ll kill you. The knowledge and the skills we need are a living thing that feeds on a networked collective and is very much in flux. I’m so much more skilled and effective at my job through participating in my global community than I could be confined to my current workplace that they’d be mad not to leverage that, let alone prevent me from doing so. You can’t do it alone or in isolation.

    An example

    Yesterday was an extreme example in a busy week. I started work at 05:30 AM to set up a testing environment for questions I needed answered by a vendor who leverages the community at large. That required some extra work in the datacenter, which a colleague who was on site today could take care of because I found out in time. I went to the office at 08:30. I worked all day on an important piece of work. I had mentioned it in my network and was alerted to a potential issue. That led to knowledge sharing & testing, which meant we could prevent that very issue while we were both learning. I went home at 18:30 for dinner & testing. I attended an MVP webcast from 20:00 till 21:00, learning new & better ways to troubleshoot clusters. Earlier, at 19:10, I got a call from a mate in Switzerland who was running into SAN issues. I helped him out with the two most likely causes, based on my experience with SANs and with that particular HP SAN. We did some more testing & research until 22:00, after which I wrote up this blog.

    We don’t get paid for this. This is truly mutually beneficial cooperation. We don’t benefit directly and it’s not “our problem” or job goal. But oh boy, do we learn and grow together, and in doing so we help each other and our employers/customers. It’s a true long term investment that pays off day by day, the longer you are active in the community and network. But the thing is, I can’t put that into a process or manual. Any methodology that has to serve a centralized command and control structure while dealing with agile subjects is bound to fail. Hence you see agile & scrum being abused to the point where it’s just doing stuff without any of the benefits.

    Conclusion

    This is just one small and personal example. Management and leadership will have to find ways of nurturing collaboration and cooperation beyond the boundaries of their control. The skillset and knowledge needed are not to be found in a corporate manual or in never ending in-house meetings & committees. Knowledge has to flow to grow. As such it flows both in and out of your organization. You’re delusional if you think you can stop that today, and it’s not the same as leaking corporate secrets. Hierarchies & management based on rank and pay grades are going to fail. And if those managers in higher pay grades can’t make the organization thrive in this ever more connected, faster moving world, they might not be worth that pay grade.

    I assure you that employees and consultants who live in the networked global community will quickly figure out if an organization can handle this. They will not and should not do their managers’ job. In fact they are already doing their managers a real big favor by working and operating the way they do. They are leading at their level, leveraging their networks and getting the job done. They take responsibility, solve problems creatively and get results. It just doesn’t fit easily into an obsolete model of neatly documented procedures in a centralized command and control structure. They don’t need a manager for that. They need one who will make it possible to thrive in that ultra-connected, ever changing, fast paced world. Facilitate, stimulate and reward learning and taking responsibility, not hierarchies. That way all people in your organization will lead or at least contribute to the best of their ability. You’ll need to trust them for that to work. If you don’t trust them, fine, but act upon it. Letting people you don’t trust work for and with you doesn’t work.

    How to do this is a challenge for managers & leaders. Not mine. I know when I’m out of my depth or when not to engage. The grand visions and the strategic play of a company are their responsibility. Getting results & moving forward will come from your perpetually learning and engaged workforce, if you don’t mess it up. And yes, that is your responsibility. Cultures are cultivated by definition. So if the culture of the company is to blame for things going south, realize you’re the ones who are supposed to make it a good one. People don’t leave organizations, they leave managers ;-) And to paraphrase the words of Walt Disney … you’re in a world of hurt if they leave you but stay at their desk and on the payroll. It’s called mediocrity, which also serves a purpose: providing commodities & cookie-cutter services whilst letting others shine. But if you want to be a thriving, highly skilled, expertise driven center of excellence … it’s going to take a lot of hard and sustained work, and it’s not a one way street.

    Live Migration over SMB Direct leaves more CPU cycles for Virtual RSS (vRSS) in Windows Server 2012 R2


    I recently (January 22nd 2014) gave a WebCast presentation for the Dutch Windows Management User Group (@WMUG_NL). In it I made the case for using SMB Direct with Live Migration to save CPU cycles for other (VM) workloads. There are several areas where those CPU cycles are better spent, but I used vRSS to showcase one scenario.

    We’re using a 2 node Windows Server 2012 R2 Hyper-V cluster on Dell PowerEdge R720 servers with Mellanox ConnectX-3 (CSV & live migration) and Intel X520-DA (Hyper-V switch), all 10Gbps.

    This is what a CPU bottleneck looks like that can be solved by using vRSS in Windows Server 2012 R2.

    The host machines have Hyper-Threading enabled. The virtual switch is attached to a switch independent NIC team in dynamic mode. In this setup it’s normal that the sending VM leverages both team members while the traffic for the receiving VM comes in over one member of the host team.

    Now let’s enable vRSS in the VM and see what this does for this picture.

    Pretty impressive, isn’t it? DidierTest03 is the sending VM running on host A, and DidierTest04 is the receiving VM that has vRSS enabled and is running on host B. For vRSS you need Windows Server 2012 R2 on the hosts and Windows Server 2012 R2 or Windows 8.1 in the VMs. You can see the load is spread across 7 vCPUs in the VM. DidierTest04 has 8 vCPUs. I configured vRSS in the VM to use 7 vCPUs and leave vCPU 0, the default one, alone to handle its other workloads.
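    The post doesn’t show how vRSS was restricted to 7 of the 8 vCPUs. The minimal sketch below assumes this was done inside the guest with the Set-NetAdapterRss cmdlet. It works out the vCPU range when vCPU 0 is kept free and builds a matching command line. The helper names and the adapter name “Ethernet” are hypothetical placeholders. Check the cmdlet’s parameters against your own system before running anything.

```python
# Hypothetical helper: pick the vCPUs that vRSS may use inside the guest
# when the first vCPU(s) are kept free, and build a Set-NetAdapterRss line.
# "Ethernet" is a placeholder adapter name; adjust it for your VM.


def vrss_processor_range(vcpu_count: int, reserved: int = 1) -> list[int]:
    """Return vCPU numbers available to RSS, skipping the first `reserved`."""
    if not 0 <= reserved < vcpu_count:
        raise ValueError("at least one vCPU must remain available for RSS")
    return list(range(reserved, vcpu_count))


def set_rss_command(adapter: str, vcpus: list[int]) -> str:
    """Build a PowerShell command that limits RSS to the given vCPU range."""
    return (
        f'Set-NetAdapterRss -Name "{adapter}" '
        f"-BaseProcessorNumber {vcpus[0]} "
        f"-MaxProcessorNumber {vcpus[-1]} "
        f"-MaxProcessors {len(vcpus)}"
    )


if __name__ == "__main__":
    cpus = vrss_processor_range(8)  # DidierTest04 has 8 vCPUs
    print(cpus)  # [1, 2, 3, 4, 5, 6, 7]
    print(set_rss_command("Ethernet", cpus))
```

    Keeping the base processor at 1 is what leaves vCPU 0 out of the RSS set. Raising `reserved` would keep more vCPUs free for other work at the cost of fewer receive queues.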


    Given multiple logical CPUs & vCPUs, we can get line speed with 10Gbps inside a virtual machine. This, ladies and gentlemen, is a thing of beauty.

    Now tell me, if you have business related needs for those CPU cycles, why would you not offload the work that needs to be done for live migration to the NIC via SMB Direct? This is about getting maximum VM density, performance & ROI from your infrastructure, whilst saving on servers, power and cooling. When you see the smile on your client’s or boss’s face, just say “you’re welcome” and smile back.

    Failed Live Migrations with Event ID 21502 Planned virtual machine creation failed for virtual machine ‘VM Name’: An existing connection was forcibly closed by the remote host. (0x80072746) Caused By Wrong Jumbo Frame Settings


    OK, so live migration fails and you get the following error in the System event log with event ID 21502:


    Planned virtual machine creation failed for virtual machine ‘DidierTest01’: An existing connection was forcibly closed by the remote host. (0x80072746). (Virtual Machine ID 41EF2DB-0C0A-12FE-25CB-C3330D937F27).

    Failed to receive data for a Virtual Machine migration: An existing connection was forcibly closed by the remote host. (0x80072746).

    There are some threads on the TechNet forums about this, like this one http://social.technet.microsoft.com/Forums/en-US/805466e8-f874-4851-953f-59cdbd4f3d9f/windows-2012-hyperv-live-migration-failed-with-an-existing-connection-was-forcibly-closed-by-the. There are also some blog posts pointing to TCP/IP Chimney settings as the cause, but those causes date back to the Windows Server 2003 / 2008 era.

    In the Hyper-V event log Microsoft-Windows-Hyper-V-VMMS-Admin you also see a series of entries related to the failed live migration that point to the same issue:

      
    Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
    Source:        Microsoft-Windows-Hyper-V-VMMS
    Date:          10/8/2013 10:06:15 AM
    Event ID:      20413
    Task Category: None
    Level:         Information
    Keywords:     
    User:          SYSTEM
    Computer:      SRV1.BLOG.COM
    Description:
    The Virtual Machine Management service initiated the live migration of virtual machine ‘DidierTest01’ to destination host ‘SRV2’ (VMID 41EF2DB-0C0A-12FE-25CB-C3330D937F27).
     
    Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
    Source:        Microsoft-Windows-Hyper-V-VMMS
    Date:          10/8/2013 10:06:26 AM
    Event ID:      22038
    Task Category: None
    Level:         Error
    Keywords:     
    User:          SYSTEM
    Computer:      SRV1.BLOG.COM
    Description:
    Failed to send data for a Virtual Machine migration: An existing connection was forcibly closed by the remote host. (0x80072746).
     
    Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
    Source:        Microsoft-Windows-Hyper-V-VMMS
    Date:          10/8/2013 10:06:26 AM
    Event ID:      21018
    Task Category: None
    Level:         Error
    Keywords:     
    User:          SYSTEM
    Computer:      SRV1.BLOG.COM
    Description:
    Planned virtual machine creation failed for virtual machine ‘DidierTest01’: An existing connection was forcibly closed by the remote host. (0x80072746). (Virtual Machine ID 41EF2DB-0C0A-12FE-25CB-C3330D937F27).
     
    Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
    Source:        Microsoft-Windows-Hyper-V-VMMS
    Date:          10/8/2013 10:06:26 AM
    Event ID:      22040
    Task Category: None
    Level:         Error
    Keywords:     
    User:          SYSTEM
    Computer:      SRV1.BLOG.COM
    Description:
    Failed to receive data for a Virtual Machine migration: An existing connection was forcibly closed by the remote host. (0x80072746).

    Log Name:      Microsoft-Windows-Hyper-V-VMMS-Admin
    Source:        Microsoft-Windows-Hyper-V-VMMS
    Date:          10/8/2013 10:06:26 AM
    Event ID:      21024
    Task Category: None
    Level:         Error
    Keywords:     
    User:          SYSTEM
    Computer:      srv1.blog.com
    Description:
    Virtual machine migration operation for ‘DidierTest01’ failed at migration source ‘SRV1’. (Virtual machine ID 41EF2DB-0C0A-12FE-25CB-C3330D937F27)

    This points to something wrong with the network. If everything checks out on your cluster & hosts, it’s time to look beyond them. As it turns out, the culprit was the Jumbo Frame setting on the CSV and LM NICs.

    Those servers had been connected to a couple of DELL Force10 S4810 switches. These can handle an MTU size of up to 12000, and that’s how they are configured. The Mellanox NICs allow MTU sizes of up to 9614 in their Jumbo Frame property. Now super-sized jumbo frames are all cool until you attach the network cables to another switch, like a PowerConnect 8132, that has a maximum MTU size of 9216. Frames larger than that don’t make it through. At that moment your network won’t do what it’s supposed to and you see errors like those above. If you test via an SMB share things seem OK, and standard pings don’t show the issue. But some ping tests with different packet sizes (-l) and the -f (do not fragment) switch will soon unmask the issue. Setting the Jumbo Frame size on the CSV & LM NICs to 9014 resolved the issue.
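    The small Python sketch below makes the size arithmetic behind those ping tests explicit. It rests on two assumptions. First, the NIC Jumbo Frame value includes the 14-byte Ethernet header, so 9014 means an IP MTU of 9000. Second, the switch MTU is the largest frame the switch forwards. Vendors differ on whether they count headers, VLAN tags or the FCS, so verify this for your own gear. Under those assumptions, the largest payload for `ping -f -l` is the IP MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header.

```python
# Sketch of the jumbo frame arithmetic for end-to-end DF ping tests.
# ASSUMPTIONS (vendor-specific, verify for your own gear):
#   * the NIC Jumbo Frame value includes the 14-byte Ethernet header,
#     so 9014 corresponds to an IP MTU of 9000;
#   * the switch MTU is the largest Ethernet frame it will forward.

ETH_HEADER = 14   # destination MAC + source MAC + EtherType
IPV4_HEADER = 20  # IPv4 header without options
ICMP_HEADER = 8   # ICMP echo header


def ip_mtu(nic_jumbo: int) -> int:
    """IP MTU implied by the NIC Jumbo Frame setting."""
    return nic_jumbo - ETH_HEADER


def max_ping_payload(nic_jumbo: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return ip_mtu(nic_jumbo) - IPV4_HEADER - ICMP_HEADER


def fits_switch(nic_jumbo: int, switch_max_frame: int) -> bool:
    """True if full-size frames from this NIC fit through the switch."""
    return nic_jumbo <= switch_max_frame


if __name__ == "__main__":
    for nic in (9614, 9014):
        payload = max_ping_payload(nic)
        print(
            f"NIC {nic}: fits PowerConnect 8132 (9216) = {fits_switch(nic, 9216)}; "
            f"test with: ping -f -l {payload} <peer>"
        )
```

    With the 9614 setting, a DF ping with a 9572 byte payload would be expected to time out once a 9216 switch is in the path. With 9014, an 8972 byte payload should get replies end to end.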

    Now, if everything matches up on the server side but not on the switches, you’ll also get an event ID 21502, but with a different error message:

    Event ID: 21502 The Virtual Machine Management Service failed to establish a connection for a Virtual machine migration with host XXXX. A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because connected host has failed to respond (0X8007274C)


    This is the same message you’ll get for a known cause of shared nothing live migration failures, as described in the Microsoft blog post “Shared Nothing Migration fails (0x8007274C)”.
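    Both error codes are HRESULTs that wrap Winsock errors. 0x2746 is 10054 (WSAECONNRESET, “An existing connection was forcibly closed by the remote host”) and 0x274C is 10060 (WSAETIMEDOUT, the connection attempt timing out). The first means a connection was reset after it had been set up, which fits large frames being dropped once the migration starts sending data. The second means the connection attempt itself got no response. The Python sketch below decodes them. Its lookup table is a minimal, illustrative one that only covers these two codes.

```python
# Sketch: decode the HRESULTs seen in the failed live migration events.
# The name table is a minimal, illustrative one covering only these codes.

FACILITY_WIN32 = 7

WINSOCK_NAMES = {
    10054: "WSAECONNRESET (connection reset by the remote host)",
    10060: "WSAETIMEDOUT (connection attempt timed out)",
}


def decode_hresult(hr: int) -> tuple[bool, int, int]:
    """Split an HRESULT into (is_failure, facility, code)."""
    is_failure = bool(hr & 0x80000000)  # severity bit
    facility = (hr >> 16) & 0x1FFF      # same mask as HRESULT_FACILITY
    code = hr & 0xFFFF                  # same mask as HRESULT_CODE
    return is_failure, facility, code


def describe(hr: int) -> str:
    """Human-readable summary of an HRESULT."""
    failed, facility, code = decode_hresult(hr)
    status = "failure" if failed else "success"
    if facility != FACILITY_WIN32:
        return f"0x{hr:08X}: {status}, facility {facility}, code {code}"
    name = WINSOCK_NAMES.get(code, "not in this table")
    return f"0x{hr:08X}: {status}, Win32 error {code} = {name}"


if __name__ == "__main__":
    for hr in (0x80072746, 0x8007274C):
        print(describe(hr))
```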

    So there you go. Keep an eye on those Jumbo Frame settings, especially in a mixed switch environment. Every switch has its own capabilities, rules & peculiarities. Make sure to test end to end and you’ll be just fine.

    I’m In Austin Texas For Dell World 2013


    This is the nighttime skyline of where I’m at right now: Austin, Texas, USA. That famous “Lone Star State” that until now I only knew from the movies & the media. Austin is an impressive city in an impressive state and, like most of my US experiences, isn’t comparable with anything in my home country, Belgium. That works both ways, naturally, and I’m lucky I get to travel a bit and see a small part of the world.

    Dell World 2013

    So why am I here? Well, I’m here to attend DELL World 2013, but you got that already.


    That’s nice Didier, but why DELL World? Well, several reasons. For one, I wanted to come and talk to as many product owners & managers, architects & strategists as I can. We’re seeing a lot of interest in the new capabilities that Windows Server 2012 (R2) brought to the Microsoft ecosystem. I want to provide all the feedback I can on what I, as a customer, a Microsoft MVP and a technologist, expect from DELL to help us make the most of those. I’m convinced DELL has everything we need but can use some guidance on what to add or enhance. It would be great to get our priorities and those of DELL aligned. From them I expect to hear their plans, ideas and opinions, and to see how those match up. Dell has a cost/value leadership position when it comes to servers right now. They have a great line-up of economy switches that pack a punch (PowerConnect) & some state of the art network gear with Force10. It would be nice to align these with guidance & capabilities to leverage SMB Direct and NVGRE network virtualization. Dell still has the chance to fill some gaps way better than others have. A decent Hyper-V network virtualization gateway that doesn’t cost your two firstborn children and can handle dozens to hundreds of virtual networks comes to mind. That, and real life configuration guidance on SMB Direct with DCB. Storage wise, the MD series, EqualLogic & Compellent arrays offer great value for money. But we need to discuss the needs & interest that SMB 3.0, Storage Spaces and RDMA have awoken, and how Dell is planning to address those. I also think that OEMs need to pick up the pace & change some of their priorities when it comes to providing answers to what their customers in the MSFT ecosystem ask for & need. Doing that can put them in a very good position versus their competitors. But I have no illusions about my place in & impact on the universe.

    Secondly, I was invited to come. As it turns out, DELL has the same keen interest in talking to people who are in the trenches using their equipment to build solutions that address real life needs in an economically feasible way. No, this is not “just” marketing. A smart vendor today communicates in many ways with existing & potential customers. Social media is a big part of that, but so is offline contact at conferences and events, both as contributor and as sponsor. Feedback on how that works & is received is valuable for both parties as well. They learn what works & what doesn’t, and we get the content we need. Now sure, you’ll have the corporate social media types who are bound by legal & marketing constraints, but the real value lies in engaging with your customers & partners about their real technological challenges & needs.

    Third is the fact that all these trends & capabilities in both the Microsoft ecosystem and the hardware world are not happening in isolation. They are happening in a world dominated by cloud computing in all its forms. This impacts everything from the clients and servers to the data centers, as well as the people involved. It’s a world in which we need to balance existing and future needs with a mixture of approaches, and where no one size fits all, even if the solutions come via commodity products & services. It’s a world where the hardware & software giants are entering each other’s turf. That’s bound to cause some sparks. Datacenter abstraction layer, Software Defined “anything” (storage, networking, …), converged infrastructure. Will they collaborate or fight?

    So put these three together and here I am. My agenda is full of meetings, think tanks, panels, briefings and some down time to chat to colleagues & DELL employees alike.

    Why & How?

    Some time ago I was asked why I do this and how I’m even able to do this. It takes time, money and effort. Am I some kind of hot shot manager or visionary guru? No, not at all. Believe me, there’s nothing “hot” about working on a business-down issue at zero dark thirty. I’m a technologist. I’m where the buck stops. I need things to work. So I deal in realities, not fantasies. I don’t sell methods, processes or services. I sell results, and that’s what pays my bills long term. But I do dream, and I try to turn those dreams into realities. That’s different from a fantasy world where reality is an unwelcome guest. I’m no visionary, I’m no guru. I’m a hard working IT Pro (hence the blog title and Twitter handle) who realizes all too well he’s standing on the shoulders of not just giants but of all those people who create the ecosystem in which I work. But there’s more. Being a mere technologist only gets you so far. I also give architectural & strategic advice, as that’s also needed to make the correct decisions. Solutions don’t exist in isolation, and decisions about them need to be made in relation to trends, strategies and needs. That takes insight & vision. Those don’t come to you by only working in the data center, at your desktop or in eternal meetings with the same people in the same place. My peers, employers and clients actively support this for the benefit of our business, customers, networks & communities. They are the who behind this, and they give me the opportunities to learn & grow both personally & professionally. People like Arlindo Alves and many others at MSFT, my fellow MVPs (Aidan Finn, Hans Vredevoort, Carsten Rachfahl, …), Florian Klaffenbach & Peter Tsai. As a person you have to grab those opportunities. If you want to be heard, you need to communicate. People listen, and if the discussions and subjects are interesting it becomes a two way conversation and a great learning experience.

    As with all networking and community endeavors, you need to put in the effort to reap the rewards in the form of information, insights and knowledge you can leverage for both your own needs and those of your network. That means speaking your mind. Being honest and open, even if at times you’re wrong. That’s learning. That, to me, is what being in the DELL TechCenter Rock Star program is all about.

    Learning, growing, sharing. That and a sustained effort in your own development slowly but surely make you an “expert”. An expert who realizes all too well how much he doesn’t know & cannot possibly learn. Luckily, to help deal with that fact, you can turn to the community.