NVM Express over Fabrics


Any technologist who’s read, let alone used NVM Express (NVMe), is pretty enthusiastic about it’s capabilities and if it was not for availability and financial restrictions we’d all have at least a couple in our home systems and labs. It seems to succeed very well in making sure the host can keep up with the performance (low latencies, high throughput)) delivered by SSD drives than our current interfaces.

This means that many are very happy with future visions on how PCIe will dislodge SAS/SATA as the preferred SSD interface. This might seem feasible for local storage right now but how to deal with this in an actual storage array, what if we want to size this to a larger scale? There are no “PCIe JBODS”. So what does one do? Well, how did we do it in the past with FC? We created a fabric. Below we see several local & remote NVMe architectures even hybrid ones with traditional SAS.

image

That’s exactly what NVM Express Inc. is doing, creating the specs for a fabric. This holds the promise to achieve superior results due to the elimination of SCSI translation which reduces latency significantly by delivering NVMe end to end. Not only that but we also see the following efforts in the NVM Express Specification 1.2 to give it enterprise grade capabilities beyond pure performance.

  • Enhanced status reporting
  • Expanded capabilities including live firmware updates

There have been some early demos of NVMe over Fabrics mainly focusing on the “remote” performance. While local NVMe SSDs have the edge on absolute IOPS the difference with NVMe over a fabric is not significant. The reduction in latency is measured in < 10 µs,so that’s good news. The fabric leverages RDMA (yes, yet another reason that my time spending with this technology has been a useful investment). This can be Infiniband, RoCE or iWarp. There’s also the new kid on the block “Intel Omni Scale”  (even if their early demo used iWARP). There’s also a Mellanox RoCE demo.

image

Now with NVMe SSD disk speeds it seems that the writing is on the wall that ever better fabric performance will be needed to support the tremendous throughput this evolution of storage can deliver. RDMA seems poised for success in regards to this. Now, yes, strictly speaking the NVMe traffic does not require RDMA but let’s just say I don’t see anyone building it without. I also think this means even iWarp fabrics will use DCB (PFC) to make sure we have a lossless network. The amount of traffics will be immense and why not optimize for the best possible performance? I hold the opinion this is beneficial for east-west traffic today in larger environments, especially when in converged networks. Unless the Intel® Omni-Path Architecture blows everyone else away that is Smile. Too early to tell.

Now does this dictate the total and absolute obsolescence of iSCSI and FC? No. There is no reason why a NVMe Fabrics storage solution cannot offer storage to hosts via FC, iSCSI, SMB 3, NFS, FCoE, … They, potentially could even offer iWarp, RoCE or Infiniband to the hosts so you won’t lose your prior investments or get locked into one. I have no magic ball so I cannot tell you if this will happen. What I do now that when it comes to MPIO versus multichannel for load balancing and even failover and recovery, multichannel sometimes does a (far?) superior job in my honest opinion especially with older hypervisors even when the hypervisor uses separate sessions per virtual machine to achieve better load balancing over iSCSI or the like. Anyway, I digress. One thing I do know is that I’ll keep a keen eye on what Microsoft is doing in this space, especially in regards to Windows Server 2016. It’s time to up the level on scalability & support for newer technologies once again.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

w

Connecting to %s