Thanks to some great people at Dell in Germany (yes, you Florian), Belgium and of course Alison Krause (@AlisonatDell), Maryna Frolova (@MarineroF) and Stephanie Woodstrom I got invited to attend the “Fluid Forward Think Tank” at the Dell Storage Forum in Paris.
We had a healthy mix of customers, partners, consultants and DELL employees discussing various aspects of IT related to storage. The task of herding the cats fell upon the shoulders of Simon Robinson (@simonrob451), an analyst and VP at 451 Research, a firm that deals with storage and information management. I for one think he did so brilliantly. This interactive discussion was streamed live, and if you missed it you can click on this live stream link to look at our ramblings.
I had to pitch some of my dreams of leveraging all the new mobility features, as well as the high to continuous availability that Windows Server 2012 Hyper-V enables on inherently unreliable components, and the opportunities these present to us customers and to storage vendors.
It was a fun, educational discussion, as the mix of backgrounds, industries and job functions was diverse enough to address all sides of the storage story: the good, the bad and the ugly. We gave them some food for thought, I think. The folks at DELL can now take this back to Austin and reflect on it all. If need be, I’ll drop by some day to provide some feedback, and remember @WarrenByle, I’d still like to try out that STI of his. After an interview I ran off to a Compellent customer panel to learn something and provide some feedback on our first experiences.
As you might well know, I’m in the process of doing a multi-site SAN replacement project to modernize the infrastructure at an undisclosed organization. The purpose is to have a modern, feature-rich, reliable and affordable storage solution that can provide the Windows Server 2012 rollout with modern features (ODX, SMI-S, …).
One of the nifty things you can do with a Compellent SAN is migrate LUNs from the old SAN to the Compellent SAN with absolutely minimal downtime. For us this has proven a really good way of migrating away from two HP EVA 8000 SANs to our new DELL Compellent environment. We use it to migrate file servers, Exchange 2010 DAG member servers (zero downtime), Hyper-V clusters, SQL Servers, etc. It’s nothing less than a hidden gem not enough people are aware of, and it comes with the SAN. I was told by some that it was hard and not worth the effort … well, clearly they never used it and as such don’t know it. Or they work for competitors and want to keep this hidden.
You have to set up the zoning on all SANs involved to all fabrics. This needs to be done right of course, but I won’t be discussing it here. I want to focus on the process of what you can do. This is not a comprehensive how-to. It depends on your environment and I can’t write you a migration manual without digging into that. And I can’t do that for free anyway; I need to eat and pay bills as well.
Basically you add your target Compellent SAN as a host to your legacy SAN (in our case HP EVA 8000) with an operating system type of “Unknown”. This will provide us with a path to expose EVA LUNs to our Compellent SAN.
Depending on what server LUNs you are migrating this is when you might have some short downtime for that LUN. If you have shared nothing storage like in an Exchange 2010 or a SQL Server 2012 DAG you can do this without any downtime at all.
Stop any IO to the LUN if you can (suspend copies, shut down databases and virtual machines) and take CSVs or disks offline. Do whatever is needed to prevent any application and data issues; this varies.
We then unpresent the server’s LUN on the legacy SAN.
After a rescan of the disks on the server you’ll see that disk/LUN disappear.
This same LUN we then present to the Compellent host we added above.
We then “Scan for Disks” in the Compellent Controller GUI. This will detect the LUN as an unassigned disk. That unassigned disk can be mapped to an “External Device” which we name after the LUN to keep things clear (“Classify Disk as External Device” in the picture below).
Then we right click that External Device and choose to “Restore Volume from External Device”.
This kicks off replication from the EVA LUN mapped to the Compellent target LUN. We can now map that replica to the host as you can see in this picture.
After this rescan the disks on the server and voila, the server sees the LUN again. Bring the disk/CSV back online and you’re good to go.
All the downtime you’ll have is at a well-defined moment in time that you choose. You can do this one LUN at a time or multiple LUNs at once. Just don’t overdo it with the number of concurrent migrations, and keep an eye on the CPU usage of your controllers.
After the replication has completed the Compellent SAN will transparently map the destination LUN to the server and remove the mapping for the replica.
The next step is that the mirror is reversed. That means that while this replica exists the data written to the Compellent LUN is also mirrored to the old SAN LUN until you break the mirror.
Once you decide you’re done replicating and don’t want to keep both LUNs in sync anymore, you break the mirror.
You delete the remaining replica disk and you release the external disk.
Now you unpresent the LUN from the Compellent host on your old SAN.
After a rescan your disks will be shown as down in unassigned disks and you can delete them there. This completes the clean up after a LUN migration.
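For our own runbooks I condensed the procedure above into an ordered checklist. Here is a minimal sketch of that ordering in Python; the step names are my own shorthand, not Compellent or EVA terminology:

```python
# Ordered checklist for a legacy-SAN-to-Compellent LUN migration,
# condensed from the steps described above. Purely illustrative.
MIGRATION_STEPS = [
    "quiesce IO / take CSV or disk offline",
    "unpresent LUN from server on legacy SAN",
    "rescan disks on server (LUN disappears)",
    "present LUN to the Compellent 'host' on the legacy SAN",
    "scan for disks on Compellent; classify as External Device",
    "restore volume from External Device (starts replication)",
    "map replica to the server",
    "rescan disks on server; bring disk/CSV online",
    "wait for replication to complete (destination LUN mapped)",
    "break the mirror",
    "delete replica disk and release external disk",
    "unpresent LUN from Compellent host on legacy SAN",
    "rescan and delete leftover unassigned disks",
]

def next_step(done_count):
    """Return the next migration step, or None when finished."""
    if done_count >= len(MIGRATION_STEPS):
        return None
    return MIGRATION_STEPS[done_count]
```

Nothing magic here; the value is simply that the order matters, and writing it down keeps a multi-LUN migration night from going sideways.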
When set up properly it works very well. Sure it takes some experimenting to deal with some intricacies, but once you figure all that out you’re good to go and are ready to deal with any hiccups that might occur. The main take away is that this provides for minimal downtime at a moment that you choose. You get this out of the box with your Compellent. That’s a pretty good deal I say!
So as you can see this particular environment will be ready for Windows Server 2012 & Hyper-V. Life is good!
At the end of this day I was doing some basic IO tests on some LUNs on one of the new Compellent SANs. It’s amazing what 10 SSDs can achieve … We can still beat them in certain scenarios but it takes 15 times more disks. But that’s not what this blog is about. This is about goofing off after 20:00 following another long day in another very long week, it’s about kicking the tires of Windows and the SAN now that we can.
For fun I created a 300TB LUN on a DELL Compellent, thin provisioned of course, as I only have 250TB.
I then mounted it to a Windows 2008 R2 test server.
The documented limit of a volume in Windows 2008 R2 is 256TB when you use a 64K allocation unit size. So I tested this limit by trying to format the entire LUN and create a 300TB simple volume. I brought it online, initialized it as a GPT disk and created a simple volume with an allocation unit size of 64K, and that failed with the following error:
There is nothing unexpected about this. It has to do with the maximum NTFS volume size supported on a GPT disk, which depends on the cluster size selected at the time of formatting. NTFS is currently limited to 2^32-1 allocation units. This yields a 256TB volume using 64K clusters. However, this has only been tested to 16TB, or 17,592,186,040,320 bytes, using a 4K cluster size. You can read up on this in Frequently asked questions about the GUID Partitioning Table disk architecture. The table below shows the NTFS limits based on cluster size.
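Those limits follow straight from the 2^32-1 allocation-unit cap, so you can sanity-check the numbers yourself:

```python
# NTFS maximum volume size = (2**32 - 1) clusters * cluster size.
MAX_CLUSTERS = 2**32 - 1

def ntfs_max_volume_bytes(cluster_size):
    """Maximum NTFS volume size in bytes for a given cluster size."""
    return MAX_CLUSTERS * cluster_size

# 4K clusters: exactly the tested limit quoted above.
assert ntfs_max_volume_bytes(4096) == 17_592_186_040_320  # ~16TB

# 64K clusters: just shy of 256TB, which is why a 300TB simple
# volume cannot be formatted in one go.
tib = ntfs_max_volume_bytes(64 * 1024) / 2**40
print(f"64K clusters -> {tib:.2f} TiB")  # ~256.00 TiB
```

The same arithmetic explains the error in the screenshot: 300TB is simply more clusters than NTFS can address at any supported cluster size below 128K.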
This was the first time I had the opportunity to test these limits. I formatted part of that LUN to a size close to the limit and then formatted the remainder as a second simple volume.
I still need to get a Windows Server 2012 test server hooked up to the SAN to see if anything has changed there. One thing is for sure: you could put at least three 64TB VHDX files on a single volume in Windows. Not too shabby. It’s more than enough to put just about any backup software into trouble. Be warned, MSFT tested and guarantees performance and behavior up to 64TB in Windows Server 2012, but beyond that you’d better do your own due diligence.
The next thing I’ll do when I have a Windows Server 2012 host hooked up is create a 64TB VHDX file and see how far beyond it I can go before things break. Why? Well, because I can, and because I want to take the new SAN and Windows Server 2012 for a ride to see what boundaries we can push. The SANs are just being set up, so now is the time to do some testing.
Let’s say you are very happy with your SAN. You just love the snapshots, the thin provisioning, deduplication, automatic storage tiering, replication, ODX and the SMI-S support. Life is good! But you have one annoying issue. For example: to get the really crazy IOPS for your SQL Server 2012 DAG nodes, you would have to buy 72 SSDs to add to your tier 1 storage in that SAN. That’s a lot of money if you know the price range of those. But perhaps you don’t even have SSDs in your SAN. To get the required amount of IOPS from your SAN with SAS or NL-SAS disks in the second and third storage tiers respectively, you would need to buy a ridiculous number of disks and, let’s face it, waste that capacity. Per IOPS that becomes a very expensive and unrealistic option.
Some SSD-only SAN vendors will happily sell you a SAN that addresses the high IOPS need to help out with that problem. After all, that is their niche, their unique selling point: fixing the IOPS bottlenecks of the big storage vendors where and when needed. This is a cheaper solution per IOPS than your standard SAN can deliver, but it’s still a lot of money, especially if you need more than a couple of terabytes of storage. Granted, they might give you some of the extra SAN functionality you are used to, but you might not need that.
Yes, I know there are people who say that when you have such needs you also have the matching budgets. Maybe, but what if you don’t? Or what if you do, but you could put 500.000 € towards another need or goal? Your competitive advantage for pricing your products and winning customers might come from that budget.
Creative Thinking or Nuts?
Let’s see if we can come up with a home-grown solution based on Windows Server 2012 Hyper-V. If we can, this might solve your business need, save a ton of money and extend (or even save) the usefulness of your SAN in your environment. The latter is possible because you’ve successfully eliminated the biggest disk IO from your SAN.
The Solution Scenario
So let’s build 3 Hyper-V hosts, non-clustered, each with its own local SAS-based storage with commodity SSD drives. You can use either storage pools/spaces with a non-RAID SAS HBA, or a RAID SAS HBA with controller-based virtual disks for this. If you’ve seen what Microsoft achieved with this during demos, you know you can easily get to hundreds of thousands of IOPS. Let’s say you achieve half of what MSFT did in both IOPS and latency, and put a number on it => that’s about 500.000 IOPS and 5GB/s. Now reduce that for the overhead of virtualization, the position of the moon and the fact things turn out a bit less than expected. So let’s settle for 250.000 IOPS and 2.5GB/s. Anybody here who knows what those kinds of numbers would cost you with the big storage vendors’ SANs? Right, case closed. Don’t just look at the cost; put it into context and look at the value here. What does and can your SAN do, and at what cost?
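That back-of-the-envelope math is just repeated halving; a tiny helper makes the pessimism factor explicit. The starting numbers are the assumptions from the paragraph above, not measurements:

```python
def derate(iops, gbps, factor=0.5):
    """Apply a pessimism factor to a raw IOPS / throughput estimate."""
    return iops * factor, gbps * factor

# Half of what MSFT demoed, as assumed above:
iops, gbps = 500_000, 5.0

# Halve again for virtualization overhead, the position of the moon, etc.
iops, gbps = derate(iops, gbps)
print(iops, gbps)  # 250000.0 2.5
```

Adjust the factor to taste for your own hardware; the point is to size against the derated number, never the vendor demo number.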
OK, we lose some performance due to the virtualization overhead. But let’s face it: we can use SR-IOV to get the very best network performance. We have hundreds of thousands of IOPS. All the cores on the host are dedicated to a single virtual machine running a SQL Server DAG node, and bar 4GB of RAM for the OS we can give all the RAM in the host to the VM. This one-VM-to-one-host mapping delivers a tremendous amount of CPU, memory, network and storage capability to your SQL Server, because it gets exclusive use of the resources on the host, bar those the host requires to function properly.
In this scenario it is the DAG that provides high availability to the SQL Server database, so we do not mind losing shared storage here.
Because we have virtualized the SQL Server, you can leverage Shared Nothing Live Migration to move the virtual machines with SQL Server to the central storage of the SAN without downtime if the horsepower is no longer needed. That means you might migrate another application to those standalone Hyper-V hosts. That could be a disk-IO-intensive application that is perhaps load balanced in some way, so you can have multiple virtual machines mapped to the hosts (one to one, many to one). You could even automate all this and use the “Beast” as a dynamic resource based on temporal demands.
In the case of the SQL Server DAG you might opt to keep one DAG member on the SAN so it can be replicated and backed up via snapshot or whatever technology you are leveraging on that storage.
Extend to Other Use Cases
More scenarios are possible. You could build such a beast to be a Scale Out File Server or PCI RAID/Shared SAS if you need shared storage to build a Hyper-V cluster when your apps require it for high availability.
The latter looks a lot like a cluster in a box actually. I don’t think we’ll see a lot of iSCSI in cluster-in-a-box scenarios; SAS might be the big winner here up to 4 nodes (without a “SAS switch”, which brings even “bigger” scenarios to life with zoning, high availability, active cables and up to 24Gbps of bandwidth per port).
Using a SOFS means that if you also use SMB 3.0 support with your central SAN you can leverage RDMA for shared nothing live migration, which could help out with potentially very large VHDs of your virtual SQL Servers.
Please note that the big game changer here compared to previous versions of Windows is Shared Nothing Live Migration. It means that now you have virtual machine mobility. High performance storage and the right connectivity (10Gbps, teaming, possibly RDMA if using SMB 3.0 as source and target storage) mean we no longer mind that much having separate storage silos. This opens up new possibilities for alleviating IOPS issues. Just size this example to your scenarios and needs to see what it can do for you.
Disclaimer: This is whiteboard thinking & design, not a formal solution. But I cannot ignore the potential and possibilities. And for the critics: no, this doesn’t mean I’m saying modern SANs don’t have a place anymore. Far from it, they solve a lot of needs in a lot of scenarios, but they do have some (very expensive) pain points.
First some stats: 36 pallets of hardware handled over a period of 10 days, 29 of those over a period of 3 days. Most of it didn’t even exist at the beginning of the month; it was just an order. But DELL is a logistical force to be reckoned with. “Easy as DELL” is a reality; the speed at which they respond to requests and orders is amazing. For quality/price balance, service, logistics, speed and support, it’s hard to beat them.
A lot of people are used to dealing with slower processes and think SANs take at least 2 to 3 months to be delivered after ordering. This means they are caught off guard by this. I’m happy to say I’m not, otherwise the data center would have been blocked by a tsunami of packaging material and hardware.
We’ve been busy unloading, unpacking, racking and partially cabling the new hardware coming in for a multi-site SAN project. And let’s not forget the labeling. While we are far from finished, this is good news. We’re finally busy working on the installation after the long, time-consuming process of procuring the equipment. That’s never an easy process, let alone a fast one. But I digress.
What are we working with?
Dell Compellent SANs (intra and inter site data protection / redundancy)
PowerVault MD3600 & MD1200 storage units for disk to disk backup capacity
Now to go from this
to this and beyond …
Takes quite a while, as you can imagine, and we still have a ton of stuff to do. I’ll be sharing my experiences and findings via this blog when I can.
My high-level design focuses on scale-out to achieve performance, flexibility and resiliency. We’ll build a modular scale-up and scale-out solution using commodity hardware, not a mega redundant, ultra scalable, single and very expensive storage solution. You can read more on my views about this subject in Some Thoughts Buying State Of The Art Storage Solutions Anno 2012. For the backup we are following the same approach. We cannot afford to pay the amounts of money that seem to be needed to buy high-end backup appliances. We have plans to leverage Windows Server 2012 to help us achieve this, but those are subjects for some other blog posts later.
I’m very excited about the TRIM/UNMAP support in Windows Server 2012 & Hyper-V with the VHDX file. Thin provisioning is a great technology, but there is more to it than just proactive provisioning ahead of time. It also needs a way to make sure storage allocation stays thin by reclaiming freed-up space from a LUN. Until now this required either the use of sdelete on Windows or dd for the Linux crowd, or some disk defrag product like Raxco’s PerfectDisk. It’s interesting to note here that sdelete relies on the defrag APIs in Windows, so you can see how a defragmentation tool can pull off the same stunt. Take a look at Zero-fill Free Space and Thin-Provisioned Disks & Thin-Provisioned Environments for more information on this. Sometimes an agent is provided by the SAN vendor that takes care of this for you (Compellent), and I think NetApp even has plans to support it via a future ONTAP PowerShell toolkit for NTFS partitions inside the VHD (https://communities.netapp.com/community/netapp-blogs/msenviro/blog/2011/09/22/getting-ready-for-windows-server-8-part-i). Some cluster file system vendors like Veritas (Symantec) also offer this functionality.
A common “issue” people have with sdelete or the like is that it is rather slow, rather resource intensive and not automated, unless you have scheduled tasks running on all your hosts to take care of that. Sdelete has another issue: it can’t handle mount points. A trick is to use the now somewhat ancient SUBST command to assign a drive letter to the path of the mount point so you can use sdelete. Another trick is to script it yourself. Mind you, you can’t just create a big file in a script and delete it. That’s the same as deleting “normal” data and won’t do a thing for thin provisioning space reclamation. You really have to zero the space out. See A PowerShell Alternative to SDelete for more information on this. That script also deals with another annoying thing about sdelete: it doesn’t leave any free space, thereby potentially endangering your operations or at least setting off all the alarms in your monitoring tools. With a home-grown script you can force a free percentage to remain untouched.
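For reference, the zero-filling such a script has to do boils down to something like this minimal Python sketch. This is my own illustrative code, not the PowerShell script linked above; the file name and the keep_free_pct safeguard are my inventions:

```python
import os
import shutil

def reclaim_thin_space(mount_point, keep_free_pct=10, chunk_mb=64, max_mb=None):
    """Write zeros into free space so a thin-provisioned SAN can reclaim
    the blocks, while leaving keep_free_pct of the volume untouched."""
    zero_file = os.path.join(mount_point, "zerofill.tmp")
    chunk = b"\x00" * (chunk_mb * 1024 * 1024)
    reserve = shutil.disk_usage(mount_point).total * keep_free_pct // 100
    written = 0
    try:
        with open(zero_file, "wb") as f:
            while shutil.disk_usage(mount_point).free - len(chunk) > reserve:
                if max_mb is not None and written >= max_mb * 1024 * 1024:
                    break  # optional cap, handy for testing
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())  # force the zeros onto the volume
                written += len(chunk)
    finally:
        # Deleting the file matters: the SAN has already seen the zeroed
        # blocks, and the OS gets its free space back.
        if os.path.exists(zero_file):
            os.remove(zero_file)
    return written
```

Note that this only helps on arrays that detect zeroed blocks; it is exactly the kind of busywork the VHDX UNMAP support discussed below the break makes unnecessary.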
With Windows Server 2012 and Hyper-V VHDX we get what is described in the documentation as “efficiency in representing data (also known as ‘trim’), which results in smaller file size and allows the underlying physical storage device to reclaim unused space. (Trim requires physical disks directly attached to a virtual machine or SCSI disks in the VM, and trim-compatible hardware.)” It also requires Windows Server 2012 on hosts & guests.
I was confused as to whether VHDX supports TRIM or UNMAP. TRIM is the specification for this functionality by Technical Committee T13, which handles all standards for ATA interfaces. UNMAP is the Technical Committee T10 specification for this and is the full equivalent of TRIM, but for SCSI disks. UNMAP is used to remove physical blocks from the storage allocation in thinly provisioned Storage Area Networks. My understanding is that what is used on the physical storage depends on what that storage is (SSD/SAS/SATA/NL-SAS or a SAN with one or all of the above), and for a VHDX it’s UNMAP (the SCSI standard).
Basically, VHDX disks report themselves as being “thin provision capable”. That means that any deletes as well as defrag operations in the guests will send down “unmaps” to the VHDX file, which ensures that block allocations within the VHDX file are freed up for subsequent allocations, and the same requests are forwarded to the physical hardware, which can reuse them for its thin provisioning purposes. Also see http://msdn.microsoft.com/en-us/library/hh848053(v=vs.85).aspx
So unmap makes its way down the stack from the guest Windows Server 2012 operating system, through the VHDX and the hypervisor, to the storage array. This means that a VHDX will only consume storage for data that is really stored, not for the entire size of the VHDX, even when it is a fixed one. You can see that not just the operating system but also the application/hypervisor that owns the file system on which the VHDX lives needs to be TRIM/UNMAP aware to pull this off.
The good news here is that there is no more sdelete to run, no scripts to write and no agents to install. It happens “automagically”, and as ease of use is very important, I for one welcome this! By the way, some SANs also provide the means to shrink LUNs, which can be useful if the space actually used by a volume is much lower than what is visible/available in Windows and you don’t want people to think you’re wasting space, or that all that extra space is freely available to them.
To conclude I’ll be looking forward to playing around with this and I hope to blog on our experiences with this later in the year. Until Windows Server 2012 & VHDX specifications are RTM and fully public we are working on some assumptions. If you want to read up on the VHDX format you can download the specs here. It looks pretty feature complete.
I’ve been looking into storage intensively for some time. At first it was reconnaissance. You know, just looking at what exists in software & hardware solutions. At that phase it was purely about functionality, as we found our current, more traditional SANs a dead end.
After that there was the evaluation of reliability, performance and support. We went hunting for both satisfied and unsatisfied customers, experiences etc. We also considered whether a pure software SAN on commodity hardware would do for us, or whether we still need specialized hardware, or at least the combination of specialized software on vendor-certified and supported commodity hardware. Yes, even if you have been doing things a certain way for a long time and been successful with it, it pays to step back and evaluate whether there are better ways of doing it. This prevents tunnel vision and creates awareness of what’s out there that you might have missed.
Then came the job of throwing out vendors who we thought couldn’t deliver what was needed and/or who have solutions that are great but just too expensive. After that came the ones whose culture and reputation were not suited for or compatible with our needs & culture. So that big list became (sort of) a long list, which eventually became a really short list.
There is a lot of reading, thinking, listening and discussing done during these phases, but I’m having fun, as I like looking at this gear and dreaming of what we could do with it. But there are some things in the storage world that I found really annoying and odd.
Scaling Up or Scaling Out with High Availability On My Mind
All vendors, even the better ones in our humble opinion, have their strong and weak points. Otherwise they would not all exist. You’ll need to figure out which ones are a good or the best fit for your needs. So when a vendor writes or tells me that his product X is way above the others, and that a competitor’s product Z only competes with the lower-end Y in his portfolio, I cringe. Storage is not that simple. On the other hand, they sometimes overcomplicate straightforward functionality or operational needs if they don’t have a great solution for it. Some people in storage really have gotten trivializing the important and complicating the obvious down to an art. No brownie points for them!
One thing is for sure, when working on scalability AND high availability things become rather expensive. It’s a bit like the server world. Scale up versus scale out. Scaling up alone will not do for high availability except at very high costs. Then you have the scalability issue. There is only so much you can get out of one system and the last 20% become very expensive.
So, I admit, I’m scale-out inclined. For one, you can fail over to multiple less expensive systems, and if you have an “N+1” scalability model you can cope with the load even when losing a node. On top of that, you can and will use this functionality in your normal operations. That means you know how it works and that it will work during a crisis. Work and train in the same manner as you will when the shit hits the fan. It’s the only way you’ll really be able to cope with a crisis. Remember, chances are you won’t excel in a crisis but will fall back to your lowest mastered skill set.
Oh by the way, if you do happen to operate a nuclear power plant or such please feel free to work both fronts for both scalability & reliability and then add some extra layers. Thanks!
Expensive Scale Up Solutions On Yesterday’s Hardware?
I cannot understand what keeps the storage boys back so long when it comes to exploiting modern processing power. Until recently they all still lived in the 32-bit world, running on hardware I wouldn’t give to the office temp. Now I’d be fine with that if the prices reflected it. But that isn’t the case.
Why did (does) it take them so long to move to x64? That’s been our standard server build since Windows 2003, for crying out loud, and our clients have been x64 since the Vista rollout in 2007. It’s 2012, people. Yes, that’s the second decade of the 21st century.
What is holding the vendors back from using more cores? Realistically, if you look at what’s available today, it is painful to see vendors touting dual quad-core controllers (finally, and with their software running x64) as their most recent achievement. Really, dual quad-core, anno 2012? Should I be impressed?
What’s this magic limit of 2 controllers with so many vendors? Did they hard code a 2 in the controller software and lost the source code of that module?
On the other hand what’s the obsession with 4 or more controllers? We’re not all giant cloud providers and please note my ideas on scale out versus scale up earlier.
Why are some spending time and money on ASIC development for controllers? You can have a commodity motherboard with four sockets and 8, 10 or 12 cores per socket. Just buy them AND use them. Even the ones using commodity hardware (which is the way to go long term due to the fast pace and costs) don’t show that much love for lots of cores. It seems cheap and easy when you need a processor or motherboard upgrade. It’s not some small or miniature device where standard form factors won’t work. What is wrong in your controller software that you all seem to be so slow in going that route? You all talk about how advanced, high-tech and future-driven the storage industry is; well, prove it. Use the 16 to 32 cores you can easily have today. Why? Because you can use the processing power, and also because I promise you all one thing: that state-of-the-art, newly released SAN of today is the old, obsolete junk we’ll think about replacing in four years’ time, so we might not be inclined to spend a fortune on it. Especially not when I have to do a forklift upgrade. Been there, done that, and I’d rather not do it again. Which brings us to the next point.
Flexibility, Modularity & Support
If you want to be thrown out of the building you just need to show even the slightest form of forklift upgrade for large or complex SAN environments. Don’t even think about selling me very expensive highly scalable SANs with overrated and bureaucratic support. You know the kind where the response time in a crisis is 1/10 of that of when an ordinary disk fails.
Flexibility & Modularity
Large and complex storage solutions that cost a fortune and need to be ripped out completely, and/or where upgrades over their lifetime are impossible or cost me an arm and a leg, are a no-go. I need to be able to enhance the solution where it is needed, and I must be able to do so without spending vast amounts of money on a system I’ll need to rip out within 18 months. I want more of a perpetual, modular upgrade model where over the years you can enhance and keep using what is still usable.
If that’s not possible and I don’t have too large or complex storage needs, I’d rather buy a cheap but functional SAN. Sure, it doesn’t scale as well, but at least I can throw it out for a newer one after 3 to 4 years. That means I can replace it long before I hit the scalability bottleneck, because it wasn’t that expensive. Or if I do hit that limit, I’ll just buy another cheap one and add it to the mix to distribute the load. Sure, that takes some effort, but in the end I’m better and cheaper off than with expensive, complex, highly scalable solutions.
To be brutally honest, some vendors read their own sales brochures too much and drank the Kool-Aid. They think their support processes are second to none and the best in the business. If they really believe that, they need to get out into the field and open up their eyes. If they just act like they mean it, they’ll soon find out when the money starts talking. It won’t talk to them.
Really, some of you have support processes that are only excellent and easy in your dreams. I’ll paraphrase a recent remark on this subject about a big vendor: “If vendor X’s support quality and level of responsiveness were only 10% of the quality of their hardware, buying from them would be a no-brainer”. Indeed, and as it stands it’s a risk factor or even a show stopper.
Look, all systems will fail sooner or later. They will. End of story. Sure, you might be lucky and never have an issue, but that’s just that. We need to design and build for failure. A contract with promises is great for the lawyers. Those things, combined with the law, are their weapons on their battlefield. An SLA is great for managers & the business. These are the tools they need for due diligence, to check it off on the list of things to do. It’s CYA to a degree, but that is a real aspect of doing business and being a manager. Fair enough. But for us, the guys and gals of ICT who are the boots on the ground, we need rock solid, easily accessible and fast support. Stuff fails; we design for that, we build for that. We don’t buy promises. We buy results. We don’t want bureaucratic support processes. I’ve seen some where the overhead is worse than the defect and the only aim is to close calls as fast as possible. We want a hot line and an activation code to bring down the best support we can, as fast as we can, when we need it. That’s what we are willing to pay real good money for. We don’t like a company that sends out evaluation forms after we replaced a failed disk, just to get a good score. Not when that company fails to appropriately interpret a failure that brings the business down and ignores signals from the customer that things are not right. Customers don’t forget that, trust me on this one.
And before you think I’m arrogant: I fail as well. I make mistakes, I get sick, etc. That’s why we have colleagues and partners. Perfection is not of this world. So how do I cope with this? The same way as when we design an IT solution: acknowledge that fact and work around it. Failure is not an option, people, it’s pretty much a certainty. That’s why we make backups of data and why we have backups for people. Shit happens.
The Goon Squad Versus Brothers In Arms
As a customer I never ever want to have to worry about where your interests are. So we pick our partners with care. Don’t be the guy that acts like a gangster in the racketeering business. You know they all dress pseudo upscale to hide the fact they’re crooks. We’re friends, we’re partners. Yeah sure, we’ll do grand things together but I need to lay down the money for their preferredsolution that seems to be the same whatever the situation and environment.
Some sales guys can be really nice guys. Exactly how nice tends to depend on the size of your pockets. More specifically, the depth of your pockets and how well they are lined with gold coin is what matters here. One tip: don't be like that. Look, we're all in business or employed to make money, fair enough, really. But if you want me to be your client, fix my needs & concerns first. I don't care how much more money some vendor partnerships make you or how easy it is to only have to know one storage solution. I'm paying you to help me and you'll have to make your money in that arena. If you weigh partner kickbacks higher than our needs, then I'll introduce you to the door marked "EXIT". It's a one way door. If you do help to address our needs and requirements, you'll make good money.
The best advisors – and I think we have one – are those that remember where the money really comes from and whose references really matter out there. Those guys are our brothers in arms and we all want to come out of the process good, happy and ready to roll.
The joy simply is great: modern, functional, reliable, modular, flexible, affordable and just plain awesome storage. What virtualization / private cloud / database / Exchange systems engineer would mind getting to work with that? No one, especially not when, in case of serious issues, the support & responsiveness prove to be rock solid. Now combine that with the Windows 8 & Hyper-V 3.0 goodness coming and I have a big smile on my face.
There have been a good number of people who've always used a bit of Microsoft bashing, some a lot more and some a lot less, to gain some extra credibility or to position other products as superior. Sometimes this addressed at least some real challenges and issues with Microsoft products. A lot of the time it didn't. I have always found this ridiculous. In the early years of this century I was told to get out of the Microsoft stack and into the LAMP stack to make sure I still had a job in a few years' time. My reaction was to buy Inside SQL Server 2000, among other technology books. The paradox, in some cases like some storage integrators, is that the ones doing the bashing forget that their customers are often heavily invested in the Microsoft stack.
I Still Have A Job
As you might have realized already, I still have a job today. I'm very busy building more and better environments based on Microsoft technologies. Microsoft does not get everything right. Who does? Sometimes it takes more than a few tries, sometimes they fail. But they also succeed in a lot of their endeavors. They are capable of learning, adapting and providing outstanding results, with a very good support system to boot (I would dare say that you get out of that what you put into it). Given the size and nature of the company, combined with IT evolving at the speed of light, that's not an easy task.
Today that ability translates into the upcoming release of Windows 8. Things like Hyper-V 3.0, the new storage and networking features, and the improvements to clustering and the file system are the current state of an evolution. A path that runs from Windows 2000 over Windows 2003 (R2) to the milestone Windows 2008, which was improved with Windows 2008 R2. Now Windows 8, being the next generation, improves vastly on that very good and solid foundation. With Windows 8 we'll take the next step forward in building highly scalable, highly available, feature-rich and very functional solutions in a very cost effective manner. On top of that we can do more now than ever before, with less complexity and with affordable standard hardware. If you have a bigger budget, great, Windows 8 will deliver even more and better bang for the buck, if and when your hardware vendors get on the bandwagon.
Windows 8 & Storage
One of the things the Windows BUILD conference achieved is that it made me want to buy hardware that I couldn't get yet. Just try asking DELL or HP for RDMA support on 10Gbps and you get a blank stare.
Another thing is that it made me look at our storage roadmap again. One of the few sectors in IT that is still very expensive is storage. Some of the storage vendors might start to feel a bit like a certain major network gear vendor, the one that has also seen the effects of serious competition from high quality but lower cost kit. Just think about what Storage Pools/Spaces will do for affordable, easy to use and rich storage solutions. There is value both with standard off-the-shelf (read: affordable) hardware and with modern SANs that leverage the Windows 8 features.

Heed my warning, storage vendors: you're struggling in the SMB market due to complexity, cost and way too much overhead and expensive services. Well, it's only going to get worse. You'll have to come up with better proposals or you'll end up being high end / niche market players in the future. Let's face it, if I can buy a Super Micro chassis with the disks of my choosing, I can build my own storage solution for cheap and use Windows 8 to cover my storage needs. Perhaps it's 80/20, but hey, that's great. It's not that much better with more expensive solutions (vendor disks are ridiculously overpriced) and the support process is sometimes a drain on your workforce's time and motivation. And yes, you paid for that. Compare this with being able to buy some spare parts on the cheap and having it all available off the shelf. No more calls, no more bureaucratic mess for return parts, no more IT-illiterate operators to work through before you reach support that can be substandard as well. Once you reach a certain level of hardware quality there is not that much difference anymore, except for price and service. Granted, some vendors are better at this than others. The really big ones often struggle to get this right.
I've been in this business long enough to know that all stuff breaks. SLAs are fine for lawyers and for management. CYA is part of doing business. But as the IT Pro in the field you need reliable people, gear and services. On top of that you have to design for failure. You know things will break. So it should be as cheap, easy and fast as possible to fix, while your design and architecture should cope with the effects of a failure. That's what IT Pros need and that's what keeps things running (not that SLA paper in the mailbox of your manager).
Show the Windows customers a bit more love than you have done in the past. Some in the storage industry tend to look down on the Windows OS. But guess what, it is your largest customer base. Unless you want to end up in the same niche as a very expensive personal trainer for Hollywood stars (tip: there's not a huge job market there), you'd better adjust to the new realities. A lot of them are doing that already, some of them aren't. To those: get over it and leverage the features in Windows 8. You'll be able to sell to a more varied public and at the high end you'll have even better solutions to offer. Today I notice way too many storage integrators who haven't even looked at Windows 8. It's about time they started … really, like today. I mean, how do you want to sell me storage today if you can't answer my queries on Windows 8 & System Center 2012 support and integration? To me this is huge! I want to know about ODX, RDMA, SMI-S and yes, I want you to be able to tell me how your storage deals with CSVs. You should know about the consumption of persistent iSCSI-3 reservations and a rock solid hardware VSS provider. If you can do that, it creates the warm fuzzy feeling customers need to make that leap of faith.
When I look at the network improvements in Windows 8, things like RDMA, SMB 2.2 and File Transfer Offload, and what they mean for file sharing and data intensive environments, I'm pretty impressed. Then there is Hyper-V 3.0 and its many improvements. Only a fool would deny that it is a very good, affordable & rich hypervisor with a bright future as far as hypervisors go (they are not the goal, just a means to an end). Live Storage Migration, an extensible virtual switch, monitoring of the virtual switch, Network Virtualization, Hyper-V Replica, … it's just too much to mention here. But hop on over to the Windows 8 Hyper-V Feature Glossary by Aidan Finn. He's got a nice list up of the new features relevant to the Hyper-V crowd. Again, we see improvements for all business sizes, from SMB to enterprise, including the ISPs and cloud providers. Windows 8 is breaking down barriers that would prevent its use in various environments and scenarios. Objections based on missing features, scalability, performance or security in multi-tenancy environments are being wiped off the map. If you want to see some musings on this subject, just look at Group Video Interview: What is your favorite Hyper-V feature in Windows 8?.
2012 & Beyond
Hyper-V is growing. It has already won the hearts and minds of many smaller Microsoft shops, but it's also growing in the enterprise. The hybrid world is here when you look at the numbers, even if it's not yet the case in your neck of the woods. Why? Cost versus features. Good enough is good enough. Especially when that good is rather great. On top of that the integration is top notch, it won't cost you a fortune and it will save you a lot of plumbing hassle.
Basically everyone can benefit from all this. You'll get more and better at a lower, or at least a more affordable, cost. Even if you don't use any Microsoft technologies you'll benefit from the increased competition. So everyone can be happy.
As TechNet subscribers we had access to Windows Storage Server 2008 with Microsoft iSCSI Software Target 3.2 (also see Jose Barreto's blog on this here). That was sweet, but for one little issue: this SKU cannot be a Hyper-V host. In order not to lose a physical host in the lab, you could edit the MSI installer from the Windows Storage Server 2008 install media to delete the SKU check. Problem solved, but not very legal, so nobody ever did that. You can install Windows Storage Server in a VM for the lab, I know, but that is becoming very SkyNet-like … virtual servers providing virtual storage for virtual servers … and while that is a good option to have, I like to have a hardware host.
Along comes Windows Storage Server 2008 R2, and Microsoft decided that we could have the iSCSI Software Target 3.3 software without constraints, except that you needed a TechNet/MSDN subscription, to install on W2K8R2. This is the one I'm running in my labs at the moment, installed on a physical Windows Server 2008 R2 Enterprise edition that is also a Hyper-V host. This provides all my iSCSI storage to both physical and virtual clusters. I used it to test MelioFS with FileScaler recently with a 2 node virtual cluster.
Today, Jose Barreto blogged about the public release of iSCSI Software Target 3.3 for Windows Server 2008 R2. This is very good news, as now everyone has access to an iSCSI target for labs, testing, POCs and even production. Thank you, Microsoft. Now, with some luck, we could get some SMI-S support for it with SCVMM 2012? Please?
If you need some help, Jose Barreto has a bunch of blog posts on configuring the iSCSI target, so I suggest you check out his site. As an added benefit, Microsoft iSCSI Software Target 3.3 setup & configuration is scriptable using PowerShell.
After implementing a couple of SANs with backup solutions I have come to dislike Virtual Tape Libraries. This is definitely technology that, for us, has never delivered the promised benefits. To add insult to injury, it is overly expensive and only good for practicing hardware babysitting. When discussing this I've been told that I want things too cheap and that I should have more FTEs to handle all the work. That's swell, but the business and the people with the budgets are telling me exactly the opposite. That explains why in the brochures it's all about reduced cost, with empty usage of acronyms like CAPEX and OPEX. But when that doesn't really materialize, the message to the IT Pros is to get more personnel and cash. In the best case (compared to calling you a whining kid) they'll offer to replace the current solution with the latest of the greatest that has, wonder oh wonder, reduced CAPEX & OPEX. Fool me once, shame on you; fool me twice, shame on me.
So one thing I'm not planning to buy, ever again, is a Virtual Tape Library (VLS). Those things have a shelf life of about 2 years max. After that they turn into auto-disintegrating pieces of crap you'll spend babysitting for the rest of their serviceable life. This means regularly upgrading the firmware to get your LUN(s) back, if you get them back that is. This means convincing tech support to come up with a better solution than restarting from scratch when they acknowledge that their OS never cleans up its own log files and thus one day just kicks the bucket. Luckily they did go the extra mile on that one after we insisted, and we got a workaround without losing all backups. Babysitting also means that replacing the battery kits of all shelves becomes a new hobby. You become so good at it that you have a better and faster way of doing it than the junior engineers they send, who happily exclaim "so this is what it looks like". The latter is not a confidence builder. The disks fail at a rate of 1 to 2 per week until you have replaced almost all of them. Those things need to be brought down to fix just about anything. That means shutting down the disk shelves as well and cutting the power, not just a reboot, so yes, you need to be in the data center.
There is no RAID 6, no hot spares (global or otherwise). The disks cost an arm and a leg and have specialized hardware to make sure all runs fine and well. But inside they are plain 7,200 rpm cheap 500 GB SATA disks that cost way too much. The need for special firmware I can understand, but the high cost I cannot. The amount of money you pay in support costs and in licensing the storage volume is more than enough to make a decent profit. Swapping disks and battery kits isn't hard and we do it ourselves, as waiting for a support engineer takes more time. We have spares at hand. We buy those when we buy the solution. We've grown wise that way. We buy a couple of units of all failure prone items at the outset of any storage project. Having only RAID 5 means that one disk failure puts your virtual tapes at very high risk, so you need to replace it as soon as possible. Once they shipped us a wrong disk and our VTL went down the drain due to incorrect firmware on the disk. They demanded to know how it got in there. Well, Mr. Vendor, you put it in yourself as a replacement for a failed disk. In the first year it often happened that they didn't have more than 1 spare disk to ship. If anyone else has a VLS in your area, you're bound to hit that limit and have to wait longer for parts. They must have upped the spare parts budget to have some more on hand just for us, as we now get a steady supply in.
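To put that RAID 5 complaint in numbers, here's a quick back-of-the-envelope sketch in Python. The 12-disk shelf of 0.5 TB SATA disks is an assumed, purely illustrative layout, not the actual VLS configuration; it just shows what RAID 6 would buy you over RAID 5 when disks are failing weekly:

```python
def raid_usable_tb(level, disks, disk_tb):
    """Usable capacity (TB) and the number of concurrent disk failures
    survived, for a simple parity array without hot spares."""
    overhead = {"raid5": 1, "raid6": 2}[level]  # parity disks' worth of capacity
    return (disks - overhead) * disk_tb, overhead

# Assumed 12-disk shelf of 0.5 TB SATA disks (illustrative numbers only):
print(raid_usable_tb("raid5", 12, 0.5))  # (5.5, 1): one failure away from data loss
print(raid_usable_tb("raid6", 12, 0.5))  # (5.0, 2): survives a second failure during rebuild
```

In other words, for the price of half a terabyte per shelf, RAID 6 keeps your virtual tapes alive while you wait for that one spare disk the vendor may or may not have in stock.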
When you look at the complete picture, the cost of storage per GB on a VLS is as much as on 1st tier SAN storage. That one doesn't fly well. At least the SAN has decent redundancy and is luckily a hundredfold more robust and reliable. Why buy a VLS when you can have a premier tier SAN for the same cost? For the VLS functionality? No sir, that also never lived up to its promises. It has come to the point that the VTL, due to the underlying issues with the device, is more error prone than our physical tape library. That's just sad. Anyway, we never got the benefits we expected from those VTLs. For disk based backup I don't want a Virtual Tape Library System anymore. It just isn't worth the cost and hassle.
Look, you can buy 100 TB of SATA storage, put it in a couple of Super Micro disk bays, add 10Gbps Ethernet to your backup network and you're good to go. Hey, that even gives you RAID 6, the ability to add hot spares, etc. You buy some spare disks, controllers, NICs, and perhaps even just build two of these setups. That would give you redundant backups on two times 100 TB for under 60.000 €. A VLS with 100 TB, support and licensing will set you back five times that. Extending that capacity costs an arm and a leg and you're babysitting it anyway, so why bleed money while doing that?
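That cost gap is easy to sanity-check. A minimal sketch using the rough figures above (60.000 € for two 100 TB DIY boxes, a VLS at roughly five times that; all numbers are ballpark illustrations, not quotes):

```python
def cost_per_tb(total_eur, usable_tb):
    """Plain acquisition cost per usable TB; ignores power, labor and support."""
    return total_eur / usable_tb

diy_eur, diy_tb = 60_000, 2 * 100    # two redundant 100 TB DIY boxes
vls_eur, vls_tb = 5 * 60_000, 100    # one 100 TB VLS at roughly 5x the DIY price
print(cost_per_tb(diy_eur, diy_tb))  # 300.0 €/TB
print(cost_per_tb(vls_eur, vls_tb))  # 3000.0 €/TB, a tenfold difference per usable TB
```

Even if the DIY estimate is off by a factor of two, the gap stays big enough to pay for a lot of spare disks and controllers.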
Does this sound crazy? Is this blasphemy? The dirty little secret they don't like you to know is that this is how the cloud players are doing it. First tier storage is always top notch, but if you talk about backing up several hundreds of terabytes of data, the backup solutions of the big vendors are prohibitively expensive. This industry looks a lot like a mafia racketeering business: if you don't buy it you'll get into trouble, you'll lose your data, you won't be able to handle it otherwise, accidents do happen. The guys selling it even dress like mobsters in their suits. But won't you miss out on cool things like deduplication? If your backup software supports it you can still have that. The licensing cost for this isn't that bad a deal when compared to VLS storage costs. And do realize that instead of 2 × 100 TB you could make do with 2 × 25 TB. Hey, that price just dropped even more.
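That last remark (2 × 100 TB shrinking to 2 × 25 TB) implies a deduplication ratio of about 4:1. A small sketch of that arithmetic, where the 4:1 ratio is an assumption of the kind vendors like to quote, not a measured result from our environment:

```python
def raw_tb_needed(logical_tb, dedup_ratio):
    """Raw disk capacity needed when backup software dedupes at dedup_ratio:1."""
    return logical_tb / dedup_ratio

# Assuming a 4:1 dedup ratio (a common vendor claim, not a measurement):
print(raw_tb_needed(100, 4))  # 25.0 TB raw per box instead of 100 TB
```

Your mileage will vary with the data mix; file servers dedupe far better than already-compressed media, so test the ratio on your own backup sets before shrinking the shopping list.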
When it comes to provisioning storage, our strategy is to buy as much of our storage needs as possible during the acquisition phase. That's the only time deals can be made. The amount of discounts you'll get will make you wonder what the markup in this market actually is. It must be huge. And storage can't cost as much to build as they would like us to believe. Last time the storage sales guy even told us they were not making any money on the deal. Amazing how companies giving away their products have very good profit margins, highly paid employees with sales bonuses and 40.000 € cars. If one vendor or reseller ever tells you that things will get cheaper with time, they are lying. They are actually saying their profit margins will increase during the life cycle of your storage solution. Look, all 1st tier storage is going to be expensive. The best you can hope for is to get a good quality product that performs as promised and doesn't let you down. We've been fortunate in that respect with our SAN solutions, but when it comes to backup solutions I'm not pleased with the state of that industry. Backups are extremely important life savers, but for some reason the technology and products remain very buggy, error prone, labor intensive and become very expensive when the data volumes and backup requirements rise.