How to Spot a Legacy Storage Vendor

January 18, 2021

Like everyone in the storage industry, I love reading what Chris Mellor writes. Last week he described a “street fight” between NetApp and Pure Storage, with each trying to paint the other as a legacy storage vendor.

I even retweeted his story with a quick summary of our own.

I then decided to spend some time expanding on the message from that tweet with more detail, specifically, how to spot storage vendors selling legacy architectures. I feel strongly about this subject, as I started a company to run away from these older limitations and tradeoffs that were minted in the ’90s. First, however, I am compelled to acknowledge that every new storage vendor gets to stand on the shoulders of the giants of the industry that developed the prior innovation, and for this I have tremendous respect for the legacies built by both NetApp and Pure Storage.

Now, let’s turn to my Twitter post, examine the five points in that tweet, and elaborate on what makes a vendor a legacy vendor. (Expanding on this subject requires far more than 280 characters!) Keep in mind that vendors don’t have to hit all of the points to be considered legacy vendors in my eyes, but obviously the more points they “tick,” the less able they are to solve the problems of tomorrow.

Point #1: Selling Systems Built with a Proprietary “Tin”

Surprisingly, the term “tin” came from the other blogs to which Mellor referred; it was not brought up by me.

Storage systems started as highly engineered, hardware-based solutions in the ’80s and ’90s. To advance the innovations in resiliency, density, and performance, storage system designers back then were forced to engineer special buses, memory, and resilient enclosures. Then, in the early 2000s, there was a strong movement toward commercial off-the-shelf (COTS) hardware driven by software. I was a successful part of that movement at XIV (acquired by IBM), where we were the first to build a Tier-1 block/SAN solution with no custom-engineered hardware. Although we were not designing any hardware, all of the solutions back then still relied on specialized components from obscure or niche “shelves”; even these COTS solutions were not the standard servers you could just buy from server vendors such as Hewlett Packard Enterprise or Supermicro. We relied on specialty vendors like Xyratex and others.

Fast forward to today. With the advancement in flash and NVMe technology, networking architectures, and server platforms, it is possible to build best-of-breed storage solutions using totally standard off-the-shelf components that are widely available from pretty much any server vendor.

Summary #1: If you are still using a storage solution that is available only in a customized, proprietary hardware form factor from your storage vendor, and you cannot run it on hardware from the server vendor you like doing business with, it’s a clear indication that you’re not using a solution based on current design principles and that you’re buying legacy storage.

This brings me to the second point from my tweet: the cloud.

Point #2: Cloud Offering Is Different & Compromised Compared to Proprietary “Tin” → Hybrid Cloud Is Limited

Legacy vendors whose solutions are based on their own proprietary “tins,” with hardware dependencies, cannot run the same software on public cloud infrastructure. They have to develop a different solution. Sure, they may give it the same name, sneaking “Cloud” into the title, but it is a totally different solution from the one that runs on the custom “tin.” In fact, they may even run it as a managed service on the proprietary tin inside a public cloud data center, but that is not what the cloud is about.

If they want to run on cloud infrastructure, they must either port a subset of their solution to the cloud, changing some design elements along the way, or create completely new products for the cloud. Either way, the functionality of legacy vendors’ products on the cloud is not the same as what customers enjoy on premises (on the private cloud). The integrations need to be different, which means that the scalability, resiliency, and performance of the solution on the public cloud are different as well.

Many of the organizations we work with are transitioning toward a hybrid cloud model: running the bulk of their workloads on their on-premises private clouds, while having the ability to burst to the public cloud for elasticity or for DR (disaster recovery). This allows organizations to combine the compelling economics of on-premises storage with those of cloud DR (you don’t pay unless there is a drill or an actual disaster) and of bursting using spot instances.
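To make the DR economics concrete, here is a quick back-of-the-envelope comparison. The hourly rates and drill schedule below are made-up placeholders, not quotes from any cloud provider; it is only a sketch of keeping a standby DR site running year-round versus paying for cloud capacity only during drills or an actual disaster:

```python
# Hypothetical cost comparison: always-on DR site vs. on-demand cloud DR.
# All prices and hours are made-up placeholders for illustration only.
HOURS_PER_YEAR = 8760

always_on_hourly = 50.0        # hypothetical $/hour for a standby DR cluster
on_demand_hourly = 80.0        # hypothetical $/hour for burst/spot capacity
drill_hours_per_year = 2 * 24  # two 24-hour DR drills per year

always_on_cost = always_on_hourly * HOURS_PER_YEAR
on_demand_cost = on_demand_hourly * drill_hours_per_year

print(f"always-on DR:  ${always_on_cost:,.0f}/year")  # $438,000
print(f"on-demand DR:  ${on_demand_cost:,.0f}/year")  # $3,840
```

Even with a higher hourly rate for on-demand capacity, paying only for the hours you actually use changes the equation dramatically.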

At Weka we were able to show an enterprise-grade, shared-file storage solution running on AWS reaching into the top three positions of the IO500 list, where the first two entries are not even commercial offerings in GA.

Summary #2: If your current storage vendor does not support exactly the same feature set, CLI, and performance, and if it doesn’t scale the same way on premises and in the cloud, these are clear indications that you’re using a legacy vendor.

Let’s keep going.

Point #3: Limited Scale and Support for Mixed Workloads → Tons of Silos

Modern design principles allow the creation of highly distributed storage systems that can grow to hundreds of petabytes and even exabytes. They also make it possible to build IO stacks that allow diverse applications to run on the same system.

Legacy solutions were created with design principles, limitations, and tradeoffs that prevented them from reaching large scale, in terms of either capacity or performance. Moreover, those designs forced tuning parameters (this was also referenced in the blogs!) that limit each system’s performance envelope to certain types of IOs. When data sets were relatively small and storage systems were single purpose, this approach worked, but today’s workloads explore data at petabyte scale, and the IO patterns are unknown in advance.

You shouldn’t have to consider your system’s limitations when deciding whether to tune for small IOs and low latency (a high IOPS number), for large IOs with high throughput, or for many metadata operations. You should be able to run them all over the same data on a single scalable system built on modern hardware designs. Period.
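To see why a single tuning point can’t serve both profiles, remember that aggregate throughput is simply IOPS multiplied by IO size. The short sketch below uses hypothetical numbers, not benchmarks of any product, to show how different the two operating points are:

```python
# Aggregate throughput = IOPS x IO size. Hypothetical numbers for illustration.
def throughput_gb_s(iops: float, io_size_bytes: int) -> float:
    """Convert an IOPS figure at a given IO size into GB/s."""
    return iops * io_size_bytes / 1e9

small_io = 4 * 1024        # 4 KiB: latency-sensitive, IOPS-oriented workload
large_io = 1024 * 1024     # 1 MiB: streaming, throughput-oriented workload

# A system tuned to deliver one million small IOPS still moves only ~4 GB/s:
print(f"{throughput_gb_s(1_000_000, small_io):.1f} GB/s at 1M x 4 KiB IOs")

# Hitting 40 GB/s with large IOs needs only ~40K IOPS, a very different profile:
print(f"{40e9 / large_io:,.0f} x 1 MiB IOs per second for 40 GB/s")
```

A system whose caching, striping, and metadata layout are tuned around one of these operating points typically struggles with the other; a modern design should handle both, plus metadata-heavy work, over the same data.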

Another important requirement, both on premises and in the cloud, is the ability to expand a system on demand while it is online. Otherwise, you’re stuck with a painful sizing exercise every time you start a new project and buy a system. Good, up-to-date storage systems must be able to expand, adding more capacity or performance while under heavy production use.

Summary #3: Go back and think about your datacenter architecture decisions. If you must deploy more storage systems than functionally required simply to physically separate resources, you’re using a legacy storage vendor. If you must make sizing decisions that lock you in for the life of that system, you’re also using a legacy storage vendor.

There’s more….

Point #4: Limited Aggregate Performance & Single Client Performance → Unfit for GPU

About a decade ago it became obvious that Moore’s Law was coming to an end. The enterprise world had to solve larger, more complex problems, which led to extreme scale-out and, due to the limitations of legacy storage vendors, to the proliferation of “Big Data” architectures that tried to work around those limitations.

In the last few years GPUs have entered the data centers, and they provide significantly more compute capacity than was possible using even very large CPU compute farms.

A single, modern GPU-filled server today can reach about 5 petaFLOPS, performance equivalent to the world’s top supercomputer of roughly a decade ago. Those room-filling supercomputers had IO systems that could deliver dozens of GB/s of aggregate throughput to keep them busy. Now that this compute capacity fits in a single box, it still needs that kind of IO throughput, and when multiple GPU-filled servers are used together, the aggregate throughput needs to be even greater.
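As a rough sanity check on that 5 petaFLOPS figure, assume a DGX A100-class server with eight GPUs, each rated at roughly 624 TFLOPS of FP16 tensor performance with sparsity (about 312 TFLOPS dense). These are vendor peak numbers, and the sketch below is only back-of-the-envelope arithmetic:

```python
# Back-of-the-envelope peak AI compute for one GPU-dense server.
# Assumption: 8 GPUs per server, each ~624 TFLOPS FP16 tensor peak (with sparsity).
GPUS_PER_SERVER = 8
TFLOPS_PER_GPU = 624  # vendor-rated peak; real workloads see far less

server_pflops = GPUS_PER_SERVER * TFLOPS_PER_GPU / 1000
print(f"~{server_pflops:.0f} PFLOPS per server")  # ~5 PFLOPS
```

If a room-sized machine with that much compute once needed dozens of GB/s of storage throughput, it is reasonable to expect a single box with equivalent compute to need IO in the same ballpark.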

Future-looking companies are migrating their workloads to GPUs. To be effective, each GPU-based server needs to access data at rates that historically required the aggregate throughput of dozens or hundreds of legacy CPU servers connected to different storage systems. This trend started with AI/ML, but now we see many examples across pharmaceutical companies, financial organizations, and retail establishments where customers are replacing large-scale, open-source “Big Data” deployments that fill entire data centers with a single machine or just a few racks of infrastructure.

To leverage and unleash the power of the GPU platform, storage systems need to “up their game” in terms of overall aggregate performance and, even more importantly, single-client performance; otherwise, they risk wasting the valuable compute cycles of these dense compute systems.

Summary #4: If the storage system you’re using now has the same single-client performance limitations that existed a decade ago, you’re using a legacy storage vendor. If the system you’re using now has an aggregate throughput/IOPS number that has not increased dramatically compared with the numbers of a decade ago, you’re using a legacy storage vendor.

Let’s explore the final point from my tweet.

Point #5: Data Backup and DR Are Performed by Others–or They’re Afterthoughts

Data protection (backup, archive, etc.) is a huge responsibility for any storage professional. Back when storage systems were limited in scale and workload and mobility to the cloud was unheard of, backup was a simple procedure, which led to the proliferation of “secondary storage” backup vendors.

With data capacities growing exponentially, businesses demand a sound DR strategy, a cloud bursting strategy, and backup and archive both on premises and in the cloud. Viewing these as discrete problems and solving each one individually with separate products is expensive and wasteful, however. When your data grows to petabyte scale, you don’t want to store many copies of it to satisfy separate uses (three for archive, another for DR, etc.), and you do not want to have to integrate several different products to do so.

Now, turning to the “bread and butter” features of storage resiliency and integrity, here are some questions to ask; the answers may indicate that you’re using a legacy vendor from these perspectives as well:

  • Does your storage system slow to a crawl when it is 75% full?
  • Does performance grind to a halt during rebuilds, or for as long as some hardware component is still faulty?
  • Do you still have to rebuild 100% of the failed storage media even when it held little data?

Summary #5: If your primary storage product, your backup product, and your cloud product are different entities, you are dealing with a legacy storage vendor. If your storage vendor forces you to treat backup and DR differently and store the data twice, you’re using a legacy vendor. If your storage vendor forces you to go to a third-party solution to get a sound backup or archive strategy, you’re also using a legacy vendor. If the performance of the storage system drops significantly during a rebuild, you’re using a legacy vendor. If you are still rebuilding blocks and not files, you are using a legacy vendor. If you don’t have end-to-end data integrity protection all the way to the client (each block has a checksum that is calculated at the client and verified at each step along the way to guard against bit rot), you’re using a legacy storage vendor.
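To make the last point about end-to-end integrity concrete, here is a minimal sketch, not any vendor’s actual implementation, of the idea: the client computes a checksum where the data originates, and every hop on the way to the media re-verifies it, so silent corruption anywhere in the path is detected rather than written through.

```python
import hashlib

def write_block(data: bytes) -> dict:
    """Client side: attach a checksum computed where the data originates."""
    return {"data": data, "checksum": hashlib.sha256(data).hexdigest()}

def verify_block(block: dict, hop: str) -> None:
    """Each hop (network layer, storage node, media) re-checks the checksum."""
    if hashlib.sha256(block["data"]).hexdigest() != block["checksum"]:
        raise IOError(f"bit rot or corruption detected at {hop}")

# Hypothetical write path: client -> network -> storage node -> media
block = write_block(b"application data")
for hop in ("network", "storage-node", "media"):
    verify_block(block, hop)
print("block verified end to end")
```

Production systems typically use cheaper CRC-style checksums and bake the verification directly into the IO path, but the principle is the same: the checksum travels with the data from the client onward.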

There’s one more point to make.

A Bonus: Point #6

I could go on about what makes a good storage solution, and that is probably a good subject for the next blog. There are many positive aspects of up-to-date storage solutions to share that go beyond simply inverting the legacy-vendor characteristics described here.

Speaking of which, I’m going to add another identification cue to this guide even though it was not part of my original tweet. Another strong indication that you’re buying from a legacy storage vendor is this: if the vendor has many offerings and you have to combine different products to achieve your goal, then what you’re trying to achieve was not the original design point of any solution offered to you.

Summary #6: If your storage vendor has many products, each with slightly different tradeoffs, and you have to use a different mix of them as solutions to different projects, you’re using a legacy vendor.

Conclusion

It’s important to reiterate that all new storage vendors get to build on the great technologies of the legacy vendors. I respect all of Weka’s competitors in the marketplace and acknowledge that the majority of the storage solutions sold today are solid and obviously still have a place in today’s data center. But as CIOs start to think about the data centers of the future and the objectives they will have to meet in the years to come, they are well served to question whether the solution that took them to 2020 will meet the needs of the future. Each generation produces a giant leap in performance and scale.

By the way, if you’re still curious as to my opinion about the original matter, both NetApp and Pure are obviously selling legacy storage products by the standards described here. Clearly, both are legacy storage vendors.

 
