VAST Data Makes the Case for All-Flash Storage; Do You Agree?

By John Russell

December 1, 2020

Founded in 2016, all-flash storage startup VAST Data says it is on the verge of upending storage practices in line with its original mission, which was and remains “to kill the hard drive,” according to Jeff Denworth, one of three founders and VP of products and marketing. That’s a big claim and mission. Indeed, there are several data management/storage startups seeking to disrupt the roughly 20-to-30-year-old storage paradigm based on hard disk drives (HDDs), tape and data tiering.

No one disputes the power of flash, but it (still) costs too much. Instead of becoming the dominant mainstream storage technology, it has found many critical niches – burst buffers are just one example – where it dramatically improves performance. Without doubt, flash’s storage system footprint is expanding. VAST Data argues that for large enough storage systems, roughly one petabyte or greater, the company’s single-tier approach, along with critical coding innovations, is both less expensive and much higher performing than the current typical mix of HDD and tape.

We’ll see. VAST Data has raised a big war chest ($180 million), attracted significant customers (Lawrence Livermore National Laboratory is one), and says it grew 3x in 2019 and that 2020 has shown no letdown despite the pandemic. Most recently, at the fall GTC conference, Nvidia highlighted VAST Data’s ability to leverage Nvidia’s new GPUDirect (direct path from storage to GPU) capabilities on a DGX A100 system where it outperformed traditional parallel file systems.

Jeff Denworth, VAST Data

The proof, of course, is in the details and sales, and because VAST Data is private, those are hard to gauge. 3x growth can be impressive or disappointing depending on the starting point. Says Denworth, “I understand the point you’re trying to make, and I’m telling you that there’s something extremely exceptional happening here. We’ll start to take the covers off over the next couple of months.”

At a post-SC20 briefing with HPCwire this week, Denworth provided a glimpse under the covers. He talked about how the company leverages container technology to gain scale and performance, what innovations VAST has developed around erasure codes and data reduction, and how the company is able to achieve long-term duty cycles (more than ten years) from commodity flash. He coalesced those ideas into a cost argument, saying “that gets you to a price point that is not [just] on a TCO basis superior to what you’re buying today, but on a total cost of acquisition basis.”

Notably, VAST Data doesn’t see itself as an HPC storage supplier but as a broader enterprise storage supplier with the scale and performance to serve HPC requirements, particularly well-suited to the AI and blended HPC/AI workloads where large datasets are critical, for example, in training. Its top segment is the financial services market, followed closely by genomics. Denworth also cited the intelligence community and automotive as important segments. Currently, its product is for on-premises use only.

Presented here is a portion of that conversation. Like many, HPCwire is eager to see what “taking the covers off” in a few months means.

HPCwire: Maybe give us a quick primer on VAST Data. The company and you haven’t been shy about tackling big goals.

Jeff Denworth: When we started the company, we discovered that customers weren’t looking for much more performance than what you could get from the generation of flash arrays that came out around the 2010 to 2015 timeframe. That kind of challenged this idea that there would be a big wide open market for even faster flash storage products. Around the same time, we saw a bunch of very high profile, super-fast flash companies and projects kind of dissolve. So when we stopped looking at performance, we started looking at capacity. We realized that there were many opportunities to innovate [there]. At the same time, we saw technology trends, like big data and AI starting to emerge, in particular where deep learning training sets get more efficient, more accurate, [and] more effective as you expose them to larger and larger data sets.

[We came to] this conclusion that the pyramid of storage that customers had been managing for the last 20 to 30 years might actually be obsolete. [It was] a time when the capacity within your datacenters is the most valuable thing, as opposed to some sort of small database that previously was the most valuable thing, because that’s how AI gets trained effectively. So we started VAST with a simple goal, which is to kill the hard drive. The objective is to basically take customers, from an operational perspective, to a simpler end state, where you can imagine one tier of storage for all of their data. Then applications benefit because you elevate all of the data sets to basically NVMe levels of performance.

You can imagine a system that has been engineered to couple the performance that you get from a tier-one all flash array with the cost that you would pay for a tier-five archive. And it’s counterintuitive, but the only way to get the cost of storage underneath the price points that you pay for hard drive-based infrastructure is to use flash. The reason for that is the way these new data reduction codes that we’ve pioneered work. You need to basically store data in very small fragments. If you did that with a hard drive, you would basically kind of fragment the hell out of the drive and you wouldn’t be able to read from it. But with flash, you don’t care.
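To make the fragmentation point concrete, here is a rough back-of-the-envelope sketch (the fragment size and device latencies are generic assumptions, not VAST-published numbers) of why a layout built from many small fragments is punishing on a hard drive but cheap on flash:

```python
# Back-of-the-envelope comparison: reading 1 GiB that has been broken into
# small fragments scattered across a device. The latency figures are rough,
# generic assumptions, not vendor-published numbers.

FRAGMENT_SIZE = 32 * 1024          # assume 32 KiB fragments
TOTAL_BYTES = 1 * 1024**3          # 1 GiB of logical data
FRAGMENTS = TOTAL_BYTES // FRAGMENT_SIZE

HDD_ACCESS_S = 8e-3                # ~8 ms seek + rotational latency per fragment
SSD_ACCESS_S = 100e-6              # ~100 microseconds per NVMe random read

hdd_time = FRAGMENTS * HDD_ACCESS_S
ssd_time = FRAGMENTS * SSD_ACCESS_S

print(f"fragments: {FRAGMENTS}")
print(f"HDD (seek-bound): {hdd_time:.0f} s")
print(f"NVMe flash:       {ssd_time:.1f} s")
# On these assumptions the HDD spends minutes seeking (~262 s) while flash
# finishes in a few seconds -- which is why fine-grained data layouts are
# only practical on solid-state media.
```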

HPCwire: So if not performance, then capacity became the target, along with flexibility?

Jeff Denworth: When we looked at the capacity opportunity, we quickly understood that file systems and object storage were where most of the capacity was in the market. So if we were building high capacity flash, it made sense to make it a distributed file system. And here, just as with the tradeoff between performance and capacity, which is one that we think we’ve broken for the first time in 30 years, there’s a tradeoff around protocols that we’re also trying to break. The way we saw things is people were using block storage for direct attached storage for lowest latency, sub-millisecond access to their data. You use file systems when you want easy data services provisioning. You buy parallel file systems if you want RDMA support and good scale out. Then you buy object storage if you just want to go cheap and deep with your data.

Here we’re making a scale-out file and object storage system that has the latency of an all-flash block device. So sub-millisecond. We’ve extended the utility of file interfaces, in particular NFS, to make it not only appropriate for environments like HPC cluster environments, but also AI. We have support for NFS over RDMA. We have support for NFS multi-pathing. Nvidia showcased the results of our GPUDirect Storage [integration] as part of the GTC conference they did a few weeks ago.

HPCwire: The Nvidia proof-point received a fair amount of notice. Interesting that you chose NFS to focus on rather than a parallel file system. What was the thinking there?

Nvidia DGX-A100

Jeff Denworth: The parallel file system companies are saying, you know, this [product] is x times faster than NFS, and that [product] is y times faster than NFS. But we were using NFS and were able to get, for a single DGX A100, the fastest performance that Nvidia has seen. So why did we take the path of building an enterprise NAS product as opposed to building a parallel file system out of this technology? Well, the first principle of VAST is that we want this to be broadly applicable. And NAS has always been much simpler to deploy, because you just get to use the capabilities that are in the host operating system; you don’t have to install custom parallel file system software on a customer’s cluster, which basically conflates their operating system strategy with their storage agenda because now those two are intertwined.

This is a level of complexity we never wanted our customers to get to; we just want something that’s ubiquitous that any single application can consume. We made the decision to build a NAS, [and] we realized that there were a ton [of capabilities] in the kernel that could get NFS to a point where it could be, in our case, 90 times faster than what conventional TCP-based NFS has historically been. We call this the concept of Universal Storage, and it’s a product that you can use for everything from VMware to some of the largest distributed applications in the world. While we don’t consider ourselves an HPC storage company, because the product is much more broadly applicable than HPC storage has classically been, we make a very good storage product for HPC customers.

HPCwire: VAST touts the use of containers in its approach. How specifically does using containers help achieve the scale and performance goals?

Jeff Denworth: Containers, at their simplest level, provide two primary benefits for us. The first is that they are abstracted from the hardware. For example, you can upgrade your file server software without having to reboot your operating system. That in and of itself is such a liberation with respect to being able to build resilient infrastructure. But think of a more sophisticated environment where customers already have Kubernetes running. Imagine that there is no notion of a classic storage controller in your environment, but rather VAST is a microservice that runs within the environment and scales itself up and down dynamically, based upon the needs of the application at any given time. This all becomes possible thanks to that abstraction.

Containers also buy you statelessness. What we built is a distributed systems architecture where there is absolutely no state within the controller code of our system. So every single CPU has some sort of Docker image running in it. Every Docker image has a global view to all of the state across the network on the NVMe fabric that we build on. Once you get there, that means that no two containers need to talk to each other at all, because they’re not coordinating any operation that’s happening within them. All of the data is being essentially journaled and articulated down in XPoint on the other side of the network.

Once you have stateless distributed storage systems, you can get to linear scalability. That’s one of the reasons that some of the larger customers in the space like our product, because every time you add a CPU, it becomes a linear unit of performance scale. Take any shared-nothing cluster in the marketplace today, where the CPU owns some of the state of the file system, and every time there’s an update to one device within that machine, that update has to be propagated or broadcast out onto the network to all of the other machines. That creates crosstalk that ultimately limits scalability.

We’ve sat down with some executives in the AI space, and they say, ‘Tell us how you can be as scalable as something like a parallel file system.’ And we have to say, ‘No, you have to understand this architecture is more scalable than a parallel file system [because] there are zero bottlenecks within the system.’ It’s basically just a large array of XPoint drives and flash drives, and an independent array of shared-everything, loosely-coupled containers.
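A minimal sketch of that stateless-controller idea, assuming a toy shared key-value store standing in for metadata held on the XPoint/NVMe fabric (illustrative only, not VAST’s code): every worker handles any request directly against shared state and never messages another worker, so adding workers adds throughput without adding coordination traffic.

```python
# Illustrative sketch only: stateless workers operating on shared state.
# A Python dict stands in for state kept on the NVMe/XPoint fabric; a real
# system would issue reads/writes over the fabric here instead.

from concurrent.futures import ThreadPoolExecutor

shared_state = {}  # "the fabric": every worker sees the same state

def handle_request(worker_id: int, key: str, value: int) -> str:
    # The worker keeps no local state and never contacts other workers;
    # it reads/writes shared storage directly.
    shared_state[key] = value
    return f"worker {worker_id} wrote {key}"

requests = [(i % 4, f"file_{i}", i) for i in range(16)]

# Scaling "controllers" just means adding workers (threads/containers);
# there is no inter-worker coordination traffic to grow with the cluster.
with ThreadPoolExecutor(max_workers=4) as pool:
    for result in pool.map(lambda r: handle_request(*r), requests):
        print(result)
```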

HPCwire: Does the use of Optane memory constrain you to using the Intel interface?

Jeff Denworth: We don’t use [Intel] persistent memory, actually; we use the 3D XPoint 2.5-inch drives, so there is no CPU [involved]. It just presents itself as a block device over NVMe fabrics to the machine. There’s no CPU dependency whatsoever.

HPCwire: How are you able to keep the system costs down? That’s a central part of the VAST Data pitch, and flash is still not cheap.

Jeff Denworth: You could argue [there are] four ways we’ve done it. The first is, from the onset, we basically made a decision to support the lowest grade flash that we can, at any given time, consume. A little-known fact about VAST, when we started, is a lot of our early systems and prototypes that we were shipping were built off laptop flash. Since then, we’ve evolved to ship enterprise-grade datacenter QLC drives. The only reason that we’re using QLC drives as opposed to laptop flash is QLC is cheaper than the TLC laptop flash drives you could buy, [and] you get one SSD controller chip per 15 terabytes versus two terabytes. From a manufacturing cost [perspective], it’s more effective. QLC has the property of being by far the least expensive way that you can consume flash right now, and we wanted to start from a basis of working with the lowest cost componentry.

[Doing] that requires an entirely different global storage controller architecture, to be able to drive QLC flash in a way where it won’t wear down prematurely. We have to, in essence, shape writes through a form of log structuring. We take advantage of the fact that the system understands the long-term life expectancy of data as a file system. So we place data into flash blocks according to their life expectancy. And we never co-mingle long-term and short-term data within the same block, so we never have to do brute-force garbage collection. With this [approach], we get about 20 times the stated endurance out of these drives compared to when Intel first announced them, and that allows us to put systems on the floor for up to a decade. If you can put flash on the floor for up to a decade, then you don’t worry about the same performance considerations that you did for hard drive-based storage, because you basically have just an endless pool of IOPS at that point. We have customers such as NOAA that have placed orders for 10-year system deployments from us.
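The placement idea is easier to see in miniature. Here is a rough conceptual sketch (the block size and lifetime tags are assumed; this is not VAST’s implementation) of binning writes into erase blocks by predicted lifetime, so that a whole block expires together and never needs copy-forward garbage collection:

```python
# Conceptual sketch: group incoming extents into erase blocks by predicted
# lifetime so short-lived and long-lived data are never co-mingled.
from collections import defaultdict

ERASE_BLOCK_SLOTS = 4   # toy erase-block capacity, in extents (assumed)

# Incoming extents tagged with a lifetime class the file system predicts;
# the names and tags below are made-up examples.
incoming = [
    ("scratch/tmp.1",   "short"),
    ("archive/run.dat", "long"),
    ("scratch/tmp.2",   "short"),
    ("archive/img.raw", "long"),
    ("scratch/tmp.3",   "short"),
    ("archive/log.gz",  "long"),
    ("scratch/tmp.4",   "short"),
    ("archive/ckpt.0",  "long"),
]

open_blocks = defaultdict(list)   # one open erase block per lifetime class
sealed_blocks = []

for extent, lifetime in incoming:
    block = open_blocks[lifetime]
    block.append(extent)
    if len(block) == ERASE_BLOCK_SLOTS:
        # Seal a block holding only same-lifetime data: when that data ages
        # out, the whole block is erased at once -- no garbage collection
        # that copies still-live data forward.
        sealed_blocks.append((lifetime, list(block)))
        block.clear()

for lifetime, extents in sealed_blocks:
    print(f"sealed {lifetime}-lived block: {extents}")
```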

The second thing we do is a new type of erasure code. This erasure code is also intended to break a [traditional] tradeoff: the tradeoff between the overhead that you pay to protect against data loss and the resilience that you get from the system. We call them locally decodable codes, and the big invention here is we’ve reduced the time to rebuild and increased the number of redundancies you have in a write stripe. And because we write everything into 3D XPoint, the system has the luxury of time to build really fat write stripes. Then when they get moved down into flash at scale, you’re writing 146-plus-4 RAID stripes. At 146-plus-4, you’re paying roughly two and a half percent for your data protection overhead. At the same time, because we’ve got up to four redundancies and we’ve reduced the rebuild times, what you get out of it is 60 million years of mean time to data loss, which is also unprecedented.
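The stripe-overhead arithmetic is easy to check. The short calculation below simply restates the numbers quoted in the interview (146 data strips plus 4 parity strips) alongside a common RAID-6 layout for comparison:

```python
# Protection overhead is the parity share of each write stripe.

def protection_overhead(data_strips: int, parity_strips: int) -> float:
    return parity_strips / (data_strips + parity_strips)

layouts = {
    "Wide stripe quoted in the interview (146+4)": (146, 4),
    "Typical RAID-6 stripe (8+2)":                 (8, 2),
}

for name, (d, p) in layouts.items():
    print(f"{name}: {protection_overhead(d, p):.1%} of raw capacity goes to parity")
# 146+4 works out to ~2.7% -- the "roughly two and a half percent" quoted --
# versus 20% for an 8+2 RAID-6 stripe.
```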

HPCwire: You’d mentioned data reduction as another key.

Jeff Denworth: Yes, the third and the final thing we did is what I alluded to earlier about putting data into small pieces. It’s a new form of data reduction that we call similarity-based data reduction. The market has always known that flash gives you the ability to do more fine-grained deduplication and compression and that was great because you could reduce down a database or some virtual machines. When we started, it was our investors that actually said, ‘You know, you can’t do the same things with file and object data, just so you know, it’s already pre-reduced. All the work has been done there, there’s nothing to be gained.’ So we started looking at the nature of data. If you just take two log files, for example, even if they’re pre-compressed, you’ve got timestamps that are interwoven between these different log files on different days. All of this stuff is common across the files, but there’s enough entropy that’s woven across the data such that it would trip up a classic deduplication approach.

What we did is we basically tried to combine the best of both deduplication and compression. The way the system works is your data is written into the system and hits that XPoint buffer untransformed. Your application gets an acknowledgment back that the writes have been completed instantaneously, so you’re basically writing at XPoint speeds. In the background, what we do is we start to run a hashing algorithm against the data, just like a backup appliance would. But a backup appliance will use a SHA-256 hash and say, ‘Okay, if this exact block exists elsewhere in the system, just create a pointer for this new one and don’t store it twice.’

That’s not how our approach works at all. What we are doing is basically a distance calculation; we’re measuring the relative distance between a new block that’s hit the system and all of the other blocks that are already in the cluster. Once we find that two blocks are close enough to each other, they don’t have to be exactly the same, we start to compress them against each other using a compression algorithm that comes from Facebook called Zstandard. So compression goes down to byte granularity, which means that we’re basically doing delta compression and just removing the random bytes and storing those as deltas after the fact.
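As a rough sketch of the idea (not VAST’s algorithm or code), the snippet below finds the stored block most similar to a new block using a crude shingle-overlap measure, then compresses the new block with that stored block supplied as a Zstandard dictionary, so what gets stored is largely the byte-level difference. It assumes the third-party Python package `zstandard`; the similarity metric and the sample log-like data are stand-ins:

```python
# Conceptual sketch of similarity-based reduction using zstd dictionaries.
# Requires: pip install zstandard
import zstandard as zstd

def shingles(block: bytes, width: int = 8) -> set:
    # Crude stand-in for a real similarity hash: fixed-width byte shingles.
    return {block[i:i + width] for i in range(0, len(block) - width, width)}

def similarity(a: bytes, b: bytes) -> float:
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / max(1, len(sa | sb))

# Previously stored blocks (made-up log-like data).
stored_blocks = [
    b"".join(f"2020-12-01 host=n{i:02d} level=INFO job={1000+i} started\n".encode()
             for i in range(200)),
    b"sensor=42 reading=3.14 status=OK\n" * 200,
]

# A new block that is similar, but not identical, to the first stored block.
new_block = b"".join(f"2020-12-02 host=n{i:02d} level=INFO job={1000+i} started\n".encode()
                     for i in range(200))

# Distance step: pick the closest existing block, not an exact hash match.
reference = max(stored_blocks, key=lambda blk: similarity(blk, new_block))

plain = zstd.ZstdCompressor().compress(new_block)
delta = zstd.ZstdCompressor(
    dict_data=zstd.ZstdCompressionDict(reference, dict_type=zstd.DICT_TYPE_RAWCONTENT)
).compress(new_block)

print(f"raw {len(new_block)} B | self-compressed {len(plain)} B | "
      f"compressed against similar block {len(delta)} B")
```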

If you’ve ever used, for example, Google image search, and you upload a photo and say, ‘Okay, show me everything that’s like that,’ the principles of our similarity-based data reduction are pretty much the same. What we’re doing is we’re measuring the distance, and once we find that stuff looks close enough to each other, then we start using a much more fine-grained data reduction approach than deduplication. The compression has some awesome advantages. For example, Lawrence Livermore right now is [getting] three-to-one data reduction. We’ve got customers that take backup software that already has deduplication and compression, then store those backups down into VAST as a target after they’ve been reduced, and the VAST system will be able to reduce them by another three-to-one to six-to-one, because we’re much more fine-grained.

HPCwire: So boil it all down for me.

Jeff Denworth: Okay. If you take your QLC, and you can extend [its] lifespan to 10 years, and then you add almost no overhead for data protection, and you amplify the system capacity through data reduction – those four things get you to a price point that is not [just] on a TCO basis superior to what you’re buying today, but on a total cost of acquisition basis. That’s how we are introducing entirely new flash economics.

Typically, we don’t sell systems of less than a petabyte. And the reason for that is the flash that we’re using was never intended for a 10-terabyte system designed for a database to overwrite its logs every 24 hours. We’re building a system that is intended to essentially blur the lines between a capacity tier and a performance tier, such that we can amortize the endurance of your applications across all of your flash investment. So a petabyte is basically the barrier today.
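The petabyte threshold is partly an endurance argument: spreading the same write workload across a much larger flash pool keeps each drive well inside its rated wear. A rough, hedged illustration (the drive-writes-per-day rating and ingest rate below are assumed figures, not VAST or Intel specifications):

```python
# Rough endurance-amortization arithmetic with assumed numbers.

QLC_DWPD = 0.1                 # assumed QLC endurance rating, drive writes per day
APP_WRITES_TB_PER_DAY = 20     # assumed application ingest rate, TB/day

for pool_tb in (100, 1000):    # a 100 TB array vs. a 1 PB pool
    daily_budget_tb = pool_tb * QLC_DWPD       # TB of writes/day the pool can absorb
    utilization = APP_WRITES_TB_PER_DAY / daily_budget_tb
    print(f"{pool_tb:>5} TB pool: write budget {daily_budget_tb:.0f} TB/day, "
          f"workload consumes {utilization:.0%} of it")
# On these assumptions the 100 TB array is overdriven (200% of budget) while the
# 1 PB pool runs at 20%, which is the headroom that supports a 10-year lifespan.
```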

HPCwire: Could you briefly describe your target market?

Jeff Denworth: If you look at our customer base, it’s largely concentrated in a number of different industries where you have very high concentrations of performance and data. Our number one segment is financial services. This is everything from hedge funds to market makers to banks. We have one public reference in the space that came in April: a company called Squarepoint Capital, which rolled out of Barclays some years ago. The genomics space probably rivals financial services as our other big target segment. We’ve sold to all of the major ASC labs within the Department of Energy. We’ve also got customers in the intelligence community around the world, we’ve got customers doing manufacturing, things like electric cars and autonomous vehicles. We have several customers in the web-scale space, and we have AI as a horizontal practice that cuts across all the markets that we work in.
