Startup Launches Highly Parallel Storage System

By Michael Feldman

March 28, 2008

Atrato, Inc., a new storage vendor, emerged from stealth mode this week to unveil the company’s first product: the Velocity1000 (V1000) storage system. The new product offers 11,000 Input/Output operations per second (IOPS) and 50 terabytes of raw disk capacity in a 3U rack. The company relaunched itself back in February, when it changed its name from Sherwood Information Partners to Atrato and announced $18 million in funding. It also revealed some of its big-name backers, including Jesse Aweida, founder and former president and CEO of StorageTek; Tom Porter, formerly CTO of Seagate; Gary Gentry, former SVP at Maxtor and Seagate; and Dick Blaschke, an IBM and EMC veteran.

The V1000 is a unique storage appliance aimed at the high performance computing, digital entertainment and web sectors, where I/O performance and cost are major driving factors. The offering is designed to address a growing problem in high-end storage: the imbalance between storage capacity and I/O performance. Capacity is doubling every 18 to 24 months, but access speeds — data transfer rate, seek time and rotational latency — are improving only around five percent per year. With capacity growing exponentially and access speeds improving only incrementally, application responsiveness suffers. This is especially true for applications with random data access patterns.
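To make that divergence concrete, here is a small illustrative Python sketch that projects the gap using the growth rates quoted above. The starting values (a 500 GB drive delivering 100 random IOPS) are hypothetical round numbers, not figures from Atrato.

```python
# Illustrative projection of the capacity/performance gap described above,
# assuming capacity doubles every 18 months and access speed improves ~5% per year.
# The starting values (a 500 GB drive at 100 random IOPS) are hypothetical.

capacity_gb = 500.0   # assumed starting drive capacity
iops = 100.0          # assumed starting random-access rate per drive

for year in range(0, 11, 2):
    cap = capacity_gb * 2 ** (year / 1.5)   # doubling every 18 months
    speed = iops * 1.05 ** year             # ~5 percent improvement per year
    print(f"year {year:2d}: {cap:9.0f} GB, {speed:6.1f} IOPS, "
          f"{cap / speed:7.1f} GB per IOPS")
```

Each IOPS has to cover an ever larger slice of capacity, which is exactly the responsiveness problem random-access applications run into.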

Atrato’s goal was to create a high-performance, highly dense, yet energy-efficient storage system. The company uses its patented Self-maintaining Array of Identical Disks (SAID) technology to build a densely packed, sealed enclosure that is guaranteed to be maintenance-free for at least three years (more on how this is done in just a moment). The company also claims it can achieve all this with much lower energy use than conventional storage. “At a given performance level, we use 80 percent less power than commercially available systems today, whether it’s NetApp, LSI or DataDirect Networks,” says Dan McCormick, Atrato co-founder and CEO. The company says a V1000 setup can deliver 17.3 IOPS/Watt versus a typical industry figure of 4 IOPS/Watt.
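A quick back-of-the-envelope check shows the quoted efficiency figures are consistent with the 80 percent claim. The sketch below uses only numbers from the article and assumes, purely for illustration, that the 11,000 IOPS rating applies at both efficiency levels.

```python
# Back-of-the-envelope check of the power claim, using only figures quoted above.
# Assumes the 11,000 IOPS rating applies in both cases (an assumption, not a spec).

iops = 11_000
atrato_iops_per_watt = 17.3
typical_iops_per_watt = 4.0

atrato_watts = iops / atrato_iops_per_watt    # roughly 636 W
typical_watts = iops / typical_iops_per_watt  # roughly 2,750 W
savings = 1 - atrato_watts / typical_watts    # ~0.77, in line with "80 percent less"

print(f"Atrato: {atrato_watts:.0f} W, typical: {typical_watts:.0f} W, "
      f"power savings: {savings:.0%}")
```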

The majority of the power savings come from the building blocks of the disk enclosure. Instead of using 3.5-inch enterprise-class SATA or SAS disks, Atrato engineers decided to use lots of mobile-class 2.5-inch SATA disks in the drive enclosure. Mobile SATA disks are built for power-constrained platforms like laptop computers, but tend to offer lower capacities — 100 GB to 320 GB. The smaller size of the disks compared to their enterprise counterparts actually contributes to their energy efficiency, since the moving parts don’t have to travel as fast or as far.

The overall approach is to use mass parallelization of these relatively small disk drives to construct a more efficient system. This is analogous to the manycore approach for processors, where lots of simpler, less powerful cores are used to build a high performance computer. In the case of the V1000, the more granular storage model improves random access performance and energy efficiency at the same time. Efficiency is also increased by the system software, which manages data placement on the disks in order to optimize the seek operations.
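As a rough sketch of the parallelization argument, the snippet below compares the aggregate random-access throughput of a few large drives against many small ones. The drive counts and per-drive IOPS figures are hypothetical (the article does not give them), and controller overhead is ignored.

```python
# Rough sketch of the "many small drives" idea. The drive counts and per-drive
# IOPS figures are hypothetical (the article does not give them), and controller
# overhead is ignored.

def array_iops(drive_count, iops_per_drive):
    """Aggregate random-access IOPS, assuming requests spread evenly across drives."""
    return drive_count * iops_per_drive

few_big = array_iops(drive_count=48, iops_per_drive=120)     # 3.5-inch enterprise SATA
many_small = array_iops(drive_count=160, iops_per_drive=70)  # 2.5-inch mobile SATA

print(f"48 large drives:  {few_big:,} IOPS")
print(f"160 small drives: {many_small:,} IOPS")
```

Spreading requests across more, smaller spindles raises aggregate random-access throughput, which mirrors the manycore analogy in the paragraph above.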

The Atrato engineers had to overcome a number of drawbacks of mobile-class drives to make the system reliable. In general, these devices are susceptible to rotational vibration; when they are packed too close together, drive performance can drop by 70 percent or more. Heat and signal integrity can also become problems in dense packaging. The engineers designed proprietary drive packaging that circumvents these issues. According to McCormick, this special packaging has allowed Atrato to derive enterprise-class performance from mobile-class hardware.

The company guarantees three years of maintenance-free operation for the enclosure, with no disk replacements required. By contrast, in a conventional enterprise setup, when a drive fails, a light blinks on the front panel and a call goes out for somebody to come perform the drive replacement (hopefully the technician pulls the right one and doesn’t bring the system down in the process). In most cases, when the offending drive is sent back to the factory, no problem is detected. “We take that same process and move it inside the box,” says McCormick.

Within the SAID enclosure is a virtual spare — extra capacity that is ready in case of a drive failure. In fact, at any given time, the system is replicating data from 15 to 20 of the most suspect drives. So when a failure occurs, the drive is taken off-line and put in the “drive hospital.” Diagnostics are used to determine what’s wrong. In many cases, the error can be isolated and the drive can be put back online. During that time, the user is unaware anything has happened, since there has been no interruption of service or performance hit.
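The paragraph above describes that flow only in prose; the following is a purely hypothetical Python sketch of it. The classes, thresholds and health model are invented for illustration and are not Atrato’s implementation.

```python
import random

# Hypothetical sketch of the "drive hospital" flow described above. The classes,
# thresholds, and health model are invented for illustration; this is not
# Atrato's implementation.

class Drive:
    """Minimal stand-in for a disk; fields and thresholds are invented for the sketch."""
    def __init__(self, drive_id):
        self.drive_id = drive_id
        self.failure_risk = random.random()   # stand-in for SMART-style health data
        self.online = True

def diagnose_and_repair(drive):
    # Placeholder "drive hospital" diagnostics; per the article, the error can
    # often be isolated and the drive returned to service.
    return random.random() < 0.8

def manage(drives, suspect_count=20):
    # Pre-replicate data from the 15-20 drives judged most likely to fail.
    suspects = sorted(drives, key=lambda d: d.failure_risk, reverse=True)[:suspect_count]
    print(f"replicating data from {len(suspects)} suspect drives")

    for drive in drives:
        if drive.failure_risk > 0.95:         # arbitrary failure threshold
            drive.online = False              # service continues from the replica
            if diagnose_and_repair(drive):
                drive.online = True           # error isolated: back online
            # otherwise the drive stays offline and the software works around it

manage([Drive(i) for i in range(160)])
```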

By anticipating drive failure, McCormick says, Atrato has been able to eliminate any single point of failure. Even if the hardware is beyond redemption, the drive simply remains offline for the life of the product and the software works around it. The system employs a variety of RAID levels (RAID 5, 6, 10 and 50) as well as its predictive rebuild technology to support this level of reliability. Based on testing by Atrato engineers, the company says it has empirically modeled a three- to five-year window over which the product sustains its performance and reliability.

As in any redundant storage system, a certain amount of capacity has to be sacrificed. The system lets users configure how much capacity to trade for a given level of reliability. At the high end, McCormick says, as much as 80 to 90 percent of the raw storage capacity is available to the user. More conservative users can drive the usable capacity down to 50 percent or even lower if they choose to maintain the highest levels of reliability. For many customers this is a reasonable tradeoff, since storage capacity is cheap and getting cheaper, while drive maintenance costs are heading in the opposite direction.
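The capacity math behind that range is straightforward; the snippet below illustrates it for a 50 TB raw enclosure under a few redundancy schemes. The specific layouts and spare reservations are assumptions made for illustration, not Atrato’s actual configuration options.

```python
# Illustrative usable-capacity arithmetic on 50 TB raw. The specific layouts and
# spare reservations below are assumptions, not Atrato's actual configurations.

raw_tb = 50

configs = {
    "RAID 5 (8+1), no extra spares":   8 / 9,           # ~89% usable
    "RAID 6 (8+2), 5% virtual spare":  (8 / 10) * 0.95,  # ~76% usable
    "RAID 10, 10% virtual spare":      0.5 * 0.90,       # ~45% usable
}

for name, fraction in configs.items():
    print(f"{name}: {raw_tb * fraction:.1f} TB usable ({fraction:.0%})")
```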

Atrato’s initial customers are likely to be users with strict performance and power requirements and/or a need for maintenance-free operation. The first customer announced this week is SRC Computers, which has integrated the V1000 into a system for a government-sector customer that needed high levels of random access performance. That SRC system achieves 20,000 IOPS with 14 terabytes of usable capacity.

Atrato is not announcing its pricing at this point, but McCormick says that a 20 terabyte (raw capacity) system starts somewhere in the $150K range. There are certainly less expensive storage systems out there on a price/gigabyte basis (based on high-capacity 3.5 inch SATA), but on a dollar/IOPS basis, the highly parallelized Atrato architecture gives the V1000 the edge.
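The price/gigabyte versus dollar/IOPS distinction can be shown with rough arithmetic. The V1000 numbers below come from the article ($150K for 20 TB raw; the 11,000 IOPS figure is the V1000’s quoted rating and may not apply exactly to a 20 TB configuration), while the conventional-array numbers are entirely hypothetical.

```python
# Rough price-metric comparison. V1000 figures are from the article; the
# "conventional array" numbers are purely hypothetical for illustration.

systems = {
    "Atrato V1000 (20 TB raw)": {"price": 150_000, "tb": 20, "iops": 11_000},
    "Hypothetical 3.5-inch SATA array": {"price": 60_000, "tb": 40, "iops": 3_000},
}

for name, s in systems.items():
    per_gb = s["price"] / (s["tb"] * 1000)
    per_iops = s["price"] / s["iops"]
    print(f"{name}: ${per_gb:.2f}/GB, ${per_iops:.2f}/IOPS")
```

Under these assumed numbers the conventional array wins on cost per gigabyte while the V1000 wins on cost per IOPS, which is the tradeoff the article describes.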

In fact, the company isn’t going head-to-head against mainstream enterprise storage solutions. The largest-capacity systems tend to be used by applications that don’t need extreme levels of I/O performance or that access data sequentially on disk. Atrato’s niche is where near-instantaneous data transactions are required. McCormick sees his competition as the emerging technologies of flash disk and RAM-based external storage. At this point, he thinks those technologies are not quite ready for prime time because of a combination of price, performance and reliability issues. But, he says, when solid state drives make sense, Atrato will be happy to bring them into its product line.
