Perspective — Earlier in the year, Samsung announced its 30.72TB drive, positioning it as an enterprise SSD which, alongside the huge capacity, offers around four times the read and three times the write performance of its consumer SSDs. But at a price point of between $10,000 and $20,000, who would actually use them?
Clearly these drives are targeted at organisations with significant budgets, so how do you take best advantage of the largest-capacity drives when yours is more modest?
Bigger is better in the storage industry – we always want more of it. Many organisations choose larger drives, and usually stick to traditional hard disk drives because of the cost implications: the alternative, SSDs, is both more expensive and more limited in capacity.
Whilst it’s great having 30TB of capacity, what is often ignored is how this amount of storage affects performance. If a customer requests a certain amount of storage capacity but also needs performance above a particular rate, you have to consider that most traditional hard drives peak at around 300MB/s. As you put bigger and bigger drives into a system, you reduce the number of drives required to meet the capacity target. Inadvertently, this decreases the performance you can get out of the system, triggering the need to buy more capacity than required just to attain the required performance figure.
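As a rough illustration of this trade-off, the sketch below sizes a system against both a capacity and a throughput target. The 300MB/s per-drive figure is the article's; the 600TB / 30,000MB/s requirement and the function itself are hypothetical, and RAID overheads are ignored:

```python
import math

def drives_needed(capacity_tb, throughput_mbs, drive_tb, drive_mbs=300):
    """Drives required to satisfy BOTH a capacity and a throughput target.

    Assumes each drive sustains roughly drive_mbs MB/s (the ~300MB/s
    peak for traditional hard disks) and ignores RAID overhead.
    """
    for_capacity = math.ceil(capacity_tb / drive_tb)
    for_throughput = math.ceil(throughput_mbs / drive_mbs)
    return max(for_capacity, for_throughput)

# A hypothetical 600TB, 30,000MB/s requirement:
print(drives_needed(600, 30_000, drive_tb=4))   # 150 drives: capacity-bound
print(drives_needed(600, 30_000, drive_tb=30))  # 100 drives: throughput-bound
```

With 30TB drives the throughput target forces 100 spindles, which is 3,000TB of raw capacity – five times what was actually asked for, purchased only to hit the performance figure.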
People often fail to acknowledge that the larger the drive, the more data is at risk should it fail. This is true whether the medium is SSD, tape or hard disk: a single failure could lose all of the data on that particular drive.
Traditional RAID (redundant array of independent disks) technologies haven’t really moved on since they were first developed in the 1980s. Many industries are still using RAID 6, which tolerates two disk failures within the RAID set before any data is lost. However, given real-world failure rates and rebuild times, you are limited by the number of drives in that particular RAID group, and by how quickly those drives can rebuild a missing drive and its data.
As drive capacities continue to grow at an exponential rate, rebuilds will take much longer. It already takes days to rebuild drives at current capacities, so with drives of around 30TB it could take over a week to reconstruct a failed drive. Such a long recovery window increases the risk that another drive in the RAID group fails before the rebuild completes.
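The week-long figure follows from simple arithmetic. The sketch below divides drive capacity by a sustained rebuild rate; the 50MB/s "realistic" rate is an assumption standing in for an array that is still serving production I/O during the rebuild:

```python
def rebuild_days(capacity_tb, rebuild_mbs):
    """Naive lower bound on rebuild time: capacity / sustained rebuild rate."""
    seconds = capacity_tb * 1e12 / (rebuild_mbs * 1e6)
    return seconds / 86400

# At a drive's full ~300MB/s (rarely achievable under production load):
print(f"{rebuild_days(30, 300):.1f} days")  # ~1.2 days
# At an assumed 50MB/s while the array serves other I/O:
print(f"{rebuild_days(30, 50):.1f} days")   # ~6.9 days
```

Even the best case is more than a day of degraded redundancy; under load, a 30TB rebuild approaches the week the article warns about.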
These challenges started to be addressed a few years ago in HPC and the cloud: rather than using traditional RAID, organisations are using de-clustered arrays, which place many more drives into the same pool and distribute data more widely across them. This lessens the impact of a drive failure, since only a proportion of the data is affected rather than its entirety. It also allows part of the missing data to be rebuilt before a drive fails completely, and lets all drives participate in the reconstruction when a single drive does fail.
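A toy model shows why all-drives-participate matters. In the sketch below, each surviving drive in the pool contributes a share of rebuild bandwidth; the 50MB/s per-drive contribution and the 100-drive pool size are assumptions, and real de-clustered implementations are considerably more nuanced:

```python
def declustered_rebuild_days(capacity_tb, participants, per_drive_mbs=50):
    """Toy model: every surviving drive contributes per_drive_mbs of
    rebuild bandwidth, so reconstruction speeds up with pool size."""
    seconds = capacity_tb * 1e12 / (participants * per_drive_mbs * 1e6)
    return seconds / 86400

# Classic rebuild onto a single hot spare vs an assumed 100-drive pool:
print(f"{declustered_rebuild_days(30, 1):.1f} days")    # ~6.9 days
print(f"{declustered_rebuild_days(30, 100):.2f} days")  # ~0.07 days (~100 min)
```

The rebuild that took nearly a week with a single spare shrinks to the order of an hour or two when a hundred drives share the work, which is the core appeal of the de-clustered approach.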
Another noticeable difference in how storage systems are being created and utilised is the convergence of compute and storage. With the availability of fast network interconnects, it has become possible to populate individual compute nodes with large-capacity drives and have them participate in the storage subsystem. This allows practically linear scaling of both storage capacity and performance each time you scale your compute.
In traditional HPC this newer approach hasn’t quite caught on yet, and separate storage and compute elements are still the norm. Cloud platforms, by contrast, are becoming more converged, so the failure of an individual component matters less. When running HPC on premise, individual components matter more, particularly storage components under traditional RAID. The arrival of 30TB drives, and the jump in capacity they bring, will push the HPC market to look more seriously at de-clustered arrays, which allow faster rebuild times in the event of a drive failure.
We’ve seen this recently with IBM, Lenovo and NetApp, all offering their own versions of de-clustered array products. For organisations looking for larger capacity on a budget, these will be the more realistic option.