June 6, 2013

OpenStack and the SDSC Research Cloud

Ian Armas Foster

The San Diego Supercomputer Center launched a public cloud system for universities in the area designed specifically to run on commodity hardware with high-performance solid-state drives. The center, which currently holds 5.5 PB of raw storage, is open to educational and research users in the University of California. But perhaps the most notable feature of the system is that it is built on OpenStack Swift object-based storage, important for the research facilities SDSC plans to serve.

Object-based storage is nothing new for cloud providers; indeed, it is the foundation for stores built by Amazon, Google, and Microsoft. However, the fact that an open source initiative like OpenStack will form the backbone of this cloud-based research network offers cost and customization advantages. That customization includes support for both the Rackspace/Swift API and the standard Amazon Simple Storage Service (S3) toolsets.

“My group is a group of three,” said SDSC Storage Platform Manager Steve Meier on the advantages of an open source platform permeating the research network. “We manage the infrastructure. We do some development and keep things going, but we didn’t have a large team to build and support clients and run the infrastructure as well. If you have users that are currently using S3, and they have scripts or command-line clients or other ways to manipulate their data upload, download, search, theoretically they could now point that tool at SDSC’s cloud storage and it would just work.”
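To make the dual-API point concrete, here is a minimal sketch of how the same stored object might be addressed under the two conventions: a Swift-style path of the form /v1/account/container/object and an S3 path-style URL of the form /bucket/key. The host name, account, container, and object names are hypothetical placeholders, not SDSC's actual endpoints, and the helper functions are illustrative only.

```python
from urllib.parse import quote


def swift_url(host, account, container, obj):
    """Build a Swift-style object URL: /v1/<account>/<container>/<object>."""
    # quote() leaves "/" unescaped by default, so pseudo-folders in the
    # object name survive while spaces and similar characters are escaped.
    return f"https://{host}/v1/{quote(account)}/{quote(container)}/{quote(obj)}"


def s3_path_style_url(host, bucket, key):
    """Build an S3 path-style object URL: /<bucket>/<key>."""
    return f"https://{host}/{quote(bucket)}/{quote(key)}"


if __name__ == "__main__":
    # All names below are hypothetical placeholders for illustration.
    host = "cloud.example.edu"
    print(swift_url(host, "AUTH_lab", "genomes", "run1/reads.fastq"))
    print(s3_path_style_url(host, "genomes", "run1/reads.fastq"))
```

In this sketch the Swift container and the S3 bucket play the same role, which is why an existing S3 script could in principle be redirected by changing only the endpoint it talks to.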

The system was born out of SDSC’s move away from tape-based storage, a medium many institutions use for long-term data retention. However, tape, from which data is slow and expensive to recover, does not jibe well with the realization by research institutions that all data could be useful data.

“The best use case for tape is ‘write once, read never,’” Meier said. “Our researchers archive and look at data more often. When you have lots of accesses coming from reading back, and then [you have to] keep up with all the writes, there are additional costs to have enough hardware resources to also validate the tapes. All of those considerations made tape an expensive technology for us… With object storage, you can use relatively cheap hardware. You can spread your investment out.”

The inexpensive hardware to which Meier referred includes 14 Aberdeen x539 storage servers each equipped with 24 Hitachi 2 TB near-line SAS drives.

The promise is that SDSC’s cloud-based research network will be relatively inexpensive, coming in at a set rate of $0.0325 per month for a GB of storage. That translates to $32.50 per TB per month or $390 per TB per year.
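The arithmetic behind those figures can be checked with a short sketch. It assumes decimal units (1 TB = 1,000 GB), which is what the quoted numbers imply, and a flat rate with no other fees; the function names are illustrative, not part of any SDSC tool.

```python
from decimal import Decimal

# Rate quoted in the article, in US dollars per GB per month.
RATE_PER_GB_MONTH = Decimal("0.0325")

# Assumption: decimal terabytes, consistent with the article's figures.
GB_PER_TB = 1000


def monthly_cost(gigabytes):
    """Return the monthly storage cost in dollars for a given number of GB."""
    return RATE_PER_GB_MONTH * Decimal(gigabytes)


def yearly_cost(gigabytes):
    """Return the cost of storing the same amount for twelve months."""
    return monthly_cost(gigabytes) * 12


if __name__ == "__main__":
    print(f"1 TB per month: ${monthly_cost(GB_PER_TB):.2f}")
    print(f"1 TB per year:  ${yearly_cost(GB_PER_TB):.2f}")
```

Decimal is used instead of floating point so that the currency amounts come out exact.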

Meier’s hope was not to compete with major cloud providers like Amazon and Google but rather to build an inexpensive option for researchers. “We never intended to directly compete [with major cloud providers]. As a non-profit, that’s not our charter,” Meier said. “Our competition was to try to come up with technology that gave our researchers competitive advantages to get grants and have technologies that they could use to help further their research.”

