HPCwire

Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them

SDSC Announces HPC Storage Cloud


Web-based system offers high durability, security, and speed for diverse user base

SAN DIEGO, September 22 -- The San Diego Supercomputer Center (SDSC) at the University of California, San Diego, today announced the launch of what is believed to be the largest academic-based cloud storage system in the U.S., specifically designed for researchers, students, academics, and industry users who require stable, secure, and cost-effective storage and sharing of digital information, including extremely large data sets.

“We believe that the SDSC Cloud may well revolutionize how data is preserved and shared among researchers, especially massive datasets that are becoming more prevalent in this new era of data-intensive research and computing,” said Michael Norman, director of SDSC. “The SDSC Cloud goes a long way toward meeting federal data sharing requirements, since every data object has a unique URL and could be accessed over the Web.”

SDSC’s new Web-based system is 100% disk-based and interconnected by high-speed 10 gigabit Ethernet switching technology, providing extremely fast read and write performance. With an initial raw capacity of 5.5 petabytes – one petabyte equals one quadrillion bytes of storage capacity, or the equivalent of about 250 billion pages of text – the SDSC Cloud has sustained read rates of 8 to 10 gigabytes (GB) per second that will continually improve as more nodes and storage are added. That’s akin to reading all the contents of a 250GB laptop drive in about 30 seconds.
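The laptop-drive comparison is a simple size-over-rate calculation. The short Python sketch below reproduces it for the quoted 8 to 10 GB/s range; the function name is illustrative only, and decimal units (1 GB = 10^9 bytes) are assumed.

```python
def transfer_seconds(size_gb: float, rate_gb_per_s: float) -> float:
    """Seconds needed to read size_gb gigabytes at a sustained rate in GB/s."""
    if rate_gb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_gb / rate_gb_per_s


LAPTOP_DRIVE_GB = 250  # drive size used in the article's comparison

if __name__ == "__main__":
    for rate in (8, 10):  # sustained read rates quoted in the article, in GB/s
        seconds = transfer_seconds(LAPTOP_DRIVE_GB, rate)
        print(f"{rate} GB/s: {seconds:.2f} s")
```

At the low end of the quoted range the read takes slightly more than 30 seconds, and at the high end it takes 25 seconds.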

Moreover, the SDSC Cloud is scalable by orders of magnitude to hundreds of petabytes, with aggregate performance and capacity both scaling almost linearly with growth. Full details about the new SDSC Cloud can be found at http://cloud.sdsc.edu.

Conceived in planning for UC San Diego’s campus Research Cyberinfrastructure (RCI) project, the initiative quickly grew in scope and partners as many saw the technology as functionally revolutionary and cost effective for their needs. At launch, users and research partners include, among others, UC San Diego’s Libraries, School of Medicine, Rady School of Management, Jacobs School of Engineering, and SDSC researchers, as well as federally funded research projects from the National Science Foundation, National Institutes of Health, and Centers for Medicare and Medicaid Services.

“The SDSC Cloud marks a paradigm shift in how we think about long-term storage,” said Richard Moore, SDSC’s deputy director. “We are shifting from the ‘write once and read never’ model of archival data, to one that says ‘if you think your data is important, then it should be readily accessible and shared with the broader community.’”

“UC San Diego is one of the most data-centric universities in the country, so our goal was to develop a centralized, scalable data storage system designed to meet performance, functionality, and capacity needs of our researchers and partners across the country, and to evolve and scale with the needs of the scientific community,” said Dallas Thornton, SDSC’s division director of cyberinfrastructure services. “Developing this resource in-house atop the OpenStack platform allows for highly-capable and flexible, yet extremely cost-effective solutions for our researchers.”

OpenStack is a scalable, open source cloud operating system jointly launched in July 2010 by NASA and Rackspace Hosting. It now powers some of the largest public and private cloud computing services.

Durability and Security

Data stored in SDSC’s new cloud is instantly written to multiple independent storage servers, and stored data is validated for consistency on a round-the-clock basis. “This leads to very high levels of data durability, availability, and performance, all of which are of paramount importance to researchers and research organizations,” said Ron Joyce, SDSC’s associate director of IT infrastructure and a key architect of the system.
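As a rough illustration of the idea of writing each object to several independent servers and re-validating the copies, here is a toy Python model. It is not SDSC’s or OpenStack’s implementation: the three in-memory dictionaries standing in for storage servers, the SHA-256 checksum, and the function names are all assumptions made for the sketch.

```python
from __future__ import annotations

import hashlib


def checksum(data: bytes) -> str:
    """Content fingerprint used to compare replicas."""
    return hashlib.sha256(data).hexdigest()


def write_replicated(servers: list[dict], key: str, data: bytes) -> str:
    """Store the same object on every server and return its checksum."""
    for server in servers:
        server[key] = data
    return checksum(data)


def audit(servers: list[dict], key: str, expected: str) -> list[int]:
    """Return indexes of servers whose copy is missing or does not match."""
    return [
        i
        for i, server in enumerate(servers)
        if key not in server or checksum(server[key]) != expected
    ]


servers: list[dict] = [{}, {}, {}]  # hypothetical: three replicas
digest = write_replicated(servers, "survey/tile-001", b"example bytes")
servers[1]["survey/tile-001"] = b"exampel bytes"  # simulate silent corruption
damaged = audit(servers, "survey/tile-001", digest)
print("replicas needing repair:", damaged)
```

A production system would repair a damaged replica from a healthy copy. The sketch only reports which copies fail the check.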

The SDSC Cloud leverages the infrastructure designed for a high-performance parallel file system by using two Arista Networks 7508 switches, providing 768 total 10 gigabit (Gb) Ethernet ports for more than 10Tbit/s of non-blocking, IP-based connectivity.  The switches are configured using multi-chassis link aggregation (MLAG) for both performance and failover.
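The port count can be turned into an aggregate bandwidth figure with a little arithmetic. The Python sketch below is only a back-of-the-envelope check: the article does not say how the "more than 10Tbit/s" figure was derived, so counting both directions of each full-duplex port is an assumption here, not a stated fact.

```python
TOTAL_PORTS = 768    # total 10 Gb Ethernet ports across the two switches
PORT_RATE_GBPS = 10  # line rate of each port, in gigabits per second


def aggregate_tbps(ports: int, rate_gbps: float, full_duplex: bool = False) -> float:
    """Sum of port line rates in Tbit/s; doubled if both directions are counted."""
    total = ports * rate_gbps / 1000
    return total * 2 if full_duplex else total


if __name__ == "__main__":
    one_way = aggregate_tbps(TOTAL_PORTS, PORT_RATE_GBPS)
    both_ways = aggregate_tbps(TOTAL_PORTS, PORT_RATE_GBPS, full_duplex=True)
    print(f"one direction: {one_way:.2f} Tbit/s")
    print(f"both directions: {both_ways:.2f} Tbit/s")
```

Under this assumption the one-direction sum is 7.68 Tbit/s and the two-direction sum is 15.36 Tbit/s, so the quoted figure is plausible only if traffic in both directions, or switch fabric capacity, is what was counted.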

“This network configuration allows us to unshackle extreme-scale/extreme-performance storage from individual clusters and instead make data available at unprecedented speeds across our university campus and beyond,” said Philip Papadopoulos, SDSC’s division director of UC systems. “In addition to incredibly fast data transmission speeds, our goal was to build a high-performance storage system right from the start that was completely scalable to meet the evolving needs and requirements of the campus, as well as those within industry and government.”

The environment also provides high-bandwidth wide-area network connectivity to users and partners thanks to multiple 10Gb connections to CENIC (Corporation for Education Network Initiatives in California), ESnet (Energy Sciences Network), and XSEDE (Extreme Science and Engineering Discovery Environment). This allows huge amounts of data, such as sky surveys or mapping of the human genome, to be rapidly transported to and from the SDSC Cloud simultaneously.

In addition to large storage capacity and high-speed transmissions, the SDSC Cloud provides:

  • Cost advantages: Standard “on-demand” storage costs start at only $3.25 a month per 100GB of storage, and there are no I/O networking charges. A “condo” option, which allows users to make a cost-effective long-term investment in hardware that becomes part of the SDSC Cloud, is also available. Users will soon have the option to have additional copies of their data stored offsite at UC Berkeley, one of SDSC’s partners in the project.
  • Anywhere, anytime accessibility and wide compatibility: Every data file is given a persistent URL, making the system ideal for data sharing such as library or institutional collections. Access permissions can be set by the data owner, allowing a full spectrum of options from private to open access. The HTTP-based SDSC Cloud supports the Rackspace Swift and Amazon S3 APIs and is accessible from any web browser, as well as from clients for Windows, OS X, UNIX, and mobile devices. Users can also write applications that directly interact with the SDSC Cloud.
  • Enhanced security: Users set their own access/privacy levels. Users know and can coordinate precisely where their data is stored in the cloud, including replicated copies. In addition, a HIPAA- and FISMA-compliant storage option launches on October 1st in partnership with the Integrating Data for Analysis, Anonymization and SHaring (iDASH) program at UC San Diego, a National Center for Biomedical Computing (NCBC) project funded in 2010 under the NIH Roadmap for Bioinformatics and Computational Biology.
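Two points from the list above lend themselves to a small Python sketch: the on-demand price and the per-object URL. The price comes from the announcement, but pro-rata billing is an assumption. The URL layout follows the general Swift convention of endpoint, `/v1/`, account, container, and object name; the host, account, and container shown are hypothetical placeholders, not real SDSC Cloud values.

```python
from urllib.parse import quote

PRICE_PER_100GB_MONTH = 3.25  # on-demand price quoted in the announcement (USD)


def monthly_cost(gigabytes: float) -> float:
    """Estimated monthly charge in USD, assuming pro-rata billing (assumption)."""
    return round(gigabytes / 100 * PRICE_PER_100GB_MONTH, 2)


def object_url(endpoint: str, account: str, container: str, name: str) -> str:
    """Build a Swift-style object URL: <endpoint>/v1/<account>/<container>/<object>."""
    parts = [
        quote(account, safe=""),
        quote(container, safe=""),
        quote(name, safe="/"),  # keep "/" so pseudo-folders stay readable
    ]
    return endpoint.rstrip("/") + "/v1/" + "/".join(parts)


if __name__ == "__main__":
    print(monthly_cost(1000))  # one terabyte of on-demand storage
    # Hypothetical endpoint, account, and container, for illustration only.
    print(object_url("https://storage.example.org", "AUTH_demo",
                     "sky-survey", "tiles/field 01.fits"))
```

Percent-encoding the object name keeps the URL valid when file names contain spaces or other reserved characters.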

Working in Tandem with Other SDSC Storage Systems

The SDSC Cloud is configured to work in tandem with other innovative storage technologies at the supercomputer center. One is the Data Oasis system, a Lustre-based parallel file system designed primarily for high-performance, low-latency scratch and medium-term project storage, ideal for researchers conducting data-intensive operations on SDSC’s Triton, Trestles, and Dash high-performance computing (HPC) systems.

SDSC’s Data Oasis is currently capable of speeds of 50GB/s, meaning that researchers can today retrieve a terabyte of data – or one trillion bytes – in about 20 seconds. By early 2012, Data Oasis will be expanded to serve SDSC’s Gordon, the first supercomputer within the HPC community focused on integrating large amounts of flash-based SSD (solid state drive) memory. As Gordon enters production in January 2012, SDSC will double the speed of Data Oasis to 100GB/s, making it one of the fastest parallel file systems in the academic research community. While Data Oasis is used for in-process HPC storage, the SDSC Cloud is designed to accommodate storage needs before and after computation, delivering durable, secure storage that can be shared within SDSC or across the country with ease.

About SDSC

As an Organized Research Unit of UC San Diego, SDSC is considered a leader in data-intensive computing and cyberinfrastructure, providing resources, services, and expertise to the national research community including industry and academia. Cyberinfrastructure refers to an accessible and integrated network of computer-based resources and expertise, focused on accelerating scientific inquiry and discovery. SDSC supports hundreds of multidisciplinary programs spanning a wide variety of domains, from earth sciences to biology to astrophysics to bioinformatics and health IT. With its two newest supercomputer systems, Trestles and the soon-to-be-launched Gordon, SDSC is a partner in XSEDE (Extreme Science and Engineering Discovery Environment), the most advanced collection of integrated digital resources and services in the world.

Related Links

SDSC: http://www.sdsc.edu/

SDSC Cloud: http://cloud.sdsc.edu

UC San Diego: http://www.ucsd.edu/

Arista Networks: http://www.aristanetworks.com/

OpenStack: http://www.openstack.org/

-----

Source: SDSC
