August 12, 2009
Linux cluster maker Penguin Computing hopped on the HPC-in-a-cloud bandwagon this week with the announcement of its HPC on-demand service. Called Penguin On Demand (POD), the service consists of an HPC compute infrastructure whose capacity can be rented on a pay-as-you-go basis or through a monthly subscription.
As it exists today, the POD infrastructure consists of 1200 Xeon cores spread over a number of clusters at a single facility. Penguin offers a choice of GigE or DDR InfiniBand interconnects and the option to tap into NVIDIA Tesla GPU computing hardware. By cloud standards the number of cores is tiny. But since Penguin also sells systems for a living, it would be relatively easy for them to scale up the infrastructure rather quickly if customer demand warranted additional capacity.
According to Penguin, the on-demand facility has sufficient bandwidth to allow the transfer of reasonably large data files directly to POD over the Internet. The company also offers a "disk caddy" service that allows the transfer of 1 TB+ files overnight. The disks are provided as part of the service, are owned by the customer, and are returned once the data has been transferred to POD storage.
The software stack consists of CentOS, a community-supported OS based on Red Hat Enterprise Linux, as well as the company's Scyld ClusterWare cluster management software. "Scyld enables us to rapidly provision a set of compute nodes for our customers based on their demand -- so we can scale up and scale down efficiently," says Penguin Computing CEO Charles Wuischpard.
Penguin is aiming POD at a variety of HPC verticals. According to Wuischpard, the initial interest came from the life sciences sector, but the company has recently seen interest from a number of Fortune 500 manufacturing companies and some smaller hedge fund firms.
Users with in-house Penguin systems can get access to the POD service via the Scyld software suite. Since Scyld ClusterWare includes TORQUE and offers a scheduling package called TaskMaster, policies in the scheduling software can be set such that when a particular threshold is reached, jobs submitted on the local resource are automatically redirected to the POD system.
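To make the overflow idea concrete, here is a minimal sketch of a threshold-based routing policy. It is not TORQUE or TaskMaster configuration, and none of the names come from Penguin's software: `Job`, `route_jobs`, and the 80 percent threshold are hypothetical, chosen only to illustrate the logic of sending work to a remote resource once the local cluster fills up.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cores: int  # cores the job requests


def route_jobs(jobs, local_cores, threshold=0.8):
    """Place jobs locally until the utilization threshold would be exceeded.

    Hypothetical policy: a job runs on the local cluster if, after adding it,
    the cores in use stay at or below `threshold * local_cores`; otherwise it
    is redirected to the on-demand resource (labeled "pod" here).
    """
    limit = local_cores * threshold
    used = 0
    placement = {}
    for job in jobs:
        if used + job.cores <= limit:
            used += job.cores
            placement[job.name] = "local"
        else:
            placement[job.name] = "pod"
    return placement


# Example queue on an assumed 100-core local cluster.
queue = [Job("align", 32), Job("cfd", 40), Job("risk", 16)]
placement = route_jobs(queue, local_cores=100, threshold=0.8)
```

A real scheduler would also account for jobs finishing, priorities, and data locality; the sketch only captures the threshold test described above.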
Unlike generic cloud computing set-ups like Amazon's EC2, user applications run directly on the compute nodes without virtualization in order to maximize performance. "POD is geared strictly towards applications that thrive in an HPC environment and would otherwise be starved for performance on a virtualized cloud computing environment," explains Wuischpard.
In that respect, it's not really a cloud in the classic sense (if there is such a thing), but rather a dedicated infrastructure built for on-demand HPC. In fact, the model used by Penguin is the same as most HPC on-demand offerings, such as IBM's Computing On Demand service and R Systems' dedicated hosting service. Thus far, a virtualized purpose-built HPC cloud with elastic capacity has yet to appear.
At the hardware level, the biggest criticism of general-purpose clouds is that they lack the low-latency interconnects so important to tightly coupled MPI applications. As pointed out recently by Ian Foster, for short-running HPC applications this may not be much of an issue. But for codes expected to execute for hours, days, or even longer, fast server-to-server communication is all but mandatory. Since at least some of the POD hardware includes InfiniBand-equipped servers, the service offers this natural advantage.
Setting up a POD account requires some initial hand-holding with Penguin technical staff. They will help set up the compute environment, explain the account management features, and answer any questions. After that, the POD service can be accessed via SSH to run user applications directly. If a customer requires more assistance, Penguin techies are available (via their Customer Portal) to help with issues that might come up or to help users squeeze more performance from user codes.
According to Penguin, their on-demand service is priced to provide a significant improvement in price-performance for HPC applications when compared to running on traditional cloud computing offerings. (The implication is that you will pay more per CPU-hour than for, say, EC2, but better performance will more than offset the price premium.) "Users pay only for the core hours that they use," says Wuischpard. "Monthly contracts are available, which provide for a reduction in the average cost per core hour. And yes, we do have the concept of 'roll-over' hours!"
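The price-performance argument comes down to simple arithmetic: a higher rate per core-hour is still cheaper overall if the job finishes enough faster. The sketch below works through that with invented numbers. The rates, core count, and speedup are assumptions for illustration only and do not reflect actual Penguin or Amazon pricing.

```python
def job_cost(rate_per_core_hour, cores, hours):
    """Total cost of a job billed per core-hour."""
    return rate_per_core_hour * cores * hours


def break_even_speedup(premium_rate, baseline_rate):
    """Speedup at which the pricier service costs the same as the baseline.

    Any speedup above this ratio makes the premium service cheaper per job.
    """
    return premium_rate / baseline_rate


# All figures below are hypothetical.
BASELINE_RATE = 0.10   # dollars per core-hour on a virtualized cloud
PREMIUM_RATE = 0.25    # dollars per core-hour on a bare-metal HPC service
CORES = 64
BASELINE_HOURS = 10.0  # runtime on the virtualized cloud
SPEEDUP = 4.0          # assumed speedup from bare metal plus a fast interconnect

baseline_cost = job_cost(BASELINE_RATE, CORES, BASELINE_HOURS)
premium_cost = job_cost(PREMIUM_RATE, CORES, BASELINE_HOURS / SPEEDUP)
needed_speedup = break_even_speedup(PREMIUM_RATE, BASELINE_RATE)
```

With these assumed figures, the premium service pays off because the speedup (4x) exceeds the rate ratio (2.5x). A tightly coupled code that gains less than 2.5x would cost more on the pricier service.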
At this point, Penguin is not offering SLAs or QoS guarantees in the general offering. But, according to Wuischpard, these could be implemented if a customer has such a requirement. He says they do guarantee that if a job fails because of a POD hardware failure, then it can be rerun at no cost.
From a business point of view, the OEM-as-cloud-provider will be an interesting model to follow. If margins continue to shrink on commodity-based clusters, selling compute on-demand services may offer a natural way to tap into new revenue streams. As pointed out by many cloud gazers, the largest compute utility today is essentially being run out of the back of a bookstore. Renting CPU cycles from a system vendor would seem at least as reasonable.