Spring Brings Bloom of New Offerings

By Michael Feldman

April 4, 2008

HPC vendors seem to have awakened from their winter slumber. A trio of notable products were released into the spring sunshine this week: the first QDR (40 Gbps) InfiniBand adapter from Mellanox; an on-demand HPC development platform from Interactive Supercomputing; and, from newcomer ScaleMP, a flash module that aggregates x86 servers into a virtual SMP.

QDR Adapters First, Switches to Follow

Mellanox introduced its new QDR InfiniBand host channel adapters on Tuesday. The dual-port ConnectX IB 40 Gbps HCAs require PCI Express 2.0 (PCIe Gen2) to take advantage of the increased bandwidth that comes with QDR. Mellanox is claiming 6460 MB/s (bidirectional) MPI application bandwidth and sub-microsecond latencies. Higher-level protocols get a speed bump as well: using IP over InfiniBand (IPoIB), Mellanox expects to deliver close to 4000 MB/s for socket-based communications. For those keeping score at home, the max rate for 10GbE is 1200 MB/s.
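The figures above can be sanity-checked with some back-of-envelope link arithmetic. This is a sketch under assumptions that do not come from the article: a 4x QDR link (four lanes at 10 Gbps signaling each), 8b/10b line encoding on both QDR InfiniBand and PCIe Gen1/Gen2, and an x8 PCIe slot for the adapter. Under those assumptions, a Gen1 x8 slot delivers only half the QDR data rate, which is consistent with the adapters needing Gen2.

```python
# Back-of-envelope link arithmetic for the QDR figures quoted above.
# Assumptions (not from the article): 4x QDR link at 10 Gbps signaling per lane,
# 8b/10b encoding on QDR InfiniBand and PCIe Gen1/Gen2, and an x8 PCIe slot.

def data_rate_mb_s(lanes: int, gbaud_per_lane: float,
                   payload_bits: int = 8, line_bits: int = 10) -> float:
    """Usable one-direction data rate in MB/s (10**6 bytes/s) after line encoding."""
    gbit_s = lanes * gbaud_per_lane * payload_bits / line_bits
    return gbit_s * 1000 / 8

qdr_ib = data_rate_mb_s(lanes=4, gbaud_per_lane=10.0)        # QDR InfiniBand, 4x link
pcie_gen1_x8 = data_rate_mb_s(lanes=8, gbaud_per_lane=2.5)   # PCIe 1.x, x8 slot
pcie_gen2_x8 = data_rate_mb_s(lanes=8, gbaud_per_lane=5.0)   # PCIe 2.0, x8 slot

# Mellanox's claimed bidirectional MPI bandwidth vs. the bidirectional data rate.
claimed_mpi_bidir = 6460.0
mpi_efficiency = claimed_mpi_bidir / (2 * qdr_ib)

# 10GbE carries 10 Gbps of data (its 64b/66b encoding overhead sits below that).
ten_gbe_raw = 10.0 * 1000 / 8
ipoib_vs_10gbe = 4000.0 / 1200.0   # article's IPoIB estimate vs. its 10GbE maximum

print(f"QDR 4x data rate: {qdr_ib:.0f} MB/s per direction")
print(f"PCIe Gen1 x8:     {pcie_gen1_x8:.0f} MB/s per direction")
print(f"PCIe Gen2 x8:     {pcie_gen2_x8:.0f} MB/s per direction")
print(f"MPI efficiency:   {mpi_efficiency:.0%} of the bidirectional data rate")
print(f"10GbE raw: {ten_gbe_raw:.0f} MB/s; IPoIB advantage about {ipoib_vs_10gbe:.1f}x")
```

The efficiency line simply divides the quoted 6460 MB/s by twice the per-direction rate; it says nothing about where the remaining overhead goes.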

“We really see that it’s important to match the bandwidth to the processor and the memory subsystem from an I/O perspective, especially for high performance computing,” said Thad Omura, VP of product marketing at Mellanox.

That’s InfiniBand in a nutshell. And it helps explain why HPC users have been the most enthusiastic adopters of the technology. As communication performance external to the motherboard becomes more of a bottleneck, InfiniBand looks better and better.

Adapters that support both native InfiniBand connectors and QSFP are available now. Native InfiniBand connectors will run up to 5 meters over passive (copper) cables. The IBTA has recently approved the standard for running QDR InfiniBand over QSFP (Quad Small Form-factor Pluggable), a connector that provides better signaling and supports passive copper, active copper, or active optical fiber cables. Mellanox expects QSFP to increasingly become the interface of choice for 40 Gbps InfiniBand, especially as volume ramps up.

By making their HCAs available now, Mellanox is trying to get out in front of the QDR adoption cycle. Customers can start qualifying the adapters (using node-to-node configurations) with existing applications before the switches and active cabling solutions become available. Mellanox is hoping that getting the adapters out early will grease the wheels for the switches and cable gear. “It’s not a small feat to move the market to 40 Gbps,” Omura told me. “This is something we’re doing to accelerate the transition.”

Mellanox has dual motivations here, since it also produces the QDR InfiniBand switch silicon for vendors like Cisco and Voltaire. Mellanox is expecting high-density QDR switches to show up in the second half of 2008. Active copper and optical cables that can support the 40 Gbps data rates are also expected to become generally available in the same timeframe. Vendors like Gore, Zarlink, Intel, Luxtera and others have been talking up their high-bandwidth cabling solutions for over a year now.

Mellanox appears to be on a roll right now. With yearly revenue growth of 73 percent ($48.5 million in 2006 and $84.1 million in 2007), the interconnect vendor is on track to ride an aggressive QDR ramp-up. IDC recently raised its InfiniBand forecast numbers due to quicker-than-expected adoption of DDR gear in 2007. With the introduction of QDR InfiniBand on top of a rapidly maturing software stack, Mellanox expects the technology to get a quick ride to the top. The first big QDR InfiniBand systems should show up on the TOP500 list in November. By 2010, QDR should become the dominant InfiniBand speed, overtaking both SDR and DDR deployments.
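The 73 percent figure follows directly from the two annual revenue numbers quoted above; a minimal check:

```python
# Check the quoted year-over-year growth from the two revenue figures ($ millions).
revenue = {2006: 48.5, 2007: 84.1}

def yoy_growth_percent(previous: float, current: float) -> float:
    """Year-over-year growth as a percentage of the previous year's value."""
    return (current - previous) / previous * 100

growth = yoy_growth_percent(revenue[2006], revenue[2007])
print(f"2006 -> 2007 growth: {growth:.1f} percent")  # 73.4 percent
```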

Supercomputing in the Cloud

The promise of user-friendly HPC-as-a-service got a bit closer this week with a new computing on-demand offering from Interactive Supercomputing (ISC). The company announced a version of its Star-P software platform that can be accessed via the Internet. Up until now, Star-P, which allows users to easily parallelize MATLAB and Python codes, required that you owned your own supercomputer or at least knew someone who did. With the Star-P On-Demand offering, you’re now able to log on to a remote cluster and run your supercomputerized codes on a pay-per-use basis.

The company is offering this service in conjunction with its new partner Tsunamic Technologies, a Florida-based firm that provides the underlying cluster computing service. The hardware setup is a 256-core system using 2.33 GHz Intel quad-core processors on dual-socket nodes, with 8 GB of memory per node. A user can request up to 168 cores for a single Star-P job.
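The cluster description implies a few derived numbers. The sketch below works them out, assuming every node is identical (two quad-core sockets, 8 GB) and that a job occupies whole nodes; the article does not say how Star-P actually places jobs.

```python
# Sizing arithmetic for the Tsunamic cluster described above.
# Assumption: identical dual-socket quad-core nodes, jobs packed onto whole nodes.
total_cores = 256
sockets_per_node = 2
cores_per_socket = 4          # quad-core processors
memory_per_node_gb = 8
max_cores_per_job = 168

cores_per_node = sockets_per_node * cores_per_socket         # 8 cores per node
nodes = total_cores // cores_per_node                        # 32 nodes
total_memory_gb = nodes * memory_per_node_gb                 # 256 GB in aggregate
memory_per_core_gb = memory_per_node_gb / cores_per_node     # 1.0 GB per core
max_job_nodes = -(-max_cores_per_job // cores_per_node)      # ceiling division: 21 nodes

print(f"{nodes} nodes, {total_memory_gb} GB total, {memory_per_core_gb} GB/core")
print(f"largest job spans {max_job_nodes} of {nodes} nodes")
```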

The process is pretty straightforward. Users with desktop MATLAB and Python applications download the free Star-P client software. Through the client, the user runs MATLAB or Python code, instrumented with Star-P commands to parallelize the application. Behind the scenes the client uses a secure shell (ssh) to talk to the remote cluster and generates MPI code on the fly to distribute the application.

While IBM, Sun, HP and others are already offering on-demand supercomputing, users typically have to come with their own MPI codes to get any benefit. Sun’s Network.com has attracted a number of ISVs to host their specific applications on top of its utility computing network, but Interactive Supercomputing is offering the first general-purpose HPC service. The on-demand service is really an extension of what ISC has been trying to do all along: remove the barriers to supercomputing for domain experts who know little about HPC programming.

The company’s existing customers (mostly government labs and universities, financial services firms and life sciences organizations) already own HPC infrastructure, so on-demand is less of a draw here. Financial applications, in particular, are not a good fit for this model because of the sensitivity of the data. (However, algorithm development in this area could be done remotely.) Non-sensitive applications in pharmaceuticals, government, manufacturing and other industries would be good candidates for the on-demand model. It’s especially useful to small- or medium-sized organizations whose computing demands come in peaks and valleys and who have little incentive to maintain complex computing infrastructure. Up until now, these kinds of customers have had little access to supercomputing.

Potential Star-P on-demand users are able to kick the tires with a 20-hour trial account. After that, they can purchase the service for $2.77 per core-hour. If customers are going to use the service regularly, they can purchase monthly packages of core-hours in various sized bundles at discount pricing, as low as $1.35 per core-hour. They can also buy hours in bulk over longer periods of time. ISC’s business arrangement with Tsunamic is set up so that the more Star-P hours are used, the more ISC is compensated on a percentage basis.
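To put the two rates side by side, here is a small cost calculator. It assumes billing is simply cores times wall-clock hours times the per-core-hour rate, with no minimums or partial-hour rounding (the article does not spell out the billing rules), and the 64-core, 10-hour job is a hypothetical example.

```python
# Job cost under the two per-core-hour rates quoted above.
# Assumption: cost = cores * hours * rate, no minimums or partial-hour rounding.
# Rates are held in integer cents to avoid floating-point rounding errors.
FLAT_RATE_CENTS = 277         # $2.77 per core-hour
BEST_BUNDLE_RATE_CENTS = 135  # $1.35 per core-hour, lowest bundle price quoted

def job_cost_cents(cores: int, wall_hours: int, rate_cents: int) -> int:
    """Cost in cents of a job holding `cores` cores for `wall_hours` hours."""
    return cores * wall_hours * rate_cents

def dollars(cents: int) -> str:
    """Format an integer number of cents as a dollar string."""
    return f"${cents // 100:,}.{cents % 100:02d}"

# Hypothetical job: 64 cores for 10 hours = 640 core-hours.
flat = job_cost_cents(64, 10, FLAT_RATE_CENTS)
bundle = job_cost_cents(64, 10, BEST_BUNDLE_RATE_CENTS)
print(dollars(flat), dollars(bundle))  # $1,772.80 $864.00
```

At the lowest bundle price the same job costs less than half the flat-rate price, which is the incentive for regular users to commit to bundles.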

According to David Gibson, ISC’s vice president of sales, at the $2.77 flat rate most of the fee is going to Tsunamic; as the user buys larger bundles of hours, the money gets split more evenly. The tricky part for ISC was finding the right business model and hooking up with a computing service partner who would be flexible enough to work with a smaller software vendor. “The utility computing hardware providers have been challenged because they’re having a hard time getting commercial software vendors who have legacy business models around license sales to get in this game,” noted Gibson.

SMPs on the Cheap

This week, newcomer ScaleMP Inc. announced a novel technology that aggregates commodity x86 server motherboards into a high-end symmetric multiprocessing (SMP) machine. Actually, ScaleMP is not that new. The company has been around since 2003, and its technology has been incorporated into a number of obscure platforms offered by vendors such as Dell and SGI, as well as regional resellers. For the past five years, ScaleMP has stayed in extended stealth mode while promoting its solution to system integrators and OEMs.

The heart of the technology is the vSMP flash software module that plugs into commodity x86 server boards. The module acts as a BIOS extension that aggregates all processors and memory into a virtual shared-memory SMP system. In conventional SMP machines, proprietary chipsets and interconnects are used. That’s why you pay a premium for such systems. The upside is that they’re easier to program and manage because the global memory model is more comfortable for the operating system and application software. At the same time, legacy MPI codes have no trouble running in a shared memory environment. Commodity-based SMP machines would have the best of all worlds.

The tricky part in turning what are essentially clusters into SMPs is maintaining reasonable memory performance over the relatively slow InfiniBand connection used to link the server boards. That’s where the vSMP software comes in. It maintains cache coherency between boards and uses local memory for caching. It’s based on a cache-only memory architecture (COMA). In COMA, a remote memory access causes the data to be moved to memory that’s local to the requesting processor. This allows the system to get around the relatively slow communication between motherboards by using local DRAM as a very large cache. Unlike the NUMA model, where the application follows the data, in COMA the data follows the application. Typically, COMA is implemented in hardware; in the ScaleMP implementation, memory migration and management are performed by software on the vSMP modules.
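The difference between "data follows the application" and a fixed home node can be shown with a toy model. This is a deliberately simplified sketch, not ScaleMP’s algorithm: memory is tracked at page granularity, each page has exactly one copy (real COMA designs typically also allow read replication under a coherence protocol), and local memory capacity is unlimited.

```python
from collections import Counter

def simulate(accesses, home, migrate):
    """Count local vs. remote accesses in a toy distributed-memory model.

    accesses: list of (node, page) pairs in program order.
    home:     dict mapping each page to the node where it initially resides.
    migrate:  True  -> a remote access moves the page to the accessing node
                       (COMA-style: the data follows the application).
              False -> pages never move (fixed home node, NUMA-style).
    """
    location = dict(home)
    counts = Counter()
    for node, page in accesses:
        if location[page] == node:
            counts["local"] += 1
        else:
            counts["remote"] += 1
            if migrate:
                location[page] = node  # pull the page into local memory
    return counts

# Node 1 repeatedly reads page "A", which starts out on node 0.
reuse = [(1, "A")] * 100
numa_reuse = simulate(reuse, {"A": 0}, migrate=False)
coma_reuse = simulate(reuse, {"A": 0}, migrate=True)

# Nodes 0 and 1 alternate on the same page ("ping-pong" sharing).
ping_pong = [(0, "A"), (1, "A")] * 50
numa_pp = simulate(ping_pong, {"A": 0}, migrate=False)
coma_pp = simulate(ping_pong, {"A": 0}, migrate=True)

print("reuse:     fixed home", dict(numa_reuse), " migrate", dict(coma_reuse))
print("ping-pong: fixed home", dict(numa_pp), " migrate", dict(coma_pp))
```

With repeated reuse, migration turns 100 remote accesses into 1 remote and 99 local ones. With ping-pong sharing, naive migration does worse than leaving the page at home (99 remote accesses versus 50), which illustrates why a coherence policy that adapts to observed access patterns could pay off.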

That’s only part of the story. “The advantage of software is that we have 15 caching coherency mechanisms,” said Shai Fultheim, founder and CEO of ScaleMP. According to him, these mechanisms take advantage of memory access patterns at runtime, which results in more intelligent caching than could be realized with a static hardware-based approach.

The other advantage of the architecture is that it allows system builders to construct machines that decouple memory capacity from compute capacity by mixing motherboards. Some applications want relatively more compute power or more memory capacity than is possible in a cluster of homogeneous nodes. For example, a system to run electronic design automation (EDA) code would like lots of memory, but relatively less compute power. So one could hook together a lot of slow single-core, single-processor boards (but with fully populated memory) with one or two quad-core boards.
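The effect of mixing boards on the compute-to-memory balance is easy to quantify. The board specifications below (16 GB per board, single-core and quad-core boards) are hypothetical, chosen only to illustrate the EDA-style configuration described above against a homogeneous one with the same total memory.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Board:
    """A motherboard contributed to the aggregated system (hypothetical specs)."""
    label: str
    cores: int
    memory_gb: int

def aggregate(boards: list[Board]) -> tuple[int, int, float]:
    """Total cores, total memory in GB, and memory per core for a board mix."""
    cores = sum(b.cores for b in boards)
    memory_gb = sum(b.memory_gb for b in boards)
    return cores, memory_gb, memory_gb / cores

# Memory-heavy mix: eight single-core boards with full memory plus two quad-core boards.
mixed = [Board("1-core, full memory", 1, 16)] * 8 + [Board("quad-core", 4, 16)] * 2
# Homogeneous alternative with the same 160 GB of total memory.
homogeneous = [Board("quad-core", 4, 16)] * 10

for name, config in (("mixed", mixed), ("homogeneous", homogeneous)):
    cores, mem, per_core = aggregate(config)
    print(f"{name:12s} {cores:3d} cores {mem:4d} GB {per_core:5.1f} GB/core")
```

Both configurations expose the same 160 GB of memory, but the mixed one offers 10 GB per core instead of 4 GB per core, the kind of ratio a memory-bound workload would favor.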

The ScaleMP technology comes in two flavors: the embedded solution for large-scale systems, which can aggregate up to 128 cores and 1 terabyte of memory, and the standalone solution, which hooks together two quad- or dual-core dual-socket boards to create a four-socket SMP for under $10,000. The embedded solution has been in production for a year and a half and is currently available in systems sold by SGI and Dell. The standalone system is the one unveiled this week and is aimed at the entry-level HPC market. ScaleMP claims a 70 percent price/performance advantage compared to traditional four-socket systems, in addition to 25 percent power consumption savings and 50 percent rack-space savings.

It will be interesting to see if the other HPC system vendors pick up the technology. In particular, companies like HP and IBM, which offer high-end shared memory systems in their Itanium- and POWER-based product lines, respectively, might see commodity SMP as a threat. The fact that SGI is selling the vSMP-enabled f1200 system alongside its shared memory Itanium-based Altix machines is probably an indication that there’s some daylight between the two technologies, or at least SGI thinks so.

There are just a handful of machines with vSMP technology deployed in the field. If you have one, I’d love to hear from you.


As always, comments about HPCwire are welcomed and encouraged. Write to me, Michael Feldman, at editor@hpcwire.com.
