NSF Project Sets Up First Machine Learning Cyberinfrastructure – CHASE-CI

By John Russell

July 25, 2017

Earlier this month, the National Science Foundation issued a $1 million grant to Larry Smarr, director of Calit2, and a group of his colleagues to create a community infrastructure in support of machine learning research. The ambitious plan – Cognitive Hardware and Software Ecosystem, Community Infrastructure (CHASE-CI) – is intended to leverage the high-speed Pacific Research Platform (PRP) and put fast GPU appliances into the hands of researchers to tackle machine learning hardware, software, and architecture issues.

Given the abrupt rise of machine learning and its distinct needs versus traditional FLOPS-dominated HPC, the CHASE-CI effort seems a natural next step in learning how to harness PRP’s high bandwidth for use with big data projects and machine learning. Perhaps not coincidentally Smarr is also principal investigator for PRP. As described in the NSF abstract, CHASE-CI “will build a cloud of hundreds of affordable Graphics Processing Units (GPUs), networked together with a variety of neural network machines to facilitate development of next generation cognitive computing.”

Those are big goals. Last week, Smarr and co-PI Thomas DeFanti spoke with HPCwire about the CHASE-CI project. It has many facets. Hardware, including von Neumann (vN) and non von Neumann (NvN) architectures, software frameworks (e.g., Caffe and TensorFlow), six specific algorithm families (details near the end of the article), and cost containment are all key target areas. In building out PRP, the effort leveraged existing optical networks such as GLIF by building termination devices based on PCs and providing them to research scientists. The new device — dubbed FIONA (Flexible I/O Network Appliances) – was developed by PRP co-PI Philip Papadopoulos and is critical to the new CHASE-CI effort. A little background on PRP may be helpful.

Larry Smarr, director, Calit2

As explained by Smarr, the basic PRP idea was to experiment with a cyberinfrastructure that was appropriate for a broad set of applications using big data that aren’t appropriate for the commodity internet because of the size of the of the datasets. To handle the high speed bandwidth, you need a big bucket at the end of the fiber notes Smarr. FIONAs filled the bill; the devices are stuffed with high performance, high capacity SSDs and high speed NICs but based on the humble and less expensive PC.

“They could take the high data rate without TCP backing up and thereby lowering the overall bandwidth, which traditionally has been a problem if you try to go directly to spinning disk,” says Smarr. Currently, there are on the order of 40 or 50 of these FIONAs deployed across the West Coast. Although 100 gigabit throughput is possible via the fiber, most researchers are getting 10 gigabit, still a big improvement.

DOE tests the PRP performance regularly using the visualization tool MaDDash (Monitoring and Debugging Dashboard). “There are test transfers of 10 gigabytes of data, four times a day, among 25 organizations, so that’s roughly about 300 transfers four times a day. The reason why we picked that number, 10 gigabytes, was because that’s the amount of data you need to get TCP up to full speed,” says Smarr.

Thomas DeFanti, co-PI, CHASE-CI

Networks are currently testing out at 5, 6, 7, 8 and 9 gigabits per second, which is nearly full utilization. “Some of them really nail it at 9.9 gigabits per second. If you go to 40 gigabit networks that we have, we are getting 13 and 14 gigabits per second and that’s because of the [constrained] software we are using. If we go to a different software, which is not what scientists routinely use [except] the high energy physics people, then we can get 30 or 40 or 100 gigabits per second – that’s where we max out with the PC architecture and the disk drives on those high end units,” explains DeFanti.

The PRP has proven to be very successful, say Smarr and DeFanti. PRP v1, basically the network of FIONAs, is complete. PRP v2 is in the works. The latter is intended to investigate advanced software concepts such as software defined networking and security and not intended to replace PRP v1. Now, Smarr wants to soup up FIONAs with FPGAs, hook them into the PRP, and tackle machine learning. And certainly hardware is just a portion of the machine learning challenge being addressed.

Data showing increase in PRP performance over time.

Like PRP before it, CHASE-CI is a response to an NSF call for community computer science infrastructure. Unlike PRP, which is focused on applications (e.g. geoscience, bioscience) and whose architecture was largely defined by guidance from domain scientists, CHASE-CI is being driven by needs of computer scientists trying to support big data and machine learning.

The full principal investigator team is an experienced and accomplished group including: Smarr, (Principal Investigator), Calit2; Tajana Rosing (Co-Principal Investigator), Professor, CSE Department, UCSD; Ilkay Altintas (Co-Principal Investigator), Chief Data Science Officer, San Diego Supercomputer Center; DeFanti (Co-Principal Investigator), Full Research Scientist at the UCSD Qualcomm Institute, a division of Calit2;  and Kenneth Kreutz-Delgado (Co-Principal Investigator), Professor, ECE Department, UCSD.

“What they didn’t ask for [in the PRP grant] was what computer scientists need to support big data and machine learning. So we went back to the campuses and found the computers scientists, faculty and staff that were working on machine learning and ended up with 30 of them that wrote up their research to put into this proposal,” says Smarr. “We asked what was bottlenecking the work and [they responded] it was a lack of access to GPUs to do the compute intensive aspects of machine learning like training data sets on big neural nets.”

Zeroing in on GPUs, particularly GPUs that emphasize lower precision, is perhaps predictable.

“[In traditional] HPC you need 64-bit and error correction and all of that kind of stuff which is provided very nicely by Nvidia’s Tesla line, for instance, but actually because of the noise that is inherent in the data in most machine learning applications it turns out that single precision 32-bit is just fine and that’s much less expensive than the double precision,” says Smarr. For this reason, the project is focusing on less expensive “gaming GPUs” which fit fine into the slots on the FIONAs since they are PCs.

The NSF proposal first called for putting ten GPUs into each FIONA. “But we decided eight is probably optimal, eight of these front line game GPUs and we are deploying 32 of those FIONAs in this new grant across PRP to these researchers, and because they are all connected at 10 gigabits/s we can essentially treat them as a cloud,” says Smarr. There are ten campuses initially participating: UC San Diego, UC Berkeley, UC Irvine, UC Riverside, UC Santa Cruz, UC Merced, Sand Diego State University, Caltech, Stanford, and Montana State University. [A brief summary of researchers and their intended focus by campus is at the end of the article (taken from the grant proposal).]

As shown in the cost comparison below, the premium for high end GPUs such as the Nvidia P100 is dramatic. The CHASE-CI plan is to stick with commodity gaming GPUs, like the Nvidia 1080, since they are used in large volumes which keeps the prices down and the improvements coming. Nevertheless Smarr and DeFanti emphasize they are vendor agnostic and that other vendors have expressed interest in the program.

“Every year Nvidia comes out with a new set of devices and then halfway way through the year they come out with an accelerated version so in some sense you are on a six month cycle. The game cards are around $600 and every year the [cost performance] gets better,” says DeFanti. “We buy what’s available, build, test, and benchmark and then we wait for the next round. [Notably], people in the community do have different needs – some need more memory, some would rather have twice as many $250 GPUs because they are really fast and just have less memory. So it is really kind of custom and there’s some negotiation with users, but they all fit in the same chassis.”

DeFanti argues the practice of simulating networks on CPUs has slowed machine learning’s growth. “Machine learning involves looking at gigabytes of data, millions of pictures, and basically doing that is a brute force calculation that works fine in 32 bit, nobody uses 64 bit. You chew on these things literally for a week even on a GPU, which is much faster than a CPU for these kind of these things. That’s a reason why this field was sort of sluggish just using simulators; it took too much time on desktop CPUs. The first phase of this revolution is getting everybody off simulators.”

That said, getting high performance GPUs into the hands of researchers and students is only half of the machine learning story. Training is hard and compute-intensive and can take weeks-to-months depending upon the problem. But once a network is trained, the computer power required for the inference engine is considerably less. Power consumption becomes the challenge particularly because these trained networks are expected to be widely deployed on mobile platforms.

Here, CHASE-CI is examining a wide range of device types and architectures. Calit2, for example, has been working with IBM’s neuromorphic True North chip for a couple of years. It also had a strong role in helping KnuEdge develop its DSP-based neural net chip. (KnuEdge, of course, was founded by former NASA administrator Daniel Goldin.) FPGAs also show promise.

Says Smarr, “They have got to be very energy efficient. You have this whole new generation of largely non von Neumann architectures that are capable of executing these machine learning algorithms on say streams of video data, radar data, LIDAR data, things like that, that make decisions in real time like approval on credit cards. We are building up a set of these different architectures – von Neumann and non von Neumann – and making those available to these 30 machine learning experts.”

CHASE-CI is also digging into the needed software ecosystem to support machine learning. The grant states “representative algorithms from each of the following six families will be selected to form a standardized core suite of ‘component algorithms’ that have been tuned for optimal performance on the component coprocessors.” Here they are:

  • Deep Neural Network (DNN) and Recurrent Neural Network (RNN) algorithms, including layered networks having fully-connected and convolutional layers (CNNs), variational autoencoders (VAEs), and generalized adversarial networks (GANs). Training will be done using, modern approaches to backpropagation, stochastic sampling, bootstrap sampling, and restricted (and unrestricted) Boltzmann ML. NNs provide powerful classification and detection performance and can automatically extract a hierarchy of features.
  • Reinforcement Learning (RL) algorithms and related approximate Markov decision process (MDP) algorithms. RL and inverse-RL algorithms have critical applications in areas of dynamic decision-making, robotics and human/robotic transfer learning.
  • Variational Autoencoder (VAE) and Markov Chain Monte Carlo (MCMC) stochastic sampling algorithms supporting the training of generative models and metrics for evaluating the quality of generative model algorithms. Stochastic sampling algorithms are appropriate for training generative models on analog and digital spiking neurons. Novel metrics are appropriate.
  • Support Vector Machine (SVM) SVMs can perform an inner product and thresholding in a high-dimensional feature space via use of the “kernel trick”.
  • Sparse Signal Processing (SSP) algorithms for sparse signal processing and compressive sensing, including Sparse Baysian Learning (SBL). Algorithms which exploit source sparsity, possibly using learned over-complete dictionaries, are very important in domains such as medical image processing and brain-computing interfacing.
  • Latent Variable (LVA) Algorithms for source separation algorithms, such as PCA, ICA, and IVA. LV models typically assume that a solution exists in some latent variable sparse within which the components are statistically independent. This class of algorithms includes factor analysis (FA) and non-negative matrix factorization (NMF) algorithms.

Despite such ambitious hardware and software goals, Smarr suggests early results from CHASE-CI should be available sometime in the fall. Don’t get the wrong idea, he cautions. CHASE-CI is a research project for computer science not a production platform.

“We are not trying to be a big production site or anything else. But we are trying to really explore, as this field develops, not just the hardware platforms we’ve talked about but the software and algorithm issues. There’s a whole bunch of different modes of machine learning and statistical analysis that we are trying to match between the algorithms and the architectures, both von Neumann and non von Neumann.

“For 30 years I have been sort of collecting architectures and mapping a wide swath of algorithms on them to port applications. Here we are doing it again but now for machine learning.”

Sample List of CHASE-CI Researchers and Area of Work*

  • UC San Diego: Ravi Ramamoorthi (ML for processing light field imagery), Manmohan Chandraker (3D scene reconstruction), Arun Kumar (deep learning for database systems), Rajesh Gupta (accelerator-centric SOCs), Gary Cottrell (comp. cognitive neuroscience & computer vision), Nuno Vasconcelos (computer vision & ML), Todd Hylton (contextual robotics), Jurgen Schulze (VR), Ken Kreutz-Delgado (ML and NvN), Larry Smarr (ML and microbiome), Tajana Rosing (energy efficiency of running MLs on NvNs), Falko Kuester (ML on NvNs in drones).
  • UC Berkeley: James Demmel (CA algorithms), Trevor Darrell (ML libraries)
  • UC Irvine: Padhraic Smyth (ML for biomedicine and climate science), Jeffrey Krichmar 
(computational neuroscience), Nikkil Dutt (FPGAs), Anima Anandkumar (ML)
  • UC Riverside: Walid Najjar (FPGAs), Amit Roy-Chowdhury (image and video analysis)
  • UC Santa Cruz: Dimitris Achlioptas (linear layers and random bipartite graphs), Lise Getoor (large-scale graph processing), Ramakrishna Akella (multi-modal prediction and retrieval), Shawfeng Dong (Bayesian deep learning and CNN models in astronomy)
  • UC Merced: Yang Quan Chen (agricultural drones)
  • San Diego State: Baris Aksanli (adaptive learning for historical data)
  • Caltech: Yisong Yue (scalable deep learning methods for complex prediction settings)
  • Stanford: Anshul Kundaje (ML & genetics), Ron Dror (structure-based drug design using ML)
  • Montana State: John Sheppard (ML and probablistic methods to solve large systems problems)

*  Excerpted from the CHASE-CI grant proposal

Images courtesy of Larry Smarr, Calit2

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

TACC Researchers Test AI Traffic Monitoring Tool in Austin

December 13, 2017

Traffic jams and mishaps are often painful and sometimes dangerous facts of life. At this week’s IEEE International Conference on Big Data being held in Boston, researchers from TACC and colleagues will present a new Read more…

AMD Wins Another: Baidu to Deploy EPYC on Single Socket Servers

December 13, 2017

When AMD introduced its EPYC chip line in June, the company said a portion of the line was specifically designed to re-invigorate a single socket segment in what has become an overwhelmingly two-socket landscape in the d Read more…

By John Russell

Microsoft Wants to Speed Quantum Development

December 12, 2017

Quantum computing continues to make headlines in what remains of 2017 as tech giants jockey to establish a pole position in the race toward commercialization of quantum. This week, Microsoft took the next step in advanci Read more…

By Tiffany Trader

HPE Extreme Performance Solutions

Explore the Origins of Space with COSMOS and Memory-Driven Computing

From the formation of black holes to the origins of space, data is the key to unlocking the secrets of the early universe. Read more…

ESnet Now Moving More Than 1 Petabyte/wk

December 12, 2017

Optimizing ESnet (Energy Sciences Network), the world's fastest network for science, is an ongoing process. Recently a two-year collaboration by ESnet users – the Petascale DTN Project – achieved its ambitious goal t Read more…

AMD Wins Another: Baidu to Deploy EPYC on Single Socket Servers

December 13, 2017

When AMD introduced its EPYC chip line in June, the company said a portion of the line was specifically designed to re-invigorate a single socket segment in wha Read more…

By John Russell

Microsoft Wants to Speed Quantum Development

December 12, 2017

Quantum computing continues to make headlines in what remains of 2017 as tech giants jockey to establish a pole position in the race toward commercialization of Read more…

By Tiffany Trader

HPC Iron, Soft, Data, People – It Takes an Ecosystem!

December 11, 2017

Cutting edge advanced computing hardware (aka big iron) does not stand by itself. These computers are the pinnacle of a myriad of technologies that must be care Read more…

By Alex R. Larzelere

IBM Begins Power9 Rollout with Backing from DOE, Google

December 6, 2017

After over a year of buildup, IBM is unveiling its first Power9 system based on the same architecture as the Department of Energy CORAL supercomputers, Summit a Read more…

By Tiffany Trader

Microsoft Spins Cycle Computing into Core Azure Product

December 5, 2017

Last August, cloud giant Microsoft acquired HPC cloud orchestration pioneer Cycle Computing. Since then the focus has been on integrating Cycle’s organization Read more…

By John Russell

GlobalFoundries, Ayar Labs Team Up to Commercialize Optical I/O

December 4, 2017

GlobalFoundries (GF) and Ayar Labs, a startup focused on using light, instead of electricity, to transfer data between chips, today announced they've entered in Read more…

By Tiffany Trader

HPE In-Memory Platform Comes to COSMOS

November 30, 2017

Hewlett Packard Enterprise is on a mission to accelerate space research. In August, it sent the first commercial-off-the-shelf HPC system into space for testing Read more…

By Tiffany Trader

SC17 Cluster Competition: Who Won and Why? Results Analyzed and Over-Analyzed

November 28, 2017

Everyone by now knows that Nanyang Technological University of Singapore (NTU) took home the highest LINPACK Award and the Overall Championship from the recently concluded SC17 Student Cluster Competition. We also already know how the teams did in the Highest LINPACK and Highest HPCG competitions, with Nanyang grabbing bragging rights for both benchmarks. Read more…

By Dan Olds

US Coalesces Plans for First Exascale Supercomputer: Aurora in 2021

September 27, 2017

At the Advanced Scientific Computing Advisory Committee (ASCAC) meeting, in Arlington, Va., yesterday (Sept. 26), it was revealed that the "Aurora" supercompute Read more…

By Tiffany Trader

NERSC Scales Scientific Deep Learning to 15 Petaflops

August 28, 2017

A collaborative effort between Intel, NERSC and Stanford has delivered the first 15-petaflops deep learning software running on HPC platforms and is, according Read more…

By Rob Farber

Oracle Layoffs Reportedly Hit SPARC and Solaris Hard

September 7, 2017

Oracle’s latest layoffs have many wondering if this is the end of the line for the SPARC processor and Solaris OS development. As reported by multiple sources Read more…

By John Russell

AMD Showcases Growing Portfolio of EPYC and Radeon-based Systems at SC17

November 13, 2017

AMD’s charge back into HPC and the datacenter is on full display at SC17. Having launched the EPYC processor line in June along with its MI25 GPU the focus he Read more…

By John Russell

Nvidia Responds to Google TPU Benchmarking

April 10, 2017

Nvidia highlights strengths of its newest GPU silicon in response to Google's report on the performance and energy advantages of its custom tensor processor. Read more…

By Tiffany Trader

Japan Unveils Quantum Neural Network

November 22, 2017

The U.S. and China are leading the race toward productive quantum computing, but it's early enough that ultimate leadership is still something of an open questi Read more…

By Tiffany Trader

GlobalFoundries Puts Wind in AMD’s Sails with 12nm FinFET

September 24, 2017

From its annual tech conference last week (Sept. 20), where GlobalFoundries welcomed more than 600 semiconductor professionals (reaching the Santa Clara venue Read more…

By Tiffany Trader

Google Releases Deeplearn.js to Further Democratize Machine Learning

August 17, 2017

Spreading the use of machine learning tools is one of the goals of Google’s PAIR (People + AI Research) initiative, which was introduced in early July. Last w Read more…

By John Russell

Leading Solution Providers

Amazon Debuts New AMD-based GPU Instances for Graphics Acceleration

September 12, 2017

Last week Amazon Web Services (AWS) streaming service, AppStream 2.0, introduced a new GPU instance called Graphics Design intended to accelerate graphics. The Read more…

By John Russell

Perspective: What Really Happened at SC17?

November 22, 2017

SC is over. Now comes the myriad of follow-ups. Inboxes are filled with templated emails from vendors and other exhibitors hoping to win a place in the post-SC thinking of booth visitors. Attendees of tutorials, workshops and other technical sessions will be inundated with requests for feedback. Read more…

By Andrew Jones

EU Funds 20 Million Euro ARM+FPGA Exascale Project

September 7, 2017

At the Barcelona Supercomputer Centre on Wednesday (Sept. 6), 16 partners gathered to launch the EuroEXA project, which invests €20 million over three-and-a-half years into exascale-focused research and development. Led by the Horizon 2020 program, EuroEXA picks up the banner of a triad of partner projects — ExaNeSt, EcoScale and ExaNoDe — building on their work... Read more…

By Tiffany Trader

IBM Begins Power9 Rollout with Backing from DOE, Google

December 6, 2017

After over a year of buildup, IBM is unveiling its first Power9 system based on the same architecture as the Department of Energy CORAL supercomputers, Summit a Read more…

By Tiffany Trader

Delays, Smoke, Records & Markets – A Candid Conversation with Cray CEO Peter Ungaro

October 5, 2017

Earlier this month, Tom Tabor, publisher of HPCwire and I had a very personal conversation with Cray CEO Peter Ungaro. Cray has been on something of a Cinderell Read more…

By Tiffany Trader & Tom Tabor

Tensors Come of Age: Why the AI Revolution Will Help HPC

November 13, 2017

Thirty years ago, parallel computing was coming of age. A bitter battle began between stalwart vector computing supporters and advocates of various approaches to parallel computing. IBM skeptic Alan Karp, reacting to announcements of nCUBE’s 1024-microprocessor system and Thinking Machines’ 65,536-element array, made a public $100 wager that no one could get a parallel speedup of over 200 on real HPC workloads. Read more…

By John Gustafson & Lenore Mullin

Flipping the Flops and Reading the Top500 Tea Leaves

November 13, 2017

The 50th edition of the Top500 list, the biannual publication of the world’s fastest supercomputers based on public Linpack benchmarking results, was released Read more…

By Tiffany Trader

Intel Launches Software Tools to Ease FPGA Programming

September 5, 2017

Field Programmable Gate Arrays (FPGAs) have a reputation for being difficult to program, requiring expertise in specialty languages, like Verilog or VHDL. Easin Read more…

By Tiffany Trader

  • arrow
  • Click Here for More Headlines
  • arrow
Share This