At the 2014 HPC User Forum in Seattle, Ryan Quick and Arno Kolster from PayPal describe how the company is using HPC to transform its chaotic real-time server data into intelligent, actionable insight. The unique “Systems Intelligence” approach uses HP’s Moonshot server powered by TI processors to aggregate, analyze and act on transaction data in real time.
The goal for PayPal is to detect patterns and anomalies and act on them before the user experience is negatively impacted. The main challenge is doing this in real time, as PayPal needs to process some 3 million events every second from thousands of sources in its datacenters. Source events include application logs, machine data, environmental data from the datacenters, and social media events. There is about 25 Tb of data coming in per hour, 20 Mb per second of machine data from thousands of PayPal servers, some 50,000 metadata relationships, and an ever-increasing tide of social media trends and customer interactions to consider.
“We basically take all that data, put it all together and correlate events across those streams,” says Kolster, Senior MTS, Database Architect, PayPal. “For instance if we put a release out on Thursday night in San Jose, and a few minutes later we notice an increase in customer interactions in Dublin and Twitter feeds in Germany saying this latest release in PayPal isn’t working correctly, we can now correlate all three events within seconds, whereas before it would take several hours for people to understand what was happening.”
The Systems Intelligence flow architecture shares many similarities with a PayPal fraud detection system that was also built on HPC principles. It’s fairly simplistic, says Kolster, but when it gets down to the actual deployment, it becomes much more complex. All the source event data gets thrown on a huge bus in real time and ingested by app servers, which are doing inline processing. There are complex event processors on each of those application servers, and a huge shared memory event window with the SGI UV2000. The event stream is augmented with offline databases, both relational and graph databases. The machine learning element pushes new models back into the application servers. An alerting and notification system is used for problem remediation.
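The idea of correlating events from several streams inside a shared event window can be sketched in a few lines of Python. This is only an illustration of the general technique, not PayPal's implementation: the `Event` and `WindowCorrelator` names, the stream labels, and the ten-minute window are all assumptions made for the example, and a real deployment would run this logic in parallel across many application servers.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since an arbitrary start
    stream: str       # hypothetical stream label, e.g. "release" or "social"
    detail: str


class WindowCorrelator:
    """Hold recent events and report when all required streams co-occur."""

    def __init__(self, window_seconds, required_streams):
        self.window_seconds = window_seconds
        self.required_streams = set(required_streams)
        self.window = deque()

    def ingest(self, event):
        """Add an event (assumed to arrive in time order).

        Returns the events currently in the window if every required
        stream is represented, otherwise None.
        """
        self.window.append(event)
        cutoff = event.timestamp - self.window_seconds
        # Evict events that have fallen out of the time window.
        while self.window and self.window[0].timestamp < cutoff:
            self.window.popleft()
        present = {e.stream for e in self.window}
        if self.required_streams <= present:
            return list(self.window)
        return None


# Hypothetical events modeled on the release example in the article.
correlator = WindowCorrelator(600.0, {"release", "support", "social"})
events = [
    Event(0.0, "release", "new build deployed"),
    Event(180.0, "support", "rise in customer contacts"),
    Event(240.0, "social", "complaints about the latest release"),
]
match = None
for ev in events:
    match = correlator.ingest(ev)

if match is not None:
    print("correlated:", [e.detail for e in match])
```

A deque keeps eviction cheap because the oldest events are always at the left end. The match fires only once all three streams are present inside the same ten-minute window.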
PayPal’s exploration of HPC started as far back as 2006. As Ryan Quick, Principal Architect in the Advanced Technology Group at PayPal, explains, “Our job is looking at the next best thing.” Quick and Kolster started shopping in HPC because they had a set of problems, especially around real-time, that weren’t being met by the tools they could acquire from their regular channels.
“There’s a weird gray area where your needs aren’t being met in the enterprise, but HPC is still a little too bleeding-edge,” adds Quick.
In discussing how they decided upon the HP-TI platform, Quick recalls looking at Kolster and saying, “What they’ve done here is build an HPC cluster and they put it on a system on a chip.” The KeyStone multicore processors provided a powerful combination of four ARM Cortex-A15 cores, eight C66x DSPs, plus internal fabric and networking capabilities.
As partner TI explains, the 66AK2Hx SoC running in HP’s Moonshot platform possesses some unique advantages to aid in real time processing. These include:
1) C66x DSP cores that have great signal processing performance as well as very low latency response times and can receive, process, and return packet data very quickly.
2) An integrated I/O fabric that moves data quickly and with low latency. The 66AK2Hx I/O fabric uses Serial RapidIO (sRIO), which has 10x lower hop-to-hop latency than Ethernet I/O.
3) Additional KeyStone II architecture elements such as the Multicore Navigator and TeraNet which further enable low latency data movement within and across devices.
The new platform essentially treats complex events as digital signals. “You turn the data into a signal that can be analyzed in hardware,” says Quick. And with eight DSPs per SoC, they can ingest many signals at once and then run pattern recognition against all of them simultaneously. The system is also quite efficient: it runs at 55 watts per cartridge (4 SoCs/cartridge) and delivers an impressive 11.2 gigaflops per watt. As a point of comparison, the most energy-efficient supercomputer in the world according to the most recent edition of the Green500 list, TSUBAME-KFC in Japan, offers a more modest 4.4 gigaflops per watt.
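To make the "data as a signal" idea concrete, here is a minimal Python sketch, assuming events are reduced to per-interval counts and a known pattern is found by sliding a template over that series. The function names, bin width, and spike template are hypothetical. The code runs sequentially on a CPU, whereas the platform described above performs this kind of matching on DSP hardware across many signals at once.

```python
import math


def to_signal(timestamps, bin_seconds, n_bins):
    """Count events per fixed-width time bin, giving a sampled signal."""
    signal = [0.0] * n_bins
    for t in timestamps:
        idx = int(t // bin_seconds)
        if 0 <= idx < n_bins:
            signal[idx] += 1.0
    return signal


def normalized_correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def best_match(signal, template):
    """Slide the template over the signal; return (offset, score) of the best fit."""
    best = (0, -1.0)
    width = len(template)
    for offset in range(len(signal) - width + 1):
        score = normalized_correlation(signal[offset:offset + width], template)
        if score > best[1]:
            best = (offset, score)
    return best


# Hypothetical data: one background event per second, plus a burst around t = 6 s.
background = [i + 0.5 for i in range(10)]
burst = [5.1, 6.1, 6.2, 6.3, 6.4, 6.5, 7.1]
signal = to_signal(background + burst, bin_seconds=1.0, n_bins=10)

spike_template = [0.0, 1.0, 0.0]  # assumed shape of a short spike
offset, score = best_match(signal, spike_template)
print(signal, offset, round(score, 3))
```

Normalizing the correlation makes the match depend on the shape of the burst rather than its absolute size, so the same template can be applied to streams with very different event rates.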
The application is currently available to the public through TI. The product includes OpenCL, the full development kit and the software.