If it weren’t for the heavy-hitter technology team behind start-up Pattern Computer, which emerged from stealth today in a live-streamed event from San Francisco, one would be tempted to dismiss its claims of inventing something revolutionary called “pattern discovery” in contrast to conventional pattern recognition. The HPC community is wary of black box claims in which spectacular results are presented or promised without revealing the underlying technology.
Pattern Computer, which flew under the radar as Coventry Computer for the past couple of years, is the brainchild of technologist and entrepreneur Mark Anderson, who has assembled a team that includes some very familiar HPC names: Michael Riddle, chief systems architect at Pattern (Autodesk founder); James Reinders, systems architect (Intel); Irshad Mohammed, software development engineer (Fermilab); Ty Carlson, CTO (Amazon and Microsoft); and Eric Greenwade, technical fellow (Microsoft, LLNL, LBNL, LANL), to name just a few.
Add to that two very impressive alpha clients – Larry Smarr, Calit2 director, a man with access to rather substantial HPC resources, and Lee Hood, founder of the Institute for Systems Biology and developer of early automated DNA sequencing machines used in the Human Genome Project – who discussed in glowing terms the early results from and potential impact of Pattern Computer’s technology in bioscience.
It’s hard to dismiss such a lineup, however chary Pattern Computer is about revealing technical details. In a nutshell, Pattern Computer says it has developed an approach to exploring data that permits very high dimensionality exploration in contrast to the pairwise approach that now dominates. It has also figured out how to do the calculations more efficiently with existing hardware architecture organized specifically for this kind of data exploration. Exceedingly complicated network layers are not required. Fancy math and software, and clever hardware architecture are.
Here’s what Anderson told HPCwire in a pre-briefing:
“A simple way to put this in historic terms would be to say if you look at the entire history of human-computer interactions until now, essentially what you are seeing, I think, is we tell the computers what we want and then the computers come back with what we have asked for in a better and better means, faster and more accurate and more of it. What we are hoping now to see is a true inflection point that moves from that kind of relationship to one in which we tell the system we want something and the system brings back something we have never expected before, don’t even understand why it is there. And then of course the question becomes why is it there, and we will actually be able to tell people why it is there. So these will be true discoveries.”
“We may never expose everything we know, all of our crown jewels. We are going to trade secret more than patent to protect our most important secrets. You can assume that we have found new ways of using that hardware. And we have a lot of proprietary mathematics and software to do that – which we do. You can probably assume that over time things will get more and more complex and that there will be more and more hardware, unique hardware involved. But basically we are trying to make use of the most advanced [hardware now available]. So we really like non-von Neumann chips as an example and we think that the heterogeneous chip architecture is the only way to go.”
In the release accompanying the launch, Pattern lays out its claim thusly, “Pattern discovery is an emerging category – an extension of the machine learning field – that distinguishes itself by using both supervised and unsupervised learning. While pattern recognition solutions are widely available, pattern discovery uniquely identifies previously hidden, higher-order correlations in vast datasets without instructions as to where or what to look for.”
There’s a lot to digest here. It’s not clear how much similarity exists between Pattern Computer as announced today and the nascent plans formulated in 2015, which called for using IBM’s TrueNorth neuromorphic chip (see the GeekWire article “New startup building ‘desktop supercomputer,’ seeking big breakthroughs using chips that work like the human brain”). The earlier design, also called Pattern Computer, resulted from a challenge issued and a solution sketched out during the October 2015 Future in Review (FiRe) Conference, which is owned by Anderson. Many of the same people are involved now.
According to a Pattern Computer spokesman, “That was merely the beginning for Coventry/Pattern Computer. What’s being announced is fully realized and ready for additional deployment, featuring more advanced computer systems, a data center, headquarters and partnerships in place – all developed over the past few years in stealth.” Coventry was founded in 2016 and is headquartered on San Juan Island, Washington. Headcount is under 50. The company declined to name its investors.
Today’s event was labelled Splash 1 and focused on the company’s basic capabilities and their application to bioscience as a demonstration use case. James (Ben) Brown, department head, molecular ecosystems biology, LBNL, and chair, environmental bioinformatics, University of Birmingham, UK, was instrumental in helping Pattern Computer develop its biomedical practice. Brace for other Splashes around different domains, advised Anderson.
“This is a universal system, so it doesn’t care what arena you’re in or what silo it’s in or what type of data it looks at. As far as we can tell it is completely not religious about that,” said Anderson. “These Splash waves will have different types of companies with them. So this first wave is biomedical. Each one will be completely different from the prior one, partly because we want to show that it’s able to do that work but also because I think it establishes an important truth in design of computing where one doesn’t have to be on a highly-supervised, and then finely-tuned algorithm to that exact science, but in fact one can use a general approach and have deep success.”
The intent is to sell “discovery” as a service. “We really don’t want to be box sellers,” said Anderson. Customers simply provide the data set they think represents the problem. “You would have an area expert of your own who we would work with, a PI of some kind. We have people who do the ingest of the data and they would work with that person. Once we have it we’ll take it from there, and come back and show you what we have discovered and help you understand what that means to you.”
That sounds a bit magical, which it isn’t, and magic is not the impression Pattern Computer wants to convey. Still, the tight-lipped posture will likely spur some skepticism as well as efforts by many in the HPC community to uncover the technical details. Fundamentally, said Anderson, Pattern Computer has developed a new way to look at problem space: a method that relies on leveraging high dimensionality rather than on huge data sets, exhaustive iteration, or training networks with very many layers.
“We can do very high dimensional analysis, essentially n dimensional analysis where most folks are dealing with pairwise functions,” said Anderson. “We’ll be talking on the 23rd about two fields. One is cancer. The other is personalized medicine. In both cases, and in very short periods of time, we’ve been able to make discoveries and in each case it is not by doing what you might guess. It’s not by running against [a data set of] 10,000 instead of 5,000. We are not using that kind of tool kit. But we have been able to look at things which are very high dimensional.
“I think you know the usual stuff, using those tools of yesterday giving single pairwise information on genetic contribution to cancer. People struggle with getting beyond that. We can do, so far up to six, and have actually done much higher numbers. We can take 20,000 variables and reduce them to the six that matter and then actually understand the dynamic relationships between those six. No one as far as I know has ever done that before. We are working with teams who are oncology teams now, academic and institutional.”
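Pattern Computer has not disclosed how its analysis works, so the following is only a toy illustration of the general distinction Anderson draws between pairwise and higher-order analysis. It is a minimal sketch under stated assumptions: three hypothetical binary variables and an outcome defined as their parity (XOR), none of which come from the company. The point is that an outcome can depend entirely on a three-way interaction while every single variable and every pair of variables looks uninformative.

```python
from collections import defaultdict
from itertools import combinations, product

# Hypothetical toy data: the full truth table of three binary variables.
# The outcome (last column) is their parity, a purely three-way interaction.
rows = [(a, b, c, a ^ b ^ c) for a, b, c in product((0, 1), repeat=3)]


def best_accuracy(var_idx):
    """Accuracy of the best possible predictor that sees only var_idx."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[i] for i in var_idx)
        groups[key].append(row[3])
    # Within each group of identical inputs, the best guess is the majority outcome.
    correct = sum(max(ys.count(0), ys.count(1)) for ys in groups.values())
    return correct / len(rows)


# Best achievable accuracy when looking at 1, 2, or all 3 variables at once.
accuracy_by_order = {
    k: max(best_accuracy(subset) for subset in combinations(range(3), k))
    for k in (1, 2, 3)
}

for k, acc in accuracy_by_order.items():
    print(f"best accuracy using {k} variable(s): {acc:.2f}")
```

In this constructed case, single variables and pairs do no better than a coin flip, and only the full three-variable view recovers the outcome. Real biological data is far noisier, and nothing here should be read as a description of Pattern Computer's actual approach.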
Working with a well-known and heavily investigated breast cancer database, Anderson said the Pattern Computer team did a first run on the database and “found a druggable discovery in about 24 hours.”
The proof points offered today are impressive. Smarr, of course, is a longtime HPC pioneer who in recent years has been investigating the human microbiome, including developing novel computational tools. Anderson said, “Larry had been using other HPC tools. We were able in a very short time, about a week, to do runs against the data that had already been exposed to others and find new things for him to help him create a new hypothesis and research angle, and find out literally new dynamics of disease description.”
Hood has explored virtually every aspect of life sciences technology. His centerpiece concept is what he calls P4 Medicine (Predictive, Preventive, Personalized and Participatory), which in broad terms would use blood biomarkers and digital technology to characterize a person’s health, including genetic and environmental factors. Done in a timely way, the hope is that P4 can drive research, clinical practice, and health maintenance.
Compressing the details of the individual work discussed by Hood (P4), Smarr (IBD and microbiome), and Brown (breast cancer) is challenging. A description of Smarr’s work is available on the Pattern Computer website. It was clear that high dimensional analysis allowed each of them to gain new, sometimes unexpected insight, and that doing so requires a special platform. For example, it was noted that identifying all the interactions of six genes in a 20,000-gene set would take at least 25 years on very high end HPC resources. Pattern Computer reported completing the task in one day, and the results led to new actionable insight in the cancer work.
Pattern Computer’s strategy is to use project results and respected third-party testimonials such as these – rather than a detailed explanation of its technology – to attract users. How viable that approach is, time will tell. In any case, said Anderson, the Pattern Computer method requires purpose-built architecture.
“One thing we are going to talk about [at the launch] is why couldn’t you try to run this on an HPC system today? Why bother with redesigning the entire stack? We actually have come up with a mathematical proof of why it is so hard. And the numbers are rather astonishing. We think it’s somewhere between 10^20 and 10^40 calculations, the numbers of cycles are so high, you couldn’t do it even at Livermore [National Lab]. It’s just too much. We have found ways of reducing the problem so we can deal with very high numbers of variables and yet not have to do what you would normally have to do on a supercomputer.”
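The combinatorial growth behind figures like these can be checked with a short calculation. The sketch below counts how many distinct groups of genes exist in the 20,000-gene set mentioned earlier, for pairs and for groups of six, and then converts the six-gene count into a rough brute-force time. The evaluation rate is a purely hypothetical assumption chosen for illustration; it is not a figure from Pattern Computer or from any particular machine, and the company's proof and reduction method are not public.

```python
import math

N_GENES = 20_000  # size of the gene set cited in the article
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def subset_count(n, k):
    """Number of distinct unordered groups of k items drawn from n."""
    return math.comb(n, k)


def years_to_enumerate(count, evals_per_second):
    """Years needed to visit every group once at a fixed evaluation rate."""
    return count / evals_per_second / SECONDS_PER_YEAR


pairs = subset_count(N_GENES, 2)
sextuples = subset_count(N_GENES, 6)

# ASSUMPTION: a made-up sustained rate of 1e14 group evaluations per second.
HYPOTHETICAL_RATE = 1e14
years = years_to_enumerate(sextuples, HYPOTHETICAL_RATE)

print(f"pairs of genes:       {pairs:.3e}")
print(f"groups of six genes:  {sextuples:.3e}")
print(f"brute-force estimate: {years:.1f} years at {HYPOTHETICAL_RATE:.0e} evals/s")
```

Moving from pairs to groups of six raises the count from roughly 2 x 10^8 to nearly 10^23, which sits inside the range Anderson quotes. At the assumed rate, simply visiting every six-gene group once would take decades, so exhaustive enumeration is impractical without some way of shrinking the search.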
Pattern Computer currently has adequate computing resources, according to Anderson, but plans to scale up. “We are already doing runs against databases that are usually done on a supercomputer or a cluster. [Our] datacenter is not huge but it works. It’ll get bigger. At some time in the future we might be free to talk a little bit more about that architecture but it will be different from what you have been used to seeing.”
We’re left with something of a black box quandary, but the highly credentialed technical team and early users convey credibility. It will be interesting to watch how the company fares.