A powerhouse concept in attaining new knowledge is the notion of the “emergent property,” the combination of formerly stovepiped scientific disciplines and exploratory methods to form cross-disciplinary intelligence that generates breakthrough insight. In computer science, we see vendors working to leverage the emergent property principle, building bridges across computing and analytical techniques in the pursuit of more powerful AI.
At SC18 this week in Dallas, IBM’s vice president of HPC and cognitive systems, Dave Turek – one of the brainiest and most engaging thinkers in HPC – met with us to talk about new tools from Big Blue that combine HPC and AI. One of them, referred to as intelligent simulation, uses AI to accelerate HPC-powered simulations by cutting unnecessary simulation runs, bringing more focus to experimentation and getting to the right answer faster.
Turek’s point is that directing compute power in efficient ways is important – that focusing only on generating more processing power, no matter how amped up, only gets you so far. “The leaps high performance computing has made in computing power don’t always correlate to improved insights,” Turek blogged this week, “and we’re examining ways for researchers to apply advanced analytics to design better experiments.”
IBM is working on integrating Bayesian probability, a 200-plus-year-old mathematical technique that “analyzes what I know, and suggests what I should do next, thereby helping to eliminate simulations that are unlikely to yield desired results from experiment designs,” Turek said.
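IBM has not published the internals of its tool, but the loop Turek describes maps onto classic Bayesian optimization: fit a probabilistic surrogate model to the simulations already run, then launch only the next run the model rates most promising. The sketch below is purely illustrative – the toy objective, kernel, and candidate grid are stand-ins, not IBM’s method:

```python
import numpy as np
from math import erf, sqrt, pi

def simulate(x):
    """Stand-in for one expensive HPC simulation run (toy 1-D objective)."""
    return -np.sin(3 * x) - x ** 2 + 0.7 * x

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two 1-D point sets."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def gp_posterior(X, y, Xs, noise=1e-5):
    """Gaussian-process posterior mean/std at candidates Xs, given runs (X, y)."""
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)  # diag of posterior cov
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI acquisition: expected amount by which each candidate beats `best`."""
    z = (mu - best) / sigma
    Phi = 0.5 * (1.0 + np.vectorize(erf)(z / sqrt(2)))   # standard normal CDF
    phi = np.exp(-0.5 * z ** 2) / sqrt(2 * pi)           # standard normal PDF
    return (mu - best) * Phi + sigma * phi

rng = np.random.default_rng(0)
Xs = np.linspace(-1.0, 2.0, 200)     # 200 possible simulation settings
X = rng.uniform(-1.0, 2.0, 3)        # start with three "blind" runs
y = simulate(X)
for _ in range(7):                   # 7 guided runs instead of 200 blind ones
    mu, sigma = gp_posterior(X, y, Xs)
    x_next = Xs[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X, y = np.append(X, x_next), np.append(y, simulate(x_next))
print(f"best setting after 10 runs: x = {X[y.argmax()]:.3f}")
```

Ten runs stand in for what a blind sweep over all 200 settings would cost – the same “fewer, smarter simulations” economics Turek is pointing at, at toy scale.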
Preliminary trials are showing results. The IBM team has worked with customers in pharmaceuticals, chemistry and materials science and has observed that Bayesian principles have reduced simulation runs by as much as 75 percent while increasing the accuracy of answers, according to Turek.
“In an era where Moore’s Law doesn’t have the kick it once had, this is a dramatic result, and these techniques could be the path to radically reduced hardware cost and deeper insight by a combination of classic HPC and modern analytics techniques.”
Industry analyst Addison Snell, CEO of Intersect360 Research, who is familiar with IBM’s work in intelligent simulation, is impressed.
“This is a strategic direction we have heard from IBM and others, which we think will ultimately help shape the future of HPC,” he said. “While this level of AI-augmented HPC won’t be achieved this year or next, it is nevertheless worthy of exploration and research today.”
IBM plans to encapsulate the HPC-Bayesian capability in an appliance, Turek said, that can be installed alongside an existing cluster of another architecture. The appliance will be pre-programmed, so researchers need only tell the systems to exchange data; the Bayesian appliance will then design smarter simulation instructions for the primary cluster.
Turek added that IBM is working with Penguin Computing and Cray on this project. Big Blue plans to bring these capabilities to its existing suite of AI-driven products, including the IBM Power Systems AC922 server and IBM ESS storage, the building blocks of the Summit and Sierra supercomputers (ranked the world’s no. 1 and no. 2 most powerful supercomputers, respectively), along with the IBM PowerAI toolkit.
Earlier this year at a conference of the American Chemical Society in Boston, IBM demonstrated a tool called IBM RXN that predicts the outcome of organic chemical reactions (it’s available on the web free of charge on the IBM Zurich system). Turek said professional chemists were invited to take part in a head-to-head competition with the cognitive discovery tool, and the tool beat the chemists “by about 4 to 1 in terms of accurately predicting the outcomes, and we were doing it in seconds,” according to Turek.
“In the context of HPC, this technology presents a unified approach to complement existing simulations with data-inspired analytics,” Turek said. “And it can, in some cases, even displace classic mod-sim completely.”
In a related project also slated for integration with Power servers and PowerAI tools, IBM is attempting to lessen the pain of AI-related data prep, infamous for consuming 80 percent of researchers’ (and data scientists’) time. The project, referred to as “cognitive discovery,” aims to improve data ingest at scale using integrated tools that help build catalogues of scientific data, automatically converted into a “knowledge graph,” a visual representation of the data’s relationships.
Turek said the tools have enabled IBM researchers to build a knowledge graph of 40 million scientific documents in 80 hours, a rate of 500,000 documents per hour. The tools ingest and interpret data formatted as PDFs, handwritten notebooks, spreadsheets, pictures and more.
The tools, in short, amass, organize and search more information than can possibly be grasped by the human mind in a world in which data, documents and knowledge are exploding exponentially.
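IBM hasn’t detailed how cognitive discovery builds its graphs, but the essential data structure – entities as nodes, relations mined from documents as edges – can be sketched in a few lines. The documents, the term list, and the crude co-occurrence heuristic below are all illustrative stand-ins for real entity and relation extraction:

```python
import itertools
import re
from collections import defaultdict

# Toy stand-ins for ingested documents (a real pipeline parses PDFs,
# spreadsheets, handwritten notes, images, and so on).
documents = [
    "Perovskite solar cells degrade under humidity.",
    "Encapsulation layers protect perovskite films from humidity.",
    "Humidity accelerates ion migration in halide perovskites.",
]

KEY_TERMS = {"perovskite", "humidity", "encapsulation", "ion", "degrade"}

def terms_in(doc):
    """Naive entity extraction: prefix-match known terms against tokens."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return {t for t in KEY_TERMS if any(tok.startswith(t) for tok in tokens)}

# Knowledge graph as an adjacency map: an edge means two concepts
# co-occurred in at least one document.
graph = defaultdict(set)
for doc in documents:
    for a, b in itertools.combinations(sorted(terms_in(doc)), 2):
        graph[a].add(b)
        graph[b].add(a)

print(sorted(graph["humidity"]))  # every concept the corpus links to humidity
```

The payoff is the query at the end: concepts from three separate documents are now reachable from one node, which is the “corporate memory” idea at miniature scale.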
“The tool is being designed to help bring order to chaotic data,” Turek said, “and contribute to establishing a corporate memory for all the HPC work an organization has ever performed, something of critical importance as employees retire or leave.”
He said the cognitive discovery tools have deep search capabilities against the knowledge graph that allow exploration of complicated queries and include relevance ratings of search results. The tools will be applied across business use cases to create vertical, domain-specific applications, Turek said.
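Turek didn’t describe the ranking mechanics, but “relevance ratings of search results” implies some term-weighted scoring over the corpus. Here is a minimal sketch with a made-up three-document corpus and plain TF-IDF standing in for whatever IBM actually uses:

```python
import math
import re
from collections import Counter

# Made-up corpus standing in for a knowledge base of scientific documents.
corpus = {
    "paper-1": "bayesian methods for simulation design in materials science",
    "paper-2": "deep learning for organic chemical reaction prediction",
    "paper-3": "bayesian inference and gaussian processes for experiment design",
}

def tf_idf_rank(query, docs):
    """Rank documents by summed TF-IDF weight of the query terms they share."""
    tokenize = lambda s: re.findall(r"[a-z]+", s.lower())
    doc_tokens = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    scores = {}
    for name, counts in doc_tokens.items():
        score = 0.0
        for term in tokenize(query):
            df = sum(1 for c in doc_tokens.values() if term in c)  # document freq.
            if df:
                score += counts[term] * math.log((1 + n) / (1 + df))
        scores[name] = score
    return sorted(scores.items(), key=lambda kv: -kv[1])

ranked = tf_idf_rank("bayesian experiment design", corpus)
print(ranked)  # paper-3 scores highest: it matches all three query terms
```

Each result carries a numeric relevance score, the same shape of output – ranked hits with ratings – that the cognitive discovery search is described as returning.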
He presented a typical scenario in the energy discovery field – let’s say a geologist has been handed a rock sample that may indicate the presence of oil or gas.
“We say the human in this play is acting as an inference engine, and that inference engine is operating against a corpus of information the oil company has and is drawing a conclusion,” Turek said. “But how good is that, how complete is that, can we help there? So we said we’re going to take this corpus conversion tool and the knowledge graph generator and we’re going to suck in all of that company’s geological information, all of the published papers and non-published papers on geology related, or unrelated, to petroleum, and then we’re going to go out to the public databases, so that now my corpus is much bigger and it’s also organized in the construct of a (searchable) knowledge graph.”
He also cited a hypothetical materials scientist who believes himself or herself to be an expert in the field. Yet there are 400,000 materials science papers published per year, a mountain of literature beyond human scale. And what if there is fertile ground for new discovery in blending materials science and, say, biology (see above re “emergent property”)? A tool that can collect and relate massive amounts of data across both fields could have great value, Turek said.
“A lot of what’s going on today in engineering and science is a synthesis of knowledge across domains,” he said. “So you may know everything there is to know about material science, but you may not know all that much about biology. Bringing those two things together may be quite critical.”
We asked Turek how the new tools, due on the market in the first half of next year, relate to Watson, IBM’s cognitive system for natural-language query and information discovery.
“Some of the technology will be embedded into Watson going forward,” he said. “But our go-to-market strategy will be independent of Watson because we’re putting a lot of emphasis on scientific kinds of information. And we’re going to target it to specific kinds of customers in the scientific space.”