Rediscovering the Value of the Past

By Lisa Waddell, IBM Spectrum Computing

June 25, 2019

Some people would like to forget their past, perhaps for good reasons. But for business or research organizations, preserving institutional memory can be the key to thriving in the future.

As successful companies evolve, they innovate, make new discoveries, and pick up valuable insights about what works – and what doesn’t. It’s this accumulated wisdom that good companies build on to stay successful.

Failing to learn from the past – or to adequately and effectively preserve institutional knowledge – can have substantial consequences. Analysts estimate that the average large US business loses $47 million in productivity each year as a direct result of inefficient knowledge sharing [1]. But collecting, managing, and making valuable institutional information readily accessible isn’t easy. U.S. knowledge workers waste 5.3 hours every week either waiting for vital information from their colleagues or working to recreate existing institutional knowledge. That wasted time translates into delayed projects, missed opportunities, frustrated employees, and a significant hit to the bottom line.

High performance computing (HPC) suffers from its own version of this information disorder. Researchers and data scientists often spend up to 80% of their time “wrangling” disparate data sets in a variety of formats so they can be collected within a single data warehouse [2]. Data scientists must cleanse, shape, and conform this kitchen sink of data before it can be analyzed for a variety of downstream uses, including business analytics and machine learning.
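To make the wrangling step concrete, here is a minimal sketch of cleansing and conforming two differently formatted exports into one table, written in Python with pandas. The file names, column names, and schema are hypothetical, invented purely for illustration; they are not drawn from any particular tool or data set mentioned in this article.

```python
# A minimal sketch of data wrangling: two hypothetical exports in different
# formats are cleansed and conformed into one table that could be loaded
# into a warehouse. File and column names are illustrative only; each export
# is assumed to carry an angle-of-attack column under one of two names.
import pandas as pd

# Ingest diverse formats: a CSV from one team, JSON records from another.
wind_tunnel = pd.read_csv("wind_tunnel_runs.csv")         # hypothetical export
cfd_runs = pd.read_json("cfd_results.json", lines=True)   # hypothetical export

def conform(df: pd.DataFrame) -> pd.DataFrame:
    """Cleanse and shape a raw frame into the shared schema."""
    df = df.rename(columns=str.lower)                       # normalize headers
    df = df.rename(columns={"aoa_deg": "angle_of_attack"})  # unify field names
    df["angle_of_attack"] = pd.to_numeric(df["angle_of_attack"], errors="coerce")
    df = df.dropna(subset=["angle_of_attack"])              # drop unusable rows
    return df.drop_duplicates()

# One conformed table, ready for analytics or model training.
combined = pd.concat([conform(wind_tunnel), conform(cfd_runs)], ignore_index=True)
```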


Let’s look at an example of the HPC version of this institutional knowledge problem. The design of an airplane wing is constantly examined and refined to optimize airflow, minimizing drag while still generating lift. Supercomputers are often used to run simulations of specific designs to give engineers a sense of how the wing will perform in the field. Before running these simulations, vast amounts of data must be discovered, collected, and cleansed – a time-intensive process compounded by the number of simulations scheduled to run. But what if some of the data lives only in an engineer’s head, far removed from the data center, or in a notebook in a folder in a filing cabinet? How can engineers bring this disparate, scattered data into their simulations to yield better results?

Leading technology providers, including IBM Research, have begun developing solutions to ease the pain of searching for and discovering relevant information held by the institution, then collecting and preparing those data sets for uses such as large-scale modeling and simulation runs or analytics powered by artificial intelligence (AI). Referred to as “cognitive discovery,” the approach aims to improve data ingest at scale using integrated tools that build catalogues of scientific data and automatically convert them into a “knowledge graph” – a structured map of the relationships within the data.
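To illustrate what a knowledge graph captures, here is a toy sketch in plain Python: facts extracted from documents are stored as subject-relation-object triples and can then be traversed. The entities and relations are invented for illustration and do not reflect the internal format of IBM’s tools.

```python
# A toy illustration of the idea behind a knowledge graph: facts extracted
# from documents are stored as (subject, relation, object) triples, and the
# relationships can then be traversed. All names here are made up.
from collections import defaultdict

triples = [
    ("wing_design_A", "evaluated_in", "simulation_042"),
    ("simulation_042", "reports", "drag_coefficient"),
    ("simulation_042", "described_in", "lab_notebook_1997_p12"),
    ("wing_design_A", "refined_from", "wing_design_legacy"),
]

# Index triples by subject so related facts can be looked up quickly.
by_subject = defaultdict(list)
for subj, rel, obj in triples:
    by_subject[subj].append((rel, obj))

# Everything the graph knows about one design, across document sources.
for relation, target in by_subject["wing_design_A"]:
    print(f"wing_design_A --{relation}--> {target}")
```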


IBM researchers have used new cognitive discovery tools to build a knowledge graph of 40 million scientific documents in only 80 hours – a rate of 500,000 documents per hour. The tools can ingest and interpret data formatted as PDFs, handwritten notebooks, spreadsheets, pictures, and more. Built-in deep search capabilities allow very complex queries to be run against the knowledge graph, with results ranked by their relevance to the query. The tools help bring order to chaotic data and contribute to establishing a corporate memory of all the HPC work an organization has ever performed – something of critical importance as employees retire or leave.
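As a rough illustration of relevance-ranked search, the sketch below scores a handful of invented documents against a query by simple term overlap and returns them in ranked order. This is only a stand-in for the deep search capabilities described above, not their actual implementation.

```python
# A simplified sketch of relevance-ranked search over an ingested corpus:
# each document gets a score for the query and results come back in ranked
# order. The scoring here is plain term overlap, far simpler than the deep
# search described above, and the documents are invented.
def score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the document text."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0

corpus = {
    "report_1988": "wing drag measurements from early wind tunnel tests",
    "notebook_2003": "notes on lift and drag for the revised wing design",
    "memo_2015": "budget planning for the simulation cluster upgrade",
}

query = "wing drag simulation"
ranked = sorted(corpus.items(), key=lambda kv: score(query, kv[1]), reverse=True)
for doc_id, text in ranked:
    print(f"{score(query, text):.2f}  {doc_id}")
```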

This new intelligent software helps enterprises capture, store, catalog, and analyze their vast institutional knowledge to drive better business outcomes. Working with collaborators across industries, IBM has begun building domain-specific cognitive discovery applications. In data-heavy HPC operations, the tools can substantially reduce the time it takes to perform specific simulations.

For example, at the American Chemical Society’s 2018 meeting in Boston, IBM showcased a cognitive discovery tool called IBM RXN, which predicts the outcomes of organic chemical reactions [3]. The tool is available online at no cost and runs on IBM’s Zurich systems. It currently offers forward reaction prediction and will soon provide optimized retrosynthesis. Leveraging the capabilities of IBM RXN, a research organization specializing in chemistry reduced the time needed to complete data ingest for its research by an astounding 70%. At the same time, the organization was able to preserve its knowledge in multiple native formats, making it a valuable resource it can continue to leverage over time.

This is just one example of how valuable it can be for organizations to discover all the sources of their institutional knowledge, collect that data within a single view, and cleanse and format it for easy accessibility. But cognitive discovery tools don’t necessarily stop there. They can also reach out to external data sources and automatically add and catalog them against the existing knowledge graph. For example, a materials scientist might believe he or she is an expert in the field. Yet roughly 400,000 materials science papers are published each year – a mountain of literature beyond human scale. And what if there is fertile ground for new discovery in blending materials science with, say, biology? A tool that can collect and relate massive amounts of data across both fields could have enormous value [4].

Cognitive discovery is a new technology wave that connects the past and the future, and it can automatically expand current institutional knowledge bases in almost limitless directions. Previously, a substantial amount of the information an organization naturally generated could be lost, or at least remain inaccessible or unusable. Now, thanks to new cognitive discovery technologies, smart companies are rediscovering their past knowledge – and leveraging it to create a brighter future.

 


References:

[1] Panopto: Inefficient Knowledge Sharing Costs Large Businesses $47 Million Per Year, July 2018. https://www.panopto.com/about/news/inefficient-knowledge-sharing-costs-large-businesses-47-million-per-year/

[2] EnterpriseAI: Data Prep: Easing Data Scientists’ ‘Janitorial Work’, March 2017. https://www.enterpriseai.news/2017/03/15/google-cloud-trifacta-tackle-8020-data-prep-rule-data-scientists/

[3] HPCwire: IBM’s AI-HPC Combine for ‘Intelligent Simulation’: Eliminating the Unnecessary, November 2018.

[4] EnterpriseAI: Data Prep: Easing Data Scientists’ ‘Janitorial Work’, March 2017. https://www.enterpriseai.news/2017/03/15/google-cloud-trifacta-tackle-8020-data-prep-rule-data-scientists/

 
