December 09, 2011
NEW YORK, N.Y., Dec 8 -- IBM (NYSE: IBM) today announced it is contributing a massive database of chemical data extracted from millions of patents and scientific literature to the National Institutes of Health. This contribution will allow researchers to more easily visualize important relationships among chemical compounds to aid in drug discovery and support advanced cancer research.
In collaboration with AstraZeneca, Bristol-Myers Squibb, DuPont and Pfizer, IBM is providing a database of more than 2.4 million chemical compounds extracted from about 4.7 million patents and 11 million biomedical journal abstracts from 1976 to 2000. The announcement was made at an IBM forum on U.S. economic competitiveness in the 21st century, exploring how private sector innovations and investment can be more easily shared in the public domain.
The publicly available chemical data can be used by researchers worldwide to gain new insights and open new areas of research. It will help researchers save time by more efficiently finding information buried in millions of pages of patent documents, and will allow them to analyze far larger sets of documents than the traditional manual process permits, adding a whole new dimension to the ability to search intellectual property.
The data was extracted using the IBM business analytics and optimization strategic IP insight platform (SIIP), a combination of data and analytics delivered via the IBM SmartCloud and developed by IBM Research in collaboration with several major life sciences organizations. SIIP is a cloud-driven method for curating and analyzing massive amounts of patents, scientific content and molecular data. It uses techniques such as automated image analysis and enhanced optical recognition of chemical images and symbols to extract information from patents and literature upon publication, a task that otherwise takes weeks or months to complete manually but can be done rapidly using this new technology.
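To give a flavor of the kind of text mining SIIP automates at scale, the sketch below pulls chemical identifiers out of raw document text. This is a minimal illustration, not IBM's actual pipeline: the corpus snippet is invented, and a simple regular expression for CAS Registry Numbers stands in for SIIP's far more sophisticated image analysis and optical recognition of chemical structures.

```python
import re

# Hypothetical mini-corpus standing in for patent text; the real SIIP
# pipeline operates on millions of full patent documents and images.
patent_text = """
The compound 2-acetoxybenzoic acid (CAS 50-78-2) was prepared as in
Example 3. Treatment with N,N-dimethylformamide gave the product.
"""

# CAS Registry Number format: 2-7 digits, 2 digits, 1 check digit.
cas_pattern = re.compile(r"\b(\d{2,7}-\d{2}-\d)\b")

def extract_cas_numbers(text):
    """Return the CAS registry numbers mentioned in a block of text."""
    return cas_pattern.findall(text)

print(extract_cas_numbers(patent_text))  # ['50-78-2']
```

Even this toy version shows why automation matters: a pattern match over millions of pages takes minutes, while the manual equivalent takes weeks.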
“Information overload continues to be a challenge in drug discovery and other areas of scientific research,” said Steve Heller, project director for the InChI Trust, a non-profit which supports the InChI international standard to represent chemical structures. “Rich data and content is often buried in patents, drawings, figures and scholarly articles. This contribution by IBM and its collaborators will make it easier for researchers to use this data, link to other data using the InChI structure representation and derive new insight.”
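The cross-linking Heller describes works because an InChIKey is a canonical, structure-derived identifier: two records that share one refer to the same molecule regardless of what name each document used. The sketch below shows that deduplication idea on a hand-made record list; the records are hypothetical, though the two InChIKeys shown are the real standard keys for aspirin and caffeine.

```python
# Hypothetical records as they might emerge from an extraction pipeline:
# the same molecule appears under different names in different documents.
records = [
    {"name": "aspirin",              "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
    {"name": "acetylsalicylic acid", "inchikey": "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"},
    {"name": "caffeine",             "inchikey": "RYYVLZVUVIJVGH-UHFFFAOYSA-N"},
]

def deduplicate(records):
    """Collapse records sharing an InChIKey, collecting synonyms per structure."""
    merged = {}
    for rec in records:
        merged.setdefault(rec["inchikey"], []).append(rec["name"])
    return merged

compounds = deduplicate(records)
print(len(compounds))  # 2 unique structures from 3 records
```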
Over the past six years, several major life sciences organizations have worked with IBM Research on this project, gaining access to a comprehensive chemical library extracted from worldwide patents and scientific abstracts. Public structure extraction tools developed by researchers at the National Institutes of Health were also used successfully in this project.
“The scientific community will receive enormous benefit from this advancement,” said Heller. “This is an important addition to the open chemistry data sets. The comprehensiveness of the data and the new ways researchers can look at these data and cross-link to other data associated with each chemical is expected to help with drug development to fight many forms of cancers and other human diseases, as well as the development of other chemical compounds.”
The data will be contributed to the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM), and the Computer-Aided Drug Design (CADD) Group of the National Cancer Institute (NCI) at the National Institutes of Health. It will be incorporated in the NCBI’s PubChem, a public resource for the scientific community that serves as an aggregator for scientific results as well as in NCI CADD Group services such as the Chemical Structure Lookup Service and the Chemical Identifier Resolver.
The National Institutes of Health will make the content available on PubChem at http://pubchem.ncbi.nlm.nih.gov
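Once the data is in PubChem, it can be queried programmatically through NCBI's PUG REST interface. The sketch below only constructs a request URL (no network call is made); the URL pattern follows PubChem's PUG REST conventions, and the compound name and property chosen are illustrative.

```python
from urllib.parse import quote

# Base endpoint of PubChem's PUG REST service.
BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def compound_property_url(name, prop="InChIKey"):
    """Build a PUG REST URL requesting a property of a compound by name."""
    return f"{BASE}/compound/name/{quote(name)}/property/{prop}/JSON"

print(compound_property_url("aspirin"))
```

Fetching that URL with any HTTP client would return a JSON document containing the requested property for the matched compound.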
Watch a video at: http://www.youtube.com/watch?v=GAzwu9AzVEg
More information about IBM SIIP is available at www.ibm.com/gbs/bao/siip
For more information about IBM, visit: http://www.ibm.com/smarterplanet
For more information about IBM Life Sciences, visit: http://www.ibm.com/lifesciences