Nautilus Harnessed for Humanities Research, Future Prediction

By Nicole Hemsoth

September 9, 2011

The observer influences the events he observes by the mere act of observing them or by being there to observe them.

        –Isaac Asimov, Foundation’s Edge

Elements of science fiction have helped us venture guesses about what the future might look like, at least in terms of the technologies some suspect might be pervasive one day. Flying cars, robot housekeepers and, of course, supercomputers that can predict the future and answer humanity’s most pressing questions are all staples.

This week, news emerged that might bring the all-knowing “supercomputer as fortuneteller” trope into reality or, if nothing quite so dramatic, at least help us better understand the connections between the news and its tone in a geographical context.

A recent project called “Culturomics 2.0: Forecasting Large-Scale Human Behavior Using Global News Media Tone in Time and Space” set out to find a way to use tone and geographic analysis methods to yield new insights about global society. If the lead researcher behind the project is correct, this could not only provide opportunities for societal research at global scale but could also act as a warning bell before crises occur.

Kalev H. Leetaru, Assistant Director for Text and Digital Media Analytics at the Institute for Computing in the Humanities, Arts and Social Science at the University of Illinois and Center Affiliate at NCSA, spearheaded the Culturomics 2.0 project. He claims that his analytics experiment has already allowed him to successfully forecast recent revolutions in Tunisia, Egypt, and Libya. Leetaru also says that he has been able to foresee stability in Saudi Arabia (at least through May 2011) and retroactively estimate Osama Bin Laden’s likely hiding place within a 200-kilometer radius.

Whereas the initial Culturomics (1.0) studies focused on the frequency of a particular set of words in digitized books, he says that mere frequency isn’t enough to yield real-time, immediately useful information that reflects the modern world.

Shedding the word-frequency element that defined version 1.0 of Culturomics, Leetaru set out to take deep analytics to a new level by moving past frequency altogether and sharpening the focus on tone, geography and the associations these two factors produce.

The project received funding from the National Science Foundation and was managed in part by the University of Tennessee’s Remote Data Analysis and Visualization Center (RDAV) and the National Institute for Computational Science (NICS). Leetaru was granted time on the large shared memory supercomputer Nautilus as part of the Extreme Science and Engineering Discovery Environment (XSEDE) program.

Leetaru says that using a large shared memory system like Nautilus was the key to achieving his research goals. The 1,024-core Intel Nehalem, 8.2-teraflop system, with 4 terabytes of memory available for big data workloads, was manufactured by SGI as part of its UltraViolet product line. A system like this gives researchers more flexibility as they seek to take advantage of vast computing power to analyze “big data” in innovative ways.

Leetaru’s goals with this project represent a perfect example of a data-intensive problem in research. To arrive at his results, Leetaru needed to gather 100 million news articles stretching back half a century. From this point, the process required a staged approach, which began with a data mining algorithm that extracted important terms—people, places and events—to create a base network of 10 billion “nodes” in the network of news history.

With a mere 10 billion elements left following extraction, Leetaru next set about seeking out relationships that connected these nodes to begin building a second network. He said that when this was complete, he was left with a total of 100 trillion relationships, yielding a network that was about 2.4 petabytes in size.
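To put those figures in proportion, the short calculation below works out what they imply per relationship and how the network compares with the 4 terabytes available on Nautilus. It is a back-of-the-envelope sketch only: it assumes decimal units (1 petabyte = 10^15 bytes) and treats the quoted numbers as exact, which the article does not claim.

```python
# Figures quoted in the article; decimal units are an assumption.
NODES = 10 * 10**9                 # 10 billion nodes
RELATIONSHIPS = 100 * 10**12       # 100 trillion relationships
NETWORK_BYTES = 24 * 10**14        # 2.4 petabytes
NAUTILUS_MEMORY_BYTES = 4 * 10**12 # 4 terabytes

# Average storage implied per relationship.
bytes_per_relationship = NETWORK_BYTES // RELATIONSHIPS

# Average number of relationships attached to each node.
relationships_per_node = RELATIONSHIPS // NODES

# How many times larger the full network is than the machine's memory.
memory_multiple = NETWORK_BYTES // NAUTILUS_MEMORY_BYTES

print(bytes_per_relationship, relationships_per_node, memory_multiple)
```

Under these assumptions the network averages 24 bytes per relationship and 10,000 relationships per node, and is 600 times larger than the memory of the machine it was analyzed on.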

Few machines have that kind of disk space, let alone memory, so he found that to process the data he needed to break the project up into pieces. He would look carefully at key pieces and generate the corresponding network on the fly using the shared memory system to begin the process of refining, a task he said wouldn’t be possible without Nautilus or another large shared memory system.

With the connections established, Leetaru then ran tools to seek out patterns and find interesting differences in tone across countries and regions. Using 1,500 dimensions of analysis that fall under the banner of “tone mining,” which assigns a positive or negative “score” using dictionaries of words from existing sources, Leetaru was able to build a profile of more profound connections.
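As a rough illustration of dictionary-based tone scoring, the sketch below averages per-word scores from a tiny lexicon. The lexicon entries, their weights, and the `tone_score` function are hypothetical stand-ins, not the dictionaries or code used in the project, and a single score stands in for what the article describes as 1,500 dimensions.

```python
import re

# Hypothetical toy lexicon: word -> tone weight (positive or negative).
TONE_LEXICON = {
    "peace": 1.0,
    "growth": 0.5,
    "protest": -0.5,
    "crisis": -1.0,
    "violence": -1.0,
}


def tone_score(text, lexicon=TONE_LEXICON):
    """Return the mean lexicon weight of the words in text, or 0.0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = [lexicon[word] for word in words if word in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


negative_example = tone_score("Protest and violence deepen the crisis")
positive_example = tone_score("Peace talks bring growth")
print(negative_example, positive_example)
```

Averaging only over matched words keeps long articles from being diluted by neutral vocabulary. That is one design choice among many and is not taken from the paper.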

These variances in the tone of global news were matched with geographic mining, which places the nodes and tones via an algorithm that seeks to determine which locations the news sources are talking about. Leetaru explains that this is not a simple algorithm, since there are many cities called “Cairo” in the world. The algorithm must mine the text for contextual references to nearby places or other elements to assign the correct coordinates.
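The “Cairo” problem can be sketched with a small gazetteer in which each candidate place carries a set of context cues. The gazetteer contents, the cue words, the approximate coordinates, and the `resolve_place` function below are illustrative assumptions. The project's actual geocoding algorithm is not described in enough detail here to reproduce.

```python
import re

# Hypothetical gazetteer: each ambiguous name maps to candidate places,
# with approximate coordinates and words that hint at that candidate.
GAZETTEER = {
    "cairo": [
        {"name": "Cairo, Egypt", "lat": 30.04, "lon": 31.24,
         "cues": {"egypt", "nile", "giza", "alexandria"}},
        {"name": "Cairo, Illinois", "lat": 37.01, "lon": -89.18,
         "cues": {"illinois", "mississippi", "ohio", "kentucky"}},
    ]
}


def resolve_place(name, text):
    """Pick the candidate whose cue words overlap most with the text.

    Returns None when the name is unknown or no cue appears in the text,
    rather than guessing.
    """
    candidates = GAZETTEER.get(name.lower(), [])
    if not candidates:
        return None
    words = set(re.findall(r"[a-z]+", text.lower()))
    best = max(candidates, key=lambda c: len(c["cues"] & words))
    return best if best["cues"] & words else None


river_story = resolve_place(
    "Cairo", "Flooding near Cairo, where the Ohio meets the Mississippi")
protest_story = resolve_place(
    "Cairo", "Protesters filled central Cairo and also gathered in Alexandria")
print(river_story["name"], protest_story["name"])
```

Returning `None` when no cue matches leaves ambiguous mentions unplaced instead of defaulting to the best-known city, which would bias the resulting map.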

The final element is the network analysis, or modularity-finding, step. Leetaru takes his network and looks for nodes that are more tightly connected to each other than to the rest of the network to find out how nations are related, an analysis that yields a well-defined set of seven civilizations on Earth. Building this kind of network requires taking every city and every article that has ever referenced it; each city then becomes a node with its own complex network of tones, meanings and potential for new findings.
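The idea of nodes being “more tightly connected to each other than to the rest of the network” is commonly quantified with a modularity score, Q = Σ_c [ L_c/m − (d_c/2m)² ], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The sketch below computes Q for a candidate grouping on a toy graph of two triangles joined by a single edge. The graph, node names, and function are hypothetical, and the article does not say which modularity algorithm the project used.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)    # edges with both ends in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: triangle a-b-c and triangle d-e-f, bridged by the edge c-d.
EDGES = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

by_triangle = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

q_triangles = modularity(EDGES, by_triangle)
q_single = modularity(EDGES, one_group)
print(q_triangles, q_single)
```

Grouping by triangle scores Q = 5/14, about 0.357, while lumping everything into one community scores 0. A community-finding step searches for the grouping that pushes this score as high as possible.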

With all of these stages in place, Leetaru says the possibilities are endless. One can watch change over time and create reproducible models—or even go back to look at past events to see how closely one can predict the end result. In the full paper, Leetaru hits on some of his successes showing how major crises have played out in a particular set of ways—offering a chance for researchers to predict the future.

Leetaru pointed to the benefits of using the shared memory system Nautilus with the example that has generated a lot of buzz this week—that his methods led to a retroactive map that pinpointed Bin Laden’s location within 200 km.

“One of the beauties of using a large shared memory machine is that for example I could see an interesting pattern (like the Bin Laden portion where I was assuming there was enough information to pinpoint where he was hiding) and then begin exploring different techniques, including writing quick little Perl scripts that would wrap a small network on the fly actually and process that material and basically make a quick chart or table or map.”

He went on to note:

“With a large shared memory machine, you don’t have to worry about memory—I never had to worry about writing MPI code to distribute memory across nodes; it’s like it was infinite–with a quick script I could grab all locations that mentioned “Bin Laden” since he first started to appear in the news around 10 years ago, and map it over time or in different ways. It boiled down to writing easy Perl scripts, running in a matter of minutes—if I didn’t have all that memory it would have taken weeks or months with each iteration so one benefit is that leveraging that much hardware allows you to do simple things.”
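The kind of quick, in-memory query Leetaru describes can be sketched as a filter-and-group over mention records. Leetaru used Perl; Python is used here for illustration, and the record layout, the placeholder names, and the `locations_by_year` function are assumptions rather than anything taken from his scripts or data.

```python
from collections import Counter, defaultdict

# Hypothetical mention records with placeholder values, all held in memory.
MENTIONS = [
    {"year": 2001, "person": "Person X", "location": "Place A"},
    {"year": 2001, "person": "Person X", "location": "Place A"},
    {"year": 2001, "person": "Person Y", "location": "Place C"},
    {"year": 2002, "person": "Person X", "location": "Place B"},
]


def locations_by_year(mentions, person):
    """Count, per year, the locations that co-occur with the given person."""
    by_year = defaultdict(Counter)
    for mention in mentions:
        if mention["person"] == person:
            by_year[mention["year"]][mention["location"]] += 1
    return {year: dict(counts) for year, counts in sorted(by_year.items())}


timeline = locations_by_year(MENTIONS, "Person X")
print(timeline)
```

Because everything sits in one address space, the whole query is a single pass over a list, with no code needed to partition the data across nodes.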

Leetaru says that ever since his time as an undergraduate at NCSA, working with some of the first iterations of web-scale web mining, he has been fascinated with the possibilities of deep analytics. His goal with the Culturomics 2.0 project was to forecast large-scale human behavior using global news media tone in time and space, but along the way he stumbled upon a few unexpected findings, including that the news is indeed becoming “more negative” in general tone and that the United States tends to favor itself in its own news filings.

In this era of deep analytics that harness real-time news and sentiment, the Foundation series from Isaac Asimov is never far from the mind. For those who haven’t read the books, in a very small nutshell, mathematical formulas allow civilization to predict the future course of history…and madness ensues.

All arguments about potential for chaos or leaps forward for civilization aside, advances in analytics and high-performance computing like those produced on the Nautilus supercomputer have brought this series of classic science fiction tales into the realm of possibility.
