Nautilus Harnessed for Humanities Research, Future Prediction

By Nicole Hemsoth

September 9, 2011

The observer influences the events he observes by the mere act of observing them or by being there to observe them.

        –Isaac Asimov, Foundation’s Edge

Elements of science fiction have helped us venture guesses about what the future might look like—at least in terms of the technologies some suspect might be pervasive one day. Flying cars, robot housekeepers, and of course, supercomputers that can predict the future and answer humanity’s most pressing questions, are all staples.

This week, news emerged that might bring the all-knowing “supercomputer as fortuneteller” trope closer to reality, or, if nothing quite so dramatic, at least help us better understand the connections between the news, its tone, and the geography it describes.

A recent project called “Culturomics 2.0: Forecasting Large-Scale Human Behavior Using Global News Media Tone in Time and Space” set out to use tone and geographic analysis methods to yield new insights about global society. If the lead researcher behind the project is correct, this could not only open opportunities for societal research at global scale, but could also act as a warning bell before crises occur.

Kalev H. Leetaru, Assistant Director for Text and Digital Media Analytics at the Institute for Computing in the Humanities, Arts and Social Science at the University of Illinois and Center Affiliate at NCSA, spearheaded the Culturomics 2.0 project. He claims that his analytics experiment has already allowed him to successfully forecast the recent revolutions in Tunisia, Egypt, and Libya. Leetaru also says that he was able to foresee stability in Saudi Arabia (at least through May 2011) and to retroactively estimate Osama Bin Laden’s likely hiding place within a 200-kilometer radius.

Whereas initial Culturomics (1.0) studies focused on the frequency of a particular set of words in digitized books, Leetaru says that mere frequency isn’t enough to yield real-time, immediately useful information that reflects the modern world.

Shedding the word-frequency element that defined version 1.0 of Culturomics, Leetaru set out to take deep analytics to a new level, moving past frequency altogether and instead sharpening the focus on tone, geography, and the associations those two factors produce.

The project received funding from the National Science Foundation and was managed in part by the University of Tennessee’s Remote Data Analysis and Visualization Center (RDAV) and the National Institute for Computational Sciences (NICS). Leetaru was granted time on the large shared memory supercomputer Nautilus as part of the Extreme Science and Engineering Discovery Environment (XSEDE) program.

Leetaru says using a large shared memory system like Nautilus was the key to achieving his research goals. The 1,024-core Intel Nehalem system delivers 8.2 teraflops and offers 4 terabytes of shared memory for big data workloads; it was manufactured by SGI as part of its UltraViolet product line. A system like this allows researchers more flexibility as they seek to take advantage of vast computing power to analyze “big data” in innovative ways.

Leetaru’s goals with this project represent a perfect example of a data-intensive research problem. To arrive at his results, Leetaru needed to gather 100 million news articles stretching back half a century. From there, the process required a staged approach, beginning with a data mining algorithm that extracted important terms (people, places, and events) to create a base set of 10 billion “nodes” in the network of news history.
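To make the extraction step concrete, here is a minimal sketch of the idea, not Leetaru’s actual pipeline: pull candidate entity terms out of article text and tally them as prospective nodes. The naive capitalized-word heuristic and the sample articles are purely illustrative.

```python
# Minimal sketch of the node-extraction idea (illustrative only):
# spot candidate entity terms in article text and tally them as nodes.
import re
from collections import Counter

def extract_candidate_entities(text):
    """Naive entity spotter: runs of capitalized words."""
    return re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*", text)

articles = [
    "Protests spread from Cairo to Alexandria as Egypt faced unrest.",
    "Officials in Cairo and Tripoli responded to the unrest in Libya.",
]

node_counts = Counter()
for article in articles:
    node_counts.update(extract_candidate_entities(article))

print(node_counts.most_common(5))  # most frequently mentioned candidate nodes
```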

With a mere 10 billion elements left following extraction, Leetaru next set about seeking out relationships that connected these nodes to begin building a second network. He said that when this was complete, he was left with a total of 100 trillion relationships, yielding a network that was about 2.4 petabytes in size.
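The relationship-building step can be pictured with a similarly simple assumption: two nodes become linked whenever they are mentioned in the same article, and repeated co-mentions strengthen the edge. The sketch below uses that assumption with toy data; the study’s actual linking rules were far richer.

```python
# Toy sketch of building relationships (edges) between extracted nodes,
# assuming co-mention in the same article implies a link.
from collections import Counter
from itertools import combinations

article_entities = [
    {"Cairo", "Egypt", "Alexandria"},
    {"Cairo", "Libya", "Tripoli"},
]

edge_weights = Counter()
for entities in article_entities:
    for a, b in combinations(sorted(entities), 2):
        edge_weights[(a, b)] += 1  # heavier edges for repeated co-mentions

print(edge_weights)
```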

Few machines have that kind of disk space, let alone memory, so to process the data he found he needed to break the project up into pieces. He would look carefully at key pieces and generate the relevant portion of the network on the fly in the shared memory system to begin the process of refining, a task he said wouldn’t be possible without Nautilus or another large shared memory system.
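In other words, the full 2.4-petabyte network is never materialized at once. A rough, hypothetical illustration of carving out one piece on the fly: keep only the edges that touch a small set of focus nodes, so the slice of interest fits comfortably in shared memory.

```python
# Hypothetical illustration of generating one "piece" of the network on
# the fly: keep only edges that touch a chosen set of focus nodes.
def subnetwork(edge_weights, focus_nodes):
    return {edge: w for edge, w in edge_weights.items()
            if focus_nodes.intersection(edge)}

edges = {("Cairo", "Egypt"): 3, ("Libya", "Tripoli"): 1, ("Egypt", "Libya"): 2}
print(subnetwork(edges, {"Egypt"}))  # only the Egypt-related slice
```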

With the connections established, Leetaru then ran tools that sought out patterns, looking for interesting differences in tone across countries and regions. Using 1,500 dimensions of analysis that fall under the banner of “tone mining,” which assigns a positive or negative “score” to text based on dictionaries of words drawn from existing sources, Leetaru was able to build a profile of more profound connections.
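Dictionary-based tone scoring of this kind is straightforward to sketch. The word lists below are invented stand-ins; real tone dictionaries run to thousands of scored terms.

```python
# Sketch of dictionary-based tone mining with invented word lists.
POSITIVE = {"stable", "growth", "peace", "agreement"}
NEGATIVE = {"crisis", "protest", "violence", "collapse"}

def tone_score(text):
    """Crude tone score: (positive - negative) word share."""
    words = [w.strip(".,").lower() for w in text.split()]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

print(tone_score("Talks produced an agreement and relative peace."))      # positive
print(tone_score("The crisis deepened as protest turned to violence."))   # negative
```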

These variations in the tone of global news were matched with geographic mining efforts, which place the nodes and tones via an algorithm that seeks to determine which locations the news sources are talking about. Leetaru explains that this is not a simple algorithm, since there are many cities called “Cairo” in the world. The algorithm must mine for contextual references to nearby places or other clues to assign the coordinates correctly.
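A toy version of that disambiguation step is sketched below: each candidate “Cairo” in a made-up gazetteer carries a few contextual clue words, and the candidate whose clues best overlap the article wins. The gazetteer entries and clue words are illustrative only.

```python
# Toy place-name disambiguation: pick which "Cairo" an article means by
# overlap with each candidate's contextual clue words (illustrative data).
GAZETTEER = {
    "Cairo": [
        {"country": "Egypt", "lat": 30.04, "lon": 31.24,
         "clues": {"egypt", "giza", "nile"}},
        {"country": "United States", "lat": 37.01, "lon": -89.18,
         "clues": {"illinois", "mississippi", "ohio"}},
    ],
}

def resolve(place, article_text):
    words = set(article_text.lower().split())
    candidates = GAZETTEER.get(place, [])
    # Choose the candidate whose clues overlap the article text the most.
    return max(candidates, key=lambda c: len(c["clues"] & words), default=None)

print(resolve("Cairo", "Crowds gathered along the Nile in Egypt today"))
```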

The final element is the network analysis, or modularity-finding, step. Leetaru takes his network and looks for groups of nodes that are more tightly connected to one another than to the rest of the network to find out how nations are related, an analysis that yields a well-defined set of seven civilizations on Earth. Building this kind of network requires taking every city and every article that has ever referenced it; each city then becomes a node with its own complex network of tones, meanings, and potential for new findings.
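Community detection by modularity is a standard technique, and a small stand-in example can be run with NetworkX; the library choice and the toy country links below are assumptions for illustration, not the study’s actual tooling.

```python
# Modularity-based community detection on a toy country-link network,
# a stand-in for the step described above (requires the networkx package).
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph()
G.add_weighted_edges_from([
    ("Egypt", "Libya", 5), ("Egypt", "Tunisia", 4), ("Libya", "Tunisia", 3),
    ("France", "Germany", 6), ("Germany", "Poland", 2), ("France", "Poland", 2),
    ("Egypt", "France", 1),  # a weak tie bridging the two groups
])

for community in greedy_modularity_communities(G, weight="weight"):
    print(sorted(community))  # tightly knit groups emerge as "civilizations"
```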

With all of these stages in place, Leetaru says the possibilities are endless. One can watch change over time and create reproducible models, or even go back to past events to see how closely one can predict the end result. In the full paper, Leetaru highlights some of his successes, showing how major crises have played out in a particular set of ways and offering researchers a chance to predict the future.

Leetaru pointed to the benefits of using the shared memory system Nautilus with the example that has generated a lot of buzz this week—that his methods led to a retroactive map that pinpointed Bin Laden’s location within 200 km.

“One of the beauties of using a large shared memory machine is that for example I could see an interesting pattern (like the Bin Laden portion where I was assuming there was enough information to pinpoint where he was hiding) and then begin exploring different techniques, including writing quick little Perl scripts that would wrap a small network on the fly actually and process that material and basically make a quick chart or table or map.”

He went on to note:

“With a large shared memory machine, you don’t have to worry about memory—I never had to worry about writing MPI code to distribute memory across nodes; it’s like it was infinite–with a quick script I could grab all locations that mentioned “Bin Laden” since he first started to appear in the news around 10 years ago, and map it over time or in different ways. It boiled down to writing easy Perl scripts, running in a matter of minutes—if I didn’t have all that memory it would have taken weeks or months with each iteration so one benefit is that leveraging that much hardware allows you to do simple things.”
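Leetaru describes quick Perl scripts run against the in-memory network; the sketch below shows the same kind of ad hoc query in Python over hypothetical (year, entity, latitude, longitude) records, purely to illustrate how simple such a pass becomes once everything sits in shared memory.

```python
# Hypothetical quick query in the spirit of the scripts described above:
# gather every geolocated "Bin Laden" mention by year (invented data).
from collections import defaultdict

records = [  # (year, entity, lat, lon) -- illustrative sample records
    (2001, "Bin Laden", 34.5, 69.2),
    (2005, "Bin Laden", 33.7, 73.1),
    (2010, "Bin Laden", 34.2, 73.2),
    (2010, "Mubarak", 30.0, 31.2),
]

mentions_by_year = defaultdict(list)
for year, entity, lat, lon in records:
    if entity == "Bin Laden":
        mentions_by_year[year].append((lat, lon))

for year in sorted(mentions_by_year):
    print(year, mentions_by_year[year])
```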

Leetaru says that even as an undergraduate at NCSA, working with some of the first iterations of web-scale web mining, he was fascinated with the possibilities of deep analytics. While his goal with the Culturomics 2.0 project was to forecast large-scale human behavior using global news media tone in time and space, along the way he stumbled upon a few other unexpected findings, including that the news is indeed becoming “more negative” in general tone and that the United States tends to favor itself in its own news coverage.

In this era of deep analytics that harness real-time news and sentiment, the Foundation series from Isaac Asimov is never far from the mind. For those who haven’t read the books, in a very small nutshell, mathematical formulas allow civilization to predict the future course of history…and madness ensues.

All arguments about potential for chaos or leaps forward for civilization aside, advances in analytics and high-performance computing like those produced on the Nautilus supercomputer have brought this series of classic science fiction tales into the realm of possibility.
