Will Web-based solutions shape the future of academia? The benefits of a unified, shareable, scalable dataset make a compelling case. A recent article by Mark Hahnel, the founder of Figshare, notes that the cloud “is becoming more and more attractive as a place where all products of academia should live.”
The cloud, with its fast load times, scalability, automated application deployment, multiple back-ups and constantly updated hardware, means that institutions need not build their own server centres, with the associated running costs and rapidly ageing technology. We are moving beyond the phase of merely making academic data openly available and into one where we can derive new insight from larger data sources. At this stage, the ability of any academic developer to access the processing power of thousands of servers at the click of a button also demonstrates the inherent power of scale that commercial cloud services can provide.
The trend is not limited to academia, either. Analyst house IDC recently published a report showing that the share of firms operating high-performance computing (HPC) systems that use the cloud rose from 13.8 percent in 2011 to 23.5 percent in 2013, an increase of roughly 70 percent in two years.
With scientific data at governments and academic institutions growing by leaps and bounds, the problem becomes how to store, curate and disseminate all this output. The cloud can act as a central hub with many spokes, enabling an efficient and effective workflow. There is huge potential for the cloud to become this central repository, but, Hahnel remarks, further technology is still needed to make it a complete solution. The main restriction is not storage space but bandwidth. The so-called consumer Internet is far too slow to support a “research cloud”. Most HPC hubs, however, enjoy far faster connections, and some even have their own high-speed backbone, such as Internet2.
When it comes to cloud providers’ positioning in the science space, Hahnel points out that Microsoft Research has actively recruited research groups onto its Azure platform, while AWS hosts publicly available datasets, including genomics data, at no charge. The providers hope that the groups they build strong relationships with now will eventually become paying customers.
Going forward, Hahnel would like to see “specific web-based apps being developed and applied over these large, open data sets, or even multiple data sets being pulled in via APIs from different persistent locations.” This approach would spare the community the inefficiency of storing numerous redundant copies of the same data in siloed institutions around the world.
“The potential that linked open data has to revolutionise the efficiency of drug discovery and academic progress in general, cannot be underestimated,” Hahnel remarks. “The real remaining question is whose responsibility is it to build these browser tools and apps?”