January 13, 2010
Jan. 13 -- In 2008, the e-IRG decided to launch a task force to investigate the numerous European activities related to the management of scientific data, and to contribute to the definition of common, shared policies in this field. After months of intensive discussion and work, the task force released its final report, which investigates the issue comprehensively and sets out a number of recommendations. The report and its recommendations were endorsed by the e-IRG on Nov. 30, 2009, and by ESFRI on Dec. 11, 2009.
A fundamental paradigm shift known as Data Intensive Science is changing the way science and research are conducted in most disciplines. The emerging paradigm is based on access to, and analysis of, large amounts of new and existing data. These challenges were thoroughly discussed during the 2007-2008 e-IRG workshops, with particular focus on data initiatives linked to the new research infrastructures identified by the European Strategy Forum on Research Infrastructures (ESFRI), created in 2002 by the European Council of Ministers. The e-IRG delegates recognised the importance of data management for the future of research infrastructures and, as a result, established the e-IRG Data Management Task Force (DMTF) in 2008, which also received recognition and support from ESFRI. To conduct its work, the DMTF was organised into three sub-task forces, each focusing on a specific aspect of the problem.
Existing Data Management Initiatives
The first group conducted a survey describing the landscape of existing projects and initiatives related to data management. The survey analysed the opportunities, synergies and gaps presented by these initiatives, as well as their potential impact. It is divided into three main fields of science: arts, humanities and social sciences; health sciences; and natural sciences and engineering. The analysis of 18 social sciences, 12 health sciences and 33 natural sciences and engineering initiatives gives a global view of existing data initiatives in Europe.
Metadata and Quality
The second group covered the basic principles and requirements for metadata descriptions and for the quality of the resources to be stored in accessible repositories. The principles and requirements specified are considered baselines for all research infrastructures and, as such, are independent of the scientific field. Key findings of this part of the report focus on metadata flexibility: schemas should allow new elements to be added, support different types of selections, and make it possible to combine elements from different sets and to re-use existing elements and sets. Metadata topics such as usage, scope, provenance, persistence, aggregation, standardisation, interoperability, quality, earliness and availability are discussed in detail. The quality of data resources is also covered, in the context of sharing data and quality assurance, assessing the quality of research data, and data consumers.
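To make the flexibility requirement concrete, here is a minimal sketch (not taken from the report) of a metadata record that namespaces each element by the set it comes from, so that elements from different sets can be combined and new elements added without disturbing existing ones. The Dublin Core ("dc") elements are real; the "geo" element set is a hypothetical example.

```python
# Minimal sketch of an extensible metadata record (illustrative only).
# Elements are namespaced by their element set, so sets can be mixed
# and new elements added without changing those already present.

class MetadataRecord:
    def __init__(self):
        self._elements = {}  # keys are (element_set, element) pairs

    def add(self, element_set, element, value):
        # 'dc:creator' and a hypothetical 'geo:creator' can coexist,
        # and each set's elements can be re-used across records.
        self._elements[(element_set, element)] = value

    def as_dict(self):
        return {f"{s}:{e}": v for (s, e), v in self._elements.items()}


record = MetadataRecord()
record.add("dc", "title", "Sea-surface temperature series, 1990-2009")
record.add("dc", "creator", "Example Oceanographic Institute")
# A later extension draws on a different (hypothetical) element set
# without touching the elements already present:
record.add("geo", "boundingBox", "-10.0,35.0,5.0,45.0")
print(record.as_dict())
```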
Interoperability Issues in Data Management
The third group sought to propose guidelines for improving interoperability between various archives and repositories. Interoperability is essential to making scientific data reachable and useful across scientific fields, i.e. to enabling cross-disciplinary Data Intensive Science. At the moment, interoperability-related activities are mainly contained within individual communities, but with the advent of e-Science, data interoperability needs to extend across groups of different communities. In addition to detailing these opportunities and challenges, this part of the document presents several levels and types of interoperability: resource-level interoperability, general semantic interoperability, and syntactic versus semantic interoperability. The different layers of resource interoperability, ranging from the device level to communications, middleware and deployment, are analysed in detail. Semantic interoperability is also discussed in terms of data integration, ontology support, simplicity, transcoding and metadata, representation information, conceptual modelling, and distributed systems. Use cases from the medical field, linguistics, the e-humanities ecosystem, earth sciences, astronomy and space science, and particle physics are presented and, in each instance, solutions, tendencies and needs are identified and put forward.
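As an illustration of the transcoding the report mentions, the sketch below (not from the report; the field names and the mapping are hypothetical) translates a record from one archive's metadata vocabulary into another's via an explicit crosswalk, keeping any unmapped fields rather than discarding them.

```python
# Illustrative sketch of metadata transcoding between two archives'
# vocabularies via a crosswalk. Field names are hypothetical examples.

CROSSWALK = {
    # archive A's field    ->  archive B's (Dublin Core style) field
    "author": "dc:creator",
    "experiment_date": "dc:date",
    "dataset_description": "dc:description",
}

def transcode(record_a):
    """Translate a record from archive A's schema to archive B's.

    Fields with no known mapping are preserved under 'unmapped' so
    no information is silently lost in translation."""
    record_b, unmapped = {}, {}
    for field, value in record_a.items():
        target = CROSSWALK.get(field)
        (record_b if target else unmapped)[target or field] = value
    if unmapped:
        record_b["unmapped"] = unmapped
    return record_b

print(transcode({"author": "A. Researcher",
                 "experiment_date": "2009-06-01",
                 "instrument": "spectrometer-7"}))
```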
This report is obviously not intended as the final word on data management; rather, it aims to bring together several starting points to encourage future efforts in this domain. The participating authors sincerely hope that interested parties will take on board the findings of this document and craft them into a group of concrete, well-aligned initiatives that fulfil the promise of Data Intensive Research.
About the e-Infrastructure Reflection Group
The e-Infrastructure Reflection Group is an inter-governmental policy body comprising national delegates from more than 30 European countries. Its work is supported by the e-IRG Support Programme 2 (e-IRGSP2), a project financed by the European Commission’s 7th Framework Programme, which includes seven partner institutions: CSC, NCF, ETL, GRNET, AUEB-RC, Genias BV and PSNC.
ESFRI, the European Strategy Forum on Research Infrastructures, is a strategic instrument to develop the scientific integration of Europe and to strengthen its international outreach. The mission of ESFRI is to support a coherent and strategy-led approach to policy-making on research infrastructures in Europe, and to facilitate multilateral initiatives leading to the better use and development of research infrastructures at the EU and international levels. ESFRI's delegates are nominated by the Research Ministers of the Member and Associate Countries, and include a representative of the Commission, working together to develop a joint vision and a common strategy.