In the last of a four-part series, Jay Etchings, director of operations for research computing and senior HPC architect at Arizona State University, traces the drivers that have led to the emergence of Research-as-a-Service as a viable model.
The notion of health informatics applications and data in the cloud has existed since the early days of cloud services, with the launch of Salesforce in 1999 and Amazon Web Services in 2002. Early attempts at health informatics in the cloud, or as-a-service, were hampered by security and privacy issues that effectively prevented their adoption.
As a former Medicaid-Medicare recovery audit contractor, I found the audits nothing short of brutal, and they were founded on technologies a decade old. At the time, the mere thought of a virtual local area network (VLAN) or a storage area network (SAN) in which the same physical devices were shared violated policy.
Many technical healthcare professionals have long recognized the benefits of cloud computing in terms of elasticity, flexibility and potential reductions in both capital and operational expenditure. Considerations such as data security, patient privacy, network performance and economics have led many large-scale operations to adopt a hybrid cloud model, sometimes without realizing it.
Research and healthcare applications will migrate naturally to the cloud as the model matures and becomes more pervasively available through Web front ends, as the market drives demand for data-intensive analytics processing, and as cloud services make the leap to supporting the many devices of the Internet of Things (IoT) world that is now upon us.
By leveraging the XaaS (anything-as-a-service) model, the Internet2 Innovation Platform, workload-deterministic provisioning and open big data, Research-as-a-Service (RaaS) now has the tools to overcome the challenges of the past.
A significant challenge throughout the history of research computing has been what is known as the Tragedy of Reproducibility in Science, or more aptly the "Tragedy of Irreproducibility in Science." With the advent of the Internet, research scientists have benefited from new avenues for peer review. One of the stumbling blocks RaaS aims to overcome has more to do with the physical components of the experiment than with the science itself. Countless hours are spent replicating an experimental environment so that a specific outcome can be validated. Because a project's lifecycle can range from months to a couple of years, rapid changes in technology create additional, unplanned obstacles.
At its core, RaaS is a collection of containerized components that perform a research task. Once the project has been validated, saved and archived, the software layer can be shared for peer review or, more pragmatically, reused within a pipeline that serves a larger, holistically defined goal. With discrepancies in operating system, kernel version, libraries, network and other configuration removed, the debate can focus on the outcomes of the science.
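Containerizing a research task removes most environment drift by construction, but it still helps to record what the environment actually was when a result was produced, so a reviewer can compare it with their own. The following is a minimal Python sketch, assuming that a manifest of OS, kernel release, Python version and selected package versions is enough to flag drift; the manifest fields and the functions `environment_manifest`, `fingerprint` and `diff_manifests` are hypothetical illustrations, not part of the RaaS offering. Since a container shares its host's kernel, the kernel release is worth recording explicitly.

```python
import hashlib
import json
import platform
from importlib import metadata


def environment_manifest(packages):
    """Capture a hypothetical reproducibility manifest for the current run."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package not installed in this environment
    return {
        "os": platform.system(),
        "kernel": platform.release(),  # containers inherit the host kernel
        "python": platform.python_version(),
        "packages": versions,
    }


def fingerprint(manifest):
    """Stable SHA-256 digest over the canonical JSON form of a manifest."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_manifests(a, b, prefix=""):
    """Return (path, value_a, value_b) for every field that differs."""
    diffs = []
    for key in sorted(set(a) | set(b)):
        path = f"{prefix}{key}"
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            diffs.extend(diff_manifests(va, vb, path + "."))
        elif va != vb:
            diffs.append((path, va, vb))
    return diffs


if __name__ == "__main__":
    here = environment_manifest(["numpy", "pandas"])
    print(json.dumps(here, indent=2))
    print("fingerprint:", fingerprint(here))
```

Archiving the manifest and its fingerprint alongside the container image lets a reviewer confirm quickly whether two runs differ in environment before debating the science; `diff_manifests` then points to exactly which component changed.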
Hadoop is a fundamental component of the offering, in both physical and virtual instantiations, with graphical management layers available to researchers. Delivering Hadoop in this manner allows the compute component of the cluster to scale elastically (since it is decoupled from storage), and we support multi-tenant access to the underlying HDFS file system, which is owned and managed by the greater pool of data nodes.
The service also provides GPUs for computation (general-purpose computing on GPUs, or GPGPU), including enterprise-class GPUs, for both physical and virtualized workloads. Overall demand may dictate the need for popular high-end GPU cards. For researchers interested in Intel Xeon Phi coprocessors, we have a single test bed in near-line production. Xeon Phi is currently known not to work with VMware ESX in pass-through (VMDirectPath I/O) mode. Work is ongoing in this area and preliminary results look very good, but no date has been set for when the required platform changes will appear in a shipping product.
Web Application Federated Access Portals (WFP)
Federated application services coupled with single sign-on (SSO) capabilities are a much sought-after component for any business in the cloud services application space. What was once a "nice to have" has become a requirement for access control, compliance and ease of use.
Internet2 supplies Shibboleth access packages for single sign-on, preserving both privacy (identity management) and security (asset management). Shibboleth is a standards-based, open source software package for Web single sign-on across or within organizational boundaries. It allows sites to make informed authorization decisions for individual access to protected online resources in a privacy-preserving manner. The logical workflow model is depicted in the graphic below:
The NGCC (Next Generation Cyber Capability) Web portal provides cloud-based access to the complete portfolio of tools and recipes, letting researchers quickly access, build, deploy and, if desired, reset them. Because Web services rely on some of the same underlying HTTP and Web-based architecture as common Web applications, they are susceptible to similar threats and vulnerabilities. Web services security is based on several important concepts, including:
Identification and Authentication: Verifying the identity of a user, process, or device, often as a prerequisite to allowing access to resources in an information system. For the NGCC portal, this is done through the ASURITE identification process.
Authorization: The permission to use a computer resource, granted, directly or indirectly, by an application or system owner.
Integrity: The property that data has not been altered in an unauthorized manner while in storage, during processing, or in transit.
Non-repudiation: Assurance that the sender of information is provided with proof of delivery and the recipient is provided with proof of the sender’s identity, so neither can later deny having processed the information.
(The above definitions are taken from NIST IR 7298, Glossary of Key Information Security Terms and NIST SP 800-100.)
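Of the properties above, integrity is the easiest to illustrate in a few lines. The following minimal Python sketch assumes the portal and a client share a secret key and attaches an HMAC-SHA256 tag to each request, so any unauthorized modification in transit is detected on verification. The key exchange, the `sign`/`verify` helpers and the job payload are hypothetical. Because both parties hold the same key, an HMAC does not provide non-repudiation; that property requires an asymmetric digital signature, where only the sender holds the signing key.

```python
import hashlib
import hmac
import secrets


def sign(key: bytes, payload: bytes) -> str:
    """Return a hex-encoded HMAC-SHA256 tag over the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify(key: bytes, payload: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time to avoid timing leaks."""
    return hmac.compare_digest(sign(key, payload), tag)


if __name__ == "__main__":
    # Hypothetical shared secret between the portal and one client.
    key = secrets.token_bytes(32)
    message = b'{"job": "align-reads", "nodes": 4}'
    tag = sign(key, message)
    print(verify(key, message, tag))                      # True
    print(verify(key, message.replace(b"4", b"8"), tag))  # False: payload altered
```

In practice the tag would travel with the request (for example in a header) and the key would be provisioned out of band; the sketch only shows the integrity check itself.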
Parts one, two and three of this series are available here, here and here.
Director of Operations, Research Computing, and Senior HPC Architect at Arizona State University, Jay Etchings is a well-known industry professional with 20 years of progressively versatile, cross-platform experience in managing open systems architecture. Having spent the bulk of a 10-year technical consulting career in gaming and connected lotteries, Etchings has long had a passion for data relationship analysis. He is well versed in all phases of cutting-edge analytics and research computing. His former role as a recovery audit contractor for the Centers for Medicare & Medicaid Services (CMS-RAC) aligns him with the emerging field of "precision medicine" in healthcare.
Additional contribution provided by…
Dr. Kenneth Buetow also contributed to this article series. Buetow serves as director of the Computational Sciences and Informatics program for Complex Adaptive Systems at Arizona State University (CAS@ASU) and is a professor in the School of Life Sciences in ASU's College of Liberal Arts and Sciences. CAS@ASU is creating a Next Generation Cyber Capability (NGCC) to address the challenges and opportunities afforded by "Big Data" and the emergence of Fourth Paradigm data science. This capability brings state-of-the-art computational approaches to CAS@ASU's trans-disciplinary, use-inspired research efforts.