Considerations for the Cryptographic Cloud

By Seny Kamara and Kristin Lauter -- Microsoft Research Cryptography Group

March 11, 2011

With the prospect of increasing amounts of data being collected by a proliferation of internet –connected devices and the task of organizing, storing, and accessing such data looming, we face the challenge of how to leverage the power of the cloud running in our data centers to make information accessible in a secure and privacy-preserving manner.  For many scenarios, in other words, we would like to have a public cloud which we can trust with our private data, and yet we would like to have that data still be accessible to us in an organized and useful way.

One approach to this problem is to envision a world in which all data is preprocessed by a client device before being uploaded to the cloud; the preprocessing signs and encrypts the data in such a way that its functionality is preserved, allowing, for example, for the cloud to search or compute over the encrypted data and to prove its integrity to the client (without the client having to download it). We refer to this type of solution as Cryptographic Cloud Storage.  

Cryptographic cloud storage is achievable with current technologies and can help bootstrap trust in public clouds.  It can also form the foundation for future cryptographic cloud solutions where an increasing amount of computation on encrypted data is possible and efficient.  We will explain cryptographic cloud storage and what role it might play as cloud becomes a more dominant force.

Applications of the Cryptographic Cloud

Storage services based on public clouds such as Microsoft’s Azure storage service and Amazon’s S3 provide customers with scalable and dynamic storage. By moving their data to the cloud customers can avoid the costs of building and maintaining a private storage infrastructure, opting instead to pay a service provider as a function of its needs. For most customers, this provides several benefits including availability (i.e., being able to access data from anywhere) and reliability (i.e., not having to worry about backups) at a relatively low cost.  While the benefits of using a public cloud infrastructure are clear, it introduces significant security and privacy risks. In fact, it seems that the biggest hurdle to the adoption of cloud storage (and cloud computing in general) is concern over the confidentiality and integrity of data. 

While, so far, consumers have been willing to trade privacy for the convenience of software services (e.g., for web-based email, calendars, pictures etc…), this is not the case for enterprises and government organizations. This reluctance can be attributed to several factors that range from a desire to protect mission-critical data to regulatory obligations to preserve the confidentiality and integrity of data. The latter can occur when the customer is responsible for keeping personally identifiable information (PII), or medical and financial records. So while cloud storage has enormous promise, unless the issues of confidentiality and integrity are addressed many potential customers will be reluctant to make the move.

In addition to simple storage, many enterprises will have a need for some associated services.  These services can include any number of business processes including sharing of data among trusted partners, litigation support, monitoring and compliance, back-up, archive and audit logs.   A cryptographic storage service can be endowed with some subset of these services to provide value to enterprises, for example in complying with government regulations for handling of sensitive data, geographic considerations relating to data provenance,  to help mitigate the cost of security breaches, lower the cost of electronic discovery for litigation support, or alleviate the burden of complying with subpoenas.

For example, a specific type of data which is especially sensitive is personal medical data.  The recent move towards electronic health records promises to reduce medical errors, save lives and decrease the cost of healthcare. Given the importance and sensitivity of health-related data, it is clear that any cloud storage platform for health records will need to provide strong confidentiality and integrity guarantees to patients and care givers, which can be enabled with cryptographic cloud storage. 

Another arena where a cryptographic cloud storage system could be useful is interactive scientific publishing. As scientists continue to produce large data sets which have broad value for the scientific community, demand will increase for a storage infrastructure to make such data accessible and sharable.  To incent scientists to share their data, scientific could establish a publication forum for data sets in partnership with hosted data centers.  Such an interactive publication forum would need to provide strong guarantees to authors on how their data sets may be accessed and used by others, and could be built on a cryptographic cloud storage system. 

Cryptographic Cloud Storage

The core properties of a cryptographic storage service are that control of the data is maintained by the customer and the security properties are derived from cryptography, as opposed to legal mechanisms, physical security, or access control.   A cryptographic cloud service should guarantee confidentiality and integrity of the data while maintaining the availability, reliability, and efficient retrieval of the data and allowing for flexible policies of data sharing.

A cryptographic storage service can be built from three main components: a data processor (DP), that processes data before it is sent to the cloud; a data verifier (DV), that checks whether the data in the cloud has been tampered with; and a token generator (TG), that generates tokens which enable the cloud storage provider to retrieve segments of customer data.  We describe designs for both consumer and enterprise scenarios.

A Consumer Architecture

Typical consumer scenarios include hosted email services or content storage or back-up.  Consider three parties: a user Alice that stores her data in the cloud; a user Bob with whom Alice wants to share data; and a cloud storage provider that stores Alice’s data. To use the service, Alice and Bob begin by downloading a client application that consists of a data processor, a data verifier and a token generator. Upon its first execution, Alice’s application generates a cryptographic key. We will refer to this key as a master key and assume it is stored locally on Alice’s system and that it is kept secret from the cloud storage provider.

Whenever Alice wishes to upload data to the cloud, the data processor attaches some metadata (e.g., current time, size, keywords etc…) and encrypts and encodes the data and metadata with a variety of cryptographic primitives. Whenever Alice wants to verify the integrity of her data, the data verifier is invoked. The latter uses Alice’s master key to interact with the cloud storage provider and ascertain the integrity of the data. When Alice wants to retrieve data (e.g., all files tagged with keyword “urgent”) the token generator is invoked to create a token and a decryption key. The token is sent to the cloud storage provider who uses it to retrieve the appropriate (encrypted) files which it returns to Alice. Alice then uses the decryption key to decrypt the files.

Whenever Alice wishes to share data with Bob, the token generator is invoked to create a token and a decryption key which are both sent to Bob. He then sends the token to the provider who uses it to retrieve and return the appropriate encrypted documents. Bob then uses the decryption key to recover the files. This process is illustrated in Figure 1. 

Figure 1: (1) Alice’s data processor prepares the data before sending it to the cloud; (2) Bob asks Alice for permission to search for a keyword; (3) Alice’s token generator sends a token for the keyword and a decryption key back to Bob; (4) Bob sends the token to the cloud; (5) the cloud uses the token to find the appropriate encrypted documents and returns them to Bob. At any point in time, Alice’s data verifier can verify the integrity of the data.

An Enterprise Architecture

In the enterprise scenario we consider an enterprise MegaCorp that stores its data in the cloud; a business partner PartnerCorp with whom MegaCorp wants to share data; and a cloud storage provider that stores MegaCorp’s data. To handle enterprise customers, we introduce an extra component: a credential generator. The credential generator implements an access control policy by issuing credentials to parties inside and outside MegaCorp.

To use the service, MegaCorp deploys dedicated machines within its network to run components which make use of a master secret key, so it is important that they be adequately protected. The dedicated machines include a data processor, a data verifier, a token generator and a credential generator. To begin, each MegaCorp and PartnerCorp employee receives a credential from the credential generator. These credentials reflect some relevant information about the employees such as their organization or team or role.  

Figure 2: (1) Each MegaCorp and PartnerCorp employee receives a credential; (2) MegaCorp employees send their data to the dedicated machine; (3) the latter processes the data using the data processor before sending it to the cloud; (4) the PartnerCorp employee sends a keyword to MegaCorp’s dedicated machine ; (5) the dedicated machine returns a token; (6) the PartnerCorp employee sends the token to the cloud; (7) the cloud uses the token to find the appropriate encrypted documents and returns them to the employee. At any point in time, MegaCorp’s data verifier can verify the integrity of MegaCorp’s data.

generates data that needs to be stored in the cloud, it sends the data together with an associated decryption policy to the dedicated machine for processing. The decryption policy specifies the type of credentials necessary to decrypt the data (e.g., only members of a particular team). To retrieve data from the cloud (e.g., all files generated by a particular employee), an employee requests an appropriate token from the dedicated machine. The employee then sends the token to the cloud provider who uses it to find and return the appropriate encrypted files which the employee decrypts using his credentials.  

If a PartnerCorp employee needs access to MegaCorp’s data, the employee authenticates itself to MegaCorp’s dedicated machine and sends it a keyword. The latter verifies that the particular search is allowed for this PartnerCorp employee. If so, the dedicated machine returns an appropriate token which the employee uses to recover the appropriate files from the service provider. It then uses its credentials to decrypt the file. This process is illustrated in Figure 2.

Implementing the Core Cryptographic Components

The core components of a cryptographic storage service can be implemented using a variety of techniques, some of which were developed specifically for cloud computing.  When preparing data for storage in the cloud, the data processor begins by indexing it and encrypting it with a symmetric encryption scheme (for example the government approved block cipher AES) under a unique key. It then encrypts the index using a searchable encryption scheme and encrypts the unique key with an attribute-based encryption scheme under an appropriate policy.  Finally, it encodes the encrypted data and index in such a way that the data verifier can later verify their integrity using a proof of storage.

In the following we provide high level descriptions of these new cryptographic primitives. While traditional techniques like encryption and digital signatures could be used to implement the core components, they would do so at considerable cost in communication and computation. To see why, consider the example of an organization that encrypts and signs its data before storing it in the cloud. While this clearly preserves confidentiality and integrity it has the following limitations.

To enable searching over the data, the customer has to either store an index locally, or download all the (encrypted) data, decrypt it and search locally. The first approach obviously negates the benefits of cloud storage (since indexes can grow large) while the second scales poorly.   With respect to integrity, note that the organization would have to retrieve all the data first in order to verify the signatures. If the data is large, this verification procedure is obviously undesirable. Various solutions based on (keyed) hash functions could also be used, but all such approaches only allow a fixed number of verifications.

Searchable Encryption

A searchable encryption scheme provides a way to encrypt a search index so that its contents are hidden except to a party that is given appropriate tokens. More precisely, consider a search index generated over a collection of files (this could be a full-text index or just a keyword index). Using a searchable encryption scheme, the index is encrypted in such a way that (1) given a token for a keyword one can retrieve pointers to the encrypted files that contain the keyword; and (2) without a token the contents of the index are hidden. In addition, the tokens can only be generated with knowledge of a secret key and the retrieval procedure reveals nothing about the files or the keywords except that the files contain a keyword in common.

Symmetric searchable encryption (SSE) is appropriate in any setting where the party that searches over the data is also the one who generates it.  The main advantages of SSE are efficiency and security while the main disadvantage is functionality. SSE schemes are efficient both for the party doing the encryption and (in some cases) for the party performing the search. Encryption is efficient because most SSE schemes are based on symmetric primitives like block ciphers and pseudo-random functions. Search can be efficient because the typical usage scenarios for SSE allow the data to be pre-processed and stored in efficient data structures.

Attribute-based Encryption

Another set of cryptographic techniques that has emerged recently allows the specification of a decryption policy to be associated with a ciphertext. More precisely, in a ciphertext-policy attribute-based encryption scheme each user in the system is provided with a decryption key that has a set of attributes associated with it.  A user can then encrypt a message under a public key and a policy.  Decryption will only work if the attributes associated with the decryption key match the policy used to encrypt the message. Attributes are qualities of a party that can be established through relevant credentials such as being an employee of a certain company or living in Washington State. 
Proofs of Storage 

A proof of storage is a protocol executed between a client and a server with which the server can prove to the client that it did not tamper with its data. The client begins by encoding the data before storing it in the cloud. From that point on, whenever it wants to verify the integrity of the data it runs a proof of storage protocol with the server. The main benefits of a proof of storage are that (1) they can be executed an arbitrary number of times; and (2) the amount of information exchanged between the client and the server is extremely small and independent of the size of the data.

Trends and future potential

Extensions to cryptographic cloud storage and services are possible based on current and emerging cryptographic research.  This new work will bear fruit in enlarging the range of operations which can be efficiently performed on encrypted data, enriching the business scenarios which can be enabled through cryptographic cloud storage.

About the Authors

Kristin Lauter is a Principal Researcher and the head of the Cryptography Group at Microsoft Research. She directs the group’s research activities in theoretical and applied cryptography and in the related math fields of number theory and algebraic geometry. Group members publish basic research in prestigious journals and conferences and collaborate with academia through joint publications, and by helping to organize conferences and serve on program committees. The group also works closely with product groups, providing consulting services and technology transfer. The group maintains an active program of post-docs, interns, and visiting scholars. Her personal research interests include algorithmic number theory, elliptic curve cryptography, hash functions, and security protocols.

Seny Kamara is a researcher in the Crypto Group at Microsoft Research in Redmond and completed a Ph.D. in Computer Science at Johns Hopkins University under the supervision of Fabian Monrose. At Hopkins Dr. Kamara was a member of the Security and Privacy Applied Research (SPAR) Lab. Seny Kamara spent the Fall of 2006 at UCLA’s IPAM and the summer of 2003 at CMU’s CyLab. Main research interests are in cryptography and security and recent work has been in cloud cryptography, focusing on the design of new models and techniques to alleviate security and privacy concerns that arise in the context of cloud computing.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

AI-Focused ‘Genius’ Supercomputer Installed at KU Leuven

April 24, 2018

Hewlett Packard Enterprise has deployed a new half-petaflops research supercomputer, named Genius, at Flemish research university KU Leuven. The system is built to run artificial intelligence (AI) workloads and, as part Read more…

By Tiffany Trader

New Exascale System for Earth Simulation Introduced

April 23, 2018

After four years of development, the Energy Exascale Earth System Model (E3SM) will be unveiled today and released to the broader scientific community this month. The E3SM project is supported by the Department of Energy Read more…

By Staff

RSC Reports 500Tflops, Hot Water Cooled System Deployed at JINR

April 18, 2018

RSC, developer of supercomputers and advanced HPC systems based in Russia, today reported deployment of “the world's first 100% ‘hot water’ liquid cooled supercomputer” at Joint Institute for Nuclear Research (JI Read more…

By Staff

HPE Extreme Performance Solutions

Hybrid HPC is Speeding Time to Insight and Revolutionizing Medicine

High performance computing (HPC) is a key driver of success in many verticals today, and health and life science industries are extensively leveraging these capabilities. Read more…

New Device Spots Quantum Particle ‘Fingerprint’

April 18, 2018

Majorana particles have been observed by university researchers employing a device consisting of layers of magnetic insulators on a superconducting material. The advance opens the door to controlling the elusive particle Read more…

By George Leopold

AI-Focused ‘Genius’ Supercomputer Installed at KU Leuven

April 24, 2018

Hewlett Packard Enterprise has deployed a new half-petaflops research supercomputer, named Genius, at Flemish research university KU Leuven. The system is built Read more…

By Tiffany Trader

Cray Rolls Out AMD-Based CS500; More to Follow?

April 18, 2018

Cray was the latest OEM to bring AMD back into the fold with introduction today of a CS500 option based on AMD’s Epyc processor line. The move follows Cray’ Read more…

By John Russell

IBM: Software Ecosystem for OpenPOWER is Ready for Prime Time

April 16, 2018

With key pieces of the IBM/OpenPOWER versus Intel/x86 gambit settling into place – e.g., the arrival of Power9 chips and Power9-based systems, hyperscaler sup Read more…

By John Russell

US Plans $1.8 Billion Spend on DOE Exascale Supercomputing

April 11, 2018

On Monday, the United States Department of Energy announced its intention to procure up to three exascale supercomputers at a cost of up to $1.8 billion with th Read more…

By Tiffany Trader

Cloud-Readiness and Looking Beyond Application Scaling

April 11, 2018

There are two aspects to consider when determining if an application is suitable for running in the cloud. The first, which we will discuss here under the title Read more…

By Chris Downing

Transitioning from Big Data to Discovery: Data Management as a Keystone Analytics Strategy

April 9, 2018

The past 10-15 years has seen a stark rise in the density, size, and diversity of scientific data being generated in every scientific discipline in the world. Key among the sciences has been the explosion of laboratory technologies that generate large amounts of data in life-sciences and healthcare research. Large amounts of data are now being stored in very large storage name spaces, with little to no organization and a general unease about how to approach analyzing it. Read more…

By Ari Berman, BioTeam, Inc.

IBM Expands Quantum Computing Network

April 5, 2018

IBM is positioning itself as a first mover in establishing the era of commercial quantum computing. The company believes in order for quantum to work, taming qu Read more…

By Tiffany Trader

FY18 Budget & CORAL-2 – Exascale USA Continues to Move Ahead

April 2, 2018

It was not pretty. However, despite some twists and turns, the federal government’s Fiscal Year 2018 (FY18) budget is complete and ended with some very positi Read more…

By Alex R. Larzelere

Inventor Claims to Have Solved Floating Point Error Problem

January 17, 2018

"The decades-old floating point error problem has been solved," proclaims a press release from inventor Alan Jorgensen. The computer scientist has filed for and Read more…

By Tiffany Trader

Researchers Measure Impact of ‘Meltdown’ and ‘Spectre’ Patches on HPC Workloads

January 17, 2018

Computer scientists from the Center for Computational Research, State University of New York (SUNY), University at Buffalo have examined the effect of Meltdown Read more…

By Tiffany Trader

Russian Nuclear Engineers Caught Cryptomining on Lab Supercomputer

February 12, 2018

Nuclear scientists working at the All-Russian Research Institute of Experimental Physics (RFNC-VNIIEF) have been arrested for using lab supercomputing resources to mine crypto-currency, according to a report in Russia’s Interfax News Agency. Read more…

By Tiffany Trader

How the Cloud Is Falling Short for HPC

March 15, 2018

The last couple of years have seen cloud computing gradually build some legitimacy within the HPC world, but still the HPC industry lies far behind enterprise I Read more…

By Chris Downing

Chip Flaws ‘Meltdown’ and ‘Spectre’ Loom Large

January 4, 2018

The HPC and wider tech community have been abuzz this week over the discovery of critical design flaws that impact virtually all contemporary microprocessors. T Read more…

By Tiffany Trader

How Meltdown and Spectre Patches Will Affect HPC Workloads

January 10, 2018

There have been claims that the fixes for the Meltdown and Spectre security vulnerabilities, named the KPTI (aka KAISER) patches, are going to affect applicatio Read more…

By Rosemary Francis

Nvidia Responds to Google TPU Benchmarking

April 10, 2017

Nvidia highlights strengths of its newest GPU silicon in response to Google's report on the performance and energy advantages of its custom tensor processor. Read more…

By Tiffany Trader

Fast Forward: Five HPC Predictions for 2018

December 21, 2017

What’s on your list of high (and low) lights for 2017? Volta 100’s arrival on the heels of the P100? Appearance, albeit late in the year, of IBM’s Power9? Read more…

By John Russell

Leading Solution Providers

Deep Learning at 15 PFlops Enables Training for Extreme Weather Identification at Scale

March 19, 2018

Petaflop per second deep learning training performance on the NERSC (National Energy Research Scientific Computing Center) Cori supercomputer has given climate Read more…

By Rob Farber

Lenovo Unveils Warm Water Cooled ThinkSystem SD650 in Rampup to LRZ Install

February 22, 2018

This week Lenovo took the wraps off the ThinkSystem SD650 high-density server with third-generation direct water cooling technology developed in tandem with par Read more…

By Tiffany Trader

HPC and AI – Two Communities Same Future

January 25, 2018

According to Al Gara (Intel Fellow, Data Center Group), high performance computing and artificial intelligence will increasingly intertwine as we transition to Read more…

By Rob Farber

AI Cloud Competition Heats Up: Google’s TPUs, Amazon Building AI Chip

February 12, 2018

Competition in the white hot AI (and public cloud) market pits Google against Amazon this week, with Google offering AI hardware on its cloud platform intended Read more…

By Doug Black

New Blueprint for Converging HPC, Big Data

January 18, 2018

After five annual workshops on Big Data and Extreme-Scale Computing (BDEC), a group of international HPC heavyweights including Jack Dongarra (University of Te Read more…

By John Russell

US Plans $1.8 Billion Spend on DOE Exascale Supercomputing

April 11, 2018

On Monday, the United States Department of Energy announced its intention to procure up to three exascale supercomputers at a cost of up to $1.8 billion with th Read more…

By Tiffany Trader

Momentum Builds for US Exascale

January 9, 2018

2018 looks to be a great year for the U.S. exascale program. The last several months of 2017 revealed a number of important developments that help put the U.S. Read more…

By Alex R. Larzelere

Google Chases Quantum Supremacy with 72-Qubit Processor

March 7, 2018

Google pulled ahead of the pack this week in the race toward "quantum supremacy," with the introduction of a new 72-qubit quantum processor called Bristlecone. Read more…

By Tiffany Trader

  • arrow
  • Click Here for More Headlines
  • arrow
Share This