Considerations for the Cryptographic Cloud

By Seny Kamara and Kristin Lauter -- Microsoft Research Cryptography Group

March 11, 2011

With the prospect of increasing amounts of data being collected by a proliferation of internet –connected devices and the task of organizing, storing, and accessing such data looming, we face the challenge of how to leverage the power of the cloud running in our data centers to make information accessible in a secure and privacy-preserving manner.  For many scenarios, in other words, we would like to have a public cloud which we can trust with our private data, and yet we would like to have that data still be accessible to us in an organized and useful way.

One approach to this problem is to envision a world in which all data is preprocessed by a client device before being uploaded to the cloud; the preprocessing signs and encrypts the data in such a way that its functionality is preserved, allowing, for example, for the cloud to search or compute over the encrypted data and to prove its integrity to the client (without the client having to download it). We refer to this type of solution as Cryptographic Cloud Storage.  

Cryptographic cloud storage is achievable with current technologies and can help bootstrap trust in public clouds.  It can also form the foundation for future cryptographic cloud solutions where an increasing amount of computation on encrypted data is possible and efficient.  We will explain cryptographic cloud storage and what role it might play as cloud becomes a more dominant force.

Applications of the Cryptographic Cloud

Storage services based on public clouds such as Microsoft’s Azure storage service and Amazon’s S3 provide customers with scalable and dynamic storage. By moving their data to the cloud customers can avoid the costs of building and maintaining a private storage infrastructure, opting instead to pay a service provider as a function of its needs. For most customers, this provides several benefits including availability (i.e., being able to access data from anywhere) and reliability (i.e., not having to worry about backups) at a relatively low cost.  While the benefits of using a public cloud infrastructure are clear, it introduces significant security and privacy risks. In fact, it seems that the biggest hurdle to the adoption of cloud storage (and cloud computing in general) is concern over the confidentiality and integrity of data. 

While, so far, consumers have been willing to trade privacy for the convenience of software services (e.g., for web-based email, calendars, pictures etc…), this is not the case for enterprises and government organizations. This reluctance can be attributed to several factors that range from a desire to protect mission-critical data to regulatory obligations to preserve the confidentiality and integrity of data. The latter can occur when the customer is responsible for keeping personally identifiable information (PII), or medical and financial records. So while cloud storage has enormous promise, unless the issues of confidentiality and integrity are addressed many potential customers will be reluctant to make the move.

In addition to simple storage, many enterprises will have a need for some associated services.  These services can include any number of business processes including sharing of data among trusted partners, litigation support, monitoring and compliance, back-up, archive and audit logs.   A cryptographic storage service can be endowed with some subset of these services to provide value to enterprises, for example in complying with government regulations for handling of sensitive data, geographic considerations relating to data provenance,  to help mitigate the cost of security breaches, lower the cost of electronic discovery for litigation support, or alleviate the burden of complying with subpoenas.

For example, a specific type of data which is especially sensitive is personal medical data.  The recent move towards electronic health records promises to reduce medical errors, save lives and decrease the cost of healthcare. Given the importance and sensitivity of health-related data, it is clear that any cloud storage platform for health records will need to provide strong confidentiality and integrity guarantees to patients and care givers, which can be enabled with cryptographic cloud storage. 

Another arena where a cryptographic cloud storage system could be useful is interactive scientific publishing. As scientists continue to produce large data sets which have broad value for the scientific community, demand will increase for a storage infrastructure to make such data accessible and sharable.  To incent scientists to share their data, scientific could establish a publication forum for data sets in partnership with hosted data centers.  Such an interactive publication forum would need to provide strong guarantees to authors on how their data sets may be accessed and used by others, and could be built on a cryptographic cloud storage system. 

Cryptographic Cloud Storage

The core properties of a cryptographic storage service are that control of the data is maintained by the customer and the security properties are derived from cryptography, as opposed to legal mechanisms, physical security, or access control.   A cryptographic cloud service should guarantee confidentiality and integrity of the data while maintaining the availability, reliability, and efficient retrieval of the data and allowing for flexible policies of data sharing.

A cryptographic storage service can be built from three main components: a data processor (DP), that processes data before it is sent to the cloud; a data verifier (DV), that checks whether the data in the cloud has been tampered with; and a token generator (TG), that generates tokens which enable the cloud storage provider to retrieve segments of customer data.  We describe designs for both consumer and enterprise scenarios.

A Consumer Architecture

Typical consumer scenarios include hosted email services or content storage or back-up.  Consider three parties: a user Alice that stores her data in the cloud; a user Bob with whom Alice wants to share data; and a cloud storage provider that stores Alice’s data. To use the service, Alice and Bob begin by downloading a client application that consists of a data processor, a data verifier and a token generator. Upon its first execution, Alice’s application generates a cryptographic key. We will refer to this key as a master key and assume it is stored locally on Alice’s system and that it is kept secret from the cloud storage provider.

Whenever Alice wishes to upload data to the cloud, the data processor attaches some metadata (e.g., current time, size, keywords etc…) and encrypts and encodes the data and metadata with a variety of cryptographic primitives. Whenever Alice wants to verify the integrity of her data, the data verifier is invoked. The latter uses Alice’s master key to interact with the cloud storage provider and ascertain the integrity of the data. When Alice wants to retrieve data (e.g., all files tagged with keyword “urgent”) the token generator is invoked to create a token and a decryption key. The token is sent to the cloud storage provider who uses it to retrieve the appropriate (encrypted) files which it returns to Alice. Alice then uses the decryption key to decrypt the files.

Whenever Alice wishes to share data with Bob, the token generator is invoked to create a token and a decryption key which are both sent to Bob. He then sends the token to the provider who uses it to retrieve and return the appropriate encrypted documents. Bob then uses the decryption key to recover the files. This process is illustrated in Figure 1. 


   
Figure 1: (1) Alice’s data processor prepares the data before sending it to the cloud; (2) Bob asks Alice for permission to search for a keyword; (3) Alice’s token generator sends a token for the keyword and a decryption key back to Bob; (4) Bob sends the token to the cloud; (5) the cloud uses the token to find the appropriate encrypted documents and returns them to Bob. At any point in time, Alice’s data verifier can verify the integrity of the data.

An Enterprise Architecture

In the enterprise scenario we consider an enterprise MegaCorp that stores its data in the cloud; a business partner PartnerCorp with whom MegaCorp wants to share data; and a cloud storage provider that stores MegaCorp’s data. To handle enterprise customers, we introduce an extra component: a credential generator. The credential generator implements an access control policy by issuing credentials to parties inside and outside MegaCorp.

To use the service, MegaCorp deploys dedicated machines within its network to run components which make use of a master secret key, so it is important that they be adequately protected. The dedicated machines include a data processor, a data verifier, a token generator and a credential generator. To begin, each MegaCorp and PartnerCorp employee receives a credential from the credential generator. These credentials reflect some relevant information about the employees such as their organization or team or role.  


 
Figure 2: (1) Each MegaCorp and PartnerCorp employee receives a credential; (2) MegaCorp employees send their data to the dedicated machine; (3) the latter processes the data using the data processor before sending it to the cloud; (4) the PartnerCorp employee sends a keyword to MegaCorp’s dedicated machine ; (5) the dedicated machine returns a token; (6) the PartnerCorp employee sends the token to the cloud; (7) the cloud uses the token to find the appropriate encrypted documents and returns them to the employee. At any point in time, MegaCorp’s data verifier can verify the integrity of MegaCorp’s data.

generates data that needs to be stored in the cloud, it sends the data together with an associated decryption policy to the dedicated machine for processing. The decryption policy specifies the type of credentials necessary to decrypt the data (e.g., only members of a particular team). To retrieve data from the cloud (e.g., all files generated by a particular employee), an employee requests an appropriate token from the dedicated machine. The employee then sends the token to the cloud provider who uses it to find and return the appropriate encrypted files which the employee decrypts using his credentials.  

If a PartnerCorp employee needs access to MegaCorp’s data, the employee authenticates itself to MegaCorp’s dedicated machine and sends it a keyword. The latter verifies that the particular search is allowed for this PartnerCorp employee. If so, the dedicated machine returns an appropriate token which the employee uses to recover the appropriate files from the service provider. It then uses its credentials to decrypt the file. This process is illustrated in Figure 2.

Implementing the Core Cryptographic Components

The core components of a cryptographic storage service can be implemented using a variety of techniques, some of which were developed specifically for cloud computing.  When preparing data for storage in the cloud, the data processor begins by indexing it and encrypting it with a symmetric encryption scheme (for example the government approved block cipher AES) under a unique key. It then encrypts the index using a searchable encryption scheme and encrypts the unique key with an attribute-based encryption scheme under an appropriate policy.  Finally, it encodes the encrypted data and index in such a way that the data verifier can later verify their integrity using a proof of storage.

In the following we provide high level descriptions of these new cryptographic primitives. While traditional techniques like encryption and digital signatures could be used to implement the core components, they would do so at considerable cost in communication and computation. To see why, consider the example of an organization that encrypts and signs its data before storing it in the cloud. While this clearly preserves confidentiality and integrity it has the following limitations.

To enable searching over the data, the customer has to either store an index locally, or download all the (encrypted) data, decrypt it and search locally. The first approach obviously negates the benefits of cloud storage (since indexes can grow large) while the second scales poorly.   With respect to integrity, note that the organization would have to retrieve all the data first in order to verify the signatures. If the data is large, this verification procedure is obviously undesirable. Various solutions based on (keyed) hash functions could also be used, but all such approaches only allow a fixed number of verifications.

Searchable Encryption

A searchable encryption scheme provides a way to encrypt a search index so that its contents are hidden except to a party that is given appropriate tokens. More precisely, consider a search index generated over a collection of files (this could be a full-text index or just a keyword index). Using a searchable encryption scheme, the index is encrypted in such a way that (1) given a token for a keyword one can retrieve pointers to the encrypted files that contain the keyword; and (2) without a token the contents of the index are hidden. In addition, the tokens can only be generated with knowledge of a secret key and the retrieval procedure reveals nothing about the files or the keywords except that the files contain a keyword in common.

Symmetric searchable encryption (SSE) is appropriate in any setting where the party that searches over the data is also the one who generates it.  The main advantages of SSE are efficiency and security while the main disadvantage is functionality. SSE schemes are efficient both for the party doing the encryption and (in some cases) for the party performing the search. Encryption is efficient because most SSE schemes are based on symmetric primitives like block ciphers and pseudo-random functions. Search can be efficient because the typical usage scenarios for SSE allow the data to be pre-processed and stored in efficient data structures.

Attribute-based Encryption

Another set of cryptographic techniques that has emerged recently allows the specification of a decryption policy to be associated with a ciphertext. More precisely, in a ciphertext-policy attribute-based encryption scheme each user in the system is provided with a decryption key that has a set of attributes associated with it.  A user can then encrypt a message under a public key and a policy.  Decryption will only work if the attributes associated with the decryption key match the policy used to encrypt the message. Attributes are qualities of a party that can be established through relevant credentials such as being an employee of a certain company or living in Washington State. 
 
Proofs of Storage 

A proof of storage is a protocol executed between a client and a server with which the server can prove to the client that it did not tamper with its data. The client begins by encoding the data before storing it in the cloud. From that point on, whenever it wants to verify the integrity of the data it runs a proof of storage protocol with the server. The main benefits of a proof of storage are that (1) they can be executed an arbitrary number of times; and (2) the amount of information exchanged between the client and the server is extremely small and independent of the size of the data.

Trends and future potential

Extensions to cryptographic cloud storage and services are possible based on current and emerging cryptographic research.  This new work will bear fruit in enlarging the range of operations which can be efficiently performed on encrypted data, enriching the business scenarios which can be enabled through cryptographic cloud storage.

About the Authors

Kristin Lauter is a Principal Researcher and the head of the Cryptography Group at Microsoft Research. She directs the group’s research activities in theoretical and applied cryptography and in the related math fields of number theory and algebraic geometry. Group members publish basic research in prestigious journals and conferences and collaborate with academia through joint publications, and by helping to organize conferences and serve on program committees. The group also works closely with product groups, providing consulting services and technology transfer. The group maintains an active program of post-docs, interns, and visiting scholars. Her personal research interests include algorithmic number theory, elliptic curve cryptography, hash functions, and security protocols.

Seny Kamara is a researcher in the Crypto Group at Microsoft Research in Redmond and completed a Ph.D. in Computer Science at Johns Hopkins University under the supervision of Fabian Monrose. At Hopkins Dr. Kamara was a member of the Security and Privacy Applied Research (SPAR) Lab. Seny Kamara spent the Fall of 2006 at UCLA’s IPAM and the summer of 2003 at CMU’s CyLab. Main research interests are in cryptography and security and recent work has been in cloud cryptography, focusing on the design of new models and techniques to alleviate security and privacy concerns that arise in the context of cloud computing.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

Cray Introduces All Flash Lustre Storage Solution Targeting HPC

June 19, 2018

Citing the rise of IOPS-intensive workflows and more affordable flash technology, Cray today introduced the L300F, a scalable all-flash storage solution whose primary use case is to support high IOPS rates to/from a scra Read more…

By John Russell

Lenovo to Debut ‘Neptune’ Cooling Technologies at ISC

June 19, 2018

Lenovo today announced a set of cooling technologies, dubbed Neptune, that include direct to node (DTN) warm water cooling, rear door heat exchanger (RDHX), and hybrid solutions that combine air and liquid cooling. Lenov Read more…

By John Russell

World Cup is Lame Compared to This Competition

June 18, 2018

So you think World Cup soccer is a big deal? While I’m sure it’s very compelling to watch a bunch of athletes kick a ball around, World Cup misses the boat because it doesn’t include teams putting together their ow Read more…

By Dan Olds

HPE Extreme Performance Solutions

HPC and AI Convergence is Accelerating New Levels of Intelligence

Data analytics is the most valuable tool in the digital marketplace – so much so that organizations are employing high performance computing (HPC) capabilities to rapidly collect, share, and analyze endless streams of data. Read more…

IBM Accelerated Insights

Banks Boost Infrastructure to Tackle GDPR

As banks become more digital and data-driven, their IT managers are challenged with fast growing data volumes and lines-of-businesses’ (LoBs’) seemingly limitless appetite for analytics. Read more…

IBM Demonstrates Deep Neural Network Training with Analog Memory Devices

June 18, 2018

From smarter, more personalized apps to seemingly-ubiquitous Google Assistant and Alexa devices, AI adoption is showing no signs of slowing down – and yet, the hardware used for AI is far from perfect. Currently, GPUs Read more…

By Oliver Peckham

Cray Introduces All Flash Lustre Storage Solution Targeting HPC

June 19, 2018

Citing the rise of IOPS-intensive workflows and more affordable flash technology, Cray today introduced the L300F, a scalable all-flash storage solution whose p Read more…

By John Russell

Sandia to Take Delivery of World’s Largest Arm System

June 18, 2018

While the enterprise remains circumspect on prospects for Arm servers in the datacenter, the leadership HPC community is taking a bolder, brighter view of the x86 server CPU alternative. Amongst current and planned Arm HPC installations – i.e., the innovative Mont-Blanc project, led by Bull/Atos, the 'Isambard’ Cray XC50 going into the University of Bristol, and commitments from both Japan and France among others -- HPE is announcing that it will be supply the United States National Nuclear Security Administration (NNSA) with a 2.3 petaflops peak Arm-based system, named Astra. Read more…

By Tiffany Trader

The Machine Learning Hype Cycle and HPC

June 14, 2018

Like many other HPC professionals I’m following the hype cycle around machine learning/deep learning with interest. I subscribe to the view that we’re probably approaching the ‘peak of inflated expectation’ but not quite yet starting the descent into the ‘trough of disillusionment. This still raises the probability that... Read more…

By Dairsie Latimer

Xiaoxiang Zhu Receives the 2018 PRACE Ada Lovelace Award for HPC

June 13, 2018

Xiaoxiang Zhu, who works for the German Aerospace Center (DLR) and Technical University of Munich (TUM), was awarded the 2018 PRACE Ada Lovelace Award for HPC for her outstanding contributions in the field of high performance computing (HPC) in Europe. Read more…

By Elizabeth Leake

U.S Considering Launch of National Quantum Initiative

June 11, 2018

Sometime this month the U.S. House Science Committee will introduce legislation to launch a 10-year National Quantum Initiative, according to a recent report by Read more…

By John Russell

ORNL Summit Supercomputer Is Officially Here

June 8, 2018

Oak Ridge National Laboratory (ORNL) together with IBM and Nvidia celebrated the official unveiling of the Department of Energy (DOE) Summit supercomputer toda Read more…

By Tiffany Trader

Exascale USA – Continuing to Move Forward

June 6, 2018

The end of May 2018, saw several important events that continue to advance the Department of Energy’s (DOE) Exascale Computing Initiative (ECI) for the United Read more…

By Alex R. Larzelere

Exascale for the Rest of Us: Exaflops Systems Capable for Industry

June 6, 2018

Enterprise advanced scale computing – or HPC in the enterprise – is an entity unto itself, situated between (and with characteristics of) conventional enter Read more…

By Doug Black

MLPerf – Will New Machine Learning Benchmark Help Propel AI Forward?

May 2, 2018

Let the AI benchmarking wars begin. Today, a diverse group from academia and industry – Google, Baidu, Intel, AMD, Harvard, and Stanford among them – releas Read more…

By John Russell

How the Cloud Is Falling Short for HPC

March 15, 2018

The last couple of years have seen cloud computing gradually build some legitimacy within the HPC world, but still the HPC industry lies far behind enterprise I Read more…

By Chris Downing

US Plans $1.8 Billion Spend on DOE Exascale Supercomputing

April 11, 2018

On Monday, the United States Department of Energy announced its intention to procure up to three exascale supercomputers at a cost of up to $1.8 billion with th Read more…

By Tiffany Trader

Deep Learning at 15 PFlops Enables Training for Extreme Weather Identification at Scale

March 19, 2018

Petaflop per second deep learning training performance on the NERSC (National Energy Research Scientific Computing Center) Cori supercomputer has given climate Read more…

By Rob Farber

Lenovo Unveils Warm Water Cooled ThinkSystem SD650 in Rampup to LRZ Install

February 22, 2018

This week Lenovo took the wraps off the ThinkSystem SD650 high-density server with third-generation direct water cooling technology developed in tandem with par Read more…

By Tiffany Trader

ORNL Summit Supercomputer Is Officially Here

June 8, 2018

Oak Ridge National Laboratory (ORNL) together with IBM and Nvidia celebrated the official unveiling of the Department of Energy (DOE) Summit supercomputer toda Read more…

By Tiffany Trader

Nvidia Responds to Google TPU Benchmarking

April 10, 2017

Nvidia highlights strengths of its newest GPU silicon in response to Google's report on the performance and energy advantages of its custom tensor processor. Read more…

By Tiffany Trader

HPE Wins $57 Million DoD Supercomputing Contract

February 20, 2018

Hewlett Packard Enterprise (HPE) today revealed details of its massive $57 million HPC contract with the U.S. Department of Defense (DoD). The deal calls for HP Read more…

By Tiffany Trader

Leading Solution Providers

SC17 Booth Video Tours Playlist

Altair @ SC17

Altair

AMD @ SC17

AMD

ASRock Rack @ SC17

ASRock Rack

CEJN @ SC17

CEJN

DDN Storage @ SC17

DDN Storage

Huawei @ SC17

Huawei

IBM @ SC17

IBM

IBM Power Systems @ SC17

IBM Power Systems

Intel @ SC17

Intel

Lenovo @ SC17

Lenovo

Mellanox Technologies @ SC17

Mellanox Technologies

Microsoft @ SC17

Microsoft

Penguin Computing @ SC17

Penguin Computing

Pure Storage @ SC17

Pure Storage

Supericro @ SC17

Supericro

Tyan @ SC17

Tyan

Univa @ SC17

Univa

Hennessy & Patterson: A New Golden Age for Computer Architecture

April 17, 2018

On Monday June 4, 2018, 2017 A.M. Turing Award Winners John L. Hennessy and David A. Patterson will deliver the Turing Lecture at the 45th International Sympo Read more…

By Staff

Google Chases Quantum Supremacy with 72-Qubit Processor

March 7, 2018

Google pulled ahead of the pack this week in the race toward "quantum supremacy," with the introduction of a new 72-qubit quantum processor called Bristlecone. Read more…

By Tiffany Trader

Google I/O 2018: AI Everywhere; TPU 3.0 Delivers 100+ Petaflops but Requires Liquid Cooling

May 9, 2018

All things AI dominated discussion at yesterday’s opening of Google’s I/O 2018 developers meeting covering much of Google's near-term product roadmap. The e Read more…

By John Russell

Nvidia Ups Hardware Game with 16-GPU DGX-2 Server and 18-Port NVSwitch

March 27, 2018

Nvidia unveiled a raft of new products from its annual technology conference in San Jose today, and despite not offering up a new chip architecture, there were still a few surprises in store for HPC hardware aficionados. Read more…

By Tiffany Trader

Pattern Computer – Startup Claims Breakthrough in ‘Pattern Discovery’ Technology

May 23, 2018

If it weren’t for the heavy-hitter technology team behind start-up Pattern Computer, which emerged from stealth today in a live-streamed event from San Franci Read more…

By John Russell

Part One: Deep Dive into 2018 Trends in Life Sciences HPC

March 1, 2018

Life sciences is an interesting lens through which to see HPC. It is perhaps not an obvious choice, given life sciences’ relative newness as a heavy user of H Read more…

By John Russell

Intel Pledges First Commercial Nervana Product ‘Spring Crest’ in 2019

May 24, 2018

At its AI developer conference in San Francisco yesterday, Intel embraced a holistic approach to AI and showed off a broad AI portfolio that includes Xeon processors, Movidius technologies, FPGAs and Intel’s Nervana Neural Network Processors (NNPs), based on the technology it acquired in 2016. Read more…

By Tiffany Trader

Google Charts Two-Dimensional Quantum Course

April 26, 2018

Quantum error correction, essential for achieving universal fault-tolerant quantum computation, is one of the main challenges of the quantum computing field and it’s top of mind for Google’s John Martinis. At a presentation last week at the HPC User Forum in Tucson, Martinis, one of the world's foremost experts in quantum computing, emphasized... Read more…

By Tiffany Trader

  • arrow
  • Click Here for More Headlines
  • arrow
Do NOT follow this link or you will be banned from the site!
Share This