Considerations for the Cryptographic Cloud

By Seny Kamara and Kristin Lauter -- Microsoft Research Cryptography Group

March 11, 2011

With the prospect of increasing amounts of data being collected by a proliferation of internet –connected devices and the task of organizing, storing, and accessing such data looming, we face the challenge of how to leverage the power of the cloud running in our data centers to make information accessible in a secure and privacy-preserving manner.  For many scenarios, in other words, we would like to have a public cloud which we can trust with our private data, and yet we would like to have that data still be accessible to us in an organized and useful way.

One approach to this problem is to envision a world in which all data is preprocessed by a client device before being uploaded to the cloud; the preprocessing signs and encrypts the data in such a way that its functionality is preserved, allowing, for example, for the cloud to search or compute over the encrypted data and to prove its integrity to the client (without the client having to download it). We refer to this type of solution as Cryptographic Cloud Storage.  

Cryptographic cloud storage is achievable with current technologies and can help bootstrap trust in public clouds.  It can also form the foundation for future cryptographic cloud solutions where an increasing amount of computation on encrypted data is possible and efficient.  We will explain cryptographic cloud storage and what role it might play as cloud becomes a more dominant force.

Applications of the Cryptographic Cloud

Storage services based on public clouds such as Microsoft’s Azure storage service and Amazon’s S3 provide customers with scalable and dynamic storage. By moving their data to the cloud customers can avoid the costs of building and maintaining a private storage infrastructure, opting instead to pay a service provider as a function of its needs. For most customers, this provides several benefits including availability (i.e., being able to access data from anywhere) and reliability (i.e., not having to worry about backups) at a relatively low cost.  While the benefits of using a public cloud infrastructure are clear, it introduces significant security and privacy risks. In fact, it seems that the biggest hurdle to the adoption of cloud storage (and cloud computing in general) is concern over the confidentiality and integrity of data. 

While, so far, consumers have been willing to trade privacy for the convenience of software services (e.g., for web-based email, calendars, pictures etc…), this is not the case for enterprises and government organizations. This reluctance can be attributed to several factors that range from a desire to protect mission-critical data to regulatory obligations to preserve the confidentiality and integrity of data. The latter can occur when the customer is responsible for keeping personally identifiable information (PII), or medical and financial records. So while cloud storage has enormous promise, unless the issues of confidentiality and integrity are addressed many potential customers will be reluctant to make the move.

In addition to simple storage, many enterprises will have a need for some associated services.  These services can include any number of business processes including sharing of data among trusted partners, litigation support, monitoring and compliance, back-up, archive and audit logs.   A cryptographic storage service can be endowed with some subset of these services to provide value to enterprises, for example in complying with government regulations for handling of sensitive data, geographic considerations relating to data provenance,  to help mitigate the cost of security breaches, lower the cost of electronic discovery for litigation support, or alleviate the burden of complying with subpoenas.

For example, a specific type of data which is especially sensitive is personal medical data.  The recent move towards electronic health records promises to reduce medical errors, save lives and decrease the cost of healthcare. Given the importance and sensitivity of health-related data, it is clear that any cloud storage platform for health records will need to provide strong confidentiality and integrity guarantees to patients and care givers, which can be enabled with cryptographic cloud storage. 

Another arena where a cryptographic cloud storage system could be useful is interactive scientific publishing. As scientists continue to produce large data sets which have broad value for the scientific community, demand will increase for a storage infrastructure to make such data accessible and sharable.  To incent scientists to share their data, scientific could establish a publication forum for data sets in partnership with hosted data centers.  Such an interactive publication forum would need to provide strong guarantees to authors on how their data sets may be accessed and used by others, and could be built on a cryptographic cloud storage system. 

Cryptographic Cloud Storage

The core properties of a cryptographic storage service are that control of the data is maintained by the customer and the security properties are derived from cryptography, as opposed to legal mechanisms, physical security, or access control.   A cryptographic cloud service should guarantee confidentiality and integrity of the data while maintaining the availability, reliability, and efficient retrieval of the data and allowing for flexible policies of data sharing.

A cryptographic storage service can be built from three main components: a data processor (DP), that processes data before it is sent to the cloud; a data verifier (DV), that checks whether the data in the cloud has been tampered with; and a token generator (TG), that generates tokens which enable the cloud storage provider to retrieve segments of customer data.  We describe designs for both consumer and enterprise scenarios.

A Consumer Architecture

Typical consumer scenarios include hosted email services or content storage or back-up.  Consider three parties: a user Alice that stores her data in the cloud; a user Bob with whom Alice wants to share data; and a cloud storage provider that stores Alice’s data. To use the service, Alice and Bob begin by downloading a client application that consists of a data processor, a data verifier and a token generator. Upon its first execution, Alice’s application generates a cryptographic key. We will refer to this key as a master key and assume it is stored locally on Alice’s system and that it is kept secret from the cloud storage provider.

Whenever Alice wishes to upload data to the cloud, the data processor attaches some metadata (e.g., current time, size, keywords etc…) and encrypts and encodes the data and metadata with a variety of cryptographic primitives. Whenever Alice wants to verify the integrity of her data, the data verifier is invoked. The latter uses Alice’s master key to interact with the cloud storage provider and ascertain the integrity of the data. When Alice wants to retrieve data (e.g., all files tagged with keyword “urgent”) the token generator is invoked to create a token and a decryption key. The token is sent to the cloud storage provider who uses it to retrieve the appropriate (encrypted) files which it returns to Alice. Alice then uses the decryption key to decrypt the files.

Whenever Alice wishes to share data with Bob, the token generator is invoked to create a token and a decryption key which are both sent to Bob. He then sends the token to the provider who uses it to retrieve and return the appropriate encrypted documents. Bob then uses the decryption key to recover the files. This process is illustrated in Figure 1. 


   
Figure 1: (1) Alice’s data processor prepares the data before sending it to the cloud; (2) Bob asks Alice for permission to search for a keyword; (3) Alice’s token generator sends a token for the keyword and a decryption key back to Bob; (4) Bob sends the token to the cloud; (5) the cloud uses the token to find the appropriate encrypted documents and returns them to Bob. At any point in time, Alice’s data verifier can verify the integrity of the data.

An Enterprise Architecture

In the enterprise scenario we consider an enterprise MegaCorp that stores its data in the cloud; a business partner PartnerCorp with whom MegaCorp wants to share data; and a cloud storage provider that stores MegaCorp’s data. To handle enterprise customers, we introduce an extra component: a credential generator. The credential generator implements an access control policy by issuing credentials to parties inside and outside MegaCorp.

To use the service, MegaCorp deploys dedicated machines within its network to run components which make use of a master secret key, so it is important that they be adequately protected. The dedicated machines include a data processor, a data verifier, a token generator and a credential generator. To begin, each MegaCorp and PartnerCorp employee receives a credential from the credential generator. These credentials reflect some relevant information about the employees such as their organization or team or role.  


 
Figure 2: (1) Each MegaCorp and PartnerCorp employee receives a credential; (2) MegaCorp employees send their data to the dedicated machine; (3) the latter processes the data using the data processor before sending it to the cloud; (4) the PartnerCorp employee sends a keyword to MegaCorp’s dedicated machine ; (5) the dedicated machine returns a token; (6) the PartnerCorp employee sends the token to the cloud; (7) the cloud uses the token to find the appropriate encrypted documents and returns them to the employee. At any point in time, MegaCorp’s data verifier can verify the integrity of MegaCorp’s data.

generates data that needs to be stored in the cloud, it sends the data together with an associated decryption policy to the dedicated machine for processing. The decryption policy specifies the type of credentials necessary to decrypt the data (e.g., only members of a particular team). To retrieve data from the cloud (e.g., all files generated by a particular employee), an employee requests an appropriate token from the dedicated machine. The employee then sends the token to the cloud provider who uses it to find and return the appropriate encrypted files which the employee decrypts using his credentials.  

If a PartnerCorp employee needs access to MegaCorp’s data, the employee authenticates itself to MegaCorp’s dedicated machine and sends it a keyword. The latter verifies that the particular search is allowed for this PartnerCorp employee. If so, the dedicated machine returns an appropriate token which the employee uses to recover the appropriate files from the service provider. It then uses its credentials to decrypt the file. This process is illustrated in Figure 2.

Implementing the Core Cryptographic Components

The core components of a cryptographic storage service can be implemented using a variety of techniques, some of which were developed specifically for cloud computing.  When preparing data for storage in the cloud, the data processor begins by indexing it and encrypting it with a symmetric encryption scheme (for example the government approved block cipher AES) under a unique key. It then encrypts the index using a searchable encryption scheme and encrypts the unique key with an attribute-based encryption scheme under an appropriate policy.  Finally, it encodes the encrypted data and index in such a way that the data verifier can later verify their integrity using a proof of storage.

In the following we provide high level descriptions of these new cryptographic primitives. While traditional techniques like encryption and digital signatures could be used to implement the core components, they would do so at considerable cost in communication and computation. To see why, consider the example of an organization that encrypts and signs its data before storing it in the cloud. While this clearly preserves confidentiality and integrity it has the following limitations.

To enable searching over the data, the customer has to either store an index locally, or download all the (encrypted) data, decrypt it and search locally. The first approach obviously negates the benefits of cloud storage (since indexes can grow large) while the second scales poorly.   With respect to integrity, note that the organization would have to retrieve all the data first in order to verify the signatures. If the data is large, this verification procedure is obviously undesirable. Various solutions based on (keyed) hash functions could also be used, but all such approaches only allow a fixed number of verifications.

Searchable Encryption

A searchable encryption scheme provides a way to encrypt a search index so that its contents are hidden except to a party that is given appropriate tokens. More precisely, consider a search index generated over a collection of files (this could be a full-text index or just a keyword index). Using a searchable encryption scheme, the index is encrypted in such a way that (1) given a token for a keyword one can retrieve pointers to the encrypted files that contain the keyword; and (2) without a token the contents of the index are hidden. In addition, the tokens can only be generated with knowledge of a secret key and the retrieval procedure reveals nothing about the files or the keywords except that the files contain a keyword in common.

Symmetric searchable encryption (SSE) is appropriate in any setting where the party that searches over the data is also the one who generates it.  The main advantages of SSE are efficiency and security while the main disadvantage is functionality. SSE schemes are efficient both for the party doing the encryption and (in some cases) for the party performing the search. Encryption is efficient because most SSE schemes are based on symmetric primitives like block ciphers and pseudo-random functions. Search can be efficient because the typical usage scenarios for SSE allow the data to be pre-processed and stored in efficient data structures.

Attribute-based Encryption

Another set of cryptographic techniques that has emerged recently allows the specification of a decryption policy to be associated with a ciphertext. More precisely, in a ciphertext-policy attribute-based encryption scheme each user in the system is provided with a decryption key that has a set of attributes associated with it.  A user can then encrypt a message under a public key and a policy.  Decryption will only work if the attributes associated with the decryption key match the policy used to encrypt the message. Attributes are qualities of a party that can be established through relevant credentials such as being an employee of a certain company or living in Washington State. 
 
Proofs of Storage 

A proof of storage is a protocol executed between a client and a server with which the server can prove to the client that it did not tamper with its data. The client begins by encoding the data before storing it in the cloud. From that point on, whenever it wants to verify the integrity of the data it runs a proof of storage protocol with the server. The main benefits of a proof of storage are that (1) they can be executed an arbitrary number of times; and (2) the amount of information exchanged between the client and the server is extremely small and independent of the size of the data.

Trends and future potential

Extensions to cryptographic cloud storage and services are possible based on current and emerging cryptographic research.  This new work will bear fruit in enlarging the range of operations which can be efficiently performed on encrypted data, enriching the business scenarios which can be enabled through cryptographic cloud storage.

About the Authors

Kristin Lauter is a Principal Researcher and the head of the Cryptography Group at Microsoft Research. She directs the group’s research activities in theoretical and applied cryptography and in the related math fields of number theory and algebraic geometry. Group members publish basic research in prestigious journals and conferences and collaborate with academia through joint publications, and by helping to organize conferences and serve on program committees. The group also works closely with product groups, providing consulting services and technology transfer. The group maintains an active program of post-docs, interns, and visiting scholars. Her personal research interests include algorithmic number theory, elliptic curve cryptography, hash functions, and security protocols.

Seny Kamara is a researcher in the Crypto Group at Microsoft Research in Redmond and completed a Ph.D. in Computer Science at Johns Hopkins University under the supervision of Fabian Monrose. At Hopkins Dr. Kamara was a member of the Security and Privacy Applied Research (SPAR) Lab. Seny Kamara spent the Fall of 2006 at UCLA’s IPAM and the summer of 2003 at CMU’s CyLab. Main research interests are in cryptography and security and recent work has been in cloud cryptography, focusing on the design of new models and techniques to alleviate security and privacy concerns that arise in the context of cloud computing.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

How ‘Knights Mill’ Gets Its Deep Learning Flops

June 22, 2017

Intel, the subject of much speculation regarding the delayed, rewritten or potentially canceled “Aurora” contract (the Argonne Lab part of the CORAL “pre-exascale” award), parsed out additional information ab Read more…

By Tiffany Trader

Tsinghua Crowned Eight-Time Student Cluster Champions at ISC

June 22, 2017

Always a hard-fought competition, the Student Cluster Competition awards were announced Wednesday, June 21, at the ISC High Performance Conference 2017. Amid whoops and hollers from the crowd, Thomas Sterling presented t Read more…

By Kim McMahon

GPUs, Power9, Figure Prominently in IBM’s Bet on Weather Forecasting

June 22, 2017

IBM jumped into the weather forecasting business roughly a year and a half ago by purchasing The Weather Company. This week at ISC 2017, Big Blue rolled out plans to push deeper into climate science and develop more gran Read more…

By John Russell

Intersect 360 at ISC: HPC Industry at $44B by 2021

June 22, 2017

The care, feeding and sustained growth of the HPC industry increasingly is in the hands of the commercial market sector – in particular, it’s the hyperscale companies and their embrace of AI and deep learning – tha Read more…

By Doug Black

HPE Extreme Performance Solutions

Creating a Roadmap for HPC Innovation at ISC 2017

In an era where technological advancements are driving innovation to every sector, and powering major economic and scientific breakthroughs, high performance computing (HPC) is crucial to tackle the challenges of today and tomorrow. Read more…

At ISC – Goh on Go: Humans Can’t Scale, the Data-Centric Learning Machine Can

June 22, 2017

I've seen the future this week at ISC, it’s on display in prototype or Powerpoint form, and it’s going to dumbfound you. The future is an AI neural network designed to emulate and compete with the human brain. In thi Read more…

By Doug Black

Cray Brings AI and HPC Together on Flagship Supers

June 20, 2017

Cray took one more step toward the convergence of big data and high performance computing (HPC) today when it announced that it’s adding a full suite of big data and artificial intelligence software to its top-of-the-l Read more…

By Alex Woodie

AMD Charges Back into the Datacenter and HPC Workflows with EPYC Processor

June 20, 2017

AMD is charging back into the enterprise datacenter and select HPC workflows with its new EPYC 7000 processor line, code-named Naples, announced today at a “global” launch event in Austin TX. In many ways it was a fu Read more…

By John Russell

Hyperion: Deep Learning, AI Helping Drive Healthy HPC Industry Growth

June 20, 2017

To be at the ISC conference in Frankfurt this week is to experience deep immersion in deep learning. Users want to learn about it, vendors want to talk about it, analysts and journalists want to report on it. Deep learni Read more…

By Doug Black

How ‘Knights Mill’ Gets Its Deep Learning Flops

June 22, 2017

Intel, the subject of much speculation regarding the delayed, rewritten or potentially canceled “Aurora” contract (the Argonne Lab part of the CORAL “ Read more…

By Tiffany Trader

Tsinghua Crowned Eight-Time Student Cluster Champions at ISC

June 22, 2017

Always a hard-fought competition, the Student Cluster Competition awards were announced Wednesday, June 21, at the ISC High Performance Conference 2017. Amid wh Read more…

By Kim McMahon

GPUs, Power9, Figure Prominently in IBM’s Bet on Weather Forecasting

June 22, 2017

IBM jumped into the weather forecasting business roughly a year and a half ago by purchasing The Weather Company. This week at ISC 2017, Big Blue rolled out pla Read more…

By John Russell

Intersect 360 at ISC: HPC Industry at $44B by 2021

June 22, 2017

The care, feeding and sustained growth of the HPC industry increasingly is in the hands of the commercial market sector – in particular, it’s the hyperscale Read more…

By Doug Black

At ISC – Goh on Go: Humans Can’t Scale, the Data-Centric Learning Machine Can

June 22, 2017

I've seen the future this week at ISC, it’s on display in prototype or Powerpoint form, and it’s going to dumbfound you. The future is an AI neural network Read more…

By Doug Black

Cray Brings AI and HPC Together on Flagship Supers

June 20, 2017

Cray took one more step toward the convergence of big data and high performance computing (HPC) today when it announced that it’s adding a full suite of big d Read more…

By Alex Woodie

AMD Charges Back into the Datacenter and HPC Workflows with EPYC Processor

June 20, 2017

AMD is charging back into the enterprise datacenter and select HPC workflows with its new EPYC 7000 processor line, code-named Naples, announced today at a “g Read more…

By John Russell

Hyperion: Deep Learning, AI Helping Drive Healthy HPC Industry Growth

June 20, 2017

To be at the ISC conference in Frankfurt this week is to experience deep immersion in deep learning. Users want to learn about it, vendors want to talk about it Read more…

By Doug Black

Quantum Bits: D-Wave and VW; Google Quantum Lab; IBM Expands Access

March 21, 2017

For a technology that’s usually characterized as far off and in a distant galaxy, quantum computing has been steadily picking up steam. Just how close real-wo Read more…

By John Russell

Trump Budget Targets NIH, DOE, and EPA; No Mention of NSF

March 16, 2017

President Trump’s proposed U.S. fiscal 2018 budget issued today sharply cuts science spending while bolstering military spending as he promised during the cam Read more…

By John Russell

HPC Compiler Company PathScale Seeks Life Raft

March 23, 2017

HPCwire has learned that HPC compiler company PathScale has fallen on difficult times and is asking the community for help or actively seeking a buyer for its a Read more…

By Tiffany Trader

Google Pulls Back the Covers on Its First Machine Learning Chip

April 6, 2017

This week Google released a report detailing the design and performance characteristics of the Tensor Processing Unit (TPU), its custom ASIC for the inference Read more…

By Tiffany Trader

CPU-based Visualization Positions for Exascale Supercomputing

March 16, 2017

In this contributed perspective piece, Intel’s Jim Jeffers makes the case that CPU-based visualization is now widely adopted and as such is no longer a contrarian view, but is rather an exascale requirement. Read more…

By Jim Jeffers, Principal Engineer and Engineering Leader, Intel

Nvidia Responds to Google TPU Benchmarking

April 10, 2017

Nvidia highlights strengths of its newest GPU silicon in response to Google's report on the performance and energy advantages of its custom tensor processor. Read more…

By Tiffany Trader

Nvidia’s Mammoth Volta GPU Aims High for AI, HPC

May 10, 2017

At Nvidia's GPU Technology Conference (GTC17) in San Jose, Calif., this morning, CEO Jensen Huang announced the company's much-anticipated Volta architecture a Read more…

By Tiffany Trader

Facebook Open Sources Caffe2; Nvidia, Intel Rush to Optimize

April 18, 2017

From its F8 developer conference in San Jose, Calif., today, Facebook announced Caffe2, a new open-source, cross-platform framework for deep learning. Caffe2 is the successor to Caffe, the deep learning framework developed by Berkeley AI Research and community contributors. Read more…

By Tiffany Trader

Leading Solution Providers

MIT Mathematician Spins Up 220,000-Core Google Compute Cluster

April 21, 2017

On Thursday, Google announced that MIT math professor and computational number theorist Andrew V. Sutherland had set a record for the largest Google Compute Engine (GCE) job. Sutherland ran the massive mathematics workload on 220,000 GCE cores using preemptible virtual machine instances. Read more…

By Tiffany Trader

Google Debuts TPU v2 and will Add to Google Cloud

May 25, 2017

Not long after stirring attention in the deep learning/AI community by revealing the details of its Tensor Processing Unit (TPU), Google last week announced the Read more…

By John Russell

US Supercomputing Leaders Tackle the China Question

March 15, 2017

Joint DOE-NSA report responds to the increased global pressures impacting the competitiveness of U.S. supercomputing. Read more…

By Tiffany Trader

Russian Researchers Claim First Quantum-Safe Blockchain

May 25, 2017

The Russian Quantum Center today announced it has overcome the threat of quantum cryptography by creating the first quantum-safe blockchain, securing cryptocurrencies like Bitcoin, along with classified government communications and other sensitive digital transfers. Read more…

By Doug Black

Groq This: New AI Chips to Give GPUs a Run for Deep Learning Money

April 24, 2017

CPUs and GPUs, move over. Thanks to recent revelations surrounding Google’s new Tensor Processing Unit (TPU), the computing world appears to be on the cusp of Read more…

By Alex Woodie

DOE Supercomputer Achieves Record 45-Qubit Quantum Simulation

April 13, 2017

In order to simulate larger and larger quantum systems and usher in an age of “quantum supremacy,” researchers are stretching the limits of today’s most advanced supercomputers. Read more…

By Tiffany Trader

Messina Update: The US Path to Exascale in 16 Slides

April 26, 2017

Paul Messina, director of the U.S. Exascale Computing Project, provided a wide-ranging review of ECP’s evolving plans last week at the HPC User Forum. Read more…

By John Russell

Knights Landing Processor with Omni-Path Makes Cloud Debut

April 18, 2017

HPC cloud specialist Rescale is partnering with Intel and HPC resource provider R Systems to offer first-ever cloud access to Xeon Phi "Knights Landing" processors. The infrastructure is based on the 68-core Intel Knights Landing processor with integrated Omni-Path fabric (the 7250F Xeon Phi). Read more…

By Tiffany Trader

  • arrow
  • Click Here for More Headlines
  • arrow
Share This