Considerations for the Cryptographic Cloud

By Seny Kamara and Kristin Lauter -- Microsoft Research Cryptography Group

March 11, 2011

With the prospect of increasing amounts of data being collected by a proliferation of internet –connected devices and the task of organizing, storing, and accessing such data looming, we face the challenge of how to leverage the power of the cloud running in our data centers to make information accessible in a secure and privacy-preserving manner.  For many scenarios, in other words, we would like to have a public cloud which we can trust with our private data, and yet we would like to have that data still be accessible to us in an organized and useful way.

One approach to this problem is to envision a world in which all data is preprocessed by a client device before being uploaded to the cloud; the preprocessing signs and encrypts the data in such a way that its functionality is preserved, allowing, for example, for the cloud to search or compute over the encrypted data and to prove its integrity to the client (without the client having to download it). We refer to this type of solution as Cryptographic Cloud Storage.  

Cryptographic cloud storage is achievable with current technologies and can help bootstrap trust in public clouds.  It can also form the foundation for future cryptographic cloud solutions where an increasing amount of computation on encrypted data is possible and efficient.  We will explain cryptographic cloud storage and what role it might play as cloud becomes a more dominant force.

Applications of the Cryptographic Cloud

Storage services based on public clouds such as Microsoft’s Azure storage service and Amazon’s S3 provide customers with scalable and dynamic storage. By moving their data to the cloud customers can avoid the costs of building and maintaining a private storage infrastructure, opting instead to pay a service provider as a function of its needs. For most customers, this provides several benefits including availability (i.e., being able to access data from anywhere) and reliability (i.e., not having to worry about backups) at a relatively low cost.  While the benefits of using a public cloud infrastructure are clear, it introduces significant security and privacy risks. In fact, it seems that the biggest hurdle to the adoption of cloud storage (and cloud computing in general) is concern over the confidentiality and integrity of data. 

While, so far, consumers have been willing to trade privacy for the convenience of software services (e.g., for web-based email, calendars, pictures etc…), this is not the case for enterprises and government organizations. This reluctance can be attributed to several factors that range from a desire to protect mission-critical data to regulatory obligations to preserve the confidentiality and integrity of data. The latter can occur when the customer is responsible for keeping personally identifiable information (PII), or medical and financial records. So while cloud storage has enormous promise, unless the issues of confidentiality and integrity are addressed many potential customers will be reluctant to make the move.

In addition to simple storage, many enterprises will have a need for some associated services.  These services can include any number of business processes including sharing of data among trusted partners, litigation support, monitoring and compliance, back-up, archive and audit logs.   A cryptographic storage service can be endowed with some subset of these services to provide value to enterprises, for example in complying with government regulations for handling of sensitive data, geographic considerations relating to data provenance,  to help mitigate the cost of security breaches, lower the cost of electronic discovery for litigation support, or alleviate the burden of complying with subpoenas.

For example, a specific type of data which is especially sensitive is personal medical data.  The recent move towards electronic health records promises to reduce medical errors, save lives and decrease the cost of healthcare. Given the importance and sensitivity of health-related data, it is clear that any cloud storage platform for health records will need to provide strong confidentiality and integrity guarantees to patients and care givers, which can be enabled with cryptographic cloud storage. 

Another arena where a cryptographic cloud storage system could be useful is interactive scientific publishing. As scientists continue to produce large data sets which have broad value for the scientific community, demand will increase for a storage infrastructure to make such data accessible and sharable.  To incent scientists to share their data, scientific could establish a publication forum for data sets in partnership with hosted data centers.  Such an interactive publication forum would need to provide strong guarantees to authors on how their data sets may be accessed and used by others, and could be built on a cryptographic cloud storage system. 

Cryptographic Cloud Storage

The core properties of a cryptographic storage service are that control of the data is maintained by the customer and the security properties are derived from cryptography, as opposed to legal mechanisms, physical security, or access control.   A cryptographic cloud service should guarantee confidentiality and integrity of the data while maintaining the availability, reliability, and efficient retrieval of the data and allowing for flexible policies of data sharing.

A cryptographic storage service can be built from three main components: a data processor (DP), that processes data before it is sent to the cloud; a data verifier (DV), that checks whether the data in the cloud has been tampered with; and a token generator (TG), that generates tokens which enable the cloud storage provider to retrieve segments of customer data.  We describe designs for both consumer and enterprise scenarios.

A Consumer Architecture

Typical consumer scenarios include hosted email services or content storage or back-up.  Consider three parties: a user Alice that stores her data in the cloud; a user Bob with whom Alice wants to share data; and a cloud storage provider that stores Alice’s data. To use the service, Alice and Bob begin by downloading a client application that consists of a data processor, a data verifier and a token generator. Upon its first execution, Alice’s application generates a cryptographic key. We will refer to this key as a master key and assume it is stored locally on Alice’s system and that it is kept secret from the cloud storage provider.

Whenever Alice wishes to upload data to the cloud, the data processor attaches some metadata (e.g., current time, size, keywords etc…) and encrypts and encodes the data and metadata with a variety of cryptographic primitives. Whenever Alice wants to verify the integrity of her data, the data verifier is invoked. The latter uses Alice’s master key to interact with the cloud storage provider and ascertain the integrity of the data. When Alice wants to retrieve data (e.g., all files tagged with keyword “urgent”) the token generator is invoked to create a token and a decryption key. The token is sent to the cloud storage provider who uses it to retrieve the appropriate (encrypted) files which it returns to Alice. Alice then uses the decryption key to decrypt the files.

Whenever Alice wishes to share data with Bob, the token generator is invoked to create a token and a decryption key which are both sent to Bob. He then sends the token to the provider who uses it to retrieve and return the appropriate encrypted documents. Bob then uses the decryption key to recover the files. This process is illustrated in Figure 1. 


   
Figure 1: (1) Alice’s data processor prepares the data before sending it to the cloud; (2) Bob asks Alice for permission to search for a keyword; (3) Alice’s token generator sends a token for the keyword and a decryption key back to Bob; (4) Bob sends the token to the cloud; (5) the cloud uses the token to find the appropriate encrypted documents and returns them to Bob. At any point in time, Alice’s data verifier can verify the integrity of the data.

An Enterprise Architecture

In the enterprise scenario we consider an enterprise MegaCorp that stores its data in the cloud; a business partner PartnerCorp with whom MegaCorp wants to share data; and a cloud storage provider that stores MegaCorp’s data. To handle enterprise customers, we introduce an extra component: a credential generator. The credential generator implements an access control policy by issuing credentials to parties inside and outside MegaCorp.

To use the service, MegaCorp deploys dedicated machines within its network to run components which make use of a master secret key, so it is important that they be adequately protected. The dedicated machines include a data processor, a data verifier, a token generator and a credential generator. To begin, each MegaCorp and PartnerCorp employee receives a credential from the credential generator. These credentials reflect some relevant information about the employees such as their organization or team or role.  


 
Figure 2: (1) Each MegaCorp and PartnerCorp employee receives a credential; (2) MegaCorp employees send their data to the dedicated machine; (3) the latter processes the data using the data processor before sending it to the cloud; (4) the PartnerCorp employee sends a keyword to MegaCorp’s dedicated machine ; (5) the dedicated machine returns a token; (6) the PartnerCorp employee sends the token to the cloud; (7) the cloud uses the token to find the appropriate encrypted documents and returns them to the employee. At any point in time, MegaCorp’s data verifier can verify the integrity of MegaCorp’s data.

generates data that needs to be stored in the cloud, it sends the data together with an associated decryption policy to the dedicated machine for processing. The decryption policy specifies the type of credentials necessary to decrypt the data (e.g., only members of a particular team). To retrieve data from the cloud (e.g., all files generated by a particular employee), an employee requests an appropriate token from the dedicated machine. The employee then sends the token to the cloud provider who uses it to find and return the appropriate encrypted files which the employee decrypts using his credentials.  

If a PartnerCorp employee needs access to MegaCorp’s data, the employee authenticates itself to MegaCorp’s dedicated machine and sends it a keyword. The latter verifies that the particular search is allowed for this PartnerCorp employee. If so, the dedicated machine returns an appropriate token which the employee uses to recover the appropriate files from the service provider. It then uses its credentials to decrypt the file. This process is illustrated in Figure 2.

Implementing the Core Cryptographic Components

The core components of a cryptographic storage service can be implemented using a variety of techniques, some of which were developed specifically for cloud computing.  When preparing data for storage in the cloud, the data processor begins by indexing it and encrypting it with a symmetric encryption scheme (for example the government approved block cipher AES) under a unique key. It then encrypts the index using a searchable encryption scheme and encrypts the unique key with an attribute-based encryption scheme under an appropriate policy.  Finally, it encodes the encrypted data and index in such a way that the data verifier can later verify their integrity using a proof of storage.

In the following we provide high level descriptions of these new cryptographic primitives. While traditional techniques like encryption and digital signatures could be used to implement the core components, they would do so at considerable cost in communication and computation. To see why, consider the example of an organization that encrypts and signs its data before storing it in the cloud. While this clearly preserves confidentiality and integrity it has the following limitations.

To enable searching over the data, the customer has to either store an index locally, or download all the (encrypted) data, decrypt it and search locally. The first approach obviously negates the benefits of cloud storage (since indexes can grow large) while the second scales poorly.   With respect to integrity, note that the organization would have to retrieve all the data first in order to verify the signatures. If the data is large, this verification procedure is obviously undesirable. Various solutions based on (keyed) hash functions could also be used, but all such approaches only allow a fixed number of verifications.

Searchable Encryption

A searchable encryption scheme provides a way to encrypt a search index so that its contents are hidden except to a party that is given appropriate tokens. More precisely, consider a search index generated over a collection of files (this could be a full-text index or just a keyword index). Using a searchable encryption scheme, the index is encrypted in such a way that (1) given a token for a keyword one can retrieve pointers to the encrypted files that contain the keyword; and (2) without a token the contents of the index are hidden. In addition, the tokens can only be generated with knowledge of a secret key and the retrieval procedure reveals nothing about the files or the keywords except that the files contain a keyword in common.

Symmetric searchable encryption (SSE) is appropriate in any setting where the party that searches over the data is also the one who generates it.  The main advantages of SSE are efficiency and security while the main disadvantage is functionality. SSE schemes are efficient both for the party doing the encryption and (in some cases) for the party performing the search. Encryption is efficient because most SSE schemes are based on symmetric primitives like block ciphers and pseudo-random functions. Search can be efficient because the typical usage scenarios for SSE allow the data to be pre-processed and stored in efficient data structures.

Attribute-based Encryption

Another set of cryptographic techniques that has emerged recently allows the specification of a decryption policy to be associated with a ciphertext. More precisely, in a ciphertext-policy attribute-based encryption scheme each user in the system is provided with a decryption key that has a set of attributes associated with it.  A user can then encrypt a message under a public key and a policy.  Decryption will only work if the attributes associated with the decryption key match the policy used to encrypt the message. Attributes are qualities of a party that can be established through relevant credentials such as being an employee of a certain company or living in Washington State. 
 
Proofs of Storage 

A proof of storage is a protocol executed between a client and a server with which the server can prove to the client that it did not tamper with its data. The client begins by encoding the data before storing it in the cloud. From that point on, whenever it wants to verify the integrity of the data it runs a proof of storage protocol with the server. The main benefits of a proof of storage are that (1) they can be executed an arbitrary number of times; and (2) the amount of information exchanged between the client and the server is extremely small and independent of the size of the data.

Trends and future potential

Extensions to cryptographic cloud storage and services are possible based on current and emerging cryptographic research.  This new work will bear fruit in enlarging the range of operations which can be efficiently performed on encrypted data, enriching the business scenarios which can be enabled through cryptographic cloud storage.

About the Authors

Kristin Lauter is a Principal Researcher and the head of the Cryptography Group at Microsoft Research. She directs the group’s research activities in theoretical and applied cryptography and in the related math fields of number theory and algebraic geometry. Group members publish basic research in prestigious journals and conferences and collaborate with academia through joint publications, and by helping to organize conferences and serve on program committees. The group also works closely with product groups, providing consulting services and technology transfer. The group maintains an active program of post-docs, interns, and visiting scholars. Her personal research interests include algorithmic number theory, elliptic curve cryptography, hash functions, and security protocols.

Seny Kamara is a researcher in the Crypto Group at Microsoft Research in Redmond and completed a Ph.D. in Computer Science at Johns Hopkins University under the supervision of Fabian Monrose. At Hopkins Dr. Kamara was a member of the Security and Privacy Applied Research (SPAR) Lab. Seny Kamara spent the Fall of 2006 at UCLA’s IPAM and the summer of 2003 at CMU’s CyLab. Main research interests are in cryptography and security and recent work has been in cloud cryptography, focusing on the design of new models and techniques to alleviate security and privacy concerns that arise in the context of cloud computing.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

Study Identifies Best Practices for Public-Private HPC Engagement

August 22, 2017

What's the best way for HPC centers in the public sphere to engage with private industry partners to boost the competitiveness of the companies and the larger communities? That question is at the heart of a new study pub Read more…

By Tiffany Trader

Google Launches Site to Share its NYC-based Algorithm Research

August 22, 2017

Much of Google’s algorithm development occurs in groups scattered throughout New York City. Yesterday, Google launched a single website - NYC Algorithms and Optimization Team page - to provide a deeper view into all of Read more…

By John Russell

Dell Strikes Reseller Deal with Atos; Supplants SGI

August 22, 2017

Dell EMC and Atos announced a reseller deal today in which Dell will offer Atos’ high-end 8- and 16-socket Bullion servers. Some move from Dell had been expected following Hewlett Packard Enterprise’s purchase of SGI Read more…

By John Russell

HPE Extreme Performance Solutions

Leveraging Deep Learning for Fraud Detection

Advancements in computing technologies and the expanding use of e-commerce platforms have dramatically increased the risk of fraud for financial services companies and their customers. Read more…

Glimpses of Today’s Total Solar Eclipse

August 21, 2017

Here are a few arresting images posted by NASA of today’s total solar eclipse. Such astronomical events have always captured our imagination and it’s not hard to understand why such occurrences were often greeted wit Read more…

By John Russell

Study Identifies Best Practices for Public-Private HPC Engagement

August 22, 2017

What's the best way for HPC centers in the public sphere to engage with private industry partners to boost the competitiveness of the companies and the larger c Read more…

By Tiffany Trader

Tech Giants Outline Battle Plans for Future HPC Market

August 21, 2017

Four companies engaged in a cage fight for leadership in the emerging HPC market of the 2020s are, despite deep differences in some areas, in violent agreement Read more…

By Doug Black

Microsoft Bolsters Azure With Cloud HPC Deal

August 15, 2017

Microsoft has acquired cloud computing software vendor Cycle Computing in a move designed to bring orchestration tools along with high-end computing access capabilities to the cloud. Terms of the acquisition were not disclosed. Read more…

By George Leopold

HPE Ships Supercomputer to Space Station, Final Destination Mars

August 14, 2017

With a manned mission to Mars on the horizon, the demand for space-based supercomputing is at hand. Today HPE and NASA sent the first off-the-shelf HPC system i Read more…

By Tiffany Trader

AMD EPYC Video Takes Aim at Intel’s Broadwell

August 14, 2017

Let the benchmarking begin. Last week, AMD posted a YouTube video in which one of its EPYC-based systems outperformed a ‘comparable’ Intel Broadwell-based s Read more…

By John Russell

Deep Learning Thrives in Cancer Moonshot

August 8, 2017

The U.S. War on Cancer, certainly a worthy cause, is a collection of programs stretching back more than 40 years and abiding under many banners. The latest is t Read more…

By John Russell

IBM Raises the Bar for Distributed Deep Learning

August 8, 2017

IBM is announcing today an enhancement to its PowerAI software platform aimed at facilitating the practical scaling of AI models on today’s fastest GPUs. Scal Read more…

By Tiffany Trader

IBM Storage Breakthrough Paves Way for 330TB Tape Cartridges

August 3, 2017

IBM announced yesterday a new record for magnetic tape storage that it says will keep tape storage density on a Moore's law-like path far into the next decade. Read more…

By Tiffany Trader

How ‘Knights Mill’ Gets Its Deep Learning Flops

June 22, 2017

Intel, the subject of much speculation regarding the delayed, rewritten or potentially canceled “Aurora” contract (the Argonne Lab part of the CORAL “ Read more…

By Tiffany Trader

Nvidia’s Mammoth Volta GPU Aims High for AI, HPC

May 10, 2017

At Nvidia's GPU Technology Conference (GTC17) in San Jose, Calif., this morning, CEO Jensen Huang announced the company's much-anticipated Volta architecture a Read more…

By Tiffany Trader

Reinders: “AVX-512 May Be a Hidden Gem” in Intel Xeon Scalable Processors

June 29, 2017

Imagine if we could use vector processing on something other than just floating point problems.  Today, GPUs and CPUs work tirelessly to accelerate algorithms Read more…

By James Reinders

Russian Researchers Claim First Quantum-Safe Blockchain

May 25, 2017

The Russian Quantum Center today announced it has overcome the threat of quantum cryptography by creating the first quantum-safe blockchain, securing cryptocurrencies like Bitcoin, along with classified government communications and other sensitive digital transfers. Read more…

By Doug Black

Nvidia Responds to Google TPU Benchmarking

April 10, 2017

Nvidia highlights strengths of its newest GPU silicon in response to Google's report on the performance and energy advantages of its custom tensor processor. Read more…

By Tiffany Trader

Quantum Bits: D-Wave and VW; Google Quantum Lab; IBM Expands Access

March 21, 2017

For a technology that’s usually characterized as far off and in a distant galaxy, quantum computing has been steadily picking up steam. Just how close real-wo Read more…

By John Russell

Groq This: New AI Chips to Give GPUs a Run for Deep Learning Money

April 24, 2017

CPUs and GPUs, move over. Thanks to recent revelations surrounding Google’s new Tensor Processing Unit (TPU), the computing world appears to be on the cusp of Read more…

By Alex Woodie

HPC Compiler Company PathScale Seeks Life Raft

March 23, 2017

HPCwire has learned that HPC compiler company PathScale has fallen on difficult times and is asking the community for help or actively seeking a buyer for its a Read more…

By Tiffany Trader

Leading Solution Providers

Google Debuts TPU v2 and will Add to Google Cloud

May 25, 2017

Not long after stirring attention in the deep learning/AI community by revealing the details of its Tensor Processing Unit (TPU), Google last week announced the Read more…

By John Russell

Trump Budget Targets NIH, DOE, and EPA; No Mention of NSF

March 16, 2017

President Trump’s proposed U.S. fiscal 2018 budget issued today sharply cuts science spending while bolstering military spending as he promised during the cam Read more…

By John Russell

CPU-based Visualization Positions for Exascale Supercomputing

March 16, 2017

In this contributed perspective piece, Intel’s Jim Jeffers makes the case that CPU-based visualization is now widely adopted and as such is no longer a contrarian view, but is rather an exascale requirement. Read more…

By Jim Jeffers, Principal Engineer and Engineering Leader, Intel

Six Exascale PathForward Vendors Selected; DoE Providing $258M

June 15, 2017

The much-anticipated PathForward awards for hardware R&D in support of the Exascale Computing Project were announced today with six vendors selected – AMD Read more…

By John Russell

Top500 Results: Latest List Trends and What’s in Store

June 19, 2017

Greetings from Frankfurt and the 2017 International Supercomputing Conference where the latest Top500 list has just been revealed. Although there were no major Read more…

By Tiffany Trader

IBM Clears Path to 5nm with Silicon Nanosheets

June 5, 2017

Two years since announcing the industry’s first 7nm node test chip, IBM and its research alliance partners GlobalFoundries and Samsung have developed a proces Read more…

By Tiffany Trader

Messina Update: The US Path to Exascale in 16 Slides

April 26, 2017

Paul Messina, director of the U.S. Exascale Computing Project, provided a wide-ranging review of ECP’s evolving plans last week at the HPC User Forum. Read more…

By John Russell

Graphcore Readies Launch of 16nm Colossus-IPU Chip

July 20, 2017

A second $30 million funding round for U.K. AI chip developer Graphcore sets up the company to go to market with its “intelligent processing unit” (IPU) in Read more…

By Tiffany Trader

  • arrow
  • Click Here for More Headlines
  • arrow
Share This