Field Encryption Keys for SAAS/Cloud computing

Entry for the RSA BSafe Share security competition,  June 2009.
Anthony Berglas,
Andrew Baker

Abstract

This project demonstrates a novel mechanism that enables large numbers of individual data fields to be encrypted without the need to store the associated large number of encryption keys.  This has then been used to create a novel architecture that can provide real role separation in a SAAS/Cloud computing environment.  The implementation also includes a general purpose JSSE client/servlet toolkit which sends property lists between authenticated clients and servers.

The Problem

Cloud computing and Software As A Service (SAAS) has seen keen interest by many enterprises.  However,  it raises serious security issues as corporations may want to store sensitive personal data in applications which are managed by third parties.   One way to avoid the need to trust the cloud provider is to encrypt the information stored in the cloud, and then decrypt it on the client.  A policy engine can then determine which fields of which records can be decrypted by which users.
architecture
This is illustrated in the diagram above.  The Cloud Server provides the cipher text while a separate Policy Server provides the individual Keys.  This provides real role separation because the Cloud should never sees the keys, the Policy Server never sees the data, and the client can only decrypt data that they have access to.  A Client must be authorized by both the the Cloud and the Policy servers in order to access both the cipher text and the encryption keys.

Different data elements may have different access rights.  For example, an Employee may be able to see all fields of their own record, their department manager may see most fields of the employees in their department, but only the personnel department may see the health fields.  This can be achieved by using different Field Encryption Keys (FEK) for each protected data element.
Use Cases
In the example illustrated above, the user has been authorized to view data element 1, but not element 3.  So if they ask the Policy Server for element 1's key it the request will be granted.  However, if they ask for the key to element 3, the Policy Server will will decline the request.  Any attempt by the user to decode element 3 with the key to element 1 will fail because element 3 was encrypted with field 3.  (This assumes that the SAAS server has been compromized and is granting the user access to element 3's cipher text.)
 
In order to facilitate a fine level of control, each sensitive field of each individual record needs to be encrypted with a different Field Encryption Key.  So if there are 1,000,000 records and 5 sensitive fields there would be 5,000,000 FEKs.

One way to achieve that would be to store FEKs for each element of each record in the key manager or policy engine.  However, that would require the key manager or policy engine to have a very large and volatile database which is not  practical.
 
Another way would be to wrap the FEKs with a master Key Encrypting Key (KEK) and store the wrapped key with each record on the Server.  The Client could then pass the wrapped FEKs to the policy engine for unwrapping.  However, the need to store the wrapped keys would add considerable overhead to the storage of the records if small fields are used.
 
Further, in either of the above approaches the policy engine needs to be able to stop the client from obtaining the FEK of a field to which it does not have access to.  This means that signed policy data such as the employee number and department number need to stored with each individual wrapped FEK.  An integrity digest should also be stored.  These would make the overhead much larger than 16 bytes for each encrypted field in each record.  Thus in a realistic system the key and policy data could end up being much larger than the plain text that is being protected.

The Solution

We have a different approach which avoids the need to store the FEKs anywhere.  We do this by enabling the policy engine that can securely (re)generate the FEKs each time that they are required.  This is achieved by encrypting a digest of the record number, field name and other policy parameters, and using the resultin cipher text as the FEK.
FEK

This is illustrated in the diagram above.  The client retrieves an encrypted field of a particular record from the server.  The client also retrieves the Field Encrypting Key from the Policy Server, so that it can decrypt just that field of that record.  The policy server examines Client user's credentials to authorize the access.  It then encrypts a digest of the Record Nr and Field Name with the Master Key that is stored in the Key Manager.  This results in a unpredictable series of bits which is then used as the Field Encryption Key.

The Policy Server actually obtains the Record Nr, Field Name and Policy Parameters (such as Department Id) directly from the client.  A hostile client could forge these values in order to mislead a Policy Server to provide it with a FEK to which it is not authorized.  However,  the resulting keys would not be useful because the Digest includes all the parameters used to make the policy decision.  Thus the Policy Server would return a FEK, but it would not be the one required to decrypt the cipher text.  (If a policy parameter such as Department Id were to change then all fields on the relevant record would need to be re-encrypted.)

In a full system there is no need to rely on a single master key.  For added security different master keys could protect different types of information.  For greater protection the Master Keys could be stored in a Hardware Security Module that is attached to the Policy Server.  This would mean that the Policy Server itself never sees the unwrapped master key.  Further, multiple key managers and policy servers could also be provided, with the FEK be the XOR of the keys returned by each one.  Thus an attacker would have to defeat the Cloud Server (to obtain access to the cipher text) plus all of the Policy Servers.

The actual policies as to which users can access which data should be implemented by both the Cloud Server and Policy Server.  The Policy Server cannot ask the Cloud Server for policy information, as that would defeat the role separation.  However, depending on the application, the task of maintaining the policy could be simplified by having either the Policy Server or the Cloud Server implement a simplified, more liberal set of policies.  Many architectural variations on this theme are possible.

Thus the novel arrangement provided by this submission provides a high degree of protection without the performance overheads of the key storage.   The fact that the Field Encryption Keys need not be stored enables a large number of them to be used in practice.  This in turn enables fine grained encryption which can be tightly coupled to authorization policy.  Perhaps more importantly, FEKs that are never stored can never be lost, so it greatly improves the integrity of the system.

Implementation

Architecture

There are four main modules in the system, each stored in a different Java Package.  They are:-
The ant task runs the Client, Cloud and Policy Servers which communicate via SSL.  For convenience they actually run in the same process, but still communicate via SSL.  It is also easy to run them in separate processes or separate machines by simply running the Main classes in each of the Client, Cloud and Policy servers.  And for development, they can be run in a single process with direct communication without SSL.  See crytpo.rsacomp.Main for details.

Field Encryption Keys

The Client authenticates with the Cloud and Policy servers using a client certificate.  The Policy server then uses this certificate's Subject to index into a simple authorization database of users.  For the purposes of this demonstration, the Cloud server performs no authorization in order to be able to demonstrate the use of encryption to provide security.

The Policy Server then calculates the FEK using a SHA256 digest of the Field Name, Record Id and Department Id.  The result is then folded into 16 bytes and encrypted with the master key and AES128.  This result is then returned this to the client as the FEK.  Including the Department Id in the digest enables it to be securely used in policies such as Managers can see the Salaries of all employees  within their department.

The clients then decrypt the fields to which they have authorized access.  The encrypted fields include a 4 byte folding of a SHA256 of the plain text to ensure the integrity of the data.

Note also the way that the Employee record has been defined, using the Field class.  This generalized approach is similar to what is used in simpleorm.org, and provides flexibility with simplicity and no reflection.  Methods such as getName could obveiously be added to make Employee look like a POJO externally.

Java Secure Socket Extension (JSSE)

The secure socket module implements Suite B encryption using the BSafe Share for Java library.  It provides an SSLServlet abstract class that is similar to HTTPServlet except that a new instance is created for each request (as it should be for HTTPServlet!).   The server provides concurrent requests using a thread pool, by default four threads are enabled.

The module sends java.util.Properties between the client and server.  It does not implement HTTP, but rather simply terminates each message with a null byte.  This is safe as unloaded Properties are always pure ASCII, with a special \uXXXX notation used for special characters (as tested by SSLMain).  A maximum message size is imposed, by default 10,000,000 bytes.  No attempt is made to reuse sockets for multiple messages, although sessions are reused correctly.

We would suggest that the JSSE examples in the J/Share package also be shown in this form.  The inverted SLLServlet structure is almost certainly more useful to the end user, and does not add much overhead in practice.  (We would also suggest that HTTP be properly implemented rather than the null byte terminator, we just did not need HTTP for our example.)

Security Analysis

A secure block cipher must provide total Diffusion and Confusion.  The Diffusion property means that the cipher text used for the FEK is totally unpredictable even though an attacker knows the exact plain text.  Thus it is easy to show that the basic FEK mechanism is secure provided that AES is secure.

(It would also be possible to skip the AES encryption  One would just add the Master Key to the inputs of the SHA256 digest, and then use the output of the digest as the FEK.  But the two step process enables the AES phase to be performed in a separate HSM, and so better protect the master key.  Further, the focus of digest cryptanalysis is on collisions, while the focus on cypers is upon key recovery, so AES feels slightly more secure.)

We do not use a (non-trivial) explicit Initialization Vectors (IV) for either the Master Key or the FEKs.  For the master key there is no point, as there would only ever be one IV for all the FEKs.  (In a more complete solution we might use a number of Master Keys for different types of FEKs.  But each master key would still only have one IV.)  

We do not use or store an IV for the FEKs because each field of each record is encrypted with a different FEK.  The only weakness is that the same FEK is used at different points in time.  Thus if an attacker knew the specific values of the cipher text / plain text pair of a specific field on a specific record, they could then determine if that field of that record was ever set back to the same specific value.  This is a very obscure leakage which would not normally justify an IV.

(Recall that knowing the values of one record's field provides no information about a different record.  Further, the 4 byte integrity  digest is prepended to the plain text.  In combination with CBC mode this essentially uses the entire value of field itself as an IV.)

For greater security an IV could easily be included, we just have made a business decission not to in our example.  A different one could be used for each field, or one could be shared for all fields in a record.  Because of the local scope of the FEK, a simple counter would suffice.  Storing an IV would also effectively shred the FEKs each time the data is (re)encrypted, which could be useful if a user's access rights are reduced.

If an IV is not used then we cannot use a stream cipher such as Galois Counter Mode for.  If we did, then an attacker that knew one plain text / cipher text pair for a specific field in a specific record could easily decipher any new value for that specific field in that specific record.  CBC mode is largely immune from that attack, and is completely immune given the prepended 4 byte integrity digest.  

(An insecure alternative would be to use the integrity digest as an explicit IV, and then a stream cipher could be used on the rest of the plain text.  However, that would enable an attacker to brute force the plain text against the exposed digest.)

A further refinement would be to encrypt small fields without needing to pad blocks.   It would be good if J/Safe could provide a variable block size algorithm.  A number are published, but are not Suite B.  However, it is easy to construct a variable block cipher using AES or SHA256 and a single stage Feistel network.

Conclusion

Cloud computing and Software As A Service provides new challenges for data protection.  Field Encryption Keys enable real role separation between the Cloud Server and the Policy Server.  However, the large number of FEKs that would be required would not be practical without our novel approach to which avoids the need to store them explicitly.

We have provided a sound proof of concept of the practical use of FEKs.  We have also produced a generic JSSE server package, and demonstrated the use of client certificates for authentication.

In combination these provide a solid platform for practical applications to be built that utilize this innovative technology.

End.