Description
Data at Rest Encryption
Issue
Riak as a whole does not encrypt data at rest.
https://www.tiot.jp/riak-docs/riak/cs/2.1.1/cookbooks/faqs/riak-cs/#does-riak-cs-encrypt-data-at-rest
https://quabase.sei.cmu.edu/mediawiki/index.php/Riak_Security_Features
Problem
The problem has three fronts:
- Many competing systems already offer encryption of data at rest (marketing).
- Industries are starting to make encryption of data at rest a requirement (usage viability).
- Hackers are becoming better at accessing file systems, where unencrypted data is at risk (security).
Potential Pitfalls
- Partitions are moved between servers, so the encryption key would have to be standardised across all nodes in the cluster to allow encryption and decryption. If a partition is encrypted by one node with one key and then transferred to a node with a different key, its data becomes unreadable.
- As Riak runs inside an Erlang VM, the encryption key would most probably need to be stored in plain text.
- Because Riak is open source, it should not be long before a potential attacker learns where the encryption key is stored, copies it and uses it to decrypt the data.
- Keeping the encryption key on one server and sharing with the cluster at run time works until the server that stores the key goes offline.
Security Concepts
Security works in three ways:
- Who you are
- What you have
- What you know
"Who you are" is based on the user you are accessing the server as. If the attacker is root or riak then they can just ask Riak for the full database in human-readable form, so let's ignore this one.
"What you have" would apply to the encryption key. Anybody with the encryption key and a bit of technical know-how should be able to retrieve human-readable data from Riak's file system (the data directory), even if it is only partial data.
"What you know" would be where real users can be differentiated from attackers. Real users will know things such as bucket names and key names. Attackers are thieves of opportunity and will be unlikely to know this information especially if all they have to work with is the file system of a single node.
Proposed Solution
Combining the above Security Concepts, we can see that the only difference between an attacker and a genuine user is the knowledge of the bucket and key names. I would like to propose the following solution to encrypting data at rest within Riak KV:
- Share an encryption key between all nodes in the cluster (probably user specified text file saved per node).
- Have an option in riak.conf to allow encryption to be used (some people might not want it yet).
- Have a start-up check, and possibly a polled interval, that makes sure all nodes in the cluster are using the same key and have encryption turned on.
- When writing data, use the encryption key as discussed but salt the key with the bucket name and the key name to encrypt the value stored.
- When reading data, decrypt with the same salted encryption key which should work as the user will be providing the bucket name and key name.
- Tag buckets as encrypted so that we can check what percentage of the data is encrypted.
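The start-up check in the list above would ideally not send the key itself between nodes. A minimal sketch of one way to do that, assuming each node reports a SHA-256 fingerprint of its key file and the key is long and random (a fingerprint of a short, guessable key could be brute-forced). The function names and the fixed prefix string are hypothetical and not part of Riak:

```python
import hashlib
from collections import Counter
from typing import Dict, List


def key_fingerprint(cluster_key: bytes) -> str:
    """Return a hex digest that identifies the key without revealing it."""
    # The prefix is an arbitrary label so the digest is specific to this check.
    return hashlib.sha256(b"riak-key-fingerprint:" + cluster_key).hexdigest()


def mismatched_nodes(fingerprints: Dict[str, str]) -> List[str]:
    """Return the nodes whose fingerprint differs from the most common one."""
    if not fingerprints:
        return []
    majority, _count = Counter(fingerprints.values()).most_common(1)[0]
    return sorted(node for node, fp in fingerprints.items() if fp != majority)
```

A polled check could call `mismatched_nodes` on the collected fingerprints and refuse hand-offs to any node it returns.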
Salting the key with the bucket and key names would render the plain encryption key nearly worthless to a potential attacker, as they would have to brute-force all potential bucket/key name combinations to access the data.
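To make the salting idea concrete, here is a rough sketch that derives a per-object key from the shared cluster key plus the bucket and key names. Using HMAC-SHA256 for the derivation is my assumption; the proposal does not fix an algorithm, and `derive_object_key` is a hypothetical name. The sketch stops at the derived key: the actual encryption would need a cipher library (for example an AES-GCM implementation), which the Python standard library does not provide.

```python
import hashlib
import hmac


def derive_object_key(cluster_key: bytes, bucket: bytes, key: bytes) -> bytes:
    """Derive a 32-byte per-object key from the cluster key and object name."""
    # Length-prefix each name so ("ab", "c") and ("a", "bc") cannot collide.
    info = (
        len(bucket).to_bytes(4, "big") + bucket
        + len(key).to_bytes(4, "big") + key
    )
    # HMAC keyed with the cluster key: without the bucket and key names,
    # the cluster key alone does not yield the per-object key.
    return hmac.new(cluster_key, info, hashlib.sha256).digest()
```

On a read, the node would run the same derivation with the bucket and key names supplied by the client and use the result to decrypt the stored value.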
Legacy and Migration
Assuming that the above functionality can be implemented, users may wish to encrypt an already existing Riak install or may decide they no longer wish to keep their data encrypted and wish to decrypt everything. I propose that a default read/re-write mode be created. This mode would check whether data is meant to be encrypted or not and every time Riak reads data that is not of the desired encryption type, it re-writes the exact same data but in the desired encryption type.
With this method, users with high-load clusters could wait for their data to become encrypted or decrypted naturally, while users in a hurry could run a script that reads every object in every bucket to force the conversion. In case clusters are subject to performance spikes, we could add a user-definable config threshold, e.g. 70%, above which read/re-write is temporarily disabled; it would resume once the cluster load drops back below the threshold.
We might want to add an output to a command such as riak-admin cluster status that would include the "Percentage encrypted" information for the cluster.
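The read/re-write mode, the load threshold and the "Percentage encrypted" figure can be sketched together with a toy in-memory store. Everything here is hypothetical: `ToyStore` is not Riak's storage API, `current_load` stands in for whatever load metric the cluster would expose, and the XOR transform is only a placeholder for real encryption so that the example runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class StoredObject:
    value: bytes      # payload as stored on disk
    encrypted: bool   # whether the payload is currently encrypted


class ToyStore:
    """In-memory stand-in for a node's data; not Riak's real storage API."""

    def __init__(self, want_encrypted: bool, max_load: float,
                 current_load: Callable[[], float]):
        self.want_encrypted = want_encrypted
        self.max_load = max_load          # e.g. 0.70 for the 70% threshold
        self.current_load = current_load  # hypothetical load probe
        self.objects: Dict[Tuple[str, str], StoredObject] = {}

    def _transform(self, value: bytes) -> bytes:
        # Placeholder for encrypt/decrypt. XOR is NOT secure; it is its own
        # inverse, which keeps the sketch short and runnable.
        return bytes(b ^ 0x5A for b in value)

    def read(self, bucket: str, key: str) -> bytes:
        obj = self.objects[(bucket, key)]
        plaintext = self._transform(obj.value) if obj.encrypted else obj.value
        # Re-write only when the stored form is not the desired one and the
        # cluster is below the configured load threshold.
        wrong_form = obj.encrypted != self.want_encrypted
        if wrong_form and self.current_load() < self.max_load:
            stored = self._transform(plaintext) if self.want_encrypted else plaintext
            self.objects[(bucket, key)] = StoredObject(stored, self.want_encrypted)
        return plaintext

    def percent_encrypted(self) -> float:
        if not self.objects:
            return 0.0
        done = sum(1 for o in self.objects.values() if o.encrypted)
        return 100.0 * done / len(self.objects)
```

Reads always return the plain value regardless of the stored form, so clients would not notice the migration; only the stored representation and the reported percentage change over time.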
Additional Security
As security is all about layers, it might be worth storing the key with extremely limited permissions, i.e. to access the encryption key you would have to be the riak user or preferably just root, but this would depend on how easy it would be to get the Erlang VM to access said encryption key.
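As an illustration of the permissions layer, a node could refuse to load a key file that is readable by anyone other than its owner. This is only a sketch under POSIX file-permission assumptions; the function names are hypothetical, and a real implementation would live inside Riak (in Erlang) rather than in Python.

```python
import os
import stat


def key_file_is_private(path: str) -> bool:
    """True when the key file is a regular file with no group/other access."""
    mode = os.stat(path).st_mode
    no_group_or_other = (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
    return stat.S_ISREG(mode) and no_group_or_other


def load_cluster_key(path: str) -> bytes:
    """Read the key only if the file permissions are restrictive enough."""
    if not key_file_is_private(path):
        raise PermissionError(f"{path} must not be accessible to group or others")
    with open(path, "rb") as fh:
        return fh.read().strip()
```

A key file created with mode 0600 and owned by the riak user would pass this check, while one with mode 0644 would be rejected at start-up.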