Rearranging The Deckchairs

Frank O'Dwyer's blog

Backing Up Is Hard to Do

This is prompted by a recent discussion on Twitter about whether it is better for a backup service provider to encrypt the data on the server side or the client side. My opinion is that server side encryption provides absolutely nothing for this problem, and leaves the server open to some catastrophic failure modes which are unrecoverable. It’s not that it causes these problems, it just fails to address them.

It is easier to explain my position in a blog post than in the 140 chars provided by Twitter so here it is. :-)

First, some background. The original problem was termed by Joe as - and I’ve a feeling I’m going to see some weird Google searches in my stats by typing this - the ‘Britney Spears Problem’. In other words, what if Britney Spears uses the service and one of the backup provider’s staff leaks her backup data to the press?

So basically the requirement is to keep the backup secret, and the obvious solution for that is to encrypt it. Which then raises the question of where - should the client do it, or the server? If the client does it, then the obvious drawback is that if they forget their key, they are hosed and can’t recover their data. If the server does it, the question is what benefit that provides, compared to just backing up clear data.

My answer to this is none, which I’ll elaborate on here. The short version is that if you can’t trust the service to hold one secret then you can’t solve the problem by asking them to manage two. Trying to do this on hardware you don’t control is like trying to build the cryptographic equivalent of a perpetual motion machine. (A classic example of an attempt to do this is when Microsoft tried to prevent access to the SAM, the Windows password database, by encrypting it and obfuscating the key. Nobody bothered to try to recover the key, they just called the same API Microsoft does to decrypt the SAM, and let that find the key and use it.)

So, stepping back a bit, pretty much every security problem involves one or more of the holy trinity of confidentiality, integrity, and availability. And a backup service provider has the unenviable task of maximising all three of these, i.e.:

1. Confidentiality - you want your backup data private
2. Integrity - your backup data needs to be right
3. Availability - you need access to your backup data at any time

The last two, integrity and availability, are closely related - solutions for either tend to help with both - so they are sometimes lumped together as ‘reliability’, i.e. your backup needs to be confidential and reliable.

Note that these requirements actually conflict and must be traded off against each other - it’s like ‘good, fast, cheap - pick any two’. Data that is public - e.g. in a newspaper - is much more available and it’s easy to check if you have a real copy - but then public data isn’t secret. Or, if you want to keep data secret, then that means you want it to be less public, which in turn necessarily means it is less available. The less available data is, the easier it is to spoof or change without detection, too - so in general it’s also harder to make sure that secret data is right. And any security control at all tends to increase complexity, which tends to reduce availability for the simple reason that there are more moving parts to go wrong. So from the get-go you’re trying to solve conflicting requirements that can’t all be met. In my experience, this leads to a lot of circular conversations.

So, with that limitation in mind let’s think about the problem of keeping the backup private. If the client can’t remember a key, then they necessarily can’t encrypt the data and they must give the clear data to the server. So the understandable reaction is ‘but we must encrypt it - so we’ll do it on the server! It’s better than not encrypting it, right?’. Which then raises the question of where you will store the key - you can’t store it on the client. And you can’t encrypt the key because you’ll just shift the problem to the key used for that! Around this point the conversation starts to go like this:

Q. So if the server encrypts where will you put the key?
A. In some database

Q. Won’t your staff have access to that?
A. Yes but the same staff will not have access to both the key and the data (add litany of controls here…auditing, authentication, logging, firewalls etc, all intended to limit or detect access to the key).

Q. Well if those controls are considered good enough to prevent access to the key then they should be good enough to limit access to the data. So why not use them to limit access to the data in the first place, and skip the encryption completely?
A. But this way you need to get the key and the data…either on its own won’t do

Q. Well aren’t the key and the data both stored in the same cloud? And won’t some server need to be able to automatically decrypt the data to provide it to the client? Can’t I just breach that and get it to decrypt the data for me without knowing the key?
A. (repeat litany of controls designed to prevent this)

Q. Well if those controls are considered good enough to prevent access to the server that has the data then they should be good enough to limit access to the data. So why not use them to limit access to the data in the first place, and skip the encryption completely?
A. But if we don’t encrypt then the data is in the clear

Q. But if the server encrypts, where will you store the key?

…rinse and repeat…

Of course client side encryption has its drawbacks too. There is the obvious one that if the user forgets the key they can’t recover the data - but for that to be a problem, they have to lose their data as well and it has to happen at the same time. If all they do is forget their key, they can provide a new one and use it to re-encrypt a master key, which the server holds only in encrypted form (this avoids having to transmit and re-encrypt gigabytes). If they just lose the data, well that’s what the service is for. Also, even if the worst happens - it’s only a problem for that customer. Whereas a server side breach is a problem for all the customers, and a problem for all your customers is a catastrophe for any business.
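For anyone who wants the master key idea in concrete terms, here is a rough Python sketch. Everything in it is my own assumption for illustration rather than a description of any real service: the function names, the blob layout, the PBKDF2 work factor, and the idea that the client still has the unwrapped master key cached locally when the password changes. It uses only the standard library, so the "cipher" is a one-time XOR pad plus an HMAC tag standing in for a proper key-wrap or AEAD mode from a vetted crypto library, which is what you would want in real code.

```python
import hashlib
import hmac
import secrets

PBKDF2_ROUNDS = 200_000  # illustrative work factor, not a recommendation


def _derive(password, salt):
    """Stretch the password into a 32-byte pad and a 32-byte MAC key."""
    material = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, PBKDF2_ROUNDS, dklen=64
    )
    return material[:32], material[32:]


def wrap_master_key(master_key, password):
    """Return salt || wrapped key || tag. This blob is all the server stores."""
    if len(master_key) != 32:
        raise ValueError("master key must be 32 bytes")
    salt = secrets.token_bytes(16)  # fresh salt per wrap, so a pad is never reused
    pad, mac_key = _derive(password, salt)
    wrapped = bytes(a ^ b for a, b in zip(master_key, pad))
    tag = hmac.new(mac_key, salt + wrapped, hashlib.sha256).digest()
    return salt + wrapped + tag


def unwrap_master_key(blob, password):
    """Recover the master key, or raise if the password or blob is wrong."""
    salt, wrapped, tag = blob[:16], blob[16:48], blob[48:]
    pad, mac_key = _derive(password, salt)
    expected = hmac.new(mac_key, salt + wrapped, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong password or corrupted blob")
    return bytes(a ^ b for a, b in zip(wrapped, pad))
```

Changing the password is then just `wrap_master_key(master_key, new_password)` on the client, followed by uploading the new 80-byte blob. The backed-up files were encrypted under the master key, so none of them need to be touched, and the server never sees the password or the unwrapped key.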
In this case, client side encryption means never having to say you’re sorry. (And by the way, the server is not immune to the problem of losing keys, and if it does lose them the impact will probably hit all or many customers too.)

Another drawback with client side encryption is that the server loses the ability to store deltas of different file versions, so the backup takes up more room. However you could perhaps still remove duplicate files by having the client send a hash of the clear file and comparing these on the server. (There is some data leakage with this, but maybe not that much. For example, the server can tell if you have a particular application installed if it can see the hash of the binary.)

Having said all that there are some reasons to do encryption on the server, just not for this problem - for example if the data storage is delegated by the service to a third party, then it would make sense for the service to encrypt the data and hang on to the key somewhere that the third party has no access to. However, the server is in the client role in that case, and so this is actually client side encryption. There are also some cases where it makes sense to use encryption for distributed access control and segregation of duties, but that too is a different problem.

So, my advice would be to forget about server side encryption and offer the option to do it on the client (with heavy warnings about data loss if the customer chooses this). The customer key could be just a password or something stronger (but then you have to store it somewhere other than the main customer PC). Then no matter what they choose, implement separation of duties and heavy access control on the server side. Doing server encryption won’t hurt, but it probably won’t help either.

So that was probably more than you wanted to know and certainly more than I intended to write :-) Hopefully it is of some use.
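P.S. The duplicate-removal trick mentioned above (client sends a hash of the clear file, server compares hashes) can be sketched in a few lines of Python. This is a toy under my own assumptions: `BackupServer`, `clear_hash` and `backup` are made-up names, the server is just an in-memory dictionary, dedup is scoped to a single customer (sharing ciphertext between customers would need them to share keys somehow), and `encrypt` is whatever client side encryption function you already have. A real client would also keep a manifest mapping paths to hashes so it can restore, which is left out here.

```python
import hashlib
from typing import Callable, Dict, List


def clear_hash(data: bytes) -> str:
    """SHA-256 of the *unencrypted* file contents, hex encoded."""
    return hashlib.sha256(data).hexdigest()


class BackupServer:
    """Hypothetical server: sees only hashes and ciphertext, never clear data."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def has(self, digest: str) -> bool:
        return digest in self.blobs

    def put(self, digest: str, ciphertext: bytes) -> None:
        self.blobs[digest] = ciphertext


def backup(files: Dict[str, bytes], server: BackupServer,
           encrypt: Callable[[bytes], bytes]) -> List[str]:
    """Upload each distinct file once; return the paths actually uploaded."""
    uploaded = []
    for path, data in files.items():
        digest = clear_hash(data)
        if server.has(digest):
            continue  # duplicate content: nothing to encrypt or send
        server.put(digest, encrypt(data))
        uploaded.append(path)
    return uploaded
```

The leakage is visible right in the sketch: the keys of `server.blobs` are hashes of clear files, so anyone who can read them and knows the hash of a given binary can tell whether you have it.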