I am storing some data for which I have legal liability. The data relates to human subjects research.
What's the current security best practice for storing web-accessible personally identifiable information (say, a Social Security number, credit card number, merchant token, or billing address)? This is info that needs to be decrypted, not just hashed. I would guess AES is the best encryption option, but I have a few use case questions.
For ease of explanation, say I'm storing the data in a flat file (in practice it will be in an actual database), with each line being:
username | somethingsecret
What's the best practice?
1) Do I use one AES key for everyone, or separate AES keys? Assume the client can't store everything and I need to be able to do both encryption and decryption on the server.
2) What's a relatively future-proof key size currently?
3) If I need to implement, for example, a search on the encrypted field, is the best practice to temporarily decrypt the entire db, do the search, and then junk the decrypted copy? Privacy/security is more important than speed. Should I maybe do this on a ramdisk and then destroy the ramdisk? Yuck.
0. Make sure you don't implement any crypto yourself. Make sure you do understand how to use whatever implementation you go with.
0.a. I don't remember what the current state of the art AES mode is supposed to be. CTR? You want to encrypt-then-MAC. Again, you don't want to implement any of this yourself, but you should at least learn/understand what you're using.
0.b. For those reasons, it might be worth using NaCl::secret_box, which basically does the right thing and has wrappers in many languages, e.g. Ruby:
https://github.com/cryptosphere/rbnacl/wiki/Secret-Key-Encryption
0.c. However, this is only if you have to roll some technology yourself. You want to roll as little as you can.
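To make "encrypt-then-MAC" concrete, here is a minimal sketch of just the ordering, using only Python's standard hmac module. It assumes the ciphertext was already produced by a vetted library (the random bytes below are a stand-in, not real encryption), and the names add_tag and check_tag are made up for illustration. In practice something like NaCl's secret box does all of this for you, so treat this as a picture of the idea rather than something to deploy.

```python
import hashlib
import hmac
import secrets

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def add_tag(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed over the ciphertext (the 'then-MAC' step)."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def check_tag(mac_key: bytes, blob: bytes) -> bytes:
    """Verify the tag and return the ciphertext; raise before any decryption is attempted."""
    if len(blob) < TAG_LEN:
        raise ValueError("blob too short")
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # compare_digest avoids leaking, via timing, how many bytes matched
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return ciphertext


# Assumption: `ciphertext` stands in for nonce + encrypted bytes from a real cipher.
mac_key = secrets.token_bytes(32)  # separate from the encryption key
ciphertext = secrets.token_bytes(48)
blob = add_tag(mac_key, ciphertext)
```

The point of the ordering is that a tampered blob is rejected by check_tag before the decryption code ever sees it.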
1. The question is really, "what's the attack vector?" If it's just to secure data at rest, something like a single password to encrypt the HDD/SSD in question is probably sufficient. Have a data center? Same deal, except now you have multiple machines to secure instead of one. You're not realistically going to be able to protect that data from a malicious process on the machine no matter what you do. If you have different users accessing the data who should only see their subset of it, you probably want additional controls (e.g. per-user passwords and ACLs, or something of that sort) to enforce that user A can't see user B's data. But again, it depends on what the attack vectors are that you're trying to defend against.
2. For symmetric key crypto, 256 bits (i.e. AES-256) should be sufficient for almost any use case.
3. The common approach for databases with PII that needs to be encrypted and yet still used is, AFAIK, to (A) encrypt the HDDs with full disk encryption, (B) secure the machines network-wise (DMZ, separate application and data servers, etc), (C) secure the machines physically (locked, secure data center), (D) appeal to authority, i.e. what HIPAA does, and explain in writing with patient earnestness that you're doing everything you can whether that's true or not.
3.a. There's a decent amount of bleeding edge research on searching encrypted data in place (http://outsourcedbits.org/2013/10/06/how-to-search-on-encrypted-data-part-1/), but I don't know of any software database solutions that implement such things. But it's also true that I haven't looked into this seriously in a few years, and something may have changed.
3.b. ... but if said something hasn't made it into Postgres yet, it's probably not worth relying on. It's certainly not worth trying to code, unless you're angling for that PhD.
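For what it's worth, if the search only needs exact matches, one simpler workaround (not the searchable-encryption research linked above, and not something the points above depend on) is to store a keyed hash of the plaintext next to the ciphertext and look rows up by that hash. A rough sketch with Python's standard library follows. The function names, the rows, and the normalization rule are all hypothetical, and the ciphertext column is just placeholder bytes.

```python
import hashlib
import hmac
import secrets


def blind_index(index_key: bytes, value: str) -> str:
    """Keyed hash of the normalized plaintext; equal inputs give equal tokens."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(index_key, normalized, hashlib.sha256).hexdigest()


# 256-bit key, kept separate from the encryption key
index_key = secrets.token_bytes(32)

# Hypothetical rows: (username, opaque ciphertext, blind index of the secret)
rows = [
    ("alice", b"<ciphertext-1>", blind_index(index_key, "123 Main St")),
    ("bob", b"<ciphertext-2>", blind_index(index_key, "9 Elm Ave")),
]


def find_users(index_key: bytes, query: str) -> list:
    """Return usernames whose stored index matches the query; nothing is decrypted."""
    token = blind_index(index_key, query)
    return [user for user, _ct, idx in rows if hmac.compare_digest(idx, token)]
```

The trade-off is that anyone who can read the table learns which rows share the same secret value, and this gives you equality lookups only, no substring or range search. Whether that leak is acceptable goes back to what attack vectors you're defending against.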
edit: if my little brain dump is confusing or unhelpful,
Perhaps a combination of the two. Run the server on App Engine, which talks to a database on a different machine with an encrypted disk, etc.
This is probably a good place to start.