Or: Why Disc Encryption Won’t Save You
There’s a persistent meme floating around that full-disc encryption of your VM’s discs will save you if some three-letter agency comes knocking on the door of your VM host and demands your data. This is futile. Let me explain why.
Your Host Has Your Keys
Let me make this as clear as I possibly can: when your VM is running, after you have unlocked the disc encryption, the only key I need to decrypt your encrypted partition is present in the RAM of your guest.[0]
What looks to you like the RAM of your guest VM is the memory of an ordinary process running on one of your VM host’s servers. In theory, if I have access to both the server your VM is running on and your encrypted disc image, I ought to be able to dig your master key out of your VM’s RAM, and decrypt your disc.
A couple of assumptions: you’ve made yourself a LUKS-encrypted Debian Wheezy VM with the default settings as provided by the Debian installer, running on a host which uses qemu. Nothing controversial here, I hope.
If you’re actually doing anything useful with your VM, you’ve had to unlock the disc for it to boot. Under any useful configuration, this means typing your passphrase in over a serial line.
Your host will have ensured that your serial access is over an encrypted channel, be that SSH or HTTPS. This protects you against miscreants outside your host’s organisation. Your host still needs access to your unencrypted keypresses to pass them off to your VM, so what if they were tapping that input and saving it off for later playback? They’d be able to take a copy of your disc, spin up a new VM instance with it, and play your keypresses back to unlock it.
Ah, but that would require them to be watching when you boot your machine. If you switch your VM on and then leave it running, as long as nobody knows to watch while you’re unlocking the disc, maybe you’re OK?
No, because of how LUKS must work. LUKS encrypts your data with an AES key: the Master Key. It encrypts the master key with your passphrase when you set up LUKS, and stores it in the first few sectors of your partition, ahead of where your data goes.
In everyday use, your guest kernel needs that master key unencrypted for any disc access, not just at boot, which means it has to keep it in memory.
Let’s see how I can get it.
Getting Your VM’s RAM
For this I use gcore. It’s in the gdb package on wheezy. Although it would be slightly unusual[1] for it to be installed on a production host server, remember: if you don’t trust the host, those bets are off.
Generating a core dump containing the VM’s RAM looks like this:
$VM_PID is the process ID your VM is running as on the host. Your master key is now somewhere in qemu.core.
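As a rough sketch of that step, assuming gdb’s gcore script is on the host’s PATH: gcore takes an -o prefix and writes its dump to prefix.pid, so the helper below renames the result to qemu.core, the file name used in the rest of this article. The function names are mine, made up for illustration.

```python
import os
import subprocess


def gcore_command(vm_pid, prefix="qemu"):
    """Build the gcore command line for the qemu process with this PID."""
    return ["gcore", "-o", prefix, str(vm_pid)]


def dump_vm_ram(vm_pid, dest="qemu.core", prefix="qemu"):
    """Dump the process's memory with gcore and move the result to `dest`."""
    subprocess.run(gcore_command(vm_pid, prefix), check=True)
    # gcore names its output "<prefix>.<pid>"; rename it to something stable.
    os.rename("{}.{}".format(prefix, vm_pid), dest)
    return dest
```

Attaching to another user’s process normally needs root on the host, which is exactly the access a VM host has.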
Getting Your Encrypted Data
To read off the encrypted data, we need to first find the start of your partition within your disc image, then skip over the LUKS header.
/sbin/fdisk -l can tell us the first part.
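If fdisk isn’t to hand, the same information can be read straight out of the image. This is a minimal sketch assuming a classic MBR partition table; it only looks at the four primary slots and does not follow extended partitions, so a logical partition would need extra work. The function name is hypothetical.

```python
import struct


def primary_partitions(first_sector):
    """List (slot, type, first_sector, sector_count) for each used MBR slot.

    `first_sector` is the first 512 bytes of the disc image.
    """
    if len(first_sector) < 512 or first_sector[510:512] != b"\x55\xaa":
        raise ValueError("no MBR boot signature found")
    partitions = []
    for slot in range(4):
        # The partition table starts at byte 446; each entry is 16 bytes.
        entry = first_sector[446 + 16 * slot:446 + 16 * (slot + 1)]
        part_type = entry[4]
        # Start LBA and sector count are little-endian 32-bit values.
        start, count = struct.unpack_from("<II", entry, 8)
        if part_type != 0:
            partitions.append((slot + 1, part_type, start, count))
    return partitions
```

Multiply the starting sector by 512 to get the partition’s byte offset within the image.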
Next, in theory we would need to query the LUKS header to find out the offset to the start of the encrypted data, but I’m going to cheat: I happen to know that with Debian Wheezy’s default settings, the LUKS header is precisely 2MB. This cheating doesn’t really affect the outcome here, since we could read the offset out of the header and do the relevant maths, but I can’t be bothered right now.
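For the non-cheating version, here is a sketch of reading that offset out of the header. It assumes a LUKS1 header, whose leading fields are stored big-endian with the payload offset counted in 512-byte sectors; the names luks1_summary and LUKS1_PREFIX are mine.

```python
import struct

LUKS_MAGIC = b"LUKS\xba\xbe"
# Leading fields of the LUKS1 header, all big-endian: magic, version,
# cipher name, cipher mode, hash spec, payload offset (sectors), key bytes.
LUKS1_PREFIX = struct.Struct(">6sH32s32s32sII")
SECTOR_SIZE = 512


def _text(raw):
    """Decode a NUL-padded ASCII field."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


def luks1_summary(header):
    """Pull the interesting fields out of the start of a LUKS1 partition."""
    (magic, version, cipher, mode, hash_spec,
     payload_sectors, key_bytes) = LUKS1_PREFIX.unpack_from(header)
    if magic != LUKS_MAGIC or version != 1:
        raise ValueError("not a LUKS1 header")
    return {
        "cipher": "{}-{}".format(_text(cipher), _text(mode)),
        "hash": _text(hash_spec),
        "key_bytes": key_bytes,
        "payload_offset_bytes": payload_sectors * SECTOR_SIZE,
    }
```

A payload offset of 4096 sectors corresponds to the 2MB header I’m relying on above.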
Here’s a Python script to pull only the encrypted data out of the full disc image:
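A minimal sketch of that extraction step, not the original script: it seeks past the partition start and the LUKS header, then copies everything that follows. The function name and parameters are hypothetical, and by default it assumes the encrypted partition runs to the end of the image; pass length if it doesn’t.

```python
def extract_payload(image_path, out_path, partition_start_sector,
                    header_bytes=2 * 1024 * 1024, length=None,
                    chunk_size=1024 * 1024):
    """Copy the encrypted payload out of a disc image; return bytes copied."""
    offset = partition_start_sector * 512 + header_bytes
    copied = 0
    with open(image_path, "rb") as src, open(out_path, "wb") as dst:
        src.seek(offset)
        while length is None or copied < length:
            want = chunk_size if length is None else min(chunk_size, length - copied)
            chunk = src.read(want)
            if not chunk:
                break  # reached the end of the image
            dst.write(chunk)
            copied += len(chunk)
    return copied
```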
Call it like this:
encrypted.img will contain only the encrypted portion of the disc image.
Generating Keys from Core
What’s the simplest thing we could do to find the key now? It’s to search through every single possible 64-byte chunk of RAM in qemu.core and to try decrypting encrypted.img with it until we get some content we recognise. For a VM with 1GB of RAM and, say, a 5GB disc, that’s a lot of work: to check every possible key by trying to decrypt the whole disc, we’d end up running 5 exabytes through AES. That’s absurd, but there are three tricks we can use to simplify the search.
Trick One: the key is probably aligned to a 4-byte boundary in RAM. That divides the number of candidate keys down from a billion to 250 million.
Trick Two: the key is not likely to be a consecutive string of the same byte value, nor is it likely to have more than, say, four zero bytes in a row. On my test that drops the number of candidate keys from 250 million to about 25 million.
Trick Three: we don’t have to decrypt the whole disc. Assuming the encrypted blob contains an LVM physical volume, we know that by default the first sector is zeroed out. If we find zeroes when we try to decrypt the first 16 bytes of the blob, we can be relatively certain we’ve found our key.
All told, that means we need 25 million AES operations to have a very good chance of identifying the correct master key, which we can then use to decrypt the rest of the disc. That’s not a very big number at all: AES is designed to be fast.
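Tricks One and Two amount to a filter over a sliding window. Here is a sketch over an in-memory bytes buffer; the name candidate_keys is mine, and I’ve read “a consecutive string of the same byte value” as “the whole chunk is one repeated byte”, which is an interpretation rather than a quote from the original script.

```python
def candidate_keys(ram, key_len=64, align=4, max_zero_run=4):
    """Yield (offset, chunk) for every plausible key-sized chunk of `ram`."""
    too_many_zeros = b"\x00" * (max_zero_run + 1)
    # Trick One: only look at offsets aligned to a 4-byte boundary.
    for offset in range(0, len(ram) - key_len + 1, align):
        chunk = ram[offset:offset + key_len]
        # Trick Two (a): skip chunks made of a single repeated byte.
        if chunk.count(chunk[:1]) == key_len:
            continue
        # Trick Two (b): skip chunks with more than four zero bytes in a row.
        if too_many_zeros in chunk:
            continue
        yield offset, chunk
```

Both thresholds are guesses about what a random key looks like; loosening them makes the search slower, not less likely to succeed.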
Here’s a script to generate all the 64-byte chunks we might possibly be interested in as potential keys from a core dump. You’ll need pyelftools installed to run it:
Run it like so:
and it will print all the possible keys to stdout, one per line.
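The part of that job which needs ELF knowledge is finding the memory segments inside the core file. As a stand-in for pyelftools, here is a standard-library sketch that walks the program headers by hand. It assumes a little-endian ELF64 core with fewer than 65535 program headers, and load_segments is a name I’ve made up.

```python
import struct

PT_LOAD = 1  # program header type for segments backed by file contents


def load_segments(core):
    """Yield the bytes of each PT_LOAD segment in a little-endian ELF64 file."""
    if core[:4] != b"\x7fELF" or core[4] != 2 or core[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    # e_phoff lives at 0x20; e_phentsize and e_phnum at 0x36 and 0x38.
    (e_phoff,) = struct.unpack_from("<Q", core, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", core, 0x36)
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        # Fields: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz.
        p_type, _, p_offset, _, _, p_filesz = struct.unpack_from(
            "<IIQQQQ", core, base)
        if p_type == PT_LOAD:
            yield core[p_offset:p_offset + p_filesz]
```

Each segment it yields is a slab of guest RAM (plus qemu’s own memory) to search for 64-byte candidates. For a 1GB core you would probably want to mmap the file rather than read it all into a bytes object.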
Finding Your Needle In The Haystack
Having generated a list of candidate keys, we need to try each one against your encrypted data to see if we get a match. By default, the data in a wheezy LUKS-encrypted partition is encrypted with AES in XTS mode. By happy chance, the CryptoPlus python package implements AES-XTS for us in a very easy-to-use way.
Here’s a script which pulls keys off stdin one by one, testing each, exiting when it decodes your first 16 bytes to a run of all zeros:
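The search loop itself can be sketched independently of the cipher library. Several things here are assumptions of mine: decrypt_block is a hypothetical callable standing in for an AES-XTS implementation (CryptoPlus, in this article), the candidate keys arrive hex-encoded one per line, and the tweak follows the xts-plain64 scheme, where it is the sector number as a little-endian 64-bit integer padded to 16 bytes.

```python
import struct


def xts_plain64_tweak(sector_number):
    """Tweak for a sector under the plain64 scheme: LE sector number, zero-padded."""
    return struct.pack("<QQ", sector_number, 0)


def find_master_key(hex_lines, first_block, decrypt_block):
    """Return the first candidate that decrypts `first_block` to all zeros.

    `decrypt_block(key, tweak, data)` must return the plaintext bytes.
    """
    expected = bytes(len(first_block))
    tweak = xts_plain64_tweak(0)  # the block comes from sector 0 of the payload
    for line in hex_lines:
        line = line.strip()
        if not line:
            continue
        key = bytes.fromhex(line)
        if decrypt_block(key, tweak, first_block) == expected:
            return key
    return None
```

Passing sys.stdin as hex_lines gives the read-from-a-pipe behaviour described above.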
To use this you’ll need to download and install CryptoPlus separately. It requires pycrypto to be installed. CryptoPlus can’t be pip installed, but python setup.py install Just Worked for me.
Our pipeline now looks like this:
We expect this to run for a while, then spit a single key to stdout.
All Your Data Are Belong To Me
Having got this far, decrypting the image is conceptually straightforward, and computationally heavy. We have the key and the data, so we just need to plug them together and loop until done. Here’s a script which will take the master key from trydecode.py, and use it to decrypt the encrypted blob:
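The “loop until done” part looks roughly like this. It is a sketch under the same assumptions as before: 512-byte sectors, an xts-plain64 tweak per sector, and a hypothetical decrypt_sector callable wrapping whichever AES-XTS implementation you have to hand.

```python
import struct

SECTOR_SIZE = 512


def decrypt_image(src, dst, key, decrypt_sector):
    """Decrypt `src` into `dst` one sector at a time; return the sector count.

    `src` and `dst` are binary file objects. `decrypt_sector(key, tweak, data)`
    must return the plaintext for one sector.
    """
    sector_number = 0
    while True:
        sector = src.read(SECTOR_SIZE)
        if not sector:
            break
        # Each sector gets its own tweak, derived from its position.
        tweak = struct.pack("<QQ", sector_number, 0)
        dst.write(decrypt_sector(key, tweak, sector))
        sector_number += 1
    return sector_number
```

Because every sector is independent, this is also the obvious place to parallelise if seven hours is too long to wait.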
And here’s how to add it to our pipeline:
After this runs, you have a decrypted.img file[2]. We’re not quite out of the woods, because it’s not a filesystem image; it’s an LVM PV image. Going from the latter to the former means stripping the LVM metadata off which, again, means dd and happening to know a magic number: in this case, the length of the PV metadata chunk we want to ignore. It’s 1MB by default.
ext4.img now contains your data. You can verify like so:
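One quick sanity check that doesn’t need root or a loop device, sketched here as an alternative to mounting the image: ext2/3/4 keep their superblock 1024 bytes into the filesystem, with the magic number 0xEF53 stored little-endian 56 bytes into that. The function name is mine.

```python
import struct

EXT_SUPERBLOCK_OFFSET = 1024
EXT_MAGIC_OFFSET = 56
EXT_MAGIC = 0xEF53


def looks_like_ext_filesystem(image_path):
    """Return True if the image carries the ext2/3/4 superblock magic."""
    with open(image_path, "rb") as image:
        image.seek(EXT_SUPERBLOCK_OFFSET + EXT_MAGIC_OFFSET)
        raw = image.read(2)
    return len(raw) == 2 and struct.unpack("<H", raw)[0] == EXT_MAGIC
```

A match only says the decryption and the offsets were right; it doesn’t prove the whole filesystem is intact.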
On my laptop, with a core dump from a 1GB VM and a 5GB encrypted disc image, the above pipeline takes an hour and a half to pull the key out of the core dump, and then another 7ish hours to decrypt the data. Bear in mind that this is noddy, shoddy, unoptimised CPython 2.7 here: the core decryption is in C, but there’s a lot of CPU overhead around that which could be trimmed if you wanted to make this run faster.
What I’ve shown here is nothing groundbreaking, or even particularly clever: it’s just a brute force search over a set of available keys, with a not-unrealistic set of assumptions around what data an attacker might have available.
If you have content you need protecting from prying eyes who might have access to your VM host, disc encryption will only help you if you can switch your VM off before they get there. A dedicated host would do better, if you’ve got a working case alarm to cut the power when anyone opens it. It’s harder to pull a core dump from a physical host, although I wouldn’t be surprised if some IPMI platforms could do it.
If you’re familiar with LUKS, you’ll note what we didn’t do. We didn’t go after the passphrase. The master key is protected on disc by sticking the passphrase through PBKDF2, an algorithm explicitly designed to make brute force cracks harder by making them slower. By going directly to what’s in RAM, we avoid having to do that, and instead lean on sticking a very small amount of data through AES: a fast operation.
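To put a rough shape on that difference, here is a sketch of what each passphrase guess costs. It assumes PBKDF2 with HMAC-SHA1, which I believe is what a LUKS1 header of this vintage uses by default, and treats the salt and iteration count as values read from the key slot; passphrase_key is a made-up name, and the real LUKS procedure has further steps after this one.

```python
import hashlib
import time


def passphrase_key(passphrase, salt, iterations, key_len=64):
    """Stretch a passphrase guess with PBKDF2-HMAC-SHA1."""
    return hashlib.pbkdf2_hmac("sha1", passphrase, salt, iterations,
                               dklen=key_len)


def seconds_per_guess(iterations, trials=3):
    """Time a few derivations to estimate the cost of one passphrase guess."""
    started = time.perf_counter()
    for attempt in range(trials):
        passphrase_key(b"guess %d" % attempt, b"\x00" * 32, iterations)
    return (time.perf_counter() - started) / trials
```

With an iteration count in the tens or hundreds of thousands, every guess costs that many HMAC computations. Testing a candidate master key from RAM costs one 16-byte AES-XTS decryption.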
TL;DR: if you have data which you can’t trust your VM host with, don’t give it to them. It’s that simple.
0: In working on this article, I found that the unencrypted passphrase was also in the VM’s memory. Depending on how qemu’s serial lines are set up, it looks like it can hang around in an uncleared serial buffer well after the VM has finished booting. Identifying it without knowing what it is in advance is harder than going for the master key, though.
1: Unusual but not rare. gdb is genuinely useful if qemu is playing up and you need to debug it.
2: Extra special bonus: once I have the master key, I can write my own passphrase into the LUKS header to give myself access at any point in the future, just in case you haven’t done anything incriminating enough yet.