
Tuesday, April 16, 2013

Many encrypted volumes, a single passphrase?

Just a few days ago I finally got a new server to replace a good old friend of mine which had been keeping my data safe since 2005. I was dying to get it up and running and move my data over when I realized it had been 8 years since I last set up dmcrypt on a server I only had ssh access to, and I had no idea what current best practices were.

So, let me start by describing the environment. Like my previous server, this new machine sits in a datacenter somewhere in Europe. I don't have any physical access to it: I can only ssh into it. I don't have a serial port I can connect to over the network, I don't have IPMI, nor anything like Intel's KVM, but I really want to keep my data encrypted.

Having a laptop or desktop with your whole disk encrypted is pretty straightforward with modern Linux systems. Your distro will boot up, the kernel will start, the scripts in the initrd will detect the encrypted partition, stop the boot process, ask you for a passphrase, decrypt your disk, and happily continue with the boot process.

But when the passphrase is asked for, the network is not up yet and there is no ssh access. Either you sit in front of the monitor and type your passphrase, or there is really not much you can do from a few thousand miles away.

To have encrypted partitions you can manage remotely, you pretty much need:
  1. A "minimal" linux system to boot. Minimal enough that you can get your network up and running, and some protocol so you can connect and type your passphrase. I'll get back to this in a few paragraphs.
  2. Some tool or script to mount your encrypted file systems and continue the boot process once you connect and enter your password.
Sounds easy, doesn't it? I spent some time looking around to see if I could find some pre-baked solution, like a simple package to install that would tweak my initrd and add ssh and the needed scripts, or some suggestion on how to do it in a smart way. In the end, I baked my own solution, exactly like 8 years ago.

So, here it is...

A minimal system to boot on...

Creating an initrd or a tiny partition to do the initial boot did not seem very attractive. For one, I do not want to keep the whole root encrypted: root contains only tools and scripts downloaded from the Debian repositories, the configs really contain no sensitive data, and the kind of logs I care about do not end up in /var/log. Second, my experience with initrds is that they change quite a bit over time: you need to generate a new initrd for every new kernel and set of drivers, which is tricky by itself; the tools have changed significantly over time; you need to compute (and install) the dependencies for any tool you need from the minimal root; and you have to hook your stuff well enough into the generator that it keeps working over time.

Creating a minimal root outside of the initrd did not seem very attractive either: I did not fancy having two root partitions to keep up to date in terms of kernel, grub, updates, and so on. And again, I did not need to encrypt root.

The solution I used is pretty simple: keep root and boot in clear, and boot from there as normal. From rc*.d, disable all the services that require the encrypted data (like mysql, apache, or my repositories), remove the encrypted partitions from fstab and crypttab (or mark them noauto in both) so the boot process does not stop to ask for a passphrase, and make sure ssh is up and running at that point.
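On a Debian-style system, those tweaks look roughly like this; the service and volume names are just examples, adjust to your setup:

```shell
# Keep ssh in the normal boot, but disable everything that
# needs the encrypted data (example services).
update-rc.d mysql disable
update-rc.d apache2 disable

# In /etc/crypttab, mark the encrypted volumes noauto so the boot
# process does not stop to ask for a passphrase, e.g.:
#   cleartext-mysql /dev/system/encrypted-mysql none luks,noauto

# Do the same in /etc/fstab, e.g.:
#   /dev/mapper/cleartext-mysql /opt/mysql ext4 noauto,defaults 0 2

# And make sure ssh starts at boot.
update-rc.d ssh defaults
```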

Once the system boots...

Once the system boots, I have a script I can run manually that, in order:
  • decrypts all the partitions (...)
  • checks that the file systems are sane (remember the fsck run at boot?)
  • mounts them in the right location
  • starts all the other services that depend on that data

Encrypted partitions...

Let's start from the encrypted partitions. I've been using LVM and dmcrypt pretty much since they existed. I don't have hybrid systems, always linux only, and I like LVM much more than managing partition tables manually.

One common solution for multiple encrypted volumes is to create an encrypted volume with LVM, decrypt it, and then use it as a physical volume for another volume group. You end up, for example, with a system volume group (like vg0), system/encrypted as a logical volume, and, rather than a simple file system in there, another volume group with multiple encrypted sub-volumes. I am not quite fond of this solution: it makes it hard to borrow space from encrypted space for clear text space and vice versa, and generally makes things more confusing.

What I tend to do instead is have a single volume group containing both encrypted and clear text logical volumes. This, however, means that each logical volume has to be decrypted independently, and most of the tools will ask you for a passphrase for each volume, which is annoying. Some wikis suggest keeping key files on disk, which is roughly what I do: I create an encrypted logical volume, keys, protected by a strong passphrase, that contains truly random keys, and that I only mount while decrypting the other volumes, unmounting it immediately afterwards.
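A sketch of how such a random key file can be created and tied to a volume with standard cryptsetup commands; the device and path names here are made up for illustration:

```shell
# Mount the (already decrypted) keys volume.
mount /dev/mapper/cleartext-keys /mnt/keys

# Generate 512 bytes of random key material for the mysql volume.
dd if=/dev/urandom of=/mnt/keys/mysql.key bs=512 count=1
chmod 400 /mnt/keys/mysql.key

# Add the key to the LUKS header of the volume (you will be
# prompted once for an existing passphrase).
cryptsetup luksAddKey /dev/system/encrypted-mysql /mnt/keys/mysql.key

# From now on the volume can be opened non-interactively:
cryptsetup luksOpen --key-file /mnt/keys/mysql.key \
    /dev/system/encrypted-mysql cleartext-mysql

# And the keys volume goes away immediately afterwards.
umount /mnt/keys
```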

Using this mechanism, the "decrypt all the partitions" step I described above becomes:
  • ask for passphrase
  • decrypt keys volume
  • mount it
  • for each encrypted volume
    • load key file in keys partition
    • decrypt the volume
  • umount the keys volume
  • ... continue with checking the filesystems yadda yadda ...
As an additional requirement, I wanted those steps to be idempotent: if a partition is already mounted, it should be skipped; if I run the script multiple times, it should just complete the work that wasn't done before.
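A minimal sketch of those steps in shell, with the idempotency checks; the device names, mount points, and volume list are illustrative, the real logic lives in ac-dmcrypt-manage:

```shell
#!/bin/sh
# Sketch: decrypt and mount all volumes, skipping work already done.
set -e

KEYS_DEV=/dev/system/encrypted-keys
KEYS_MNT=/mnt/keys

# Open and mount the keys volume only if it is not open yet.
[ -e /dev/mapper/cleartext-keys ] || cryptsetup luksOpen "$KEYS_DEV" cleartext-keys
mountpoint -q "$KEYS_MNT" || mount /dev/mapper/cleartext-keys "$KEYS_MNT"

for vol in mysql media vms; do
  # Skip volumes that are already decrypted.
  if [ ! -e "/dev/mapper/cleartext-$vol" ]; then
    cryptsetup luksOpen --key-file "$KEYS_MNT/$vol.key" \
        "/dev/system/encrypted-$vol" "cleartext-$vol"
  fi
  # Check the file system, then mount it if not mounted yet.
  fsck -p "/dev/mapper/cleartext-$vol"
  mountpoint -q "/opt/$vol" || mount "/dev/mapper/cleartext-$vol" "/opt/$vol"
done

# Keys are no longer needed once everything is mounted.
umount "$KEYS_MNT"
```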

My solution...

Back in 2005, together with a few friends I was sharing the server with, I wrote a small script to maintain those volumes and implement the steps above. The script is now checked in on github; you can find it here: https://github.com/ccontavalli/sys-scripts.

Setup

To get it running, you first need to install the tools and create the volume where the keys will be stored, with something like:

# Install sys-scripts.
mkdir -p /opt/{scripts,conf}
git clone https://github.com/ccontavalli/sys-scripts.git /opt/scripts

# Install the tools that are needed.
apt-get install cryptsetup lvm2

# Create a volume "encrypted-keys" in the volume group "system";
# replace "system" with the name of your volume group (often "vg0").
lvcreate -L 20M -n encrypted-keys system

# Encrypt the partition and open it.
cryptsetup luksFormat /dev/system/encrypted-keys \
    --cipher=aes-cbc-essiv:sha256 --key-size=256 --verify-passphrase
cryptsetup luksOpen /dev/system/encrypted-keys cleartext-keys

# Put a file system on that partition.
mkfs.ext4 /dev/mapper/cleartext-keys

Note that if your volume group is not called "system" but "vg0", you will need to edit /opt/scripts/ac-dmcrypt-manage and change cfg_key_volume to look like:
cfg_key_volume=${cfg_key_volume-vg0/encrypted-keys}
or remember to always call ac-dmcrypt-manage with the volume passed, like:
cfg_key_volume=vg0/encrypted-keys ac-dmcrypt-manage ...

Creating volumes

Now you are ready to create volumes. All you have to do is something like:
/opt/scripts/ac-dmcrypt-manage create-volume system \
    mysql 20G /opt/mysql ext4
for example, and follow the prompts. You can create as many volumes as you like.
If you want to try mounting that volume, you can then run:
/opt/scripts/ac-dmcrypt-manage start
If you want to inspect the generated keys:
/opt/scripts/ac-dmcrypt-manage mount-keys
Just remember to unmount them after a while, by using umount-keys.

You can also change the mount options of your partition by editing /opt/conf/ac-fstab, which has been generated automatically by create-volume.


Managing the boot process

Let's say now you want to mark mysql as a service that cannot be started until the encrypted partitions are mounted. What you have to do is:
/opt/scripts/ac-system-boot add mysql
The script will remove mysql from the normal boot sequence, by running something like update-rc.d mysql disable.

When the system reboots

All you have to do is ssh on the system, and then run:
/opt/scripts/ac-system-boot start

Conclusions...

This set of scripts has served me well for several years, and I will probably stick to them until I find a better mechanism for this kind of setup. Systems like ecryptfs or encfs look like viable alternatives for home directories or private data of individual users. But from what I have read so far, dmcrypt still looks like the best option to keep system partitions encrypted on a server.

Before using ac-system-boot, we tried using runlevels. Isn't this what they were meant for? The idea was to have a minimal network runlevel, and another runlevel with the system daemons to boot once the partitions were available. But between the various alternatives to SysV init that popped up in the last few years, the attention paid to speeding up the boot process, and various distribution scripts fiddling with rc*.d or assuming one setup or another, this did not work well.

Do you have better proposals? Alternatives? Let me know.

Saturday, March 30, 2013

How much of a file system monger are you?

Have you ever been lost in conversations or threads about one or the other file system? Which one is faster? Which one is slower? Is that feature stable? Which file system should you use for this or that workload?

I was recently surprised to see ext4 as the default file system on a new Linux installation. Yes, I know, ext4 has been around for a good while, and it does offer some pretty nifty features. But when it comes to my personal laptop and my data, well, I must confess switching to something newer always sends shivers down my spine.

Better performance? Are you sure it's really that important? I'm lucky enough that most of my coding & browsing fits in RAM. And if I have to recompile the kernel, I can wait that extra minute. Is the additional slowness actually impacting your user experience? And your productivity?
Larger files? I never had to store anything that ext2 could not support. Even with a 4GB file limit, I've only rarely had problems (no, I don't use FAT32, but back when dmcrypt/ecryptfs/encfs and friends did not exist, I used the good old CFS for years, which turned out to have a 2GB file size limit).
Less fragmentation? More contiguous blocks? C'mon, how often have you had to worry about the fragmentation of your ext2 file system on your laptop?

What I generally worry about is the safety of my data. I want to be freaking sure that if I lose power, forget my laptop in suspend mode, or my horrible wireless driver causes a kernel panic, I don't lose any data. I don't want some freaking bug in the file system to cause any data loss or inconsistency. And of course, I want a good toolset to recover data in case the worst happens (fsck, debug.*fs, recovery tools, ...).

So, what do I do? I stick to older file systems for longer. At least, for as long as the old system is well maintained, and the new system doesn't have something I really really want (like a journal, when they started popping up).

What else? Well, talking about ext.* file system and my setup...
  1. I use data=journal in fstab whenever possible. For the root partition, be careful: you need to either add the option "rootflags=data=journal" to your grub / lilo configuration, or use something like tune2fs -o journal_data /dev/your/root/device, so that the file system is mounted with data journaling enabled from the start. If you are curious, the problem stems from the fact that you can't change the journaling mode on a file system that is already mounted, and the boot process on some distros will fail if you don't follow those steps.
  2. I make sure barriers are enabled. Most modern disks cache your data in internal memory to be faster. If you lose power, journal or not, that cached data will be lost and you risk corruption.
    With barrier=1 in fstab you ensure that at least the journal entries are properly written to disk. This again can slow you down, but makes corruption significantly less likely.
  3. I keep the file systems I don't need to write to mounted read only, in the hope that if things go wrong, I will at least be able to boot and reduce the surface of damage.
  4. I apply other tunings to reduce battery use.
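The root data=journal setup from step 1 can be sketched either way; the device name and the use of update-grub are assumptions for a Debian-style system:

```shell
# Option A: store data=journal as a default mount option in the
# superblock, so root is journaled from the very first mount.
tune2fs -o journal_data /dev/mapper/root

# Option B: pass it on the kernel command line instead, e.g. in
# /etc/default/grub:
#   GRUB_CMDLINE_LINUX="rootflags=data=journal"
# then regenerate the grub configuration:
update-grub
```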
So, here's what my fstab looks like:

/dev/sda1        /boot   ext2 nodiratime,noatime,ro,nodev,nosuid,noexec,sync 0       2
/dev/mapper/root  /          ext3 nodiratime,noatime,data=journal,errors=remount-ro 0 1
/dev/mapper/opt   /opt       ext3 nodiratime,noatime,data=journal,errors=remount-ro 0 1
/dev/mapper/media /opt/media ext3 nodiratime,noatime,data=journal,errors=remount-ro 0 2 
/dev/mapper/vms   /opt/vms   ext3 nodiratime,noatime,data=journal,errors=remount-ro 0 2 

Note that I use LVM underneath. It was hard for me to get started, as I feared an extra layer of indirection and possible caching would complicate things :), but the encryption and snapshotting features sold me, and I've been happily using it for years.

I use a similar setup on a raspberry pi in a small appliance that I cannot properly shut down, and I have been plugging and unplugging it directly without headaches for a while (luckily? maybe).

So, what's next? I'm looking forward to a logged file system, or a stable file system that properly supports snapshots: something like NILFS, or maybe btrfs. In the past, I had a script taking snapshots of my LVM partitions periodically and at boot, so if I screwed up an update or accidentally removed a file I could easily go back in time.

I gave up on that as LVM snapshots turned out to be fairly buggy from kernel to kernel, not well supported by distributions (I had at least one instance of initrd scripts getting confused by the presence of a persistent snapshot and refusing to boot), and they often led to more headaches than advantages, at least for personal use. I will probably give them a shot again in the near future :), but for now, I'm happy as it is.
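For reference, the kind of periodic snapshot that script took can be expressed with plain LVM commands; the volume names and sizes are examples:

```shell
# Take a 2G copy-on-write snapshot of the root logical volume.
lvcreate --snapshot --size 2G --name root-snap /dev/system/root

# Roll back to the snapshot after a bad update (the merge takes
# effect at the next activation of the origin volume):
lvconvert --merge /dev/system/root-snap

# Or simply drop the snapshot once you are confident things work:
lvremove /dev/system/root-snap
```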