Friday, June 28, 2013

An unwilling dive in xfce4 internals

I've always liked text consoles more than graphical ones. This at least until some time in 2005, when I realized I was spending a large chunk of my time in front of a browser, and elinks, lynx, links and friends did not seem that attractive anymore.

Nonetheless, I've kept things simple: at first I started X manually, with startx, on an as-needed basis. I used ion (yes! ion) for a while, until it stopped working during some upgrade. Then I decided it was time to boot into a graphical interface, and started using slim. Despite some quirks, I've been happy since.

In terms of window managers, I really don't like personalizing or tweaking my graphical environment. I see it as a simple tool that should be zero overhead, require no maintenance, and not get in the way of what I want to do with a computer. I don't want to learn which buttons to click on, how to do transparency, which icons mean what, or where the settings I am looking for were moved to in the latest version.

So I started using xfce. Not because it was a particularly well informed choice: it just worked out of the box with a reasonably minimal interface, and was fast to load. And if all you need is a browser open in one pane and gnome-terminal in the other, it's a really good choice.

Today, however, it broke :( for the first time since I installed it and through years of unattended upgrades, and I'm forced to write this post from a tiny window in the top left corner of my monitor.
And to fix it, I had to find out much more than I ever wanted to know about window managers and xfce.

So, here are the symptoms: I rebooted the laptop after a long, long time (usually I just put it to sleep), got to the login prompt with slim, entered my username and password, and... sadness. I get a tiny terminal in the top left corner, nothing else, and my mouse cursor looks like that X I hadn't seen in a long time, on a black and white background. No sign of the usual desktop, tray or windows, and I can't even resize the terminal.

I really didn't want to deal with this, and had things I wanted to do on my laptop. So, what to do? For a while, I just used this tiny xterm. But this got boring pretty quickly: if I opened the browser, I had to close it to go back to the terminal. No alt+tab, no panes, couldn't move or resize windows. I finally decided it was time to fix the problem.

Here are the things I did and unwillingly learned in the process, just in case you end up in a similarly tragic situation. Note that each debugging round is about 30 minutes long: the time it takes me to get home or to work on public transport, minus some time to read emails or interact with the people around me.

First round...

I don't believe in reboots as the one-size-fits-all solution, but my first hope was that this was some sort of transient failure. Maybe something just went wrong during startup, so I tried the following things:

  • /etc/init.d/slim restart - to restart the login manager and start xfce4 again. It did not help: no useful message on the console, no error whatsoever in the logs. Clean as a bottle of grappa in the winter.
  • /etc/init.d/slim stop, followed by startxfce4 - same problem as above, but at least it ruled out a problem in slim, which is a good start.

From the tiny terminal, I then started xfce4-session, which is supposedly the component that starts up xfce4. Unfortunately, it just bailed out with an error:

"xfce4-session: Another session manager is already running".

Which at least told me xfce4-session had been started already. I could confirm that with "ps -C xfce4-session", but "ps faux" showed only my terminal as a child of xfce4-session, while from my past memories I believe I would normally see many more xfce components.

So... why did xfce4-session not start anything else? I started poking around in log files, with no luck. Nothing logged at all. strace -fp `pidof xfce4-session` also showed it was just sitting there, waiting for some syscall to complete.

I started with the idea that one of the components was not started properly, so I began manually fiddling with the various pieces of xfce4.

Running xfwm4 manually gave me the naked window manager. At least I could now resize and move windows, victory! Still no start menu, still not a single panel.

xfce4-panel gave me, well, the panels, and the "start buttons" at the bottom of the screen.

At this point, my graphical interface was in good enough shape to be usable again. Good, I could stop worrying about it, and do some real work :).

Second round...

Second day, second round. Let's assume that xfce4-session is not starting everything it should. How is it configured? According to the man page, besides a few caches, it reads its configuration through xfconf:

  xfce4-session reads its configuration from Xfconf. xfce4-session stores its session data into $XDG_CACHE_HOME/sessions/.

The man page also refers to a "sessions" subdirectory of $XDG_CACHE_HOME, by default ~/.cache/, and a set of subdirectories in $XDG_CONFIG_HOME, by default ~/.config/.
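Both variables follow the usual XDG fallback rules, so a quick check from the shell (plain parameter expansion, nothing xfce specific) shows where xfce4 will actually look:

$ echo "${XDG_CACHE_HOME:-$HOME/.cache}" "${XDG_CONFIG_HOME:-$HOME/.config}"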

Let's start poking at xfconf-query. A simple:

$ xfconf-query
Channels:
  thunar-volman
  xfce4-mixer
  keyboards
  xfce4-desktop
  xfwm4
  xfce4-power-manager
  xfce4-settings-manager
  xfce4-panel
  xsettings
  xfce4-keyboard-shortcuts
  thunar
  pointers
  xfce4-session
returns a list of channels, and after reading xfconf-query --help, I tried:

$ xfconf-query -R -c xfce4-session -l
/general/FailsafeSessionName
/general/SaveOnExit
/general/SessionName
/sessions/Failsafe/Client0_Command
/sessions/Failsafe/Client0_PerScreen
[...]
which gave me the list of settings for xfce4-session. To fetch a variable, I can do:

$ xfconf-query -c xfce4-session -p /sessions/Failsafe/Count
5
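As an aside, the same tool can also change a value with -s; for example, to flip the SaveOnExit boolean listed above (purely as an illustration, I left mine alone):

$ xfconf-query -c xfce4-session -p /general/SaveOnExit -s false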

But nothing particularly interesting turned up here. So let's poke at $XDG_CACHE_HOME and $XDG_CONFIG_HOME.

$XDG_CACHE_HOME seems to be just a collection of cached files, ranging from chrome to duplicity, and, well, xfce.

In ~/.cache/sessions, aka $XDG_CACHE_HOME/sessions, referenced earlier, I see a list of files:
Thunar-xxxx-33af-41b8-80c9-xxxx
xfce4-session-joshua:0
xfce4-session-joshua:0.bak
xfwm4-xxxx-1ecf-41cf-8d02-yyyyy
xfwm4-xxxx-1ecf-41cf-8d02-xxxxx.state
Let's start with xfce4-session-joshua:0, with cat xfce4-session-joshua:0. It turns out to be a simple text file, apparently providing settings for each program I had started during my last xfce4 session. Seems plausible (some values replaced by xxx and yyy):
[Session: Default]
Client0_ClientId=xxxx
Client0_Hostname=local/joshua
Client0_CloneCommand=xfwm4,--display,:0.0
Client0_DiscardCommand=rm,-rf,/home/yyy/.cache/sessions/xfwm4-xxx.state
Client0_RestartCommand=xfwm4,--display,:0.0,--sm-client-id,xxxx
Client0_CurrentDirectory=/home/yyy
Client0_Program=xfwm4
Client0_UserId=yyy
Client0_Priority=15
Client0_RestartStyleHint=2
Client1_ClientId=zzzz
Client1_Hostname=local/joshua
Client1_CloneCommand=Thunar
[...]
Note the "CloneCommand" line above shows a command line to run. Let's look at it in more details:

$ grep CloneCommand ./xfce4-session-joshua\:0
Client0_CloneCommand=xfwm4,--display,:0.0
Client1_CloneCommand=Thunar
Client2_CloneCommand=xfce4-panel
Client3_CloneCommand=xfdesktop,--display,:0.0
Client4_CloneCommand=xfce4-settings-helper,--display,:0.0
Client5_CloneCommand=gnome-terminal
Note that 2 out of 6 commands (xfwm4, xfce4-panel) are the ones I had to run manually to get back some of the normal features of a desktop environment. Let's try to run some of the others:
  • Thunar - a file manager kind of window appears. Given that I had never seen it before, I just closed it. Useless: you can do the same with a shell.
  • xfdesktop - yay! my background (a solid reddish thing) appears. Together with 3 icons. Overall, I can do without, but it's nice to have a familiar and uniform color as a background.
  • xfce4-settings-helper - doesn't seem to be installed on my system. Weird.
So... what is xfce4-settings-helper? And what happened to it?
$ apt-cache search xfce4-settings-helper
comes up empty. Let's go with apt-file:
$ apt-file search xfce4-settings-helper
xfce4-settings: /usr/bin/xfce4-settings-helper
So it should be part of xfce4-settings. Let's look at it:
$ dpkg -L xfce4-settings |grep bin
/usr/bin/xfsettingsd
/usr/bin/xfce4-settings-manager
/usr/bin/xfce4-display-settings
/usr/bin/xfce4-mime-settings
/usr/bin/xfce4-mouse-settings
/usr/bin/xfce4-settings-editor
/usr/bin/xfce4-accessibility-settings
/usr/bin/xfce4-keyboard-settings
/usr/bin/xfce4-appearance-settings
Looks like xfce4-settings-helper has been replaced by something else recently? My apt-file index is probably a few months old at this point. xfsettingsd sounds useful, judging by the name. But it turns out it's already running:
$ ps -C xfsettingsd
  PID TTY          TIME CMD
 4990 ?        00:00:03 xfsettingsd
and it's been running since my first attempt at fixing the system. If I manually run the various xfce4-*-settings commands shipped with xfce4-settings, I see some sort of control panel to change the settings of keyboard, mouse, ... Not surprising :).

So, what about the other files in ~/.cache/sessions? I am not interested in Thunar.*, so let's look at the xfwm4 files:
cat xfwm4-2d1adf3c0-1ecf-41cf-8d02-0f70f2f2f5eb
[CLIENT] 0x2400004
  [CLIENT_ID] 2c734204f-71b3-4e34-916d-a3367d9c329f
  [CLIENT_LEADER] 0x2400001
  [WINDOW_ROLE] gnome-terminal-window-3521-307132299-1301003369
  [RES_NAME] gnome-terminal
  [RES_CLASS] Gnome-terminal
  [WM_NAME] ccontavalli@joshua: /var/log
  [WM_COMMAND] (1) "gnome-terminal"
  [GEOMETRY] (0,15,1280,785)
  [GEOMETRY-MAXIMIZED] (3,15,577,335)
  [SCREEN] 0
  [DESK] 1
  [FLAGS] 0x10300
[CLIENT] 0x1a0006a
  [CLIENT_ID] 2cde246f4-a13f-4060-bc80-638880912489
  [CLIENT_LEADER] 0x1a00001
  [WINDOW_ROLE] browser
  [RES_NAME] Navigator
  [RES_CLASS] Iceweasel
  [WM_NAME] slim xfce4 consolekit debian - Google Search - Iceweasel
  [WM_COMMAND] (1) "firefox-bin"
  [GEOMETRY] (0,15,1280,785)
  [GEOMETRY-MAXIMIZED] (0,15,1280,785)
  [SCREEN] 0
  [DESK] 0
  [FLAGS] 0x10300
This looks an awful lot like the screen I had when I last used xfce4; this is probably where xfwm4 stores my last session. Overall, not very interesting.

Let's move back to exploring $XDG_CONFIG_HOME, ~/.config/. Here there seems to be a directory for each program I've used in X in the last few months. Not surprisingly, there are xfce4 and xfce4-session subdirectories. Let's explore them.

The main directories I recognize are:
[...]
./xfce4/xfconf/xfce-perchannel-xml/xfce4-session.xml
./xfce4/xfconf/xfce-perchannel-xml/pointers.xml
./xfce4/xfconf/xfce-perchannel-xml/thunar.xml
./xfce4/xfconf/xfce-perchannel-xml/xfce4-keyboard-shortcuts.xml
./xfce4/xfconf/xfce-perchannel-xml/displays.xml
./xfce4/xfconf/xfce-perchannel-xml/xsettings.xml
[...]

Those seem to be the settings shown by xfconf earlier. Opening those files confirms that they are likely the same settings, stored as .xml.
[...]
./xfce4/panel/launcher-12533521212.rc
./xfce4/panel/systray-4.rc
./xfce4/panel/tasklist-12533520341.rc
./xfce4/panel/launcher-9
./xfce4/panel/launcher-9/13094492190.desktop
[...]

This looks an awful lot like what I have in my "menu bar" at the bottom of the screen. This is probably where xfce4 stores the buttons I have configured.
./xfce4-session
turns out to be empty :(. Nothing here again. It's probably time to get to the next level: let's look at the xfce4-session source code.

The first things I notice in main are:
  /* check that no other session manager is running */
  sm = g_getenv ("SESSION_MANAGER");
  if (sm != NULL && strlen (sm) > 0)
    {
      g_printerr ("%s: Another session manager is already running\n", PACKAGE_NAME);
      exit (EXIT_FAILURE);
    }

  /* check if running in verbose mode */
  if (g_getenv ("XFSM_VERBOSE") != NULL)
    xfsm_enable_verbose ();
So by removing the "SESSION_MANAGER" environment variable and setting XFSM_VERBOSE, I can hopefully run xfce4-session manually and see what's happening.

Let's try:
$ unset SESSION_MANAGER
$ export XFSM_VERBOSE=foo
$ xfce4-session
xfce4-session: Another session manager is already running
Argh, there is another check further down in the code:
  if (DBUS_REQUEST_NAME_REPLY_PRIMARY_OWNER != ret)
    {
      g_printerr ("%s: Another session manager is already running\n",
                  PACKAGE_NAME);
      exit (EXIT_FAILURE);
    }
so, no luck. And well, I am out of time for today.

Last and final round...

At this point I'd really like to run xfce4-session under strace, to see what's happening under the hood. Given that I can't easily run xfce4-session from my shell, let's try to exit the graphical interface, and run "strace -f startxfce4 2>/tmp/log". If I am lucky, I will see something failing after the "exec(... xfce4-session ...)" somewhere in the middle of the trace. If not, it will be too noisy, and I will need to find a better way to trace the problem.

As soon as I exit the graphical interface, I notice on my console some messages that look like:

(xfce4-session:16876): xfce4-session-WARNING **: Unable to launch "xfwm4": Failed to change to directory '/home/xxx' (No such file or directory)

(xfce4-session:16876): xfce4-session-WARNING **: Unable to launch "xfwm4": Failed to change to directory '/home/xxx' (No such file or directory)

(xfce4-session:16876): xfce4-session-WARNING **: Unable to launch "xfce4-panel": Failed to change to directory '/home/xxx' (No such file or directory)

(xfce4-session:16876): xfce4-session-WARNING **: Unable to launch "xfce4-panel": Failed to change to directory '/home/xxx' (No such file or directory)

(xfce4-session:16876): xfce4-session-WARNING **: Unable to launch "xfdesktop": Failed to change to directory '/home/xxx' (No such file or directory)

(xfce4-session:16876): xfce4-session-WARNING **: Unable to launch "xfdesktop": Failed to change to directory '/home/xxx' (No such file or directory)
YAY! This is probably the culprit: a few months ago I moved my home directory to /opt, as my root partition (where /home used to live) was full. I changed the record in passwd, and naively assumed that programs would still find their data in the new location, by using wordexp() for tilde expansion, environment variables, or getpwnam().
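For reference, the move itself had amounted to something like this (a sketch, from a root shell, with the user logged out; usermod -d is what rewrites the passwd record):

# mkdir -p /opt/home
# mv /home/xxx /opt/home/xxx
# usermod -d /opt/home/xxx xxx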

I bet nobody ever changes his home directory, or has different NFS mount points on different machines (right, who would use /home/u/user, for example, on a crowded server, and /home/user on his own desktop? Surely a home directory is visible from the same path on every computer where one might access it).

Conclusion

At this point, I first tried to create a symlink from the old home location to the new one, and everything appeared to work:
ln -s /opt/home/xxx /home/xxx
To fix the problem forever, I used something like:
find ~/.config ~/.cache/sessions -type f -print0 | xargs -0 sed -i -e "s@/home/xxx@/opt/home/xxx@"
which updated the paths in every config and session file (-type f keeps sed away from the directories themselves).

Update: as pointed out on the Debian bug I filed, I should have started by looking at ~/.xsession-errors. That would have made things easier :)

Tuesday, May 14, 2013

How to get started with libvirt on Debian


If you like hacking and have a few machines you use for development, chances are you know what I am about to talk about here. You start from a new idea, install a few tools, peek at some existing source code, try to compile it, get something running... and eventually move on to the next project.

At least until your laptop becomes a giant meatball of services running for who knows what reason, you can't remember which machine you were actually using for that test, and half-assed scripts you have no memory of keep creeping up in your PATH.

My first approach at finding a solution was based on chroots. The idea was simple: only develop on my laptop, but create a self-contained environment for each project in which to install all the needed dependencies and tools, and in which to run all my crazy experiments. Chroots were the holy grail of the time, and during those years I became good friends with rsync, debootstrap, mount --rbind and sometimes even pivot_root.

This worked well for a while. Until, well, I ran into the limitations of chroots: they can't really simulate networking, can't run different kernels (or OSes), and don't help much if you need to work on something boot related or that has to do with userspace and kernel interactions.

Guess what was my second approach? I started using Virtual Machines.

At first it was only one: a good old image created from scratch that I would run with qemu and a tap device. A few tens of lines of shell script to bring it up as needed, and I was back in business with my hacking.

Fast forward a few years: I have > 10 different VMs on my laptop, that shell script has grown to almost 1k lines of unmaintainable entanglement of relatively simple commands and images to run, and I am afraid to even think of what to use for my next project. My own spaghetti VMs.

A few weekends ago I finally built up the courage to fix this, and well, discovered how easy it is to manage VMs with libvirt. So, here's what I learned...

Setup

You start by installing the needed tools. On a Debian system:
$ sudo -s
# apt-get install libvirt-bin virtinst

This should get a "libvirtd" binary running on your machine:
$ ps u -C libvirtd
USER    PID %CPU %MEM    VSZ  RSS TTY STAT START TIME COMMAND
root  11950  0.0  0.1 111928 7544 ?   Sl   Apr19 1:29 /usr/sbin/libvirtd -d

The role of libvirtd is quite important: it takes care of managing the VMs running on your host. It is the daemon that starts them up, stops them and prepares the environment that they need. You control libvirtd by using virsh from the shell, or virt-manager to have a graphical interface. I am generally not fond of graphical interfaces, so I will talk about virsh for the rest of the post.

First few steps with libvirt

Before anything else, you should know that libvirt and virsh not only allow you to manage VMs running on your own system, but can also control VMs running on remote systems or on a cluster of physical machines. Every time you use virsh, you need to specify some sort of URI to tell libvirt which set of virtual machines you want to control.

For example, let's say you want to control a XEN virtual machine running on a remote server called "myserver.com". When using virsh, you can refer to that VM by providing a URI like "xen+ssh://root@myserver.com/", indicating that you want to use ssh to connect as root to the server myserver.com, and control the xen virtual machines running there.

With QEMU (and KVM), which is what I use, there are two URIs you need to be aware of:
  • qemu://xxxx/system, to indicate all the system VMs running on server xxxx. 
  • qemu://xxxx/session, to indicate all the VMs belonging to the user that is running the virsh command.
That's right: each user can have his own set of VMs and networks, and, if allowed to do so, can control the set of system, global VMs. Session VMs run as the user that started them, while system VMs generally run as an unprivileged, dedicated user, libvirt-qemu on a Debian system.

If you omit xxxx, with URIs like qemu:///system or qemu:///session, you are referring to the system and session VMs running on the machine you are running the command on: localhost.

Note that if you use virsh as root, and do not specify which sets of VMs you want to control, it will default to controlling the system VMs, the global ones. If you run virsh as a different user instead, it will default to controlling the session VMs, the ones that only belong to you.

This is a common mistake and a good source of confusion when you get started; keep in mind that it is a good idea to explicitly specify which set of VMs you want to work on with the -c option.
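For example, to list all the VMs (running or not) in each of the two sets on the local machine:

$ virsh -c qemu:///system list --all
$ virsh -c qemu:///session list --all

The two will generally show completely different lists.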

Managing system VMs

On a Debian machine, for a user to be allowed to manage system VMs, he needs to be able to send commands to libvirtd. By default, libvirtd listens on a unix domain socket in /var/run/libvirt, and for a user to be able to write to that socket he needs to belong to the libvirt group.

If you edit /etc/libvirt/libvirtd.conf, you can configure libvirtd to wait for commands using a variety of different mechanisms, including for example SSL encrypted TCP sockets.

Given that I only wanted to manage local system virtual machines, I just added my user, rabexc, to the libvirt group, so I didn't have to be root to manage these machines:
usermod -a -G libvirt rabexc
# alternatively, use vigr and vigr -s

Defining a network

Each VM you define will likely need some sort of network connectivity, and some sort of storage to use.
Each object in libvirt, be it a network, a pool of disks, or a VM, is defined by an xml file.

Let's start by looking at the default network configuration. Run:
$ virsh -c qemu:///system net-list
Name                 State      Autostart
-----------------------------------------
This means that there are no active virtual networks. Try one more time adding --all:
$ virsh -c qemu:///system net-list --all
Name                 State      Autostart
-----------------------------------------
default              inactive   no
and notice the default network.
If you want to inspect or change the configuration of the network, you can use either net-dumpxml or net-edit, like:
$ virsh -c qemu:///system net-dumpxml default
<network>
  <name>default</name>
  <uuid>ee49713c-d1c8-e08b-b007-6401efd145fe</uuid>
  <forward mode="nat"/>
  <bridge name="virbr0" stp="on" delay="0"/>
  <ip address="192.168.122.1" netmask="255.255.255.0">
    <dhcp>
      <range start="192.168.122.2" end="192.168.122.254"/>
    </dhcp>
  </ip>
</network>

The output is pretty much self-explanatory: 192.168.122.1 will be assigned to the virbr0 interface as the address of the gateway, virtual machines will be assigned addresses between 192.168.122.2 and 192.168.122.254 using dhcp, and traffic from those virtual machines to the outside world will be forwarded using nat, that is, by hiding their IP addresses behind the address of your host.

A bridge device (virbr0) allows virtual machines to communicate with each other, as if they were connected to their own dedicated network. You can configure networking in many different ways: with nat, with bridging, with simple gateway forwarding, ... You can find the full documentation on the parameters here: http://libvirt.org/formatnetwork.html, and change the definition by using net-edit. Other handy commands:
  • "net-undefine default", for example, to forever eliminate the default network.
  • "net-define file.xml", to define a new network starting from an .xml file. I usually start from the xml of another network, by using "virsh ... net-dumpxml default > file.xml", edit edit edit, and then "virsh ... net-define file.xml".

Starting and stopping networks

Once you have a network defined, you need to start it, or well, tell virsh that you want it started automatically. In our case, the commands would be:
  • "net-start default", to start the default network.
  • "net-destroy default", to stop the default network, with the ability of starting it again in the future.
  • "net-autostart default", to automatically start the default network at boot.
Now... what happens exactly when we start a network? My laptop has quite a few iptables rules and various other random network configurations. So, let's try:

$ virsh -c qemu:///system net-start default
Network default started
And have a look at the system:
$ ps faux
[...]
root   1799 0.0 0.6 109688 6508 ? Sl May01 0:00 /usr/sbin/libvirtd -d
nobody 4246 0.0 0.0   4608  896 ?  S 08:35 0:00 /usr/sbin/dnsmasq --strict-order --bind-interfaces --pid-file=/var/run/libvirt/network/default.pid --conf-file= --except-interface lo --listen-address 192.168.122.1 --dhcp-range 192.168.122.2,192.168.122.254 --dhcp-leasefile=/var/lib/libvirt/dnsmasq/default.leases --dhcp-lease-max=253 --dhcp-no-override
# netstat -nulp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address  Foreign Address PID/Program name
udp        0      0 192.168.122.1:53 0.0.0.0:*     4246/dnsmasq
udp        0      0 0.0.0.0:67     0.0.0.0:*       4246/dnsmasq

# netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address   Foreign Address  State   PID/Program name
tcp        0      0 192.168.122.1:53 0.0.0.0:*      LISTEN  4246/dnsmasq
tcp        0      0 0.0.0.0:22      0.0.0.0:*        LISTEN  2108/sshd
libvirt started dnsmasq, which is a simple dhcp server with the ability to also provide DNS names. Note that the command line parameters seem to match what we had in the default xml file.
$ ip address show
1: lo:  mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 52:54:00:2e:72:8b brd ff:ff:ff:ff:ff:ff
    inet 192.168.100.86/24 brd 192.168.100.255 scope global eth0
    inet6 fe80::5054:ff:fe2e:728b/64 scope link
       valid_lft forever preferred_lft forever
4: virbr0:  mtu 1500 qdisc noqueue state DOWN
    link/ether 8a:3c:6e:11:28:85 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
This shows that a new device, virbr0, has been created, and assigned 192.168.122.1 as an address.
$ sudo iptables -nvL
Chain INPUT (policy ACCEPT 565 packets, 38728 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            udp dpt:53
    0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:53
    0     0 ACCEPT     udp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            udp dpt:67
    0     0 ACCEPT     tcp  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            tcp dpt:67

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  *      virbr0  0.0.0.0/0            192.168.122.0/24     state RELATED,ESTABLISHED
    0     0 ACCEPT     all  --  virbr0 *       192.168.122.0/24     0.0.0.0/0
    0     0 ACCEPT     all  --  virbr0 virbr0  0.0.0.0/0            0.0.0.0/0
    0     0 REJECT     all  --  *      virbr0  0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable
    0     0 REJECT     all  --  virbr0 *       0.0.0.0/0            0.0.0.0/0            reject-with icmp-port-unreachable

Chain OUTPUT (policy ACCEPT 376 packets, 124K bytes)
 pkts bytes target     prot opt in     out     source               destination

$ cat /proc/sys/net/ipv4/ip_forward
1
Firewalling rules have also been installed. In particular, the first 4 rules allow querying dnsmasq from the virtual network. Here they are meaningless: the default policy of each chain is already to accept everything. But had my real iptables rules been loaded, those new rules would have been inserted before my existing ones, allowing that traffic.

Forwarding rules, instead, allow all replies to come back in (packets belonging to RELATED and ESTABLISHED sessions), and allow communications from the virtual network to any other network, as long as the source ip is in 192.168.122.0/24.
Note also that ip forwarding has either been enabled, or was already enabled by default.
$ sudo iptables -t nat -nvL
Chain PREROUTING (policy ACCEPT 1 packets, 32 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain INPUT (policy ACCEPT 1 packets, 32 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain OUTPUT (policy ACCEPT 1 packets, 1500 bytes)
 pkts bytes target     prot opt in     out     source               destination

Chain POSTROUTING (policy ACCEPT 1 packets, 1500 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 MASQUERADE  tcp  --  *      *       192.168.122.0/24    !192.168.122.0/24     masq ports: 1024-65535
    0     0 MASQUERADE  udp  --  *      *       192.168.122.0/24    !192.168.122.0/24     masq ports: 1024-65535
    0     0 MASQUERADE  all  --  *      *       192.168.122.0/24    !192.168.122.0/24
Finally, note that rules to perform NAT have been installed. Those rules are added by scripts when the network is set up. Some documentation is provided here: http://wiki.libvirt.org/page/Networking#Forwarding_Incoming_Connections.

If you want to, you can also add arbitrary rules to filter traffic to virtual machines, and have libvirt install and remove them automatically. As for the network commands, the main commands are: nwfilter-define, nwfilter-undefine, ...-edit, ...-list, ...-dumpxml. You can read more about firewalling on the libvirt site: http://libvirt.org/firewall.html

Managing storage

Now that we have a network running for our VMs, we need to worry about storage. There are many ways to get some disk space, ranging from dedicated partitions or LVM volumes to simple files.

The main idea is to create a "pool" from which you can draw space, and create "volumes". Not very original, is it? On my system, I just dedicated a directory to storing images and "volumes".

You can start with:
$ virsh -c qemu:///system \
    pool-define-as devel \
    dir --target /opt/kvms/pools/devel

This creates a pool called devel in the directory /opt/kvms/pools/devel. I can see this pool with:
$ virsh -c qemu:///system pool-list --all
Name                 State      Autostart 
-----------------------------------------
devel                inactive   no        

Note the --all parameter: without it, you would only see started pools. And as before, you can mark the pool to be automatically started by using:
$ virsh -c qemu:///system pool-autostart devel

and start it with:
$ virsh -c qemu:///system pool-start devel

To create and manage volumes you can use vol-create, vol-delete, vol-resize, ... all the vol-* commands that "virsh help" shows you. Or you can just let virsh manage the volumes for you, as we will see in a second. The one command you will find most useful is vol-list, to get the list of volumes in a pool.

For example:
$ virsh -c qemu:///system vol-list devel
Name                 Path
-----------------------------------------
Shows that there are no volumes. Don't forget that the pool has to be active for most of the vol- commands to work.
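If you do want to create a volume by hand, vol-create-as spares you from writing the xml yourself; for example, to carve a 2G qcow2 volume named scratch.qcow2 (a made up name) out of the devel pool:

$ virsh -c qemu:///system vol-create-as devel scratch.qcow2 2G --format qcow2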

Installing a virtual machine

Now you are finally ready to create a new virtual machine. The main command to use is "virt-install". Let's look at a typical invocation:
virt-install -n debian-testing \
             --ram 2048 --vcpus=2 \
             --cpu=host \
             -c ./netinst/debian-6.0.7-amd64-netinst.iso \
             --os-type=linux --os-variant=debiansqueeze \
             --disk=pool=devel,size=2,format=qcow2 \
             -w network=devel --graphics=vnc

and go over the command line for a minute:

  • -n debian-testing is just a name. I am calling this VM "debian-testing".
  • --ram 2048 --vcpus=2 should also be no surprise: give it 2Gb of RAM, and 2 CPUs.
  • --cpu=host means that I do not want to emulate any specific CPU; the VM should just be provided the same CPU as my physical machine. This is generally fast, but can mean trouble if you want to be able to migrate your VMs to a less capable machine. The fact is, however, that I don't care about migrating my VMs, and prefer them to be fast :).
  • -c ./netinst... means that the VM should be configured to have a "CD-ROM" with the specified .iso file in it. This is just a Debian installation image.
  • --os-type, --os-variant are optional, but in theory allow libvirt to configure the VM with the optimal parameters for your operating system.
The most interesting part to me comes from:
  • --disk=pool=devel,size=2,format=qcow2, which asks libvirt to automatically allocate 2 Gb of space from the devel pool. Do you remember? The pool we defined just a few sections ago. The format parameter indicates how to store this VM's disks. The qcow2 format is probably the most common format for KVM and QEMU, and provides a great deal of flexibility. Look at the man page for more details.
  • -w network=devel means that the VM should be connected to the devel network. Again, the network we created at the start of this article.
  • --graphics=vnc just means that you want to have a vnc window to control the VM.
Of course, you need to get suitable installation media in advance, the file specified with -c ./netinst.... I generally use CD or USB images suitable for a network install, which means a minimal system, with most of it downloaded from the network. virt-install also supports fetching the image to use directly from an http, ftp, or nfs server, in which case you should use the -l option, and read the man page, man virt-install. Don't forget that the image architecture must match the cpu you specify with --cpu.

Converting an existing virtual machine

In my case, I had many existing VMs on my system. I did not want to maintain the same network setup; in fact, the default DHCP and NAT setup with a bridge provided by libvirt was better than what I had before. To import the VMs, I followed a simple procedure:

  1. Copied the image into the directory of the pool: cp my-vm.qcow2 /opt/kvms/pools/devel
  2. Refreshed the pool, just in case: virsh -c qemu:///system pool-refresh devel
  3. Created a new VM based on that image, by using virt-install with the --import option, for example:
    virt-install --connect qemu:///system --ram 1024 -n my-vm --os-type=linux --os-variant=debianwheezy --disk vol=devel/my-vm.qcow2,device=disk,format=qcow2 --vcpus=1 --vnc --import

    Note the devel/my-vm.qcow2 indicating the volume to use, and --import.
Of course, once the import was completed I had to connect to the VM and change the network parameters to use DHCP instead of a static address.

Managing Virtual Machines

You may have noticed that once you run virt-install, your virtual machine is started. The main commands to manage virtual machines are:
  • virt-viewer my-vm - to have the screen of your VM opened up in a vnc client.
  • virsh start my-vm - to start your VM.
  • virsh destroy my-vm - to stop your VM violently. It is generally much better to run "shutdown" from your VM, or better...
  • virsh shutdown my-vm - to send your VM a "shutdown request", like if you had pressed the shutdown button on your server. Note that it is then up to the OS installed and its configuration to decide what to do. Some desktop environments, for example, will pop up a window asking you what you want to do, and not really shutdown the machine.
  • virt-clone --original my-vm --auto-clone - to make an exact copy of your VM.
  • virsh autostart my-vm - to automatically start your vm at boot.
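Putting it together, a typical session with the VM created earlier looks something like:

$ virsh -c qemu:///system start debian-testing
$ virt-viewer -c qemu:///system debian-testing
$ virsh -c qemu:///system shutdown debian-testing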

A few other random notes

VNC console from remote machine with no libvirt tools

I had to connect to the VNC console of my virtual machines from a remote desktop that did not have virt-viewer installed, so I could not use the -c and URI parameters. A simple port forward got me what I wanted:
$ ssh rabexc@server -L 5905:localhost:5900
$ vncviewer :5

This forwards port 5900 on the server, the VNC console of the first VM, to local port 5905, and asks vncviewer to connect to the 5th local VNC display (5900 + 5 = 5905).

virsh snapshots and qcow2

First time I used "virsh snapshot-save my-vm" to take a snapshot of all the volumes used by my VM I could not find where the data was stored. It turns out that qcow2 files have direct support for snapshots, which are saved internally within the same file. To see them, beside the virsh commands, you can use: qemu-img info /opt/kvms/pools/devel/my-vm.qcow2.

Moving qcow2 images around

If you created qcow2 images based on other images by using -o backing_file=... to only record the differences, moving the images around will break them, as the derived image will no longer find its original backing file. A quick fix was to use:

qemu-img rebase -u -b original_backing_file_in_new_path.img \
    derived_image.qcow2

Note that -u, unsafe, is only usable if the only thing that changed between the two images really is the path.

Sending qemu monitor commands directly

Before switching to libvirt I was used to managing kvm / qemu VMs by using the monitor interface. Despite what the documentation claims, it is possible to send commands through this interface directly by using:

$ virsh -c qemu:///system \
    qemu-monitor-command \
    --hmp debian-testing "help"

for example.

Finding the IP address of your VM

When a VM starts with the default network configuration, it will be assigned an IP via DHCP by dnsmasq. This IP can change. For some reason, I was sort of expecting that dnsmasq, which is also capable of behaving as a simple DNS server, would maintain a mapping from VM name to IP, and accept DNS queries to resolve the name of a VM. Turns out this is not the case, unless you explicitly add mappings between names and the MAC addresses of your VMs in the network configuration. Or at least, I could not find a better way to do it.

The only reliable way to find the IP of your VM is to either provide a static mapping, or look in /var/lib/libvirt/dnsmasq/default.leases for the MAC address of your VM, where default is the name of your network.

You can find the MAC address of your VM by looking at its xml definition, with something like:
virsh dumpxml debian-modxslt |grep "mac address"

You can find plenty of shell scripts on google to do this automatically for you.
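Here is a minimal sketch of such a script, assuming the default network and the lease file location mentioned above; the sed expression just extracts the mac address attribute from the xml, and awk looks it up in the leases file, where the MAC is the second field and the IP the third:

#!/bin/sh
# Usage: vm-ip vm-name - print the IP dnsmasq leased to the VM.
vm="$1"
mac=$(virsh -c qemu:///system dumpxml "$vm" \
      | sed -n "s/.*mac address='\([^']*\)'.*/\1/p")
awk -v mac="$mac" 'tolower($2) == tolower(mac) { print $3 }' \
    /var/lib/libvirt/dnsmasq/default.leases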

Conclusions

Switching to libvirt took me only a few hours, and I am no longer afraid of having to deal with multiple VMs on my laptop :). Creating them, cloning temporarily, or removing them has become an extremely simple task.

Tuesday, April 16, 2013

Many encrypted volumes, a single passphrase?

Just a few days ago I finally got a new server to replace a good old friend of mine, which had been keeping my data safe since 2005. I was literally dying to get it up and running and move my data over, when I realized it had been 8 years since I last set up dmcrypt on a server I only had ssh access to, and I had no idea what current best practices were.

So, let me start by describing the environment. Like my previous server, this new machine is in a datacenter somewhere in Europe. I don't have any physical access to it, I can only ssh into it. I don't have a serial port I can connect to over the network, I don't have IPMI, nor something like Intel KVM, but I really want to keep my data encrypted.

Having a laptop or desktop with your whole disk encrypted is pretty straightforward with modern linux systems. Your distro will boot up, the kernel will be started, the scripts in the initrd will detect the encrypted partition, stop the boot process, ask you for a passphrase, decrypt your disk, and happily continue with the boot process.

But when your passphrase is asked for, your network is not up yet, and there is no ssh access. Either you sit in front of the monitor and type your passphrase, or there is really not that much you can do from a few thousand miles away.

To have encrypted partitions you can manage remotely, you pretty much need:
  1. A "minimal" linux system to boot. Minimal enough that you can get your network up and running, and some protocol so you can connect and type your passphrase. I'll get back to this in a few paragraphs.
  2. Some tool or script to mount your encrypted file systems and continue the boot process once you connect and enter your password.
Sounds easy, doesn't it? I spent some time looking around to see if I could find some pre-baked solution, like a simple package to install that would tweak my initrd and add ssh and the needed scripts, or some suggestion on how to do it in a smart way. In the end, I baked my own solution, exactly like 8 years ago.

So, here it is...

A minimal system to boot on...

Creating an initrd or a tiny partition to do the initial boot did not seem very attractive: for one, I do not want to keep the whole root encrypted. Root contains only tools and scripts downloaded from the Debian repositories, configs really contain no sensitive data, and the kind of logs I care about do not end up in /var/log. Second, my experience with initrds is that they change quite a bit over time: you need to generate a new initrd for every new kernel and set of drivers, which is tricky by itself; the tools have changed significantly over the years; you need to compute (and install) the dependencies of any tool you need in the minimal root; and you have to hook your stuff well enough into the generator that it keeps working over time.

Creating a minimal root outside of the initrd did not seem very attractive either: I did not fancy having two root partitions to keep up to date in terms of kernel, grub, updates, and so on. And again, I did not need to encrypt root.

The solution I used is pretty simple: keep root and boot in the clear, and boot from there as normal. In rc*.d, disable all the services that require my encrypted data (like mysql, apache, or my repositories), remove my encrypted partitions from fstab and crypttab (or mark them noauto in both), so the boot process does not stop to ask me for a passphrase, and make sure ssh is up and running at that point in time.
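As a sketch, the relevant entries end up looking something like this (volume and mount point names are made up for the example):

# /etc/crypttab: noauto stops the boot process from prompting for a passphrase.
cleartext-mysql  /dev/system/encrypted-mysql  none  luks,noauto

# /etc/fstab: noauto again, the volume gets mounted later by the script.
/dev/mapper/cleartext-mysql  /opt/mysql  ext4  noatime,noauto  0  2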

Once the system boots...

Once the system boots, I have a script I can run manually that, in order:
  • decrypts all the partitions (...)
  • checks that the file systems are sane (remember the fsck run at boot?)
  • mounts them in the right location
  • starts all the other services that depend on that data

Encrypted partitions...

Let's start from the encrypted partitions. I've been using LVM and dmcrypt pretty much since they existed. I don't have hybrid systems, always linux only, and I like LVM much more than managing partition tables manually.

One common solution to having multiple encrypted volumes is to create a single encrypted volume with LVM, decrypt it, and then use it as a physical volume for another volume group. So you end up, for example, with a system volume group (like vg0), system/encrypted as a logical volume, and, rather than a simple file system in there, another volume group with multiple encrypted sub-volumes. I am not quite fond of this solution, as it makes it hard to move space between encrypted and clear text volumes, and generally makes things more confusing.

What I tend to do instead is have a single volume group, containing some encrypted and some clear text logical volumes. This, however, means that each logical volume has to be decrypted independently, and most of the tools will ask you for a passphrase for each volume, which is annoying. Some wikis suggest keeping keyfiles on disk, which is roughly what I do: I create an encrypted logical volume, keys, with a strong passphrase, containing truly random keys, and I only mount it while mounting the other volumes, unmounting it immediately afterwards.

Using this mechanism, the "decrypt all the partitions" step I described above becomes:
  • ask for passphrase
  • decrypt keys volume
  • mount it
  • for each encrypted volume
    • load its key file from the keys volume
    • decrypt the volume
  • umount the keys volume
  • ... continue with checking the filesystems yadda yadda ...
As an additional requirement, I wanted those steps to be idempotent: if a partition is already mounted it should be skipped; if I run the script multiple times, it should just complete the work that wasn't done before.
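A minimal sketch of that loop, assuming one key file per volume, named after the volume itself (the actual script described below does quite a bit more checking):

#!/bin/sh
set -e
keys=/mnt/keys

cryptsetup luksOpen /dev/system/encrypted-keys cleartext-keys
mount /dev/mapper/cleartext-keys "$keys"
for volume in mysql web repos; do  # made up volume names
  # Skip volumes that are already decrypted, so the script stays idempotent.
  [ -e "/dev/mapper/cleartext-$volume" ] && continue
  cryptsetup luksOpen --key-file "$keys/$volume.key" \
      "/dev/system/encrypted-$volume" "cleartext-$volume"
done
umount "$keys"
cryptsetup luksClose cleartext-keys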

My solution...

Back in 2005, together with a few friends with whom I was sharing the server, we wrote a small script to maintain those volumes and implement the steps above. The script is now checked in on github; you can find it here: https://github.com/ccontavalli/sys-scripts.

Setup

To get it running, you first need to install the tools and create the volume where the keys will be stored, something like:

# Install sys-scripts.
mkdir -p /opt/{scripts,conf}
git clone https://github.com/ccontavalli/sys-scripts.git /opt/scripts

# Install the tools that are needed.
apt-get install cryptsetup lvm2

# Create a volume "encrypted-keys" in group "system", this would
# be "vg0" unless you changed the default.
lvcreate -L 20M -n encrypted-keys system

# Encrypt the partition and open it.
cryptsetup luksFormat /dev/system/encrypted-keys \
    --cipher=aes-cbc-essiv:sha256 --key-size=256 --verify-passphrase
cryptsetup luksOpen /dev/system/encrypted-keys cleartext-keys

# Put a file system on that partition.
mkfs.ext4 /dev/mapper/cleartext-keys
Note that if your volume group is not called "system" but "vg0", you will need to edit /opt/scripts/ac-dmcrypt-manage and change cfg_key_volume to look like:
cfg_key_volume=${cfg_key_volume-vg0/encrypted-keys}
or remember to always call ac-dmcrypt-manage with the volume passed, like:
cfg_key_volume=vg0/encrypted-keys ac-dmcrypt-manage ...

Creating volumes

Now you are ready to create volumes. All you have to do is something like:
/opt/scripts/ac-dmcrypt-manage create-volume system \
    mysql 20G /opt/mysql ext4
for example, and follow the prompts. You can create as many volumes as you like.
If you want to try mounting that volume, you can then run:
/opt/scripts/ac-dmcrypt-manage start
If you want to inspect the generated keys:
/opt/scripts/ac-dmcrypt-manage mount-keys
Just remember to umount them after a while, by using umount-keys.

You can also change the mount options of your partition by editing /opt/conf/ac-fstab, which has been generated automatically by create-volume.


Managing the boot process

Let's say now you want to mark mysql as a process that cannot be started until the encrypted partitions are mounted. What you have to do is:
/opt/scripts/ac-system-boot add mysql
The script will disable mysql from the normal boot, by running something like update-rc.d mysql disable.

When the system reboots

All you have to do is ssh on the system, and then run:
/opt/scripts/ac-system-boot start

Conclusions...

This set of scripts has served me well for several years, and I will probably stick to them until I find a better mechanism for this kind of setup. Systems like ecryptfs or encfs look like viable alternatives for home directories or private data of individual users. But from what I have read so far, dmcrypt still looks like the best option to keep system partitions encrypted on a server.

Before using ac-system-boot, we tried using runlevels. Isn't this what they were meant for? The idea was to have a minimal network runlevel, and another runlevel with the system daemons to boot once the partitions were available. But between the various alternatives to SysV init that popped up in the last few years, the attention to speeding up the boot process, and various distribution scripts fiddling with rc*.d or assuming one setup or another, this did not work well.

Do you have better proposals? Alternatives? Let me know.

Wednesday, April 10, 2013

Cleaning up a CSS

Let's say you have a CSS file with a few thousand selectors and many, many rules. Let's say you want to eliminate the unused rules: how do you do that?

I spent about an hour looking online for some tool that would easily clean up CSS files, and ended up trying a few browser extensions:

  • CSS Remove and combine, for chrome, did not work for me. It would only parse the very first web site in my browser window, and seemed to refuse file:/// urls. I later discovered that chrome natively supports this feature: just go into the developer tools (ctrl + shift + i), click the audits tab, click run, and you will find a drop down listing the unused rules in your CSS.
  • Dust-me Selectors, for firefox, worked like a charm: it correctly identified all the unused selectors. 
In both cases, however, the list of unused selectors did not seem that useful, especially when there are thousands of them. I was really not looking forward to going through my CSS by hand. And remember, you can really only remove a style once none of its selectors are left.
In the end, I noticed that "Dust-me" allowed exporting the list of unused selectors as a .csv file, so I wrote my own script: https://github.com/ccontavalli/css-tidy, to read this csv, parse the .css, and output a cleaned up version of it.

The result was pretty good, and in the end it saved me a lot of work :-), have a look at it. Note that this also works with Chrome: all you have to do is feed css-tidy a list of selectors to eliminate.

Wednesday, April 3, 2013

Randomizing should be easy, right? oh, well, maybe not..

A simple problem...

Let's say you have a regression test or fuzzy testing suite that relies on generating a random set of operations and verifying their results (like ldap-torture). You want this set of operations to be reproducible, so that if you find a bug, you can easily get back to the exact same conditions that triggered it.

There are many ways to do this, but one simple way is to use a pseudo random number generator: one that, given the same starting seed, generates the same sequence of random numbers. Example?

Let's look at perl:

# Seed the random number generator.
srand($seed);

# Generate 100 random numbers.
for (my $count = 0; $count < 100; $count++) {
  print rand() . "\n";
}

Given the same $seed, the sequence of random numbers will always be the same. Not surprising, right?

Now, let's go back to our original problem: you want your test to be reproducible, but still random. Something you can do is get rid of $seed, and just call srand(). srand will return the seed it generated, which you can helpfully print on the screen. The final code would look like:

if ($seed) {
  srand($seed);
} else {
  $seed = srand();
}
print "SEED TO REPRODUCE TEST: " . $seed . "\n";

A broken solution...

Now, where is the problem? Well, the problem is that before perl 5.14 (~2011, in case you are wondering), srand() did not return the seed it set. Just doing $seed = srand() did not work.

I was debugging a piece of code I wrote a long time ago (2004), and here's what I was doing:

...
} else {
  my $seed = int(rand(~0));
  srand($seed);
}
...

Now, this looks nice, doesn't it? rand() will automatically seed the prng to some random value the first time it is called, which means rand() will produce a reasonable value (well, reasonable for non-crypto purposes, and with reasonable versions of perl), which I can then store, use as a seed, and be done with it.

But what about the ~0 in parentheses? Well, if I just call rand() by itself, without parameters, the returned value is a number between 0 and 1. srand() takes an integer, so something like $seed = rand(); srand($seed) would always lead to seeding the prng with 0. Not good.

According to the man page, rand($something) instead returns a random number between 0 and $something. By using ~0 as a parameter, I get the maximum integer that perl can represent: an integer entirely made of 1s in binary. On my laptop, this is 2^64 - 1.

So: get a random number in the widest range I can possibly get, feed it to srand, print it on the screen, and have a reasonably ok reproducible random sequence at every run of my tool, right?

WRONG! What, why?

Well, turns out that there are 2 problems:
  1. rand() returns floating point numbers; if you ask for too large a number and convert it to an integer, there will be no entropy in the lower bits.
  2. and, well, srand() only seems to be using the lower bits of the number to actually seed the prng.
Try it yourself if you don't believe me:

my $count;
for ($count = 0; $count < 10; $count ++) {
  srand(); # This just seeds the prng to a non-great but ok value.
  $seed = int(rand(~0));
  srand($seed);
  print "SEED: $seed, NEXT: " . rand() . "\n";
}

Let's try to run this program:

$ perl /tmp/srand.pl
SEED: 1060381934447820800, NEXT: 0.559209994114102
SEED: 7074176055472357376, NEXT: 0.559209994114102
SEED: 1895145662064951296, NEXT: 0.559209994114102
SEED: 8633284823558979584, NEXT: 0.559209994114102
SEED: 18297293223351091200, NEXT: 0.559209994114102
SEED: 12078737747670532096, NEXT: 0.559209994114102
SEED: 11431093298324897792, NEXT: 0.559209994114102
SEED: 15164597111904862208, NEXT: 0.559209994114102
SEED: 11321558760259911680, NEXT: 0.559209994114102
SEED: 7997988821801172992, NEXT: 0.559209994114102

Note that despite the SEED being reasonably random and significantly different every time, the next random number generated is always the same. This means I'd get the same sequence, despite the different seeds. But are the seeds really that different? Let's look at them in hex, by adding a sprintf("%x"):

$ perl /tmp/srand.pl
SEED: 871400663299457024 c17d64951010000 NEXT: 0.559209994114102
SEED: 8131900614985711616 70da508651010000 NEXT: 0.559209994114102
SEED: 2174713905224417280 1e2e22c651010000 NEXT: 0.559209994114102
SEED: 18069664157141106688 fac457d051010000 NEXT: 0.559209994114102
SEED: 16751261221931515904 e8786f5451010000 NEXT: 0.559209994114102
SEED: 13819218582325755904 bfc7ba1551010000 NEXT: 0.559209994114102
SEED: 4060176321643347968 3858a4da51010000 NEXT: 0.559209994114102
SEED: 17907160938465787904 f88303ff51010000 NEXT: 0.559209994114102
SEED: 12784784062795546624 b16cad5e51010000 NEXT: 0.559209994114102
SEED: 13701418886505758720 be2537e251010000 NEXT: 0.559209994114102

Tadah! Note that the last 32 bits of the seeds are always the same! Now, let's assume for a second that srand() only uses the last 32 bits for seeding. If I shift the number right by 1, with >>1 (i.e., $seed = int(rand(~0)) >> 1;), I should get one bit of entropy in those 32 bits, right? And two different outcomes for the next rand? Let's try:

$ perl /tmp/srand.pl
SEED: 40252668253339648, NEXT: 0.365019015110196
SEED: 7239562128531685376, NEXT: 0.865019015110196
SEED: 1399728002052423680, NEXT: 0.365019015110196
SEED: 6143080778424156160, NEXT: 0.365019015110196
SEED: 1155420768230735872, NEXT: 0.865019015110196
SEED: 1359272858982842368, NEXT: 0.365019015110196
SEED: 8077490881973747712, NEXT: 0.865019015110196
SEED: 1391389776166289408, NEXT: 0.865019015110196
SEED: 3567395640554061824, NEXT: 0.865019015110196
SEED: 5663678882486714368, NEXT: 0.365019015110196

Seems like the theory is correct: I only obtain two different values.

So... I wish that whatever srand() is doing were documented somewhere; the manual page makes no mention of srand() only looking at the lowest 32 bits. And I feel naive for not having thought about float conversion to int, and, well, very large numbers. In hindsight, keeping the seed within 32 bits in the first place, with something like int(rand(2**32)), would presumably have avoided both problems.

In fairness, this was almost 9 years ago, I haven't used perl in a while, and, well, I have been spoiled by integers in python, which can have arbitrary length.


Getting back to using openldap...

While trying to get ldap-torture back in shape, I had to learn again how to get slapd up and running with a reasonable config. Here are a few things I had long forgotten and learned again this morning:
  1. The order of the statements in slapd.conf is relevant. Don't be naive: even though the config looks like a normal key value store, some keys can be repeated multiple times (like backend, or database), and can only appear before / after other statements.
  2. My good old example slapd.conf file no longer worked with slapd. Some of it is because the setup is just different, some of it because I probably had a few errors to begin with, some of it because a few statements moved around or are no longer valid. See the changes I had to make.
  3. Recent versions of slapd support having configs in the database itself, or at least represented in ldiff format and within the tree. Many distros ship slapd with the new format. To convert from the old format to the new one, you can use:
      slapd -f slapd.conf -F /etc/ldap/slapd.d
  4. I had long forgotten how quiet slapd can be, even when things go wrong. Looking in /var/log/syslog might often not be enough. In fact, my database was invalid, the configs had errors, and there was very little indication that, when I started slapd, it was sitting there idle because it couldn't really start. To debug the errors, I ended up running it with:
     
    slapd -d Any -f slapd.conf
  5. slapd will not create the initial database by itself. To do so, I had to use:
      /usr/sbin/slapadd -f slapd.conf < base.ldiff
    with base.ldiff being something like this.
  6. Even if you set no password, ldapsearch with SASL authentication will likely ask you to confirm. It's easy to fix, though: just pass the -x parameter to go back to simple authentication, like with:
      ldapsearch -x -H "ldap://127.0.0.1:9009/" -b dc=test,dc=it
    Note that I had slapd run on a non standard port for experimentation purposes.
  7. Let's say you use -h instead of -H for ldapsearch because your memory is flaky, but you specify the parameter as -H would expect:
      ldapsearch -x -h "ldap://127.0.0.1:9009/" -b dc=test,dc=it
    The command will silently fail. I.e., it will accept -h as a "valid" parameter, but still report "unable to connect". Really, -h takes a simple hostname, like 127.0.0.1, but it will not complain about a value like the above. Took me a few minutes to realize the mistake.
Let's see what the next roadblocks will be ...

Saturday, March 30, 2013

How much of a file system monger are you?

Have you ever been lost in conversations or threads about one or the other file system? which one is faster? which one is slower? is that feature stable? which file system to use for this or that payload?

I was recently surprised to see ext4 as the default file system on a new linux installation. Yes, I know, ext4 has been around for a good while, and it does offer some pretty nifty features. But when it comes to my personal laptop and my data, well, I must confess switching to something newer always sends shivers down my spine.

Better performance? Are you sure it's really that important? I'm lucky enough that most of my coding & browsing fits in RAM. And if I have to recompile the kernel, I can wait that extra minute. Is the additional slowness actually impacting your user experience? And productivity?
Larger files? I've never had to store anything that ext2 could not support. Even with a 4Gb file limit, I've only rarely had problems (no, I don't use FAT32, but back when dmcrypt/ecryptfs/encfs and friends did not exist, I used the good old CFS for years, which turned out to have a 2Gb file size limit).
Less fragmentation? More contiguous blocks? C'mon, how often have you had to worry about the fragmentation of your ext2 file system on your laptop?

What I generally worry about is the safety of my data. I want to be freaking sure that if I lose power, forget my laptop in suspend mode, or my horrible wireless driver causes a kernel panic, I don't lose any data. I don't want some freaking bug in the filesystem to cause any data loss or inconsistency. And of course, I want a good toolset to recover data in case the worst happens (fsck, debug.*fs, recovery tools, ...).

So, what do I do? I stick to older file systems for longer. At least, for as long as the old system is well maintained, and the new system doesn't have something I really really want (like a journal, when they started popping up).

What else? Well, talking about ext.* file systems and my setup...
  1. I use data=journal in fstab whenever possible. For the root partition, be careful: you need to either add the option "rootflags=data=journal" to your grub / lilo configuration, or use something like tune2fs -o journal_data /dev/your/root/device, so that the file system is first mounted with data journaling enabled. If you are curious, the problem stems from the fact that you can't change the journaling mode when the file system is already mounted, and the boot process on some distros will fail if you don't follow those steps.
  2. I make sure barriers are enabled. Most modern disks cache your data in internal memory to be faster. If you lose power, journal or not, the data in that cache will be lost, and you risk corruption.
    With barrier=1 in fstab you ensure that at least the journal entries are written properly to disk. This again can slow you down, but makes corruption significantly more unlikely.
  3. I keep the file systems I don't need to write to read-only, with the hope that in case things go wrong, I will at least be able to boot, and reduce the surface of damage.
  4. I apply other tunings to reduce battery use.
So, here's what my fstab looks like:

/dev/sda1        /boot   ext2 nodiratime,noatime,ro,nodev,nosuid,noexec,sync 0       2
/dev/mapper/root  /          ext3 nodiratime,noatime,data=journal,errors=remount-ro 0 1
/dev/mapper/opt   /opt       ext3 nodiratime,noatime,data=journal,errors=remount-ro 0 1
/dev/mapper/media /opt/media ext3 nodiratime,noatime,data=journal,errors=remount-ro 0 2 
/dev/mapper/vms   /opt/vms   ext3 nodiratime,noatime,data=journal,errors=remount-ro 0 2 

Note that I use LVM underneath. It was hard for me to start: I feared an extra layer of indirection and possible caching would complicate things :), but the encryption and snapshotting features sold me, and I've been happily using it for years.

I use a similar setup on a raspberry pi in a small appliance that I cannot properly shut down, and I have been plugging and unplugging it directly, without headaches, for a while (luckily? maybe).

So, what's next? I'm looking forward to a logged file system, or a stable file system that properly supports snapshots. Something like NILFS, or maybe btrfs. In the past, I had a script taking snapshots of my LVM partitions periodically and at boot, so if I screwed up an update or accidentally removed a file, I could easily go back in time.

I gave up on that as LVM snapshots turned out to be fairly buggy from kernel to kernel, not well supported by distributions (had at least one instance of  initrd scripts getting confused by the presence of a persistent snapshot, and refusing to boot), and often lead more headaches than advantages, at least for personal use. I will probably give them a shot again in the near future :), but for now, I'm happy as it is.