Getting started with Unprivileged Linux Containers in Ubuntu 14.04

People familiar with Solaris 10 or AIX 7 and later will know that it is possible to run additional sub-instances of the operating system, called Zones and WPARs, respectively. The idea is that a single kernel manages the process table and I/O in such a way that certain processes are tagged and sectioned off, similar to a chroot jail. The difference is that a Solaris Zone or WPAR appears to operate as a completely separate OS instance, one that can be restricted by resource controls applied in the "global" OS.

Enter Linux Containers (aka LXC). Developers of the OpenVZ project have contributed much of their work to the upstream Linux kernel, and with Ubuntu 14.04 LTS (Trusty Tahr), Linux Containers 1.0 has been made available in a mainstream Linux server distribution.

One of the most appealing things about Linux Containers 1.0 in Trusty Tahr is its support for unprivileged containers; that is, containers that are created and run by unprivileged, non-root users. That means there is a stronger security boundary between the top-level (host) operating system and the running container, making it less likely for a wayward process in the container to do any significant damage to the top-level system.

By default, unprivileged containers can only be used with a private bridge and NAT. This is somewhat inconvenient, so the following steps set up an unprivileged container with a virtual NIC bridged to the physical one so that it can operate on the same network as the host.

1) Add a new unprivileged user without sudo access

$ sudo useradd -s /bin/bash -d /lxc -m lxcd
$ sudo passwd -l lxcd

To be on the safe side, I didn't want sudo involved at all, and I didn't want to keep entering the password of my new user. Therefore I set up password-less SSH keys for the lxcd user and log in with ssh lxcd@localhost.

$ ssh-keygen -t rsa
$ sudo mkdir -m 700 /lxc/.ssh
$ cat ~/.ssh/id_rsa.pub | sudo tee -a /lxc/.ssh/authorized_keys
$ sudo chmod 600 /lxc/.ssh/authorized_keys
$ sudo chown -R lxcd:lxcd /lxc
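
With the keys in place, a quick check confirms that key-based login works for the new user (localhost is assumed here; substitute the host's address if connecting remotely):

$ ssh lxcd@localhost whoami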

2) Configure ID mapping and bridge permissions

The following allows containers run by lxcd to map their container UIDs and GIDs to unused ranges on the host. In other words, a process run by UID 0 in the container will show up on the host as being run by UID 100000.

$ sudo usermod --add-subuids 100000-165536 lxcd
$ sudo usermod --add-subgids 100000-165536 lxcd
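
The ranges are recorded in /etc/subuid and /etc/subgid, which can be checked directly:

$ grep lxcd /etc/subuid /etc/subgid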

The /etc/lxc/lxc-usernet file controls how many virtual NICs the user can attach to each bridge, which effectively limits how many (useful) containers the user can create. LXC provides lxcbr0 as the private NATed bridge. I added the line for br0, which is defined in /etc/network/interfaces.

$ sudo vi /etc/lxc/lxc-usernet
lxcd veth lxcbr0 4
lxcd veth br0 4

My /etc/network/interfaces file for reference:

# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
#auto eth0
#iface eth0 inet dhcp

# The primary network bridge for LXC
auto br0
iface br0 inet static
    address 192.168.1.5
    network 192.168.1.0
    netmask 255.255.255.0
    broadcast 192.168.1.255
    gateway 192.168.1.1
    dns-nameservers 192.168.1.1 8.8.8.8
    domain brentingitup.com
    dns-search local

    bridge_ports eth2
    bridge_fd 9
    bridge_hello 2
    bridge_maxage 12
    bridge_stp off
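
After editing the file, the bridge needs to be brought up. This assumes the bridge-utils package is installed (it provides the bridge_* options and the brctl tool); if eth2 previously carried the host's address, it is safest to do this from the console:

$ sudo ifup br0
$ brctl show br0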

3) Create a container

Unfortunately, lxc-create can automatically manage btrfs and zfs filesystems only for root-run containers (e.g. -B zfs --zfsroot nas/lxc). For this task I created a zfs filesystem ahead of time and mounted it on /lxc/web, then specified the destination directory for the installation (after giving the lxcd user write access to /lxc/web).
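
A minimal sketch of that preparation, assuming a pool/dataset named nas/lxc as in the -B zfs example above:

$ sudo zfs create -o mountpoint=/lxc/web nas/lxc/web
$ sudo chown lxcd:lxcd /lxc/web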

$ ssh lxcd@localhost
$ lxc-create -t download -n web -B dir --dir /lxc/web/rootfs
	Distribution: ubuntu
	Release: trusty
	Architecture: amd64
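
Once the template download finishes, the new container shows up (still stopped) for the lxcd user:

$ lxc-ls --fancy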

4) Connect to the bridge

The configuration file for the new container "web" is stored in the file /lxc/.local/share/lxc/web/config. Update it so the container attaches to the external-facing bridge instead of the LXC private bridge (if desired):

$ vi ~/.local/share/lxc/web/config
	...
	#lxc.network.link = lxcbr0
	lxc.network.link = br0
	...
	:wq!

If you want all new containers created by this user to use a different bridge, update the ~/.config/lxc/default.conf file:

$ vi ~/.config/lxc/default.conf
lxc.id_map = u 0 100000 65536
lxc.id_map = g 0 100000 65536
lxc.network.type = veth
#lxc.network.link = lxcbr0
lxc.network.link = br0
lxc.network.flags = up
lxc.network.hwaddr = 00:16:3e:xx:xx:xx

To change the default for all containers on the system, update /etc/lxc/default.conf instead.
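
With ~/.config/lxc/default.conf updated, any new container created by lxcd picks up the bridged configuration automatically. For example, a second container (the name "db" is just an example) can be created non-interactively by passing the distribution, release, and architecture straight to the download template:

$ lxc-create -t download -n db -- -d ubuntu -r trusty -a amd64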

5) Start and connect

Start the container. Without the -d option to "daemonize" the process in the background, the container console takes control of the terminal until the container is shut down again. This is useful for troubleshooting, but without screen or some other mechanism to manage the foreground process it can be cumbersome, so start it in the background instead.

$ lxc-start -n web -d
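
A quick way to confirm the container came up, and to see the address it obtained on br0 once networking settles:

$ lxc-info -n web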

For the initial boot, it is a good idea to connect to the console, check out the container, then break out of the console. This is analogous to zlogin -C in Solaris. The -e option sets the escape character, which defaults to 'a' (Ctrl+a). I found this can interfere with other programs that depend on Ctrl+a, so I set it to 'b'.

$ lxc-console -n web -e b
	Type <Ctrl+b q> to exit the console, <Ctrl+b Ctrl+b> to enter Ctrl+b itself
web login: ubuntu
Password: ubuntu

ubuntu@web:~$ hostname
web
ubuntu@web:~$
[ctrl+b]  [q]
$ 

See /usr/share/doc/lxc/examples for more examples.

After the initial boot, use lxc-attach to go straight to a root shell in the container. "exit" returns to the host. This is analogous to zlogin without the -C option.

$ lxc-attach -n web
root@web:/# exit
$ 
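
When finished with the container, shut it down cleanly from the host:

$ lxc-stop -n web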

6) Additional configuration

Autostart

Note: the following configuration is not working for me; I will fix it eventually.
To enable a container to start on boot, add the following section to the container config file:

$ vi ~/.local/share/lxc/web/config
...
# Auto start
lxc.group = onboot
lxc.start.auto = 1
lxc.start.delay = 5
...
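
For what it's worth, lxc-autostart can list the containers it would act on; for an unprivileged container it has to be run as the owning user:

$ lxc-autostart -L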

Read more about container configuration options in lxc.container.conf(5):

$ man lxc.container.conf

Bind Mounts

It may be desirable to share certain directories on the host with processes in the container. The simplest way to do this is to add bind mount entries to the container configuration. The following mounts a web document root into the container, to be served by Apache:

$ vi ~/.local/share/lxc/web/config
...
# Bind mounts
lxc.mount.entry = /nas/www /lxc/web/rootfs/var/www none defaults,bind,create=dir 0 0
...
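
The entry is only read at container start, so the container must be restarted before the mount appears. A quick check from the host (using the /nas/www example above):

$ lxc-stop -n web
$ lxc-start -n web -d
$ lxc-attach -n web -- ls /var/www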

Troubleshooting

"Assertion failed" when starting container

Though the bug is now fixed (update your server), when I first started using unprivileged LXC on Ubuntu 14.04, my non-root user would occasionally lose its cgroups, the tagging mechanism that allows container process and I/O separation. Verify the current cgroups with:

$ cat /proc/self/cgroup

It should say /user/<UID>.user/<N>.session for everything. If it does not, the cgroups can be reinitialized, but doing so requires (temporary) sudo access for the user in question.

$ for c in hugetlb cpuset cpu cpuacct memory devices freezer blkio perf_event; do
sudo dbus-send --print-reply --address=unix:path=/sys/fs/cgroup/cgmanager/sock \
--type=method_call \
/org/linuxcontainers/cgmanager org.linuxcontainers.cgmanager0_0.Create \
string:$c string:$USER

sudo dbus-send --print-reply --address=unix:path=/sys/fs/cgroup/cgmanager/sock \
--type=method_call \
/org/linuxcontainers/cgmanager org.linuxcontainers.cgmanager0_0.Chown \
string:$c string:$USER int32:$(id -u) int32:$(id -g)

dbus-send --print-reply --address=unix:path=/sys/fs/cgroup/cgmanager/sock \
--type=method_call \
/org/linuxcontainers/cgmanager org.linuxcontainers.cgmanager0_0.MovePid \
string:$c string:$USER int32:$$
done

This workaround is from LXC bug 181.
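
After the loop completes, the current shell should once again show the per-user session cgroups:

$ cat /proc/self/cgroup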

Apache: ulimit: error setting limit (Operation not permitted)

Normally, Apache runs as root and is able to raise its limit on open files to 8192. In Ubuntu 14.04, the defaults are 1024 for the soft limit (which a process may raise up to the hard limit) and 4096 for the hard limit (which may only be raised by root).

In an unprivileged container, even a process running as root is really mapped to an unprivileged user in the host OS, so the attempt to raise the hard limit is denied. To fix this limitation for Apache, set the hard limit for the user running the container as follows:

$ sudo vi /etc/security/limits.conf
	lxcd            soft    nofile          1024
	lxcd            hard    nofile          8192
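
The new limits only apply to new login sessions, so reconnect as lxcd and verify the hard limit before restarting the container:

$ ssh lxcd@localhost 'ulimit -Hn'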

Additional Reading

For more details about LXC 1.0 in Ubuntu 14.04, including its interaction with AppArmor, nested containers, cloning, and more, see the Ubuntu LXC Help Page. I also got most of my details from this excellent LXC 1.0 web series by Stéphane Graber, a core Ubuntu developer and LXC upstream maintainer.