John Montroy

System Calls

All applications run in user space. Applications can make system calls, which involve interacting with the kernel. Since all containers are applications on one Linux host e.g. one kernel, managing system calls from containers is important. Some example system calls are things like read, write, chown, clone, execve. Rarely do we make any system calls directly - in Golang, you might use the glibc library in the syscall package, but that’s at most.

File Permissions

Everything on Linux is a file. Config, logs, sockets, printers, devices.

If you see an s instead of x for the executable permission on a file, it has setuid set. setuid tells the file to execute as the owning user instead of as the user who actually executes the file. So if user “john” owns a binary which has setuid (chmod +s ./mybinary), then no matter who executes it, it’ll run as user “john”.

The canonical example is ping - anyone should be able to execute ping, which opens a raw network socket. But preferably almost no one should be able to create a raw network socket outside of ping! So ping is owned by root and has setuid set, which means anyone executing ping will run it as root.

Note: recent Ubuntu distributions (e.g. 24.04 LTS) don’t use setuid and instead rely purely on capabilities - see section below for a description, and here’s what you get with getcap:

getcap $(which ping)
/usr/bin/ping cap_net_raw=ep

setuid is old and dangerous – it’s a common pathway to privilege escalation, such that vulnerability scanners will usually flag it. See this small example on how to get root access via Docker volumes.

Two more settings:

Capabilities

Nowadays, what a binary / thread can do is managed through capabilities. Capabilities can be assigned to threads, and to files via setcap, and you can see a process’s capabilities with getpcap. Some big capabilities:

Privilege Escalation

That’s any time someone can get privileges they shouldn’t have. The above Docker vulnerability is an example. setuid is a problem here, and capability management is a problem.

Remember: by default, containers run as root. That means applications in the container run as root by default, and if someone can escape the container, they’ve got system root access. More to come.

#Container-Security