Locking Down Docker: Preventing Privilege Escalation in Container Services

A high-severity privilege escalation vulnerability was discovered in a Docker Compose configuration where the `nginx` service lacked the `no-new-privileges` security option and was running with a writable root filesystem. These misconfigurations could allow a compromised container process to gain elevated permissions or download and execute malicious payloads. The fix applies defense-in-depth by adding `no-new-privileges:true`, enforcing a read-only root filesystem, and redirecting writable paths to tmpfs mounts.

By orbisai0security
May 28, 2026

Introduction

Container security is often treated as an afterthought — developers focus on getting services running correctly, and security configuration gets deferred until "later." Unfortunately, "later" sometimes arrives as an incident report. This post examines a real-world high-severity vulnerability found in a Docker Compose file: a combination of missing privilege escalation controls and a writable root filesystem on an nginx reverse proxy service.

If you run containerized workloads in production (and who doesn't these days?), understanding these misconfigurations could be the difference between a contained incident and a full compromise.


The Vulnerability Explained

What Is Privilege Escalation via setuid/setgid Binaries?

Linux systems use two special file permission bits, setuid (Set User ID) and setgid (Set Group ID), to allow executables to run with the privileges of the file's owner or owning group rather than those of the user who launched them. A classic example is /usr/bin/passwd, which needs root privileges to modify /etc/shadow even when run by a regular user.

Inside a container, setuid and setgid binaries present a serious risk. If a container image contains such binaries (and many base images do), a process running inside the container could execute them to escalate its privileges beyond what was originally granted. Even if your application starts as a non-root user, a vulnerability in your app (like a Remote Code Execution flaw) could be chained with a setuid binary to gain root inside the container.

By default, Docker does not prevent this. Without explicit hardening, container processes are free to exploit any setuid/setgid binary present in the image.

What Is a Writable Root Filesystem Risk?

When a container's root filesystem is writable (the default), a compromised process can:

  • Download additional malware or attacker tooling directly to the container's filesystem
  • Modify application files — injecting backdoors into served content or configuration
  • Persist changes across container restarts, since the writable layer survives until the container is removed (and longer if volumes are involved)
  • Stage lateral movement by writing scripts that interact with mounted volumes or network resources

For a reverse proxy like nginx, a writable filesystem is especially dangerous. If an attacker achieves code execution (perhaps through a vulnerability in a proxied application), they could modify nginx configuration files, inject content into served responses, or use the container as a foothold for further attacks.

The Real-World Attack Scenario

Imagine this chain of events:

  1. A vulnerability in a proxied backend application allows Remote Code Execution (RCE)
  2. The attacker's payload runs inside the nginx container (perhaps via a misconfigured proxy or shared process namespace)
  3. Because the root filesystem is writable, the attacker downloads a privilege escalation tool to /tmp
  4. A setuid binary in the nginx:alpine image is exploited to gain root inside the container
  5. With root and a writable filesystem, the attacker modifies nginx.conf or injects malicious content, pivoting to other services on the internal rustfs-network

Steps 3 and 4 depend directly on the absence of two simple configuration lines, and step 5 becomes far harder once those lines are in place.


The Fix

The automated fix addresses both attack vectors with three targeted changes to docker-compose.yml: a security_opt entry, a read_only flag, and a set of tmpfs mounts.

Before

# NGINX reverse proxy (optional)
nginx:
  image: nginx:alpine
  container_name: nginx-proxy
  ports:
    - "80:80"
    - "443:443"
  volumes:
    - ./.docker/nginx/nginx.conf:/etc/nginx/nginx.conf:ro
    - ./.docker/nginx/ssl:/etc/nginx/ssl:ro
  networks:
    - rustfs-network
  restart: unless-stopped
  profiles:
    - proxy
  depends_on:
    ...

After

# NGINX reverse proxy (optional)
nginx:
  security_opt:
    - "no-new-privileges:true"          # ← NEW: Blocks privilege escalation
  image: nginx:alpine
  container_name: nginx-proxy
  ports:
    - "80:80"
    - "443:443"
  volumes:
    - ./.docker/nginx/nginx.conf:/etc/nginx/nginx.conf:ro
    - ./.docker/nginx/ssl:/etc/nginx/ssl:ro
  tmpfs:
    - /var/run                           # ← NEW: In-memory writable scratch space
    - /var/cache/nginx                   # ← NEW: In-memory cache
    - /var/log/nginx                     # ← NEW: In-memory logs
  networks:
    - rustfs-network
  restart: unless-stopped
  read_only: true                        # ← NEW: Immutable root filesystem
  profiles:
    - proxy
  depends_on:
    ...

How Each Change Helps

1. security_opt: no-new-privileges:true

This is a Linux kernel-level control: Docker sets the no_new_privs bit (the prctl option PR_SET_NO_NEW_PRIVS) on the container's initial process. Once set, execve() can no longer grant additional privileges, so running a setuid or setgid binary, or one carrying file capabilities, does not elevate the caller. The bit is inherited across fork(), clone(), and execve() calls and cannot be unset, making it sticky and reliable.

This directly neutralizes the setuid/setgid escalation vector. The vulnerability report flags the same pattern on the opensearch service; this fix applies the control to nginx.

security_opt:
  - "no-new-privileges:true"

Key insight: This does not remove setuid binaries from the image — it prevents them from being effective. Even if an attacker finds a setuid binary, executing it won't grant elevated privileges.

2. read_only: true

This mounts the container's root filesystem read-only. Any attempt to write outside of explicitly designated writable mounts fails with a "Read-only file system" (EROFS) error. This directly prevents:

  • Downloading and executing additional payloads
  • Modifying container files
  • Persisting attacker tooling

read_only: true

3. tmpfs Mounts for Legitimate Writable Paths

nginx legitimately needs to write to certain directories at runtime — PID files, cache, and logs. Making the root filesystem read-only would break nginx if these paths weren't handled. The fix uses tmpfs (in-memory, non-persistent) mounts for exactly these directories:

tmpfs:
  - /var/run          # PID files and Unix sockets
  - /var/cache/nginx  # Proxy cache and temp files
  - /var/log/nginx    # Access and error logs

tmpfs is the right tool here for several reasons:

  • Ephemeral: Contents are lost when the container stops, so attacker tooling does not persist
  • Memory-backed: Fast I/O, no disk writes
  • Scoped: Only these specific paths are writable; everything else remains read-only
  • Isolated: Each container gets its own tmpfs instance
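
Because tmpfs is memory-backed, an unbounded mount can be filled until it pressures the container's memory, for example by a burst of log writes. A minimal sketch of capping each mount, assuming your Compose version supports the long-form volumes syntax; the sizes are illustrative guesses, not values from the original fix:

```yaml
services:
  nginx:
    image: nginx:alpine
    read_only: true
    volumes:
      - type: tmpfs
        target: /var/run
        tmpfs:
          size: 1048576        # 1 MiB, in bytes (illustrative)
      - type: tmpfs
        target: /var/cache/nginx
        tmpfs:
          size: 67108864       # 64 MiB, in bytes (illustrative)
      - type: tmpfs
        target: /var/log/nginx
        tmpfs:
          size: 16777216       # 16 MiB, in bytes (illustrative)
```

Once a capped mount is full, further writes fail with "No space left on device" instead of consuming more memory, so size the cache and log mounts against real traffic before relying on them.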

Note on logs: If you need to retain nginx logs for auditing or analysis, consider shipping them to stdout/stderr (which Docker captures) or using a dedicated log driver rather than writing to the container filesystem.
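
One way to follow that note is sketched here. It assumes the mounted nginx.conf sends access_log and error_log to /dev/stdout and /dev/stderr (or relies on the symlinks the official nginx image ships in /var/log/nginx, which a tmpfs mounted over that directory would hide). The rotation values are placeholders:

```yaml
services:
  nginx:
    image: nginx:alpine
    read_only: true
    tmpfs:
      - /var/run
      - /var/cache/nginx
      # /var/log/nginx is deliberately not mounted in this sketch:
      # logs are assumed to go to stdout/stderr rather than to files.
    logging:
      driver: json-file        # Docker's default log driver
      options:
        max-size: "10m"        # rotate after roughly 10 MB (placeholder)
        max-file: "3"          # keep three rotated files (placeholder)
```

With this layout, docker compose logs nginx shows the access and error logs, and they survive container restarts on the host side without any writable path inside the container.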


Prevention & Best Practices

Harden Every Service by Default

Don't wait for a vulnerability scan to add these controls. Make them part of your standard Docker Compose template:

# Security hardening template for Docker Compose services
services:
  my-service:
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /tmp
      - /var/run
    # Drop all capabilities and add back only what's needed
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE  # Only if binding to ports < 1024
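
To keep a template like this from drifting between services, Compose extension fields (top-level keys prefixed with x-) combined with YAML anchors let several services share one hardening block. A sketch, in which x-hardening, my-service, and its image name are hypothetical:

```yaml
x-hardening: &hardening
  security_opt:
    - no-new-privileges:true

services:
  nginx:
    <<: *hardening                  # merge the shared block into this service
    image: nginx:alpine
    read_only: true
    tmpfs:
      - /var/run
      - /var/cache/nginx
      - /var/log/nginx

  my-service:
    <<: *hardening
    image: example/my-service:1.0   # hypothetical image
    read_only: true
    tmpfs:
      - /tmp
```

A YAML merge is shallow: a service that defines its own security_opt replaces the shared list rather than extending it, so it must repeat the no-new-privileges entry.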

Principle of Least Privilege for Containers

Apply these practices across your container fleet:

Control                    | Docker Compose Setting                | Purpose
-------------------------- | ------------------------------------- | ---------------------------------
Block privilege escalation | security_opt: no-new-privileges:true  | Prevents setuid/setgid abuse
Read-only filesystem       | read_only: true                       | Blocks payload downloads
Drop capabilities          | cap_drop: [ALL]                       | Removes unnecessary kernel powers
Non-root user              | user: "1000:1000"                     | Reduces blast radius
No host network            | (avoid network_mode: host)            | Isolates network namespace
Resource limits            | mem_limit, cpus                       | Prevents resource exhaustion
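
The non-root user and resource-limit controls are the ones not shown elsewhere in this post. A sketch for a hypothetical my-service; the UID/GID and limits are placeholder values, and the image must actually be able to run as that user:

```yaml
services:
  my-service:
    image: example/my-service:1.0   # hypothetical image
    user: "1000:1000"               # run as an unprivileged UID:GID (placeholder)
    mem_limit: 256m                 # hard memory cap (placeholder)
    cpus: 0.5                       # at most half of one CPU (placeholder)
    cap_drop:
      - ALL
    security_opt:
      - no-new-privileges:true
```

The stock nginx:alpine image starts its master process as root, so applying user: to it usually needs further changes such as unprivileged listen ports and writable PID and cache paths.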

Use Automated Scanning Tools

Don't rely solely on manual review. Integrate these tools into your CI/CD pipeline:

  • Trivy — Scans container images and IaC files including Docker Compose for misconfigurations
  • Checkov — Static analysis for Docker, Kubernetes, and Terraform
  • Hadolint — Dockerfile linter with security rules
  • Docker Scout — Built-in Docker vulnerability scanning
  • Semgrep — Custom rule-based scanning (the tool that caught this vulnerability)

Apply Defense in Depth

No single control is sufficient. Layer your defenses:

┌─────────────────────────────────────┐
│         Host OS / Kernel            │  ← Seccomp, AppArmor, SELinux
├─────────────────────────────────────┤
│         Container Runtime           │  ← Docker security defaults
├─────────────────────────────────────┤
│         Container Configuration     │  ← no-new-privileges, read_only, cap_drop
├─────────────────────────────────────┤
│         Application                 │  ← Non-root user, minimal image
└─────────────────────────────────────┘

Relevant Security Standards

This fix aligns with established security frameworks:

  • CWE-269: Improper Privilege Management
  • CWE-732: Incorrect Permission Assignment for Critical Resource
  • OWASP Docker Security Cheat Sheet: Recommends no-new-privileges and read-only filesystems
  • CIS Docker Benchmark: Controls 5.25 (restrict containers from acquiring additional privileges) and 5.12 (read-only root filesystem); numbering can shift between benchmark versions
  • NIST SP 800-190: Application Container Security Guide — recommends immutable container images

The opensearch Service — Don't Forget It

The vulnerability report specifically calls out the opensearch service as also lacking no-new-privileges. This fix addressed nginx, but a complete remediation should audit every service in your docker-compose.yml. Run a quick check:

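# List services whose rendered config has no no-new-privileges entry (assumes jq is installed)
docker compose --profile proxy config --format json \
  | jq -r '.services | to_entries[] | select((.value.security_opt // []) | any(test("^no-new-privileges([:=]true)?$")) | not) | .key'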

Or better yet, add Trivy or Checkov to your pre-commit hooks so misconfigurations are caught before they ever reach a pull request.


Conclusion

Two lines of configuration, no-new-privileges:true and read_only: true, combined with thoughtful tmpfs mounts, dramatically reduce the attack surface of a containerized nginx service. These aren't exotic hardening techniques; they're well-documented, widely supported, and, once the paths an application legitimately writes to are mapped to tmpfs, have little impact on normal behavior.

The key takeaways from this fix:

  1. no-new-privileges:true is a free security win — add it to every service that doesn't explicitly require privilege escalation (which should be all of them)
  2. Read-only root filesystems contain compromises: even if an attacker achieves RCE, they can't modify the image's files or persist tooling outside the tmpfs mounts
  3. tmpfs makes read-only practical — identify the paths your application legitimately writes to and carve out in-memory mounts for them
  4. Automate detection: human review of Docker Compose files will miss things that static analysis rules catch consistently
  5. Audit all services, not just the flagged one — misconfigurations tend to be systemic, not isolated

Container security doesn't require a security team or expensive tooling to get right. It requires building good defaults into your templates, automating checks in CI/CD, and treating each service configuration as a security-relevant artifact — because it is.

Stay curious, stay secure. 🔒


This vulnerability was automatically detected and fixed as part of an ongoing security hardening initiative. Automated security tooling identified the misconfiguration in docker-compose.yml and generated the remediation pull request.

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #1005
