What is the no-new-privileges security option in Docker?

The no-new-privileges option prevents container processes from gaining additional privileges when they execute a new program, whether through setuid/setgid binaries, file capabilities, or other mechanisms that take effect on execve. A process cannot elevate beyond its initial permissions.

How do you prevent privilege escalation in Docker Compose?

Add `security_opt: - no-new-privileges:true` to your service definition, use `read_only: true` for the root filesystem, drop unnecessary capabilities with `cap_drop: - ALL`, and only add back specific capabilities you need.

What CWE is Docker privilege escalation?

Docker privilege escalation typically maps to CWE-250 (Execution with Unnecessary Privileges) and CWE-269 (Improper Privilege Management), depending on the specific misconfiguration.

Is running as non-root enough to prevent Docker privilege escalation?

No, running as non-root helps but is not sufficient. A non-root user can still exploit setuid binaries or capabilities to escalate privileges. The no-new-privileges flag provides an additional layer of defense.

Can static analysis detect Docker security misconfigurations?

Yes, tools like Semgrep, Hadolint, and dedicated Docker security scanners can detect missing security options like no-new-privileges, writable filesystems, and excessive capabilities in Docker Compose and Dockerfile configurations.

Locking Down Docker: Preventing Privilege Escalation in Container Services

Introduction

Container security is often treated as an afterthought — developers focus on getting services running correctly, and security configuration gets deferred until "later." Unfortunately, "later" sometimes arrives as an incident report. This post examines a real-world high-severity vulnerability found in a Docker Compose file: a combination of missing privilege escalation controls and a writable root filesystem on an nginx reverse proxy service.

If you run containerized workloads in production (and who doesn't these days?), understanding these misconfigurations could be the difference between a contained incident and a full compromise.

The Vulnerability Explained

What Is Privilege Escalation via `setuid`/`setgid` Binaries?

Linux systems use two special file permission bits — setuid (Set User ID) and setgid (Set Group ID) — to allow executables to run with the permissions of their owner rather than the user who launched them. Classic examples include /usr/bin/passwd, which needs root privileges to modify /etc/shadow even when run by a regular user.

Inside a container, setuid and setgid binaries present a serious risk. If a container image contains such binaries (and many base images do), a process running inside the container could execute them to escalate its privileges beyond what was originally granted. Even if your application starts as a non-root user, a vulnerability in your app (like a Remote Code Execution flaw) could be chained with a setuid binary to gain root inside the container.

By default, Docker does not prevent this. Without explicit hardening, container processes are free to exploit any setuid/setgid binary present in the image.

What Is a Writable Root Filesystem Risk?

When a container's root filesystem is writable (the default), a compromised process can:

- Download additional malware or attacker tooling directly to the container's filesystem
- Modify application files, injecting backdoors into served content or configuration
- Persist changes across container restarts (the writable layer survives a restart, and writable volumes survive even container removal)
- Stage lateral movement by writing scripts that interact with mounted volumes or network resources

For a reverse proxy like nginx, a writable filesystem is especially dangerous. If an attacker achieves code execution (perhaps through a vulnerability in a proxied application), they could modify nginx configuration files, inject content into served responses, or use the container as a foothold for further attacks.

The Real-World Attack Scenario

Imagine this chain of events:

1. A vulnerability in a proxied backend application allows Remote Code Execution (RCE)
2. The attacker's payload runs inside the nginx container (perhaps via a misconfigured proxy or shared process namespace)
3. Because the root filesystem is writable, the attacker downloads a privilege escalation tool to /tmp
4. A setuid binary in the nginx:alpine image is exploited to gain root inside the container
5. With root and a writable filesystem, the attacker modifies nginx.conf or injects malicious content, pivoting to other services on the internal rustfs-network

The last three steps depend on the absence of two simple configuration lines.

The Fix

The automated fix addresses both attack vectors simultaneously with three targeted changes to docker-compose.yml.

Before

# NGINX reverse proxy (optional)
nginx:
  image: nginx:alpine
  container_name: nginx-proxy
  ports:
    - "80:80"
    - "443:443"
  volumes:
    - ./.docker/nginx/nginx.conf:/etc/nginx/nginx.conf:ro
    - ./.docker/nginx/ssl:/etc/nginx/ssl:ro
  networks:
    - rustfs-network
  restart: unless-stopped
  profiles:
    - proxy
  depends_on:
    ...

After

# NGINX reverse proxy (optional)
nginx:
  security_opt:
    - "no-new-privileges:true"          # ← NEW: Blocks privilege escalation
  image: nginx:alpine
  container_name: nginx-proxy
  ports:
    - "80:80"
    - "443:443"
  volumes:
    - ./.docker/nginx/nginx.conf:/etc/nginx/nginx.conf:ro
    - ./.docker/nginx/ssl:/etc/nginx/ssl:ro
  tmpfs:
    - /var/run                           # ← NEW: In-memory writable scratch space
    - /var/cache/nginx                   # ← NEW: In-memory cache
    - /var/log/nginx                     # ← NEW: In-memory logs
  networks:
    - rustfs-network
  restart: unless-stopped
  read_only: true                        # ← NEW: Immutable root filesystem
  profiles:
    - proxy
  depends_on:
    ...

How Each Change Helps

1. `security_opt: no-new-privileges:true`

This is a Linux kernel-level control: the container runtime sets the no_new_privs bit on the container's initial process through prctl(PR_SET_NO_NEW_PRIVS). Once set, the bit cannot be cleared, and no process in the container can gain privileges through execve(): the setuid and setgid bits and file capabilities on the executed binary no longer take effect. The bit is inherited across fork(), clone(), and execve() calls, so it covers every descendant process.

This directly neutralizes the privilege escalation vector that the vulnerability report describes for the opensearch service, and the fix applies the same control to nginx.

security_opt:
  - "no-new-privileges:true"

Key insight: This does not remove setuid binaries from the image — it prevents them from being effective. Even if an attacker finds a setuid binary, executing it won't grant elevated privileges.

2. `read_only: true`

This mounts the container's root filesystem read-only. Any attempt to write outside the explicitly designated writable mounts (volumes or tmpfs) fails with a "Read-only file system" (EROFS) error. On the root filesystem, this prevents:

- Downloading and executing additional payloads
- Modifying container files
- Persisting attacker tooling

read_only: true

3. `tmpfs` Mounts for Legitimate Writable Paths

nginx legitimately needs to write to certain directories at runtime — PID files, cache, and logs. Making the root filesystem read-only would break nginx if these paths weren't handled. The fix uses tmpfs (in-memory, non-persistent) mounts for exactly these directories:

tmpfs:
  - /var/run          # PID files and Unix sockets
  - /var/cache/nginx  # Proxy cache and temp files
  - /var/log/nginx    # Access and error logs

tmpfs is the right tool here for several reasons:
- Ephemeral: Contents are lost when the container stops — no persistence for attacker tooling
- Memory-backed: Fast I/O, no disk writes
- Scoped: Only these specific paths are writable; everything else remains read-only
- Isolated: Each container gets its own tmpfs instance
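
One caveat to the memory-backed point: a tmpfs mount with no size limit can grow until it pressures the container's memory. As a rough sketch, the long volume syntax in the Compose specification lets you cap each mount. The sizes below are illustrative assumptions, not measured requirements for nginx:

```yaml
services:
  nginx:
    read_only: true
    volumes:
      - type: tmpfs
        target: /var/cache/nginx
        tmpfs:
          size: 67108864   # 64 MiB in bytes (illustrative value)
      - type: tmpfs
        target: /var/run
        tmpfs:
          size: 1048576    # 1 MiB in bytes (illustrative value)
```

If nginx starts failing with "No space left on device" on one of these paths, the cap is too small for your workload and should be raised.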

Note on logs: If you need to retain nginx logs for auditing or analysis, consider shipping them to stdout/stderr (which Docker captures) or using a dedicated log driver rather than writing to the container filesystem.
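
A minimal sketch of the stdout/stderr route follows. It assumes the official nginx image, which links its default access and error logs to /dev/stdout and /dev/stderr; a tmpfs mounted over /var/log/nginx would hide those links, so this variant leaves that path unmounted. The rotation values are illustrative.

```yaml
services:
  nginx:
    image: nginx:alpine
    read_only: true
    tmpfs:
      - /var/run
      - /var/cache/nginx
    logging:
      driver: json-file      # Docker's default log driver
      options:
        max-size: "10m"      # rotate after 10 MB (illustrative value)
        max-file: "3"        # keep three rotated files (illustrative value)
```

With this layout, `docker compose logs nginx` shows the access and error logs. If your nginx.conf points logs at custom file paths, those paths still need a writable mount.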

Prevention & Best Practices

Harden Every Service by Default

Don't wait for a vulnerability scan to add these controls. Make them part of your standard Docker Compose template:

# Security hardening template for Docker Compose services
services:
  my-service:
    security_opt:
      - no-new-privileges:true
    read_only: true
    tmpfs:
      - /tmp
      - /var/run
    # Drop all capabilities and add back only what's needed
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE  # Only if binding to ports < 1024
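
To avoid copying these lines into every service, one option is a Compose extension field combined with a YAML anchor. This is a sketch, not the project's actual layout: the `x-hardening` name, the service names, and the image names are hypothetical.

```yaml
# Shared hardening defaults; Compose ignores top-level keys that start with x-
x-hardening: &hardening
  security_opt:
    - no-new-privileges:true
  read_only: true
  cap_drop:
    - ALL

services:
  my-service:
    <<: *hardening                       # merge the shared defaults
    image: example/my-service:1.0        # hypothetical image
    tmpfs:
      - /tmp
  other-service:
    <<: *hardening
    image: example/other-service:1.0     # hypothetical image
    cap_add:
      - NET_BIND_SERVICE                 # only if binding to ports < 1024
```

The merge is shallow: a service that defines its own `security_opt` or `cap_drop` replaces the shared list rather than extending it, so check the rendered result with `docker compose config`.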

Principle of Least Privilege for Containers

Apply these practices across your container fleet:

| Control | Docker Compose Setting | Purpose |
| --- | --- | --- |
| Block privilege escalation | `security_opt: no-new-privileges:true` | Prevents setuid/setgid abuse |
| Read-only filesystem | `read_only: true` | Blocks payload downloads |
| Drop capabilities | `cap_drop: [ALL]` | Removes unnecessary kernel powers |
| Non-root user | `user: "1000:1000"` | Reduces blast radius |
| No host network | (avoid `network_mode: host`) | Isolates network namespace |
| Resource limits | `mem_limit`, `cpus` | Prevents resource exhaustion |
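
The controls that the hardening template does not show (non-root user and resource limits) might look like this in a service definition. This is a sketch with placeholder values: the service name, image, UID/GID, and limits are assumptions to adapt, and `pids_limit` is an extra Compose setting not listed in the table.

```yaml
services:
  my-service:
    image: example/my-service:1.0   # hypothetical image
    user: "1000:1000"               # run as a non-root UID:GID that exists in the image
    mem_limit: 256m                 # hard memory cap (illustrative value)
    cpus: 0.5                       # at most half a CPU core (illustrative value)
    pids_limit: 100                 # caps process count, which limits fork bombs
    # network_mode is left unset, so the service keeps its own network namespace
```

A non-root user cannot bind to ports below 1024 by default, so services that need port 80 or 443 either listen on a high port behind a port mapping or add back NET_BIND_SERVICE.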

Use Automated Scanning Tools

Don't rely solely on manual review. Integrate these tools into your CI/CD pipeline:

- Trivy: Scans container images and IaC files including Docker Compose for misconfigurations
- Checkov: Static analysis for Docker, Kubernetes, and Terraform
- Hadolint: Dockerfile linter with security rules
- Docker Scout: Built-in Docker vulnerability scanning
- Semgrep: Custom rule-based scanning (the tool that caught this vulnerability)

Apply Defense in Depth

No single control is sufficient. Layer your defenses:

┌─────────────────────────────────────┐
│         Host OS / Kernel            │  ← Seccomp, AppArmor, SELinux
├─────────────────────────────────────┤
│         Container Runtime           │  ← Docker security defaults
├─────────────────────────────────────┤
│         Container Configuration     │  ← no-new-privileges, read_only, cap_drop
├─────────────────────────────────────┤
│         Application                 │  ← Non-root user, minimal image
└─────────────────────────────────────┘

Relevant Security Standards

This fix aligns with established security frameworks:

- CWE-269: Improper Privilege Management
- CWE-732: Incorrect Permission Assignment for Critical Resource
- OWASP Docker Security Cheat Sheet: Recommends no-new-privileges and read-only filesystems
- CIS Docker Benchmark: Includes controls for restricting containers from acquiring additional privileges (no-new-privileges) and for mounting the container's root filesystem read-only (control numbers vary between benchmark versions)
- NIST SP 800-190: Application Container Security Guide, which recommends immutable container images

The `opensearch` Service — Don't Forget It

The vulnerability report specifically calls out the opensearch service as also lacking no-new-privileges. This fix addressed nginx, but a complete remediation should audit every service in your docker-compose.yml. Run a quick check:

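# List services that do not set no-new-privileges (assumes Docker Compose v2 and jq are installed)
docker compose config --format json \
  | jq -r '.services | to_entries[] | select((.value.security_opt // []) | any(test("^no-new-privileges([:=]true)?$")) | not) | .key'
# Services behind a profile only appear when it is enabled, e.g. docker compose --profile proxy config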

Or better yet, add Trivy or Checkov to your pre-commit hooks so misconfigurations are caught before they ever reach a pull request.

Conclusion

Two lines of configuration (no-new-privileges:true and read_only: true), combined with thoughtful tmpfs mounts, dramatically reduce the attack surface of a containerized nginx service. These aren't exotic hardening techniques; they're well documented, widely supported, and, once the writable paths are identified, have little to no impact on legitimate application behavior.

The key takeaways from this fix:

- no-new-privileges:true is a free security win: add it to every service that doesn't explicitly require privilege escalation (which should be all of them)
- Read-only root filesystems contain compromises: even if an attacker achieves RCE, they can't modify the image's files, and anything written to a tmpfs mount disappears when the container stops
- tmpfs makes read-only practical: identify the paths your application legitimately writes to and carve out in-memory mounts for them
- Automate detection: human review of Docker Compose files will miss things; static analysis applies the same checks to every file on every commit
- Audit all services, not just the flagged one: misconfigurations tend to be systemic, not isolated

Container security doesn't require a security team or expensive tooling to get right. It requires building good defaults into your templates, automating checks in CI/CD, and treating each service configuration as a security-relevant artifact — because it is.

Stay curious, stay secure. 🔒

This vulnerability was automatically detected and fixed as part of an ongoing security hardening initiative. Automated security tooling identified the misconfiguration in docker-compose.yml and generated the remediation pull request.

cwe	CWE-250 (Execution with Unnecessary Privileges)
fix	Add security_opt no-new-privileges:true and read_only: true with tmpfs for writable paths
risk	Compromised container can escalate privileges and execute malicious payloads
language	Docker Compose YAML
root cause	Missing no-new-privileges security option and writable root filesystem
vulnerability	Docker Container Privilege Escalation
