High severity · 8 min read

Supply Chain Attack via Unsafe subprocess in CI/CD Hooks: Fixed

A high-severity vulnerability in `graphify/hooks.py` allowed attackers to achieve arbitrary code execution on CI/CD runners by injecting malicious hook script paths through a user-controlled configuration file. The fix introduces strict path validation against an allowlist of permitted directories before any subprocess execution. This kind of supply-chain attack vector is increasingly common and can silently compromise entire build pipelines with a single malicious commit.

By orbisai0security
May 6, 2026
#security #supply-chain #subprocess #python #ci-cd #code-injection #arbitrary-code-execution

Supply Chain Attack via Unsafe subprocess in CI/CD Hooks: How It Works and How It Was Fixed

Introduction

Imagine merging what looks like a routine pull request — a small config tweak, perhaps a new hook script path — only to discover that it silently redirected your CI/CD runner to execute an attacker-controlled binary. No exploit kit required. No zero-day. Just a misconfigured subprocess.run() call and a YAML file that nobody thought to validate.

This is exactly the class of vulnerability that was discovered and patched in graphify/hooks.py. Rated high severity, it represents one of the most dangerous patterns in modern software development: a supply-chain attack vector baked directly into the developer tooling itself.

If you write Python that executes external scripts, runs shell commands, or processes configuration files from repositories — this post is for you.


The Vulnerability Explained

What Happened?

At line 154 of graphify/hooks.py, the application called subprocess.run() to execute external hook scripts. The path to those scripts was read from a configuration file — something like a .graphify/hooks.yaml in the repository root.

Here's the core problem: the script path was never validated.

In simplified terms, the vulnerable code looked something like this:

# VULNERABLE - Do not use
import subprocess
import yaml

def run_hook(config_path: str):
    with open(config_path) as f:
        config = yaml.safe_load(f)

    hook_script = config.get("hook_script")  # User-controlled value!

    # No validation — executes whatever path the config specifies
    subprocess.run([hook_script], check=True)

If an attacker could modify hooks.yaml — for example, by merging a pull request — they could set hook_script to point to any executable on the filesystem:

# Malicious hooks.yaml
hook_script: "/tmp/malicious_payload.sh"

Or even more subtly:

hook_script: "../../.git/hooks/post-merge"

How Could It Be Exploited?

The attack chain is straightforward and alarmingly practical:

  1. Attacker forks a repository that uses graphify with hook support enabled.
  2. Attacker submits a pull request that modifies .graphify/hooks.yaml to point hook_script at a malicious binary they've also included in the PR (or one already present on the runner).
  3. CI/CD pipeline automatically checks out and processes the PR — as most modern pipelines do for automated testing and linting.
  4. graphify reads the hooks config and calls subprocess.run() with the attacker-supplied path.
  5. Arbitrary code executes on the CI/CD runner with whatever privileges the pipeline process holds.

From here, the attacker can:
- Exfiltrate secrets, API keys, and environment variables
- Tamper with build artifacts (injecting malicious code into your releases)
- Pivot to internal infrastructure accessible from the runner
- Establish persistence for future attacks
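The vulnerable half of this chain can be reproduced in miniature with a few lines of Python. This is a self-contained sketch, not graphify's actual code: the config is a plain dict rather than parsed YAML (to avoid a PyYAML dependency), and the "malicious binary" is a throwaway shell script, so it assumes a POSIX runner.

```python
import os
import stat
import subprocess
import tempfile

def run_hook_vulnerable(config: dict) -> subprocess.CompletedProcess:
    """The vulnerable pattern: execute whatever path the config names."""
    hook_script = config.get("hook_script")  # attacker-controlled
    return subprocess.run([hook_script], check=True, capture_output=True, text=True)

# Step 2: the "malicious binary" the attacker ships alongside the config change.
payload = tempfile.NamedTemporaryFile(mode="w", suffix=".sh", delete=False)
payload.write("#!/bin/sh\necho PWNED: secrets exfiltrated\n")
payload.close()
os.chmod(payload.name, os.stat(payload.name).st_mode | stat.S_IXUSR)

# Steps 3-5: the runner loads the attacker's config and executes the payload.
malicious_config = {"hook_script": payload.name}
result = run_hook_vulnerable(malicious_config)
print(result.stdout)  # PWNED: secrets exfiltrated

os.unlink(payload.name)
```

In a real pipeline the payload would read environment variables and post them to an attacker-controlled endpoint; the echo stands in for that step.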

Why Is This Especially Dangerous in CI/CD?

CI/CD runners are high-value targets. They typically have access to:
- Repository secrets (deploy keys, signing certificates, cloud credentials)
- Package registries (the ability to publish releases)
- Internal networks (staging environments, databases, internal APIs)

A single malicious commit that gets processed by an automated pipeline can compromise all of the above — without ever requiring direct access to your infrastructure.

This attack pattern has been used in real-world incidents against major open-source projects. The SolarWinds attack and the Codecov breach are high-profile examples of what happens when build-pipeline trust is misplaced.

CWE and OWASP Classification

  • CWE-78: Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')
  • CWE-426: Untrusted Search Path
  • OWASP A03:2021 – Injection
  • OWASP A08:2021 – Software and Data Integrity Failures (Supply Chain)

The Fix

The patch to graphify/hooks.py introduces path allowlisting — validating the resolved, absolute path of any hook script against a set of permitted directories before passing it to subprocess.run().

Before (Vulnerable Pattern)

# BEFORE: No validation of the hook script path
import subprocess
import yaml

def run_hook(config_path: str):
    with open(config_path) as f:
        config = yaml.safe_load(f)

    hook_script = config.get("hook_script")
    subprocess.run([hook_script], check=True)  # ⚠️ Unvalidated user input

After (Secure Pattern)

# AFTER: Path validated against an allowlist before execution
import subprocess
import yaml
from pathlib import Path

# Define permitted base directories for hook scripts
ALLOWED_HOOK_DIRS = [
    Path("/opt/graphify/hooks").resolve(),
    Path("./graphify/default_hooks").resolve(),  # resolved against the process CWD at import time
]

class HookPathViolation(Exception):
    """Raised when a hook script path fails allowlist validation."""

def _validate_hook_path(script_path: str) -> Path:
    """
    Resolve the script path and verify it falls within an allowed directory.
    Raises HookPathViolation if the path is outside permitted locations.
    """
    resolved = Path(script_path).resolve()  # Resolves symlinks and ".." traversal

    for allowed_dir in ALLOWED_HOOK_DIRS:
        try:
            resolved.relative_to(allowed_dir)  # Raises ValueError if not a subpath
            return resolved
        except ValueError:
            continue

    raise HookPathViolation(
        f"Hook script '{script_path}' (resolved: '{resolved}') is outside "
        f"permitted directories: {ALLOWED_HOOK_DIRS}"
    )

def run_hook(config_path: str):
    with open(config_path) as f:
        config = yaml.safe_load(f)

    raw_hook_script = config.get("hook_script")
    if not raw_hook_script:
        return  # No hook configured, nothing to do

    # Validate before executing — raises if path is not permitted
    safe_script_path = _validate_hook_path(raw_hook_script)

    subprocess.run([str(safe_script_path)], check=True)  # ✅ Safe to execute

Key Security Improvements

Aspect              | Before               | After
--------------------|----------------------|--------------------------------------------
Path validation     | None                 | Allowlist against permitted directories
Symlink resolution  | Not handled          | Path.resolve() prevents symlink escapes
Directory traversal | Vulnerable to ../../ | Resolved absolute path checked
Error handling      | Silent failure       | Explicit HookPathViolation exception
Auditability        | None                 | Clear, loggable rejection of invalid paths

Why Path.resolve() Matters

A naive check like script_path.startswith("/opt/graphify/hooks") can be bypassed with path traversal:

/opt/graphify/hooks/../../etc/passwd

Calling Path(script_path).resolve() first collapses all .. components and follows symlinks, giving you the true filesystem path before comparison. This is essential for any path-based security check.
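A short, standalone demonstration of the difference (the /opt/graphify/hooks directory need not exist; resolve() normalizes the ".." components lexically when the path components are absent):

```python
from pathlib import Path

allowed = Path("/opt/graphify/hooks")
candidate = "/opt/graphify/hooks/../../../etc/passwd"

# The naive prefix check is fooled by traversal:
print(candidate.startswith(str(allowed)))  # True: the check passes, attack succeeds

# resolve() collapses the ".." components first:
resolved = Path(candidate).resolve()
print(resolved)  # /etc/passwd on a typical Linux layout

# relative_to() then rejects anything outside the allowed directory:
try:
    resolved.relative_to(allowed)
    inside = True
except ValueError:
    inside = False
print(inside)  # False: correctly rejected
```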


Prevention & Best Practices

1. Never Trust Configuration File Contents

Configuration files — especially those committed to a repository — should be treated as untrusted user input. Any value read from them that influences execution (file paths, command names, arguments) must be validated.

# Treat config values like HTTP request parameters — validate everything
hook_script = config.get("hook_script")
if not isinstance(hook_script, str):
    raise TypeError("hook_script must be a string")  # asserts vanish under `python -O`
safe_path = _validate_hook_path(hook_script)  # Validate before use

2. Prefer Allowlists Over Denylists

It's tempting to block "dangerous" paths like /etc/ or /tmp/. Don't. Attackers are creative, and denylists are always incomplete. Instead, define exactly where scripts are permitted to live and reject everything else.

# ❌ Denylist — incomplete and bypassable
BLOCKED_PATHS = ["/etc", "/tmp", "/root"]

# ✅ Allowlist — explicit and safe
ALLOWED_DIRS = [Path("/opt/myapp/hooks").resolve()]

3. Apply Principle of Least Privilege to CI/CD Runners

Even with the fix in place, defense-in-depth matters:
- Run CI/CD jobs in ephemeral, isolated containers or VMs
- Use read-only repository checkouts where possible
- Scope secrets to only the jobs that need them
- Enable branch protection rules to require reviews before merging config changes

4. Audit subprocess Usage Regularly

Search your codebase for dangerous patterns:

# Find subprocess calls, excluding lines that explicitly pass shell=False
grep -rn "subprocess\.\(run\|call\|Popen\|check_output\)" . \
  --include="*.py" | grep -v "shell=False"

Better yet, use static analysis tools:
- Bandit (pip install bandit) — specifically checks for subprocess misuse (rules B603, B604)
- Semgrep with the python.lang.security.audit.subprocess-shell-true rule
- Safety for dependency vulnerability scanning

# Run Bandit on your project
bandit -r ./graphify -t B603,B604,B607

5. Consider Sandboxing Hook Execution

For applications that genuinely need to run user-provided scripts, consider sandboxing:
- Docker containers with limited capabilities and no network access
- seccomp profiles to restrict syscalls
- firejail or bubblewrap for lightweight sandboxing on Linux
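As an illustration, here is a hypothetical helper that wraps a hook invocation in a locked-down Docker container. The image tag, mount point, and resource limits are arbitrary choices for the sketch; the flags themselves (--network none, --cap-drop ALL, --read-only) are standard `docker run` options. The command is only constructed here, not executed.

```python
from pathlib import Path

def build_sandboxed_command(hook_script: Path) -> list[str]:
    """Build a `docker run` argv that executes the hook with no network,
    no capabilities, and a read-only filesystem."""
    return [
        "docker", "run", "--rm",
        "--network", "none",        # no network access
        "--cap-drop", "ALL",        # drop all Linux capabilities
        "--read-only",              # read-only root filesystem
        "--memory", "256m",         # cap memory usage
        "--pids-limit", "64",       # cap process count
        "-v", f"{hook_script.parent}:/hooks:ro",  # mount hook dir read-only
        "alpine:3.19",
        "/bin/sh", f"/hooks/{hook_script.name}",
    ]

cmd = build_sandboxed_command(Path("/opt/graphify/hooks/post_build.sh"))
print(" ".join(cmd))
```

Even if a malicious hook slips past path validation, a sandbox like this sharply limits what it can reach.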

6. Security Standards to Reference

When filing remediation tickets or audit reports, cite the classifications listed earlier: CWE-78 (OS Command Injection), CWE-426 (Untrusted Search Path), OWASP A03:2021 (Injection), and OWASP A08:2021 (Software and Data Integrity Failures).

A Note on the Related Credential Storage Issue

While this post focuses on the subprocess vulnerability (V-003), it's worth briefly acknowledging that a separate critical vulnerability (V-001) was also identified: OAuth tokens and API keys being stored in plaintext on the local filesystem in graphify/extract.py.

Plaintext credential storage is a serious companion risk — if an attacker achieves code execution via the hook injection described above, any plaintext credentials on disk become immediately accessible. Defense-in-depth means fixing both issues: prevent the execution, and encrypt the credentials so that even a successful breach yields less value to the attacker.

Encrypting stored credentials using a key derivation function like PBKDF2 (already available in the project's Rust dependencies) is the recommended remediation for V-001.
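For illustration only, the key-derivation step looks something like the sketch below, using Python's standard library rather than the project's Rust dependencies. It derives a 32-byte key from a passphrase; the actual encryption of the token with that key (e.g. AES-GCM via the `cryptography` package) is omitted.

```python
import hashlib
import os
from typing import Optional

def derive_key(passphrase: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Derive a 32-byte key from a passphrase with PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)   # unique random salt per credential store
    key = hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        salt,
        600_000,                # iteration count in OWASP's recommended range
        dklen=32,
    )
    return key, salt

key, salt = derive_key("correct horse battery staple")
print(len(key), len(salt))  # 32 16
```

Re-deriving with the same passphrase and salt yields the same key, which is what lets the credentials be decrypted later without ever storing the key itself.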


Conclusion

The vulnerability fixed in this PR is a textbook example of why input validation must happen at every trust boundary — not just at HTTP endpoints, but anywhere your application reads data that influences execution. A configuration file in a repository is just as dangerous as a form field on a website if its contents are blindly trusted.

The key takeaways:

  • Always validate file paths read from configuration against an explicit allowlist
  • Use Path.resolve() before any path comparison to prevent traversal and symlink attacks
  • Treat CI/CD pipelines as high-value targets — they have access to your most sensitive secrets
  • Apply defense-in-depth: input validation + least privilege + sandboxing + secret scoping
  • Automate security scanning with tools like Bandit and Semgrep to catch these patterns before they reach production

Supply chain attacks are not theoretical. They are happening today, at scale, against real organizations. The cost of adding a 10-line path validation function is essentially zero. The cost of not adding it can be catastrophic.

Write the validation. Merge the fix. Sleep better.


This vulnerability was identified and patched by OrbisAI Security. Automated security scanning combined with LLM-assisted code review confirmed both the vulnerability and the effectiveness of the fix.

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #747

Related Articles

critical

Stack Buffer Overflow in MapScale: How Five Unsafe sprintf Calls Created a Critical Vulnerability

A critical stack-based buffer overflow vulnerability was discovered and patched in `src/mapscale.c`, where five unbounded `sprintf` calls wrote formatted output into fixed-size stack buffers without any bounds checking. An attacker controlling unit text strings could overflow the stack buffer, potentially overwriting the function return address and achieving arbitrary code execution. The fix replaces dangerous `sprintf` calls with their bounds-checked counterparts, eliminating the overflow risk.

critical

Heap Buffer Overflows in YAML Parser: How Unchecked memcpy Calls Create Critical Attack Vectors

A critical heap buffer overflow vulnerability was discovered and patched in the YAML parser embedded within an Android VPN application, where five unvalidated `memcpy` calls could allow an attacker to corrupt heap memory by supplying a crafted YAML configuration file. This class of vulnerability is particularly dangerous because it can lead to arbitrary code execution or application crashes in security-sensitive contexts. The fix adds proper bounds validation before each copy operation, eliminating…

critical

Critical Buffer Overflow Fixed: When "Safe" Functions Aren't Safe

A critical vulnerability in DeepSkyStackerKernel's StackWalker.cpp was silently replacing bounds-checking string functions with their unsafe counterparts via preprocessor macros, exposing the entire codebase to buffer overflow attacks. This fix removes the dangerous macro definitions that discarded buffer size arguments, restoring the intended memory safety protections across all call sites. Understanding how this subtle macro trick works is essential for any C/C++ developer working with string…