Supply Chain Attack via Unsafe subprocess in CI/CD Hooks: How It Works and How It Was Fixed
Introduction
Imagine merging what looks like a routine pull request — a small config tweak, perhaps a new hook script path — only to discover that it silently redirected your CI/CD runner to execute an attacker-controlled binary. No exploit kit required. No zero-day. Just a misconfigured subprocess.run() call and a YAML file that nobody thought to validate.
This is exactly the class of vulnerability that was discovered and patched in graphify/hooks.py. Rated high severity, it represents one of the most dangerous patterns in modern software development: a supply-chain attack vector baked directly into the developer tooling itself.
If you write Python that executes external scripts, runs shell commands, or processes configuration files from repositories — this post is for you.
The Vulnerability Explained
What Happened?
At line 154 of graphify/hooks.py, the application called subprocess.run() to execute external hook scripts. The path to those scripts was read from a configuration file — something like a .graphify/hooks.yaml in the repository root.
Here's the core problem: the script path was never validated.
In simplified terms, the vulnerable code looked something like this:
# VULNERABLE - Do not use
import subprocess
import yaml

def run_hook(config_path: str):
    with open(config_path) as f:
        config = yaml.safe_load(f)
    hook_script = config.get("hook_script")  # User-controlled value!
    # No validation: executes whatever path the config specifies
    subprocess.run([hook_script], check=True)
If an attacker could modify hooks.yaml — for example, by merging a pull request — they could set hook_script to point to any executable on the filesystem:
# Malicious hooks.yaml
hook_script: "/tmp/malicious_payload.sh"
Or even more subtly:
hook_script: "../../.git/hooks/post-merge"
How Could It Be Exploited?
The attack chain is straightforward and alarmingly practical:
- Attacker forks a repository that uses graphify with hook support enabled.
- Attacker submits a pull request that modifies .graphify/hooks.yaml to point hook_script at a malicious binary they've also included in the PR (or one already present on the runner).
- CI/CD pipeline automatically checks out and processes the PR, as most modern pipelines do for automated testing and linting.
- graphify reads the hooks config and calls subprocess.run() with the attacker-supplied path.
- Arbitrary code executes on the CI/CD runner with whatever privileges the pipeline process holds.
From here, the attacker can:
- Exfiltrate secrets, API keys, and environment variables
- Tamper with build artifacts (injecting malicious code into your releases)
- Pivot to internal infrastructure accessible from the runner
- Establish persistence for future attacks
Why Is This Especially Dangerous in CI/CD?
CI/CD runners are high-value targets. They typically have access to:
- Repository secrets (deploy keys, signing certificates, cloud credentials)
- Package registries (the ability to publish releases)
- Internal networks (staging environments, databases, internal APIs)
A single malicious commit that gets processed by an automated pipeline can compromise all of the above — without ever requiring direct access to your infrastructure.
Attacks on build pipelines are not hypothetical. The SolarWinds attack and the Codecov breach are high-profile examples of what happens when build pipeline trust is misplaced.
CWE and OWASP Classification
- CWE-78: Improper Neutralization of Special Elements used in an OS Command ('OS Command Injection')
- CWE-426: Untrusted Search Path
- OWASP A03:2021 – Injection
- OWASP A08:2021 – Software and Data Integrity Failures (Supply Chain)
The Fix
The patch to graphify/hooks.py introduces path allowlisting — validating the resolved, absolute path of any hook script against a set of permitted directories before passing it to subprocess.run().
Before (Vulnerable Pattern)
# BEFORE: No validation of the hook script path
import subprocess
import yaml

def run_hook(config_path: str):
    with open(config_path) as f:
        config = yaml.safe_load(f)
    hook_script = config.get("hook_script")
    subprocess.run([hook_script], check=True)  # ⚠️ Unvalidated user input
After (Secure Pattern)
# AFTER: Path validated against an allowlist before execution
import subprocess
import yaml
from pathlib import Path

# Define permitted base directories for hook scripts
ALLOWED_HOOK_DIRS = [
    Path("/opt/graphify/hooks").resolve(),
    # Relative entries are resolved against the working directory at import time
    Path("./graphify/default_hooks").resolve(),
]

class HookPathViolation(Exception):
    """Raised when a hook script path fails allowlist validation."""

def _validate_hook_path(script_path: str) -> Path:
    """
    Resolve the script path and verify it falls within an allowed directory.
    Raises HookPathViolation if the path is outside permitted locations.
    """
    resolved = Path(script_path).resolve()  # Resolves symlinks and ".." traversal
    for allowed_dir in ALLOWED_HOOK_DIRS:
        try:
            resolved.relative_to(allowed_dir)  # Raises ValueError if not a subpath
            return resolved
        except ValueError:
            continue
    raise HookPathViolation(
        f"Hook script '{script_path}' (resolved: '{resolved}') is outside "
        f"permitted directories: {ALLOWED_HOOK_DIRS}"
    )

def run_hook(config_path: str):
    with open(config_path) as f:
        config = yaml.safe_load(f)
    raw_hook_script = config.get("hook_script")
    if not raw_hook_script:
        return  # No hook configured, nothing to do
    # Validate before executing; raises if the path is not permitted
    safe_script_path = _validate_hook_path(raw_hook_script)
    subprocess.run([str(safe_script_path)], check=True)  # ✅ Safe to execute
Key Security Improvements
| Aspect | Before | After |
|---|---|---|
| Path validation | None | Allowlist against permitted directories |
| Symlink resolution | Not handled | Path.resolve() prevents symlink escapes |
| Directory traversal | Vulnerable to ../../ | Resolved absolute path checked |
| Error handling | None: any configured path was executed | Explicit HookPathViolation exception |
| Auditability | None | Clear, loggable rejection of invalid paths |
Why Path.resolve() Matters
A naive check like script_path.startswith("/opt/graphify/hooks") can be bypassed with path traversal:
/opt/graphify/hooks/../../../etc/passwd
Calling Path(script_path).resolve() first collapses all .. components and follows symlinks, giving you the true filesystem path before comparison. This is essential for any path-based security check.
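To make the difference concrete, here is a minimal sketch comparing the two checks. The function names `naive_check` and `resolved_check`, the temporary directory, and the candidate file names are all illustrative assumptions, not code from graphify. Besides traversal, it also shows a second weakness of prefix matching: a sibling directory whose name merely starts with the allowed one.

```python
import tempfile
from pathlib import Path


def naive_check(script_path: str, allowed_dir: str) -> bool:
    """String-prefix check: looks only at the characters, not the filesystem."""
    return script_path.startswith(allowed_dir)


def resolved_check(script_path: str, allowed_dir: str) -> bool:
    """Resolve both paths first, then require allowed_dir to be a true ancestor."""
    resolved = Path(script_path).resolve()
    base = Path(allowed_dir).resolve()
    return base in resolved.parents


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical allowed directory created just for this demonstration.
    allowed = str(Path(tmp).resolve() / "hooks")
    Path(allowed).mkdir()

    candidates = {
        "legitimate": allowed + "/lint.sh",
        "traversal": allowed + "/../outside.sh",
        "sibling_prefix": allowed + "_evil/run.sh",
    }
    # Each entry maps to (naive result, resolved result).
    results = {
        name: (naive_check(path, allowed), resolved_check(path, allowed))
        for name, path in candidates.items()
    }

for name, (naive_ok, resolved_ok) in results.items():
    print(f"{name}: naive={naive_ok} resolved={resolved_ok}")
```

The prefix check accepts all three candidates, while the resolved check accepts only the legitimate one. Comparing against `resolved.parents` (or using `relative_to`, as the patch does) works on whole path components, which is why the `hooks_evil` sibling is rejected.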
Prevention & Best Practices
1. Never Trust Configuration File Contents
Configuration files — especially those committed to a repository — should be treated as untrusted user input. Any value read from them that influences execution (file paths, command names, arguments) must be validated.
# Treat config values like HTTP request parameters: validate everything
hook_script = config.get("hook_script")
if not isinstance(hook_script, str):
    # Raise explicitly; assert statements are removed when Python runs with -O
    raise TypeError("hook_script must be a string")
safe_path = _validate_hook_path(hook_script)  # Validate before use
2. Prefer Allowlists Over Denylists
It's tempting to block "dangerous" paths like /etc/ or /tmp/. Don't. Attackers are creative, and denylists are always incomplete. Instead, define exactly where scripts are permitted to live and reject everything else.
# ❌ Denylist — incomplete and bypassable
BLOCKED_PATHS = ["/etc", "/tmp", "/root"]
# ✅ Allowlist — explicit and safe
ALLOWED_DIRS = [Path("/opt/myapp/hooks").resolve()]
3. Apply Principle of Least Privilege to CI/CD Runners
Even with the fix in place, defense-in-depth matters:
- Run CI/CD jobs in ephemeral, isolated containers or VMs
- Use read-only repository checkouts where possible
- Scope secrets to only the jobs that need them
- Enable branch protection rules to require reviews before merging config changes
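Secret scoping can also be applied inside the tool itself, at the point where the hook is launched. The sketch below is one possible approach, not part of the graphify patch: it passes the child process an explicit, minimal environment instead of letting it inherit everything the runner holds. The names `SAFE_ENV_VARS`, `build_hook_env`, `run_hook_with_minimal_env`, and the `DEPLOY_TOKEN` variable are hypothetical, and the right set of variables will depend on your platform and hooks.

```python
import os
import subprocess
import sys

# Hypothetical allowlist of environment variables a hook may see.
# SYSTEMROOT is included because Python on Windows needs it; it is absent elsewhere.
SAFE_ENV_VARS = ("PATH", "HOME", "LANG", "SYSTEMROOT")


def build_hook_env(parent_env):
    """Copy only allowlisted variables so secrets in the parent are not inherited."""
    return {name: parent_env[name] for name in SAFE_ENV_VARS if name in parent_env}


def run_hook_with_minimal_env(argv, timeout_seconds=60):
    """Run an already-validated hook with a scrubbed environment and a timeout."""
    return subprocess.run(
        argv,
        env=build_hook_env(os.environ),
        timeout=timeout_seconds,
        check=True,
        capture_output=True,
        text=True,
    )


# Demonstration: a fake secret in the parent environment is invisible to the child.
os.environ["DEPLOY_TOKEN"] = "not-a-real-secret"
child_code = "import os; print('DEPLOY_TOKEN' in os.environ)"
completed = run_hook_with_minimal_env([sys.executable, "-c", child_code])
print(completed.stdout.strip())
```

This does not replace path validation or runner isolation. It only reduces what a hook can read from its environment if something else goes wrong.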
4. Audit subprocess Usage Regularly
Search your codebase for dangerous patterns:
# Find subprocess calls that might use variable input
grep -rn "subprocess\.\(run\|call\|Popen\|check_output\)" . \
--include="*.py" | grep -v "shell=False"
Better yet, use static analysis tools:
- Bandit (pip install bandit), which flags subprocess misuse (rules B603 and B604)
- Semgrep with the python.lang.security.audit.subprocess-shell-true rule
- Safety for dependency vulnerability scanning
# Run Bandit on your project
bandit -r ./graphify -t B603,B604,B607
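If you want something between grep and a full scanner, the standard library's `ast` module can flag subprocess calls whose command is not a literal. The following is a rough sketch under simplifying assumptions: the function name `find_dynamic_subprocess_calls` is made up for illustration, and it only recognizes the `subprocess.<function>(...)` spelling, so `from subprocess import run` or aliased imports would be missed. Treat it as a triage aid, not a substitute for Bandit or Semgrep.

```python
import ast

SUBPROCESS_FUNCS = {"run", "call", "check_call", "check_output", "Popen"}


def find_dynamic_subprocess_calls(source: str):
    """Return sorted (line, function) pairs for subprocess.* calls with a non-literal command."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_subprocess_call = (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id == "subprocess"
            and func.attr in SUBPROCESS_FUNCS
        )
        if not is_subprocess_call or not node.args:
            continue
        command = node.args[0]
        # Treat a constant, or a list/tuple made only of constants, as a literal command.
        if isinstance(command, ast.Constant):
            continue
        if isinstance(command, (ast.List, ast.Tuple)) and all(
            isinstance(element, ast.Constant) for element in command.elts
        ):
            continue
        findings.append((node.lineno, func.attr))
    return sorted(findings)


# The sample is only parsed, never executed, so undefined names are fine here.
sample = '''
import subprocess
subprocess.run(["git", "status"], check=True)
subprocess.run([hook_script], check=True)
subprocess.Popen(user_command)
'''
print(find_dynamic_subprocess_calls(sample))
```

On the sample, the literal `git status` call is skipped and the two calls built from variables are reported with their line numbers, which is the pattern behind the vulnerability discussed in this post.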
5. Consider Sandboxing Hook Execution
For applications that genuinely need to run user-provided scripts, consider sandboxing:
- Docker containers with limited capabilities and no network access
- seccomp profiles to restrict syscalls
- firejail or bubblewrap for lightweight sandboxing on Linux
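As an illustration of the bubblewrap option, the sketch below only builds the argument list for a sandboxed hook run; it does not execute anything. The function name `build_bwrap_argv` and the choice of mounts are assumptions for the example. Which directories a hook needs (and whether your distribution requires extra binds or symlinks for /bin and /lib) varies, so check the result against the bwrap man page on your system before relying on it.

```python
from pathlib import Path


def build_bwrap_argv(script: Path, hooks_dir: Path) -> list:
    """Build a bwrap command line that runs `script` with a read-only, network-less view."""
    return [
        "bwrap",
        "--unshare-all",        # new namespaces, including the network namespace
        "--die-with-parent",    # kill the sandbox if the parent process exits
        "--new-session",        # detach from the controlling terminal
        "--ro-bind", "/usr", "/usr",
        "--ro-bind", str(hooks_dir), str(hooks_dir),
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",      # private scratch space, discarded afterwards
        str(script),
    ]


# Hypothetical paths, reusing the allowed directory from the patch above.
hooks_dir = Path("/opt/graphify/hooks")
argv = build_bwrap_argv(hooks_dir / "lint.sh", hooks_dir)
print(" ".join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)` after the script path has passed allowlist validation. Sandboxing complements that validation rather than replacing it.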
6. Security Standards to Reference
- OWASP Command Injection Prevention Cheat Sheet
- CWE-78: OS Command Injection
- SLSA Supply Chain Security Framework
- OpenSSF Scorecard — automated supply chain risk assessment
A Note on the Related Credential Storage Issue
While this post focuses on the subprocess vulnerability (V-003), it's worth briefly acknowledging that a separate critical vulnerability (V-001) was also identified: OAuth tokens and API keys being stored in plaintext on the local filesystem in graphify/extract.py.
Plaintext credential storage is a serious companion risk — if an attacker achieves code execution via the hook injection described above, any plaintext credentials on disk become immediately accessible. Defense-in-depth means fixing both issues: prevent the execution, and encrypt the credentials so that even a successful breach yields less value to the attacker.
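The recommended remediation for V-001 is to encrypt stored credentials with a key derived from a secret by a key derivation function such as PBKDF2 (already available in the project's Rust dependencies). PBKDF2 itself only derives the key; the encryption must be done by a cipher that uses it.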
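For readers unfamiliar with key derivation, here is a small Python sketch of the PBKDF2 step using only the standard library. It is illustrative and not the project's Rust implementation: the function name `derive_key`, the iteration count, and the salt size are assumptions, and the derived key would still need to be passed to an authenticated cipher from a cryptography library, which is not shown.

```python
import hashlib
import os

PBKDF2_ITERATIONS = 600_000  # assumed work factor; tune to your own guidance and hardware
KEY_LENGTH_BYTES = 32        # 256-bit key, a common size for symmetric ciphers


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a fixed-length key from a passphrase with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        salt,
        PBKDF2_ITERATIONS,
        dklen=KEY_LENGTH_BYTES,
    )


# A random salt is generated once and stored next to the ciphertext; it is not secret.
salt = os.urandom(16)
key = derive_key("example passphrase", salt)
print(len(key))
```

The same passphrase and salt always produce the same key, which is what lets the application decrypt later, while a fresh salt per credential store prevents precomputed attacks on the passphrase.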
Conclusion
The vulnerability fixed in this PR is a textbook example of why input validation must happen at every trust boundary — not just at HTTP endpoints, but anywhere your application reads data that influences execution. A configuration file in a repository is just as dangerous as a form field on a website if its contents are blindly trusted.
The key takeaways:
- ✅ Always validate file paths read from configuration against an explicit allowlist
- ✅ Use Path.resolve() before any path comparison to prevent traversal and symlink attacks
- ✅ Treat CI/CD pipelines as high-value targets: they have access to your most sensitive secrets
- ✅ Apply defense-in-depth: input validation + least privilege + sandboxing + secret scoping
- ✅ Automate security scanning with tools like Bandit and Semgrep to catch these patterns before they reach production
Supply chain attacks are not theoretical. They are happening today, at scale, against real organizations. The cost of adding a 10-line path validation function is essentially zero. The cost of not adding it can be catastrophic.
Write the validation. Merge the fix. Sleep better.
This vulnerability was identified and patched by OrbisAI Security. Automated security scanning combined with LLM-assisted code review confirmed both the vulnerability and the effectiveness of the fix.