Path Traversal Vulnerability Fixed in Hatch-Pet Scripts: A Deep Dive
Introduction
Imagine handing a stranger the keys to your house and saying, "You can only go in the kitchen." Now imagine they discover the keys also open every other room — and you never knew. That's essentially what a path traversal vulnerability does to your application's filesystem.
A path traversal vulnerability (also known as a directory traversal attack) allows an attacker to manipulate file path references in your application to access files and directories stored outside the intended folder. In the worst cases, this means reading sensitive configuration files, credentials, or even overwriting critical system files.
This post breaks down a recently patched high-severity path traversal vulnerability found in skills/hatch-pet/scripts/generate_pet_images.py, explains how it could have been exploited, and walks through the best practices you should adopt to prevent similar issues in your own code.
The Vulnerability Explained
What Happened?
The vulnerability resided in how the hatch-pet script constructed file paths. Specifically, the code joined a run_dir variable with a hardcoded filename (pet_request.json) to build a full file path — something like this:
# Vulnerable pattern
import os
run_dir = get_run_dir() # Could come from CLI args, API params, or env vars
request_file = os.path.join(run_dir, 'pet_request.json')
with open(request_file, 'r') as f:
data = f.read()
This looks innocent at first glance. The problem? run_dir was derived from user-controlled input without any validation.
The Anatomy of a Path Traversal Attack
The ../ sequence (or ..\ on Windows) is the key weapon in a path traversal attack. When you join an unvalidated user input with a base path, an attacker can supply sequences like:
../../etc/passwd
../../../var/secrets/.env
../../../../home/user/.ssh/id_rsa
So if an attacker passes run_dir as ../../etc/, the resulting path becomes:
os.path.join("../../etc/", "pet_request.json")
# Resolves to: ../../etc/pet_request.json
This effectively points the file operation outside the intended working directory, potentially exposing or corrupting sensitive system files.
Real-World Impact
Depending on how the vulnerable code is used, an attacker could:
- Read sensitive files:
/etc/passwd,.envfiles containing API keys, database credentials, private SSH keys - Overwrite critical files: Configuration files, log files, or even executable scripts
- Achieve privilege escalation: By overwriting scripts run by higher-privileged processes
- Exfiltrate application secrets: Internal configuration, tokens, or user data
Attack Scenario: Step by Step
Let's walk through a concrete attack scenario:
Setup: The hatch-pet script accepts a --run-dir CLI argument or reads RUN_DIR from an environment variable.
Step 1 — Attacker identifies the input vector:
python generate_pet_images.py --run-dir /tmp/myrun
Step 2 — Attacker crafts a malicious path:
python generate_pet_images.py --run-dir "../../etc/"
Step 3 — Script attempts to read:
# Internally resolves to:
open("../../etc/pet_request.json", 'r')
# Or in a write scenario:
open("../../etc/pet_request.json", 'w')
Step 4 — Data exfiltration or corruption occurs silently, with no error raised by the application.
This is classified as CWE-22: Improper Limitation of a Pathname to a Restricted Directory and is consistently listed in the OWASP Top 10 under A01:2021 – Broken Access Control.
The Fix
What Changed?
The fix introduces proper path validation before the run_dir variable is used to construct any file path. The core of the solution is to:
- Resolve the absolute, canonical path of the constructed file path
- Verify it falls within the expected base directory
- Reject any path that escapes the intended directory
Here's what a secure implementation looks like:
# BEFORE (Vulnerable)
import os
def get_request_file(run_dir):
return os.path.join(run_dir, 'pet_request.json')
run_dir = args.run_dir # Unvalidated user input
request_file = get_request_file(run_dir)
with open(request_file, 'r') as f:
data = f.read()
# AFTER (Secure)
import os
# Define a strict base directory for all run operations
BASE_RUNS_DIR = os.path.abspath('/app/runs')
def get_safe_request_file(run_dir: str) -> str:
"""
Safely construct a file path, ensuring it stays within BASE_RUNS_DIR.
Raises ValueError if the resolved path escapes the base directory.
"""
# Resolve the absolute path (resolves all ../ sequences)
abs_run_dir = os.path.realpath(os.path.abspath(run_dir))
# Ensure the resolved path starts with our allowed base directory
if not abs_run_dir.startswith(BASE_RUNS_DIR + os.sep):
raise ValueError(
f"Invalid run_dir: path '{run_dir}' escapes the allowed directory."
)
return os.path.join(abs_run_dir, 'pet_request.json')
# Usage
try:
request_file = get_safe_request_file(args.run_dir)
with open(request_file, 'r') as f:
data = f.read()
except ValueError as e:
print(f"Security error: {e}")
sys.exit(1)
Why This Fix Works
The critical line is:
abs_run_dir = os.path.realpath(os.path.abspath(run_dir))
os.path.abspath()converts the path to an absolute path based on the current working directoryos.path.realpath()resolves all symbolic links and normalizes../sequences
After this resolution, ../../etc/ becomes /etc/ — and the subsequent check:
if not abs_run_dir.startswith(BASE_RUNS_DIR + os.sep):
...immediately catches that /etc/ is not within /app/runs/ and raises an error before any file operation occurs.
Note: Always append
os.sepwhen checkingstartswith()to avoid false positives where a directory like/app/runs-extramight incorrectly match a check for/app/runs.
Prevention & Best Practices
1. Always Validate and Canonicalize File Paths
Never trust user-supplied paths directly. Always resolve to an absolute, canonical path and verify it falls within your expected directory:
import os
def is_safe_path(base_dir: str, user_path: str) -> bool:
base = os.path.realpath(os.path.abspath(base_dir))
target = os.path.realpath(os.path.abspath(user_path))
return target.startswith(base + os.sep)
2. Use Allowlists, Not Blocklists
Instead of trying to block ../ sequences (which can be bypassed with URL encoding like %2e%2e%2f), use an allowlist approach — define exactly what's permitted and reject everything else.
# BAD: Blocklist approach (bypassable)
if '../' in run_dir or '..\' in run_dir:
raise ValueError("Invalid path")
# GOOD: Allowlist via canonical path check
if not is_safe_path(BASE_RUNS_DIR, run_dir):
raise ValueError("Path escapes allowed directory")
3. Apply the Principle of Least Privilege
Your application should only have filesystem access to the directories it genuinely needs. Use OS-level controls:
- Run processes as a dedicated low-privilege user
- Use chroot jails or containers to limit filesystem visibility
- Apply strict directory permissions so even a traversal can't access sensitive files
4. Treat All External Input as Untrusted
Path traversal inputs can come from many sources that developers often overlook:
- ✅ CLI arguments
- ✅ Environment variables
- ✅ API request parameters
- ✅ Configuration files loaded from external sources
- ✅ Data read from databases or queues
Any of these can be attacker-controlled. Validate them all.
5. Use Security Scanning Tools
Integrate static analysis tools into your CI/CD pipeline to catch these issues before they reach production:
| Tool | Language | Notes |
|---|---|---|
| Bandit | Python | Specifically detects path traversal patterns |
| Semgrep | Multi-language | Highly customizable rules |
| CodeQL | Multi-language | Deep semantic analysis |
| Snyk Code | Multi-language | Developer-friendly SAST |
6. Write Security-Focused Tests
Add test cases that specifically attempt path traversal:
import pytest
def test_path_traversal_rejected():
"""Ensure path traversal attempts are blocked."""
malicious_inputs = [
"../../etc/passwd",
"../secret",
"/absolute/path/outside",
"valid/../../../etc",
"%2e%2e%2fetc", # URL-encoded traversal
]
for malicious_input in malicious_inputs:
with pytest.raises(ValueError, match="escapes the allowed directory"):
get_safe_request_file(malicious_input)
def test_valid_run_dir_accepted():
"""Ensure legitimate paths still work."""
valid_input = "/app/runs/job-12345"
result = get_safe_request_file(valid_input)
assert result == "/app/runs/job-12345/pet_request.json"
Security Standards & References
- CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')
- OWASP Path Traversal: OWASP Testing Guide - Path Traversal
- OWASP Top 10 A01:2021: Broken Access Control
- NIST SP 800-53: Input Validation controls (SI-10)
Conclusion
Path traversal vulnerabilities are a classic yet persistently dangerous class of security flaw. They're often introduced through seemingly harmless code — a simple os.path.join() call that doesn't account for what a malicious actor might supply as input.
The key lessons from this vulnerability and its fix are:
- Never trust user input — validate and canonicalize all file paths before use
- Use
os.path.realpath()+ boundary checks to enforce directory restrictions in Python - Allowlists beat blocklists — define what's permitted, reject everything else
- Least privilege matters — limit what your process can access at the OS level
- Automate detection — integrate SAST tools and write security-specific tests
Security vulnerabilities like this one are fixed one patch at a time, but the real win comes from building a culture where developers understand why these patterns are dangerous and proactively avoid them. The next time you write code that touches the filesystem with user-supplied input, take a moment to ask: "What happens if someone passes ../../etc/ here?"
That moment of reflection might just save your application — and your users.
This vulnerability was identified and patched as part of an automated security scanning process. Automated tools are a valuable layer of defense, but they work best when paired with developer education and secure-by-default coding practices.