Path Traversal Vulnerability Fixed in Hatch-Pet Scripts: A Deep Dive

Introduction

Imagine handing a stranger the keys to your house and saying, "You can only go in the kitchen." Now imagine they discover the keys also open every other room — and you never knew. That's essentially what a path traversal vulnerability does to your application's filesystem.

A path traversal vulnerability (also known as a directory traversal attack) allows an attacker to manipulate file path references in your application to access files and directories stored outside the intended folder. In the worst cases, this means reading sensitive configuration files, credentials, or even overwriting critical system files.

This post breaks down a recently patched high-severity path traversal vulnerability found in skills/hatch-pet/scripts/generate_pet_images.py, explains how it could have been exploited, and walks through the best practices you should adopt to prevent similar issues in your own code.

The Vulnerability Explained

What Happened?

The vulnerability resided in how the hatch-pet script constructed file paths. Specifically, the code joined a run_dir variable with a hardcoded filename (pet_request.json) to build a full file path — something like this:

# Vulnerable pattern
import os

run_dir = get_run_dir()  # Could come from CLI args, API params, or env vars
request_file = os.path.join(run_dir, 'pet_request.json')

with open(request_file, 'r') as f:
    data = f.read()

This looks innocent at first glance. The problem? run_dir was derived from user-controlled input without any validation.

The Anatomy of a Path Traversal Attack

The ../ sequence (or ..\ on Windows) is the key weapon in a path traversal attack. When you join an unvalidated user input with a base path, an attacker can supply sequences like:

../../etc/passwd
../../../var/secrets/.env
../../../../home/user/.ssh/id_rsa

So if an attacker passes run_dir as ../../etc/, the resulting path becomes:

os.path.join("../../etc/", "pet_request.json")
# Resolves to: ../../etc/pet_request.json

This effectively points the file operation outside the intended working directory, potentially exposing or corrupting sensitive system files.

Real-World Impact

Depending on how the vulnerable code is used, an attacker could:

Read sensitive files: /etc/passwd, .env files containing API keys, database credentials, private SSH keys
Overwrite critical files: Configuration files, log files, or even executable scripts
Achieve privilege escalation: By overwriting scripts run by higher-privileged processes
Exfiltrate application secrets: Internal configuration, tokens, or user data

Attack Scenario: Step by Step

Let's walk through a concrete attack scenario:

Setup: The hatch-pet script accepts a --run-dir CLI argument or reads RUN_DIR from an environment variable.

Step 1 — Attacker identifies the input vector:

python generate_pet_images.py --run-dir /tmp/myrun

Step 2 — Attacker crafts a malicious path:

python generate_pet_images.py --run-dir "../../etc/"

Step 3 — Script attempts to read:

# Internally resolves to:
open("../../etc/pet_request.json", 'r')
# Or in a write scenario:
open("../../etc/pet_request.json", 'w')

Step 4 — Data exfiltration or corruption occurs silently, with no error raised by the application.

This is classified as CWE-22: Improper Limitation of a Pathname to a Restricted Directory and is consistently listed in the OWASP Top 10 under A01:2021 – Broken Access Control.

The Fix

What Changed?

The fix introduces proper path validation before the run_dir variable is used to construct any file path. The core of the solution is to:

Resolve the absolute, canonical path of the constructed file path
Verify it falls within the expected base directory
Reject any path that escapes the intended directory

Here's what a secure implementation looks like:

# BEFORE (Vulnerable)
import os

def get_request_file(run_dir):
    return os.path.join(run_dir, 'pet_request.json')

run_dir = args.run_dir  # Unvalidated user input
request_file = get_request_file(run_dir)

with open(request_file, 'r') as f:
    data = f.read()

# AFTER (Secure)
import os

# Define a strict base directory for all run operations
BASE_RUNS_DIR = os.path.abspath('/app/runs')

def get_safe_request_file(run_dir: str) -> str:
    """
    Safely construct a file path, ensuring it stays within BASE_RUNS_DIR.
    Raises ValueError if the resolved path escapes the base directory.
    """
    # Resolve the absolute path (resolves all ../ sequences)
    abs_run_dir = os.path.realpath(os.path.abspath(run_dir))

    # Ensure the resolved path starts with our allowed base directory
    if not abs_run_dir.startswith(BASE_RUNS_DIR + os.sep):
        raise ValueError(
            f"Invalid run_dir: path '{run_dir}' escapes the allowed directory."
        )

    return os.path.join(abs_run_dir, 'pet_request.json')

# Usage
try:
    request_file = get_safe_request_file(args.run_dir)
    with open(request_file, 'r') as f:
        data = f.read()
except ValueError as e:
    print(f"Security error: {e}")
    sys.exit(1)

Why This Fix Works

The critical line is:

abs_run_dir = os.path.realpath(os.path.abspath(run_dir))

os.path.abspath() converts the path to an absolute path based on the current working directory
os.path.realpath() resolves all symbolic links and normalizes ../ sequences

After this resolution, ../../etc/ becomes /etc/ — and the subsequent check:

if not abs_run_dir.startswith(BASE_RUNS_DIR + os.sep):

...immediately catches that /etc/ is not within /app/runs/ and raises an error before any file operation occurs.

Note: Always append os.sep when checking startswith() to avoid false positives where a directory like /app/runs-extra might incorrectly match a check for /app/runs.

Prevention & Best Practices

1. Always Validate and Canonicalize File Paths

Never trust user-supplied paths directly. Always resolve to an absolute, canonical path and verify it falls within your expected directory:

import os

def is_safe_path(base_dir: str, user_path: str) -> bool:
    base = os.path.realpath(os.path.abspath(base_dir))
    target = os.path.realpath(os.path.abspath(user_path))
    return target.startswith(base + os.sep)

2. Use Allowlists, Not Blocklists

Instead of trying to block ../ sequences (which can be bypassed with URL encoding like %2e%2e%2f), use an allowlist approach — define exactly what's permitted and reject everything else.

# BAD: Blocklist approach (bypassable)
if '../' in run_dir or '..\' in run_dir:
    raise ValueError("Invalid path")

# GOOD: Allowlist via canonical path check
if not is_safe_path(BASE_RUNS_DIR, run_dir):
    raise ValueError("Path escapes allowed directory")

3. Apply the Principle of Least Privilege

Your application should only have filesystem access to the directories it genuinely needs. Use OS-level controls:

Run processes as a dedicated low-privilege user
Use chroot jails or containers to limit filesystem visibility
Apply strict directory permissions so even a traversal can't access sensitive files

4. Treat All External Input as Untrusted

Path traversal inputs can come from many sources that developers often overlook:

✅ CLI arguments
✅ Environment variables
✅ API request parameters
✅ Configuration files loaded from external sources
✅ Data read from databases or queues

Any of these can be attacker-controlled. Validate them all.

5. Use Security Scanning Tools

Integrate static analysis tools into your CI/CD pipeline to catch these issues before they reach production:

Tool	Language	Notes
Bandit	Python	Specifically detects path traversal patterns
Semgrep	Multi-language	Highly customizable rules
CodeQL	Multi-language	Deep semantic analysis
Snyk Code	Multi-language	Developer-friendly SAST

6. Write Security-Focused Tests

Add test cases that specifically attempt path traversal:

import pytest

def test_path_traversal_rejected():
    """Ensure path traversal attempts are blocked."""
    malicious_inputs = [
        "../../etc/passwd",
        "../secret",
        "/absolute/path/outside",
        "valid/../../../etc",
        "%2e%2e%2fetc",  # URL-encoded traversal
    ]
    for malicious_input in malicious_inputs:
        with pytest.raises(ValueError, match="escapes the allowed directory"):
            get_safe_request_file(malicious_input)

def test_valid_run_dir_accepted():
    """Ensure legitimate paths still work."""
    valid_input = "/app/runs/job-12345"
    result = get_safe_request_file(valid_input)
    assert result == "/app/runs/job-12345/pet_request.json"

Security Standards & References

CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')
OWASP Path Traversal: OWASP Testing Guide - Path Traversal
OWASP Top 10 A01:2021: Broken Access Control
NIST SP 800-53: Input Validation controls (SI-10)

Conclusion

Path traversal vulnerabilities are a classic yet persistently dangerous class of security flaw. They're often introduced through seemingly harmless code — a simple os.path.join() call that doesn't account for what a malicious actor might supply as input.

The key lessons from this vulnerability and its fix are:

Never trust user input — validate and canonicalize all file paths before use
Use os.path.realpath() + boundary checks to enforce directory restrictions in Python
Allowlists beat blocklists — define what's permitted, reject everything else
Least privilege matters — limit what your process can access at the OS level
Automate detection — integrate SAST tools and write security-specific tests

Security vulnerabilities like this one are fixed one patch at a time, but the real win comes from building a culture where developers understand why these patterns are dangerous and proactively avoid them. The next time you write code that touches the filesystem with user-supplied input, take a moment to ask: "What happens if someone passes ../../etc/ here?"

That moment of reflection might just save your application — and your users.

This vulnerability was identified and patched as part of an automated security scanning process. Automated tools are a valuable layer of defense, but they work best when paired with developer education and secure-by-default coding practices.

Path Traversal Vulnerability Fixed in Hatch-Pet Scripts: A Deep Dive

Path Traversal Vulnerability Fixed in Hatch-Pet Scripts: A Deep Dive

Introduction

The Vulnerability Explained

What Happened?

The Anatomy of a Path Traversal Attack

Real-World Impact

Attack Scenario: Step by Step

The Fix

What Changed?

Why This Fix Works

Prevention & Best Practices

1. Always Validate and Canonicalize File Paths

2. Use Allowlists, Not Blocklists

3. Apply the Principle of Least Privilege

4. Treat All External Input as Untrusted

5. Use Security Scanning Tools

6. Write Security-Focused Tests

Security Standards & References

Conclusion

View the Security Fix

Related Articles

Stack Buffer Overflow in MapScale: How Five Unsafe sprintf Calls Created a Critical Vulnerability

Heap Buffer Overflows in YAML Parser: How Unchecked memcpy Calls Create Critical Attack Vectors

Critical Buffer Overflow Fixed: When "Safe" Functions Aren't Safe