High severity · 7 min read

How path traversal in open() happens in Python and how to fix it

A high-severity path traversal vulnerability was discovered in `tool/update-doc.py`, where user-controlled input was passed directly to Python's `open()` function without sanitization. This flaw could allow an attacker to read arbitrary files on the server by manipulating the file path. The fix ensures that file paths are validated and restricted to an intended directory before being opened.

By Orbis AppSec
Published June 6, 2026 · Reviewed June 6, 2026

Answer Summary

This is a path traversal vulnerability (CWE-22) in Python's `open()` function inside `tool/update-doc.py`, where user-controlled input was used as a file path without sanitization. An attacker could supply a path like `../../etc/passwd` to read arbitrary files outside the intended directory. The fix involves sanitizing or validating the input path — typically by resolving it to an absolute path and confirming it remains within an allowed base directory before calling `open()`. Orbis AppSec automatically detected and patched this issue.

Vulnerability at a Glance

  • CWE: CWE-22
  • Vulnerability: Path traversal via open()
  • Language: Python
  • Root cause: User-controlled input passed directly to open() without path sanitization or boundary enforcement
  • Risk: Attackers can read arbitrary files on the server, including credentials, config files, and source code
  • Fix: Validate and restrict file paths to an intended base directory before calling open()



Introduction

The tool/update-doc.py script handles documentation updates — a utility role that might seem low-risk at first glance. But a flaw in how it processes file paths created a serious security hole: user-controlled input flowed directly into a call to open() with no sanitization, no boundary checking, and no path normalization.

This is a textbook path traversal vulnerability. The script trusts the caller to provide a safe file path, but an attacker who controls that input can supply something like ../../../../etc/passwd and cause the application to open files it was never meant to touch.

For developers building documentation tools, CLI utilities, or any script that accepts filenames from external sources, this pattern is surprisingly easy to introduce and surprisingly dangerous to leave unaddressed.


The Vulnerability Explained

What went wrong in update-doc.py

The vulnerable pattern looks like this:

import sys

# Vulnerable code: user-controlled `doc_path` is passed directly to open()
doc_path = sys.argv[1]  # or a request parameter, a config value, or any other external source
with open(doc_path, "r") as f:
    content = f.read()

The problem is straightforward: the value of doc_path comes from user-controlled input and is used directly as the argument to open(). Python's open() function will happily accept any valid filesystem path — including ones that traverse parent directories using ../ sequences.

There is no call to os.path.realpath(), no check that the resolved path starts with a trusted base directory, and no filtering of dangerous path components. The script simply trusts the input.
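The following sketch shows why this matters. It builds a throwaway directory tree containing a hypothetical docs/ folder with a secret.txt file beside it. It then reads the secret through a ../ path, the same way the vulnerable snippet would. The layout, the file names, and the read_like_vulnerable_script() helper are assumptions for illustration. Nothing here comes from the real update-doc.py.

```python
import os
import tempfile


def read_like_vulnerable_script(doc_path):
    # Mirrors the vulnerable pattern: the caller-supplied path goes straight to open()
    with open(doc_path, "r") as f:
        return f.read()


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/docs/ is the intended directory, <root>/secret.txt is outside it
    docs_dir = os.path.join(root, "docs")
    os.makedirs(docs_dir)
    with open(os.path.join(root, "secret.txt"), "w") as f:
        f.write("db_password=hunter2")

    # "docs/../secret.txt": the ".." climbs out of docs/ and the OS follows it silently
    traversal = os.path.join(docs_dir, "..", "secret.txt")
    leaked = read_like_vulnerable_script(traversal)
    print(leaked)  # db_password=hunter2
```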

How an attacker exploits this

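Imagine the script is intended to open documentation files from a directory like /app/docs/. This happens either because it runs with that directory as its working directory or because it joins the input onto it. An attacker who can influence the doc_path value could supply:

../../etc/passwd

Relative to /app/docs/, that path resolves as follows: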

/app/docs/../../etc/passwd  →  /etc/passwd
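The standard library's path helpers reproduce the resolution above. This sketch uses posixpath so the output is identical on any operating system, and /app/docs is a hypothetical base directory. It also shows a second, simpler attack. When a script joins input onto a base directory, an absolute input such as /etc/passwd makes join() discard the base entirely.

```python
import posixpath

BASE = "/app/docs"  # hypothetical documentation directory

# A relative traversal collapses to a path outside BASE once ".." is resolved
relative_attack = posixpath.normpath(posixpath.join(BASE, "../../etc/passwd"))
print(relative_attack)  # /etc/passwd

# An absolute input makes join() throw BASE away altogether
absolute_attack = posixpath.join(BASE, "/etc/passwd")
print(absolute_attack)  # /etc/passwd
```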

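The operating system resolves these ../ components transparently, so open() succeeds and the subsequent read() returns the contents of /etc/passwd. On a real system, this technique can read any file the process has permission to read. For example: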

  • /etc/passwd — system user accounts
  • ~/.ssh/id_rsa — private SSH keys
  • Application config files — database credentials, API keys, secrets
  • Other source files — internal business logic, hardcoded tokens

The attacker doesn't need authentication, special privileges, or a complex exploit chain. They just need to control the string passed to open().

Why update-doc.py is a realistic target

Documentation update scripts often run under accounts with broad filesystem access, because they need to write to the documentation tree. They also frequently accept filenames from command-line arguments, web requests, or configuration files, and an attacker can influence all of these. The combination of wide access and unsanitized input makes this a high-value target.


The Fix

What needs to change

The fix requires two things:
1. Normalize the path — resolve .. sequences and symlinks to get the true absolute path.
2. Enforce a boundary — confirm the resolved path is inside the intended base directory before opening it.

Before (vulnerable)

import sys

# No sanitization: the attacker controls doc_path entirely
doc_path = sys.argv[1]
with open(doc_path, "r") as f:
    content = f.read()

After (fixed)

import os

BASE_DIR = os.path.realpath("/app/docs")

def safe_open_doc(user_input):
    # Resolve the full, normalized absolute path
    requested_path = os.path.realpath(os.path.join(BASE_DIR, user_input))

    # Enforce that the resolved path is within the allowed base directory
    if not requested_path.startswith(BASE_DIR + os.sep):
        raise ValueError(f"Access denied: path traversal detected in '{user_input}'")

    with open(requested_path, "r") as f:
        return f.read()
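One way to sanity-check the fixed function is to run it against a temporary directory. The sketch below is a lightly adapted copy of safe_open_doc. The base_dir parameter, the is_rejected() helper, and the file layout are assumptions added so it runs anywhere. The function accepts an ordinary file. It rejects both a ../ traversal into a sibling directory named docs_evil and an absolute path to the same file.

```python
import os
import tempfile


def safe_open_doc(base_dir, user_input):
    # Same logic as the fix above, with the base directory passed in for testing
    base = os.path.realpath(base_dir)
    requested_path = os.path.realpath(os.path.join(base, user_input))
    if not requested_path.startswith(base + os.sep):
        raise ValueError(f"Access denied: path traversal detected in '{user_input}'")
    with open(requested_path, "r") as f:
        return f.read()


def is_rejected(base_dir, user_input):
    # True if safe_open_doc refuses the path before opening anything
    try:
        safe_open_doc(base_dir, user_input)
    except ValueError:
        return True
    return False


with tempfile.TemporaryDirectory() as root:
    docs = os.path.join(root, "docs")
    os.makedirs(docs)
    os.makedirs(os.path.join(root, "docs_evil"))
    with open(os.path.join(docs, "guide.md"), "w") as f:
        f.write("public docs")
    with open(os.path.join(root, "docs_evil", "x.txt"), "w") as f:
        f.write("not a doc")

    ok_content = safe_open_doc(docs, "guide.md")
    traversal_rejected = is_rejected(docs, "../docs_evil/x.txt")
    absolute_rejected = is_rejected(docs, os.path.join(root, "docs_evil", "x.txt"))
    print(ok_content, traversal_rejected, absolute_rejected)  # public docs True True
```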

Why this fix works

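  • os.path.realpath() resolves all ../ sequences, symlinks, and redundant separators, giving you the true absolute path on the filesystem. The boundary check runs on this final path, which is the same one open() will use, so double-dot tricks cannot slip past it. Any decoding of the input (for example of %2e%2e in a URL) must happen before this step so that the check sees exactly what the filesystem will.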
  • startswith(BASE_DIR + os.sep) ensures the resolved path is genuinely inside the base directory, not just a string that happens to start with the same prefix. The + os.sep prevents a bypass where /app/docs_evil would incorrectly match /app/docs.
  • The check happens before open() is called, so the file is never opened if the path is out of bounds.
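The separator detail is easy to get wrong. Here is a small string-level sketch of the bypass. It uses posixpath and a hypothetical /app/docs base so the results do not depend on the host OS. The sketch also shows commonpath(), a standard-library alternative that the original fix does not use. It compares whole path components rather than characters. It is called via posixpath here, and os.path.commonpath is the same function on POSIX systems.

```python
import posixpath

BASE_DIR = "/app/docs"  # hypothetical base directory, already normalized
attacker_path = "/app/docs_evil/payload.txt"
legit_path = "/app/docs/guide.md"

# Naive prefix check: passes, because "/app/docs_evil" begins with the characters "/app/docs"
naive_ok = attacker_path.startswith(BASE_DIR)

# Separator-aware check: fails, because the next character is "_" rather than "/"
strict_ok = attacker_path.startswith(BASE_DIR + posixpath.sep)

# commonpath() compares whole components: the shared prefix here is only "/app"
common_ok = posixpath.commonpath([BASE_DIR, attacker_path]) == BASE_DIR
legit_common_ok = posixpath.commonpath([BASE_DIR, legit_path]) == BASE_DIR

print(naive_ok, strict_ok, common_ok, legit_common_ok)  # True False False True
```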

The standard library's pathlib module offers an equally clean and Pythonic approach:

from pathlib import Path

BASE_DIR = Path("/app/docs").resolve()

def safe_open_doc(user_input):
    requested_path = (BASE_DIR / user_input).resolve()

    if not requested_path.is_relative_to(BASE_DIR):
        raise ValueError("Path traversal detected")

    return requested_path.read_text()

Path.is_relative_to() (Python 3.9+) makes the boundary check explicit and readable.
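Path.is_relative_to() is unavailable before Python 3.9. On older interpreters, Path.relative_to() gives the same answer by raising ValueError when the path is not under the base. The sketch below wraps that in a hypothetical is_within() helper and uses it in a parameterised variant of the pathlib function. The name safe_read_doc and the file layout are illustrative assumptions. The sketch then exercises the function against a temporary directory.

```python
import tempfile
from pathlib import Path


def is_within(path, base):
    # Same result as path.is_relative_to(base), but also works before Python 3.9
    try:
        path.relative_to(base)
        return True
    except ValueError:
        return False


def safe_read_doc(base_dir, user_input):
    base = Path(base_dir).resolve()
    requested_path = (base / user_input).resolve()
    if not is_within(requested_path, base):
        raise ValueError("Path traversal detected")
    return requested_path.read_text()


with tempfile.TemporaryDirectory() as root:
    docs = Path(root, "docs")
    docs.mkdir()
    (docs / "guide.md").write_text("public docs")
    Path(root, "secret.txt").write_text("db_password=hunter2")

    content = safe_read_doc(docs, "guide.md")
    try:
        safe_read_doc(docs, "../secret.txt")
        blocked = False
    except ValueError:
        blocked = True
    print(content, blocked)  # public docs True
```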


Prevention & Best Practices

1. Always normalize before validating

Raw string comparisons on file paths are unreliable. /app/docs/../../etc/passwd and /etc/passwd name the same file, but a string comparison won't catch that. Always call os.path.realpath() or Path.resolve() first.
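A short sketch of the pitfall uses a hypothetical /app/docs/ base. The raw string passes a prefix check, while the normalized form shows the path points elsewhere. The sketch uses posixpath.normpath() only because it needs no real files. It is purely lexical and does not resolve symlinks, so real code should still use os.path.realpath() or Path.resolve().

```python
import posixpath

BASE_DIR = "/app/docs/"  # hypothetical base directory (with trailing separator)
candidate = "/app/docs/../../etc/passwd"

# Raw string check: fooled, because the text really does begin with BASE_DIR
raw_check = candidate.startswith(BASE_DIR)

# Normalizing first collapses the ".." components and reveals the real target
normalized = posixpath.normpath(candidate)
normalized_check = normalized.startswith(BASE_DIR)

print(raw_check, normalized, normalized_check)  # True /etc/passwd False
```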

2. Define an explicit allow-list of base directories

Don't check what a path doesn't contain (e.g., filtering out ../). Instead, check what it does resolve to. Blocklist approaches are fragile — there are many encoding tricks (%2e%2e, null bytes on older systems, Unicode normalization) that can bypass them.
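A deliberately naive filter shows how fragile blocklists are. It strips every ../ from the input and is a hypothetical example, not something to use. A payload built from ....// collapses back into ../ after a single pass. A percent-encoded payload passes the filter untouched and only turns into ../ if some later layer decodes it.

```python
from urllib.parse import unquote


def naive_filter(user_input):
    # Hypothetical blocklist "fix": strip every "../" in one pass (do NOT use this)
    return user_input.replace("../", "")


# Nested payload: removing the inner "../" leaves a fresh "../" behind
filtered = naive_filter("....//....//etc/passwd")
print(filtered)  # ../../etc/passwd

# Encoded payload: contains no literal "../", so the filter leaves it unchanged
encoded = "%2e%2e%2f%2e%2e%2fetc/passwd"
passes_filter = naive_filter(encoded) == encoded
print(passes_filter)  # True

# A later decoding step turns it back into a traversal
decoded = unquote(encoded)
print(decoded)  # ../../etc/passwd
```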

3. Use the principle of least privilege

If update-doc.py only needs to read files from /app/docs/, run it under an OS account that can read that directory and little else. Defense in depth means even a bypassed path check doesn't expose the entire filesystem.

4. Validate at the entry point

Sanitize file path inputs as close to the source as possible — when reading from sys.argv, an HTTP request, or a config file. Don't rely on downstream code to catch bad input.
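For a command-line script, one way to validate at the entry point is to give argparse a type callable that rejects out-of-bounds paths while arguments are parsed. This is a sketch under assumptions. The real update-doc.py interface is not shown in this post, so the prog name, the doc argument, and the doc_path_type() and build_parser() helpers are hypothetical.

```python
import argparse
import os
import tempfile


def doc_path_type(base_dir):
    # Build an argparse "type" callable bound to a specific base directory
    base = os.path.realpath(base_dir)

    def validate(value):
        resolved = os.path.realpath(os.path.join(base, value))
        if not resolved.startswith(base + os.sep):
            # argparse turns this into a usage error for the user
            raise argparse.ArgumentTypeError(f"path escapes {base}: {value!r}")
        return resolved

    return validate


def build_parser(base_dir):
    parser = argparse.ArgumentParser(prog="update-doc")
    parser.add_argument(
        "doc",
        type=doc_path_type(base_dir),
        help="documentation file, relative to the docs directory",
    )
    return parser


with tempfile.TemporaryDirectory() as docs:
    parser = build_parser(docs)

    # An in-bounds name is accepted and handed on as an absolute, resolved path
    accepted = parser.parse_args(["guide.md"]).doc

    # An out-of-bounds name makes argparse print an error and exit with status 2
    try:
        parser.parse_args(["../../etc/passwd"])
        rejected = False
    except SystemExit:
        rejected = True
```

Because argparse converts ArgumentTypeError into a usage error and exits, a rejected path never reaches the rest of the script.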

5. Use static analysis to catch taint flows

Tools that perform taint analysis track user-controlled data from source to sink, which makes them especially effective at catching path traversal. Semgrep (in taint mode) and Orbis AppSec can identify when unsanitized input reaches open(). Pattern-based linters such as Bandit complement them but do not track data flow from source to sink.

Relevant standards

  • CWE-22: Improper Limitation of a Pathname to a Restricted Directory ("Path Traversal")
  • OWASP Top 10 A01:2021 — Broken Access Control (path traversal is a key subcategory)
  • OWASP Path Traversal Cheat Sheet: comprehensive guidance on prevention

Key Takeaways

  • open(user_input) in update-doc.py is a direct path traversal sink — any user-controlled string reaching this call without normalization and boundary checking is exploitable.
  • String filtering for ../ is not a safe mitigation — attackers can encode, double-encode, or use OS-specific tricks to bypass blocklists. Use realpath() + prefix check instead.
  • Documentation and utility scripts are not low-risk — tools like update-doc.py often run with elevated permissions and accept filenames as input, making them attractive targets.
  • The startswith(BASE_DIR + os.sep) pattern is critical — omitting + os.sep creates a bypass where sibling directories with similar names pass the check incorrectly.
  • pathlib.Path.is_relative_to() is the modern, readable way to enforce directory boundaries in Python 3.9+ and should be preferred in new code.

How Orbis AppSec Detected This

  • Source: User-controlled input (e.g., command-line argument, request parameter, or external config value) providing the file path in tool/update-doc.py
  • Sink: open(doc_path, ...) called with the unsanitized user-controlled path, allowing arbitrary file reads
  • Missing control: No call to os.path.realpath() or equivalent normalization; no check that the resolved path falls within an intended base directory before the file is opened
  • CWE: CWE-22 — Improper Limitation of a Pathname to a Restricted Directory (Path Traversal)
  • Fix: Normalize the input path with os.path.realpath() and verify it resolves within the allowed base directory before passing it to open()

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.


Conclusion

Path traversal in Python's open() is one of those vulnerabilities that looks simple but carries serious consequences. In tool/update-doc.py, the absence of a single normalization-and-boundary-check pattern was enough to expose every file the process could read. Anyone who could influence the file path argument could reach them.

The fix is not complex. Combining os.path.realpath() with a startswith() check against the base directory plus os.sep closes this hole, and so does pathlib's is_relative_to(). The lesson for developers is to treat every file path that originates from outside your code as untrusted, normalize it unconditionally, and verify it resolves where you expect before acting on it.

Security in file handling isn't about trusting your users. It's about building code that stays safe regardless of what input it receives.



Frequently Asked Questions

What is a path traversal vulnerability?

A path traversal vulnerability occurs when user-controlled input is used to construct a file path without sanitization, allowing attackers to navigate outside the intended directory using sequences like `../` to access arbitrary files.

How do you prevent path traversal in Python?

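Use `os.path.realpath()` or `pathlib.Path.resolve()` to normalize the path. Then verify that it lies inside the expected base directory before passing it to `open()`. You can do this with `Path.is_relative_to()` or with a `startswith()` check against the base directory plus `os.sep`.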

What CWE is path traversal?

Path traversal is classified as CWE-22: Improper Limitation of a Pathname to a Restricted Directory.

Is input length validation enough to prevent path traversal?

No. Length validation alone does not prevent path traversal. You must also normalize the path and enforce that it resolves within an allowed base directory.

Can static analysis detect path traversal?

Yes. Taint-tracking tools such as Semgrep (in taint mode) and Orbis AppSec can trace data from user input to dangerous sinks like `open()` and flag unvalidated path usage automatically.

View the Security Fix

The pull request that fixed this vulnerability is PR #10.
