What is a path traversal vulnerability?

A path traversal vulnerability occurs when user-controlled input is used to construct a file path without sanitization, allowing attackers to navigate outside the intended directory using sequences like `../` to access arbitrary files.

How do you prevent path traversal in Python?

Use `os.path.realpath()` or `pathlib.Path.resolve()` to normalize the path, then verify it starts with the expected base directory before passing it to `open()`.

What CWE is path traversal?

Path traversal is classified as CWE-22: Improper Limitation of a Pathname to a Restricted Directory.

Is input length validation enough to prevent path traversal?

No. Length validation alone does not prevent path traversal. You must also normalize the path and enforce that it resolves within an allowed base directory.

Can static analysis detect path traversal?

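Yes. Tools that perform taint analysis, such as Semgrep (in taint mode) and Orbis AppSec, can trace data from user input to dangerous sinks like `open()` and flag unvalidated path usage automatically. Bandit matches patterns in the syntax tree rather than tracking data flow, so it works best as a complement to taint analysis.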

How path traversal in open() happens in Python and how to fix it

Summary

A high-severity path traversal vulnerability was discovered in tool/update-doc.py, where user-controlled input was passed directly to Python's open() function without sanitization. This flaw could allow an attacker to read arbitrary files on the server by manipulating the file path. The fix ensures that file paths are validated and restricted to an intended directory before being opened.

Introduction

The tool/update-doc.py script handles documentation updates — a utility role that might seem low-risk at first glance. But a flaw in how it processes file paths created a serious security hole: user-controlled input flowed directly into a call to open() with no sanitization, no boundary checking, and no path normalization.

This is a textbook path traversal vulnerability. The script trusts the caller to provide a safe file path, but an attacker who controls that input can supply something like ../../../../etc/passwd and cause the application to open files it was never meant to touch.

For developers building documentation tools, CLI utilities, or any script that accepts filenames from external sources, this pattern is surprisingly easy to introduce and surprisingly dangerous to leave unaddressed.

The Vulnerability Explained

What went wrong in `update-doc.py`

The vulnerable pattern looks like this:

# Vulnerable code — user-controlled `doc_path` passed directly to open()
doc_path = request.args.get("doc")  # or sys.argv, or any external source
with open(doc_path, "r") as f:
    content = f.read()

The problem is straightforward: the value of doc_path comes from user-controlled input and is used directly as the argument to open(). Python's open() function will happily accept any valid filesystem path — including ones that traverse parent directories using ../ sequences.

There is no call to os.path.realpath(), no check that the resolved path starts with a trusted base directory, and no filtering of dangerous path components. The script simply trusts the input.

How an attacker exploits this

Imagine the script is intended to open documentation files from a directory like /app/docs/. An attacker who can influence the doc_path value could supply:

../../etc/passwd

Which resolves to:

/app/docs/../../etc/passwd  →  /etc/passwd
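
The resolution can be reproduced without touching the filesystem. The sketch below joins the example base directory with the attacker's input and normalizes the result; it uses `posixpath` so the output is the same on any platform, and `/app/docs` is only the illustrative directory from this example.

```python
import posixpath

BASE_DIR = "/app/docs"            # illustrative base directory from the example
user_input = "../../etc/passwd"   # attacker-supplied value

# Joining keeps the ../ components; normalizing collapses them.
joined = posixpath.join(BASE_DIR, user_input)
resolved = posixpath.normpath(joined)

print(joined)    # /app/docs/../../etc/passwd
print(resolved)  # /etc/passwd
```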

open() passes this path to the operating system, which resolves the ../ components and returns the contents of /etc/passwd. On a real system, this technique can be used to read:

/etc/passwd — system user accounts
~/.ssh/id_rsa — private SSH keys
Application config files — database credentials, API keys, secrets
Other source files — internal business logic, hardcoded tokens

The attacker doesn't need authentication, special privileges, or a complex exploit chain. They just need to control the string passed to open().

Why `update-doc.py` is a realistic target

Documentation update scripts often run with broad filesystem permissions because they need to write generated files, and they frequently accept filenames from command-line arguments, web requests, or configuration files, all of which can be attacker-influenced. The combination of broad access and unsanitized input makes this a high-value target.

The Fix

What needs to change

The fix requires two things:
1. Normalize the path — resolve .. sequences and symlinks to get the true absolute path.
2. Enforce a boundary — confirm the resolved path is inside the intended base directory before opening it.

Before (vulnerable)

# No sanitization — attacker controls doc_path entirely
doc_path = get_user_input()
with open(doc_path, "r") as f:
    content = f.read()

After (fixed)

import os

BASE_DIR = os.path.realpath("/app/docs")

def safe_open_doc(user_input):
    # Resolve the full, normalized absolute path
    requested_path = os.path.realpath(os.path.join(BASE_DIR, user_input))

    # Enforce that the resolved path is within the allowed base directory
    if not requested_path.startswith(BASE_DIR + os.sep):
        raise ValueError(f"Access denied: path traversal detected in '{user_input}'")

    with open(requested_path, "r") as f:
        return f.read()
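
One way to exercise this pattern is against a throwaway directory. The sketch below is a variation, not the code from the repository: `safe_read` and `rejected` are hypothetical helpers, the base directory is passed as a parameter so a temporary directory can stand in for /app/docs, and the symlink step assumes a POSIX system.

```python
import os
import tempfile

def safe_read(base_dir, user_input):
    """Hypothetical variant of safe_open_doc with the base directory as a parameter."""
    base = os.path.realpath(base_dir)
    requested = os.path.realpath(os.path.join(base, user_input))
    if not requested.startswith(base + os.sep):
        raise ValueError("path escapes the base directory")
    with open(requested, "r", encoding="utf-8") as f:
        return f.read()

def rejected(base_dir, user_input):
    """Return True if safe_read refuses the input with ValueError."""
    try:
        safe_read(base_dir, user_input)
    except ValueError:
        return True
    return False

with tempfile.TemporaryDirectory() as root:
    docs = os.path.join(root, "docs")
    os.mkdir(docs)
    with open(os.path.join(docs, "intro.md"), "w", encoding="utf-8") as f:
        f.write("hello")
    secret = os.path.join(root, "secret.txt")
    with open(secret, "w", encoding="utf-8") as f:
        f.write("top secret")
    # A symlink inside docs/ that points to a file outside it.
    os.symlink(secret, os.path.join(docs, "link.md"))

    results = {
        "intro.md": safe_read(docs, "intro.md"),           # legitimate file
        "../secret.txt": rejected(docs, "../secret.txt"),  # relative traversal
        "link.md": rejected(docs, "link.md"),              # symlink escape
        "absolute": rejected(docs, secret),                # absolute path input
    }

print(results)
```

The absolute-path case is worth noting: `os.path.join()` discards the base directory when the second argument is absolute, so it is the boundary check, not the join, that blocks it.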

Why this fix works

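os.path.realpath() resolves all ../ sequences, symlinks, and redundant separators, giving you the true absolute path on the filesystem. Once it has been applied, double-dot tricks and symlinks can no longer disguise where the path points. It does not decode URL encoding, however: if the input may be percent-encoded (for example %2e%2e%2f), decode it before this step so that the check sees the same string the filesystem will.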
startswith(BASE_DIR + os.sep) ensures the resolved path is genuinely inside the base directory, not just a string that happens to start with the same prefix. The + os.sep prevents a bypass where /app/docs_evil would incorrectly match /app/docs.
The check happens before open() is called, so the file is never opened if the path is out of bounds.
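
The prefix pitfall is easy to demonstrate with plain strings. The sketch below assumes a POSIX system (where `os.sep` is `/`) and also shows `os.path.commonpath()` as an alternative boundary check; `is_inside` is a hypothetical helper, not part of the original fix.

```python
import os.path

BASE_DIR = "/app/docs"
sibling = "/app/docs_evil/secret.txt"

# Without the trailing separator, a sibling directory passes the check.
naive_ok = sibling.startswith(BASE_DIR)
# With the separator appended, the sibling is rejected.
strict_ok = sibling.startswith(BASE_DIR + os.sep)

def is_inside(base, candidate):
    """Hypothetical helper: True if candidate is base or lies beneath it.

    Both arguments are expected to be absolute, already-resolved paths.
    """
    return os.path.commonpath([base, candidate]) == base

print(naive_ok, strict_ok)                              # True False
print(is_inside(BASE_DIR, "/app/docs/guide/intro.md"))  # True
print(is_inside(BASE_DIR, sibling))                     # False
```

`commonpath()` compares whole path components rather than characters, so it does not need the separator trick.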

Using pathlib offers an equally clean and Pythonic approach, though the version below needs Python 3.9+ because it calls is_relative_to():

from pathlib import Path

BASE_DIR = Path("/app/docs").resolve()

def safe_open_doc(user_input):
    requested_path = (BASE_DIR / user_input).resolve()

    if not requested_path.is_relative_to(BASE_DIR):
        raise ValueError("Path traversal detected")

    return requested_path.read_text()

Path.is_relative_to() (Python 3.9+) makes the boundary check explicit and readable.
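
On interpreters older than 3.9, the same boundary check can be written with `Path.relative_to()`, which raises `ValueError` when the path is not under the base. A minimal sketch, where `is_within` is a hypothetical helper name:

```python
from pathlib import Path

def is_within(base: Path, candidate: Path) -> bool:
    """Hypothetical helper: True if candidate resolves to base or a path under it."""
    try:
        # relative_to() raises ValueError when candidate is not under base.
        candidate.resolve().relative_to(base.resolve())
    except ValueError:
        return False
    return True
```

Both paths are resolved first, so ../ components and symlinks are collapsed before the comparison.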

Prevention & Best Practices

1. Always normalize before validating

Raw string comparisons on file paths are unreliable. /app/docs/../../etc/passwd and /etc/passwd are the same file, but a string comparison won't catch that. Always call os.path.realpath() or Path.resolve() first.

2. Define an explicit allow-list of base directories

Don't check what a path doesn't contain (e.g., filtering out ../). Instead, check what it does resolve to. Blocklist approaches are fragile — there are many encoding tricks (%2e%2e, null bytes on older systems, Unicode normalization) that can bypass them.
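
The fragility of blocklists shows up even in a toy filter. In the sketch below, `naive_filter` is a hypothetical blocklist that deletes every literal `../` in one pass; it is included only to show how such a filter fails.

```python
from urllib.parse import unquote

def naive_filter(value: str) -> str:
    """Hypothetical blocklist: delete every literal '../' in a single pass."""
    return value.replace("../", "")

# Nested sequence: removing the inner '../' leaves a new '../' behind.
print(naive_filter("....//etc/passwd"))   # ../etc/passwd

# Percent-encoded input contains no literal '../', so the filter changes nothing;
# a later URL-decoding step then reintroduces the traversal.
encoded = "%2e%2e%2fetc%2fpasswd"
print(unquote(naive_filter(encoded)))     # ../etc/passwd
```

Checking where the fully decoded, resolved path ends up avoids having to anticipate each of these spellings.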

3. Use the principle of least privilege

If update-doc.py only needs to read files from /app/docs/, run it with a filesystem user that only has read access to that directory. Defense in depth means even a bypassed path check doesn't expose the entire filesystem.

4. Validate at the entry point

Sanitize file path inputs as close to the source as possible — when reading from sys.argv, an HTTP request, or a config file. Don't rely on downstream code to catch bad input.
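
For a command-line tool, one place to do this is the argument parser itself. The sketch below is an assumption about how such a script could be structured, not how update-doc.py is written: `doc_path` and `build_parser` are hypothetical names, and /app/docs is the illustrative base directory used throughout this article.

```python
import argparse
import os

BASE_DIR = os.path.realpath("/app/docs")  # illustrative base directory

def doc_path(value: str) -> str:
    """argparse type callable: return the resolved path or reject the argument."""
    resolved = os.path.realpath(os.path.join(BASE_DIR, value))
    if not resolved.startswith(BASE_DIR + os.sep):
        raise argparse.ArgumentTypeError(
            f"{value!r} is outside the documentation directory"
        )
    return resolved

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Hypothetical CLI wrapper")
    parser.add_argument("doc", type=doc_path,
                        help="path relative to the docs directory")
    return parser
```

With this arrangement, `build_parser().parse_args(["../../etc/passwd"])` exits with a usage error before any other code sees the value, and the rest of the script only ever receives a resolved path inside the base directory.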

5. Use static analysis to catch taint flows

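Tools that perform taint analysis, tracking user-controlled data from source to sink, are especially effective at catching path traversal. Semgrep (in taint mode) and Orbis AppSec can identify when unsanitized input reaches open(). Bandit matches patterns in the syntax tree rather than tracking data flow, so treat it as a complement to taint analysis rather than a replacement.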

Relevant standards

CWE-22: Improper Limitation of a Pathname to a Restricted Directory ("Path Traversal")
OWASP Top 10 A01:2021 — Broken Access Control (path traversal is a key subcategory)
OWASP Path Traversal attack page: description, examples, and prevention guidance

Key Takeaways

open(user_input) in update-doc.py is a direct path traversal sink — any user-controlled string reaching this call without normalization and boundary checking is exploitable.
String filtering for ../ is not a safe mitigation — attackers can encode, double-encode, or use OS-specific tricks to bypass blocklists. Use realpath() + prefix check instead.
Documentation and utility scripts are not low-risk — tools like update-doc.py often run with elevated permissions and accept filenames as input, making them attractive targets.
The startswith(BASE_DIR + os.sep) pattern is critical — omitting + os.sep creates a bypass where sibling directories with similar names pass the check incorrectly.
pathlib.Path.is_relative_to() is the modern, readable way to enforce directory boundaries in Python 3.9+ and should be preferred in new code.

How Orbis AppSec Detected This

Source: User-controlled input (e.g., command-line argument, request parameter, or external config value) providing the file path in tool/update-doc.py
Sink: open(doc_path, ...) called with the unsanitized user-controlled path, allowing arbitrary file reads
Missing control: No call to os.path.realpath() or equivalent normalization; no check that the resolved path falls within an intended base directory before the file is opened
CWE: CWE-22 — Improper Limitation of a Pathname to a Restricted Directory (Path Traversal)
Fix: Normalize the input path with os.path.realpath() and verify it resolves within the allowed base directory before passing it to open()

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.

Conclusion

Path traversal in Python's open() is one of those vulnerabilities that looks simple but carries serious consequences. In tool/update-doc.py, the absence of a single normalization-and-boundary-check pattern was enough to expose every file the process could read to anyone who could influence the file path argument.

The fix is not complex — os.path.realpath() combined with a startswith() check, or pathlib's is_relative_to(), closes the door entirely. The lesson for developers is to treat every file path that originates from outside your code as untrusted, normalize it unconditionally, and verify it resolves where you expect before acting on it.

Security in file handling isn't about trusting your users. It's about building code that stays safe regardless of what input it receives.

Vulnerability at a Glance

CWE: CWE-22
Fix: Validate and restrict file paths to an intended base directory before calling open()
Risk: Attackers can read arbitrary files on the server, including credentials, config files, and source code
Language: Python
Root cause: User-controlled input passed directly to open() without path sanitization or boundary enforcement
Vulnerability: Path Traversal via open()
