Back to Blog
critical SEVERITY6 min read

Node-tar Path Traversal: How Unicode Collisions Bypass Security Checks

A medium-severity vulnerability in node-tar (CVE-2026-24842) allowed attackers to create arbitrary files outside intended directories by exploiting Unicode path collisions in hardlink security checks. This race condition could enable malicious tar archives to overwrite critical system files, potentially leading to remote code execution or privilege escalation.

O
By orbisai0security
March 6, 2026
#security#node-tar#path-traversal#unicode#cve#nodejs#vulnerability

Introduction

The node-tar package, a widely-used Node.js library for handling tar archives, recently patched a critical security vulnerability that could allow attackers to write files anywhere on your system. With millions of downloads per week and usage across countless npm packages, this vulnerability (CVE-2026-24842) represents a significant supply chain security concern.

If your application extracts tar archives—whether from user uploads, CI/CD pipelines, or third-party sources—you need to understand this vulnerability and ensure you're running a patched version.

The Vulnerability Explained

What is Path Traversal?

Path traversal vulnerabilities occur when an application fails to properly validate file paths, allowing attackers to access or write files outside the intended directory. In tar archives, this typically involves entries with paths like ../../../../etc/passwd that "escape" the extraction directory.

The node-tar Security Mechanism

Node-tar implements security checks to prevent path traversal attacks, including special validation for hardlinks—filesystem entries that point to existing files. The library verifies that hardlink targets don't reference paths outside the extraction directory.

The Unicode Collision Exploit

This vulnerability exploited a subtle race condition involving Unicode path normalization. Here's how the attack worked:

Step 1: Unicode Equivalence
Different Unicode sequences can represent visually identical characters. For example:
- café (using U+00E9: é)
- café (using U+0065 U+0301: e + combining acute accent)

These look identical but have different binary representations.

Step 2: The Race Condition
1. Attacker creates a malicious tar archive with a hardlink entry
2. The hardlink path uses Unicode characters that normalize differently at different stages
3. During the security check, the path appears safe (within bounds)
4. During actual file creation, the normalized path points outside the extraction directory
5. The attacker successfully writes to arbitrary locations

Step 3: Arbitrary File Overwrite
By carefully crafting the Unicode sequences, attackers could:
- Overwrite configuration files
- Replace executable scripts
- Modify package.json or other critical files
- Potentially achieve remote code execution

Real-World Attack Scenario

Imagine a CI/CD pipeline that extracts dependency archives:

// Vulnerable code pattern
const tar = require('tar');

app.post('/upload-package', async (req, res) => {
  // Extract uploaded tar file
  await tar.extract({
    file: req.files.package.path,
    cwd: '/app/packages/'
  });

  res.send('Package installed successfully');
});

An attacker uploads a malicious tar archive containing:

packages/my-package/index.js          (legitimate file)
packages/my-package/../../.bashrc     (hardlink with Unicode collision)

The hardlink security check passes due to Unicode normalization differences, but the actual file creation overwrites /app/.bashrc, potentially executing malicious code on the next shell invocation.

The Fix

What Changed

The fix addresses the Unicode path collision by implementing consistent Unicode normalization across all path validation checks. Specifically:

  1. Consistent Normalization: All paths are normalized to a canonical form (NFC - Normalization Form Canonical Composition) before any security checks
  2. Enhanced Hardlink Validation: Hardlink targets undergo the same normalization process as the paths being checked
  3. Race Condition Prevention: Security checks and file operations now use the same normalized path representation

Security Improvement

The patch ensures that:
- Unicode equivalence cannot bypass security boundaries
- Hardlink validation occurs on canonicalized paths
- No discrepancy exists between check-time and use-time paths (TOCTOU protection)

Before and After Comparison

Before (Vulnerable):

// Simplified representation of vulnerable logic
function isValidHardlink(linkPath, targetPath, extractDir) {
  // Security check on raw paths
  const resolvedTarget = path.resolve(extractDir, targetPath);
  const resolvedExtract = path.resolve(extractDir);

  // Vulnerable: doesn't account for Unicode normalization differences
  return resolvedTarget.startsWith(resolvedExtract);
}

After (Patched):

// Simplified representation of patched logic
function isValidHardlink(linkPath, targetPath, extractDir) {
  // Normalize all paths consistently
  const normalizedTarget = normalizeUnicode(targetPath);
  const normalizedLink = normalizeUnicode(linkPath);

  const resolvedTarget = path.resolve(extractDir, normalizedTarget);
  const resolvedExtract = path.resolve(extractDir);

  // Check occurs on normalized paths
  return resolvedTarget.startsWith(resolvedExtract);
}

function normalizeUnicode(str) {
  // Convert to canonical form
  return str.normalize('NFC');
}

Prevention & Best Practices

Immediate Actions

  1. Update node-tar immediately: Run npm audit fix or manually update to the latest patched version
  2. Check your dependencies: Use npm ls tar to identify all packages depending on node-tar
  3. Review package-lock.json: Ensure transitive dependencies are also updated

Long-Term Security Practices

1. Input Validation and Sanitization

Always validate and sanitize file paths, especially when dealing with archives:

const path = require('path');

function sanitizePath(filePath, baseDir) {
  // Normalize Unicode
  const normalized = filePath.normalize('NFC');

  // Resolve to absolute path
  const resolved = path.resolve(baseDir, normalized);

  // Ensure it's within baseDir
  if (!resolved.startsWith(path.resolve(baseDir) + path.sep)) {
    throw new Error('Path traversal attempt detected');
  }

  return resolved;
}

2. Principle of Least Privilege

Extract archives in isolated, restricted directories:

const tar = require('tar');
const fs = require('fs').promises;

async function safeExtract(archivePath) {
  // Create temporary, isolated extraction directory
  const tempDir = await fs.mkdtemp('/tmp/extract-');

  try {
    await tar.extract({
      file: archivePath,
      cwd: tempDir,
      strict: true,  // Enable strict mode
      filter: (path) => {
        // Additional path validation
        return !path.includes('..');
      }
    });

    // Process extracted files
    // ...
  } finally {
    // Cleanup
    await fs.rm(tempDir, { recursive: true, force: true });
  }
}

3. Security Scanning and Monitoring

  • npm audit: Regularly run npm audit in your CI/CD pipeline
  • Snyk or Dependabot: Use automated dependency scanning tools
  • OWASP Dependency-Check: Integrate into your build process
  • Runtime monitoring: Log and alert on suspicious file operations

4. Defense in Depth

Implement multiple layers of security:

const tar = require('tar');
const path = require('path');

async function secureExtract(archivePath, targetDir) {
  const allowedExtensions = ['.js', '.json', '.md'];
  const maxFileSize = 10 * 1024 * 1024; // 10MB

  await tar.extract({
    file: archivePath,
    cwd: targetDir,
    strict: true,
    filter: (filePath, entry) => {
      // Unicode normalization
      const normalized = filePath.normalize('NFC');

      // Path traversal check
      if (normalized.includes('..')) return false;

      // File extension whitelist
      const ext = path.extname(normalized);
      if (!allowedExtensions.includes(ext)) return false;

      // Size limit
      if (entry.size > maxFileSize) return false;

      return true;
    }
  });
}

Relevant Security Standards

  • CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')
  • CWE-367: Time-of-check Time-of-use (TOCTOU) Race Condition
  • CWE-176: Improper Handling of Unicode Encoding
  • OWASP A05:2021: Security Misconfiguration
  • OWASP A06:2021: Vulnerable and Outdated Components

Detection Tools

  • ESLint security plugins: eslint-plugin-security
  • Static analysis: SonarQube, Semgrep
  • Dynamic testing: OWASP ZAP for web applications
  • Dependency scanning: npm audit, Snyk, GitHub Dependabot

Conclusion

The node-tar Unicode path collision vulnerability demonstrates how subtle implementation details can create serious security risks. Even well-intentioned security checks can be bypassed through creative exploitation of edge cases like Unicode normalization.

Key Takeaways:

  1. Update immediately: Patch node-tar to the latest version in all projects
  2. Defense in depth: Don't rely on a single security mechanism
  3. Validate consistently: Apply the same normalization and validation everywhere
  4. Monitor dependencies: Use automated tools to catch vulnerabilities early
  5. Assume breach: Design systems that limit damage even when vulnerabilities exist

Security is not a one-time fix but an ongoing process. By understanding vulnerabilities like CVE-2026-24842, implementing robust validation, and staying current with patches, you can significantly reduce your application's attack surface.

Remember: every dependency is a trust relationship. Regularly audit your dependencies, understand their security posture, and have a plan to respond quickly when vulnerabilities are disclosed.

Stay secure, and happy coding! 🔒


Resources:
- Node-tar GitHub Repository
- CVE-2026-24842 Details
- OWASP Path Traversal Guide
- Unicode Security Considerations

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #66

Related Articles

critical

Stack Buffer Overflow in MapScale: How Five Unsafe sprintf Calls Created a Critical Vulnerability

A critical stack-based buffer overflow vulnerability was discovered and patched in `src/mapscale.c`, where five unbounded `sprintf` calls wrote formatted output into fixed-size stack buffers without any bounds checking. An attacker controlling unit text strings could overflow the stack buffer, potentially overwriting the function return address and achieving arbitrary code execution. The fix replaces dangerous `sprintf` calls with their bounds-checked counterparts, eliminating the overflow risk

critical

Heap Buffer Overflows in YAML Parser: How Unchecked memcpy Calls Create Critical Attack Vectors

A critical heap buffer overflow vulnerability was discovered and patched in the YAML parser embedded within an Android VPN application, where five unvalidated `memcpy` calls could allow an attacker to corrupt heap memory by supplying a crafted YAML configuration file. This class of vulnerability is particularly dangerous because it can lead to arbitrary code execution or application crashes in security-sensitive contexts. The fix adds proper bounds validation before each copy operation, eliminat

critical

Critical Buffer Overflow Fixed: When "Safe" Functions Aren't Safe

A critical vulnerability in DeepSkyStackerKernel's StackWalker.cpp was silently replacing bounds-checking string functions with their unsafe counterparts via preprocessor macros, exposing the entire codebase to buffer overflow attacks. This fix removes the dangerous macro definitions that discarded buffer size arguments, restoring the intended memory safety protections across all call sites. Understanding how this subtle macro trick works is essential for any C/C++ developer working with string