What is a Unicode path collision path traversal?

It is a path traversal attack where two file paths that look different as raw strings resolve to the same location after Unicode normalization, allowing an attacker to bypass security checks that compare the raw strings.

How do you prevent Unicode path collision path traversal in Node.js?

Always normalize paths to a consistent Unicode form (e.g., NFC via `str.normalize('NFC')`) before comparing them in any security-sensitive check, and additionally resolve absolute paths with `path.resolve()` before validation.

What CWE is Unicode path collision path traversal?

CWE-22 (Improper Limitation of a Pathname to a Restricted Directory — "Path Traversal").

Is stripping `..` sequences enough to prevent path traversal in node-tar?

No. Stripping `..` sequences does not protect against Unicode normalization attacks. Paths must also be Unicode-normalized before any security comparison, because two Unicode representations of the same path can bypass string-equality checks even without `..` sequences.

Can static analysis detect Unicode path collision path traversal?

Yes. Static analysis tools like Semgrep can flag hardlink or path-comparison logic that lacks a normalization step before string comparison, and tools like CodeQL can trace tainted archive entry paths to unsanitized comparison sinks.

Node-tar Path Traversal: How Unicode

Introduction

The node-tar package, a widely-used Node.js library for handling tar archives, recently patched a critical security vulnerability that could allow attackers to write files anywhere on your system. With millions of downloads per week and usage across countless npm packages, this vulnerability (CVE-2026-24842) represents a significant supply chain security concern.

If your application extracts tar archives—whether from user uploads, CI/CD pipelines, or third-party sources—you need to understand this vulnerability and ensure you're running a patched version.

The Vulnerability Explained

What is Path Traversal?

Path traversal vulnerabilities occur when an application fails to properly validate file paths, allowing attackers to access or write files outside the intended directory. In tar archives, this typically involves entries with paths like ../../../../etc/passwd that "escape" the extraction directory.

The node-tar Security Mechanism

Node-tar implements security checks to prevent path traversal attacks, including special validation for hardlinks—filesystem entries that point to existing files. The library verifies that hardlink targets don't reference paths outside the extraction directory.

The Unicode Collision Exploit

This vulnerability exploited a subtle race condition involving Unicode path normalization. Here's how the attack worked:

Step 1: Unicode Equivalence
Different Unicode sequences can represent visually identical characters. For example:
- café (using U+00E9: é)
- café (using U+0065 U+0301: e + combining acute accent)

These look identical but have different binary representations.

Step 2: The Race Condition
1. Attacker creates a malicious tar archive with a hardlink entry
2. The hardlink path uses Unicode characters that normalize differently at different stages
3. During the security check, the path appears safe (within bounds)
4. During actual file creation, the normalized path points outside the extraction directory
5. The attacker successfully writes to arbitrary locations

Step 3: Arbitrary File Overwrite
By carefully crafting the Unicode sequences, attackers could:
- Overwrite configuration files
- Replace executable scripts
- Modify package.json or other critical files
- Potentially achieve remote code execution

Real-World Attack Scenario

Imagine a CI/CD pipeline that extracts dependency archives:

// Vulnerable code pattern
const tar = require('tar');

app.post('/upload-package', async (req, res) => {
  // Extract uploaded tar file
  await tar.extract({
    file: req.files.package.path,
    cwd: '/app/packages/'
  });

  res.send('Package installed successfully');
});

An attacker uploads a malicious tar archive containing:

packages/my-package/index.js          (legitimate file)
packages/my-package/../../.bashrc     (hardlink with Unicode collision)

The hardlink security check passes due to Unicode normalization differences, but the actual file creation overwrites /app/.bashrc, potentially executing malicious code on the next shell invocation.

The Fix

What Changed

The fix addresses the Unicode path collision by implementing consistent Unicode normalization across all path validation checks. Specifically:

Consistent Normalization: All paths are normalized to a canonical form (NFC - Normalization Form Canonical Composition) before any security checks
Enhanced Hardlink Validation: Hardlink targets undergo the same normalization process as the paths being checked
Race Condition Prevention: Security checks and file operations now use the same normalized path representation

Security Improvement

The patch ensures that:
- Unicode equivalence cannot bypass security boundaries
- Hardlink validation occurs on canonicalized paths
- No discrepancy exists between check-time and use-time paths (TOCTOU protection)

Before and After Comparison

Before (Vulnerable):

// Simplified representation of vulnerable logic
function isValidHardlink(linkPath, targetPath, extractDir) {
  // Security check on raw paths
  const resolvedTarget = path.resolve(extractDir, targetPath);
  const resolvedExtract = path.resolve(extractDir);

  // Vulnerable: doesn't account for Unicode normalization differences
  return resolvedTarget.startsWith(resolvedExtract);
}

After (Patched):

// Simplified representation of patched logic
function isValidHardlink(linkPath, targetPath, extractDir) {
  // Normalize all paths consistently
  const normalizedTarget = normalizeUnicode(targetPath);
  const normalizedLink = normalizeUnicode(linkPath);

  const resolvedTarget = path.resolve(extractDir, normalizedTarget);
  const resolvedExtract = path.resolve(extractDir);

  // Check occurs on normalized paths
  return resolvedTarget.startsWith(resolvedExtract);
}

function normalizeUnicode(str) {
  // Convert to canonical form
  return str.normalize('NFC');
}

Prevention & Best Practices

Immediate Actions

Update node-tar immediately: Run npm audit fix or manually update to the latest patched version
Check your dependencies: Use npm ls tar to identify all packages depending on node-tar
Review package-lock.json: Ensure transitive dependencies are also updated

Long-Term Security Practices

1. Input Validation and Sanitization

Always validate and sanitize file paths, especially when dealing with archives:

const path = require('path');

function sanitizePath(filePath, baseDir) {
  // Normalize Unicode
  const normalized = filePath.normalize('NFC');

  // Resolve to absolute path
  const resolved = path.resolve(baseDir, normalized);

  // Ensure it's within baseDir
  if (!resolved.startsWith(path.resolve(baseDir) + path.sep)) {
    throw new Error('Path traversal attempt detected');
  }

  return resolved;
}

2. Principle of Least Privilege

Extract archives in isolated, restricted directories:

const tar = require('tar');
const fs = require('fs').promises;

async function safeExtract(archivePath) {
  // Create temporary, isolated extraction directory
  const tempDir = await fs.mkdtemp('/tmp/extract-');

  try {
    await tar.extract({
      file: archivePath,
      cwd: tempDir,
      strict: true,  // Enable strict mode
      filter: (path) => {
        // Additional path validation
        return !path.includes('..');
      }
    });

    // Process extracted files
    // ...
  } finally {
    // Cleanup
    await fs.rm(tempDir, { recursive: true, force: true });
  }
}

3. Security Scanning and Monitoring

npm audit: Regularly run npm audit in your CI/CD pipeline
Snyk or Dependabot: Use automated dependency scanning tools
OWASP Dependency-Check: Integrate into your build process
Runtime monitoring: Log and alert on suspicious file operations

4. Defense in Depth

Implement multiple layers of security:

const tar = require('tar');
const path = require('path');

async function secureExtract(archivePath, targetDir) {
  const allowedExtensions = ['.js', '.json', '.md'];
  const maxFileSize = 10 * 1024 * 1024; // 10MB

  await tar.extract({
    file: archivePath,
    cwd: targetDir,
    strict: true,
    filter: (filePath, entry) => {
      // Unicode normalization
      const normalized = filePath.normalize('NFC');

      // Path traversal check
      if (normalized.includes('..')) return false;

      // File extension whitelist
      const ext = path.extname(normalized);
      if (!allowedExtensions.includes(ext)) return false;

      // Size limit
      if (entry.size > maxFileSize) return false;

      return true;
    }
  });
}

Relevant Security Standards

CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')
CWE-367: Time-of-check Time-of-use (TOCTOU) Race Condition
CWE-176: Improper Handling of Unicode Encoding
OWASP A05:2021: Security Misconfiguration
OWASP A06:2021: Vulnerable and Outdated Components

Detection Tools

ESLint security plugins: eslint-plugin-security
Static analysis: SonarQube, Semgrep
Dynamic testing: OWASP ZAP for web applications
Dependency scanning: npm audit, Snyk, GitHub Dependabot

Conclusion

The node-tar Unicode path collision vulnerability demonstrates how subtle implementation details can create serious security risks. Even well-intentioned security checks can be bypassed through creative exploitation of edge cases like Unicode normalization.

Key Takeaways:

Update immediately: Patch node-tar to the latest version in all projects
Defense in depth: Don't rely on a single security mechanism
Validate consistently: Apply the same normalization and validation everywhere
Monitor dependencies: Use automated tools to catch vulnerabilities early
Assume breach: Design systems that limit damage even when vulnerabilities exist

Security is not a one-time fix but an ongoing process. By understanding vulnerabilities like CVE-2026-24842, implementing robust validation, and staying current with patches, you can significantly reduce your application's attack surface.

Remember: every dependency is a trust relationship. Regularly audit your dependencies, understand their security posture, and have a plan to respond quickly when vulnerabilities are disclosed.

Stay secure, and happy coding! 🔒

Resources:
- Node-tar GitHub Repository
- CVE-2026-24842 Details
- OWASP Path Traversal Guide
- Unicode Security Considerations

cwe	CWE-22
fix	Normalize all paths to Unicode NFC form before performing hardlink security comparisons
risk	Arbitrary file write outside extraction directory, leading to RCE or privilege escalation
language	JavaScript (Node.js)
root cause	Hardlink security checks compared raw Unicode path strings without normalization, allowing equivalent paths to appear distinct
vulnerability	Path Traversal via Unicode Hardlink Bypass

Node-tar Path Traversal: How Unicode Collisions Bypass Security Checks

Answer Summary

Vulnerability at a Glance

Introduction

The Vulnerability Explained

What is Path Traversal?

The node-tar Security Mechanism

The Unicode Collision Exploit

Real-World Attack Scenario

The Fix

What Changed

Security Improvement

Before and After Comparison

Prevention & Best Practices

Immediate Actions

Long-Term Security Practices

1. Input Validation and Sanitization

2. Principle of Least Privilege

3. Security Scanning and Monitoring

4. Defense in Depth

Relevant Security Standards

Detection Tools

Conclusion

Frequently Asked Questions

What is a Unicode path collision path traversal?

How do you prevent Unicode path collision path traversal in Node.js?

What CWE is Unicode path collision path traversal?

Is stripping `..` sequences enough to prevent path traversal in node-tar?

Can static analysis detect Unicode path collision path traversal?

View the Security Fix

Related Articles

How missing Dependabot cooldown happens in GitHub Actions and how to fix it

How Server-Sent Events Injection via Unsanitized Newlines happens in Node.js h3 and how to fix it

How Memory Exhaustion via Large Comma-Separated Selector Lists happens in Python Soup Sieve and how to fix it

How prototype pollution via `proto` key happens in Node.js defu and how to fix it

How buffer overflow in memcpy() happens in Node.js N-API bindings and how to fix it

How memory exhaustion via large comma-separated selector lists happens in Python soupsieve and how to fix it

Node-tar Path Traversal: How Unicode Collisions Bypass Security Checks

Answer Summary

Vulnerability at a Glance

Introduction

The Vulnerability Explained

What is Path Traversal?

The node-tar Security Mechanism

The Unicode Collision Exploit

Real-World Attack Scenario

The Fix

What Changed

Security Improvement

Before and After Comparison

Prevention & Best Practices

Immediate Actions

Long-Term Security Practices

1. Input Validation and Sanitization

2. Principle of Least Privilege

3. Security Scanning and Monitoring

4. Defense in Depth

Relevant Security Standards

Detection Tools

Conclusion

Frequently Asked Questions

What is a Unicode path collision path traversal?

How do you prevent Unicode path collision path traversal in Node.js?

What CWE is Unicode path collision path traversal?

Is stripping `..` sequences enough to prevent path traversal in node-tar?

Can static analysis detect Unicode path collision path traversal?

View the Security Fix

Related Articles

How missing Dependabot cooldown happens in GitHub Actions and how to fix it

How Server-Sent Events Injection via Unsanitized Newlines happens in Node.js h3 and how to fix it

How Memory Exhaustion via Large Comma-Separated Selector Lists happens in Python Soup Sieve and how to fix it

How prototype pollution via `__proto__` key happens in Node.js defu and how to fix it

How buffer overflow in memcpy() happens in Node.js N-API bindings and how to fix it

How memory exhaustion via large comma-separated selector lists happens in Python soupsieve and how to fix it

How prototype pollution via `proto` key happens in Node.js defu and how to fix it