Critical severity · 6 min read

Node-tar Path Traversal: How Unicode Collisions Bypass Security Checks

A medium-severity vulnerability in node-tar (CVE-2026-24842) allowed attackers to create arbitrary files outside intended directories by exploiting Unicode path collisions in hardlink security checks. This race condition could enable malicious tar archives to overwrite critical system files, potentially leading to remote code execution or privilege escalation.

By orbisai0security
March 6, 2026

Introduction

The node-tar package, a widely used Node.js library for handling tar archives, recently patched a security vulnerability that could allow attackers to write files outside the intended extraction directory. With millions of downloads per week and usage across countless npm packages, this vulnerability (CVE-2026-24842) represents a significant supply chain security concern.

If your application extracts tar archives—whether from user uploads, CI/CD pipelines, or third-party sources—you need to understand this vulnerability and ensure you're running a patched version.

The Vulnerability Explained

What is Path Traversal?

Path traversal vulnerabilities occur when an application fails to properly validate file paths, allowing attackers to access or write files outside the intended directory. In tar archives, this typically involves entries with paths like ../../../../etc/passwd that "escape" the extraction directory.

The node-tar Security Mechanism

Node-tar implements security checks to prevent path traversal attacks, including special validation for hardlinks—filesystem entries that point to existing files. The library verifies that hardlink targets don't reference paths outside the extraction directory.

The Unicode Collision Exploit

This vulnerability exploited a subtle race condition involving Unicode path normalization. Here's how the attack worked:

Step 1: Unicode Equivalence
Different Unicode sequences can represent visually identical characters. For example:
- café (using U+00E9: é)
- café (using U+0065 U+0301: e + combining acute accent)

These look identical but have different binary representations.
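
The difference is easy to observe in Node.js. The snippet below is a minimal sketch using only built-in string methods; the variable names are illustrative:

```javascript
// Two spellings of "café" that render identically.
const composed = 'caf\u00e9';    // é as a single code point (U+00E9)
const decomposed = 'cafe\u0301'; // e (U+0065) followed by a combining acute accent (U+0301)

// A raw comparison treats them as different strings.
const rawEqual = composed === decomposed;

// After normalizing both to NFC, they compare equal.
const nfcEqual = composed.normalize('NFC') === decomposed.normalize('NFC');

console.log({ rawEqual, nfcEqual, lengths: [composed.length, decomposed.length] });
```

Any code that compares or indexes paths by their raw string value will see these as two different names, while code that normalizes first will see one.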

Step 2: The Race Condition
1. Attacker creates a malicious tar archive with a hardlink entry
2. The hardlink path uses Unicode characters that normalize differently at different stages
3. During the security check, the path appears safe (within bounds)
4. During actual file creation, the normalized path points outside the extraction directory
5. The attacker successfully writes to arbitrary locations
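
The mismatch in these steps can be illustrated with a toy model. This is not node-tar's code: the "filesystem" and "checker" below are hypothetical stand-ins, built on the assumption that one side compares raw strings while the other folds canonically equivalent names together.

```javascript
// Toy model only: NOT node-tar's implementation.

// Hypothetical filesystem that treats canonically equivalent names as one entry.
const toyFs = new Map();
const fsKey = (name) => name.normalize('NFC');

// Hypothetical checker that remembers link names by their raw string.
const linksSeenByChecker = new Set();

function createLink(name, target) {
  linksSeenByChecker.add(name);             // keyed by raw string
  toyFs.set(fsKey(name), { link: target }); // keyed by normalized string
}

function checkerSeesLink(name) {
  return linksSeenByChecker.has(name);
}

function fsSeesLink(name) {
  const entry = toyFs.get(fsKey(name));
  return Boolean(entry && entry.link);
}

// First entry: a link named with the composed form of "café".
createLink('caf\u00e9', '/outside/target');

// Second entry: the same visible name, spelled in decomposed form.
const secondName = 'cafe\u0301';
console.log('checker sees a link:', checkerSeesLink(secondName));
console.log('filesystem sees a link:', fsSeesLink(secondName));
```

The checker concludes the second name is unrelated to any link, while the toy filesystem resolves it to the link created earlier. That disagreement between check time and use time is the gap the attack relies on.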

Step 3: Arbitrary File Overwrite
By carefully crafting the Unicode sequences, attackers could:
- Overwrite configuration files
- Replace executable scripts
- Modify package.json or other critical files
- Potentially achieve remote code execution

Real-World Attack Scenario

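Imagine a service, for example one step in a CI/CD pipeline, that accepts and extracts package archives:

// Vulnerable code pattern
// Assumes an Express app with an upload middleware that populates req.files
const express = require('express');
const tar = require('tar');

const app = express();

app.post('/upload-package', async (req, res) => {
  // Extract the uploaded tar file without any additional validation
  await tar.extract({
    file: req.files.package.path,
    cwd: '/app/packages/'
  });

  res.send('Package installed successfully');
});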

An attacker uploads a malicious tar archive containing:

my-package/index.js          (legitimate file)
my-package/../../.bashrc     (hardlink with Unicode collision)

The hardlink security check passes due to Unicode normalization differences, but the actual file creation resolves the entry to /app/.bashrc, outside /app/packages/. Overwriting that file could lead to malicious code running on the next shell invocation.
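
Node's path resolution collapses `..` segments lexically, which is why one `..` more than the entry's directory depth is enough to leave the extraction directory. The sketch below uses `path.posix` so it behaves the same on any platform; `escapesCwd` and the entry names are illustrative, not part of node-tar:

```javascript
const path = require('path');

// Returns true when entryPath, resolved against cwd, lands outside cwd.
function escapesCwd(cwd, entryPath) {
  const base = path.posix.resolve(cwd);
  const resolved = path.posix.resolve(base, entryPath);
  return resolved !== base && !resolved.startsWith(base + '/');
}

const cwd = '/app/packages/';
for (const entry of ['my-package/index.js', 'my-package/../../.bashrc']) {
  console.log(entry, '->', path.posix.resolve(cwd, entry), 'escapes:', escapesCwd(cwd, entry));
}
```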

The Fix

What Changed

The fix addresses the Unicode path collision by implementing consistent Unicode normalization across all path validation checks. Specifically:

  1. Consistent Normalization: All paths are normalized to a canonical form (NFC - Normalization Form Canonical Composition) before any security checks
  2. Enhanced Hardlink Validation: Hardlink targets undergo the same normalization process as the paths being checked
  3. Race Condition Prevention: Security checks and file operations now use the same normalized path representation

Security Improvement

The patch ensures that:
- Unicode equivalence cannot bypass security boundaries
- Hardlink validation occurs on canonicalized paths
- No discrepancy exists between check-time and use-time paths (TOCTOU protection)

Before and After Comparison

Before (Vulnerable):

// Simplified representation of vulnerable logic
const path = require('path');

function isValidHardlink(linkPath, targetPath, extractDir) {
  // Security check on raw paths
  const resolvedTarget = path.resolve(extractDir, targetPath);
  const resolvedExtract = path.resolve(extractDir);

  // Vulnerable: doesn't account for Unicode normalization differences
  return resolvedTarget.startsWith(resolvedExtract);
}

After (Patched):

// Simplified representation of patched logic
const path = require('path');

function normalizeUnicode(str) {
  // Convert to canonical composed form (NFC)
  return str.normalize('NFC');
}

function isValidHardlink(linkPath, targetPath, extractDir) {
  // Normalize the target the same way the extraction step will
  const normalizedTarget = normalizeUnicode(targetPath);

  const resolvedTarget = path.resolve(extractDir, normalizedTarget);
  const resolvedExtract = path.resolve(extractDir);

  // Check occurs on the normalized path; the trailing separator keeps
  // sibling directories that merely share the prefix from matching
  return resolvedTarget.startsWith(resolvedExtract + path.sep);
}

Prevention & Best Practices

Immediate Actions

  1. Update node-tar immediately: Run npm audit fix or manually update to the latest patched version
  2. Check your dependencies: Use npm ls tar to identify all packages depending on node-tar
  3. Review package-lock.json: Ensure transitive dependencies are also updated
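
For the lockfile review, a small script can list every installed copy of tar. This is a rough sketch that assumes a lockfileVersion 2 or 3 `package-lock.json`, where installed packages sit under a top-level `packages` object keyed by their `node_modules` path. `findTarVersions` is a hypothetical helper, and the sample data and version numbers are made up:

```javascript
// Collect every installed copy of "tar", including nested (transitive) ones.
function findTarVersions(lock) {
  const results = [];
  for (const [location, info] of Object.entries(lock.packages || {})) {
    if (location === 'node_modules/tar' || location.endsWith('/node_modules/tar')) {
      results.push({ location, version: info.version });
    }
  }
  return results;
}

// Inline, made-up lockfile fragment for illustration only.
const sampleLock = {
  lockfileVersion: 3,
  packages: {
    '': { name: 'demo-app' },
    'node_modules/tar': { version: '0.0.1' },
    'node_modules/some-dep/node_modules/tar': { version: '0.0.2' },
    'node_modules/tar-stream': { version: '0.0.3' },
  },
};

console.log(findTarVersions(sampleLock));
```

In a real project, the parsed contents of `package-lock.json` would be passed in place of `sampleLock`, and each reported version compared against the patched release named in the advisory.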

Long-Term Security Practices

1. Input Validation and Sanitization

Always validate and sanitize file paths, especially when dealing with archives:

const path = require('path');

function sanitizePath(filePath, baseDir) {
  // Normalize Unicode
  const normalized = filePath.normalize('NFC');

  // Resolve to absolute path
  const resolved = path.resolve(baseDir, normalized);

  // Ensure it's within baseDir
  if (!resolved.startsWith(path.resolve(baseDir) + path.sep)) {
    throw new Error('Path traversal attempt detected');
  }

  return resolved;
}
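
The trailing separator in the containment check matters. Without it, a sibling directory that merely shares the base directory's name as a prefix would pass. A short illustration using `path.posix` and made-up directory names:

```javascript
const path = require('path');

const baseDir = '/app/packages';

// A path that resolves into a sibling directory, not into baseDir.
const sibling = path.posix.resolve(baseDir, '../packages-evil/x.js');

// Naive prefix check: wrongly accepts the sibling.
const naiveAccepts = sibling.startsWith(baseDir);

// Prefix check with a trailing separator: correctly rejects it.
const strictAccepts = sibling.startsWith(baseDir + '/');

console.log({ sibling, naiveAccepts, strictAccepts });
```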

2. Principle of Least Privilege

Extract archives in isolated, restricted directories:

const tar = require('tar');
const fs = require('fs').promises;
const os = require('os');
const path = require('path');

async function safeExtract(archivePath) {
  // Create temporary, isolated extraction directory under the OS temp dir
  const tempDir = await fs.mkdtemp(path.join(os.tmpdir(), 'extract-'));

  try {
    await tar.extract({
      file: archivePath,
      cwd: tempDir,
      strict: true,  // Treat warnings as errors
      filter: (entryPath) => {
        // Additional path validation: reject any ".." path segment
        // (a plain substring test would also reject names like "a..b")
        return !entryPath.split(/[\\/]/).includes('..');
      }
    });

    // Process extracted files
    // ...
  } finally {
    // Cleanup
    await fs.rm(tempDir, { recursive: true, force: true });
  }
}

3. Security Scanning and Monitoring

  • npm audit: Regularly run npm audit in your CI/CD pipeline
  • Snyk or Dependabot: Use automated dependency scanning tools
  • OWASP Dependency-Check: Integrate into your build process
  • Runtime monitoring: Log and alert on suspicious file operations

4. Defense in Depth

Implement multiple layers of security:

const tar = require('tar');
const path = require('path');

async function secureExtract(archivePath, targetDir) {
  const allowedExtensions = ['.js', '.json', '.md'];
  const maxFileSize = 10 * 1024 * 1024; // 10MB

  await tar.extract({
    file: archivePath,
    cwd: targetDir,
    strict: true,
    filter: (filePath, entry) => {
      // Unicode normalization
      const normalized = filePath.normalize('NFC');

      // Path traversal check: reject any ".." path segment
      if (normalized.split(/[\\/]/).includes('..')) return false;

      // Directory entries have no extension; let them through
      if (entry.type === 'Directory') return true;

      // File extension allowlist
      const ext = path.extname(normalized);
      if (!allowedExtensions.includes(ext)) return false;

      // Size limit
      if (entry.size > maxFileSize) return false;

      return true;
    }
  });
}

Relevant Security Standards

  • CWE-22: Improper Limitation of a Pathname to a Restricted Directory ('Path Traversal')
  • CWE-367: Time-of-check Time-of-use (TOCTOU) Race Condition
  • CWE-176: Improper Handling of Unicode Encoding
  • OWASP A05:2021: Security Misconfiguration
  • OWASP A06:2021: Vulnerable and Outdated Components

Detection Tools

  • ESLint security plugins: eslint-plugin-security
  • Static analysis: SonarQube, Semgrep
  • Dynamic testing: OWASP ZAP for web applications
  • Dependency scanning: npm audit, Snyk, GitHub Dependabot

Conclusion

The node-tar Unicode path collision vulnerability demonstrates how subtle implementation details can create serious security risks. Even well-intentioned security checks can be bypassed through creative exploitation of edge cases like Unicode normalization.

Key Takeaways:

  1. Update immediately: Patch node-tar to the latest version in all projects
  2. Defense in depth: Don't rely on a single security mechanism
  3. Validate consistently: Apply the same normalization and validation everywhere
  4. Monitor dependencies: Use automated tools to catch vulnerabilities early
  5. Assume breach: Design systems that limit damage even when vulnerabilities exist

Security is not a one-time fix but an ongoing process. By understanding vulnerabilities like CVE-2026-24842, implementing robust validation, and staying current with patches, you can significantly reduce your application's attack surface.

Remember: every dependency is a trust relationship. Regularly audit your dependencies, understand their security posture, and have a plan to respond quickly when vulnerabilities are disclosed.

Stay secure, and happy coding! 🔒


Resources:
- Node-tar GitHub Repository
- CVE-2026-24842 Details
- OWASP Path Traversal Guide
- Unicode Security Considerations

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #66
