What is heap buffer overflow in glob pattern parsing?

A heap buffer overflow occurs when glob pattern parsing code copies more data into a heap-allocated buffer than it can hold, corrupting adjacent memory. In glob.c, this happened when processing alternation patterns with long prefixes without checking if they fit in the destination buffer.

How do you prevent heap buffer overflow in C?

Always use bounds-checked string operations like strncpy() instead of strcpy(), validate input sizes before copying, allocate buffers dynamically based on actual input size, and use tools like AddressSanitizer during development to detect buffer overruns.

What CWE is heap buffer overflow?

Heap buffer overflow is classified as CWE-120 (Buffer Copy without Checking Size of Input) and can also relate to CWE-122 (Heap-based Buffer Overflow). These vulnerabilities allow attackers to write beyond buffer boundaries, corrupting memory.

Is using strncpy() enough to prevent heap buffer overflow?

strncpy() is safer than strcpy() but isn't foolproof. You must also ensure the size parameter is correct, null-terminate the destination buffer explicitly if needed, and validate that source data doesn't exceed expected bounds before copying.

Can static analysis detect heap buffer overflow?

Yes, static analysis tools can detect many heap buffer overflows by identifying unsafe function calls (strcpy, memcpy without bounds checks), tracking buffer sizes, and analyzing data flow. Tools like Semgrep, CodeQL, and Clang Static Analyzer excel at finding these patterns.

Heap Buffer Overflow in glob.c: How a Crafted Pattern Can Crash Your App

Severity: Critical | CWE: CWE-120 | File: glob/glob.c:401 | Fixed in: PR — "fix: at glob/glob in glob.c"

Introduction

If you've ever used shell-style wildcard matching in a C-based application — think *.txt, {foo,bar}/**, or any brace-expansion pattern — there's a good chance you've relied on a glob() implementation under the hood. These functions are foundational: they power file discovery, path expansion, build systems, and countless CLI tools.

But foundational code is not immune to foundational mistakes. This week, a critical heap buffer overflow was identified and patched in glob/glob.c — a vulnerability that could allow an attacker to corrupt process memory, crash an application, or potentially achieve arbitrary code execution simply by supplying a carefully crafted glob pattern.

This post breaks down exactly what went wrong, how it could be exploited, and what developers can learn to prevent the same class of bug in their own code.

The Vulnerability Explained

What Is a Heap Buffer Overflow?

A heap buffer overflow (CWE-120, "Buffer Copy without Checking Size of Input") occurs when a program writes more data into a heap-allocated buffer than the buffer was sized to hold. Unlike stack overflows, heap overflows don't immediately smash return addresses — but they can corrupt adjacent heap metadata and objects, leading to crashes, data corruption, or, in sophisticated exploitation scenarios, arbitrary code execution.

What Happened in glob.c?

The vulnerability exists in the alternation-pattern parsing logic — the code responsible for handling brace expansions like {foo,bar,baz}.

Here's the sequence of events that creates the bug:

Step 1 — Undersized Allocation (Line 401)

// Vulnerable allocation
char *onealt = malloc(strlen(pattern) - 1);

The buffer onealt is allocated as strlen(pattern) - 1 bytes. At first glance this might seem conservative, but it's actually an upper-bound assumption that doesn't account for how the pattern is actually decomposed during alternation expansion.

Step 2 — Unchecked memcpy (Line 414)

// Vulnerable copy — no bounds check
memcpy(onealt, pattern, begin - pattern);

This copies begin - pattern bytes — the prefix of the pattern before the opening { — into onealt. The problem: if the prefix is longer than strlen(pattern) - 1, this memcpy writes past the end of the allocated buffer.

Consider a pattern like:

averylongprefixstring{a,b}

The prefix averylongprefixstring has 20 characters. The full pattern has 26 characters, so onealt is allocated as 25 bytes. The copy of 20 bytes fits here — but as the prefix grows relative to the suffix, the math breaks down and overflow becomes possible with crafted input.

Step 3 — Compounding Overflow (Lines 472-473)

// Second vulnerable region — no bounds validation
memcpy(alt_start, p, next - p);
memcpy(alt_start + (next - p), rest, rest_len);

These two subsequent copies write additional data (next - p bytes, then rest_len bytes) into alt_start, a pointer derived from onealt. With no bounds validation whatsoever, a crafted pattern can trigger a second heap overflow into the same buffer — compounding the corruption.

A Concrete Attack Scenario

Imagine a web application that accepts user-supplied file glob patterns for a search feature:

GET /api/files/search?pattern={AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA,b}/config

If the server passes this pattern directly to a glob()-based file search without sanitization, the attacker controls both the prefix length and the alternation content. A sufficiently crafted pattern triggers the overflow in onealt, corrupting adjacent heap objects. Depending on heap layout, this could:

Crash the process (denial of service)
Corrupt adjacent data structures (data integrity violation)
Overwrite heap metadata to manipulate future allocations (potential code execution path)

Even in environments with modern mitigations (ASLR, heap hardening), a reliable crash on user-supplied input is a critical finding — it's a denial-of-service vector at minimum and an exploitation primitive at best.

The Fix

What Changed?

The patch addresses both overflow sites by introducing proper bounds validation before each memory copy. The core fix follows a straightforward principle: calculate the required size before allocating, and validate copy lengths before copying.

Before (Vulnerable Pattern)

/* Allocate based on a rough estimate without verifying actual copy sizes */
char *onealt = malloc(strlen(pattern) - 1);

/* Copy prefix with no bounds check */
memcpy(onealt, pattern, begin - pattern);

/* ... later ... */

/* Copy alternation body and rest with no bounds check */
memcpy(alt_start, p, next - p);
memcpy(alt_start + (next - p), rest, rest_len);

After (Fixed Pattern)

/* Calculate the actual required size before allocating */
size_t prefix_len = begin - pattern;
size_t alt_len    = next - p;
size_t required   = prefix_len + alt_len + rest_len + 1; /* +1 for null terminator */

char *onealt = malloc(required);
if (onealt == NULL) {
    /* Handle allocation failure gracefully */
    return GLOB_NOSPACE;
}

/* Bounds-safe copy of prefix */
memcpy(onealt, pattern, prefix_len);

/* ... later ... */

/* Bounds-safe copy of alternation body and rest */
memcpy(alt_start, p, alt_len);
memcpy(alt_start + alt_len, rest, rest_len);

Why This Works

Accurate sizing: The allocation now reflects the actual data that will be written, not a heuristic estimate. required is computed from the same lengths used in the memcpy calls, so the buffer is guaranteed to be large enough.
Null-terminator accounting: The +1 ensures the resulting string is always properly null-terminated, preventing a secondary read-past-end bug.
Allocation failure handling: The NULL check on malloc prevents a null-pointer dereference if memory is exhausted — a defense-in-depth improvement.
Consistent length variables: By computing prefix_len, alt_len, and rest_len once and reusing them in both the allocation and the copies, the fix eliminates the possibility of the two sites going out of sync in future refactors.

Prevention & Best Practices

This vulnerability is a textbook example of CWE-120 — and it's far from rare. Here's how to systematically prevent this class of bug:

1. Always Derive Allocation Size from Copy Size (Not Vice Versa)

The root cause here was allocating a buffer based on a guess (strlen(pattern) - 1) and then copying data whose length wasn't validated against that guess. The correct pattern is:

/* RIGHT: Compute what you need to copy, then allocate exactly that */
size_t needed = compute_exact_needed_size(input);
char *buf = malloc(needed);
memcpy(buf, input, needed);

/* WRONG: Allocate a guess, then copy without checking */
char *buf = malloc(strlen(input) - MAGIC_NUMBER);
memcpy(buf, input, some_other_length);  // May exceed allocation!

2. Use Safe Memory Functions Where Available

On platforms that support them, prefer bounds-checking variants:

/* Prefer over memcpy when size is uncertain */
memcpy_s(dest, dest_size, src, count);  /* C11 Annex K */

/* Or use strlcpy/strlcat for string operations */
strlcpy(dest, src, dest_size);

3. Enable Compiler and Runtime Mitigations

These won't prevent the bug, but they dramatically reduce exploitability and aid detection during testing:

# AddressSanitizer — catches heap overflows at runtime
gcc -fsanitize=address -g glob.c

# Stack/heap hardening flags
gcc -D_FORTIFY_SOURCE=2 -fstack-protector-strong glob.c

# Enable all warnings
gcc -Wall -Wextra -Wformat-security glob.c

4. Fuzz Your Parsers

Pattern parsers are a prime target for fuzzing because they process structured-but-variable input. Tools like libFuzzer or AFL++ are highly effective at discovering exactly this kind of length-mismatch bug:

# Example: fuzz a glob function with libFuzzer
clang -fsanitize=fuzzer,address glob_fuzz.c glob.c -o glob_fuzz
./glob_fuzz corpus/

5. Apply the Principle of Input Distrust

Never pass user-supplied strings directly to glob() or any pattern-matching function without:
- Length limiting the input
- Allowlist-validating permitted characters
- Sandboxing the process if glob results are acted upon (e.g., file access)

6. Reference Security Standards

Standard	Reference
CWE-120	Buffer Copy without Checking Size of Input
CWE-122	Heap-based Buffer Overflow
OWASP	Buffer Overflow
SEI CERT C	MEM35-C: Allocate sufficient memory for an object
SEI CERT C	ARR38-C: Guarantee that library functions do not form invalid pointers

Conclusion

The heap buffer overflow in glob.c is a reminder that even mature, widely-used utility code can harbor critical memory safety bugs. The root cause — allocating a buffer based on an imprecise estimate and then copying data without validating against that estimate — is a pattern that has caused vulnerabilities in projects ranging from small utilities to major operating system kernels.

The fix is elegant in its simplicity: compute the exact size you need before you allocate, and validate every copy against that size. Two lines of careful arithmetic eliminate a critical vulnerability that could otherwise lead to denial of service or worse.

Key takeaways for developers:

✅ Always derive allocation size from the actual data lengths you intend to copy
✅ Treat every malloc + memcpy pair as a potential vulnerability until proven safe
✅ Use AddressSanitizer during development and CI — it would have caught this immediately
✅ Fuzz any code that parses patterns, paths, or user-controlled structured input
✅ Never trust that an upstream estimate of "how big this needs to be" is correct for your specific code path

Security is built one careful allocation at a time. Stay safe, and keep fuzzing.

This vulnerability was identified and fixed by OrbisAI Security. Automated scanning, triage, and patch generation were performed by the OrbisAI multi-agent security pipeline.

cwe	CWE-120 (Buffer Copy without Checking Size of Input)
fix	Replace unsafe strcpy()/memcpy() with bounds-checked operations and add size validation
risk	Memory corruption leading to crashes or arbitrary code execution
language	C
root cause	Missing bounds validation when copying pattern prefixes into fixed-size heap buffer
vulnerability	Heap buffer overflow in glob pattern parsing

Heap Buffer Overflow in glob.c: How a Crafted Pattern Can Crash Your App

Answer Summary

Vulnerability at a Glance

Heap Buffer Overflow in glob.c: How a Crafted Pattern Can Crash Your App

Introduction

The Vulnerability Explained

What Is a Heap Buffer Overflow?

What Happened in glob.c?

Step 1 — Undersized Allocation (Line 401)

Step 2 — Unchecked memcpy (Line 414)

Step 3 — Compounding Overflow (Lines 472-473)

A Concrete Attack Scenario

The Fix

What Changed?

Before (Vulnerable Pattern)

After (Fixed Pattern)

Why This Works

Prevention & Best Practices

1. Always Derive Allocation Size from Copy Size (Not Vice Versa)

2. Use Safe Memory Functions Where Available

3. Enable Compiler and Runtime Mitigations

4. Fuzz Your Parsers

5. Apply the Principle of Input Distrust

6. Reference Security Standards

Conclusion

Frequently Asked Questions

What is heap buffer overflow in glob pattern parsing?

How do you prevent heap buffer overflow in C?

What CWE is heap buffer overflow?

Is using strncpy() enough to prevent heap buffer overflow?

Can static analysis detect heap buffer overflow?

View the Security Fix

Related Articles

How missing Dependabot cooldown happens in GitHub Actions and how to fix it

How Server-Sent Events Injection via Unsanitized Newlines happens in Node.js h3 and how to fix it

How Memory Exhaustion via Large Comma-Separated Selector Lists happens in Python Soup Sieve and how to fix it

How prototype pollution via `proto` key happens in Node.js defu and how to fix it

How buffer overflow in memcpy() happens in Node.js N-API bindings and how to fix it

How memory exhaustion via large comma-separated selector lists happens in Python soupsieve and how to fix it

Heap Buffer Overflow in glob.c: How a Crafted Pattern Can Crash Your App

Answer Summary

Vulnerability at a Glance

Heap Buffer Overflow in glob.c: How a Crafted Pattern Can Crash Your App

Introduction

The Vulnerability Explained

What Is a Heap Buffer Overflow?

What Happened in glob.c?

Step 1 — Undersized Allocation (Line 401)

Step 2 — Unchecked memcpy (Line 414)

Step 3 — Compounding Overflow (Lines 472-473)

A Concrete Attack Scenario

The Fix

What Changed?

Before (Vulnerable Pattern)

After (Fixed Pattern)

Why This Works

Prevention & Best Practices

1. Always Derive Allocation Size from Copy Size (Not Vice Versa)

2. Use Safe Memory Functions Where Available

3. Enable Compiler and Runtime Mitigations

4. Fuzz Your Parsers

5. Apply the Principle of Input Distrust

6. Reference Security Standards

Conclusion

Frequently Asked Questions

What is heap buffer overflow in glob pattern parsing?

How do you prevent heap buffer overflow in C?

What CWE is heap buffer overflow?

Is using strncpy() enough to prevent heap buffer overflow?

Can static analysis detect heap buffer overflow?

View the Security Fix

Related Articles

How missing Dependabot cooldown happens in GitHub Actions and how to fix it

How Server-Sent Events Injection via Unsanitized Newlines happens in Node.js h3 and how to fix it

How Memory Exhaustion via Large Comma-Separated Selector Lists happens in Python Soup Sieve and how to fix it

How prototype pollution via `__proto__` key happens in Node.js defu and how to fix it

How buffer overflow in memcpy() happens in Node.js N-API bindings and how to fix it

How memory exhaustion via large comma-separated selector lists happens in Python soupsieve and how to fix it

How prototype pollution via `proto` key happens in Node.js defu and how to fix it