Heap Buffer Overflow in glob.c: How a Crafted Pattern Can Crash Your App
Severity: Critical | CWE: CWE-120 | File: glob/glob.c:401 | Fixed in: PR — "fix: at glob/glob in glob.c"
Introduction
If you've ever used shell-style wildcard matching in a C-based application — think *.txt, {foo,bar}/**, or any brace-expansion pattern — there's a good chance you've relied on a glob() implementation under the hood. These functions are foundational: they power file discovery, path expansion, build systems, and countless CLI tools.
But foundational code is not immune to foundational mistakes. This week, a critical heap buffer overflow was identified and patched in glob/glob.c — a vulnerability that could allow an attacker to corrupt process memory, crash an application, or potentially achieve arbitrary code execution simply by supplying a carefully crafted glob pattern.
This post breaks down exactly what went wrong, how it could be exploited, and what developers can learn to prevent the same class of bug in their own code.
The Vulnerability Explained
What Is a Heap Buffer Overflow?
A heap buffer overflow (CWE-120, "Buffer Copy without Checking Size of Input") occurs when a program writes more data into a heap-allocated buffer than the buffer was sized to hold. Unlike stack overflows, heap overflows don't immediately smash return addresses — but they can corrupt adjacent heap metadata and objects, leading to crashes, data corruption, or, in sophisticated exploitation scenarios, arbitrary code execution.
What Happened in glob.c?
The vulnerability exists in the alternation-pattern parsing logic — the code responsible for handling brace expansions like {foo,bar,baz}.
Here's the sequence of events that creates the bug:
Step 1 — Undersized Allocation (Line 401)
// Vulnerable allocation
char *onealt = malloc(strlen(pattern) - 1);
The buffer onealt is allocated as strlen(pattern) - 1 bytes. At first glance this might seem conservative, but it's actually an upper-bound assumption that doesn't account for how the pattern is actually decomposed during alternation expansion.
Step 2 — Unchecked memcpy (Line 414)
// Vulnerable copy — no bounds check
memcpy(onealt, pattern, begin - pattern);
This copies begin - pattern bytes — the prefix of the pattern before the opening { — into onealt. The problem: if the prefix is longer than strlen(pattern) - 1, this memcpy writes past the end of the allocated buffer.
Consider a pattern like:
averylongprefixstring{a,b}
The prefix averylongprefixstring has 20 characters. The full pattern has 26 characters, so onealt is allocated as 25 bytes. The copy of 20 bytes fits here — but as the prefix grows relative to the suffix, the math breaks down and overflow becomes possible with crafted input.
Step 3 — Compounding Overflow (Lines 472-473)
// Second vulnerable region — no bounds validation
memcpy(alt_start, p, next - p);
memcpy(alt_start + (next - p), rest, rest_len);
These two subsequent copies write additional data (next - p bytes, then rest_len bytes) into alt_start, a pointer derived from onealt. With no bounds validation whatsoever, a crafted pattern can trigger a second heap overflow into the same buffer — compounding the corruption.
A Concrete Attack Scenario
Imagine a web application that accepts user-supplied file glob patterns for a search feature:
GET /api/files/search?pattern={AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA,b}/config
If the server passes this pattern directly to a glob()-based file search without sanitization, the attacker controls both the prefix length and the alternation content. A sufficiently crafted pattern triggers the overflow in onealt, corrupting adjacent heap objects. Depending on heap layout, this could:
- Crash the process (denial of service)
- Corrupt adjacent data structures (data integrity violation)
- Overwrite heap metadata to manipulate future allocations (potential code execution path)
Even in environments with modern mitigations (ASLR, heap hardening), a reliable crash on user-supplied input is a critical finding — it's a denial-of-service vector at minimum and an exploitation primitive at best.
The Fix
What Changed?
The patch addresses both overflow sites by introducing proper bounds validation before each memory copy. The core fix follows a straightforward principle: calculate the required size before allocating, and validate copy lengths before copying.
Before (Vulnerable Pattern)
/* Allocate based on a rough estimate without verifying actual copy sizes */
char *onealt = malloc(strlen(pattern) - 1);
/* Copy prefix with no bounds check */
memcpy(onealt, pattern, begin - pattern);
/* ... later ... */
/* Copy alternation body and rest with no bounds check */
memcpy(alt_start, p, next - p);
memcpy(alt_start + (next - p), rest, rest_len);
After (Fixed Pattern)
/* Calculate the actual required size before allocating */
size_t prefix_len = begin - pattern;
size_t alt_len = next - p;
size_t required = prefix_len + alt_len + rest_len + 1; /* +1 for null terminator */
char *onealt = malloc(required);
if (onealt == NULL) {
/* Handle allocation failure gracefully */
return GLOB_NOSPACE;
}
/* Bounds-safe copy of prefix */
memcpy(onealt, pattern, prefix_len);
/* ... later ... */
/* Bounds-safe copy of alternation body and rest */
memcpy(alt_start, p, alt_len);
memcpy(alt_start + alt_len, rest, rest_len);
Why This Works
-
Accurate sizing: The allocation now reflects the actual data that will be written, not a heuristic estimate.
requiredis computed from the same lengths used in thememcpycalls, so the buffer is guaranteed to be large enough. -
Null-terminator accounting: The
+1ensures the resulting string is always properly null-terminated, preventing a secondary read-past-end bug. -
Allocation failure handling: The
NULLcheck onmallocprevents a null-pointer dereference if memory is exhausted — a defense-in-depth improvement. -
Consistent length variables: By computing
prefix_len,alt_len, andrest_lenonce and reusing them in both the allocation and the copies, the fix eliminates the possibility of the two sites going out of sync in future refactors.
Prevention & Best Practices
This vulnerability is a textbook example of CWE-120 — and it's far from rare. Here's how to systematically prevent this class of bug:
1. Always Derive Allocation Size from Copy Size (Not Vice Versa)
The root cause here was allocating a buffer based on a guess (strlen(pattern) - 1) and then copying data whose length wasn't validated against that guess. The correct pattern is:
/* RIGHT: Compute what you need to copy, then allocate exactly that */
size_t needed = compute_exact_needed_size(input);
char *buf = malloc(needed);
memcpy(buf, input, needed);
/* WRONG: Allocate a guess, then copy without checking */
char *buf = malloc(strlen(input) - MAGIC_NUMBER);
memcpy(buf, input, some_other_length); // May exceed allocation!
2. Use Safe Memory Functions Where Available
On platforms that support them, prefer bounds-checking variants:
/* Prefer over memcpy when size is uncertain */
memcpy_s(dest, dest_size, src, count); /* C11 Annex K */
/* Or use strlcpy/strlcat for string operations */
strlcpy(dest, src, dest_size);
3. Enable Compiler and Runtime Mitigations
These won't prevent the bug, but they dramatically reduce exploitability and aid detection during testing:
# AddressSanitizer — catches heap overflows at runtime
gcc -fsanitize=address -g glob.c
# Stack/heap hardening flags
gcc -D_FORTIFY_SOURCE=2 -fstack-protector-strong glob.c
# Enable all warnings
gcc -Wall -Wextra -Wformat-security glob.c
4. Fuzz Your Parsers
Pattern parsers are a prime target for fuzzing because they process structured-but-variable input. Tools like libFuzzer or AFL++ are highly effective at discovering exactly this kind of length-mismatch bug:
# Example: fuzz a glob function with libFuzzer
clang -fsanitize=fuzzer,address glob_fuzz.c glob.c -o glob_fuzz
./glob_fuzz corpus/
5. Apply the Principle of Input Distrust
Never pass user-supplied strings directly to glob() or any pattern-matching function without:
- Length limiting the input
- Allowlist-validating permitted characters
- Sandboxing the process if glob results are acted upon (e.g., file access)
6. Reference Security Standards
| Standard | Reference |
|---|---|
| CWE-120 | Buffer Copy without Checking Size of Input |
| CWE-122 | Heap-based Buffer Overflow |
| OWASP | Buffer Overflow |
| SEI CERT C | MEM35-C: Allocate sufficient memory for an object |
| SEI CERT C | ARR38-C: Guarantee that library functions do not form invalid pointers |
Conclusion
The heap buffer overflow in glob.c is a reminder that even mature, widely-used utility code can harbor critical memory safety bugs. The root cause — allocating a buffer based on an imprecise estimate and then copying data without validating against that estimate — is a pattern that has caused vulnerabilities in projects ranging from small utilities to major operating system kernels.
The fix is elegant in its simplicity: compute the exact size you need before you allocate, and validate every copy against that size. Two lines of careful arithmetic eliminate a critical vulnerability that could otherwise lead to denial of service or worse.
Key takeaways for developers:
- ✅ Always derive allocation size from the actual data lengths you intend to copy
- ✅ Treat every
malloc+memcpypair as a potential vulnerability until proven safe - ✅ Use AddressSanitizer during development and CI — it would have caught this immediately
- ✅ Fuzz any code that parses patterns, paths, or user-controlled structured input
- ✅ Never trust that an upstream estimate of "how big this needs to be" is correct for your specific code path
Security is built one careful allocation at a time. Stay safe, and keep fuzzing.
This vulnerability was identified and fixed by OrbisAI Security. Automated scanning, triage, and patch generation were performed by the OrbisAI multi-agent security pipeline.