Heap Buffer Overflow in Path Normalization: How Two Unsafe memcpy Calls Almost Became a Critical Exploit
Introduction
Buffer overflows are among the oldest vulnerabilities in software security, yet they continue to appear in production codebases — often in the least glamorous corners of a project, like utility functions and path helpers. This post dives into a critical heap buffer overflow (CWE-120) discovered and patched in src/aux.c, specifically inside a normalize_path function that failed to validate buffer sizes before copying data.
If you write C or C++, work with file system paths, or simply want to understand how a seemingly mundane utility function can become a critical security liability, read on.
The Vulnerability Explained
What Is a Heap Buffer Overflow?
A heap buffer overflow occurs when a program writes more data into a heap-allocated buffer than the buffer was sized to hold. Unlike stack overflows (which often crash immediately or overwrite return addresses), heap overflows can silently corrupt adjacent heap metadata or other allocated objects — making them notoriously difficult to detect and potentially very powerful to exploit.
What Went Wrong Here
The vulnerable code lived inside normalize_path() in src/aux.c. Two memcpy calls were the culprits:
At line 541: pwd_len bytes were copied into a result buffer res without first confirming that res had been allocated with enough capacity to hold that data.
At line 571: Additional data was appended starting at offset res_len into the same buffer, without checking whether res_len + len exceeded the total allocated size.
In pseudocode, the dangerous pattern looked roughly like this:
// VULNERABLE (simplified illustration)
char *res = malloc(some_size);
size_t res_len = 0;
// Line ~541: Copy pwd into res — but is res big enough?
memcpy(res, pwd, pwd_len);
// Line ~571: Append more data — but does res_len + len fit?
memcpy(res + res_len, extra_data, len);
Neither copy validated that the destination had sufficient space. The root cause was an integer arithmetic issue: the allocation size calculation involved adding pwd_len (which can be as large as PATH_MAX) plus a constant offset plus the length of the input. If these values were large enough, the addition could theoretically wrap around due to size_t overflow, resulting in a tiny allocation that subsequent memcpy calls would then massively overwrite.
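To make the wraparound concrete, here is a minimal sketch of the unchecked size computation. The helper name unchecked_alloc_size is hypothetical and only mirrors the pwd_len + 2 + l sum described above; it is not the project's actual allocation code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reproduces the unchecked sum pwd_len + 2 + l.
 * size_t arithmetic is unsigned, so the result wraps modulo SIZE_MAX + 1
 * instead of failing. */
static size_t unchecked_alloc_size(size_t pwd_len, size_t l)
{
    return pwd_len + 2 + l;
}
```

With pwd_len = 4096 and l = SIZE_MAX - 4096, the mathematical sum is SIZE_MAX + 2, which wraps to 1. An allocation of that size followed by a copy of pwd_len bytes is exactly the mismatch the two memcpy calls depend on not happening.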
How Could It Be Exploited?
The attack surface is any mechanism that allows an adversary to influence the current working directory path seen by the application. Concrete examples include:
- Deeply nested directories: Creating a directory hierarchy with path components summing to near PATH_MAX bytes.
- Symlinks with very long names: Crafting symlinks whose resolved targets produce unusually long paths.
- Controlled working directory in multi-user or containerized environments: In scenarios where an attacker can set or influence $PWD before the process runs.
If an attacker triggers the overflow, they can corrupt adjacent heap memory. Depending on what lives next to the res buffer in the heap, this could mean:
| Corrupted Object | Potential Impact |
|---|---|
| Heap metadata (free-list pointers) | Arbitrary write primitive, potential code execution |
| Another buffer containing sensitive data | Information disclosure |
| A function pointer or vtable | Control flow hijacking |
| Allocator bookkeeping | Denial of service / crash |
On modern systems with heap hardening (ASLR, guard pages, hardened allocators), exploitation is harder — but not impossible, especially in long-running server processes or privileged system utilities.
Real-World Attack Scenario
Imagine this function is part of a file manager, backup tool, or security scanner that processes user-supplied paths. An attacker on a shared system creates:
/tmp/attacker/AAAA...AAAA/ (hundreds of nested dirs, total path ~PATH_MAX bytes)
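They then trigger the application to normalize this path. A long working directory only pushes pwd_len toward PATH_MAX; the sum pwd_len + 2 + l wraps only if l is also within roughly PATH_MAX + 2 of SIZE_MAX, which is why the patch comment calls the overflow theoretical. If the sum did wrap, malloc would receive a tiny size (e.g., 3 bytes), and the subsequent memcpy calls would write far past the end of the allocation. Game over.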
The Fix
What Changed
The patch introduces a single, surgical overflow guard immediately before the buffer allocation:
// BEFORE: No size validation before allocation
char *res = NULL;
size_t res_len = 0;
// ... allocation and memcpy proceed with potentially overflowed sizes
// AFTER: Guard against size_t overflow before any allocation occurs
/* Guard against theoretical size_t overflow in buffer allocation.
* Ensures pwd_len (at most PATH_MAX) + 2 + l will not overflow. */
if (l >= SIZE_MAX - PATH_MAX - 2)
return NULL;
char *res = NULL;
size_t res_len = 0;
// ... now safe to proceed with allocation
Why This Fix Works
The guard checks whether l (the length of the normalized input) is so large that adding PATH_MAX + 2 to it would wrap around SIZE_MAX. Let's break down the math:
- pwd_len is bounded by PATH_MAX (typically 4096 on Linux)
- The constant 2 accounts for a separator character and null terminator
- l is the variable-length input component
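The condition l >= SIZE_MAX - PATH_MAX - 2 asks: "Could PATH_MAX + 2 + l reach SIZE_MAX?" Because pwd_len never exceeds PATH_MAX, any l that passes the check keeps pwd_len + 2 + l strictly below SIZE_MAX, so the sum cannot wrap. The test is slightly conservative (it also rejects the single value of l for which the sum would equal SIZE_MAX exactly), which is harmless. If the check fails, the function returns NULL immediately: no allocation, no copy, no overflow.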
This is a classic pre-condition check pattern for safe arithmetic in C, and it costs essentially nothing at runtime (a single comparison before a heap allocation).
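The boundary behaviour of the guard is easy to check in isolation. The sketch below wraps the same comparison in a hypothetical predicate, length_is_safe, and falls back to 4096 for PATH_MAX on platforms whose <limits.h> does not define it; both are assumptions made for illustration.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifndef PATH_MAX
#define PATH_MAX 4096   /* assumption: fallback when <limits.h> lacks PATH_MAX */
#endif

/* Hypothetical predicate using the same comparison as the patch:
 * true when l is small enough that PATH_MAX + 2 + l stays below SIZE_MAX. */
static bool length_is_safe(size_t l)
{
    return l < SIZE_MAX - PATH_MAX - 2;
}
```

Every realistic path length passes, while SIZE_MAX - PATH_MAX - 2 and anything larger is rejected before the size is ever computed.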
The Diff at a Glance
@@ -516,6 +516,11 @@ normalize_path(char *src, const size_t src_len)
char *s = tmp ? tmp : src;
const size_t l = tmp ? strlen(tmp) : src_len;
+ /* Guard against theoretical size_t overflow in buffer allocation.
+ * Ensures pwd_len (at most PATH_MAX) + 2 + l will not overflow. */
+ if (l >= SIZE_MAX - PATH_MAX - 2)
+ return NULL;
+
/* Resolve references to . and .. */
char *res = NULL;
size_t res_len = 0;
Five lines. That's all it took to close a critical vulnerability. This is a good reminder that security fixes don't need to be complex — they need to be correct.
Prevention & Best Practices
1. Always Validate Sizes Before memcpy / memset / memmove
Whenever you copy into a buffer, ask yourself: "Do I know, with certainty, that the destination is large enough?" If the answer involves arithmetic on user-influenced values, validate first.
// Safe pattern
if (src_len > dst_capacity) {
return ERROR_BUFFER_TOO_SMALL;
}
memcpy(dst, src, src_len);
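The same idea extends to the append case from line 571, where data lands at an offset inside the buffer. The following is a minimal sketch, assuming a hypothetical struct buf that carries its capacity alongside the pointer; none of these names come from src/aux.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical buffer view: data points at cap bytes, of which len are in use. */
struct buf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Appends n bytes only if they fit. On failure the buffer is left untouched. */
static bool buf_append(struct buf *b, const void *src, size_t n)
{
    /* Compare n against the remaining space so that len + n is never
     * computed and therefore can never wrap. */
    if (b->len > b->cap || n > b->cap - b->len)
        return false;
    if (n > 0)
        memcpy(b->data + b->len, src, n);
    b->len += n;
    return true;
}
```

Keeping the capacity next to the pointer means every copy site can do the check locally, instead of relying on an allocation size computed many lines earlier.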
2. Guard Integer Arithmetic in Size Calculations
Addition and multiplication on size_t values can overflow silently. Always check before computing allocation sizes:
// Check before: a + b
if (b > SIZE_MAX - a) { /* overflow! */ }
// Check before: a * b (skip the division when b is 0; the product is then 0)
if (b != 0 && a > SIZE_MAX / b) { /* overflow! */ }
Consider using safe integer libraries like safe_math or compiler builtins (__builtin_add_overflow in GCC/Clang).
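As a rough illustration of the builtin route, the sketch below computes the pwd_len + 2 + l size with __builtin_add_overflow, which is a GCC/Clang extension rather than standard C. The function name path_alloc_size is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores pwd_len + 2 + l in *out.
 * Returns false, leaving *out untouched, if any addition would wrap. */
static bool path_alloc_size(size_t pwd_len, size_t l, size_t *out)
{
    size_t total;
    if (__builtin_add_overflow(pwd_len, (size_t)2, &total))
        return false;
    if (__builtin_add_overflow(total, l, &total))
        return false;
    *out = total;
    return true;
}
```

A caller would pass the result straight to malloc only when the function returns true, so the overflow check and the size computation cannot drift apart.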
3. Use Safer Abstractions When Possible
In new C code, prefer functions that require explicit size parameters:
| Unsafe | Safer Alternative |
|---|---|
| strcpy | strlcpy or strncpy + manual null-term |
| strcat | strlcat |
| gets | fgets |
| sprintf | snprintf |
| memcpy with unchecked size | memcpy with pre-validated size |
In C++, prefer std::string, std::vector, and std::span over raw pointer arithmetic.
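On the C side, the sprintf to snprintf swap only helps if the return value is checked, because snprintf truncates silently. A minimal sketch, using a hypothetical join_path helper:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "dir/name" into dst (capacity dst_cap).
 * Returns false on an encoding error or if the result would not fit. */
static bool join_path(char *dst, size_t dst_cap, const char *dir, const char *name)
{
    int n = snprintf(dst, dst_cap, "%s/%s", dir, name);
    /* snprintf returns the length it wanted to write, excluding the NUL;
     * a value >= dst_cap means the output was truncated. */
    return n >= 0 && (size_t)n < dst_cap;
}
```

Treating truncation as an error matters for paths in particular: a silently shortened path is still a valid string, but it may name a different file.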
4. Enable Compiler and Runtime Protections
Modern toolchains offer multiple layers of defense:
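# Compile-time hardening flags (GCC/Clang)
-D_FORTIFY_SOURCE=2                   # Runtime buffer overflow detection (requires optimization, e.g. -O2)
-fstack-protector-strong              # Stack canaries
-fsanitize=address                    # AddressSanitizer (development/CI)
-fsanitize=undefined                  # UBSan: signed integer overflow and other undefined behavior
-fsanitize=unsigned-integer-overflow  # Clang only: reports unsigned wraparound such as size_t overflow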
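AddressSanitizer (ASan) reports a heap-buffer-overflow as soon as a test input drives either memcpy past the end of res, which makes it a valuable addition to any C/C++ CI pipeline. It can only flag code paths that tests or fuzzers actually exercise.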
5. Fuzz Path-Handling Code
Path normalization functions are prime targets for fuzzing because:
- They accept highly variable-length inputs
- They perform complex string manipulation
- Edge cases (empty strings, all-slash paths, max-length paths) are easy to miss
Tools like libFuzzer or AFL++ can automatically generate inputs that stress-test boundary conditions:
# Example: fuzz normalize_path with libFuzzer
clang -fsanitize=fuzzer,address -o fuzz_normalize fuzz_normalize.c src/aux.c
./fuzz_normalize -max_len=8192
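The command above expects a fuzz_normalize.c harness, which the post does not show. The sketch below is one plausible shape for it. The normalize_path signature is taken from the diff; the char * return type, the assumption that the caller frees the result, and the FUZZ_REAL_TARGET macro are all guesses made for illustration. A stand-in target is included so the file also builds on its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifdef FUZZ_REAL_TARGET
/* Parameters as in the diff; return type and ownership are assumptions. */
char *normalize_path(char *src, const size_t src_len);
#else
/* Hypothetical stand-in so this sketch links by itself: returns a heap copy. */
static char *normalize_path(char *src, const size_t src_len)
{
    char *copy = malloc(src_len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, src_len);
    copy[src_len] = '\0';
    return copy;
}
#endif

/* libFuzzer entry point: called once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* The target takes a mutable string, so copy the input into a
     * private, NUL-terminated buffer first. */
    char *buf = malloc(size + 1);
    if (buf == NULL)
        return 0;
    if (size > 0)
        memcpy(buf, data, size);
    buf[size] = '\0';

    /* Inputs may contain embedded NUL bytes; pass the C-string length. */
    char *res = normalize_path(buf, strlen(buf));

    free(res);   /* assumption: the caller owns the returned buffer */
    free(buf);
    return 0;
}
```

To fuzz the real function, the harness would be compiled with -DFUZZ_REAL_TARGET and linked against src/aux.c as in the command above. If normalize_path can return its input pointer, the free(res) call would need adjusting.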
6. Reference Security Standards
This vulnerability maps to well-known classifications:
- CWE-120: Buffer Copy without Checking Size of Input ("Classic Buffer Overflow")
- CWE-190: Integer Overflow or Wraparound
- OWASP: the Top 10 has no dedicated memory-corruption category; A03:2021 – Injection is only a loose fit
- SEI CERT C: Rule ARR38-C (Guarantee that library functions do not form invalid pointers); Rule INT30-C (Ensure unsigned integer operations do not wrap)
Conclusion
This vulnerability is a textbook example of why C's lack of memory safety requires constant vigilance. A path normalization utility — the kind of function that gets written once and forgotten — contained a critical heap overflow hiding behind a simple arithmetic assumption: that the numbers would never get big enough to wrap around.
The key takeaways:
- Integer overflow in size calculations is a real attack vector, not a theoretical one.
- Heap overflows can be exploited even without direct stack control, especially in long-running processes.
- The fix was five lines — pre-condition validation before allocation is cheap and effective.
- Tooling helps: ASan, fuzzing, and static analysis can catch these issues before they reach production.
- Path-handling code deserves extra scrutiny — it frequently combines user-influenced data with system constants in arithmetic operations.
Security isn't about writing perfect code the first time. It's about building systems — code review, automated scanning, fuzzing, hardening flags — that catch imperfections before attackers do.
This vulnerability was automatically detected and patched by OrbisAI Security. Automated security tooling identified the unsafe memcpy pattern, generated the fix, verified it with a re-scan, and submitted it for human review — all without manual triage.