Heap Corruption via Integer Overflow in URI Parsing: A Deep Dive into CWE-190
Introduction
Integer overflows are among the oldest and most dangerous classes of vulnerabilities in systems programming. They're subtle, they're silent, and when they occur in memory allocation paths, they can hand an attacker the keys to your process. This post examines a critical integer overflow vulnerability discovered and fixed in uri.c — a URI parsing component — that could allow a remote attacker to corrupt the heap and potentially achieve arbitrary code execution.
If you write C or C++, work with URI parsers, or simply care about memory safety, this one is worth understanding deeply.
The Vulnerability Explained
What Went Wrong
The vulnerable code pattern lives in uri.c around lines 211–215. It looks something like this:
// VULNERABLE CODE — DO NOT USE
char *out = (char *) malloc(len + 1);
memcpy(out, range->first, len);
At first glance, this looks reasonable. Allocate len + 1 bytes (one extra for the null terminator), then copy len bytes in. Simple. Classic. Broken.
The problem is the len + 1 expression. In C, len is typically a size_t — an unsigned integer type. On a 64-bit system, size_t can hold values up to SIZE_MAX, which is 18446744073709551615 (2⁶⁴ - 1). If an attacker can supply a len value equal to SIZE_MAX, then:
SIZE_MAX + 1 = 0 (integer overflow — wraps to zero)
malloc(0) is implementation-defined but commonly returns a valid, non-NULL pointer to a zero-byte (or minimal) buffer. Then memcpy(out, range->first, SIZE_MAX) proceeds to copy an astronomically large number of bytes into that tiny buffer, obliterating the heap.
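The wraparound is easy to confirm in isolation. The sketch below uses a hypothetical helper, vulnerable_alloc_size (not part of uri.c), that mirrors the len + 1 expression so the resulting allocation size can be inspected directly:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the size the vulnerable code would
 * pass to malloc for a given len. */
size_t vulnerable_alloc_size(size_t len)
{
    /* size_t is unsigned, so this addition wraps modulo SIZE_MAX + 1
     * instead of trapping or raising an error. */
    return len + 1;
}
```

Calling vulnerable_alloc_size(SIZE_MAX) yields 0, while any smaller len yields len + 1 as expected. That single boundary value is what the fix has to reject.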
The Attack Path
This vulnerability is reachable via crafted URI input. The exploitation scenario follows this chain:
- Attacker submits a crafted URI: for example, through an HTTP request, a SQL query that embeds a URI, or any other input surface that feeds data into the URI parser.
- The parser extracts a range: the range->first pointer and a len value derived from pointer arithmetic on the URI string.
- len is attacker-influenced: if the URI is constructed such that the computed range length equals SIZE_MAX (the one size_t value for which len + 1 wraps to zero), the overflow is triggered.
- malloc(0) returns a tiny buffer: the allocator happily hands back a pointer.
- memcpy writes far beyond the buffer: heap metadata, adjacent allocations, and function pointers are all overwritten.
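The write-up does not spell out how len reaches SIZE_MAX. One plausible route, shown here purely as an assumption, is a range whose last pointer ends up one byte before first: the signed difference is -1, and converting it to size_t gives SIZE_MAX. The type demo_range_t and both functions are hypothetical stand-ins, not the parser's real definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parser's range type. */
typedef struct {
    const char *first;
    const char *last;
} demo_range_t;

/* Mirrors the unchecked length computation: the signed pointer
 * difference is converted to an unsigned size_t. */
size_t unchecked_range_len(const demo_range_t *r)
{
    return (size_t)(r->last - r->first);
}

/* Builds a range inside buf whose end sits one byte before its start.
 * buf must point to at least 6 bytes. */
demo_range_t make_inverted_range(const char *buf)
{
    demo_range_t r = { buf + 5, buf + 4 };
    return r;
}
```

For an inverted range like this, unchecked_range_len returns SIZE_MAX, which is exactly the value that makes len + 1 wrap to zero.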
Real-World Impact
Heap corruption is not just a crash. A skilled attacker can:
- Overwrite heap metadata to hijack allocator behavior on the next free() or malloc() call
- Overwrite adjacent objects containing function pointers or vtable pointers
- Chain with a second vulnerability (e.g., a use-after-free or type confusion) to achieve arbitrary code execution
- Cause a denial of service at minimum — the process will almost certainly crash
This is classified as CWE-190: Integer Overflow or Wraparound, and it's rated CRITICAL for good reason.
Why URI Parsers Are High-Risk
URI parsers are a particularly dangerous place for this class of bug because:
- They process untrusted, attacker-controlled input by design
- They perform extensive pointer arithmetic on the input string to extract components (scheme, host, path, query, fragment)
- The computed lengths are directly used in memory operations
- They are often called early in request processing, before other validation layers
The Fix
What Changed
The fix adds explicit integer overflow checks before any memory allocation occurs. Here is the safe version:
// SAFE CODE — after the fix
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>  /* malloc */
#include <string.h>  /* memcpy */

static char *safe_uri_range_copy(const char *first, size_t len)
{
    /* Guard 1: len must not be SIZE_MAX, because len + 1 would wrap to 0 */
    if (len == SIZE_MAX) {
        return NULL;
    }

    /* Guard 2: belt-and-suspenders overflow check */
    if (len + 1 < len) {
        return NULL;
    }

    /* Guard 3: NULL pointer with nonzero length is invalid */
    if (first == NULL && len > 0) {
        return NULL;
    }

    /* Now it is safe to allocate */
    char *out = (char *) malloc(len + 1);
    if (out == NULL) {
        return NULL;  /* Guard 4: always check malloc return value */
    }

    if (len > 0) {
        memcpy(out, first, len);
    }
    out[len] = '\0';
    return out;
}
How Each Guard Works
| Guard | Condition Checked | Why It Matters |
|---|---|---|
| Guard 1 | len == SIZE_MAX | Direct check for the exact overflow boundary |
| Guard 2 | len + 1 < len | Detects the wraparound without referring to the SIZE_MAX macro |
| Guard 3 | first == NULL && len > 0 | Prevents NULL dereference in memcpy |
| Guard 4 | out == NULL after malloc | Prevents use of a failed allocation |
Guards 1 and 2 are both present intentionally, as a "belt and suspenders" approach. Because size_t is unsigned, len + 1 wraps only when len == SIZE_MAX, so the two conditions are logically equivalent. Guard 1 states the boundary explicitly and reads cleanly. Guard 2 expresses the same check through the arithmetic itself, so it does not depend on the SIZE_MAX macro.
Before vs. After
// BEFORE: No validation; one crafted URI causes heap corruption
char *uri_range_to_string(uri_range_t *range) {
    size_t len = range->last - range->first;
    char *out = (char *) malloc(len + 1);  // ← overflow possible here
    memcpy(out, range->first, len);        // ← heap corruption here
    out[len] = '\0';
    return out;
}

// AFTER: Validation gates all memory operations
char *uri_range_to_string(uri_range_t *range) {
    size_t len = range->last - range->first;
    if (len == SIZE_MAX || len + 1 < len) {
        return NULL;  // ← reject overflow before it happens
    }
    char *out = (char *) malloc(len + 1);
    if (out == NULL) {
        return NULL;  // ← always check allocation
    }
    memcpy(out, range->first, len);
    out[len] = '\0';
    return out;
}
The change is small. The security improvement is enormous.
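A complementary check, not described as part of the fix itself, is to validate the range before converting the pointer difference to size_t. The sketch below assumes a hypothetical helper named checked_range_len and assumes both pointers refer to the same URI buffer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: computes last - first only when the range is
 * well-formed. Returns false for NULL arguments or an inverted range. */
bool checked_range_len(const char *first, const char *last, size_t *len_out)
{
    if (first == NULL || last == NULL || len_out == NULL) {
        return false;
    }
    if (last < first) {
        return false;  /* inverted range would wrap to a huge size_t */
    }
    *len_out = (size_t)(last - first);
    return true;
}
```

Rejecting inverted ranges at this point means the huge lengths never exist in the first place, while the overflow guard on len + 1 stays in place as a second line of defense.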
The Regression Test
The fix ships with a comprehensive regression test suite in tests/test_invariant_uri.c. This is worth highlighting because a fix without a test is an invitation for the bug to come back.
The test suite covers:
/* Test 1: Normal URIs — must parse correctly */
"http://example.com/path"
"https://user:pass@host:8080/path?query=val#frag"
/* Test 2: Adversarial — very long segments */
"http://aaaa...aaaa" // 280+ 'a' characters
/* Test 3: Special characters and high bytes */
"http://evil.com/\xff\xfe\xfd\xfc"
/* Test 4: The exact overflow boundary */
safe_uri_range_copy(dummy, SIZE_MAX); // Must return NULL
/* Test 5: NULL pointer with nonzero length */
safe_uri_range_copy(NULL, 10); // Must return NULL
The key invariant the test enforces:
For any URI-like input, the length used in malloc must not overflow, and the allocated buffer must be large enough to hold len+1 bytes before any memcpy is performed.
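A minimal sketch of how the boundary cases above might be checked follows. It is not the contents of tests/test_invariant_uri.c. The helper is a condensed copy of the guarded function so the example is self-contained, and run_invariant_checks is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Condensed copy of the guarded helper, repeated here so the sketch
 * compiles on its own. */
static char *safe_uri_range_copy(const char *first, size_t len)
{
    if (len == SIZE_MAX) return NULL;           /* len + 1 would wrap */
    if (first == NULL && len > 0) return NULL;  /* nothing to copy from */
    char *out = malloc(len + 1);
    if (out == NULL) return NULL;
    if (len > 0) memcpy(out, first, len);
    out[len] = '\0';
    return out;
}

/* Returns the number of failed checks; 0 means every case behaved. */
int run_invariant_checks(void)
{
    int failures = 0;
    const char *uri = "http://example.com/path";
    char dummy = 'x';

    /* Normal input must round-trip unchanged. */
    char *copy = safe_uri_range_copy(uri, strlen(uri));
    if (copy == NULL || strcmp(copy, uri) != 0) failures++;
    free(copy);

    /* The exact overflow boundary must be rejected before any copy. */
    if (safe_uri_range_copy(&dummy, SIZE_MAX) != NULL) failures++;

    /* NULL source with nonzero length must be rejected. */
    if (safe_uri_range_copy(NULL, 10) != NULL) failures++;

    /* Zero length yields an empty, terminated string. */
    char *empty = safe_uri_range_copy(NULL, 0);
    if (empty == NULL || empty[0] != '\0') failures++;
    free(empty);

    return failures;
}
```

The SIZE_MAX case is safe to run because the guard returns before memcpy is reached. Without the guard, the same call would corrupt the heap.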
Prevention & Best Practices
1. Always Validate Before Allocating
Any time you compute a size from external input and pass it to malloc, calloc, or realloc, validate it first:
// Pattern: check before compute
if (len > MAX_REASONABLE_URI_LENGTH) return NULL;
if (len == SIZE_MAX) return NULL;
if (len + 1 < len) return NULL; // overflow check
char *buf = malloc(len + 1);
if (!buf) return NULL;
2. Prefer calloc for Array Allocations
When allocating n elements of size s, use calloc(n, s) instead of malloc(n * s). calloc performs the multiplication with overflow checking internally on most modern implementations:
// Risky:
char *buf = malloc(count * element_size);
// Safer:
char *buf = calloc(count, element_size);
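Where the multiplication has to stay with malloc or realloc, or where calloc's internal check cannot be relied on, the same guarantee can be made explicit. The following is a sketch of the usual division-based test, wrapped in a hypothetical helper named checked_array_alloc:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocates count * element_size bytes, or
 * returns NULL if that product would not fit in a size_t. */
void *checked_array_alloc(size_t count, size_t element_size)
{
    /* count * element_size overflows exactly when
     * count > SIZE_MAX / element_size (for nonzero element_size). */
    if (element_size != 0 && count > SIZE_MAX / element_size) {
        return NULL;
    }
    return malloc(count * element_size);
}
```

Unlike calloc, this does not zero the memory, so the caller must initialize the buffer before reading from it.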
3. Use Safe Integer Libraries
For C code, consider using helper macros or libraries designed for safe integer arithmetic:
- safe-iop: Safe integer operations for C
- IntegerLib: CERT-inspired safe integer library
- __builtin_add_overflow (GCC/Clang): Compiler built-in overflow detection:
size_t alloc_size;
if (__builtin_add_overflow(len, 1, &alloc_size)) {
    return NULL;  // overflow detected
}
char *buf = malloc(alloc_size);
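Placed in a complete function, the built-in replaces both SIZE_MAX guards with a single call. This is a sketch only: copy_with_builtin_check is a hypothetical name, and the code assumes a GCC or Clang toolchain, since __builtin_add_overflow is not standard C:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies len bytes from src into a freshly
 * allocated, null-terminated buffer. Returns NULL on bad input,
 * size overflow, or allocation failure. */
char *copy_with_builtin_check(const char *src, size_t len)
{
    size_t alloc_size;

    if (src == NULL) {
        return NULL;
    }
    /* Returns true if len + 1 does not fit in alloc_size. */
    if (__builtin_add_overflow(len, (size_t)1, &alloc_size)) {
        return NULL;
    }

    char *buf = malloc(alloc_size);
    if (buf == NULL) {
        return NULL;
    }
    memcpy(buf, src, len);
    buf[len] = '\0';
    return buf;
}
```

The advantage over a hand-written comparison is that the checked result lands in alloc_size, so the value passed to malloc is the same one that was verified. C23 standardizes comparable functionality as ckd_add in <stdckdint.h>.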
4. Enable Compiler Sanitizers During Development
AddressSanitizer (ASan) and UndefinedBehaviorSanitizer (UBSan) catch these issues at runtime during testing:
# Compile with sanitizers
gcc -fsanitize=address,undefined -g -o myprogram myprogram.c
# Or with CMake
cmake -DCMAKE_C_FLAGS="-fsanitize=address,undefined" ..
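With -fsanitize=undefined, UBSan catches signed integer overflow. Unsigned overflow is defined behavior in C (it wraps), so it is not reported by default. Clang offers an opt-in -fsanitize=unsigned-integer-overflow check, but you still need explicit checks like the ones in this fix.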
5. Static Analysis
Run static analyzers as part of your CI pipeline:
| Tool | What It Catches |
|---|---|
| Coverity | Integer overflows, buffer overflows |
| CodeQL | CWE-190, CWE-122 (heap buffer overflow) |
| Clang Static Analyzer | Memory safety issues |
| Flawfinder | Dangerous function calls (memcpy, strcpy) |
| PVS-Studio | Arithmetic overflow, pointer arithmetic |
6. Adopt a Maximum Length Policy
The URI specification sets no maximum length, but real deployments impose practical limits. Choose limits that suit your application and enforce them:
#define MAX_URI_LENGTH   8192  // RFC 7230 recommends supporting request lines of at least 8000 octets
#define MAX_PATH_LENGTH  4096
#define MAX_QUERY_LENGTH 4096

if (len > MAX_URI_LENGTH) {
    // Reject: longer than this policy allows
    return NULL;
}
This defense-in-depth approach means that even if the overflow check were somehow bypassed, absurdly large lengths would still be rejected.
7. Relevant Security Standards
- CWE-190: Integer Overflow or Wraparound
- CWE-122: Heap-based Buffer Overflow
- CERT C Rule INT30-C: Ensure unsigned integer operations do not wrap
- OWASP: Buffer Overflow: General guidance on buffer overflow prevention
- NIST NVD: Reference for CVE tracking of similar vulnerabilities
Key Takeaways
The vulnerability fixed here is a textbook example of why input validation must happen before memory operations, not after. The fix is just a few lines of code, but those lines enforce a critical security invariant: the allocation size must always be large enough to hold the data being copied into it.
Here's what to take away from this fix:
- Integer overflow in malloc arguments is a heap corruption primitive: treat it as seriously as a direct buffer overflow.
- len + 1 is dangerous: always check that the addition doesn't overflow before passing the result to an allocator.
- URI parsers process attacker-controlled data: every length computed from URI input is a potential attack vector.
- Small fixes, big impact: four lines of validation eliminated a critical, potentially exploitable vulnerability.
- Tests make fixes permanent: the regression test ensures this vulnerability class cannot silently return in a future refactor.
Memory safety bugs don't announce themselves. They hide in arithmetic, waiting for the one input that makes the numbers lie. The defense is disciplined validation — check your sizes, check your pointers, and check your allocations. Every time.
This vulnerability was identified and fixed by automated security scanning. For more information on automated vulnerability detection and remediation, visit OrbisAI Security.