What is an integer overflow vulnerability in URI parsing?

It occurs when a length computed while parsing a URI is so large that adding 1 to it wraps the value around to zero (or, for larger additions, to a very small number). The code then allocates a zero-byte or near-zero-byte buffer but copies the full length into it, corrupting adjacent heap memory.

How do you prevent integer overflow in C memory allocation?

Always validate that arithmetic on size values cannot wrap before passing the result to `malloc` or `memcpy`. For the `len + 1` pattern, a guard such as `if (len == SIZE_MAX) return NULL;` prevents the wrap. You should still check `malloc`'s return value, because a request for nearly SIZE_MAX bytes will fail.

What CWE is integer overflow?

CWE-190 — Integer Overflow or Wraparound. It is closely related to CWE-122 (Heap-Based Buffer Overflow), which is the downstream consequence in this case.

Is input length limiting alone enough to prevent this integer overflow?

Only if the limit is enforced *before* the arithmetic. If the length is read from an untrusted source and passed directly to `len + 1` without a prior cap or overflow check, an attacker can still trigger the wrap even when a theoretical limit exists elsewhere.

Can static analysis detect integer overflow in C allocations?

Yes. Tools such as Semgrep, Coverity, and CodeQL have rules that flag unchecked arithmetic in `malloc` arguments. Orbis AppSec detected this exact pattern automatically and opened a remediation pull request.

Heap Corruption via Integer Overflow in URI Parsing: A Deep Dive into CWE-190

Introduction

Integer overflows are among the oldest and most dangerous classes of vulnerabilities in systems programming. They're subtle, they're silent, and when they occur in memory allocation paths, they can hand an attacker the keys to your process. This post examines a critical integer overflow vulnerability discovered and fixed in uri.c — a URI parsing component — that could allow a remote attacker to corrupt the heap and potentially achieve arbitrary code execution.

If you write C or C++, work with URI parsers, or simply care about memory safety, this one is worth understanding deeply.

The Vulnerability Explained

What Went Wrong

The vulnerable code pattern lives in uri.c around line 211–215. It looks something like this:

// VULNERABLE CODE — DO NOT USE
char *out = (char *) malloc(len + 1);
memcpy(out, range->first, len);

At first glance, this looks reasonable. Allocate len + 1 bytes (one extra for the null terminator), then copy len bytes in. Simple. Classic. Broken.

The problem is the len + 1 expression. In C, len is typically a size_t — an unsigned integer type. On a 64-bit system, size_t can hold values up to SIZE_MAX, which is 18446744073709551615 (2⁶⁴ - 1). If an attacker can supply a len value equal to SIZE_MAX, then:

SIZE_MAX + 1 = 0   (integer overflow — wraps to zero)

malloc(0) is implementation-defined: it returns either NULL or a unique pointer that must not be dereferenced, and common allocators return a valid, non-NULL pointer to a minimal chunk. Then memcpy(out, range->first, SIZE_MAX) starts copying an astronomically large number of bytes into that tiny buffer, overwriting heap metadata and neighboring allocations along the way.
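The wraparound itself is easy to observe. The sketch below isolates the size expression from the vulnerable snippet in a hypothetical helper, `naive_alloc_size`; it is an illustration, not code from uri.c. Unsigned arithmetic in C is defined to wrap modulo SIZE_MAX + 1, so nothing traps at runtime: the helper simply returns 0 for the boundary input.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the argument passed to malloc in the
   vulnerable code. size_t arithmetic wraps modulo SIZE_MAX + 1, so this
   never traps; it silently produces a too-small size. */
static size_t naive_alloc_size(size_t len)
{
    return len + 1;
}
```

With `len = 10` it returns 11 as intended; with `len = SIZE_MAX` it returns 0, which is exactly the `malloc(0)` call described above.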

The Attack Path

This vulnerability is reachable via crafted URI input. The exploitation scenario follows this chain:

1. Attacker submits a crafted URI, for example through an HTTP request, a SQL query that embeds a URI, or any other input surface that feeds data into the URI parser.
2. The parser extracts a range: the range->first pointer and a len value derived from pointer arithmetic on the URI string.
3. len is attacker-influenced: if the URI is constructed such that the computed range length equals SIZE_MAX (the only value for which len + 1 wraps), the overflow is triggered. No real URI is anywhere near SIZE_MAX bytes long, so in practice the likely route is a malformed range whose end pointer precedes its start: the negative pointer difference converts to a huge size_t, and an end one byte before the start yields exactly SIZE_MAX.
4. malloc(0) returns a tiny buffer: the allocator happily hands back a pointer.
5. memcpy writes far beyond the buffer: heap metadata, adjacent allocations, and function pointers are all overwritten.
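The length computation deserves a closer look, because a wrapped length is far more plausibly produced by an inverted range than by a gigantic input. The sketch below assumes, as in the before/after snippet later in this post, that a range is described by `first` and `last` pointers into the same string; the helper name `range_len` is hypothetical. Subtracting two pointers yields a signed `ptrdiff_t`, and converting a negative difference to `size_t` wraps it to a very large value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper reproducing `size_t len = range->last - range->first;`.
   Both pointers must point into the same array. If last precedes first,
   the difference is negative and the conversion to size_t wraps. */
static size_t range_len(const char *first, const char *last)
{
    return (size_t)(last - first);
}
```

Given `const char uri[] = "http://x";`, `range_len(uri, uri + 4)` is 4, while `range_len(uri + 5, uri + 4)` is SIZE_MAX, the exact value that makes len + 1 wrap to zero.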

Real-World Impact

Heap corruption is not just a crash. A skilled attacker can:

Overwrite heap metadata to hijack allocator behavior on the next free() or malloc() call
Overwrite adjacent objects containing function pointers or vtable pointers
Chain with a second vulnerability (e.g., a use-after-free or type confusion) to achieve arbitrary code execution
Cause a denial of service at minimum — the process will almost certainly crash

This is classified as CWE-190: Integer Overflow or Wraparound, and it's rated CRITICAL for good reason.

Why URI Parsers Are High-Risk

URI parsers are a particularly dangerous place for this class of bug because:

They process untrusted, attacker-controlled input by design
They perform extensive pointer arithmetic on the input string to extract components (scheme, host, path, query, fragment)
The computed lengths are directly used in memory operations
They are often called early in request processing, before other validation layers

The Fix

What Changed

The fix adds explicit integer overflow checks before any memory allocation occurs. Here is the safe version:

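// SAFE CODE: after the fix
#include <stdint.h>   /* SIZE_MAX */
#include <stdlib.h>   /* malloc */
#include <string.h>   /* memcpy */

static char *safe_uri_range_copy(const char *first, size_t len)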
{
    /* Guard 1: len must not be SIZE_MAX — len+1 would overflow to 0 */
    if (len == SIZE_MAX) {
        return NULL;
    }

    /* Guard 2: belt-and-suspenders overflow check */
    if (len + 1 < len) {
        return NULL;
    }

    /* Guard 3: NULL pointer with nonzero length is invalid */
    if (first == NULL && len > 0) {
        return NULL;
    }

    /* Now it is safe to allocate */
    char *out = (char *) malloc(len + 1);
    if (out == NULL) {
        return NULL;  /* Guard 4: always check malloc return value */
    }

    if (len > 0) {
        memcpy(out, first, len);
    }
    out[len] = '\0';

    return out;
}

How Each Guard Works

Guard	Condition Checked	Why It Matters
Guard 1	`len == SIZE_MAX`	Direct check for the exact overflow boundary
Guard 2	`len + 1 < len`	Generic wraparound idiom; redundant for size_t after Guard 1, but still correct if the length type changes
Guard 3	`first == NULL && len > 0`	Prevents NULL dereference in `memcpy`
Guard 4	`out == NULL` after `malloc`	Prevents use of a failed allocation

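Guards 1 and 2 are both present intentionally, as a "belt and suspenders" approach. Guard 1 catches the boundary case directly and reads clearly. Guard 2 is the generic idiom for detecting unsigned wraparound: for a size_t length it is redundant once Guard 1 has run (and a compiler may optimize it away), but it keeps working if the length is ever stored in a different unsigned type, such as unsigned int, where comparing against SIZE_MAX would no longer catch the wrap.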
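The difference between the two guard styles shows up if the length variable ever changes type. The sketch below is hypothetical (the uri.c fix uses size_t): it holds the length in an unsigned int and applies both checks. Because unsigned int is not subject to integer promotion, len + 1 is computed in unsigned int and wraps at UINT_MAX, so the generic comparison still fires, while the SIZE_MAX comparison only fires on platforms where SIZE_MAX happens to equal UINT_MAX.

```c
#include <limits.h>   /* UINT_MAX, used when calling these helpers */
#include <stdbool.h>
#include <stdint.h>   /* SIZE_MAX */

/* Guard 1 style: compares against SIZE_MAX. On a typical 64-bit target,
   UINT_MAX != SIZE_MAX, so this is never true for an unsigned int. */
static bool guard_size_max(unsigned int len)
{
    return len == SIZE_MAX;
}

/* Guard 2 style: len + 1 is evaluated in unsigned int, wraps to 0 when
   len == UINT_MAX, and the comparison detects the wrap. */
static bool guard_wraps(unsigned int len)
{
    return len + 1 < len;
}
```

On an LP64 system, `guard_size_max(UINT_MAX)` is false while `guard_wraps(UINT_MAX)` is true.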

Before vs. After

// BEFORE: No validation — one crafted URI causes heap corruption
char *uri_range_to_string(uri_range_t *range) {
    size_t len = range->last - range->first;
    char *out = (char *) malloc(len + 1);  // ← overflow possible here
    memcpy(out, range->first, len);         // ← heap corruption here
    out[len] = '\0';
    return out;
}

// AFTER: Validation gates all memory operations
char *uri_range_to_string(uri_range_t *range) {
    size_t len = range->last - range->first;

    if (len == SIZE_MAX || len + 1 < len) {
        return NULL;  // ← reject overflow before it happens
    }

    char *out = (char *) malloc(len + 1);
    if (out == NULL) {
        return NULL;  // ← always check allocation
    }

    memcpy(out, range->first, len);
    out[len] = '\0';
    return out;
}
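The AFTER version is safe against the wrap, but an inverted range (where `last` precedes `first`) still produces a huge `len` that only the `malloc` NULL check stops. A stricter variant could reject such ranges before subtracting. The sketch below is an assumption, not the shipped fix: the `uri_range_t` layout is inferred from the snippet above, and `uri_range_to_string_checked` is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed layout, inferred from the before/after snippet: `last` points
   one past the final character of the range. */
typedef struct {
    const char *first;
    const char *last;
} uri_range_t;

/* Hypothetical stricter variant: validates the range itself before
   computing a length from it. */
static char *uri_range_to_string_checked(const uri_range_t *range)
{
    if (range == NULL || range->first == NULL || range->last == NULL) {
        return NULL;
    }
    /* Relational comparison assumes both pointers point into the same
       URI buffer, which the parser is expected to guarantee. */
    if (range->last < range->first) {
        return NULL;  /* inverted range: the subtraction would go negative */
    }

    /* len is at most PTRDIFF_MAX, which is below SIZE_MAX on mainstream
       platforms, so len + 1 cannot wrap. */
    size_t len = (size_t)(range->last - range->first);
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }

    memcpy(out, range->first, len);
    out[len] = '\0';
    return out;
}
```

Rejecting the inverted range turns a "huge allocation that happens to fail" into an explicit parse error, which is easier to test and to log.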

The change is small. The security improvement is enormous.

The Regression Test

The fix ships with a comprehensive regression test suite in tests/test_invariant_uri.c. This is worth highlighting because a fix without a test is an invitation for the bug to come back.

The test suite covers:

/* Test 1: Normal URIs — must parse correctly */
"http://example.com/path"
"https://user:pass@host:8080/path?query=val#frag"

/* Test 2: Adversarial — very long segments */
"http://aaaa...aaaa"  // 280+ 'a' characters

/* Test 3: Special characters and high bytes */
"http://evil.com/\xff\xfe\xfd\xfc"

/* Test 4: The exact overflow boundary */
safe_uri_range_copy(dummy, SIZE_MAX);  // Must return NULL

/* Test 5: NULL pointer with nonzero length */
safe_uri_range_copy(NULL, 10);  // Must return NULL

The key invariant the test enforces:

For any URI-like input, the length used in malloc must not overflow, and the allocated buffer must be large enough to hold len+1 bytes before any memcpy is performed.
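One way to exercise that invariant without building multi-gigabyte inputs is to test the size computation directly at boundary lengths. The sketch below is not the contents of tests/test_invariant_uri.c; `checked_alloc_size` and `invariant_holds` are hypothetical names. It checks that whenever a size is produced, it is strictly larger than `len`, so a buffer of that size fits `len` bytes plus the terminator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: computes len + 1 only when it cannot wrap. */
static bool checked_alloc_size(size_t len, size_t *out)
{
    if (len == SIZE_MAX) {
        return false;  /* len + 1 would wrap to 0 */
    }
    *out = len + 1;
    return true;
}

/* Property check over boundary lengths: a produced size must always be
   strictly greater than len; rejecting the length is also acceptable. */
static bool invariant_holds(void)
{
    const size_t cases[] = { 0, 1, 280, SIZE_MAX / 2, SIZE_MAX - 1, SIZE_MAX };
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        size_t alloc = 0;
        if (checked_alloc_size(cases[i], &alloc) && alloc <= cases[i]) {
            return false;  /* size too small for len bytes plus '\0' */
        }
    }
    return true;
}
```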

Prevention & Best Practices

1. Always Validate Before Allocating

Any time you compute a size from external input and pass it to malloc, calloc, or realloc, validate it first:

// Pattern: check before compute
if (len > MAX_REASONABLE_URI_LENGTH) return NULL;
if (len == SIZE_MAX) return NULL;
if (len + 1 < len) return NULL;  // overflow check
char *buf = malloc(len + 1);
if (!buf) return NULL;

2. Prefer `calloc` for Array Allocations

When allocating n elements of size s, use calloc(n, s) instead of malloc(n * s). calloc performs the multiplication with overflow checking internally on most modern implementations:

// Risky:
char *buf = malloc(count * element_size);

// Safer:
char *buf = calloc(count, element_size);
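When a plain `malloc` or `realloc` of `count * element_size` is unavoidable (for example, when growing an existing buffer), the multiplication can be checked by division before it is performed. A minimal sketch; the name `checked_mul_size` is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores count * size in *out and returns true only
   if the product fits in size_t. Dividing SIZE_MAX avoids computing the
   (possibly wrapped) product first. */
static bool checked_mul_size(size_t count, size_t size, size_t *out)
{
    if (count != 0 && size > SIZE_MAX / count) {
        return false;  /* count * size would exceed SIZE_MAX */
    }
    *out = count * size;
    return true;
}
```

A caller would then write `if (!checked_mul_size(count, element_size, &bytes)) return NULL;` before passing `bytes` to `realloc`.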

3. Use Safe Integer Libraries

For C code, consider using helper macros or libraries designed for safe integer arithmetic:

safe-iop — Safe integer operations for C
IntegerLib — CERT-inspired safe integer library
__builtin_add_overflow (GCC/Clang) — Compiler built-in overflow detection:

size_t alloc_size;
if (__builtin_add_overflow(len, 1, &alloc_size)) {
    return NULL;  // overflow detected
}
char *buf = malloc(alloc_size);
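`__builtin_add_overflow` is a GCC/Clang extension, so code that must also build elsewhere needs a fallback. A small wrapper can use the builtin where it is available and a manual comparison otherwise. This is a sketch; the name `size_add_overflows` is hypothetical, and it mirrors the builtin's convention of returning true when the addition overflows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: returns true if a + b does not fit in size_t.
   On success, stores the sum in *out and returns false. On overflow,
   *out must not be used: the builtin writes the wrapped value, while
   the fallback leaves it untouched. */
static bool size_add_overflows(size_t a, size_t b, size_t *out)
{
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_add_overflow(a, b, out);
#else
    if (a > SIZE_MAX - b) {
        return true;  /* a + b would wrap */
    }
    *out = a + b;
    return false;
#endif
}
```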

4. Enable Compiler Sanitizers During Development

AddressSanitizer (ASan) and UndefinedBehaviorSanitizer (UBSan) catch these issues at runtime during testing:

# Compile with sanitizers
gcc -fsanitize=address,undefined -g -o myprogram myprogram.c

# Or with CMake
cmake -DCMAKE_C_FLAGS="-fsanitize=address,undefined" ..

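UBSan's default `undefined` group catches signed integer overflow, which is undefined behavior. Unsigned overflow is defined behavior in C (it wraps), so `-fsanitize=undefined` does not report it; Clang offers an opt-in `-fsanitize=unsigned-integer-overflow` check, but the reliable defense is explicit checks like the ones in this fix. ASan, for its part, would flag the resulting out-of-bounds `memcpy` as a heap-buffer-overflow.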

5. Static Analysis

Run static analyzers as part of your CI pipeline:

Tool	What It Catches
Coverity	Integer overflows, buffer overflows
CodeQL	CWE-190, CWE-122 (heap buffer overflow)
Clang Static Analyzer	Memory safety issues
Flawfinder	Dangerous function calls (`memcpy`, `strcpy`)
PVS-Studio	Arithmetic overflow, pointer arithmetic

6. Adopt a Maximum Length Policy

RFC 3986 does not set a maximum URI length, but servers, proxies, and clients impose practical limits, and your parser can do the same. Enforce them:

#define MAX_URI_LENGTH    8192   // RFC 7230 recommends supporting 8000+
#define MAX_PATH_LENGTH   4096
#define MAX_QUERY_LENGTH  4096

if (len > MAX_URI_LENGTH) {
    // Reject: longer than this application is willing to accept
    return NULL;
}

This defense-in-depth approach means that even if the overflow check were somehow bypassed, absurdly large lengths would still be rejected.
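The path and query limits can be applied per component in the same way. The sketch below is hypothetical: the `uri_component` enum and `component_length_ok` are illustrative names, and the limits repeat the example constants above so the block compiles on its own.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_URI_LENGTH    8192
#define MAX_PATH_LENGTH   4096
#define MAX_QUERY_LENGTH  4096

/* Hypothetical identifiers for the pieces a parser measures. */
enum uri_component { URI_WHOLE, URI_PATH, URI_QUERY };

/* Returns true if len is within the configured limit for the component.
   Unknown component values fall through and are rejected. */
static bool component_length_ok(enum uri_component c, size_t len)
{
    switch (c) {
    case URI_WHOLE: return len <= MAX_URI_LENGTH;
    case URI_PATH:  return len <= MAX_PATH_LENGTH;
    case URI_QUERY: return len <= MAX_QUERY_LENGTH;
    }
    return false;
}
```

Because every accepted length is far below SIZE_MAX, a len that passes this check can never make len + 1 wrap.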

7. Relevant Security Standards

CWE-190: Integer Overflow or Wraparound
CWE-122: Heap-based Buffer Overflow
CERT C Rule INT30-C: Ensure that unsigned integer operations do not wrap
OWASP: Buffer Overflow: General guidance on buffer overflow prevention
NIST NVD: Reference for CVE tracking of similar vulnerabilities

Key Takeaways

The vulnerability fixed here is a textbook example of why input validation must happen before memory operations, not after. The fix is just a few lines of code, but those lines enforce a critical security invariant: the allocation size must always be large enough to hold the data being copied into it.

Here's what to take away from this fix:

Integer overflow in malloc arguments is a heap corruption primitive — treat it as seriously as a direct buffer overflow.
len + 1 is dangerous — always check that the addition doesn't overflow before passing the result to an allocator.
URI parsers process attacker-controlled data — every length computed from URI input is a potential attack vector.
Small fixes, big impact — four lines of validation eliminated a critical, potentially exploitable vulnerability.
Tests make fixes permanent — the regression test ensures this vulnerability class cannot silently return in a future refactor.

Memory safety bugs don't announce themselves. They hide in arithmetic, waiting for the one input that makes the numbers lie. The defense is disciplined validation — check your sizes, check your pointers, and check your allocations. Every time.

This vulnerability was identified and fixed by automated security scanning. For more information on automated vulnerability detection and remediation, visit OrbisAI Security.

cwe	CWE-190
fix	Add `if (len == SIZE_MAX) return NULL;` guard before the allocation in `uri.c`
risk	Heap corruption enabling remote code execution via crafted URI input
language	C
root cause	`len + 1` wraps to 0 when `len == SIZE_MAX`, producing an undersized `malloc` followed by an oversized `memcpy`
vulnerability	Integer Overflow leading to Heap Buffer Overflow
