What is an integer overflow in heap allocation?

An integer overflow in heap allocation occurs when a size calculation (such as multiplying element count by element size) exceeds the maximum value of the integer type, wrapping around to a small number. The allocator then returns a buffer too small for the intended data, and subsequent writes go beyond the allocation boundary — a classic heap buffer overflow.

How do you prevent integer overflow in C heap allocations?

Always validate size arguments before calling malloc or equivalent. The safest pattern is a division-based pre-check: `if (count > SIZE_MAX / sizeof(T)) { /* handle error */ }`. This avoids the overflow entirely rather than trying to detect it after the fact.

What CWE is integer overflow leading to buffer overflow?

The resulting condition is classified here as CWE-120 (Buffer Copy without Checking Size of Input), and the root arithmetic issue is CWE-190 (Integer Overflow or Wraparound). The full attack chain, overflow in size math → undersized allocation → out-of-bounds write, is also catalogued on its own as CWE-680 (Integer Overflow to Buffer Overflow).

Is checking the return value of malloc enough to prevent this vulnerability?

No. The overflow happens before malloc is called — the corrupted size is passed to the allocator, which successfully returns a small (but valid) buffer. Checking for a NULL return only catches allocation failures, not the case where malloc succeeds but returns an undersized buffer.

Can static analysis detect integer overflow in allocation size calculations?

Yes. Tools like Coverity, CodeQL, and AI-powered scanners (such as Orbis AppSec) can trace tainted size values through arithmetic operations and flag unchecked multiplications used as allocation arguments. This specific vulnerability was detected by the `multi_agent_ai` scanner rule `V-002`.

Integer Overflow in regexJIT.c — Safe Fix

How Integer Overflow in regexJIT.c Heap Allocation Happens in C and How to Fix It

The Short Answer

This is a heap buffer overflow (CWE-120) in regex_src/regexJIT.c caused by an integer overflow in sizeof(struct stack_item) * dfa_size. On 32-bit platforms, a large dfa_size wraps the multiplication to a small value, so SLJIT_MALLOC returns an undersized buffer. Subsequent writes overflow that buffer. The fix adds a division-based pre-check that returns REGEX_MEMORY_ERROR before any allocation is attempted.

Introduction

The regex_src/regexJIT.c file is the heart of a JIT-compiled regex engine — it takes a compiled DFA (Deterministic Finite Automaton) and generates native machine code for fast pattern matching. Inside generate_transitions(), the compiler allocates a contiguous array of struct stack_item objects sized to hold every DFA state. A flaw at line 983 of that function means a sufficiently complex regex pattern can cause the size calculation to silently overflow, handing SLJIT_MALLOC a number far smaller than needed.

This matters for any developer working with C-level memory allocation: the bug pattern — multiplying a user-influenced count by a struct size without overflow checking — is one of the most common sources of heap corruption in systems software.

The Vulnerability Explained

What Goes Wrong at Line 983

Inside generate_transitions(), the original code allocates the DFA transition table like this:

// regex_src/regexJIT.c — BEFORE fix (line 983)
compiler_common->dfa_transitions = (struct stack_item *)SLJIT_MALLOC(
    sizeof(struct stack_item) * compiler_common->dfa_size,
    NULL
);
if (!compiler_common->dfa_transitions)
    return REGEX_MEMORY_ERROR;

The problem is the expression sizeof(struct stack_item) * compiler_common->dfa_size. Both operands are unsigned word-sized integers: sizeof yields a size_t, and dfa_size is an sljit_uw. On a 32-bit platform both are 32 bits wide, with a maximum value of 0xFFFFFFFF (4,294,967,295).

If dfa_size is large enough — say, derived from a deeply nested alternation pattern — the multiplication wraps around modulo 2³², producing a small number. For example:

sizeof(struct stack_item) = 16 bytes
dfa_size = 0x10000001  (268,435,457)

16 * 268,435,457 = 4,294,967,312  →  overflows to 16 (on 32-bit)

SLJIT_MALLOC(16, NULL) succeeds and returns a 16-byte buffer. The code then tries to write dfa_size entries into it — hundreds of millions of writes into a 16-byte allocation. The heap is immediately corrupted.
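The arithmetic above can be reproduced on any host by computing the product in 64 bits and then truncating it to 32 bits, which is what a 32-bit unsigned multiplication does implicitly. This is a minimal sketch: `true_product` and `wrapped_product` are hypothetical helpers for illustration, and the 16-byte element size is the article's example figure, not a measured `sizeof(struct stack_item)`.

```c
#include <stdint.h>

/* Hypothetical helper: the mathematically exact product, held in 64 bits. */
uint64_t true_product(uint32_t elem_size, uint32_t count)
{
    return (uint64_t)elem_size * count;
}

/* Hypothetical helper: the value a 32-bit unsigned multiplication yields.
 * Truncating to uint32_t keeps the product modulo 2^32; no error is raised. */
uint32_t wrapped_product(uint32_t elem_size, uint32_t count)
{
    return (uint32_t)true_product(elem_size, count);
}
```

With the example inputs, `true_product(16, 0x10000001)` is 4,294,967,312 while `wrapped_product(16, 0x10000001)` is 16, which is the size the allocator would actually receive on a 32-bit build.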

What Makes `dfa_size` Attacker-Controlled?

dfa_size is derived from parsing the regex pattern itself. The PR's regression test illustrates the attack surface directly:

// Deeply nested alternation — maximizes dfa_size
"((((((((((a|b|c|d|e|f|g|h|i|j){100}){100}){100}){100}){10}){10}){10}){5}){5}){5}"

// Large repeated group — stresses dfa_size calculation  
"(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z){65535}"

Any caller of regex_compile() that accepts a regex string from an external source — a configuration file, a network protocol, a user-supplied search field — is a potential attack vector.

The Attack Scenario

1. An attacker submits a crafted regex pattern (e.g., the deeply nested alternation above) to any API that calls regex_compile().
2. The DFA construction phase computes a large dfa_size.
3. generate_transitions() multiplies sizeof(struct stack_item) * dfa_size, which overflows to a small value on a 32-bit build.
4. SLJIT_MALLOC allocates a tiny buffer and returns a valid pointer.
5. The subsequent loop writes DFA transition data far beyond the end of the buffer, corrupting adjacent heap objects.
6. Depending on heap layout and what gets corrupted, this can cause a crash (denial of service), data corruption, or, with careful heap grooming, arbitrary code execution.

The Secondary Bug: Uninitialized Pointer

The diff also reveals a secondary issue in regex_compile():

// BEFORE fix — compiler_common.dfa_transitions is uninitialized
error_code = generate_transitions(&compiler_common);
stack_destroy(&compiler_common.stack);
stack_destroy(&compiler_common.depth);

If generate_transitions() returns early (e.g., due to the overflow check added by the fix), any cleanup code that conditionally frees dfa_transitions could pass an uninitialized pointer to the deallocator. The fix initializes it to NULL first, so free(NULL) is always safe.

The Fix

Change 1: Pre-Allocation Overflow Guard in `generate_transitions()`

// regex_src/regexJIT.c — AFTER fix
stack_init(depth);
if (compiler_common->dfa_size > (~(sljit_uw)0) / sizeof(struct stack_item))
    return REGEX_MEMORY_ERROR;
compiler_common->dfa_transitions = (struct stack_item *)SLJIT_MALLOC(
    sizeof(struct stack_item) * compiler_common->dfa_size,
    NULL
);
if (!compiler_common->dfa_transitions)
    return REGEX_MEMORY_ERROR;

The guard condition compiler_common->dfa_size > (~(sljit_uw)0) / sizeof(struct stack_item) deserves a close look:

(~(sljit_uw)0) is the maximum value of sljit_uw on any platform — all bits set, regardless of whether it's 32 or 64 bits. This avoids hardcoding UINT32_MAX or SIZE_MAX.
Dividing by sizeof(struct stack_item) gives the maximum safe element count before multiplication would overflow.
If dfa_size exceeds that threshold, the function returns REGEX_MEMORY_ERROR immediately — no allocation, no corruption.

This is the canonical safe pattern for multiplication overflow checks in C. It works because division is the inverse of multiplication: if a > MAX / b, then a * b > MAX.
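One way to convince yourself of that equivalence is to test it exhaustively on a type small enough to enumerate. The sketch below does so for 8-bit values; `guard_matches_overflow_u8` is a hypothetical name, and the 8-bit width is chosen only so that every pair can be checked, since the same reasoning applies to `sljit_uw`.

```c
#include <stdint.h>

/* Returns 1 if, for every 8-bit count a and every nonzero element size b,
 * the guard "a > MAX / b" fires exactly when a * b does not fit in 8 bits.
 * Returns 0 on the first disagreement. */
int guard_matches_overflow_u8(void)
{
    /* All bits set, as in (~(sljit_uw)0). The outer cast is needed here only
     * because uint8_t is promoted to int before ~ is applied. */
    const uint8_t max = (uint8_t)~(uint8_t)0;   /* 255 */
    unsigned a, b;

    for (a = 0; a <= max; a++) {
        for (b = 1; b <= max; b++) {            /* b == 0 would divide by zero */
            int guard_fires = a > (unsigned)(max / b);
            int overflows   = a * b > max;      /* exact: computed in unsigned int */
            if (guard_fires != overflows)
                return 0;
        }
    }
    return 1;
}
```

The function returns 1, meaning the guard neither rejects a product that fits nor accepts one that wraps.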

Change 2: Initialize `dfa_transitions` to `NULL` Before Calling `generate_transitions()`

// regex_src/regexJIT.c — AFTER fix (in regex_compile)
compiler_common.dfa_transitions = NULL;   // ← added
error_code = generate_transitions(&compiler_common);

This one-line change ensures that if generate_transitions() returns early (including via the new overflow guard), any subsequent code that frees compiler_common.dfa_transitions will safely call free(NULL) rather than freeing a garbage pointer.
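The interaction between the early return and the cleanup path can be shown in miniature. This is a sketch under stated assumptions: `struct table_owner`, `build_table`, and `build_and_cleanup` are hypothetical names, and plain `malloc`/`free` stand in for whatever allocator macros the real code uses.

```c
#include <stdlib.h>

/* Hypothetical miniature of the pattern; not the real compiler_common. */
struct table_owner {
    int *table;
};

/* Returns 0 on success, -1 on failure. On the guarded early return the
 * function never assigns owner->table. */
int build_table(struct table_owner *owner, size_t count)
{
    if (count == 0 || count > (~(size_t)0) / sizeof(int))
        return -1;                      /* early return: table left untouched */
    owner->table = malloc(count * sizeof(int));
    return owner->table ? 0 : -1;
}

int build_and_cleanup(size_t count)
{
    struct table_owner owner;
    int rc;

    owner.table = NULL;                 /* equivalent of the one-line fix */
    rc = build_table(&owner, count);
    free(owner.table);                  /* free(NULL) is a defined no-op */
    return rc;
}
```

Without the `owner.table = NULL;` line, the early-return path would hand `free()` an indeterminate pointer value.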

Change 3: Regression Test in `regexMain.c`

// regex_src/regexMain.c — added regression test
{
    struct regex_machine *machine;
    int err = REGEX_NO_ERROR;
    machine = regex_compile(
        "(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z){1000}",
        64, 0, &err);
    if (machine)
        regex_free_machine(machine);
    else if (err != REGEX_MEMORY_ERROR) {
        printf("FAIL: overflow regression returned unexpected error %d\n", err);
        return 1;
    }
}

This test is embedded directly in the main test runner. It verifies that the overflow-inducing pattern either compiles successfully (acceptable on 64-bit where no overflow occurs) or fails with exactly REGEX_MEMORY_ERROR — not a crash, not a silent heap corruption, not any other error code.

Before vs. After Summary

Aspect	Before	After
Overflow check	None	Division-based guard before `SLJIT_MALLOC`
`dfa_transitions` init	Uninitialized	Set to `NULL` before `generate_transitions()`
Oversized pattern	Heap corruption	Returns `REGEX_MEMORY_ERROR`
Regression coverage	None	Inline test + standalone `check` test

Prevention & Best Practices

1. Always Use the Division Pattern for Allocation Size Checks

// ✅ Safe: check before multiplying
if (count > SIZE_MAX / sizeof(T)) {
    return ERROR_OVERFLOW;
}
T *buf = malloc(count * sizeof(T));

// ❌ Unsafe: multiply first, check after
size_t total = count * sizeof(T);
if (total < count) { /* unreliable: unsigned wraparound is not UB, but a wrapped product can still be >= count */ }

The division check is portable and works for any unsigned integer width. It only requires a nonzero divisor, which sizeof always provides in standard C.
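If the same check is needed in several places, it can be wrapped once and reused. The following is a minimal sketch, not code from the fix: `checked_mul_size` and `alloc_array` are hypothetical helpers, and the explicit `b != 0` test is there only because a general-purpose helper cannot assume its second argument comes from `sizeof`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: stores a * b in *out and returns 1, or returns 0
 * without touching *out if the product would not fit in size_t. */
int checked_mul_size(size_t a, size_t b, size_t *out)
{
    if (b != 0 && a > SIZE_MAX / b)
        return 0;
    *out = a * b;
    return 1;
}

/* Hypothetical helper: allocates count elements of elem_size bytes.
 * Returns NULL if the total size overflows or the allocation fails. */
void *alloc_array(size_t count, size_t elem_size)
{
    size_t total;

    if (!checked_mul_size(count, elem_size, &total))
        return NULL;
    return malloc(total);
}
```

A caller then writes `T *buf = alloc_array(count, sizeof(T));` and handles a single NULL result for both failure modes.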

2. Use `reallocarray()` or `calloc()` Where Available

On modern Linux/BSD systems, reallocarray(ptr, nmemb, size) performs the overflow check internally:

// ✅ Overflow-safe on glibc 2.26+, OpenBSD, FreeBSD 11+ (not available on macOS)
compiler_common->dfa_transitions = reallocarray(
    NULL,
    compiler_common->dfa_size,
    sizeof(struct stack_item)
);

calloc(nmemb, size) also performs an internal overflow check on most implementations. However, since this code targets portability across platforms (it's part of an SLJIT abstraction layer), the explicit guard is the right choice here.

3. Initialize All Pointers at Declaration

// ✅ Always initialize struct members before use
struct compiler_common compiler_common;
memset(&compiler_common, 0, sizeof(compiler_common));
// or explicitly:
compiler_common.dfa_transitions = NULL;

Uninitialized pointers in structs that are conditionally freed are a persistent source of undefined behavior in C.

4. Fuzz Regex Input

Regex engines are classic fuzzing targets. Tools like AFL++ and libFuzzer can generate the exact kind of deeply nested alternation patterns that triggered this bug:

# Example: fuzz regex_compile() with AFL++
afl-fuzz -i seeds/ -o findings/ -- ./regex_fuzz_harness @@
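The harness itself is not shown in the article. As a rough sketch of what one could look like, the block below uses the libFuzzer entry point `LLVMFuzzerTestOneInput`, which is built with `clang -fsanitize=fuzzer,address`. A harness for the `afl-fuzz ... @@` command above would instead read the pattern from the file named on its command line. `compile_pattern_stub` is a hypothetical stand-in so the sketch compiles on its own; in a real harness it would be replaced by a call to `regex_compile()` followed by `regex_free_machine()` on success.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL stand-in for the code under test. Replace with the real
 * regex_compile() / regex_free_machine() calls when wiring this up. */
int compile_pattern_stub(const char *pattern, size_t length)
{
    (void)pattern;
    return length > 0 ? 0 : -1;
}

/* libFuzzer entry point: called once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* Copy into a NUL-terminated buffer so string-based APIs are safe. */
    char *pattern = malloc(size + 1);

    if (!pattern)
        return 0;
    if (size > 0)
        memcpy(pattern, data, size);
    pattern[size] = '\0';

    compile_pattern_stub(pattern, size);   /* result ignored; crashes are the signal */

    free(pattern);
    return 0;                              /* libFuzzer expects 0 */
}
```

Seeding the corpus with patterns such as the nested alternations shown earlier gives the fuzzer a starting point close to the interesting size calculations.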

5. Enable Compiler Sanitizers During Development

# AddressSanitizer catches heap overflows immediately
gcc -fsanitize=address,undefined -g regex_src/regexJIT.c

# Unsigned wraparound is not undefined behavior, so plain UBSan does not flag it;
# Clang offers an opt-in check for unsigned overflow
clang -fsanitize=unsigned-integer-overflow -g ...

6. Reference Standards

CWE-120: Buffer Copy without Checking Size of Input
CWE-190: Integer Overflow or Wraparound
OWASP: Memory Management Cheat Sheet

Key Takeaways

The multiplication sizeof(struct stack_item) * dfa_size in generate_transitions() was the exact dangerous pattern — not a generic "unchecked input" issue, but a specific arithmetic operation on a compiler-internal size variable derived from regex complexity.
The fix uses (~(sljit_uw)0) / sizeof(struct stack_item) as the safe threshold — this is portable to both 32-bit and 64-bit platforms without hardcoding any platform-specific constants.
Checking malloc's return value is not sufficient — the overflow produces a valid but undersized allocation; only a pre-allocation size check can prevent the corruption.
Initializing dfa_transitions = NULL before calling generate_transitions() prevents a separate use-of-uninitialized-pointer bug in cleanup paths triggered by the new early return.
Deeply nested alternation patterns like ((a|b|...|j){100}){100}... are the concrete exploit payload — any API that passes user-supplied regex strings to regex_compile() on a 32-bit build was vulnerable.

How Orbis AppSec Detected This

Source: The regex_string parameter passed to regex_compile() — a caller-controlled regex pattern that influences the computed dfa_size value during DFA construction.
Sink: SLJIT_MALLOC(sizeof(struct stack_item) * compiler_common->dfa_size, NULL) at regex_src/regexJIT.c:983 — an allocation whose size argument is the unchecked product of a user-influenced value.
Missing control: No overflow check on the multiplication sizeof(struct stack_item) * dfa_size before passing the result to the allocator; no upper bound validation on dfa_size during regex compilation.
CWE: CWE-120 — Buffer Copy without Checking Size of Input (heap buffer overflow via undersized allocation).
Fix: Added if (compiler_common->dfa_size > (~(sljit_uw)0) / sizeof(struct stack_item)) return REGEX_MEMORY_ERROR; immediately before the SLJIT_MALLOC call in generate_transitions().

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.

Conclusion

Integer overflow in allocation size calculations is one of those bugs that looks harmless in isolation — a multiplication of two small-seeming numbers — but becomes critical when the inputs are attacker-controlled. In regexJIT.c, the dfa_size field is computed from regex pattern complexity, meaning any caller that accepts external regex input on a 32-bit platform was exposed to heap corruption.

The fix is a textbook two-line guard: check that dfa_size doesn't exceed MAX / sizeof(T) before multiplying, and return a clean error if it does. Paired with the NULL initialization of dfa_transitions, the fix closes both the overflow and the uninitialized-pointer cleanup risk.

For C developers working with JIT compilers, pattern matchers, or any code that sizes allocations based on parsed input: make the division-based overflow check a reflex, enable AddressSanitizer in your CI pipeline, and consider fuzzing any parser that feeds into a size calculation. The patterns that trigger these bugs are exactly the kind of edge cases that automated fuzzers find in minutes.

cwe	CWE-120
fix	Pre-allocation overflow guard using division-based check before `SLJIT_MALLOC` call in `generate_transitions()`
risk	Heap corruption, process crash, potential remote code execution via crafted regex input
language	C
root cause	Unchecked multiplication `sizeof(struct stack_item) * dfa_size` overflows on 32-bit platforms, producing an undersized heap allocation
vulnerability	Heap Buffer Overflow via Integer Overflow in DFA Allocation
