Severity: critical | 9 min read

How integer overflow in regexJIT.c heap allocation happens in C and how to fix it

A critical integer overflow vulnerability in `regex_src/regexJIT.c` allowed crafted regex patterns to trigger a heap buffer overflow: the unchecked multiplication `sizeof(struct stack_item) * dfa_size` could wrap around on 32-bit platforms, resulting in an undersized allocation. The fix adds a pre-allocation overflow guard that returns `REGEX_MEMORY_ERROR` before any dangerous write can occur. Left unpatched, this vulnerability could be exploited to corrupt heap memory, crash the process, or potentially achieve remote code execution.

By Orbis AppSec
Published June 18, 2026 | Reviewed June 18, 2026

Answer Summary

This is a heap buffer overflow vulnerability (CWE-120) in the C regex JIT compiler file `regex_src/regexJIT.c`, caused by an integer overflow in the multiplication `sizeof(struct stack_item) * dfa_size` at line 983. On 32-bit platforms, a sufficiently large `dfa_size` derived from a complex regex pattern causes the multiplication to wrap around, producing a small allocation that is then overwritten by subsequent writes. The fix adds a pre-allocation bounds check — `if (compiler_common->dfa_size > (~(sljit_uw)0) / sizeof(struct stack_item)) return REGEX_MEMORY_ERROR;` — that safely rejects overflow-inducing inputs before any memory is allocated.

Vulnerability at a Glance

CWE: CWE-120
Fix: Pre-allocation overflow guard using a division-based check before the `SLJIT_MALLOC` call in `generate_transitions()`
Risk: Heap corruption, process crash, potential remote code execution via crafted regex input
Language: C
Root cause: Unchecked multiplication `sizeof(struct stack_item) * dfa_size` overflows on 32-bit platforms, producing an undersized heap allocation
Vulnerability: Heap Buffer Overflow via Integer Overflow in DFA Allocation



The Short Answer

This is a heap buffer overflow (CWE-120) in regex_src/regexJIT.c caused by an integer overflow in sizeof(struct stack_item) * dfa_size. On 32-bit platforms, a large dfa_size wraps the multiplication to a small value, so SLJIT_MALLOC returns an undersized buffer. Subsequent writes overflow that buffer. The fix adds a division-based pre-check that returns REGEX_MEMORY_ERROR before any allocation is attempted.


Introduction

The regex_src/regexJIT.c file is the heart of a JIT-compiled regex engine: it takes a compiled DFA (Deterministic Finite Automaton) and generates native machine code for fast pattern matching. Inside generate_transitions(), the compiler allocates a contiguous array of struct stack_item objects sized to hold every DFA state. A flaw at line 983 of the file means a sufficiently complex regex pattern can cause the size calculation to silently overflow, handing SLJIT_MALLOC a number far smaller than needed.

This matters for any developer working with C-level memory allocation: the bug pattern — multiplying a user-influenced count by a struct size without overflow checking — is one of the most common sources of heap corruption in systems software.


The Vulnerability Explained

What Goes Wrong at Line 983

Inside generate_transitions(), the original code allocates the DFA transition table like this:

// regex_src/regexJIT.c — BEFORE fix (line 983)
compiler_common->dfa_transitions = (struct stack_item *)SLJIT_MALLOC(
    sizeof(struct stack_item) * compiler_common->dfa_size,
    NULL
);
if (!compiler_common->dfa_transitions)
    return REGEX_MEMORY_ERROR;

The problem is the expression sizeof(struct stack_item) * compiler_common->dfa_size. sizeof yields a size_t and dfa_size is a sljit_uw; both are unsigned, word-sized integers, so the product is computed in unsigned arithmetic of that width. On a 32-bit platform that width is 32 bits, with a maximum value of 0xFFFFFFFF (4,294,967,295).

If dfa_size is large enough (say, derived from a deeply nested alternation pattern), the multiplication wraps around modulo 2³², producing a small number. For example, taking 16 bytes as an illustrative struct size:

sizeof(struct stack_item) = 16 bytes  (illustrative)
dfa_size = 0x10000001  (268,435,457)

16 * 268,435,457 = 4,294,967,312
4,294,967,312 - 4,294,967,296 (2³²) = 16  →  the 32-bit result is 16

SLJIT_MALLOC(16, NULL) succeeds and returns a 16-byte buffer. The code then tries to write dfa_size entries into it — hundreds of millions of writes into a 16-byte allocation. The heap is immediately corrupted.

What Makes dfa_size Attacker-Controlled?

dfa_size is derived from parsing the regex pattern itself. The PR's regression test illustrates the attack surface directly:

// Deeply nested alternation — maximizes dfa_size
"((((((((((a|b|c|d|e|f|g|h|i|j){100}){100}){100}){100}){10}){10}){10}){5}){5}){5}"

// Large repeated group — stresses dfa_size calculation  
"(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z){65535}"

Any caller of regex_compile() that accepts a regex string from an external source — a configuration file, a network protocol, a user-supplied search field — is a potential attack vector.

The Attack Scenario

  1. An attacker submits a crafted regex pattern (e.g., the deeply nested alternation above) to any API that calls regex_compile().
  2. The DFA construction phase computes a large dfa_size.
  3. generate_transitions() multiplies sizeof(struct stack_item) * dfa_size, which overflows to a small value on a 32-bit build.
  4. SLJIT_MALLOC allocates a tiny buffer and returns a valid pointer.
  5. The subsequent loop writes DFA transition data far beyond the end of the buffer, corrupting adjacent heap objects.
  6. Depending on heap layout and what gets corrupted, the result can be a crash (denial of service), data corruption, or, with careful heap grooming, arbitrary code execution.

The Secondary Bug: Uninitialized Pointer

The diff also reveals a secondary issue in regex_compile():

// BEFORE fix — compiler_common.dfa_transitions is uninitialized
error_code = generate_transitions(&compiler_common);
stack_destroy(&compiler_common.stack);
stack_destroy(&compiler_common.depth);

If generate_transitions() returns early (for example, through the overflow check added by the fix), any cleanup code that conditionally frees dfa_transitions would pass an uninitialized pointer to the deallocator, which is undefined behavior. The fix initializes it to NULL first, so the cleanup path ends up calling free(NULL), which is a guaranteed no-op.


The Fix

Change 1: Pre-Allocation Overflow Guard in generate_transitions()

// regex_src/regexJIT.c — AFTER fix
stack_init(depth);
if (compiler_common->dfa_size > (~(sljit_uw)0) / sizeof(struct stack_item))
    return REGEX_MEMORY_ERROR;
compiler_common->dfa_transitions = (struct stack_item *)SLJIT_MALLOC(
    sizeof(struct stack_item) * compiler_common->dfa_size,
    NULL
);
if (!compiler_common->dfa_transitions)
    return REGEX_MEMORY_ERROR;

The guard condition compiler_common->dfa_size > (~(sljit_uw)0) / sizeof(struct stack_item) deserves a close look:

  • (~(sljit_uw)0) is the maximum value of sljit_uw on any platform — all bits set, regardless of whether it's 32 or 64 bits. This avoids hardcoding UINT32_MAX or SIZE_MAX.
  • Dividing by sizeof(struct stack_item) gives the maximum safe element count before multiplication would overflow.
  • If dfa_size exceeds that threshold, the function returns REGEX_MEMORY_ERROR immediately — no allocation, no corruption.

This is the canonical safe pattern for multiplication overflow checks in C. With integer (truncating) division the test is exact: for a nonzero b, a > MAX / b holds precisely when a * b > MAX, so the guard rejects every size that would wrap and nothing else.

Change 2: Initialize dfa_transitions to NULL Before Calling generate_transitions()

// regex_src/regexJIT.c — AFTER fix (in regex_compile)
compiler_common.dfa_transitions = NULL;   // ← added
error_code = generate_transitions(&compiler_common);

This one-line change ensures that if generate_transitions() returns early (including via the new overflow guard), any subsequent code that frees compiler_common.dfa_transitions will safely call free(NULL) rather than freeing a garbage pointer.

Change 3: Regression Test in regexMain.c

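// regex_src/regexMain.c — added regression test
{
    /* Pattern that pushes dfa_size up; its length is derived from the
       literal itself (59 characters) rather than hardcoded. */
    static const char pattern[] =
        "(a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z){1000}";
    struct regex_machine *machine;
    int err = REGEX_NO_ERROR;
    machine = regex_compile(pattern, (int)(sizeof(pattern) - 1), 0, &err);
    if (machine)
        regex_free_machine(machine);   /* acceptable: e.g. 64-bit builds */
    else if (err != REGEX_MEMORY_ERROR) {
        printf("FAIL: overflow regression returned unexpected error %d\n", err);
        return 1;
    }
}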

This test is embedded directly in the main test runner. It verifies that the overflow-inducing pattern either compiles successfully (acceptable on 64-bit builds, where no overflow occurs) or fails with exactly REGEX_MEMORY_ERROR rather than any other error code. A crash fails the run outright; silent heap corruption is only caught reliably when the test binary is built with a tool such as AddressSanitizer.

Before vs. After Summary

| Aspect | Before | After |
| --- | --- | --- |
| Overflow check | None | Division-based guard before SLJIT_MALLOC |
| dfa_transitions init | Uninitialized | Set to NULL before generate_transitions() |
| Oversized pattern | Heap corruption | Returns REGEX_MEMORY_ERROR |
| Regression coverage | None | Inline test + standalone check test |

Prevention & Best Practices

1. Always Use the Division Pattern for Allocation Size Checks

// ✅ Safe: check before multiplying
if (count > SIZE_MAX / sizeof(T)) {
    return ERROR_OVERFLOW;
}
T *buf = malloc(count * sizeof(T));

// ❌ Unsafe: multiply first, check after
size_t total = count * sizeof(T);
if (total < count) { /* unreliable: the wrapped product can still be >= count */ }

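Because sizeof(T) is a compile-time constant, SIZE_MAX / sizeof(T) folds to a constant, so the check costs a single comparison at run time. It is portable and works for any unsigned integer width. Unsigned wraparound is well-defined in C, so the post-check above is not undefined behavior; it is simply wrong, because a wrapped product is not guaranteed to be smaller than count.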

2. Use reallocarray() or calloc() Where Available

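On many modern Linux and BSD systems, reallocarray(ptr, nmemb, size) performs the overflow check internally and fails with ENOMEM instead of wrapping:

// ✅ Overflow-safe where available: glibc 2.26+, OpenBSD, FreeBSD (check your libc; availability elsewhere, e.g. macOS, depends on the OS version)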
compiler_common->dfa_transitions = reallocarray(
    NULL,
    compiler_common->dfa_size,
    sizeof(struct stack_item)
);

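calloc(nmemb, size) also performs an internal overflow check on most implementations, at the cost of zeroing the buffer. Here, though, allocations go through the SLJIT_MALLOC macro, which embedders can redirect to their own allocator, so calling a libc helper directly would bypass it. Since this code targets portability across platforms, the explicit guard is the right choice.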

3. Initialize All Pointers at Declaration

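// ✅ Always initialize struct members before use
struct compiler_common compiler_common;
memset(&compiler_common, 0, sizeof(compiler_common));
// or explicitly:
compiler_common.dfa_transitions = NULL;
// or zero-initialize at the declaration itself (C99+; pointer members become NULL):
struct compiler_common compiler_common = {0};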

Uninitialized pointers in structs that are conditionally freed are a persistent source of undefined behavior in C.

4. Fuzz Regex Input

Regex engines are classic fuzzing targets. Tools like AFL++ and libFuzzer can generate the exact kind of deeply nested alternation patterns that triggered this bug:

# Example: fuzz regex_compile() with AFL++
afl-fuzz -i seeds/ -o findings/ -- ./regex_fuzz_harness @@

5. Enable Compiler Sanitizers During Development

# AddressSanitizer catches heap overflows immediately
gcc -fsanitize=address,undefined -g regex_src/regexJIT.c

# UBSan's default checks cover signed overflow; this bug is unsigned
# wraparound (well-defined in C), which only Clang can flag:
clang -fsanitize=undefined,unsigned-integer-overflow -g ...


Key Takeaways

  • The multiplication sizeof(struct stack_item) * dfa_size in generate_transitions() was the exact dangerous pattern — not a generic "unchecked input" issue, but a specific arithmetic operation on a compiler-internal size variable derived from regex complexity.
  • The fix uses (~(sljit_uw)0) / sizeof(struct stack_item) as the safe threshold — this is portable to both 32-bit and 64-bit platforms without hardcoding any platform-specific constants.
  • Checking malloc's return value is not sufficient — the overflow produces a valid but undersized allocation; only a pre-allocation size check can prevent the corruption.
  • Initializing dfa_transitions = NULL before calling generate_transitions() prevents a separate use-of-uninitialized-pointer bug in cleanup paths triggered by the new early return.
  • Deeply nested alternation patterns like ((a|b|...|j){100}){100}... are the concrete payload: they drive dfa_size up, so any API that passed user-supplied regex strings to regex_compile() on a 32-bit build was potentially exposed.

How Orbis AppSec Detected This

  • Source: The regex_string parameter passed to regex_compile() — a caller-controlled regex pattern that influences the computed dfa_size value during DFA construction.
  • Sink: SLJIT_MALLOC(sizeof(struct stack_item) * compiler_common->dfa_size, NULL) at regex_src/regexJIT.c:983 — an allocation whose size argument is the unchecked product of a user-influenced value.
  • Missing control: No overflow check on the multiplication sizeof(struct stack_item) * dfa_size before passing the result to the allocator; no upper bound validation on dfa_size during regex compilation.
  • CWE: CWE-120 — Buffer Copy without Checking Size of Input (heap buffer overflow via undersized allocation).
  • Fix: Added if (compiler_common->dfa_size > (~(sljit_uw)0) / sizeof(struct stack_item)) return REGEX_MEMORY_ERROR; immediately before the SLJIT_MALLOC call in generate_transitions().

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.


Conclusion

Integer overflow in allocation size calculations is one of those bugs that looks harmless in isolation — a multiplication of two small-seeming numbers — but becomes critical when the inputs are attacker-controlled. In regexJIT.c, the dfa_size field is computed from regex pattern complexity, meaning any caller that accepts external regex input on a 32-bit platform was exposed to heap corruption.

The fix is a textbook two-line guard: check that dfa_size doesn't exceed MAX / sizeof(T) before multiplying, and return a clean error if it does. Paired with the NULL initialization of dfa_transitions, the fix closes both the overflow and the uninitialized-pointer cleanup risk.

For C developers working with JIT compilers, pattern matchers, or any code that sizes allocations based on parsed input: make the division-based overflow check a reflex, enable AddressSanitizer in your CI pipeline, and consider fuzzing any parser that feeds into a size calculation. The patterns that trigger these bugs are exactly the kind of edge cases that automated fuzzers find in minutes.


Frequently Asked Questions

What is an integer overflow in heap allocation?

An integer overflow in heap allocation occurs when a size calculation (such as multiplying element count by element size) exceeds the maximum value of the integer type, wrapping around to a small number. The allocator then returns a buffer too small for the intended data, and subsequent writes go beyond the allocation boundary — a classic heap buffer overflow.

How do you prevent integer overflow in C heap allocations?

Always validate size arguments before calling malloc or equivalent. The safest pattern is a division-based pre-check: `if (count > SIZE_MAX / sizeof(T)) { /* handle error */ }`. This avoids the overflow entirely rather than trying to detect it after the fact.

What CWE is integer overflow leading to buffer overflow?

The resulting condition is classified as CWE-120 (Buffer Copy without Checking Size of Input), while the root arithmetic issue is CWE-190 (Integer Overflow or Wraparound); CWE-680 (Integer Overflow to Buffer Overflow) names the combination directly. Together they describe the full attack chain: overflow in size math → undersized allocation → out-of-bounds write.

Is checking the return value of malloc enough to prevent this vulnerability?

No. The overflow happens before malloc is called — the corrupted size is passed to the allocator, which successfully returns a small (but valid) buffer. Checking for a NULL return only catches allocation failures, not the case where malloc succeeds but returns an undersized buffer.

Can static analysis detect integer overflow in allocation size calculations?

Yes. Tools like Coverity, CodeQL, and AI-powered scanners (such as Orbis AppSec) can trace tainted size values through arithmetic operations and flag unchecked multiplications used as allocation arguments. This specific vulnerability was detected by the `multi_agent_ai` scanner rule `V-002`.

View the Security Fix

The pull request that fixed this vulnerability is PR #361.
