Introduction
In QuickJS's regular expression library, we discovered a high-severity integer overflow vulnerability at line 3429 of libregexp.c. The test harness in main() allocated memory for regex capture groups using an unchecked multiplication that a specially crafted regular expression could exploit.
The vulnerable code computed allocation size by multiplying sizeof(capture[0]) by the return value of lre_get_alloc_count(bc):
capture = malloc(sizeof(capture[0]) * lre_get_alloc_count(bc));
When lre_get_alloc_count(bc) returns a sufficiently large value—controlled by the complexity of the compiled regex bytecode—this multiplication can overflow, wrapping around to a small number. The result? A tiny buffer is allocated, but subsequent regex execution writes capture group data as if the full size were available, corrupting the heap.
The Vulnerability Explained
How Integer Overflow Leads to Heap Corruption
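In C, arithmetic operations on integers have no built-in overflow protection. For unsigned types such as size_t, a product that exceeds the maximum representable value is reduced modulo 2^N, where N is the width of the type, so a huge mathematical result can "wrap around" to a small number. (Signed overflow is undefined behavior rather than a wrap, but allocation sizes are computed in size_t.)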
Consider the vulnerable line:
capture = malloc(sizeof(capture[0]) * lre_get_alloc_count(bc));
Let's say sizeof(capture[0]) is 16 bytes (typical for a capture group structure containing two pointers). If an attacker crafts a regex with enough capture groups such that lre_get_alloc_count(bc) returns a value like 0x1000000000000001 on a 64-bit system, the multiplication becomes:
16 * 0x1000000000000001 = 0x10000000000000010
This exceeds 64 bits, so the result is truncated to just 0x10 (16 bytes). The malloc() call allocates only 16 bytes, but lre_exec() then writes capture data as if the full array were available, far past the end of this tiny buffer.
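The wraparound in this example can be reproduced with fixed-width unsigned arithmetic. The sketch below is illustrative only: `wrapped_alloc_size` is a hypothetical helper, not part of QuickJS, and `uint64_t` stands in for a 64-bit `size_t`.

```c
#include <stdint.h>

/* Hypothetical helper (not part of QuickJS): computes an allocation size
 * the way the vulnerable line does, with no overflow check. uint64_t
 * stands in for a 64-bit size_t, so the product is reduced modulo 2^64. */
static uint64_t wrapped_alloc_size(uint64_t element_size, uint64_t count)
{
    return element_size * count;
}
```

Calling `wrapped_alloc_size(16, 0x1000000000000001)` yields 0x10, the 16-byte size from the example, while a small count such as 4 yields the expected 64.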
Attack Scenario Specific to libregexp.c
An attacker targeting this vulnerability would:
- Craft a malicious regex pattern with an enormous number of capture groups; the test case in the PR demonstrates this with thousands of nested parentheses: (((((...)))))
- Compile the regex through QuickJS's regex compilation, which generates bytecode where lre_get_alloc_count() returns a massive value
- Trigger the allocation in the test harness's main() function, causing the overflow
- Execute the regex against any input, causing lre_exec() to write beyond the allocated buffer
The heap overflow enables:
- Arbitrary code execution by overwriting function pointers or vtables in adjacent heap objects
- Information disclosure by corrupting heap metadata to leak memory contents
- Denial of service by crashing the application through heap corruption
The Fix
The fix replaces the unsafe multiplication inside malloc() with a call to calloc():
Before (Vulnerable)
capture = malloc(sizeof(capture[0]) * lre_get_alloc_count(bc));
After (Fixed)
capture = calloc(lre_get_alloc_count(bc), sizeof(capture[0]));
Why calloc() Solves This Problem
The calloc() function takes two arguments—the number of elements and the size of each element—and performs the multiplication internally with overflow checking. Per the C standard and common implementations:
- calloc() checks for overflow before allocating. If nmemb * size would overflow, calloc() returns NULL instead of allocating an undersized buffer.
- Zero-initialization provides defense in depth. Even if there were edge cases, the buffer is zeroed, preventing information leakage from uninitialized memory.
- Semantic clarity makes the intent obvious: we're allocating an array of lre_get_alloc_count(bc) elements, each of size sizeof(capture[0]).
The fix at line 3429 ensures that when an attacker provides a regex designed to trigger overflow, the allocation fails with a NULL return rather than succeeding with an undersized buffer. That failure is only safe if the caller checks for NULL before using the pointer.
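A minimal sketch of the calloc() pattern combined with a NULL check follows. The `capture_slot` type and `capture_slots_create` function are hypothetical stand-ins, not the actual QuickJS harness code, and the sketch assumes a calloc() that rejects an overflowing request, as glibc and musl do.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one capture entry; the real element type in
 * libregexp.c may differ. */
typedef struct {
    const unsigned char *start;
    const unsigned char *end;
} capture_slot;

/* Allocates a zeroed array of `count` slots and stores it in *out.
 * calloc() receives the count and the element size separately, so it can
 * refuse a product that does not fit in size_t.
 * Returns 0 on success, or -1 (with *out set to NULL) on failure. */
static int capture_slots_create(capture_slot **out, size_t count)
{
    capture_slot *slots = calloc(count, sizeof(slots[0]));
    if (slots == NULL) {
        *out = NULL;
        return -1; /* fail safely instead of handing out a short buffer */
    }
    *out = slots;
    return 0;
}
```

A caller would treat a return value of -1 as a fatal error for that regex and skip execution entirely, and would release a successful allocation with free().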
Prevention & Best Practices
Safe Allocation Patterns in C
Always prefer calloc() for arrays:
// UNSAFE
ptr = malloc(count * element_size);
// SAFE
ptr = calloc(count, element_size);
When malloc() is required, check for overflow explicitly:
// Safe multiplication check
if (count > 0 && element_size > SIZE_MAX / count) {
    // Overflow would occur
    return NULL;
}
ptr = malloc(count * element_size);
Use compiler built-ins when available:
size_t total;
if (__builtin_mul_overflow(count, element_size, &total)) {
    return NULL;
}
ptr = malloc(total);
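The two checks above can be combined into one reusable helper. This is a sketch under stated assumptions: `checked_array_size` and `malloc_array` are hypothetical names, and the built-in branch assumes GCC 5 or later or a recent Clang, with the division-based check as the fallback elsewhere.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: stores count * element_size in *total and returns 0,
 * or returns -1 if the product does not fit in size_t. */
static int checked_array_size(size_t count, size_t element_size, size_t *total)
{
#if defined(__GNUC__) || defined(__clang__)
    /* GCC/Clang built-in: returns true when the product overflows. */
    if (__builtin_mul_overflow(count, element_size, total)) {
        return -1;
    }
    return 0;
#else
    /* Portable fallback: compare against SIZE_MAX / count before multiplying. */
    if (count != 0 && element_size > SIZE_MAX / count) {
        return -1;
    }
    *total = count * element_size;
    return 0;
#endif
}

/* Hypothetical wrapper: malloc() for an array, with the size check applied.
 * Returns NULL if the size overflows or if malloc() itself fails. */
static void *malloc_array(size_t count, size_t element_size)
{
    size_t total;
    if (checked_array_size(count, element_size, &total) != 0) {
        return NULL;
    }
    return malloc(total);
}
```

Keeping the size computation in its own function means the same check can guard realloc() calls and stack-size decisions as well as malloc().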
Static Analysis Integration
Configure your CI/CD pipeline to flag dangerous allocation patterns:
- Semgrep rules can detect malloc() calls with multiplication in the argument
- Compiler warnings like GCC's -Walloc-size-larger-than catch some cases
- AddressSanitizer (ASan) detects the resulting heap overflow at runtime
Code Review Checklist
When reviewing C code that handles dynamic allocation:
- [ ] Is the allocation size computed safely?
- [ ] Could any input influence the size calculation?
- [ ] Is calloc() used for array allocations?
- [ ] Is the return value checked for NULL?
Key Takeaways
- Never use malloc(a * b) for array allocation: the multiplication can overflow silently, and calloc(a, b) handles this safely
- The lre_get_alloc_count() return value is influenced by regex complexity, making this a user-controllable attack vector in any application that compiles untrusted regexes
- QuickJS's test harness in main() was vulnerable, demonstrating that even test code can have security implications if it processes untrusted input
- calloc() provides two protections: overflow-checked multiplication AND zero-initialization
- Regex engines are high-value targets because they process complex, attacker-controlled input—extra scrutiny on memory operations is essential
How Orbis AppSec Detected This
- Source: The bc (bytecode) parameter passed to lre_get_alloc_count(), derived from compiling a user-provided regex pattern via argv[1]
- Sink: malloc(sizeof(capture[0]) * lre_get_alloc_count(bc)) at quickjs/libregexp.c:3429
- Missing control: No overflow check on the multiplication before allocation
- CWE: CWE-190 (Integer Overflow or Wraparound) leading to CWE-122 (Heap-based Buffer Overflow)
- Fix: Replaced malloc() with calloc(), which performs overflow-checked multiplication internally
Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.
Conclusion
Integer overflow vulnerabilities in memory allocation are among the most dangerous bugs in C programs. They're subtle—the code looks correct at first glance—but can lead to complete system compromise through heap corruption.
The fix in QuickJS's libregexp.c demonstrates the simplest and most effective mitigation: use calloc() instead of malloc() with multiplication. This single-line change transforms a high-severity vulnerability into a safe allocation that fails gracefully when given malicious input.
When working with C code that handles untrusted input—especially complex parsers like regex engines—always assume that any size calculation could be manipulated. Design your allocation strategy to fail safely rather than corrupt memory silently.