Heap Buffer Overflow in md2html: How Integer Overflow Corrupts Memory
Introduction
If you've ever processed user-supplied Markdown content in a C-based application, this vulnerability should get your attention. A critical-severity heap buffer overflow was identified and patched in md2html/md2html.c, the C code responsible for converting Markdown documents to HTML. The root cause? A deceptively simple arithmetic mistake: failing to check for integer overflow before calling realloc.
This type of bug, catalogued under CWE-120 (Buffer Copy without Checking Size of Input) and rooted here in CWE-190 (Integer Overflow or Wraparound), has been responsible for some of the most severe exploits in computing history, from the classic gets() disasters to modern browser engine exploits. It's a reminder that in C, the programmer is the last line of defense against memory corruption.
Whether you're a systems programmer, a security engineer reviewing C codebases, or a developer who ships software that processes untrusted documents, understanding this vulnerability is essential.
The Vulnerability Explained
What Is a Heap Buffer Overflow?
A heap buffer overflow occurs when a program writes more data into a heap-allocated buffer than the buffer can hold. Unlike stack overflows (which corrupt return addresses and local variables), heap overflows corrupt adjacent heap metadata or other allocated objects — which can be just as dangerous, and often harder to detect.
The Vulnerable Code: membuf_grow
The vulnerability lives in the membuf_grow function in md2html/md2html.c at line 88. This function is responsible for dynamically growing a memory buffer to accommodate more data. Here's the conceptual shape of the vulnerable logic:
// VULNERABLE CODE (simplified for illustration)
static int membuf_grow(membuf_t *buf, size_t size) {
    size_t new_asize = buf->asize + size; // ⚠️ No overflow check!
    char *new_data = realloc(buf->data, new_asize);
    if (new_data == NULL) {
        return -1;
    }
    buf->data = new_data;
    buf->asize = new_asize;
    // Line 103: copies 'size' bytes into potentially undersized buffer
    // ('source' stands for the input being appended; its declaration is omitted in this simplified listing)
    memcpy(buf->data + buf->size, source, size);
    return 0;
}
The critical flaw is on the line computing new_asize:
size_t new_asize = buf->asize + size; // Integer overflow possible!
On a 64-bit system, size_t is an unsigned 64-bit integer. If buf->asize is very large and size is also large, their sum can wrap around to a small value — a phenomenon called unsigned integer overflow (or more precisely, unsigned integer wraparound, since it's defined behavior in C but still catastrophically wrong here).
How the Overflow Leads to Heap Corruption
Here's the chain of events in an attack:
- The attacker crafts a Markdown document designed to cause the buffer to grow to a size near SIZE_MAX (the maximum value of size_t).
- membuf_grow is called with a size argument that, when added to the current buf->asize, wraps around to a small value (e.g., 0x10).
- realloc(buf->data, new_asize) is called with that small wrapped value, so it returns a tiny buffer, maybe just 16 bytes.
- memcpy at line 103 then copies the original (large) size bytes into this tiny buffer, blasting past its boundaries and corrupting adjacent heap memory.
Before overflow:
buf->asize = 0xFFFFFFFFFFFFFFF0 (near SIZE_MAX)
size = 0x0000000000000020 (32 bytes)
new_asize = 0x0000000000000010 (wraps to 16!) ← BUG
realloc returns a 16-byte buffer.
memcpy writes 32 bytes into it.
💥 Heap corruption.
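The arithmetic above can be reproduced in a few lines of C. This is only an illustrative sketch: wrapped_sum and sum_would_wrap are hypothetical helpers, not part of md2html, and the usage below passes SIZE_MAX - 15 rather than a hard-coded 64-bit constant so the result is the same on any platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds two sizes the way the unguarded code does.
 * Unsigned arithmetic in C is modular, so the result silently wraps. */
static size_t wrapped_sum(size_t asize, size_t size) {
    return asize + size;
}

/* Hypothetical helper: returns 1 if a + b would wrap, 0 otherwise.
 * The check uses subtraction only, so it cannot wrap itself. */
static int sum_would_wrap(size_t a, size_t b) {
    return b > SIZE_MAX - a;
}
```

With asize = SIZE_MAX - 15 (0xFFFFFFFFFFFFFFF0 on a 64-bit target) and size = 32, wrapped_sum returns 16, which is the value realloc would receive, while sum_would_wrap reports the problem before any addition happens.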
Real-World Impact
Depending on how the application uses the heap and what lives in adjacent memory, this vulnerability could enable:
- Denial of Service (DoS): The most reliable outcome. Corrupting heap metadata causes a crash (SIGABRT, SIGSEGV, or a malloc consistency check failure).
- Arbitrary Code Execution: A skilled attacker can potentially shape the heap layout to overwrite function pointers, vtable entries, or other critical data, achieving full code execution.
- Information Disclosure: Heap corruption can sometimes cause the program to read and expose memory it shouldn't.
The attack surface is any code path that processes attacker-controlled Markdown input — which includes web servers, document converters, content management systems, and desktop applications that render user-provided content.
The Fix
What Changed
The fix adds an integer overflow check before computing new_asize, ensuring that if the addition would overflow, the function returns an error instead of proceeding with a corrupted size value.
Here is the corrected logic:
// FIXED CODE (simplified for illustration)
static int membuf_grow(membuf_t *buf, size_t size) {
    // ✅ Check for integer overflow BEFORE the addition
    if (size > SIZE_MAX - buf->asize) {
        return -1; // Refuse to proceed: overflow would occur
    }
    size_t new_asize = buf->asize + size; // Now safe
    char *new_data = realloc(buf->data, new_asize);
    if (new_data == NULL) {
        return -1;
    }
    buf->data = new_data;
    buf->asize = new_asize;
    // ('source' stands for the input being appended; its declaration is omitted in this simplified listing)
    memcpy(buf->data + buf->size, source, size);
    return 0;
}
Why This Fix Works
The guard condition size > SIZE_MAX - buf->asize is the standard, correct idiom for detecting unsigned integer overflow in C before it happens:
- SIZE_MAX - buf->asize computes the maximum value that can be safely added to buf->asize without wrapping.
- If size exceeds that maximum, we know the addition would overflow, and we return an error immediately.
- Because we check before the addition, the overflow never occurs, and realloc is never called with a dangerously small size.
This is a minimal, surgical fix: the cost is one subtraction and one comparison per call, behavior is unchanged for valid inputs, and the wrapped-size path that led to the undersized realloc is closed.
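To see the guard in a form that compiles on its own, here is a minimal sketch of a growable buffer. The type demo_buf_t and the functions demo_buf_grow and demo_buf_append are hypothetical names chosen for this example; the field names mirror the simplified listing above, and nothing here is claimed to match the real md2html source.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer type for illustration only. */
typedef struct {
    char  *data;   /* heap storage, or NULL when nothing is allocated */
    size_t asize;  /* bytes allocated */
    size_t size;   /* bytes in use (invariant: size <= asize) */
} demo_buf_t;

/* Grow the allocation by `extra` bytes. Returns 0 on success, -1 on
 * overflow or allocation failure; the buffer is unchanged on failure. */
static int demo_buf_grow(demo_buf_t *buf, size_t extra) {
    if (extra == 0) {
        return 0;   /* nothing to do; also avoids realloc(ptr, 0) */
    }
    if (extra > SIZE_MAX - buf->asize) {
        return -1;  /* asize + extra would wrap */
    }
    size_t new_asize = buf->asize + extra;
    char *new_data = realloc(buf->data, new_asize);
    if (new_data == NULL) {
        return -1;
    }
    buf->data = new_data;
    buf->asize = new_asize;
    return 0;
}

/* Append `len` bytes from `src`, growing only by the missing amount. */
static int demo_buf_append(demo_buf_t *buf, const char *src, size_t len) {
    size_t free_space = buf->asize - buf->size;
    if (len > free_space && demo_buf_grow(buf, len - free_space) != 0) {
        return -1;
    }
    if (len > 0) {
        memcpy(buf->data + buf->size, src, len);
        buf->size += len;
    }
    return 0;
}
```

Keeping the copy in a separate append function is a design choice of this sketch: the grow routine only manages capacity, so a rejected request leaves both the pointer and the recorded sizes untouched and the caller can report the error cleanly.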
Why Not Use calloc or Other Alternatives?
Some might suggest restructuring the allocation entirely. That's a valid long-term improvement, but the overflow guard is the correct and direct fix for this specific bug. It follows the principle of least change — patching the exact flaw without introducing unintended side effects.
Prevention & Best Practices
This vulnerability is a textbook example of a class of bugs that C programmers must actively guard against. Here's how to prevent similar issues:
1. Always Check for Integer Overflow Before Arithmetic on Sizes
Never compute a buffer size without validating the inputs first. Use the standard idiom:
// Safe addition check for size_t
if (b > SIZE_MAX - a) {
    // overflow would occur: handle error
}
size_t sum = a + b;

// Safe multiplication check for size_t
if (a != 0 && b > SIZE_MAX / a) {
    // overflow would occur: handle error
}
size_t product = a * b;
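One way to avoid repeating these checks at every call site is to wrap them in small helpers. The sketch below is an assumption about how that might look; size_add_checked and size_mul_checked are hypothetical names, not functions from any standard library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: stores a + b in *out and returns true, or returns
 * false and leaves *out untouched if the sum would wrap. */
static bool size_add_checked(size_t a, size_t b, size_t *out) {
    if (b > SIZE_MAX - a) {
        return false;
    }
    *out = a + b;
    return true;
}

/* Hypothetical helper: same contract for a * b. The a != 0 test avoids
 * dividing by zero; a product with a zero factor can never wrap. */
static bool size_mul_checked(size_t a, size_t b, size_t *out) {
    if (a != 0 && b > SIZE_MAX / a) {
        return false;
    }
    *out = a * b;
    return true;
}
```

A caller would then write something like `if (!size_add_checked(buf->asize, size, &new_asize)) return -1;`, which keeps the check and the arithmetic together in a single expression.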
2. Use Safe Integer Libraries
For complex arithmetic on sizes, consider using safe integer libraries:
- SafeInt (C++) — throws on overflow
- CERT's safe integer library — C-compatible
- Compiler builtins: GCC and Clang provide __builtin_add_overflow, __builtin_mul_overflow, etc.:
size_t new_asize;
if (__builtin_add_overflow(buf->asize, size, &new_asize)) {
    return -1; // Overflow detected
}
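The builtins compose well when a size is computed in several steps. As a rough sketch, assuming GCC 5 or later or a recent Clang, the hypothetical helper alloc_with_header below computes header + count * elem_size and refuses to allocate if either step would wrap.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `header` bytes followed by `count`
 * elements of `elem_size` bytes each. Returns NULL if the total would
 * overflow size_t or if malloc fails. Relies on GCC/Clang builtins. */
static void *alloc_with_header(size_t header, size_t count, size_t elem_size) {
    size_t body;
    size_t total;

    if (__builtin_mul_overflow(count, elem_size, &body)) {
        return NULL;  /* count * elem_size wrapped */
    }
    if (__builtin_add_overflow(header, body, &total)) {
        return NULL;  /* header + body wrapped */
    }
    if (total == 0) {
        total = 1;    /* sidestep implementation-defined malloc(0) */
    }
    return malloc(total);
}
```

The builtins are compiler extensions, so code that must build elsewhere needs a fallback such as the manual SIZE_MAX checks shown earlier. C23 also standardizes equivalent macros (ckd_add, ckd_mul) in <stdckdint.h> for toolchains that support it.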
3. Enable Compiler and Sanitizer Warnings
Modern tooling can catch these issues at development time:
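# AddressSanitizer catches heap overflows at runtime; UBSan catches undefined behavior
clang -fsanitize=address,undefined -o myapp myapp.c
# Clang's integer sanitizer group also reports unsigned wraparound, which is not undefined behavior
clang -fsanitize=integer -o myapp myapp.c
# Enable a broad set of warnings (-Wall alone does not enable everything)
gcc -Wall -Wextra -Wconversion -o myapp myapp.c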
4. Use Static Analysis Tools
Integrate static analysis into your CI/CD pipeline:
- Coverity — excellent at finding integer overflow bugs
- CodeQL — GitHub's semantic code analysis engine
- Flawfinder — lightweight C/C++ security scanner
- PVS-Studio — commercial but powerful
5. Fuzz Test Your Parsers
Any code that processes untrusted input — especially document parsers — should be fuzz tested:
# Using AFL++ to fuzz a Markdown parser
afl-fuzz -i corpus/ -o findings/ -- ./md2html @@
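Coverage-guided fuzzing, run with AddressSanitizer enabled, is very effective at finding memory-corruption bugs in parsers because it keeps mutating inputs toward unexplored code paths. Overflows that only trigger at near-SIZE_MAX sizes are harder to reach through input mutation alone, so pair fuzzing with unit tests that call the buffer-growth routine directly with boundary values.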
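As an alternative to the afl-fuzz command line above, an in-process harness can target a single routine. The sketch below uses libFuzzer's LLVMFuzzerTestOneInput entry point; fz_buf_t and fz_append are hypothetical stand-ins for the code under test, and a real harness would call the parser's actual entry point instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the code under test: an overflow-checked append. */
typedef struct { char *data; size_t asize, size; } fz_buf_t;

static int fz_append(fz_buf_t *b, const uint8_t *src, size_t len) {
    if (len == 0) return 0;
    if (len > SIZE_MAX - b->size) return -1;   /* size + len would wrap */
    size_t needed = b->size + len;
    if (needed > b->asize) {
        char *p = realloc(b->data, needed);
        if (p == NULL) return -1;
        b->data = p;
        b->asize = needed;
    }
    memcpy(b->data + b->size, src, len);
    b->size = needed;
    return 0;
}

/* libFuzzer entry point. The first input byte picks a chunk length so the
 * fuzzer explores many different sequences of append calls. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    fz_buf_t buf = {NULL, 0, 0};
    if (size > 0) {
        size_t chunk = (size_t)data[0] + 1;    /* 1..256 bytes per call */
        for (size_t off = 1; off < size; off += chunk) {
            size_t n = (size - off < chunk) ? size - off : chunk;
            if (fz_append(&buf, data + off, n) != 0) break;
            if (buf.size > buf.asize) abort(); /* invariant violated */
        }
    }
    free(buf.data);
    return 0;  /* libFuzzer expects 0 from the callback */
}
```

With Clang, a harness like this is typically built with -fsanitize=fuzzer,address, which links the fuzzing driver and lets AddressSanitizer report any out-of-bounds write the moment it happens.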
6. Consider Memory-Safe Alternatives
For new projects, consider languages with built-in memory safety:
- Rust: bounds-checks array and slice accesses at runtime, and integer overflow panics in debug builds (release builds wrap by default; checked_add and related methods make the check explicit)
- Go: garbage collected with bounds checking
- Zig: explicit overflow handling with @addWithOverflow
If you must use C, follow the CERT C Coding Standard, particularly:
- INT30-C: Ensure that unsigned integer operations do not wrap
- MEM35-C: Allocate sufficient memory for an object
Security Standards References
| Standard | Reference |
|---|---|
| CWE | CWE-120: Buffer Copy without Checking Size of Input |
| CWE | CWE-190: Integer Overflow or Wraparound |
| OWASP | A03:2021 – Injection (closest category; the Top 10 has no dedicated memory-corruption entry) |
| CERT C | INT30-C |
Conclusion
The membuf_grow heap buffer overflow is a crisp illustration of why arithmetic on sizes in C demands explicit overflow checking. A single missing guard — if (size > SIZE_MAX - buf->asize) — was the difference between safe memory management and a critical, potentially exploitable heap corruption vulnerability.
Key takeaways:
- ✅ Always validate size arithmetic before calling malloc, realloc, or memcpy.
- ✅ Use compiler sanitizers (-fsanitize=address,undefined) during development and testing.
- ✅ Fuzz your parsers: document parsers are a prime target for crafted-input attacks.
- ✅ Integrate static analysis into your CI pipeline to catch these issues before they ship.
- ✅ Follow CERT C and CWE guidance for integer and memory safety in C code.
The fix here was small — just a few lines — but its impact is enormous. This is the nature of memory safety bugs in C: the vulnerability is tiny, the consequences are not.
If your project processes untrusted Markdown, HTML, or any structured document format in C or C++, take this as an opportunity to audit your buffer growth and size calculation logic. The next crafted document in your input stream might be looking for exactly this kind of mistake.
This vulnerability was identified and patched by OrbisAI Security. Automated security scanning and remediation helps teams find and fix issues like this before they reach production.