Heap Buffer Overflow in md2html: How Integer Overflow Corrupts Memory

Introduction

If you've ever processed user-supplied Markdown in a C-based application, this vulnerability should get your attention. A critical-severity heap buffer overflow was identified and patched in md2html/md2html.c, part of a C program that converts Markdown documents to HTML. The root cause is a deceptively simple arithmetic mistake: the code failed to check for integer overflow before calling realloc.

This type of bug is catalogued under CWE-120 (Buffer Copy without Checking Size of Input), and its root cause falls under CWE-190 (Integer Overflow or Wraparound). Buffer overflows have been responsible for some of the most severe exploits in computing history, from the classic gets() disasters to modern browser engine exploits. They are a reminder that in C, the programmer is the last line of defense against memory corruption.

Whether you're a systems programmer, a security engineer reviewing C codebases, or a developer who ships software that processes untrusted documents, understanding this vulnerability is essential.

The Vulnerability Explained

What Is a Heap Buffer Overflow?

A heap buffer overflow occurs when a program writes more data into a heap-allocated buffer than the buffer can hold. Unlike stack overflows (which corrupt return addresses and local variables), heap overflows corrupt adjacent heap metadata or other allocated objects — which can be just as dangerous, and often harder to detect.

The Vulnerable Code: `membuf_grow`

The vulnerability lives in the membuf_grow function in md2html/md2html.c at line 88. This function is responsible for dynamically growing a memory buffer to accommodate more data. Here's the conceptual shape of the vulnerable logic:

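// VULNERABLE CODE (simplified for illustration)
#include <stdlib.h>
#include <string.h>

// Illustrative definition; the real struct in md2html.c may differ
typedef struct {
    char  *data;   // heap buffer
    size_t size;   // bytes currently in use
    size_t asize;  // bytes currently allocated
} membuf_t;

static int membuf_grow(membuf_t *buf, const void *source, size_t size) {
    size_t new_asize = buf->asize + size;  // ⚠️ No overflow check: may wrap to a small value

    char *new_data = realloc(buf->data, new_asize);
    if (new_data == NULL) {
        return -1;
    }

    buf->data  = new_data;
    buf->asize = new_asize;

    // Line 103: copies 'size' bytes into a buffer that may now be smaller than 'size'
    memcpy(buf->data + buf->size, source, size);
    buf->size += size;
    return 0;
}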

The critical flaw is on the line computing new_asize:

size_t new_asize = buf->asize + size;  // Integer overflow possible!

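On a 64-bit system, size_t is an unsigned 64-bit integer. On a typical 32-bit build it is 32 bits wide. If buf->asize and size are large enough that their sum exceeds SIZE_MAX, the result wraps around modulo 2^N to a small value. This is often called unsigned integer overflow, though "unsigned integer wraparound" is more precise. Wraparound is well-defined behavior in C, so it is not flagged by default, yet it is catastrophically wrong here.

No 64-bit process can actually hold a buffer anywhere near SIZE_MAX bytes. The wrap is therefore most plausible in two cases:

- 32-bit builds, where roughly 4 GiB of growth is enough.
- Code where size itself comes from an attacker-influenced length calculation.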

How the Overflow Leads to Heap Corruption

Here's the chain of events in an attack:

1. The attacker crafts a Markdown document designed to make the buffer grow to a size near SIZE_MAX (the maximum value of size_t).
2. membuf_grow is called with a size argument that, when added to the current buf->asize, wraps around to a small value (e.g., 0x10).
3. realloc(buf->data, new_asize) is called with that small wrapped value, so it returns a tiny buffer, perhaps just 16 bytes.
4. memcpy at line 103 then copies the original (large) size bytes into this tiny buffer, writing past its end and corrupting adjacent heap memory.

Before overflow:
  buf->asize = 0xFFFFFFFFFFFFFFF0  (near SIZE_MAX)
  size       = 0x0000000000000020  (32 bytes)
  new_asize  = 0x0000000000000010  (wraps to 16!) ← BUG

realloc returns a 16-byte buffer.
memcpy writes 32 bytes into it.
💥 Heap corruption.
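The arithmetic in this example can be reproduced in isolation. The sketch below is a standalone illustration, not md2html code. The helper names wrapped_sum and wrapped_sum32 are hypothetical. On a 64-bit target, SIZE_MAX - 15 equals 0xFFFFFFFFFFFFFFF0. Writing it that way makes the same calculation hold for any size_t width.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the unchecked addition from the vulnerable
 * membuf_grow. Unsigned arithmetic in C is reduced modulo 2^N, so a
 * true sum larger than SIZE_MAX silently wraps to a small value. */
static size_t wrapped_sum(size_t asize, size_t size) {
    return asize + size;
}

/* The same wraparound with an explicit 32-bit type, matching size_t on
 * a typical 32-bit build. Assumes int is 32 bits (true on mainstream
 * targets), so uint32_t operands are not promoted to a signed type. */
static uint32_t wrapped_sum32(uint32_t asize, uint32_t size) {
    return asize + size;
}
```

wrapped_sum(SIZE_MAX - 15, 32) returns 16, and wrapped_sum32(0xFFFFFFF0u, 0x20u) returns 0x10. In both cases, that undersized value is what would be handed to realloc.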

Real-World Impact

Depending on how the application uses the heap and what lives in adjacent memory, this vulnerability could enable:

- Denial of Service (DoS): the most reliable outcome. Corrupted heap metadata typically causes a crash, either SIGSEGV or SIGABRT when the allocator's consistency checks detect the damage.
- Arbitrary Code Execution: a skilled attacker can potentially shape the heap layout to overwrite function pointers, vtable pointers in C++ host applications, or other critical data, achieving full code execution.
- Information Disclosure: heap corruption can sometimes cause the program to read and expose memory it shouldn't.

The attack surface is any code path that passes attacker-controlled Markdown to this converter. That could include web servers, document conversion pipelines, content management systems, and desktop applications that render user-provided content.

The Fix

What Changed

The fix adds an integer overflow check before computing new_asize, ensuring that if the addition would overflow, the function returns an error instead of proceeding with a corrupted size value.

Here is the corrected logic:

// FIXED CODE (simplified for illustration)
#include <stdint.h>  // SIZE_MAX
#include <stdlib.h>
#include <string.h>

// membuf_t is assumed to be { char *data; size_t size; size_t asize; }

static int membuf_grow(membuf_t *buf, const void *source, size_t size) {
    // ✅ Check for integer overflow BEFORE the addition
    if (size > SIZE_MAX - buf->asize) {
        return -1;  // Refuse to proceed: the addition would wrap
    }

    size_t new_asize = buf->asize + size;  // Now safe

    char *new_data = realloc(buf->data, new_asize);
    if (new_data == NULL) {
        return -1;  // The original buffer is still valid and unchanged
    }

    buf->data  = new_data;
    buf->asize = new_asize;

    memcpy(buf->data + buf->size, source, size);
    buf->size += size;
    return 0;
}

Why This Fix Works

The guard condition size > SIZE_MAX - buf->asize is the standard, correct idiom for detecting unsigned integer overflow in C before it happens:

- SIZE_MAX - buf->asize computes the maximum value that can be safely added to buf->asize without wrapping. This subtraction cannot itself underflow, because buf->asize is never larger than SIZE_MAX.
- If size exceeds that maximum, the addition would overflow, so the function returns an error immediately.
- Because the check happens before the addition, the overflow never occurs and realloc is never called with a dangerously small size.
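The guard's boundary behavior can be checked on its own. This is a minimal sketch; add_would_overflow is a hypothetical helper name, not a function in md2html.c. The largest accepted size makes the sum exactly SIZE_MAX, and one more byte is rejected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when asize + size would wrap around. This is the guard
 * from the fixed membuf_grow, isolated so it can be tested directly. */
static bool add_would_overflow(size_t asize, size_t size) {
    /* SIZE_MAX - asize never underflows, since asize <= SIZE_MAX. */
    return size > SIZE_MAX - asize;
}
```

For example, add_would_overflow(SIZE_MAX - 16, 16) is false because the sum is exactly SIZE_MAX. add_would_overflow(SIZE_MAX - 16, 17) is true.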

This is a minimal, surgical fix. The guard costs a single comparison, valid inputs behave exactly as before, and wraparound in this addition can no longer produce an undersized allocation.

Why Not Use `calloc` or Other Alternatives?

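Some might suggest restructuring the allocation entirely. calloc in particular is not a drop-in replacement, for two reasons:

- It allocates a fresh zero-filled block instead of resizing an existing one.
- The overflow check that most C libraries perform inside calloc covers only the nmemb * size multiplication, not an addition such as buf->asize + size.

Restructuring the growth logic is a valid long-term improvement. For this specific bug, though, the overflow guard is the correct and direct fix. It follows the principle of least change: patch the exact flaw without introducing unintended side effects.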
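One possible shape for a restructured growth routine is sketched below. It is a hypothetical design, not md2html's code. The names growbuf_t and growbuf_reserve are illustrative, and the 1.5x growth factor is an assumption. The routine separates "make room" from "copy data" and grows geometrically, so repeated appends do not call realloc every time. It also checks every size addition before performing it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical buffer type for this sketch. */
typedef struct {
    char  *data;
    size_t size;   /* bytes in use */
    size_t asize;  /* bytes allocated */
} growbuf_t;

/* Ensure room for `extra` more bytes. Grows by roughly 1.5x to amortize
 * realloc calls, checking each addition before performing it.
 * Returns 0 on success, -1 on overflow or allocation failure. */
static int growbuf_reserve(growbuf_t *buf, size_t extra) {
    if (extra > SIZE_MAX - buf->size)
        return -1;                      /* size + extra would wrap */
    size_t needed = buf->size + extra;
    if (needed <= buf->asize)
        return 0;                       /* already large enough */

    size_t new_asize = needed;
    size_t half = buf->asize / 2;
    if (half <= SIZE_MAX - buf->asize && buf->asize + half > needed)
        new_asize = buf->asize + half;  /* prefer geometric growth */

    char *p = realloc(buf->data, new_asize);
    if (p == NULL)
        return -1;                      /* original buffer still valid */
    buf->data = p;
    buf->asize = new_asize;
    return 0;
}
```

To append, a caller would call growbuf_reserve(&buf, n), then memcpy(buf.data + buf.size, src, n), then buf.size += n. Assigning realloc's result to a temporary first means a failed allocation leaves the original buffer intact instead of leaking it.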

Prevention & Best Practices

This vulnerability is a textbook example of a class of bugs that C programmers must actively guard against. Here's how to prevent similar issues:

1. Always Check for Integer Overflow Before Arithmetic on Sizes

Never compute a buffer size without validating the inputs first. Use the standard idiom:

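// Safe addition and multiplication checks for size_t (helper names are illustrative).
// Each helper stores the result and returns true only if no overflow occurred.
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static bool size_add(size_t a, size_t b, size_t *result) {
    if (b > SIZE_MAX - a) {
        return false;  // overflow would occur: caller must handle the error
    }
    *result = a + b;
    return true;
}

static bool size_mul(size_t a, size_t b, size_t *result) {
    if (a != 0 && b > SIZE_MAX / a) {
        return false;  // overflow would occur: caller must handle the error
    }
    *result = a * b;
    return true;
}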

2. Use Safe Integer Libraries

For complex arithmetic on sizes, consider using safe integer libraries:

- SafeInt (C++): throws an exception on overflow by default
- CERT's safe integer library: C-compatible
- Compiler builtins: GCC and Clang provide __builtin_add_overflow, __builtin_mul_overflow, and related builtins:

size_t new_asize;
if (__builtin_add_overflow(buf->asize, size, &new_asize)) {
    return -1;  // Overflow detected
}
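The builtin can be wrapped so that the worked example from earlier is rejected while ordinary sizes pass. This is a sketch for GCC and Clang only; the wrapper name grow_size_checked is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* __builtin_add_overflow computes asize + size as if with infinite
 * precision, stores the (possibly wrapped) result in *out, and returns
 * true if that result did not fit. This wrapper inverts it: true means
 * the addition succeeded and *out is safe to use. */
static bool grow_size_checked(size_t asize, size_t size, size_t *out) {
    return !__builtin_add_overflow(asize, size, out);
}
```

C23 standardizes the same idea as ckd_add and ckd_mul in <stdckdint.h>, for compilers that support it.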

3. Enable Compiler and Sanitizer Warnings

Modern tooling can catch these issues at development time:

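# AddressSanitizer detects heap overflows at runtime; UBSan adds undefined-behavior checks
clang -fsanitize=address,undefined -o myapp myapp.c

# Clang only: -fsanitize=integer includes unsigned-integer-overflow, which flags
# unsigned wraparound (not covered by -fsanitize=undefined, since it is defined behavior)
clang -fsanitize=integer -o myapp myapp.c

# Enable a broad set of warnings (-Wall does not literally enable all of them)
gcc -Wall -Wextra -Wconversion -o myapp myapp.c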

4. Use Static Analysis Tools

Integrate static analysis into your CI/CD pipeline:

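- Coverity: commercial analyzer with integer-overflow and buffer checkers (free for open-source projects via Coverity Scan)
- CodeQL: GitHub's semantic code analysis engine
- Flawfinder: lightweight, pattern-based C/C++ security scanner
- PVS-Studio: commercial static analyzer for C and C++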

5. Fuzz Test Your Parsers

Any code that processes untrusted input — especially document parsers — should be fuzz tested:

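# Build the target with AFL++'s instrumenting compiler first (e.g., CC=afl-clang-fast),
# then fuzz the Markdown converter; @@ is replaced by the path of each test input
afl-fuzz -i corpus/ -o findings/ -- ./md2html @@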

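Fuzzing is very effective at finding memory-safety bugs in parsers, especially when the target is built with AddressSanitizer so that out-of-bounds writes crash immediately. It has limits for this particular bug class, though. On a 64-bit build, no realistic input drives a buffer's size near SIZE_MAX. Overflow guards on size arithmetic are therefore best exercised by unit tests that call the growth function directly with extreme values.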
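Such a unit test might look like the sketch below. The membuf_t layout, the test function names, and the compact copy of the fixed membuf_grow are all assumptions for illustration, not md2html's actual test suite. The key idea is to forge a buffer state with a capacity near SIZE_MAX, which a fuzzer could never produce. The test then confirms the call is refused before realloc or memcpy runs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative buffer type; md2html's real definition may differ. */
typedef struct {
    char  *data;
    size_t size;   /* bytes in use */
    size_t asize;  /* bytes allocated */
} membuf_t;

/* Compact copy of the fixed, simplified membuf_grow under test. */
static int membuf_grow(membuf_t *buf, const void *source, size_t size) {
    if (size > SIZE_MAX - buf->asize)
        return -1;                      /* would wrap: refuse */
    size_t new_asize = buf->asize + size;
    char *new_data = realloc(buf->data, new_asize);
    if (new_data == NULL)
        return -1;
    buf->data = new_data;
    buf->asize = new_asize;
    memcpy(buf->data + buf->size, source, size);
    buf->size += size;
    return 0;
}

/* Forged capacity near SIZE_MAX: the grow must be refused and the
 * buffer left untouched. data stays NULL, so nothing is dereferenced.
 * Returns 0 on pass. */
static int test_grow_rejects_wraparound(void) {
    char payload[32] = {0};
    membuf_t buf = { NULL, 0, SIZE_MAX - 15 };
    if (membuf_grow(&buf, payload, sizeof payload) != -1)
        return 1;
    return (buf.data == NULL && buf.asize == SIZE_MAX - 15) ? 0 : 1;
}

/* Sanity check that ordinary growth still works. Returns 0 on pass. */
static int test_grow_normal(void) {
    membuf_t buf = { NULL, 0, 0 };
    if (membuf_grow(&buf, "hello", 5) != 0)
        return 1;
    int ok = buf.size == 5 && buf.asize == 5 &&
             memcmp(buf.data, "hello", 5) == 0;
    free(buf.data);
    return ok ? 0 : 1;
}
```

Without the guard, the first test would call realloc(NULL, 16) and then copy 32 bytes into it. Running the suite under AddressSanitizer would report that overflow immediately.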

6. Consider Memory-Safe Alternatives

For new projects, consider languages with built-in memory safety:
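- Rust: bounds-checked indexing prevents buffer overflows in safe code. Integer overflow panics in debug builds and wraps in release builds by default, and checked_add/checked_mul make the check explicit.
- Go: garbage collected with runtime bounds checking (integer overflow still wraps silently).
- Zig: integer overflow is a checked panic in Debug and ReleaseSafe builds, with explicit handling available via @addWithOverflow.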

If you must use C, follow the CERT C Coding Standard, particularly:
- INT30-C: Ensure that unsigned integer operations do not wrap
- MEM35-C: Allocate sufficient memory for an object

Security Standards References

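| Standard | Reference |
|---|---|
| CWE | CWE-120: Buffer Copy without Checking Size of Input |
| CWE | CWE-122: Heap-based Buffer Overflow |
| CWE | CWE-190: Integer Overflow or Wraparound |
| CERT C | INT30-C: Ensure that unsigned integer operations do not wrap |
| CERT C | MEM35-C: Allocate sufficient memory for an object |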

Conclusion

The membuf_grow heap buffer overflow is a crisp illustration of why arithmetic on sizes in C demands explicit overflow checking. A single missing guard — if (size > SIZE_MAX - buf->asize) — was the difference between safe memory management and a critical, potentially exploitable heap corruption vulnerability.

Key takeaways:

✅ Always validate size arithmetic before calling malloc, realloc, or memcpy.
✅ Use compiler sanitizers (-fsanitize=address,undefined) during development and testing.
✅ Fuzz your parsers — document parsers are a prime target for crafted-input attacks.
✅ Integrate static analysis into your CI pipeline to catch these issues before they ship.
✅ Follow CERT C and CWE guidance for integer and memory safety in C code.

The fix here was small — just a few lines — but its impact is enormous. This is the nature of memory safety bugs in C: the vulnerability is tiny, the consequences are not.

If your project processes untrusted Markdown, HTML, or any structured document format in C or C++, take this as an opportunity to audit your buffer growth and size calculation logic. The next crafted document in your input stream might be looking for exactly this kind of mistake.

This vulnerability was identified and patched by OrbisAI Security. Automated security scanning and remediation helps teams find and fix issues like this before they reach production.
