Severity: critical · 7 min read

Heap Overflow in TOML Parser: How Integer Overflow Leads to Memory Corruption

A critical heap buffer overflow vulnerability was discovered and patched in the centitoml TOML parser, where missing integer overflow validation on a `MALLOC(len+1)` call could allow an attacker to trigger memory corruption via a crafted TOML configuration file. The vulnerability (CWE-190) is reachable through community-distributed mod or map files that the game loads from its `config/` directory, making it a realistic attack vector for remote code execution. A targeted one-line guard now preven

By orbisai0security
May 28, 2026

Introduction

Configuration file parsers are everywhere — they quietly read your settings, load your preferences, and initialize your application. Because they sit at the boundary between untrusted data and trusted program logic, they are a historically rich target for attackers. When that parser is written in C and handles attacker-controlled input without careful bounds checking, the consequences can be severe.

This post breaks down a critical heap buffer overflow, rooted in an integer overflow (CWE-190), that was discovered and patched in deps/centitoml/toml_api.c, a C-based TOML parser embedded in a game engine. The root cause is a classic integer overflow that produces an undersized heap allocation, followed by a memcpy that writes beyond the buffer's end. We'll walk through exactly how it works, how an attacker could exploit it, and what the fix does to close the door.


The Vulnerability Explained

What Is an Integer Overflow Leading to Heap Overflow?

Integer overflow bugs in C are deceptively simple: arithmetic on an integer type wraps around when it exceeds the type's maximum value. In the context of memory allocation, this means a calculation meant to produce a large buffer size can silently produce a tiny one — or even zero. The allocator happily returns a valid pointer to that undersized buffer, and any subsequent write that assumes the buffer is full-sized corrupts adjacent heap memory.

The Vulnerable Code

Here is the relevant function before the fix:

// deps/centitoml/toml_api.c (BEFORE)
static char *STRNDUP(const char *s, size_t n)
{
    size_t len = strnlen(s, n);
    char *p = MALLOC(len + 1);   // ← integer overflow possible here
    if (p)
    {
        memcpy(p, s, len);       // ← writes `len` bytes into undersized buffer
        p[len] = '\0';
    }
    return p;
}

Let's trace the problem step by step:

  1. len comes from attacker-controlled TOML data. The n parameter passed to strnlen is derived from values parsed out of a TOML file — a file that could be a community-distributed mod or map.

  2. len + 1 can overflow. size_t is an unsigned type. On a 64-bit system, SIZE_MAX is 0xFFFFFFFFFFFFFFFF. If len equals SIZE_MAX (i.e., (size_t)-1), then len + 1 wraps to 0.

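  3. MALLOC(0) may return a valid but unusable pointer. The C standard leaves malloc(0) implementation-defined: it returns either NULL or a unique non-NULL pointer that must not be dereferenced. Common allocators such as glibc return a minimum-sized chunk, so the if (p) check passes even though the buffer has no usable bytes.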

  4. memcpy(p, s, len) writes len bytes anyway. With len equal to SIZE_MAX, this attempts to copy an astronomically large number of bytes starting from the tiny allocation — smashing everything on the heap after it.
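
The size mismatch in these four steps can be traced without touching any memory. The sketch below is illustrative only: `trace_strndup_sizes` and `copy_exceeds_allocation` are hypothetical helpers, not part of centitoml. They compute the two sizes the unpatched function would hand to MALLOC and memcpy for a given len.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of centitoml: records the sizes the
 * unpatched STRNDUP would use, without allocating or copying anything. */
struct dup_sizes {
    size_t alloc_request; /* argument that would reach MALLOC */
    size_t copy_length;   /* byte count that would reach memcpy */
};

static struct dup_sizes trace_strndup_sizes(size_t len)
{
    struct dup_sizes sizes;
    sizes.alloc_request = len + 1; /* unsigned arithmetic wraps modulo SIZE_MAX + 1 */
    sizes.copy_length = len;
    return sizes;
}

/* Returns 1 when the copy would be larger than the allocation request. */
static int copy_exceeds_allocation(size_t len)
{
    struct dup_sizes sizes = trace_strndup_sizes(len);
    return sizes.copy_length > sizes.alloc_request;
}
```

For every len below SIZE_MAX the request is one byte larger than the copy. Only at len == SIZE_MAX does the request collapse to 0 while the copy length stays at its maximum.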

Real-World Attack Scenario

The game loads TOML configuration files from its config/ directory. Community channels distribute mod and map files, which may include or reference TOML configs. An attacker crafts a malicious TOML file whose parsed length values drive strnlen to return (size_t)-1, which requires the n argument passed to STRNDUP to reach (size_t)-1 as well. When the game loads this file:

  • The parser calls STRNDUP with the oversized string.
  • MALLOC(0) returns a small allocation.
  • memcpy overflows the heap, corrupting allocator metadata or adjacent objects.
  • The attacker can potentially achieve arbitrary code execution by carefully controlling heap layout — a well-known technique in heap exploitation.

Because this is triggered simply by opening a file, the attack requires no authentication, no network access, and no special privileges. A player downloading a popular-looking mod could silently execute attacker code.

CWE Reference: CWE-190: Integer Overflow or Wraparound


The Fix

What Changed

The patch adds a single guard clause that checks for the sentinel overflow value before any allocation occurs:

// deps/centitoml/toml_api.c (AFTER)
static char *STRNDUP(const char *s, size_t n)
{
    size_t len = strnlen(s, n);
+   if (len == (size_t)-1)       // ← guard: overflow sentinel check
+       return NULL;
    char *p = MALLOC(len + 1);
    if (p)
    {
        memcpy(p, s, len);
        p[len] = '\0';
    }
    return p;
}

How Does It Work?

strnlen(s, n) returns at most n, so len can equal (size_t)-1 (the maximum value of size_t) only when n is itself (size_t)-1 and no terminating NUL is found first. No legitimate TOML string has that length, so the value can safely be treated as an error sentinel.

The guard if (len == (size_t)-1) return NULL; intercepts this case before len + 1 can wrap to zero, so no allocation and no copy take place. STRNDUP returns NULL, the same value it already returns when MALLOC fails, so callers that check for allocation failure handle the rejected input without further changes.
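
To exercise the guard without constructing an impossibly long string, the sketch below uses a hypothetical stand-in, `dup_with_length`, that takes the measured length as a parameter instead of calling strnlen. It also shows one way a caller might treat the NULL return. Both function names are assumptions for illustration, and plain malloc stands in for the project's MALLOC macro.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the patched STRNDUP. It takes the measured
 * length directly so the guard can be exercised without a giant string. */
static char *dup_with_length(const char *s, size_t len)
{
    if (len == SIZE_MAX)   /* len + 1 would wrap to 0 */
        return NULL;
    char *p = malloc(len + 1);
    if (p) {
        memcpy(p, s, len);
        p[len] = '\0';
    }
    return p;
}

/* Caller-side handling: returns 0 on success, -1 if duplication failed. */
static int store_value(char **slot, const char *s, size_t len)
{
    char *copy = dup_with_length(s, len);
    if (copy == NULL)
        return -1;         /* covers both the guard and a failed malloc */
    *slot = copy;
    return 0;
}
```

Because the guard returns before any memory is read, passing SIZE_MAX as the length is safe here even though the source string is short.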

Why This Is the Right Fix

Concern                        | Before   | After
-------------------------------|----------|---------------------------
Integer overflow on len + 1    | Possible | Prevented by early return
Undersized MALLOC call         | Possible | Never reached
Heap corruption via memcpy     | Possible | Never reached
Graceful failure on bad input  | No       | Yes, returns NULL

The fix is minimal, targeted, and does not change behavior for any legitimate input. It follows the fail-fast principle: detect the anomaly at the earliest possible point and return a safe error value rather than proceeding with corrupted state.


Prevention & Best Practices

1. Always Validate Sizes Before Arithmetic

Before any malloc(n + k) call, check that n + k does not overflow:

// Safe size addition pattern
if (n > SIZE_MAX - k) {
    // overflow would occur — handle error
    return NULL;
}
char *p = malloc(n + k);

Many projects use helper macros or inline functions for this:

static inline int size_add_overflow(size_t a, size_t b, size_t *result) {
    if (a > SIZE_MAX - b) return 1; // overflow
    *result = a + b;
    return 0;
}
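
On GCC and Clang, the same check is available as a compiler builtin, which avoids hand-writing the comparison. The wrapper below is a minimal sketch: `alloc_with_extra` is a hypothetical name, and the code assumes a GCC- or Clang-compatible compiler.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: allocates n + extra bytes, or returns NULL if the
 * sum does not fit in size_t. __builtin_add_overflow is a GCC/Clang builtin
 * that stores the (possibly wrapped) sum and returns true on overflow. */
static void *alloc_with_extra(size_t n, size_t extra)
{
    size_t total;
    if (__builtin_add_overflow(n, extra, &total))
        return NULL;
    return malloc(total);
}
```

A call such as alloc_with_extra(len, 1) then fails cleanly for len == SIZE_MAX instead of requesting zero bytes.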

2. Treat All Parser Inputs as Untrusted

Even "local" files like config files can be attacker-controlled in many threat models (malicious mods, path traversal, symlink attacks). Apply the same input validation you would to network data.
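
One way to apply this at a parser boundary is to cap every parsed length well below SIZE_MAX before it reaches an allocation. The limit and the function name below are hypothetical choices for illustration; nothing in the source suggests centitoml defines such a cap.

```c
#include <stddef.h>

/* Hypothetical limit for illustration; pick one that fits the application. */
#define MAX_CONFIG_STRING_LEN ((size_t)1 << 20) /* 1 MiB */

/* Returns 1 if a parsed length is acceptable, 0 if it should be rejected
 * before any allocation or copy is attempted. */
static int config_length_ok(size_t len)
{
    return len <= MAX_CONFIG_STRING_LEN;
}
```

With a cap like this in place, len + 1 cannot wrap for any accepted length, so the overflow check becomes a second line of defense and not the only one.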

3. Use Safe String Functions

Where possible, prefer higher-level abstractions that track buffer sizes:

  • In C: use strndup() (POSIX.1-2008, also adopted in C23), which handles the null terminator internally.
  • In C++: use std::string or std::string_view.
  • In Rust: string types are bounds-checked by default.

4. Enable Compiler and Runtime Protections

Modern toolchains offer several mitigations:

# AddressSanitizer — catches heap overflows at runtime
clang -fsanitize=address -o myapp myapp.c

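# UndefinedBehaviorSanitizer: catches signed overflow and other undefined behavior.
# Unsigned wraparound such as len + 1 is well-defined in C, so Clang needs the
# separate unsigned-integer-overflow check to flag it.
clang -fsanitize=undefined,unsigned-integer-overflow -o myapp myapp.c

# Fortified libc calls and stack protector (GCC/Clang); _FORTIFY_SOURCE requires optimization
-O2 -D_FORTIFY_SOURCE=2 -fstack-protector-strong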

These don't replace correct code, but they catch bugs during development and testing.

5. Use Static Analysis Tools

Tools that can detect this class of vulnerability:

  • Coverity — commercial, excellent at integer overflow and buffer overrun detection
  • CodeQL — GitHub's semantic analysis engine, has queries for CWE-190
  • clang-tidy — catches some arithmetic overflow patterns
  • Semgrep — customizable rules for unsafe malloc/memcpy patterns

6. Fuzz Your Parsers

Configuration file parsers are ideal fuzzing targets. Tools like libFuzzer or AFL++ can generate millions of malformed TOML files and report crashes:

# Example: fuzz the TOML parser entry point
clang -fsanitize=fuzzer,address -o toml_fuzz toml_fuzz_target.c toml_api.c
./toml_fuzz corpus/

Fuzzing under AddressSanitizer is well suited to this class of bug, particularly when the harness can also drive length arguments toward boundary values such as SIZE_MAX.
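
A libFuzzer target for a text parser can be quite small. The sketch below is an assumption-heavy outline of what a file like toml_fuzz_target.c might contain: `parse_toml_text` is a placeholder stub, not centitoml's real entry point, and would be replaced by whatever parse function the project exposes.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder for the real parser entry point. The name and signature are
 * assumptions; this stub only counts '=' characters so the harness builds
 * on its own. */
static int parse_toml_text(const char *text)
{
    int assignments = 0;
    for (const char *c = text; *c != '\0'; c++)
        if (*c == '=')
            assignments++;
    return assignments;
}

/* libFuzzer entry point: called once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    if (size == SIZE_MAX)          /* keep size + 1 from wrapping */
        return 0;

    /* Copy into a NUL-terminated buffer, since text parsers usually expect one. */
    char *text = malloc(size + 1);
    if (text == NULL)
        return 0;
    if (size > 0)
        memcpy(text, data, size);
    text[size] = '\0';

    (void)parse_toml_text(text);

    free(text);
    return 0;                      /* libFuzzer expects 0 */
}
```

Built with -fsanitize=fuzzer,address, libFuzzer supplies main() and calls this function repeatedly with mutated inputs, and AddressSanitizer reports any out-of-bounds access the parser performs.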


Conclusion

This vulnerability is a textbook example of how a single missing bounds check in a low-level parser can open the door to critical heap corruption. The attack path is realistic — community mod files are a well-established vector for game engine exploits — and the impact is severe: heap corruption in C can escalate to arbitrary code execution in the hands of a skilled attacker.

The fix is elegantly simple: one if statement, two lines of code, zero behavior change for legitimate inputs. But reaching that fix requires understanding why len + 1 can overflow, what MALLOC(0) returns, and how memcpy blindly trusts its size argument.

Key takeaways for developers:

  • 🔢 Treat size arithmetic as a security boundary — always check for overflow before allocation.
  • 📂 Parser inputs are attacker-controlled — even local config files in user-modifiable directories.
  • 🛡️ Fail fast and fail safely — return NULL early rather than proceeding with invalid state.
  • 🔍 Fuzz your parsers — automated input generation finds these bugs faster than code review alone.
  • 🧰 Use sanitizers in CI: AddressSanitizer catches the out-of-bounds write, and Clang's unsigned-integer-overflow check flags the wraparound, before either reaches production.

Memory safety bugs don't disappear on their own. Systematic validation, modern tooling, and a security-first mindset are the best defenses we have — and as this fix shows, applying them is often much simpler than the damage they prevent.

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #4762
