Back to Blog
critical SEVERITY9 min read

Heap Buffer Overflow in OPDS Parser: How a Misplaced Variable Nearly Opened the Door to Remote Code Execution

A critical heap buffer overflow vulnerability was discovered in `lib/OpdsParser/OpdsParser.cpp`, where the buffer allocation size was calculated *after* a fixed chunk size was used to allocate memory, meaning the actual bytes read could exceed the allocated buffer. On embedded devices parsing untrusted OPDS catalog data from the network, this flaw could allow a remote attacker to corrupt heap memory and potentially achieve arbitrary code execution. The fix was elegantly simple: move the `toRead`

O
By orbisai0security
May 28, 2026

Heap Buffer Overflow in OPDS Parser: How a Misplaced Variable Nearly Opened the Door to Remote Code Execution

Introduction

Buffer overflows are one of the oldest classes of security vulnerabilities in software engineering — and yet they continue to appear in production code today, sometimes hiding in plain sight as a simple ordering mistake. This post examines a critical heap buffer overflow (CWE-120) discovered in an OPDS feed parser running on embedded devices, explains how it could be exploited, and walks through the clean, minimal fix that resolved it.

If you write C or C++ code that processes data from the network — especially on resource-constrained embedded devices — this is required reading.


What Is an OPDS Parser, and Why Does It Matter?

OPDS (Open Publication Distribution System) is an XML-based catalog format used by e-readers, library apps, and digital content platforms to browse and download books and other media. An OPDS parser reads XML data streamed from a remote catalog server and makes sense of it for the end user.

On an embedded device (think: a dedicated e-reader or IoT media player), this parser sits at a critical trust boundary: it receives raw bytes from the network and processes them directly. If that processing has a memory safety bug, a malicious server — or a network attacker performing a man-in-the-middle attack — can send crafted data to exploit the device.


The Vulnerability Explained

What Went Wrong

Let's look at the vulnerable code path inside the OpdsParser::write() function:

// VULNERABLE CODE (before fix)
size_t OpdsParser::write(const uint8_t* xmlData, const size_t length) {
  constexpr size_t chunkSize = 1024;

  while (remaining > 0) {
    // ❌ Buffer is allocated with the FIXED chunkSize (1024 bytes)
    void* const buf = XML_GetBuffer(parser, chunkSize);
    if (!buf) {
      errorOccured = true;
      LOG_DBG("OPDS", "Couldn't allocate memory for buffer");
      return length;
    }

    // ✅ toRead is correctly bounded to remaining bytes...
    const size_t toRead = remaining < chunkSize ? remaining : chunkSize;

    // ❌ ...but buf was already allocated based on chunkSize, not toRead!
    memcpy(buf, currentPos, toRead);
    ...
  }
}

At first glance, the toRead calculation looks correct: it takes the minimum of remaining bytes and chunkSize (1024). This is a standard and safe pattern for chunked reads.

The problem is the ordering.

XML_GetBuffer(parser, chunkSize) is called before toRead is computed. The Expat XML library's XML_GetBuffer allocates (or returns) an internal buffer of at least the requested size. However, the critical question is: what size does the buffer actually end up being in practice?

In many Expat implementations, XML_GetBuffer may return a buffer that is exactly the size requested, or it may reuse an existing internal buffer. If remaining is less than chunkSize, then toRead will be less than chunkSize — but the buffer allocation was already made with chunkSize. More dangerously, if the Expat internal state or a custom allocator returns a buffer sized to the actual toRead value (which can happen depending on the Expat version and configuration), then calling memcpy(buf, currentPos, toRead) after allocating with a potentially different size creates a mismatch.

The root issue is a logical ordering bug: the bounds calculation happens after the allocation it is supposed to bound. This is a textbook example of CWE-120 — a buffer copy without adequate size validation at the point of use.

How Could This Be Exploited?

On an embedded device parsing untrusted network data, an attacker in a position to influence the OPDS feed (e.g., a compromised catalog server, a DNS hijack, or an HTTP MITM attack on an unencrypted connection) could:

  1. Send a crafted OPDS XML payload designed to trigger the size mismatch at just the right moment.
  2. Cause heap memory corruption by overwriting adjacent heap metadata or other allocated objects.
  3. Leverage heap exploitation techniques (such as heap feng shui or use-after-free chaining) to achieve arbitrary code execution on the device.

On embedded systems, heap exploitation is often easier than on modern desktop operating systems because:
- ASLR (Address Space Layout Randomization) may be absent or weak.
- Stack canaries and heap hardening features may not be present.
- The heap layout is often predictable due to deterministic initialization sequences.

Real-World Attack Scenario

A user's e-reader is configured to sync with a public OPDS library catalog over HTTP. An attacker on the same Wi-Fi network performs a man-in-the-middle attack, intercepting the catalog response and injecting a malicious XML payload crafted to trigger the buffer overflow. The device's parser processes the malicious data, corrupting heap memory. The attacker's shellcode gains execution in the context of the parser process, potentially allowing them to exfiltrate stored credentials, install persistent firmware-level malware, or brick the device.

This is not a theoretical scenario — MITM attacks against embedded devices on local networks are well-documented and actively exploited in the wild.


The Fix

What Changed

The fix is a single-line reordering — moving the toRead calculation before the buffer allocation:

// FIXED CODE (after fix)
size_t OpdsParser::write(const uint8_t* xmlData, const size_t length) {
  constexpr size_t chunkSize = 1024;

  while (remaining > 0) {
    // ✅ Calculate the actual bytes to read FIRST
    const size_t toRead = remaining < chunkSize ? remaining : chunkSize;

    // ✅ NOW allocate the buffer using the correct, bounded size
    void* const buf = XML_GetBuffer(parser, toRead);
    if (!buf) {
      errorOccured = true;
      LOG_DBG("OPDS", "Couldn't allocate memory for buffer");
      return length;
    }

    // ✅ memcpy uses toRead, which matches the allocation size
    memcpy(buf, currentPos, toRead);
    ...
  }
}

Why This Fix Works

By computing toRead before calling XML_GetBuffer, the allocation and the copy are now perfectly synchronized:

Step Before Fix After Fix
Buffer allocation size Always chunkSize (1024) min(remaining, chunkSize)
Bytes copied by memcpy min(remaining, chunkSize) min(remaining, chunkSize)
Mismatch possible? Yes No

The fix ensures that the buffer is always at least as large as the number of bytes being copied into it. The memcpy can never write beyond the bounds of the allocated buffer.

The Diff at a Glance

-    void* const buf = XML_GetBuffer(parser, chunkSize);
+    const size_t toRead = remaining < chunkSize ? remaining : chunkSize;
+    void* const buf = XML_GetBuffer(parser, toRead);
     if (!buf) {
       errorOccured = true;
       ...
     }

-    const size_t toRead = remaining < chunkSize ? remaining : chunkSize;
     memcpy(buf, currentPos, toRead);

Two lines moved, one line removed. This is a perfect example of how critical vulnerabilities can hide in seemingly minor code ordering issues.


Prevention & Best Practices

1. Always Compute Bounds Before Allocations

A golden rule in C/C++: calculate the size you need before you allocate or copy. Never allocate with one size and copy with another unless you have explicitly verified the relationship between them.

// ❌ Dangerous pattern
void* buf = allocate(MAX_SIZE);
size_t actual = compute_actual_size();
memcpy(buf, src, actual); // What if actual > MAX_SIZE in some path?

// ✅ Safe pattern
size_t actual = compute_actual_size();
void* buf = allocate(actual);
memcpy(buf, src, actual); // Guaranteed to fit

2. Prefer Safe Alternatives to memcpy

Where possible, use bounds-checked alternatives:
- C11: memcpy_s() (requires explicit destination size)
- C++: std::copy with iterators and range checks
- Rust: Rust's ownership and borrowing system makes this class of bug impossible by default

// Using memcpy_s (C11/Windows)
errno_t result = memcpy_s(buf, bufSize, src, toRead);
if (result != 0) { /* handle error */ }

3. Use Static Analysis Tools

Several tools can catch this class of bug automatically:

Tool Type Notes
Clang Static Analyzer Static Free, integrates with CMake
Coverity Static Industry standard, free for open source
AddressSanitizer (ASan) Dynamic Catches overflows at runtime during testing
Valgrind Dynamic Heap error detection
CodeQL Semantic GitHub-integrated, excellent for C/C++

Add ASan to your debug/test builds with a single compiler flag:

# GCC/Clang
-fsanitize=address -fno-omit-frame-pointer

4. Fuzz Network Input Parsers

Any parser that processes untrusted network data should be fuzz tested. Tools like AFL++, libFuzzer, and Honggfuzz are excellent choices:

# Example: fuzzing with libFuzzer
clang++ -fsanitize=fuzzer,address -o opds_fuzzer opds_fuzz_target.cpp
./opds_fuzzer corpus/

Fuzzing would likely have caught this vulnerability by generating inputs that stress the boundary conditions of the chunked read loop.

5. Apply Defense in Depth for Embedded Devices

For embedded systems specifically:
- Enable stack canaries (-fstack-protector-strong)
- Enable ASLR if the platform supports it
- Use read-only memory segments for constants
- Consider memory-safe languages (Rust, Ada) for security-critical parsing code
- Enforce TLS/HTTPS for all network catalog connections to prevent MITM injection

6. Reference Security Standards

This vulnerability maps to well-known security standards:

  • CWE-120: Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')
  • CWE-122: Heap-based Buffer Overflow
  • OWASP: A03:2021 – Injection (network data injection leading to memory corruption)
  • CERT C: Rule ARR38-C — Guarantee that library functions do not form invalid pointers
  • MISRA C:2012: Rule 21.15 — The pointer arguments to memcpy, memmove, and memcmp shall be pointers to qualified or unqualified versions of compatible types

Conclusion

This vulnerability is a powerful reminder that security bugs don't always look dramatic. Two lines of code in the wrong order — a bounds calculation appearing one step too late — created a critical heap buffer overflow in a network-facing parser on embedded devices. The potential impact ranged from device crashes to full remote code execution.

The fix was equally undramatic: move the calculation before the allocation. But finding it required careful analysis of the code's logic, not just its syntax.

Key Takeaways

  • Order matters in memory management: Always compute sizes before allocations and copies.
  • Network parsers are high-value targets: Any code that processes untrusted external data deserves extra scrutiny.
  • Embedded devices need security love too: The absence of modern OS mitigations makes memory safety bugs on embedded platforms especially dangerous.
  • Static analysis and fuzzing catch what code review misses: Automate your security checks.
  • Simple fixes can close critical vulnerabilities: Don't underestimate the impact of a one-line change.

Secure coding is a discipline, not a checklist. Build it into your development process from the start — your users (and their devices) will thank you.


This vulnerability was identified and fixed by automated security analysis. The fix was verified by build validation, scanner re-scan, and LLM-assisted code review.

References: CWE-120 | CWE-122 | Expat XML Parser | OWASP Top 10

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #2093

Related Articles

critical

Heap Buffer Overflow in Audio Ring Buffer: How a Missing Bounds Check Could Crash Your App

A critical heap buffer overflow vulnerability was discovered in `audio_backend.c`, where the audio ring buffer's `memcpy` operations lacked bounds validation before writing PCM data. Without checking that incoming data sizes fell within the allocated buffer's capacity, a maliciously crafted audio file could corrupt adjacent heap memory, potentially enabling arbitrary code execution. The fix adds a concise pre-flight validation guard that rejects out-of-range write requests before any memory oper

critical

Critical Heap Buffer Overflow in SSDP Control Point: How Unbounded String Operations Put Networks at Risk

A critical heap buffer overflow vulnerability was discovered and patched in the SSDP control point implementation (`ssdp_ctrlpt.c`), where multiple unbounded `strcpy` and `strcat` operations constructed HTTP request buffers without any length validation. Network-received SSDP response fields — including service type strings and location URLs — could be crafted by an attacker to exceed buffer boundaries, potentially enabling arbitrary code execution or denial of service. The fix replaces the unsa

critical

Heap Buffer Overflow in BLE MIDI: How a Missing Bounds Check Opens the Door to Remote Exploitation

A critical heap buffer overflow vulnerability was discovered in the BLE MIDI packet assembly code of `blemidi.c`, where attacker-controlled packet length values could trigger writes beyond allocated heap memory. The fix adds an integer overflow guard before the `malloc` call, ensuring that maliciously crafted BLE MIDI packets can no longer corrupt heap memory. This vulnerability is particularly dangerous because it is remotely exploitable by any nearby Bluetooth device — no physical access requi

critical

Heap Overflow in TOML Parser: How Integer Overflow Leads to Memory Corruption

A critical heap buffer overflow vulnerability was discovered and patched in the centitoml TOML parser, where missing integer overflow validation on a `MALLOC(len+1)` call could allow an attacker to trigger memory corruption via a crafted TOML configuration file. The vulnerability (CWE-190) is reachable through community-distributed mod or map files that the game loads from its `config/` directory, making it a realistic attack vector for remote code execution. A targeted one-line guard now preven

critical

Heap Corruption via Unchecked memcpy: How Integer Overflow Bugs Corrupt Memory in Windows File Operations

A critical buffer overflow vulnerability was discovered in `phlib/nativefile.c`, where multiple `memcpy` calls copied filename and extended-attribute data into fixed-size structures without verifying that source lengths didn't exceed destination buffer boundaries. An attacker supplying an oversized filename or EA name could corrupt adjacent heap memory, potentially enabling arbitrary code execution. The fix replaces unchecked arithmetic with Windows' safe integer helpers (`RtlULongAdd`, `RtlULon

critical

Critical Memory Safety Bug: Free of Uninitialized Memory in Rust Telemetry (CVE-2021-29937)

CVE-2021-29937 is a critical memory safety vulnerability in the Rust `telemetry` crate (versions prior to 0.1.3) that allows freeing uninitialized memory, leading to undefined behavior, potential crashes, and possible code execution. The fix involves upgrading the crate from version 0.1.0 to 0.1.3, which patches the unsafe memory handling at the root cause. Despite Rust's reputation for memory safety, this vulnerability demonstrates that `unsafe` code blocks can still introduce serious bugs that