Severity: high · 8 min read

How heap buffer overflow happens in C++ JMA archive extraction and how to fix it

A heap buffer overflow vulnerability in `jma/jma.cpp` allowed a crafted JMA ROM archive to trigger out-of-bounds memory writes during file extraction. The flaw existed at line 446, where `memcpy` was called with `first_chunk_offset` and `copy_amount` values derived directly from archive header metadata without any validation that those values stayed within the bounds of either the source or destination buffer. The fix adds a pre-copy bounds check that rejects malformed archives before the danger

By Orbis AppSec
Published June 7, 2026 · Reviewed June 7, 2026

Answer Summary

This is a heap buffer overflow vulnerability (CWE-120) in C++ code inside `jma/jma.cpp`, specifically in the `jma_open::extract_file()` function at line 446. The root cause is that `memcpy` was called with `first_chunk_offset` and `copy_amount` values read directly from a JMA archive header with no validation that they stayed within the bounds of the source (`decomp_buffer`) or destination (`buffer`) buffers. The fix adds an explicit bounds check before the `memcpy` call: if `first_chunk_offset + copy_amount > chunk_size` or `i + copy_amount > our_file_size`, the function throws a `JMA_BAD_FILE` error and cleans up allocated memory. This prevents attackers from using crafted JMA archives to corrupt heap memory.

Vulnerability at a Glance

CWE: CWE-120
Fix: Added a pre-copy bounds check that throws `JMA_BAD_FILE` if computed offsets exceed either buffer's declared size
Risk: Attacker-controlled archive metadata can corrupt heap memory, potentially enabling remote code execution or denial of service
Language: C++
Root cause: `memcpy` at jma.cpp:446 used header-derived offsets and lengths without validating they fit within the source or destination buffer bounds
Vulnerability: Heap Buffer Overflow via Unvalidated memcpy Parameters


Introduction

The jma/jma.cpp file is responsible for parsing and extracting files from JMA-format ROM archives — a compressed archive format used in emulation tooling. Inside jma_open::extract_file(), the code decompresses chunk data and copies it into an output buffer. It's the kind of low-level file parsing code that handles attacker-controlled input by design, which makes it a high-value target.

A flaw at line 446 of jma.cpp meant that every call to extract_file() with a malicious archive was a potential heap corruption event. The memcpy call trusted two values — first_chunk_offset and copy_amount — that were derived from the archive's own header metadata, without ever checking whether those values actually fit inside the buffers they were indexing into.

This is a textbook example of CWE-120: Buffer Copy without Checking Size of Input, and it's particularly dangerous because JMA archives are user-supplied files: any user who can get the application to open an archive controls the values that feed this memcpy.


The Vulnerability Explained

Here is the vulnerable code as it existed before the fix, centered on line 446:

// jma/jma.cpp — BEFORE FIX (vulnerable)
size_t copy_amount = our_file_size - i > chunk_size - first_chunk_offset
    ? chunk_size - first_chunk_offset
    : our_file_size - i;

// ⚠️ No bounds check — first_chunk_offset and copy_amount come from archive header
memcpy(buffer + i, decomp_buffer + first_chunk_offset, copy_amount);
first_chunk_offset = 0;
i += copy_amount;

At first glance, the copy_amount calculation looks like it's doing something safe — it takes the minimum of two values. But the problem is what those values represent and where they come from.

  • chunk_size is the declared size of the decompressed chunk, read from the archive header.
  • first_chunk_offset is a byte offset into decomp_buffer, also from the archive header.
  • our_file_size is the declared output file size, again from the archive.
  • i is the running write position into buffer.

None of these values are validated against the actual allocated sizes of decomp_buffer or buffer. The copy_amount ternary only ensures the value is consistent with the archive's own declared sizes — but a crafted archive can declare whatever sizes it wants.
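To make that concrete, here is a minimal sketch that lifts the copy_amount expression into a standalone function and compares its result with a real allocation size. The helper names, the actual_capacity parameter, and all the numbers are assumptions for illustration. The listing only shows that copy_amount is a size_t, so treating every operand as size_t is also an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Same expression as the copy_amount ternary in the listing above,
// lifted into a hypothetical helper so it can be evaluated in isolation.
std::size_t declared_copy_amount(std::size_t our_file_size, std::size_t i,
                                 std::size_t chunk_size,
                                 std::size_t first_chunk_offset) {
    return our_file_size - i > chunk_size - first_chunk_offset
        ? chunk_size - first_chunk_offset
        : our_file_size - i;
}

// Hypothetical check: does a write of `amount` bytes at position `i`
// stay inside a buffer that really holds `actual_capacity` bytes?
bool write_fits(std::size_t i, std::size_t amount, std::size_t actual_capacity) {
    return i <= actual_capacity && amount <= actual_capacity - i;
}

// Worked numbers (all illustrative).
void demo_declared_vs_actual() {
    // The archive declares a 4096-byte file made of 1024-byte chunks...
    std::size_t amount = declared_copy_amount(4096, 0, 1024, 0);
    assert(amount == 1024);
    // ...but suppose the output buffer really holds only 512 bytes.
    assert(!write_fits(0, amount, 512));
    // An offset beyond chunk_size wraps the unsigned subtraction, so the
    // "minimum" comes out as the full 4096 bytes for a 1024-byte chunk.
    assert(declared_copy_amount(4096, 0, 1024, 2048) == 4096);
}
```

In both cases the ternary returns a value that is consistent with the declared sizes, and in both cases that value is larger than what the memory can hold.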

The Attack Scenario

An attacker crafts a malicious JMA archive so that the copy proceeds as follows:

  1. first_chunk_offset is set to a value near the end of decomp_buffer, say chunk_size - 4.
  2. copy_amount is then computed as chunk_size - first_chunk_offset, which evaluates to 4.
  3. our_file_size is declared as much larger than the actual buffer allocation.

Because i accumulates across iterations and the loop doesn't stop when i approaches the real buffer boundary, the write target buffer + i can march past the end of the heap allocation. On the read side, a first_chunk_offset larger than the real size of decomp_buffer makes decomp_buffer + first_chunk_offset point past the end of the source allocation.

The result: attacker-controlled bytes written past the end of a heap allocation. Depending on what lives adjacent on the heap (vtable pointers, allocator metadata, other objects), this can lead to:

  • Heap metadata corruption → crash or allocator exploitation
  • Object pointer overwrite → potential code execution
  • Silent data corruption → logic vulnerabilities downstream

This is not theoretical. The scanner confirmed this as exploitable with a concrete exploitation scenario, and the vulnerability affects production code that processes real user-supplied archive files.


The Fix

The fix is surgical and correct. A bounds check was inserted immediately before the memcpy call, validating both the source and destination sides of the copy:

// jma/jma.cpp — AFTER FIX (safe)
size_t copy_amount = our_file_size - i > chunk_size - first_chunk_offset
    ? chunk_size - first_chunk_offset
    : our_file_size - i;

// ✅ Bounds check added — validates BOTH source and destination
if (first_chunk_offset + copy_amount > chunk_size || i + copy_amount > our_file_size) {
    delete[] comp_buffer;
    throw(JMA_BAD_FILE);
}

memcpy(buffer + i, decomp_buffer + first_chunk_offset, copy_amount);
first_chunk_offset = 0;
i += copy_amount;

Why Both Checks Are Necessary

The check has two independent conditions, and both matter:

first_chunk_offset + copy_amount > chunk_size
This guards the read side (decomp_buffer). It ensures the source read doesn't go past the end of the decompressed chunk buffer. Without this, a crafted offset could cause reads from unallocated heap memory.

i + copy_amount > our_file_size
This guards the write side (buffer). It ensures the destination write doesn't go past the end of the output buffer. Without this, accumulated writes across loop iterations could overflow the output allocation.
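The two conditions can be sketched as one reusable helper. The names range_fits and checked_copy are hypothetical and do not come from jma.cpp. The sketch throws std::out_of_range rather than JMA_BAD_FILE, because the definition of JMA_BAD_FILE is not shown here. It also uses a subtraction-based comparison instead of the addition in the fix, so that very large offsets cannot wrap around; that is a design choice of this sketch, not of the pull request.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>

// Hypothetical helper: true when [offset, offset + length) lies inside a
// buffer of `capacity` bytes. Written with subtraction so that a huge
// offset or length cannot wrap the addition around to a small number.
bool range_fits(std::size_t offset, std::size_t length, std::size_t capacity) {
    return offset <= capacity && length <= capacity - offset;
}

// Hypothetical wrapper around memcpy that validates the read side and the
// write side independently before copying anything.
void checked_copy(unsigned char* dst, std::size_t dst_capacity, std::size_t dst_offset,
                  const unsigned char* src, std::size_t src_capacity, std::size_t src_offset,
                  std::size_t length) {
    if (!range_fits(src_offset, length, src_capacity))
        throw std::out_of_range("copy would read past the source buffer");
    if (!range_fits(dst_offset, length, dst_capacity))
        throw std::out_of_range("copy would write past the destination buffer");
    // Both ranges were validated above; this restates the invariant for debug builds.
    assert(src_offset + length <= src_capacity && dst_offset + length <= dst_capacity);
    std::memcpy(dst + dst_offset, src + src_offset, length);
}
```

Passing the real allocation sizes as src_capacity and dst_capacity, rather than sizes declared by the input, is what gives the check its value.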

Cleanup Before Throw

Notice that the fix also calls delete[] comp_buffer before throwing. This is important — comp_buffer was heap-allocated earlier in the function, and throwing without freeing it would cause a memory leak. The fix correctly handles this cleanup, maintaining both safety and resource hygiene.
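An alternative to manual cleanup, shown here only as a sketch and not as what the pull request does, is to let a smart pointer own the allocation so that every exit path frees it. The function copy_from_chunk and the buffer name chunk_data are hypothetical stand-ins, and the error type is std::runtime_error rather than JMA_BAD_FILE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for one extraction step. The parameter names mirror
// the article, but the body is illustrative and is not the code in jma.cpp.
std::vector<unsigned char> copy_from_chunk(std::size_t chunk_size,
                                           std::size_t first_chunk_offset,
                                           std::size_t copy_amount) {
    // unique_ptr<T[]> runs delete[] when the function exits, whether by
    // return or by exception, so the error path needs no manual cleanup.
    std::unique_ptr<unsigned char[]> chunk_data(new unsigned char[chunk_size]);
    std::memset(chunk_data.get(), 0x5A, chunk_size);  // placeholder chunk contents

    if (first_chunk_offset > chunk_size || copy_amount > chunk_size - first_chunk_offset)
        throw std::runtime_error("bad archive");  // chunk_data is freed during unwinding

    assert(first_chunk_offset + copy_amount <= chunk_size);  // debug-only restatement
    const unsigned char* first = chunk_data.get() + first_chunk_offset;
    return std::vector<unsigned char>(first, first + copy_amount);
}
```

With ownership expressed this way, adding another early exit later cannot reintroduce a leak.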

Before/After Comparison

| Aspect | Before | After |
| --- | --- | --- |
| Source bounds checked | ❌ No | first_chunk_offset + copy_amount <= chunk_size |
| Destination bounds checked | ❌ No | i + copy_amount <= our_file_size |
| Malformed archive handling | Silent overflow | JMA_BAD_FILE exception |
| Memory cleanup on error | N/A | delete[] comp_buffer before throw |

Related Lines to Review

The PR notes that similar memcpy patterns exist at lines 451 and 467 in the same file. These should be audited with the same lens — any memcpy whose parameters flow from archive-controlled data needs the same treatment.


Prevention & Best Practices

1. Treat All File-Derived Values as Untrusted

Any value read from a file, network packet, or serialized format is attacker-controlled. This is especially true for archive parsers, image decoders, and protocol implementations. Validate every offset and length before use.
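As a small illustration of validating a length at the point where it is parsed, the sketch below reads a length-prefixed record and compares the declared length with the bytes that are really present. The record layout (a 4-byte big-endian length followed by the payload) is invented for this example and is not the JMA format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical record layout for illustration only (not the JMA format):
// a 4-byte big-endian payload length followed by that many payload bytes.
std::vector<std::uint8_t> read_record(const std::vector<std::uint8_t>& file) {
    const std::size_t header_size = 4;
    if (file.size() < header_size)
        throw std::runtime_error("truncated header");

    // Assemble the declared length from untrusted bytes.
    const std::uint32_t declared =
        (std::uint32_t(file[0]) << 24) | (std::uint32_t(file[1]) << 16) |
        (std::uint32_t(file[2]) << 8)  |  std::uint32_t(file[3]);

    // Compare the declared length with the bytes that are really present.
    if (declared > file.size() - header_size)
        throw std::runtime_error("declared length exceeds file contents");

    assert(header_size + declared <= file.size());  // debug-only restatement
    const auto first = file.begin() + static_cast<std::ptrdiff_t>(header_size);
    return std::vector<std::uint8_t>(first, first + static_cast<std::ptrdiff_t>(declared));
}
```

The declared length is never used as an index or a copy size until it has been compared against a size the program measured itself.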

2. The Golden Rule for memcpy Safety

Before every memcpy(dst, src, n) call, verify:

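// Always check both sides, using runtime checks (assert() is compiled out under NDEBUG)
if (src_offset > src_buffer_size || n > src_buffer_size - src_offset) throw std::out_of_range("source overread");
if (dst_offset > dst_buffer_size || n > dst_buffer_size - dst_offset) throw std::out_of_range("destination overflow");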

3. Use AddressSanitizer During Development

# Compile with AddressSanitizer to catch buffer overflows at runtime
g++ -fsanitize=address -fno-omit-frame-pointer -g jma.cpp -o jma_asan

AddressSanitizer would have caught this exact overflow the first time a malformed archive was processed in a test environment.

4. Fuzz Your Archive Parsers

Archive parsing code is an ideal target for fuzzing. Tools like libFuzzer or AFL++ can generate thousands of malformed archives per second and will quickly find cases where header values cause crashes:

# Example: fuzz the JMA extraction path
clang++ -fsanitize=fuzzer,address jma_fuzz_target.cpp jma.cpp -o jma_fuzzer
./jma_fuzzer corpus/
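A fuzz target for the command above could look roughly like the following. This is a sketch under assumptions: the JMA extraction API is not shown in this article, so parse_untrusted is a hypothetical stand-in with a made-up one-byte length field. A real jma_fuzz_target.cpp would call the actual extraction code and catch whatever type that code throws. LLVMFuzzerTestOneInput is the entry point libFuzzer looks for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for the code under test. A real target would hand
// the bytes to the archive parser instead.
void parse_untrusted(const std::uint8_t* data, std::size_t size) {
    if (size < 1)
        throw std::runtime_error("empty input");
    const std::size_t declared = data[0];  // one-byte length field (illustrative)
    if (declared > size - 1)
        throw std::runtime_error("length exceeds input");
    assert(1 + declared <= size);  // debug-only restatement of the check above
    // A real target would go on to decompress and copy `declared` bytes here.
}

// libFuzzer calls this function once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data, std::size_t size) {
    try {
        parse_untrusted(data, size);
    } catch (const std::exception&) {
        // Rejecting malformed input is expected behaviour, not a finding.
    }
    return 0;  // libFuzzer expects 0 from the target
}
```

Crashes and sanitizer reports, not exceptions, are what the fuzzer records as findings, which is why a clean rejection is swallowed here.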

5. Consider Bounds-Checked Abstractions

In C++20 and later, prefer std::span for buffer views. It keeps the pointer and the size together, which makes bounds checks easy to write and hard to forget:

#include <cstddef>
#include <cstdint>
#include <span>
#include <stdexcept>
// Instead of raw pointer + separate size variable,
// use std::span, which keeps size and pointer together
void extract_chunk(std::span<std::uint8_t> dest, std::span<const std::uint8_t> src, std::size_t offset) {
    if (offset >= src.size()) throw std::out_of_range("bad offset");
    // subspan() does not validate its arguments; the check above is what makes this safe
    auto src_view = src.subspan(offset);
    // ...
}



Key Takeaways

  • Archive header values are attacker-controlled input. first_chunk_offset, chunk_size, and our_file_size all come from the JMA file itself — never assume they are safe to use directly as memcpy parameters.
  • The ternary minimum trick is not a bounds check. The copy_amount calculation in extract_file() looked protective but only enforced internal consistency of the archive's own declared values, not their validity against actual buffer allocations.
  • Both source and destination must be validated independently. Checking only the write side (i + copy_amount) would have left the read side (decomp_buffer + first_chunk_offset) vulnerable to out-of-bounds reads, and vice versa.
  • Lines 451 and 467 in the same file use similar patterns and should be reviewed with the same scrutiny — a single-function audit is not enough when the same pattern recurs.
  • AddressSanitizer + fuzzing would have caught this before production. For any code that parses binary formats, these tools should be part of the standard development workflow.

How Orbis AppSec Detected This

  • Source: JMA archive file header metadata — specifically the first_chunk_offset, chunk_size, and our_file_size fields parsed from the archive by jma_open::extract_file()
  • Sink: memcpy(buffer + i, decomp_buffer + first_chunk_offset, copy_amount) at jma/jma.cpp:446
  • Missing control: No validation that first_chunk_offset + copy_amount <= chunk_size (source bounds) or i + copy_amount <= our_file_size (destination bounds) before the copy
  • CWE: CWE-120 — Buffer Copy without Checking Size of Input
  • Fix: Added a pre-copy bounds check at line 446 that throws JMA_BAD_FILE and frees comp_buffer if either the source or destination bounds would be exceeded

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix.


Conclusion

The heap buffer overflow in jma_open::extract_file() is a clear example of why archive parsers require a fundamentally adversarial mindset: every byte in the input file is a potential attack vector. The vulnerability wasn't caused by a complex logic error; it was a straightforward missing validation before a memcpy. The fix is equally straightforward: a few lines of bounds checking that turn a silent heap corruption into a clean, recoverable error.

For developers working on any binary format parser in C or C++, this case reinforces a non-negotiable rule: validate every offset and length derived from external data before using it in a memory operation, on both the source and destination sides, every time.



Frequently Asked Questions

What is a heap buffer overflow?

A heap buffer overflow occurs when a program writes data beyond the end of a heap-allocated buffer. In C/C++, functions like `memcpy` will blindly copy as many bytes as told, so if the length or offset comes from untrusted input without validation, an attacker can overwrite adjacent heap memory — potentially corrupting data structures or enabling code execution.

How do you prevent heap buffer overflows in C++ archive parsing?

Always validate that offset + length <= buffer_size before any `memcpy`, `memmove`, or similar operation. Treat every value read from a file or network as untrusted. Use bounds-aware abstractions like `std::span` where possible, and run your tests and fuzzers under AddressSanitizer so that out-of-bounds accesses are reported as soon as they happen.

What CWE is heap buffer overflow?

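The most specific classification for a heap buffer overflow is CWE-122 ("Heap-based Buffer Overflow"). This finding was filed under CWE-120 ("Buffer Copy without Checking Size of Input"), which describes the root cause: a copy whose size is never checked against the buffer. The two entries describe different aspects of the same bug (where the overflow lands versus why it happens), and neither is the parent of the other.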

Is checking the copy_amount alone enough to prevent this overflow?

No. This vulnerability demonstrates why you must check both the source and destination sides. The fix validates that `first_chunk_offset + copy_amount <= chunk_size` (source side) AND `i + copy_amount <= our_file_size` (destination side). Checking only one side leaves the other open to overflow.

Can static analysis detect heap buffer overflows like this one?

Yes. Static analysis tools like Semgrep, CodeQL, and specialized C/C++ analyzers can flag `memcpy` calls where length parameters are derived from external data without visible bounds checks. The Orbis AppSec scanner detected this exact pattern in `jma.cpp` using multi-agent AI analysis.

View the Security Fix

Pull request #51 contains the fix for this vulnerability.
