What is a heap buffer overflow?

A heap buffer overflow occurs when a program writes data beyond the end of a heap-allocated buffer. In C/C++, functions like `memcpy` will blindly copy as many bytes as told, so if the length or offset comes from untrusted input without validation, an attacker can overwrite adjacent heap memory — potentially corrupting data structures or enabling code execution.

How do you prevent heap buffer overflows in C++ archive parsing?

Always validate that offset + length <= buffer_size before any `memcpy`, `memmove`, or similar operation. Treat every value read from a file or network as untrusted. Use bounds-checked abstractions like `std::span` where possible, and enable compiler mitigations like AddressSanitizer during testing.

What CWE is heap buffer overflow?

Heap buffer overflows are classified under CWE-120 ("Buffer Copy without Checking Size of Input"), and more specifically CWE-122 ("Heap-based Buffer Overflow"). CWE-120 is the parent category covering all buffer copy operations that lack input size validation.

Is checking the copy_amount alone enough to prevent this overflow?

No. This vulnerability demonstrates why you must check both the source and destination sides. The fix validates that `first_chunk_offset + copy_amount <= chunk_size` (source side) AND `i + copy_amount <= our_file_size` (destination side). Checking only one side leaves the other open to overflow.

Can static analysis detect heap buffer overflows like this one?

Yes. Static analysis tools like Semgrep, CodeQL, and specialized C/C++ analyzers can flag `memcpy` calls where length parameters are derived from external data without visible bounds checks. The Orbis AppSec scanner detected this exact pattern in `jma.cpp` using multi-agent AI analysis.

How heap buffer overflow happens in C JMA archive extraction and how to fix it

Q: What CWE is heap buffer overflow?

Heap buffer overflows are classified under CWE-120 ("Buffer Copy without Checking Size of Input"), and more specifically CWE-122 ("Heap-based Buffer Overflow"). CWE-120 is the parent category covering all buffer copy operations that lack input size validation.

Q: Is checking the copy_amount alone enough to prevent this overflow?

No. This vulnerability demonstrates why you must check both the source and destination sides. The fix validates that `first_chunk_offset + copy_amount <= chunk_size` (source side) AND `i + copy_amount <= our_file_size` (destination side). Checking only one side leaves the other open to overflow.

Q: Can static analysis detect heap buffer overflows like this one?

Yes. Static analysis tools like Semgrep, CodeQL, and specialized C/C++ analyzers can flag `memcpy` calls where length parameters are derived from external data without visible bounds checks. The Orbis AppSec scanner detected this exact pattern in `jma.cpp` using multi-agent AI analysis.

Introduction

The jma/jma.cpp file is responsible for parsing and extracting files from JMA-format ROM archives — a compressed archive format used in emulation tooling. Inside jma_open::extract_file(), the code decompresses chunk data and copies it into an output buffer. It's the kind of low-level file parsing code that handles attacker-controlled input by design, which makes it a high-value target.

A flaw at line 446 of jma.cpp meant that every call to extract_file() with a malicious archive was a potential heap corruption event. The memcpy call trusted two values — first_chunk_offset and copy_amount — that were derived from the archive's own header metadata, without ever checking whether those values actually fit inside the buffers they were indexing into.

This is a textbook example of CWE-120: Buffer Copy without Checking Size of Input, and it's particularly dangerous because JMA archives are user-supplied files: any user who can get the application to open an archive controls the values that feed this memcpy.

The Vulnerability Explained

Here is the vulnerable code as it existed before the fix, centered on line 446:

// jma/jma.cpp — BEFORE FIX (vulnerable)
size_t copy_amount = our_file_size - i > chunk_size - first_chunk_offset
    ? chunk_size - first_chunk_offset
    : our_file_size - i;

// ⚠️ No bounds check — first_chunk_offset and copy_amount come from archive header
memcpy(buffer + i, decomp_buffer + first_chunk_offset, copy_amount);
first_chunk_offset = 0;
i += copy_amount;

At first glance, the copy_amount calculation looks like it's doing something safe — it takes the minimum of two values. But the problem is what those values represent and where they come from.

chunk_size is the declared size of the decompressed chunk, read from the archive header.
first_chunk_offset is a byte offset into decomp_buffer, also from the archive header.
our_file_size is the declared output file size, again from the archive.
i is the running write position into buffer.

None of these values are validated against the actual allocated sizes of decomp_buffer or buffer. The copy_amount ternary only ensures the value is consistent with the archive's own declared sizes — but a crafted archive can declare whatever sizes it wants.

The Attack Scenario

An attacker creates a malicious JMA archive with the following crafted metadata:

first_chunk_offset is set to a value near the end of decomp_buffer — say, chunk_size - 4.
copy_amount is computed as chunk_size - first_chunk_offset, which evaluates to 4.
But our_file_size is declared as much larger than the actual buffer allocation.

Because i accumulates across iterations and the loop doesn't stop when i approaches the real buffer boundary, the write target buffer + i can march past the end of the heap allocation. Meanwhile, decomp_buffer + first_chunk_offset can point past the end of the source allocation.

The result: attacker-controlled bytes written to arbitrary heap memory. Depending on what lives adjacent on the heap (vtable pointers, allocator metadata, other objects), this can lead to:

Heap metadata corruption → crash or allocator exploitation
Object pointer overwrite → potential code execution
Silent data corruption → logic vulnerabilities downstream

This is not theoretical. The scanner confirmed this as exploitable with a concrete exploitation scenario, and the vulnerability affects production code that processes real user-supplied archive files.

The Fix

The fix is surgical and correct. A bounds check was inserted immediately before the memcpy call, validating both the source and destination sides of the copy:

// jma/jma.cpp — AFTER FIX (safe)
size_t copy_amount = our_file_size - i > chunk_size - first_chunk_offset
    ? chunk_size - first_chunk_offset
    : our_file_size - i;

// ✅ Bounds check added — validates BOTH source and destination
if (first_chunk_offset + copy_amount > chunk_size || i + copy_amount > our_file_size) {
    delete[] comp_buffer;
    throw(JMA_BAD_FILE);
}

memcpy(buffer + i, decomp_buffer + first_chunk_offset, copy_amount);
first_chunk_offset = 0;
i += copy_amount;

Why Both Checks Are Necessary

The check has two independent conditions, and both matter:

first_chunk_offset + copy_amount > chunk_size
This guards the read side (decomp_buffer). It ensures the source read doesn't go past the end of the decompressed chunk buffer. Without this, a crafted offset could cause reads from unallocated heap memory.

i + copy_amount > our_file_size
This guards the write side (buffer). It ensures the destination write doesn't go past the end of the output buffer. Without this, accumulated writes across loop iterations could overflow the output allocation.

Cleanup Before Throw

Notice that the fix also calls delete[] comp_buffer before throwing. This is important — comp_buffer was heap-allocated earlier in the function, and throwing without freeing it would cause a memory leak. The fix correctly handles this cleanup, maintaining both safety and resource hygiene.

Before/After Comparison

Aspect	Before	After
Source bounds checked	❌ No	✅ `first_chunk_offset + copy_amount <= chunk_size`
Destination bounds checked	❌ No	✅ `i + copy_amount <= our_file_size`
Malformed archive handling	Silent overflow	`JMA_BAD_FILE` exception
Memory cleanup on error	N/A	`delete[] comp_buffer` before throw

Related Lines to Review

The PR notes that similar memcpy patterns exist at lines 451 and 467 in the same file. These should be audited with the same lens — any memcpy whose parameters flow from archive-controlled data needs the same treatment.

Prevention & Best Practices

1. Treat All File-Derived Values as Untrusted

Any value read from a file, network packet, or serialized format is attacker-controlled. This is especially true for archive parsers, image decoders, and protocol implementations. Validate every offset and length before use.

2. The Golden Rule for memcpy Safety

Before every memcpy(dst, src, n) call, verify:

// Always check both sides
assert(src_offset + n <= src_buffer_size);   // Source won't overread
assert(dst_offset + n <= dst_buffer_size);   // Destination won't overwrite

3. Use AddressSanitizer During Development

# Compile with AddressSanitizer to catch buffer overflows at runtime
g++ -fsanitize=address -fno-omit-frame-pointer -g jma.cpp -o jma_asan

AddressSanitizer would have caught this exact overflow the first time a malformed archive was processed in a test environment.

4. Fuzz Your Archive Parsers

Archive parsing code is an ideal target for fuzzing. Tools like libFuzzer or AFL++ can generate thousands of malformed archives per second and will quickly find cases where header values cause crashes:

# Example: fuzz the JMA extraction path
clang++ -fsanitize=fuzzer,address jma_fuzz_target.cpp jma.cpp -o jma_fuzzer
./jma_fuzzer corpus/

5. Consider Bounds-Checked Abstractions

In modern C++, prefer std::span for buffer views, which carries size information and enables bounds checking:

#include <span>
// Instead of raw pointer + separate size variable,
// use std::span which keeps size and pointer together
void extract_chunk(std::span<uint8_t> dest, std::span<const uint8_t> src, size_t offset) {
    if (offset >= src.size()) throw std::out_of_range("bad offset");
    auto src_view = src.subspan(offset);  // Bounds-checked subspan
    // ...
}

6. Reference Standards

CWE-120: Buffer Copy without Checking Size of Input
CWE-122: Heap-based Buffer Overflow
OWASP: Buffer Overflow Prevention Cheat Sheet

Key Takeaways

Archive header values are attacker-controlled input. first_chunk_offset, chunk_size, and our_file_size all come from the JMA file itself — never assume they are safe to use directly as memcpy parameters.
The ternary minimum trick is not a bounds check. The copy_amount calculation in extract_file() looked protective but only enforced internal consistency of the archive's own declared values, not their validity against actual buffer allocations.
Both source and destination must be validated independently. Checking only the write side (i + copy_amount) would have left the read side (decomp_buffer + first_chunk_offset) vulnerable to out-of-bounds reads, and vice versa.
Lines 451 and 467 in the same file use similar patterns and should be reviewed with the same scrutiny — a single-function audit is not enough when the same pattern recurs.
AddressSanitizer + fuzzing would have caught this before production. For any code that parses binary formats, these tools should be part of the standard development workflow.

How Orbis AppSec Detected This

Source: JMA archive file header metadata — specifically the first_chunk_offset, chunk_size, and our_file_size fields parsed from the archive by jma_open::extract_file()
Sink: memcpy(buffer + i, decomp_buffer + first_chunk_offset, copy_amount) at jma/jma.cpp:446
Missing control: No validation that first_chunk_offset + copy_amount <= chunk_size (source bounds) or i + copy_amount <= our_file_size (destination bounds) before the copy
CWE: CWE-120 — Buffer Copy without Checking Size of Input
Fix: Added a pre-copy bounds check at line 446 that throws JMA_BAD_FILE and frees comp_buffer if either the source or destination bounds would be exceeded

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.

Conclusion

The heap buffer overflow in jma_open::extract_file() is a clear example of why archive parsers require a fundamentally adversarial mindset: every byte in the input file is a potential attack vector. The vulnerability wasn't caused by a complex logic error — it was a straightforward missing validation before a memcpy. The fix is equally straightforward: five lines of bounds checking that transform a silent heap corruption into a clean, recoverable error.

For developers working on any binary format parser in C or C++, this case reinforces a non-negotiable rule: validate every offset and length derived from external data before using it in a memory operation, on both the source and destination sides, every time.

cwe	CWE-120
fix	Added a pre-copy bounds check that throws `JMA_BAD_FILE` if computed offsets exceed either buffer's declared size
risk	Attacker-controlled archive metadata can corrupt heap memory, potentially enabling remote code execution or denial of service
language	C++
root cause	`memcpy` at jma.cpp:446 used header-derived offsets and lengths without validating they fit within the source or destination buffer bounds
vulnerability	Heap Buffer Overflow via Unvalidated memcpy Parameters

How heap buffer overflow happens in C JMA archive extraction and how to fix it

Answer Summary

Vulnerability at a Glance

How heap buffer overflow happens in C JMA archive extraction and how to fix it

Introduction

The Vulnerability Explained

The Attack Scenario

The Fix

Why Both Checks Are Necessary

Cleanup Before Throw

Before/After Comparison

Related Lines to Review

Prevention & Best Practices

1. Treat All File-Derived Values as Untrusted

2. The Golden Rule for memcpy Safety

3. Use AddressSanitizer During Development

4. Fuzz Your Archive Parsers

5. Consider Bounds-Checked Abstractions

6. Reference Standards

Key Takeaways

How Orbis AppSec Detected This

Conclusion

References

Frequently Asked Questions

What is a heap buffer overflow?

How do you prevent heap buffer overflows in C++ archive parsing?

What CWE is heap buffer overflow?

Is checking the copy_amount alone enough to prevent this overflow?

Can static analysis detect heap buffer overflows like this one?

View the Security Fix

Related Articles

How integer underflow in array splice operations happens in C and how to fix it

How buffer overflow via sprintf() happens in C string formatting and how to fix it

How buffer overflow happens in C tar header parsing and how to fix it

How buffer overflow happens in C ieee80211_input() and how to fix it

How buffer overflow from unsafe string copy functions happens in C network interface code and how to fix it

How buffer overflow in FuzzIxml.c sprintf() happens in C and how to fix it