Critical severity · 7 min read

Heap Overflow in LZMA Decompression: When Attacker-Controlled Data Meets memcpy

A critical heap buffer overflow vulnerability was discovered in the LZMA decompression library (`LzmaDec.c`), where attacker-controlled compressed input could manipulate copy lengths passed directly to `memcpy` without bounds validation. This class of vulnerability can allow attackers to overwrite adjacent heap memory, potentially leading to arbitrary code execution or process crashes. A targeted bounds check was added to validate the output size before the copy operation, closing the attack pat

By orbisai0security
May 28, 2026

Introduction

Compression libraries are everywhere. They decompress archives, stream data, and sit quietly inside applications handling files users upload, download, or open daily. Because of this ubiquity, they make an attractive and high-value target for attackers. A vulnerability in a compression routine doesn't just affect one feature — it can affect every code path that processes compressed data.

This post covers a critical heap buffer overflow discovered and fixed in deps/lzma/src/LzmaDec.c, the LZMA decompression engine. The root cause is a classic but dangerous pattern: a length value derived from attacker-controlled input is used in a memcpy call without first verifying it fits within the destination buffer. One malformed compressed stream is all it takes to corrupt heap memory.

If you write C or C++, work with compression libraries, or ship software that processes user-supplied binary data, this one is worth understanding deeply.


The Vulnerability Explained

What Is LZMA?

LZMA (Lempel–Ziv–Markov chain Algorithm) is a lossless data compression algorithm used in formats like .7z, .xz, and .lzma. It's embedded in many applications and system utilities. The decompression logic is complex by nature — it must faithfully reconstruct arbitrary byte sequences from compact encoded representations — and that complexity creates opportunities for subtle bugs.

The Vulnerable Code Path

The vulnerability lives in LzmaDec_DecodeToBuf, the core function responsible for decompressing LZMA data into a caller-supplied output buffer. Here's the vulnerable section (before the fix):

// LzmaDec.c:1226 — BEFORE FIX
outSizeCur = p->dicPos - dicPos;
// ⚠️ outSizeCur is derived from decompressor internal state,
// influenced by the compressed input stream
memcpy(dest, p->dic + dicPos, outSizeCur);
dest += outSizeCur;
outSize -= outSizeCur;

The variable outSizeCur represents how many bytes were produced by the current decompression step. It's calculated from p->dicPos and dicPos — values that are directly influenced by the contents of the compressed input stream. The problem: this value is never checked against outSize, the actual remaining capacity of the destination buffer dest.

If an attacker crafts a malicious compressed stream that causes outSizeCur to exceed outSize, the memcpy will write beyond the end of the dest buffer, overwriting whatever happens to live adjacent to it on the heap.

The Broader Attack Surface

This isn't an isolated issue. The PR description identifies related patterns in the encoding side as well:

  • LzmaEnc.c:2930 — User-supplied data of size size is copied into p->data without bounds validation.
  • LzmaEnc.c:515 — A left-shift operation on lclp computes the litProbs array copy size. If lclp is attacker-influenced, the shift can produce an oversized value, again passed to a copy operation without a safety check.

All three paths share the same root cause: lengths derived from external input are trusted implicitly.
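
The shift-based size computation can be guarded in the same spirit. The sketch below is illustrative only: `lit_probs_count`, `LIT_SIZE`, and `MAX_LCLP` are hypothetical names, and the cap of 12 is an assumption chosen for the example rather than a value taken from the PR.

```c
#include <stddef.h>

/* Hypothetical constants for illustration; not taken from the PR. */
#define LIT_SIZE 0x300u   /* assumed number of probabilities per literal coder */
#define MAX_LCLP 12u      /* assumed upper bound on the shift amount */

/* Computes LIT_SIZE << lclp into *count.
 * Returns 0 on success, -1 if lclp is out of range. */
int lit_probs_count(unsigned lclp, size_t *count)
{
    if (count == NULL || lclp > MAX_LCLP)
        return -1;                      /* reject before shifting */
    *count = (size_t)LIT_SIZE << lclp;
    return 0;
}
```

Checking the shift amount first keeps the result small enough to be a sensible allocation or copy size, and it avoids shifting by more bits than the type holds, which is undefined behavior in C.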

How Could This Be Exploited?

The attack scenario is straightforward for anyone who can supply compressed data to the application:

  1. Craft a malicious .lzma or .7z archive: The attacker constructs a compressed stream where the decompressed output length for a single chunk exceeds the destination buffer size.
  2. Trigger decompression: The application opens the file, passes it to the LZMA decoder, and calls LzmaDec_DecodeToBuf.
  3. Heap corruption occurs: memcpy writes beyond the buffer boundary, overwriting heap metadata, adjacent objects, or function pointers.
  4. Exploitation: Depending on the heap layout and platform, the attacker may achieve:
    - Arbitrary code execution by overwriting a function pointer or vtable entry
    - Denial of service by corrupting heap metadata and triggering a crash
    - Information disclosure if the corruption causes sensitive data to be returned to the caller

This is classified as CWE-122 (Heap-based Buffer Overflow) and was rated critical severity. Compressed file parsers are frequently exposed to untrusted input, and heap overflows in native code remain a common route to code execution, although how reliably a given bug can be exploited depends on the allocator and on the platform's mitigations.


The Fix

The fix is elegant in its simplicity. A single bounds check was added immediately before the memcpy call:

// LzmaDec.c:1226 — AFTER FIX
outSizeCur = p->dicPos - dicPos;

// ✅ Validate before copying
if (outSizeCur > outSize)
  return SZ_ERROR_DATA;

memcpy(dest, p->dic + dicPos, outSizeCur);
dest += outSizeCur;
outSize -= outSizeCur;

Why This Works

The check if (outSizeCur > outSize) ensures that the number of bytes about to be copied never exceeds the remaining space in the destination buffer. If a malformed stream would cause an overflow, the function returns SZ_ERROR_DATA — a defined error code indicating the input data is invalid — before any memory corruption can occur.

This approach is correct for several reasons:

  • Fail-fast behavior: The function returns an error immediately rather than attempting partial recovery, preventing any ambiguous state.
  • Semantically accurate: If outSizeCur > outSize, the compressed data is genuinely malformed or malicious. SZ_ERROR_DATA is the appropriate signal.
  • No performance impact: A single integer comparison adds negligible overhead to what is already a CPU-intensive decompression loop.
  • Minimal diff: The fix is surgical — it doesn't restructure logic, introduce new dependencies, or risk regressions in the happy path.
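
The same check-then-copy pattern can be packaged so that the destination pointer and the remaining capacity always move together. This is a minimal sketch, not the library's code: `out_cursor`, `cursor_write`, and the `COPY_*` codes are hypothetical names introduced for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes for this sketch. */
enum { COPY_OK = 0, COPY_ERROR_DATA = 1 };

/* Tracks the write position and the space left in the caller's buffer. */
typedef struct {
    unsigned char *dest;
    size_t remaining;
} out_cursor;

/* Copies `produced` bytes from `src` only if they fit, then advances the cursor. */
int cursor_write(out_cursor *out, const unsigned char *src, size_t produced)
{
    if (produced > out->remaining)
        return COPY_ERROR_DATA;         /* reject before touching memory */
    memcpy(out->dest, src, produced);
    out->dest += produced;
    out->remaining -= produced;
    return COPY_OK;
}
```

Because every write goes through one function, there is a single place where the bound is enforced, and a rejected write leaves both the buffer and the cursor unchanged.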

Before and After at a Glance

                     | Before          | After
Bounds check         | ❌ None         | outSizeCur > outSize
On malformed input   | Heap corruption | Returns SZ_ERROR_DATA
Attack surface       | Open            | Closed
Performance impact   | -               | Negligible

Prevention & Best Practices

This vulnerability is a textbook example of a class of bugs that has plagued C and C++ codebases for decades. Here's how to prevent it systematically.

1. Never Trust Lengths Derived from External Input

Any value that originates from a file, network packet, or user input — even indirectly through computation — must be treated as untrusted. Before using such a value as a copy length, validate it explicitly:

// ❌ Dangerous: length from external source, no validation
memcpy(dest, src, external_length);

// ✅ Safe: validate before use
if (external_length > dest_capacity) {
    return ERROR_INVALID_INPUT;
}
memcpy(dest, src, external_length);
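
When the copy lands at an offset inside the buffer, the check itself needs care: `offset + length` can wrap around for very large values and pass a naive comparison. The sketch below shows one way to write the check so that no addition is needed. `write_at` is a hypothetical helper, not a function from the LZMA code.

```c
#include <stddef.h>
#include <string.h>

/* Writes `len` bytes at `offset` inside a buffer of `capacity` bytes.
 * Returns 0 on success, -1 if the write would not fit. */
int write_at(unsigned char *buf, size_t capacity, size_t offset,
             const unsigned char *src, size_t len)
{
    /* `offset + len > capacity` could wrap around for huge values,
     * so compare against the space that is actually left instead. */
    if (offset > capacity || len > capacity - offset)
        return -1;
    memcpy(buf + offset, src, len);
    return 0;
}
```

The subtraction is safe because `offset > capacity` has already been ruled out, so `capacity - offset` cannot underflow.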

2. Use Safer Memory Functions

Where possible, prefer bounds-checking variants:

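// memcpy_s is part of C11 Annex K, which is optional: check that your
// C library provides it (glibc does not) before relying on it
if (memcpy_s(dest, dest_size, src, count) != 0) {
    return ERROR_INVALID_INPUT;
}

// Otherwise track the remaining size explicitly. Use a real check here:
// assert() is compiled out when NDEBUG is defined
if (count > dest_remaining) {
    return ERROR_INVALID_INPUT;
}
memcpy(dest, src, count);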

3. Enable Compiler and Runtime Mitigations

Modern toolchains offer multiple layers of protection that can detect or limit the impact of buffer overflows:

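  • AddressSanitizer (ASan): Detects out-of-bounds memory accesses at runtime. Run your test suite with -fsanitize=address.
  • Stack canaries: Enable with -fstack-protector-all (GCC/Clang). These guard return addresses on the stack; they do not detect heap overflows like this one.
  • FORTIFY_SOURCE: Compile with -D_FORTIFY_SOURCE=2 and optimization enabled to get compile-time and runtime checks on memcpy, strcpy, and similar functions in cases where the compiler can determine the destination size.
  • Control Flow Integrity (CFI): Limits the impact of heap corruption on control flow.

# Example: hardened release build
gcc -O2 -D_FORTIFY_SOURCE=2 -fstack-protector-all \
    -o myapp myapp.c

# Example: separate sanitizer build for tests (ASan and _FORTIFY_SOURCE do not combine well)
gcc -O1 -g -fsanitize=address,undefined \
    -o myapp_test myapp.c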

4. Fuzz Your Parsers

Compression libraries and file format parsers are prime candidates for fuzzing. Tools like libFuzzer and AFL++ are highly effective at finding exactly this class of bug by generating malformed inputs automatically:

# Example: fuzz a decompression function with libFuzzer
clang -fsanitize=fuzzer,address -o fuzz_lzma fuzz_lzma.c LzmaDec.c
./fuzz_lzma corpus/

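Fuzzing is well suited to finding this class of bug: a malformed stream that produces an oversized outSizeCur is the kind of input a coverage-guided fuzzer tends to reach, especially when the target is built with AddressSanitizer so that the out-of-bounds write is reported immediately.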
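
For readers who have not written a harness before, here is a rough sketch of what a file like fuzz_lzma.c could look like. To stay self-contained it does not call the LZMA API. `toy_decode` is a hypothetical stand-in (a tiny run-length decoder) that marks the spot where the real decompression call would go. `LLVMFuzzerTestOneInput` is the standard libFuzzer entry point.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a real decoder: input is (count, byte) pairs,
 * and each pair expands to `count` copies of `byte`.
 * Returns 0 on success, -1 if the input is truncated or the output would not fit. */
int toy_decode(const uint8_t *in, size_t in_len,
               uint8_t *out, size_t out_cap, size_t *out_len)
{
    size_t written = 0;
    if (in_len % 2 != 0)
        return -1;                      /* truncated pair */
    for (size_t i = 0; i < in_len; i += 2) {
        size_t run = in[i];
        if (run > out_cap - written)
            return -1;                  /* bounds check before writing */
        for (size_t j = 0; j < run; j++)
            out[written++] = in[i + 1];
    }
    *out_len = written;
    return 0;
}

/* libFuzzer entry point: called once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    uint8_t out[64];
    size_t out_len = 0;
    /* Decode errors are expected for malformed input; only memory errors matter. */
    (void)toy_decode(data, size, out, sizeof out, &out_len);
    return 0;
}
```

The harness deliberately ignores the decoder's return value. Rejecting bad input is correct behavior, and the sanitizer is what flags a write past the end of `out` if the bounds check is ever removed or wrong.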

5. Apply the Principle of Least Privilege

Even if exploitation occurs, limiting the process's privileges reduces the blast radius. Run decompression in a sandboxed process or with reduced permissions where your architecture allows it.
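
As a starting point, the risky work can be moved into a child process so that a crash there does not take the main application down. The sketch below assumes a POSIX system and shows crash isolation only. It does not drop privileges or restrict system calls, which a real sandbox would add on top. `run_isolated`, `job_ok`, and `job_crash` are hypothetical names, and the jobs stand in for a decompression call.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*job_fn)(void);

/* Runs `job` in a child process. Returns 0 only if the child exited
 * normally with status 0; a crash or failure in the child yields -1. */
int run_isolated(job_fn job)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork failed */
    if (pid == 0)
        _exit(job() == 0 ? 0 : 1);      /* child: do the risky work */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Hypothetical jobs standing in for a decompression call. */
int job_ok(void)    { return 0; }
int job_crash(void) { abort(); }
```

With this shape, `run_isolated(job_crash)` reports failure to the parent instead of terminating it. Passing data to and from the child (for example over a pipe) is left out of the sketch.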

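6. Reference Security Standards

The bug described in this post falls under CWE-122 (Heap-based Buffer Overflow). The CWE entry for that class is a reasonable starting point when building review checklists for code that copies externally sized data.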


Conclusion

A two-line fix — one comparison, one early return — closed a critical attack path that could have allowed heap corruption via malformed compressed input. That's the nature of memory safety bugs in C: the vulnerability is often small and subtle, but the consequences can be severe.

The key lessons from this vulnerability:

  • Lengths from compressed or encoded data are attacker-controlled. Always validate them before use.
  • memcpy has no safety net. It will copy exactly what you tell it to, even if that means writing off the end of your buffer.
  • Fail fast and loudly. Returning an error on invalid input is always better than proceeding with corrupted state.
  • Fuzz your parsers. Automated input generation is one of the most effective ways to find this class of bug before attackers do.

Memory corruption vulnerabilities in compression libraries have a long history of high-severity CVEs — from zlib to libpng to the LZMA SDK itself. The pattern is consistent, and so is the fix: validate inputs, check bounds, and never assume that data derived from an external source is well-formed.

Secure coding isn't about being paranoid — it's about being precise.


This vulnerability was identified and fixed as part of an automated security scanning workflow. The fix was verified by build pipeline, automated re-scan, and LLM-assisted code review.

View the Security Fix

The pull request that fixed this vulnerability is PR #30.
