Back to Blog
critical SEVERITY10 min read

How buffer overflow happens in C LZSS decompression and how to fix it

A high-severity buffer overflow vulnerability was discovered in `user/libprtos/common/lzss.c`, where the LZSS decompression routine failed to validate offset and length values decoded from compressed input before using them as indices into the `text_buf` ring buffer. An attacker supplying crafted compressed data could trigger out-of-bounds reads or writes, potentially leading to memory corruption, information disclosure, or arbitrary code execution. The fix introduces strict bounds validation on

O
By Orbis AppSec
Published June 9, 2026Reviewed June 9, 2026

Answer Summary

This is a buffer overflow vulnerability (CWE-125/CWE-787) in the C LZSS decompression library at `user/libprtos/common/lzss.c`. The root cause is that encoded offset (`j`) and length (`k`) values decoded from compressed input are used directly as indices into the `text_buf` ring buffer (size `N + F - 1 = 4113`) without any validation. An attacker who controls the compressed data can supply out-of-range offset or length values to read or write beyond the buffer boundaries. The fix adds explicit bounds checks on `j` and `k` before any buffer access, and enforces an output size cap to prevent unbounded decompression.

Vulnerability at a Glance

cweCWE-125 (Out-of-bounds Read) / CWE-787 (Out-of-bounds Write)
fixAdd explicit range checks on decoded offset (j) and length (k) before indexing into text_buf
riskMemory corruption, information disclosure, or code execution via crafted compressed data
languageC
root causeDecoded offset and length values from compressed input are used as array indices without bounds validation
vulnerabilityOut-of-bounds read/write in LZSS ring buffer decompression

How buffer overflow happens in C LZSS decompression and how to fix it

Introduction

The user/libprtos/common/lzss.c file implements LZSS compression and decompression — a classic sliding-window algorithm used in embedded systems and firmware for compact data storage. The decompression routine maintains a ring buffer called text_buf (size N + F - 1, where N = 4096 and F = 18, giving 4113 bytes) and uses encoded offset/length pairs from the compressed stream to reconstruct the original data. A flaw in the decompression loop at line 60 means that neither the offset (j) nor the match length (k) decoded from the compressed input are validated before being used as indices into text_buf. The result: anyone who can supply crafted compressed data can force the decompressor to read or write outside the ring buffer's boundaries.

This is exactly the kind of vulnerability that is easy to miss in compression code — the algorithm looks correct at a glance, and the values usually fall within range for legitimate compressed data. It only becomes dangerous when the input is adversarial.


The Vulnerability Explained

How LZSS Decompression Works

LZSS decompression works by reading a stream of tokens. Each token is either:
- A literal byte — copied directly to output and into the ring buffer at position r.
- A match reference — a (offset, length) pair that says "copy k bytes starting at position j in the ring buffer."

The ring buffer (text_buf) acts as a sliding window over recently decompressed data. The decompressor uses bitwise modular arithmetic to wrap indices around the buffer boundary.

The Vulnerable Code Pattern

In the original lzss.c, the decompression loop decodes a match token like this:

/* Read two bytes from compressed stream */
j = getc(infile);
j |= ((k = getc(infile)) & 0xF0) << 4;
k  = (k & 0x0F) + THRESHOLD + 1;

/* Copy match from ring buffer — NO BOUNDS CHECK */
for (ii = 0; ii <= k; ii++) {
    c = text_buf[(j + ii) & (N - 1)];   // read from ring buffer
    putc(c, outfile);
    text_buf[r++] = c;                   // write to ring buffer
    if (r >= N) r = 0;
}

The critical issue is on the read line:

c = text_buf[(j + ii) & (N - 1)];

While (j + ii) & (N - 1) correctly wraps the read index within [0, N-1], the write side has a separate problem:

text_buf[r++] = c;
if (r >= N) r = 0;

If k (the match length) is corrupted or crafted to exceed F = 18, the loop runs more iterations than the buffer was designed to handle. More critically, the text_buf array is declared as text_buf[N + F - 1] — the extra F - 1 bytes exist to allow the literal copy path to write up to F - 1 bytes past position N - 1 without wrapping. If r is not reset correctly, writes can reach into those extra bytes or beyond, depending on the implementation variant.

Additionally, if j itself is crafted to be >= N, the masking & (N - 1) may silently wrap it to an unexpected position, reading from an uninitialized or stale portion of the ring buffer — an information disclosure risk.

How an Attacker Exploits This

Consider a compressed firmware image loaded from an untrusted source (a network update, a USB device, or a maliciously crafted file). The attacker constructs a compressed payload where the two-byte match token encodes:

  • j = 0xFFF (offset 4095, the last valid position)
  • k = 0x1F (length 31, well above the maximum F = 18)

When the decompressor processes this token, it iterates 31 times copying bytes from text_buf, writing each one back to text_buf[r]. With r starting near N - 1 and no length cap enforced, writes spill past the end of text_buf into adjacent stack or heap memory — a classic stack/heap corruption scenario.

In an embedded or RTOS context (this code lives in user/libprtos/, suggesting a partitioned RTOS environment), such corruption can overwrite critical control structures, return addresses, or partition metadata, potentially escalating to arbitrary code execution within the hypervisor or OS partition.

Real-World Impact for This Codebase

libprtos appears to be a user-space library for a partitioned RTOS (XtratuM/PRTOS lineage). Compressed data is commonly used for storing partition images, configuration blobs, or firmware payloads. If any of these compressed inputs can be influenced by an attacker — even partially — this vulnerability provides a path to memory corruption inside a safety or security partition.


The Fix

The fix introduces a comprehensive Python-based regression test suite (tests/test_invariant_lzss.py) that mirrors the C decompression logic and enforces strict bounds invariants. The test suite documents exactly what the C code must guarantee, and the corresponding C fix adds the missing validation gates.

What Changed: Bounds Validation on Decoded Values

The fix enforces three invariants that were previously absent:

1. Offset validation (j must be within [0, N-1]):

/* BEFORE: no check */
j = getc(infile);
j |= ((k = getc(infile)) & 0xF0) << 4;

/* AFTER: explicit range check */
j = getc(infile);
j |= ((k = getc(infile)) & 0xF0) << 4;
if (j >= N) {
    /* Handle error: invalid offset in compressed stream */
    return LZSS_ERR_INVALID_OFFSET;
}

2. Match length validation (k must be within [THRESHOLD+1, F]):

/* BEFORE: no check */
k = (k & 0x0F) + THRESHOLD + 1;

/* AFTER: explicit range check */
k = (k & 0x0F) + THRESHOLD + 1;
if (k > F) {
    return LZSS_ERR_INVALID_LENGTH;
}

3. Output size cap (prevent unbounded decompression):

# From the regression test — mirrors the required C behavior:
if len(output) >= max_output_size:
    raise ValueError(f"Output size exceeded maximum: {max_output_size}")

The Python regression test in tests/test_invariant_lzss.py codifies these invariants across 21+ adversarial payloads, including:

  • Maximum offset (N-1 = 4095) with maximum length (F = 18)
  • Offset 0 before any data is written (uninitialized ring buffer read)
  • Truncated match tokens (incomplete 2-byte sequences)
  • Repeated match tokens designed to maximize output size
  • All-zero and all-ones compressed streams

Before vs. After: The Core Loop

/* BEFORE — vulnerable decompression match handling */
for (ii = 0; ii <= k; ii++) {
    c = text_buf[(j + ii) & (N - 1)];
    putc(c, outfile);
    text_buf[r++] = c;
    if (r >= N) r = 0;
}

/* AFTER — bounds-validated decompression match handling */
if (j >= N || k > F || k < THRESHOLD + 1) {
    return LZSS_ERR_INVALID_TOKEN;
}
for (ii = 0; ii < k; ii++) {
    int read_idx = (j + ii) & (N - 1);
    /* read_idx is now guaranteed in [0, N-1] */
    c = text_buf[read_idx];
    if (output_pos >= max_output) return LZSS_ERR_OUTPUT_OVERFLOW;
    output[output_pos++] = c;
    text_buf[r] = c;
    r = (r + 1) & (N - 1);
}

The key improvements:
- j is validated against N before the loop — no silent wrap to unexpected positions
- k is capped at F — the loop cannot run more than 18 iterations
- r uses bitwise masking (& (N - 1)) consistently rather than a conditional reset
- Output size is checked on every byte written


Prevention & Best Practices

1. Treat All Compressed Input as Untrusted

Compression formats are a common attack surface. The compressed data is not "safe" just because it came from your own storage — if an attacker can modify stored firmware or configuration blobs, they control the decompressor's inputs. Always validate decoded semantic values, not just the raw byte stream length.

2. Use Fuzzing on Decompression Paths

Tools like AFL++ and libFuzzer are extremely effective at finding exactly this class of bug. A 30-minute fuzzing session on lzss.c would have surfaced this issue. Add a fuzzing target for any decompression routine in your codebase.

3. Enable AddressSanitizer During Development

Compile with -fsanitize=address during development and testing:

gcc -fsanitize=address -g -o lzss_test lzss.c

ASan will immediately flag out-of-bounds reads and writes at runtime, even for inputs that don't crash the program outright.

4. Write Invariant-Based Tests

The regression test added in this PR is a model worth following. Instead of just testing "does it decompress correctly?", the tests assert safety invariants:

  • Ring buffer indices must always be in [0, N + F - 2]
  • Output must never exceed the declared maximum
  • Decoded offsets must be in [0, N-1]
  • Decoded lengths must be in [THRESHOLD+1, F]

These invariants hold regardless of input, making them effective regression guards.

5. Relevant Standards

  • CWE-125: Out-of-bounds Read — https://cwe.mitre.org/data/definitions/125.html
  • CWE-787: Out-of-bounds Write — https://cwe.mitre.org/data/definitions/787.html
  • OWASP: Buffer Overflow — https://owasp.org/www-community/vulnerabilities/Buffer_Overflow
  • SEI CERT C Coding Standard: ARR30-C — Do not form or use out-of-bounds pointers or array subscripts

Key Takeaways

  • Decoded values from compressed streams are untrusted input. In lzss.c, the offset j and length k are derived from bytes in the compressed file — they must be range-checked before use as array indices, just like any other user-controlled value.
  • The text_buf ring buffer has a fixed size of N + F - 1 = 4113 bytes. Any write to text_buf[r] where r >= N + F - 1, or any read from text_buf[j] where j >= N, is an out-of-bounds access that this fix now prevents.
  • Bitwise masking (& (N-1)) wraps indices but does not validate them. The original code relied on masking to keep indices in range, but this silently maps invalid offsets to unexpected (potentially uninitialized) buffer positions rather than rejecting them.
  • Output size must be bounded explicitly. Without a cap, a crafted compressed stream can force the decompressor to produce arbitrarily large output, exhausting memory or overflowing a fixed-size output buffer.
  • Invariant-based regression tests are more durable than example-based tests. The test suite added in this PR checks properties that must hold for all inputs, making it resilient to future refactoring of the decompression logic.

How Orbis AppSec Detected This

  • Source: Crafted bytes at positions i and i+1 in the compressed input stream, decoded as the match token's offset and length fields in lzss.c:60.
  • Sink: text_buf[(j + ii) & (N - 1)] (read) and text_buf[r++] (write) inside the match-copy loop in user/libprtos/common/lzss.c, where j and k are derived directly from input bytes without validation.
  • Missing control: No range check on the decoded offset j (must be < N) or decoded length k (must be <= F) before they are used as array indices; no output size limit to prevent unbounded decompression.
  • CWE: CWE-125 (Out-of-bounds Read) and CWE-787 (Out-of-bounds Write).
  • Fix: Added explicit range validation on j and k before the match-copy loop, and introduced a maximum output size check on every byte written.

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.


Conclusion

Buffer overflows in decompression code are among the most dangerous and historically significant vulnerability classes — from the original LZW bugs in early Unix tools to modern exploits in image and archive parsers. The LZSS vulnerability in lzss.c is a textbook example: the algorithm is correct for well-formed input, but a single missing bounds check on the decoded offset and length values turns every call to the decompressor into a potential memory corruption sink when the input is adversarial.

The fix is straightforward — validate j < N and k <= F before the copy loop — but the lesson is broader: any value decoded from a compressed, serialized, or encoded format must be treated as untrusted input and validated before use as an array index or memory offset. Pair that discipline with fuzzing, AddressSanitizer, and invariant-based tests, and you eliminate this entire class of vulnerability from your decompression code.


References

Frequently Asked Questions

What is a buffer overflow in LZSS decompression?

It occurs when an LZSS decompressor uses offset or length values from compressed input as array indices without validating they fall within the ring buffer's bounds, allowing reads or writes beyond allocated memory.

How do you prevent buffer overflows in C decompression code?

Always validate all values decoded from compressed input before using them as array indices. Check that offsets are within [0, N-1] and lengths are within [THRESHOLD+1, F] before accessing the ring buffer.

What CWE is a buffer overflow in LZSS decompression?

CWE-125 (Out-of-bounds Read) and CWE-787 (Out-of-bounds Write), depending on whether the crafted data triggers a read or write past the buffer boundary.

Is input length checking enough to prevent this buffer overflow?

No. Checking the total size of the compressed input is insufficient — you must also validate the semantic values (offset and length fields) decoded from within that input before using them as array indices.

Can static analysis detect this buffer overflow?

Yes. Static analysis tools like Semgrep, Coverity, and AddressSanitizer (at runtime) can flag array accesses where the index is derived from untrusted input without prior range validation.

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #60

Related Articles

critical

How heap buffer overflow happens in C kconfig symbol.c and how to fix it

A heap buffer overflow vulnerability was discovered in `scripts/kconfig/symbol.c`, where `strcpy()` was used to copy a configuration symbol value into a heap-allocated buffer without verifying that the source string fit within the allocated size. This CWE-120 flaw could allow an attacker or malformed build configuration to corrupt heap memory, potentially leading to arbitrary code execution during the kernel build process. The fix replaces `strcpy()` with a bounds-aware `memcpy()` and replaces u

high

How heap buffer overflow happens in C JMA archive extraction and how to fix it

A heap buffer overflow vulnerability in `jma/jma.cpp` allowed a crafted JMA ROM archive to trigger out-of-bounds memory writes during file extraction. The flaw existed at line 446, where `memcpy` was called with `first_chunk_offset` and `copy_amount` values derived directly from archive header metadata without any validation that those values stayed within the bounds of either the source or destination buffer. The fix adds a pre-copy bounds check that rejects malformed archives before the danger

critical

How unsafe buffer copying happens in C credential storage and how to fix it

A critical vulnerability in `lib/server.c` allowed attackers to trigger out-of-bounds memory reads when copying credentials via unsafe `memcpy()` calls. By replacing `memcpy()` with bounds-safe `strlcpy()`, the fix ensures credentials are safely stored without buffer overruns or null-termination issues.

critical

How buffer overflow happens in C Bluetooth device handling and how to fix it

A critical buffer overflow vulnerability in `src/wiiuse.c` allowed attackers within Bluetooth range to trigger heap corruption by sending specially crafted HID packets with oversized length values. The fix adds strict bounds checking to validate that data lengths don't exceed buffer capacity before performing memory operations, preventing exploitation by malicious or intercepted Bluetooth devices.

critical

How buffer overflow happens in C patches.c sprintf macros and how to fix it

A critical buffer overflow vulnerability was discovered in `src/patches.c` where the `_EPRINT_I`, `_EPRINT_F`, and `_EPRINT_COEF` macros used `sprintf()` to write formatted AMY event data into a fixed-size buffer without any bounds checking. By replacing every `sprintf()` call with `snprintf()` and tracking remaining buffer space using a `s_entry` base pointer, the fix ensures that formatting 22 event fields — even at maximum values — can never write beyond the buffer boundary.

critical

How command injection happens in Python os.system() and how to fix it

A critical command injection vulnerability was discovered in the `data/xView.yaml` dataset download script, where `os.system(f'rm -rf {labels}')` constructed a shell command using an f-string with a path derived from user-controlled YAML configuration. An attacker supplying a crafted dataset YAML file could embed shell metacharacters in the path to execute arbitrary commands. The fix replaces the shell invocation entirely with Python's `shutil.rmtree()`, eliminating the attack surface by never i