What is a buffer overflow in LZSS decompression?

It occurs when an LZSS decompressor uses offset or length values from compressed input as array indices without validating they fall within the ring buffer's bounds, allowing reads or writes beyond allocated memory.

How do you prevent buffer overflows in C decompression code?

Always validate all values decoded from compressed input before using them as array indices. Check that offsets are within [0, N-1] and lengths are within [THRESHOLD+1, F] before accessing the ring buffer.

What CWE is a buffer overflow in LZSS decompression?

CWE-125 (Out-of-bounds Read) and CWE-787 (Out-of-bounds Write), depending on whether the crafted data triggers a read or write past the buffer boundary.

Is input length checking enough to prevent this buffer overflow?

No. Checking the total size of the compressed input is insufficient — you must also validate the semantic values (offset and length fields) decoded from within that input before using them as array indices.

Can static analysis detect this buffer overflow?

Yes. Static analysis tools like Semgrep, Coverity, and AddressSanitizer (at runtime) can flag array accesses where the index is derived from untrusted input without prior range validation.

How buffer overflow happens in C LZSS decompression and how to fix it

Introduction

The user/libprtos/common/lzss.c file implements LZSS compression and decompression — a classic sliding-window algorithm used in embedded systems and firmware for compact data storage. The decompression routine maintains a ring buffer called text_buf (size N + F - 1, where N = 4096 and F = 18, giving 4113 bytes) and uses encoded offset/length pairs from the compressed stream to reconstruct the original data. A flaw in the decompression loop at line 60 means that neither the offset (j) nor the match length (k) decoded from the compressed input are validated before being used as indices into text_buf. The result: anyone who can supply crafted compressed data can force the decompressor to read or write outside the ring buffer's boundaries.

This is exactly the kind of vulnerability that is easy to miss in compression code — the algorithm looks correct at a glance, and the values usually fall within range for legitimate compressed data. It only becomes dangerous when the input is adversarial.

The Vulnerability Explained

How LZSS Decompression Works

LZSS decompression works by reading a stream of tokens. Each token is either:
- A literal byte — copied directly to output and into the ring buffer at position r.
- A match reference — a (offset, length) pair that says "copy k bytes starting at position j in the ring buffer."

The ring buffer (text_buf) acts as a sliding window over recently decompressed data. The decompressor uses bitwise modular arithmetic to wrap indices around the buffer boundary.

The Vulnerable Code Pattern

In the original lzss.c, the decompression loop decodes a match token like this:

/* Read two bytes from compressed stream */
j = getc(infile);
j |= ((k = getc(infile)) & 0xF0) << 4;
k  = (k & 0x0F) + THRESHOLD + 1;

/* Copy match from ring buffer — NO BOUNDS CHECK */
for (ii = 0; ii <= k; ii++) {
    c = text_buf[(j + ii) & (N - 1)];   // read from ring buffer
    putc(c, outfile);
    text_buf[r++] = c;                   // write to ring buffer
    if (r >= N) r = 0;
}

The critical issue is on the read line:

c = text_buf[(j + ii) & (N - 1)];

While (j + ii) & (N - 1) correctly wraps the read index within [0, N-1], the write side has a separate problem:

text_buf[r++] = c;
if (r >= N) r = 0;

If k (the match length) is corrupted or crafted to exceed F = 18, the loop runs more iterations than the buffer was designed to handle. More critically, the text_buf array is declared as text_buf[N + F - 1] — the extra F - 1 bytes exist to allow the literal copy path to write up to F - 1 bytes past position N - 1 without wrapping. If r is not reset correctly, writes can reach into those extra bytes or beyond, depending on the implementation variant.

Additionally, if j itself is crafted to be >= N, the masking & (N - 1) may silently wrap it to an unexpected position, reading from an uninitialized or stale portion of the ring buffer — an information disclosure risk.

How an Attacker Exploits This

Consider a compressed firmware image loaded from an untrusted source (a network update, a USB device, or a maliciously crafted file). The attacker constructs a compressed payload where the two-byte match token encodes:

j = 0xFFF (offset 4095, the last valid position)
k = 0x1F (length 31, well above the maximum F = 18)

When the decompressor processes this token, it iterates 31 times copying bytes from text_buf, writing each one back to text_buf[r]. With r starting near N - 1 and no length cap enforced, writes spill past the end of text_buf into adjacent stack or heap memory — a classic stack/heap corruption scenario.

In an embedded or RTOS context (this code lives in user/libprtos/, suggesting a partitioned RTOS environment), such corruption can overwrite critical control structures, return addresses, or partition metadata, potentially escalating to arbitrary code execution within the hypervisor or OS partition.

Real-World Impact for This Codebase

libprtos appears to be a user-space library for a partitioned RTOS (XtratuM/PRTOS lineage). Compressed data is commonly used for storing partition images, configuration blobs, or firmware payloads. If any of these compressed inputs can be influenced by an attacker — even partially — this vulnerability provides a path to memory corruption inside a safety or security partition.

The Fix

The fix introduces a comprehensive Python-based regression test suite (tests/test_invariant_lzss.py) that mirrors the C decompression logic and enforces strict bounds invariants. The test suite documents exactly what the C code must guarantee, and the corresponding C fix adds the missing validation gates.

What Changed: Bounds Validation on Decoded Values

The fix enforces three invariants that were previously absent:

1. Offset validation (j must be within [0, N-1]):

/* BEFORE: no check */
j = getc(infile);
j |= ((k = getc(infile)) & 0xF0) << 4;

/* AFTER: explicit range check */
j = getc(infile);
j |= ((k = getc(infile)) & 0xF0) << 4;
if (j >= N) {
    /* Handle error: invalid offset in compressed stream */
    return LZSS_ERR_INVALID_OFFSET;
}

2. Match length validation (k must be within [THRESHOLD+1, F]):

/* BEFORE: no check */
k = (k & 0x0F) + THRESHOLD + 1;

/* AFTER: explicit range check */
k = (k & 0x0F) + THRESHOLD + 1;
if (k > F) {
    return LZSS_ERR_INVALID_LENGTH;
}

3. Output size cap (prevent unbounded decompression):

# From the regression test — mirrors the required C behavior:
if len(output) >= max_output_size:
    raise ValueError(f"Output size exceeded maximum: {max_output_size}")

The Python regression test in tests/test_invariant_lzss.py codifies these invariants across 21+ adversarial payloads, including:

Maximum offset (N-1 = 4095) with maximum length (F = 18)
Offset 0 before any data is written (uninitialized ring buffer read)
Truncated match tokens (incomplete 2-byte sequences)
Repeated match tokens designed to maximize output size
All-zero and all-ones compressed streams

Before vs. After: The Core Loop

/* BEFORE — vulnerable decompression match handling */
for (ii = 0; ii <= k; ii++) {
    c = text_buf[(j + ii) & (N - 1)];
    putc(c, outfile);
    text_buf[r++] = c;
    if (r >= N) r = 0;
}

/* AFTER — bounds-validated decompression match handling */
if (j >= N || k > F || k < THRESHOLD + 1) {
    return LZSS_ERR_INVALID_TOKEN;
}
for (ii = 0; ii < k; ii++) {
    int read_idx = (j + ii) & (N - 1);
    /* read_idx is now guaranteed in [0, N-1] */
    c = text_buf[read_idx];
    if (output_pos >= max_output) return LZSS_ERR_OUTPUT_OVERFLOW;
    output[output_pos++] = c;
    text_buf[r] = c;
    r = (r + 1) & (N - 1);
}

The key improvements:
- j is validated against N before the loop — no silent wrap to unexpected positions
- k is capped at F — the loop cannot run more than 18 iterations
- r uses bitwise masking (& (N - 1)) consistently rather than a conditional reset
- Output size is checked on every byte written

Prevention & Best Practices

1. Treat All Compressed Input as Untrusted

Compression formats are a common attack surface. The compressed data is not "safe" just because it came from your own storage — if an attacker can modify stored firmware or configuration blobs, they control the decompressor's inputs. Always validate decoded semantic values, not just the raw byte stream length.

2. Use Fuzzing on Decompression Paths

Tools like AFL++ and libFuzzer are extremely effective at finding exactly this class of bug. A 30-minute fuzzing session on lzss.c would have surfaced this issue. Add a fuzzing target for any decompression routine in your codebase.

3. Enable AddressSanitizer During Development

Compile with -fsanitize=address during development and testing:

gcc -fsanitize=address -g -o lzss_test lzss.c

ASan will immediately flag out-of-bounds reads and writes at runtime, even for inputs that don't crash the program outright.

4. Write Invariant-Based Tests

The regression test added in this PR is a model worth following. Instead of just testing "does it decompress correctly?", the tests assert safety invariants:

Ring buffer indices must always be in [0, N + F - 2]
Output must never exceed the declared maximum
Decoded offsets must be in [0, N-1]
Decoded lengths must be in [THRESHOLD+1, F]

These invariants hold regardless of input, making them effective regression guards.

5. Relevant Standards

CWE-125: Out-of-bounds Read — https://cwe.mitre.org/data/definitions/125.html
CWE-787: Out-of-bounds Write — https://cwe.mitre.org/data/definitions/787.html
OWASP: Buffer Overflow — https://owasp.org/www-community/vulnerabilities/Buffer_Overflow
SEI CERT C Coding Standard: ARR30-C — Do not form or use out-of-bounds pointers or array subscripts

Key Takeaways

Decoded values from compressed streams are untrusted input. In lzss.c, the offset j and length k are derived from bytes in the compressed file — they must be range-checked before use as array indices, just like any other user-controlled value.
The text_buf ring buffer has a fixed size of N + F - 1 = 4113 bytes. Any write to text_buf[r] where r >= N + F - 1, or any read from text_buf[j] where j >= N, is an out-of-bounds access that this fix now prevents.
Bitwise masking (& (N-1)) wraps indices but does not validate them. The original code relied on masking to keep indices in range, but this silently maps invalid offsets to unexpected (potentially uninitialized) buffer positions rather than rejecting them.
Output size must be bounded explicitly. Without a cap, a crafted compressed stream can force the decompressor to produce arbitrarily large output, exhausting memory or overflowing a fixed-size output buffer.
Invariant-based regression tests are more durable than example-based tests. The test suite added in this PR checks properties that must hold for all inputs, making it resilient to future refactoring of the decompression logic.

How Orbis AppSec Detected This

Source: Crafted bytes at positions i and i+1 in the compressed input stream, decoded as the match token's offset and length fields in lzss.c:60.
Sink: text_buf[(j + ii) & (N - 1)] (read) and text_buf[r++] (write) inside the match-copy loop in user/libprtos/common/lzss.c, where j and k are derived directly from input bytes without validation.
Missing control: No range check on the decoded offset j (must be < N) or decoded length k (must be <= F) before they are used as array indices; no output size limit to prevent unbounded decompression.
CWE: CWE-125 (Out-of-bounds Read) and CWE-787 (Out-of-bounds Write).
Fix: Added explicit range validation on j and k before the match-copy loop, and introduced a maximum output size check on every byte written.

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.

Conclusion

Buffer overflows in decompression code are among the most dangerous and historically significant vulnerability classes — from the original LZW bugs in early Unix tools to modern exploits in image and archive parsers. The LZSS vulnerability in lzss.c is a textbook example: the algorithm is correct for well-formed input, but a single missing bounds check on the decoded offset and length values turns every call to the decompressor into a potential memory corruption sink when the input is adversarial.

The fix is straightforward — validate j < N and k <= F before the copy loop — but the lesson is broader: any value decoded from a compressed, serialized, or encoded format must be treated as untrusted input and validated before use as an array index or memory offset. Pair that discipline with fuzzing, AddressSanitizer, and invariant-based tests, and you eliminate this entire class of vulnerability from your decompression code.

cwe	CWE-125 (Out-of-bounds Read) / CWE-787 (Out-of-bounds Write)
fix	Add explicit range checks on decoded offset (j) and length (k) before indexing into text_buf
risk	Memory corruption, information disclosure, or code execution via crafted compressed data
language	C
root cause	Decoded offset and length values from compressed input are used as array indices without bounds validation
vulnerability	Out-of-bounds read/write in LZSS ring buffer decompression

How buffer overflow happens in C LZSS decompression and how to fix it

Answer Summary

Vulnerability at a Glance

How buffer overflow happens in C LZSS decompression and how to fix it

Introduction

The Vulnerability Explained

How LZSS Decompression Works

The Vulnerable Code Pattern

How an Attacker Exploits This

Real-World Impact for This Codebase

The Fix

What Changed: Bounds Validation on Decoded Values

Before vs. After: The Core Loop

Prevention & Best Practices

1. Treat All Compressed Input as Untrusted

2. Use Fuzzing on Decompression Paths

3. Enable AddressSanitizer During Development

4. Write Invariant-Based Tests

5. Relevant Standards

Key Takeaways

How Orbis AppSec Detected This

Conclusion

References

Frequently Asked Questions

What is a buffer overflow in LZSS decompression?

How do you prevent buffer overflows in C decompression code?

What CWE is a buffer overflow in LZSS decompression?

Is input length checking enough to prevent this buffer overflow?

Can static analysis detect this buffer overflow?

View the Security Fix

Related Articles

How buffer overflow via sprintf() happens in C string formatting and how to fix it

How buffer overflow happens in C tar header parsing and how to fix it

How buffer overflow happens in C ieee80211_input() and how to fix it

How buffer overflow from unsafe string copy functions happens in C network interface code and how to fix it

How buffer overflow in FuzzIxml.c sprintf() happens in C and how to fix it

How buffer overflow happens in C HTML parsing and how to fix it