Severity: critical

How out-of-bounds reads happen in C gettext .mo file parsers and how to fix them

By Orbis AppSec
Published June 18, 2026. Reviewed June 18, 2026.

Answer Summary

This is a heap out-of-bounds read vulnerability (CWE-125) in `compose/asc-utils-l10n.c`. In `asc_l10n_parse_file_gettext()`, `memcpy(&h, data, sizeof(AscLocaleGettextHeader))` ran without first verifying that the buffer held a full header, and in the Qt .qm section parser a 4-byte length read ran without verifying that 4 bytes remained. The fix captures the buffer length via `g_bytes_get_data(bytes, &data_len)` and inserts an explicit `if (data_len < sizeof(AscLocaleGettextHeader))` guard, plus a matching `if (m + 4 > len)` guard in the QM section parser, so that malformed input is rejected before any unsafe memory access occurs.

Vulnerability at a Glance

CWE: CWE-125
Fix: Capture `data_len` from `g_bytes_get_data()` and add `data_len < sizeof(AscLocaleGettextHeader)` and `m + 4 > len` guards before each unsafe read
Risk: Heap memory disclosure or crash when parsing a malformed gettext .mo or Qt .qm locale file
Language: C
Root cause: `g_bytes_get_data()` was called without capturing the buffer length, so subsequent `memcpy` and offset reads had no upper bound to check against
Vulnerability: Out-of-bounds read via unchecked memcpy on attacker-controlled file data

Summary

A missing bounds check in the gettext .mo file parser inside compose/asc-utils-l10n.c allowed a malformed or truncated file to trigger out-of-bounds reads from heap memory. The vulnerability affected two distinct read sites — a memcpy of the full AscLocaleGettextHeader struct at line 131 and a 4-byte offset read at line 224 — neither of which validated that the source buffer was large enough. The fix adds explicit size checks before both reads, rejecting invalid files with a descriptive error instead of reading past the end of allocated memory.


Introduction

The compose/asc-utils-l10n.c file is responsible for parsing localization data — specifically gettext .mo binary files and Qt .qm files — as part of the AppStream Compose toolchain. It is a natural target for malformed-input attacks because it processes files that can be supplied by third parties, packagers, or build pipelines. A flaw in asc_l10n_parse_file_gettext() created a situation where a single truncated .mo file could cause the parser to read heap memory it was never supposed to touch.

The problematic pattern is subtle and common in C: g_bytes_get_data() returns both a pointer and a length, but the length parameter is optional — you can pass NULL if you don't care about it. The original code did exactly that:

data = g_bytes_get_data (bytes, NULL);  // length silently discarded

From that point forward, every read from data was unbounded. The very next substantive operation was a memcpy that assumed the buffer was at least sizeof(AscLocaleGettextHeader) bytes long — an assumption that held for well-formed files but silently broke for truncated or crafted ones.


The Vulnerability Explained

What the parser does

When asc_l10n_parse_file_gettext() is called, it reads a .mo file into a GBytes buffer, then immediately copies the first sizeof(AscLocaleGettextHeader) bytes into a local struct h to inspect the magic number and determine byte order:

/* VULNERABLE — before the fix */
data = g_bytes_get_data (bytes, NULL);   // (1) length discarded

/* we only strictly need the header */
memcpy (&h, data, sizeof (AscLocaleGettextHeader));  // (2) no size check

If the file on disk is shorter than sizeof(AscLocaleGettextHeader) bytes, whether because it was truncated in transit, deliberately crafted by an attacker, or simply corrupt, the memcpy at step (2) reads past the end of the heap allocation backing bytes. GBytes adds no guard padding of its own, so the bytes that follow are whatever the allocator placed there: chunk padding, heap metadata, or unrelated allocations.

A second vulnerable site existed in the Qt .qm section parser (asc_l10n_parse_file_qt):

/* VULNERABLE — before the fix */
AscLocaleQmSection section = _read_uint8 (data, &m);
guint32 section_len = _read_uint32 (data, &m);  // no check that m+4 <= len

Here, _read_uint32 reads 4 bytes at the cursor m and advances it by 4. If fewer than 4 bytes remained between m and len, the read extended past the buffer boundary.
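One way to make this class of bug harder to reintroduce is to put the check inside the reader itself. The sketch below is a hypothetical helper, not the project's actual _read_uint32; its name, its signature, and the big-endian byte order are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounds-checked reader (not the project's _read_uint32).
 * Reads a big-endian 32-bit value at *pos and advances the cursor.
 * Returns false, leaving *pos and *out untouched, if fewer than
 * 4 bytes remain in the buffer. */
bool
read_u32_be_checked (const uint8_t *data, size_t len, size_t *pos, uint32_t *out)
{
    /* Subtraction form, so the check itself cannot wrap around. */
    if (*pos > len || len - *pos < 4)
        return false;

    *out = ((uint32_t) data[*pos] << 24) |
           ((uint32_t) data[*pos + 1] << 16) |
           ((uint32_t) data[*pos + 2] << 8) |
           (uint32_t) data[*pos + 3];
    *pos += 4;
    return true;
}
```

With a helper of this shape, a caller cannot forget the guard: it has to handle the false return before it can use the value.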

How an attacker exploits this

An attacker who can supply a .mo or .qm file to the AppStream Compose pipeline — for example, by contributing a malicious package to a distribution repository, or by intercepting a build artifact — can craft a file that is exactly 1 byte long. When asc_l10n_parse_file_gettext() processes it:

  1. g_bytes_get_data() returns a 1-byte heap allocation.
  2. memcpy(&h, data, sizeof(AscLocaleGettextHeader)) copies sizeof(AscLocaleGettextHeader) bytes starting from that 1-byte buffer (28 bytes if the struct mirrors the seven 32-bit fields of the standard .mo header; the size does not depend on pointer width).
  3. Every byte after the first is read from beyond the end of the buffer: 27 bytes for a 28-byte header.

Depending on heap layout, the bytes past the boundary could contain allocator padding, stale data from freed blocks, or fragments of other allocations, such as file paths or other sensitive strings processed earlier in the same compose run. In a crash scenario, the contents may appear in a core dump or error log. In a more targeted scenario, an attacker who can observe whether parsing proceeds past the magic-number branch (h.magic == 0x950412de) could in principle use that branch as an oracle to infer a few bytes of heap contents.

Real-world impact: The AppStream Compose tool runs as part of distribution metadata generation pipelines. A successful exploit could leak heap contents from the compose process, crash the pipeline, or — in combination with a write primitive — escalate to arbitrary code execution.


The Fix

The fix addresses both vulnerable sites with minimal, surgical changes.

Fix 1 — Capture the buffer length and guard the header memcpy

Before:

data = g_bytes_get_data (bytes, NULL);

/* we only strictly need the header */
memcpy (&h, data, sizeof (AscLocaleGettextHeader));

After:

gsize data_len = 0;
// ...
data = g_bytes_get_data (bytes, &data_len);  // length is now captured

/* we only strictly need the header */
if (data_len < sizeof (AscLocaleGettextHeader)) {
    g_set_error_literal (error,
                         ASC_COMPOSE_ERROR,
                         ASC_COMPOSE_ERROR_FAILED,
                         "Gettext file is too small to be valid");
    return FALSE;
}
memcpy (&h, data, sizeof (AscLocaleGettextHeader));

The key change is passing &data_len instead of NULL to g_bytes_get_data(). This one-argument change makes all subsequent bounds checking possible. The guard immediately before memcpy ensures that if the file is shorter than the header struct, the function returns a descriptive GError rather than reading invalid memory. The memcpy itself is unchanged: it was never the problem; the missing precondition was.
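The same guard-then-copy pattern can be sketched without GLib. The struct below is an illustrative stand-in for AscLocaleGettextHeader that holds only the first three 32-bit fields of the standard .mo header. The type and function names are hypothetical, and the byte-order handling is an assumption about how a parser might use the two magic values; it is not taken from the project.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MO_MAGIC         0x950412deu /* file is in host byte order */
#define MO_MAGIC_SWAPPED 0xde120495u /* file is in the opposite byte order */

/* Illustrative stand-in for AscLocaleGettextHeader: just the first
 * three 32-bit fields of the standard .mo header. */
typedef struct {
    uint32_t magic;
    uint32_t revision;
    uint32_t n_strings;
} MoHeaderSketch;

static uint32_t
swap32 (uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Copies the header out of `data` only after confirming that `len`
 * covers it, then normalises the fields to host byte order.
 * Returns false for short buffers and unknown magic numbers. */
bool
mo_header_parse_sketch (const uint8_t *data, size_t len, MoHeaderSketch *out)
{
    if (data == NULL || len < sizeof (MoHeaderSketch))
        return false; /* too short: reject before any copy */

    memcpy (out, data, sizeof (MoHeaderSketch));

    if (out->magic == MO_MAGIC_SWAPPED) {
        out->magic = MO_MAGIC;
        out->revision = swap32 (out->revision);
        out->n_strings = swap32 (out->n_strings);
    } else if (out->magic != MO_MAGIC) {
        return false;
    }
    return true;
}
```

A 1-byte input now takes the early return and never reaches memcpy, which is the behaviour the guard in the fix provides.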

Fix 2 — Guard the Qt section length read

Before:

AscLocaleQmSection section = _read_uint8 (data, &m);
guint32 section_len = _read_uint32 (data, &m);

After:

guint32 section_len;
AscLocaleQmSection section = _read_uint8 (data, &m);
if (m + 4 > len)
    break;
section_len = _read_uint32 (data, &m);

Note that section_len was moved from a declaration-with-initializer inside the loop body to a plain declaration before the guard. This is necessary in C89/C90-compatible code where declarations must precede statements, but it also makes the control flow clearer: we declare the variable, check whether reading it is safe, and only then perform the read.

The break exits the section-parsing loop cleanly, allowing the caller to handle whatever translations were successfully parsed up to that point rather than crashing or returning garbage.
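The break-on-truncation loop can be sketched in isolation as follows. The record layout (a 1-byte tag, a 4-byte big-endian length, then the payload) and the function name are assumptions for illustration, not the project's actual .qm parsing code. The sketch also checks the payload length against the remaining bytes, which goes one step beyond the guard shown in the fix.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the well-formed sections in a buffer laid out as repeated
 * [1-byte tag][4-byte big-endian length][payload] records.
 * Stops at the first record that does not fit in the buffer. */
size_t
count_sections_sketch (const uint8_t *data, size_t len)
{
    size_t m = 0;
    size_t n_sections = 0;

    while (m < len) {
        uint32_t section_len;

        m += 1;                    /* skip the tag byte; safe because m < len */
        if (len - m < 4)           /* not enough bytes left for the length */
            break;
        section_len = ((uint32_t) data[m] << 24) |
                      ((uint32_t) data[m + 1] << 16) |
                      ((uint32_t) data[m + 2] << 8) |
                      (uint32_t) data[m + 3];
        m += 4;
        if (section_len > len - m) /* payload would run past the buffer */
            break;
        m += section_len;
        n_sections++;
    }
    return n_sections;
}
```

As in the fix, a truncated trailing record ends the loop and the sections counted so far remain usable.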

Why these two fixes are sufficient

Both vulnerable reads shared the same root cause: the buffer length was available (via g_bytes_get_data and the len variable respectively) but was not consulted before the read. The fix does not change the parsing logic — it simply inserts the missing precondition checks that should have been there from the start.


Prevention & Best Practices

1. Never discard the length from g_bytes_get_data()

The size out-parameter of g_bytes_get_data() exists so that callers can bound their reads. Treat g_bytes_get_data(bytes, NULL) as a code smell in any code that subsequently indexes into the returned pointer.

2. Validate every fixed-size read against the remaining buffer

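A useful pattern is a helper macro or inline function. Writing the comparison as a subtraction on unsigned operands keeps it correct even when an attacker-controlled offset is large enough to make offset + size wrap around:

#define CHECK_READ(offset, size, total_len) \
    do { if ((offset) > (total_len) || (size) > (total_len) - (offset)) \
             goto parse_error; } while (0)

Apply it before every memcpy, pointer cast, or multi-byte integer read from an external data source. The enclosing function must define a parse_error label that cleans up and returns failure.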
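A minimal sketch of such a macro in use follows. The record layout (a 32-bit count followed by a 16-bit flags field, both in host byte order) and the function name are hypothetical. The macro here is written in subtraction form and assumes unsigned operands, so the comparison cannot wrap around for large offsets.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Jumps to parse_error unless `size` bytes fit at `offset`.
 * Assumes all three operands are unsigned (size_t). */
#define CHECK_READ(offset, size, total_len) \
    do { \
        if ((offset) > (total_len) || (size) > (total_len) - (offset)) \
            goto parse_error; \
    } while (0)

/* Hypothetical record: a 32-bit count, then a 16-bit flags field. */
bool
parse_record_sketch (const uint8_t *data, size_t len,
                     uint32_t *count, uint16_t *flags)
{
    size_t pos = 0;

    CHECK_READ (pos, sizeof *count, len);
    memcpy (count, data + pos, sizeof *count);
    pos += sizeof *count;

    CHECK_READ (pos, sizeof *flags, len);
    memcpy (flags, data + pos, sizeof *flags);

    return true;

parse_error:
    /* A real parser would release resources and set an error here. */
    return false;
}
```

Keeping the check adjacent to each read makes a missing guard easy to spot in review.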

3. Use AddressSanitizer (ASan) in CI

Compile with -fsanitize=address during testing. ASan would have caught both of these reads immediately on a fuzzing run with a truncated input file. Pair it with a fuzzer (libFuzzer or AFL++) that generates truncated and bit-flipped .mo files.

4. Apply fuzzing to binary file parsers

Binary format parsers are high-value fuzzing targets. A corpus of valid .mo files plus a mutation fuzzer would almost certainly have generated a 1-byte input and triggered this bug before it reached production.
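A libFuzzer harness for a parser of this kind can be very small. In the sketch below, LLVMFuzzerTestOneInput is libFuzzer's standard entry point, while parse_mo_buffer_sketch is a hypothetical stand-in for the real parser: it only validates the magic number, so that the example stays self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parser under test. Returns 0 if the buffer starts with
 * a gettext magic number in either byte order, -1 otherwise. */
int
parse_mo_buffer_sketch (const uint8_t *data, size_t len)
{
    uint32_t magic;

    if (len < sizeof magic)
        return -1; /* too short: reject before reading */

    memcpy (&magic, data, sizeof magic);
    return (magic == 0x950412deu || magic == 0xde120495u) ? 0 : -1;
}

/* libFuzzer calls this once per generated input. */
int
LLVMFuzzerTestOneInput (const uint8_t *data, size_t size)
{
    /* Rejecting an input is fine; only crashes and sanitizer
     * reports count as findings. */
    (void) parse_mo_buffer_sketch (data, size);
    return 0;
}
```

Assuming the harness is saved as harness.c, building it with clang -g -fsanitize=fuzzer,address harness.c links in the fuzzing driver and ASan together, so an over-read on a short input is reported as soon as the fuzzer generates one.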

5. Follow the CERT C Secure Coding Standard

CERT C rule ARR38-C states: "Guarantee that library functions do not form invalid pointers." The memcpy call violated this rule by not guaranteeing the source pointer was valid for the requested size.

Relevant standards:
- CWE-125: Out-of-bounds Read
- CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer
- OWASP: Input Validation Cheat Sheet


Key Takeaways

  • Passing NULL as the length parameter to g_bytes_get_data() is safe only if you never index into the returned pointer — in asc_l10n_parse_file_gettext(), the pointer was immediately used in a memcpy, making the discarded length a direct vulnerability.
  • Both the gettext .mo parser and the Qt .qm parser in asc-utils-l10n.c shared the same class of bug: a read that assumed the buffer was large enough without checking. Always audit all read sites in a parser when you find one vulnerability.
  • The fix required zero changes to the parsing logic — only the addition of &data_len in one function call and two if guards. Bounds checks are cheap; heap over-reads are not.
  • Binary locale files are attacker-controlled input in any system that processes third-party packages. Treat them with the same skepticism as network input.
  • The _read_uint32 helper advancing m without a bounds check is a pattern that can hide in many cursor-based parsers — audit every call site of such helpers when the cursor approaches the buffer end.

How Orbis AppSec Detected This

  • Source: A .mo or .qm locale file read from disk via asc_unit_read_data() in asc_l10n_parse_file_gettext() — an externally supplied binary file with no prior size validation.
  • Sink: memcpy (&h, data, sizeof (AscLocaleGettextHeader)) at compose/asc-utils-l10n.c:131, and _read_uint32 (data, &m) in the Qt section loop — both operating on a pointer whose backing buffer length was unknown at the call site.
  • Missing control: The length out-parameter of g_bytes_get_data() was passed as NULL, so no upper bound was available to check before either read. No guard existed between the data pointer acquisition and the first memory copy.
  • CWE: CWE-125 — Out-of-bounds Read
  • Fix: Passed &data_len to g_bytes_get_data() and inserted if (data_len < sizeof(AscLocaleGettextHeader)) and if (m + 4 > len) guards before the respective unsafe reads.

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.


Conclusion

Out-of-bounds reads in binary file parsers are among the most common and consequential memory-safety bugs in C codebases. The vulnerability in asc_l10n_parse_file_gettext() is a textbook example: a single optional argument (NULL instead of &data_len) silently removed the only mechanism by which the code could have known it was about to read past the end of a heap buffer. The fix is a handful of lines of guard code, but those lines are the difference between a parser that crashes or leaks memory on malformed input and one that fails safely with a clear error message.

For developers writing binary format parsers in C, the lesson is clear: always capture buffer lengths, always check them before every read, and fuzz your parsers with truncated inputs before shipping. The cost of adding a bounds check is a single comparison; the cost of shipping without one can be a heap disclosure in a production build pipeline.


Frequently Asked Questions

What is an out-of-bounds read?

An out-of-bounds read occurs when a program reads memory beyond the end of an allocated buffer, potentially exposing sensitive heap contents or causing a crash.

How do you prevent out-of-bounds reads in C when parsing binary files?

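Always capture the buffer length returned by functions like `g_bytes_get_data()`, then validate that every read offset plus the read size is no greater than that length before performing `memcpy` or pointer arithmetic. Write the check so that the addition itself cannot overflow, for example by comparing the read size against the length minus the offset.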

What CWE is out-of-bounds read?

CWE-125 — Out-of-bounds Read. Related weaknesses include CWE-119 (Improper Restriction of Operations within the Bounds of a Memory Buffer) and CWE-126 (Buffer Over-read).

Is address sanitizer (ASan) enough to prevent out-of-bounds reads in production?

ASan is an excellent detection tool during development and CI, but it is not a prevention mechanism for production code. Explicit bounds checks in the source code are required to prevent the vulnerability from being reachable at all.

Can static analysis detect missing bounds checks before memcpy?

Yes. Tools like Semgrep, Coverity, and CodeChecker can flag `memcpy` calls where the size argument is not validated against the source buffer length. Orbis AppSec's multi-agent AI scanner detected this exact pattern in `asc-utils-l10n.c`.

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #750
