Critical Heap Buffer Overflow Fixed in GeoIP Database Parser (CWE-120)
Severity: Critical | CWE: CWE-120 (Buffer Copy without Checking Size of Input) | File: src/base/net/geoipdatabase.cpp
Introduction
Memory corruption vulnerabilities have been the root cause of some of the most devastating security breaches in computing history — from the Morris Worm in 1988 to modern browser exploits. Despite decades of awareness, buffer overflows continue to appear in production code, especially in performance-sensitive C/C++ components that favor raw memory operations for speed.
This post covers a critical-severity heap buffer overflow that was recently discovered and patched in a GeoIP database parsing component. The vulnerability existed across three separate memcpy call sites in src/base/net/geoipdatabase.cpp, each lacking adequate bounds validation. If you write C or C++ code that processes external data — especially file formats — this vulnerability and its fix offer important lessons you can apply directly to your own codebase.
What Is a Buffer Overflow?
A buffer overflow occurs when a program writes data beyond the boundaries of an allocated memory buffer. In C and C++, functions like memcpy, strcpy, and memset do exactly what you tell them — they don't check whether the destination buffer is large enough to hold the data being copied.
// Dangerous: no bounds check
memcpy(destination, source, user_controlled_length);
When user_controlled_length exceeds the size of destination, the write spills into adjacent memory. On the heap, this can corrupt allocator metadata, overwrite other objects, or — in the hands of a skilled attacker — be weaponized into arbitrary code execution.
The Vulnerability Explained
The GeoIP database parser contained three distinct unsafe memcpy operations, each representing a separate but related attack vector.
Location 1 — Line 129: Unvalidated db->m_size as Copy Length
// BEFORE (vulnerable)
memcpy(dest_buffer, source_data, db->m_size);
Here, db->m_size is used directly as the number of bytes to copy. The problem? db->m_size is derived from metadata within the GeoIP database file itself — meaning an attacker who crafts a malicious .mmdb file can set m_size to an arbitrarily large value. If m_size exceeds the actual size of source_data or the capacity of dest_buffer, the memcpy happily reads or writes beyond the buffer boundary.
Why it's critical: The GeoIP database file is loaded from disk (and potentially downloaded from a remote source). Any component in the data pipeline — a compromised CDN, a man-in-the-middle attack, or a malicious local file — could supply a crafted database that triggers this overflow.
Location 2 — Line 177: Unvalidated Array Index from m_recordBytes
// BEFORE (vulnerable)
uint8_t *ptr = buffer + (4 - m_recordBytes);
memcpy(ptr, record_data, m_recordBytes);
The MaxMind DB format uses m_recordBytes to indicate the size of each record (valid values: 1–4). However, the parser did not validate that m_recordBytes falls within this range before using it as an index offset.
If m_recordBytes is 0, the offset 4 - m_recordBytes is 4 and the copy length is 0, so nothing is written and the record value is never filled in. If m_recordBytes is 5 or greater, 4 - m_recordBytes drops below zero. Depending on the field's declared type, that is either a negative offset (narrow unsigned types are promoted to int) or a wrap-around to a very large unsigned value. In both cases ptr points outside the 4-byte buffer and the memcpy writes out of bounds.
Why it's critical: This is a classic integer arithmetic vulnerability feeding into a memory operation. The lack of a simple range check turns a format field into an out-of-bounds write.
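The declared type of m_recordBytes is not shown in this post, so the following is only a sketch of the two plausible cases. The function names are hypothetical and exist purely to make the arithmetic visible: one models a narrow field that is promoted to int, the other a 32-bit unsigned field that wraps.

```cpp
#include <cstddef>
#include <cstdint>

// Case 1 (assumed): the field is a narrow unsigned type such as uint8_t.
// The operand is promoted to int before the subtraction, so the result
// can be negative and the pointer ends up BEFORE the buffer.
std::ptrdiff_t offset_narrow(std::uint8_t record_bytes) {
    return 4 - record_bytes;
}

// Case 2 (assumed): the field is a 32-bit unsigned type.
// The subtraction is done modulo 2^32, so going below zero wraps
// around to a very large positive offset.
std::uint32_t offset_wide(std::uint32_t record_bytes) {
    return 4u - record_bytes;
}
```

For a valid width of 3 both functions return 1. For an invalid width of 5 the first returns -1 and the second returns 4294967295, and neither is a usable offset into a 4-byte buffer.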
Location 3 — Line 521: Unchecked len Parameter in memcpy
// BEFORE (vulnerable)
memcpy(dst, src, len);
At this location, len is passed into the function from parsing logic above and forwarded directly to memcpy without verifying that len is less than or equal to the size of dst. If the caller computes len from file-controlled data (which it does), an attacker can cause len to exceed the destination buffer size.
Why it's critical: Even if the first two issues were fixed, this third location provides an independent path to heap corruption.
Real-World Attack Scenario
Consider an application that:
1. Downloads GeoIP database updates automatically from a configured URL.
2. Parses the downloaded .mmdb file to resolve IP addresses to geographic locations.
An attacker positioned as a man-in-the-middle (or who has compromised the update server) serves a crafted .mmdb file with:
- An inflated m_size field that produces an oversized copy length.
- An m_recordBytes value greater than 4, so the record copy lands outside its 4-byte buffer.
When the application parses this file, the heap overflow corrupts adjacent memory. With enough control over the overflow contents, a sophisticated attacker could:
- Overwrite heap allocator metadata → trigger a crash (Denial of Service).
- Overwrite a function pointer or vtable → redirect execution (Remote Code Execution).
- Overwrite adjacent heap objects → information disclosure or privilege escalation.
Even in the "best case," this is a reliable application crash — a critical availability issue for any service that depends on GeoIP resolution.
The Fix
The patch introduces explicit bounds validation at all three memcpy call sites before any memory copy is performed. The core principle is simple: never trust size values derived from external data without verifying them against known-good limits.
Fix for Location 1 — Validate Against Actual Buffer Size
// AFTER (safe)
if (db->m_size > actual_source_size || db->m_size > dest_buffer_capacity) {
    qWarning("GeoIP: invalid database size field, aborting load");
    return false;
}
memcpy(dest_buffer, source_data, db->m_size);
The fix compares db->m_size against both the actual size of the source data and the capacity of the destination buffer. If either check fails, the parse is aborted and an error is returned — no memory is copied.
Fix for Location 2 — Validate m_recordBytes Range
// AFTER (safe)
if (m_recordBytes < 1 || m_recordBytes > 4) {
    qWarning("GeoIP: invalid recordBytes value: %d", m_recordBytes);
    return false;
}
uint8_t *ptr = buffer + (4 - m_recordBytes);
memcpy(ptr, record_data, m_recordBytes);
A simple range check ensures m_recordBytes is within the documented valid range [1, 4] before it is used in pointer arithmetic. This eliminates both the underflow and overflow cases.
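To show the range check in a complete function, here is a minimal sketch. It assumes the record is a big-endian integer that is right-aligned into a zeroed 4-byte buffer, which is what the pointer arithmetic suggests. The name read_record and the available parameter are illustrative and not taken from the patched file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: decode a big-endian record of 1..4 bytes.
// Returns false and leaves `out` untouched if the width is out of range
// or fewer than `record_bytes` bytes are available in `data`.
bool read_record(const std::uint8_t *data, std::size_t available,
                 std::size_t record_bytes, std::uint32_t &out) {
    if (record_bytes < 1 || record_bytes > 4 || record_bytes > available) {
        return false;
    }
    std::uint8_t buffer[4] = {0, 0, 0, 0};
    // Offset is now guaranteed to be in [0, 3]; the copy stays inside buffer.
    std::memcpy(buffer + (4 - record_bytes), data, record_bytes);
    out = (std::uint32_t(buffer[0]) << 24) | (std::uint32_t(buffer[1]) << 16) |
          (std::uint32_t(buffer[2]) << 8) | std::uint32_t(buffer[3]);
    return true;
}
```

Note that the sketch also checks the source side (available), since a valid width does not help if the file ends before the record does.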
Fix for Location 3 — Validate len Against Destination Capacity
// AFTER (safe)
if (len > dst_capacity) {
    qWarning("GeoIP: copy length %zu exceeds destination buffer %zu", len, dst_capacity);
    return false;
}
memcpy(dst, src, len);
The len parameter is checked against the known size of dst before the copy proceeds. The destination buffer size is now threaded through the call chain as an explicit parameter rather than being assumed.
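Threading the capacity through the call chain might look roughly like the sketch below. Both function names and the 16-byte field size are assumptions made for illustration. The point is only that the caller, which owns the buffer, passes its true size down instead of the callee guessing it.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical leaf routine: the destination capacity is an explicit argument.
bool copy_checked(std::uint8_t *dst, std::size_t dst_capacity,
                  const std::uint8_t *src, std::size_t len) {
    if (dst == nullptr || src == nullptr || len > dst_capacity) {
        return false;  // refuse the copy rather than truncate silently
    }
    std::memcpy(dst, src, len);
    return true;
}

// Hypothetical caller: it owns `out`, so sizeof(out) is the real capacity.
// `field_len` stands for a length that was read from the file.
bool load_field(const std::uint8_t *file_bytes, std::size_t field_len,
                std::uint8_t (&out)[16]) {
    return copy_checked(out, sizeof(out), file_bytes, field_len);
}
```

Taking the array by reference keeps its size in the type, so the capacity cannot drift out of sync with the buffer at the call site.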
Why This Fix Works
The fundamental change is a shift from implicit trust to explicit validation. Before the fix, the parser assumed that values read from the database file were well-formed. After the fix, every externally-derived size value is treated as untrusted input and validated against independently known bounds before being used in a memory operation.
This is the core principle of defensive programming: assume all external input is malicious until proven otherwise.
Prevention & Best Practices
1. Treat File-Derived Values as Untrusted Input
Any value read from a file, network socket, or other external source must be validated before use. This is especially true for length fields, offsets, and counts used in memory operations.
// Always validate before use
size_t field_length = read_uint32_from_file(fp);
if (field_length > MAX_ALLOWED_SIZE) {
    return ERROR_INVALID_INPUT;
}
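Offsets and lengths often arrive together, and the obvious check offset + len <= total can itself wrap when both values come from the file. A small sketch of an overflow-safe version follows; the function name is illustrative.

```cpp
#include <cstddef>

// True only if the byte range [offset, offset + len) lies inside a buffer
// of `total` bytes. The sum offset + len is never computed, so it cannot
// wrap around; the subtraction is safe because offset <= total is checked first.
bool range_in_bounds(std::size_t offset, std::size_t len, std::size_t total) {
    return offset <= total && len <= total - offset;
}
```

With total = 10, the range (8, 3) is rejected, and so is (8, SIZE_MAX), which a naive addition would wrap to a small value and accept.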
2. Use Safer Memory Abstractions in C++
Modern C++ provides safer alternatives to raw memcpy with raw pointers:
// std::vector owns its storage and knows its size; size it before copying.
// Note: std::copy_n itself is unchecked, so validated_length must not exceed
// source.size() or dest.size().
std::vector<uint8_t> dest(required_size);
std::copy_n(source.begin(), validated_length, dest.begin());
// Or use std::span (C++20) to carry size information
void process(std::span<const uint8_t> source, std::span<uint8_t> dest) {
    if (source.size() > dest.size()) throw std::runtime_error("buffer too small");
    std::copy(source.begin(), source.end(), dest.begin());
}
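Where C++20 is not available, the idea of carrying size information alongside the data can be approximated with a small cursor type. The class below is a hypothetical sketch, not code from the patched parser. Every read is checked against the end of the buffer, and a failed read leaves the position unchanged.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bounds-checked cursor over a byte buffer.
// The caller must keep `bytes` alive for as long as the reader is used.
class ByteReader {
public:
    explicit ByteReader(const std::vector<std::uint8_t> &bytes)
        : m_bytes(bytes), m_pos(0) {}

    // Number of unread bytes; m_pos never exceeds m_bytes.size().
    std::size_t remaining() const { return m_bytes.size() - m_pos; }

    // Reads a big-endian 32-bit value; fails if fewer than 4 bytes remain.
    bool read_u32_be(std::uint32_t &out) {
        if (remaining() < 4) return false;
        out = (std::uint32_t(m_bytes[m_pos]) << 24) |
              (std::uint32_t(m_bytes[m_pos + 1]) << 16) |
              (std::uint32_t(m_bytes[m_pos + 2]) << 8) |
              std::uint32_t(m_bytes[m_pos + 3]);
        m_pos += 4;
        return true;
    }

    // Copies `len` bytes into `out`; fails if that many bytes are not available.
    bool read_bytes(std::size_t len, std::vector<std::uint8_t> &out) {
        if (len > remaining()) return false;
        const std::uint8_t *first = m_bytes.data() + m_pos;
        out.assign(first, first + len);
        m_pos += len;
        return true;
    }

private:
    const std::vector<std::uint8_t> &m_bytes;
    std::size_t m_pos;
};
```

A parser built on such a type reads a length field with read_u32_be and passes it straight to read_bytes. An inflated length then fails the read instead of reaching memcpy.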
3. Validate Integer Arithmetic Before Use in Pointer Operations
Expressions like 4 - m_recordBytes are dangerous when operands are unsigned — subtraction can wrap around to enormous values. Always validate ranges before arithmetic:
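// Check range BEFORE arithmetic. Use a real check, not assert():
// assert is compiled out when NDEBUG is defined (typical release builds).
if (m_recordBytes < 1 || m_recordBytes > 4) {
    return false;
}
size_t offset = 4 - m_recordBytes; // safe now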
4. Enable Compiler and Runtime Protections
Modern compilers and runtimes offer multiple layers of protection:
| Protection | How to Enable | What It Catches |
|---|---|---|
| AddressSanitizer | -fsanitize=address | Heap/stack overflows at runtime |
| UndefinedBehaviorSanitizer | -fsanitize=undefined | Signed integer overflow, some out-of-bounds accesses |
| Stack Canaries | -fstack-protector-all | Stack buffer overflows |
| FORTIFY_SOURCE | -D_FORTIFY_SOURCE=2 (with optimization enabled) | Some unsafe libc calls |
| Control Flow Integrity | -fsanitize=cfi (Clang, requires LTO) | Function pointer hijacking |
# Development/CI build with sanitizers
cmake -DCMAKE_CXX_FLAGS="-fsanitize=address,undefined -g" ..
5. Fuzz Your File Parsers
File format parsers are prime targets for fuzzing. Tools like libFuzzer and AFL++ are specifically designed to find exactly this class of vulnerability:
# Example: fuzz the GeoIP parser with libFuzzer
clang++ -fsanitize=fuzzer,address -o fuzz_geoip fuzz_geoip.cpp geoipdatabase.cpp
./fuzz_geoip corpus/
Coverage-guided fuzzing with AddressSanitizer enabled is well suited to this class of bug and would very likely have surfaced these issues, since malformed .mmdb inputs with extreme field values are exactly what a fuzzer generates.
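The fuzz_geoip.cpp file referenced in the command above is not shown in this post. A minimal sketch of what such a harness could contain follows. LLVMFuzzerTestOneInput is the entry point libFuzzer looks for; parse_geoip_stub is a placeholder, since the real parser's entry point and signature are not given here.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for the real parser entry point (an assumption for illustration).
// A real harness would call into geoipdatabase.cpp here instead.
static bool parse_geoip_stub(const std::uint8_t *data, std::size_t size) {
    // Placeholder logic: accept only non-null inputs of at least 4 bytes.
    return data != nullptr && size >= 4;
}

// libFuzzer calls this function repeatedly with mutated inputs.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t *data, std::size_t size) {
    // The parse result is deliberately ignored: rejecting malformed input is
    // fine. Crashes and sanitizer reports are what the fuzzer is looking for.
    (void)parse_geoip_stub(data, size);
    return 0;  // libFuzzer expects 0 for an ordinary input
}
```

Seeding corpus/ with a few small, valid .mmdb files usually helps the fuzzer get past the header checks and into the record-parsing code sooner.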
6. Apply the Principle of Least Privilege
If the GeoIP database is loaded from a fixed, trusted location, consider:
- Verifying a cryptographic signature on the database file before parsing.
- Running the parser in a sandboxed process with limited memory and no network access.
- Using OS-level isolation (seccomp, AppArmor) to limit the blast radius of a successful exploit.
7. Reference Security Standards
This vulnerability maps directly to established security standards:
- CWE-120: Buffer Copy without Checking Size of Input ("Classic Buffer Overflow")
- CWE-129: Improper Validation of Array Index
- CWE-190: Integer Overflow or Wraparound
- OWASP: A03:2021 – Injection (a loose mapping at best; the OWASP Top 10 is web-focused and has no dedicated memory-corruption category)
- SEI CERT C Coding Standard: ARR38-C — Guarantee that library functions do not form invalid pointers
Tools to Detect This Class of Vulnerability
| Tool | Type | Best For |
|---|---|---|
| AddressSanitizer | Dynamic | Runtime detection of overflows |
| Valgrind/Memcheck | Dynamic | Memory error detection |
| AFL++ | Fuzzing | Finding parser edge cases |
| libFuzzer | Fuzzing | Coverage-guided fuzzing |
| CodeQL | Static | Taint-flow analysis for buffer overflows |
| Coverity | Static | Enterprise-grade static analysis |
| Semgrep | Static | Custom rules for unsafe patterns |
Conclusion
This vulnerability is a textbook example of why external data must never be trusted as a size parameter in memory operations. Three separate memcpy calls in the GeoIP parser each made the same fundamental mistake: using a file-derived value to control how many bytes to copy without verifying that value against known bounds.
The fix is straightforward — add bounds checks before each memory operation — but the lesson is broader:
Every byte of data that crosses a trust boundary is a potential weapon. Validate it before you use it.
Key takeaways for your own code:
- ✅ Validate all externally-derived length and offset values before using them in memory operations.
- ✅ Use safer C++ abstractions (std::span, std::vector, range-checked iterators) where possible.
- ✅ Enable sanitizers in CI to catch memory errors automatically during testing.
- ✅ Fuzz your file parsers — they are high-value targets and fuzzing finds these bugs reliably.
- ✅ Treat integer arithmetic on untrusted values as dangerous — validate ranges before computing offsets.
Memory safety is hard, but it's not mysterious. With disciplined input validation, modern tooling, and a healthy distrust of external data, the vast majority of buffer overflow vulnerabilities are entirely preventable.
This vulnerability was identified and patched by the OrbisAI Security automated security scanning system. Automated tooling combined with human review remains the most effective approach to finding and fixing memory safety issues at scale.