What is a heap-based buffer overflow?

A heap-based buffer overflow occurs when a program writes more data into a dynamically allocated heap buffer than it was sized to hold, corrupting adjacent heap memory and potentially enabling code execution or crashes.

How do you prevent buffer overflow in C with sprintf()?

Replace fixed-size malloc()+sprintf() patterns with asprintf(), which allocates exactly the memory needed for the formatted result, or use snprintf() with explicit size limits if dynamic allocation isn't suitable.

What CWE is heap-based buffer overflow?

Heap-based buffer overflow is classified as CWE-122 (Heap-based Buffer Overflow), a subset of CWE-119 (Improper Restriction of Operations within the Bounds of a Memory Buffer).

Is snprintf() enough to prevent buffer overflow in C?

snprintf() prevents the overflow itself by truncating output, but truncation can introduce logic errors or incomplete data. asprintf() is often safer because it allocates exactly what is needed, avoiding both overflow and silent data loss.

Can static analysis detect buffer overflow from sprintf()?

Yes. Static analysis tools like Semgrep, Coverity, and clang-analyzer can detect unsafe sprintf() calls into fixed-size buffers and flag malloc()+sprintf() patterns as potential overflow sites.

Buffer Overflow in hoeldb.c: How sprintf() Threatened a Racing Sim's Database Layer

Introduction

The src/simmonitor/db/hoeldb.c file is the database access layer for a simulation monitor — it handles querying stints, laps, and sessions from a racing telemetry backend. It's infrastructure code: unglamorous, rarely touched, and exactly the kind of file where dangerous patterns quietly accumulate. A recent security scan flagged two functions in this file — getstints() and getlaps() — where fixed-size heap buffers were being written to with sprintf(), a combination that's been causing security incidents in C codebases for decades.

This post walks through exactly what was wrong, why it's exploitable, and what the fix looks like in concrete terms.

The Vulnerability Explained

Fixed Buffers + sprintf() = Undefined Territory

At line 121 in hoeldb.c, inside the getstints() function, the original code did this:

// VULNERABLE: Fixed 150-byte heap allocation
char* query = malloc(150 * sizeof(char));

// VULNERABLE: No bounds checking whatsoever
sprintf(query, "select stint_id, driver_id, team_member_id, session_id, car_id, game_car_id, laps, valid_laps, best_lap_id FROM %s WHERE session_id=%i", "Stints", use_id);

And at line 149, in getlaps(), the same pattern appeared with a slightly larger buffer:

// VULNERABLE: Fixed 250-byte heap allocation
char* query = malloc(250 * sizeof(char));

sprintf(query, "select lap_id, stint_id, sector_1, sector_2, sector_3, grip, tyre, time, ...", "Laps", use_id);

Let's count the bytes in that first query string. The static portion alone — before %s and %i are substituted — is already approximately 120 characters. With "Stints" substituted in, you're at ~126 bytes. Then use_id is formatted as a signed integer (%i). A normal session ID like 42 keeps you well within 150 bytes. But INT_MAX (2147483647) is 10 digits. That pushes the formatted string to ~136 bytes — still technically safe, but with only ~14 bytes of margin in a 150-byte buffer. More critically, INT_MIN (-2147483648) is 11 digits plus a minus sign, and any future developer adding even a small amount of additional query text could silently tip this over the edge.

The second buffer at 250 bytes has more headroom, but the pattern is identically unsafe — there is no guarantee the output will fit, and sprintf() will happily write past the end of the allocation if it doesn't.

What Happens When It Overflows?

When sprintf() writes beyond the 150-byte malloc'd buffer, it corrupts adjacent heap memory. In practice, this means:

Heap metadata corruption: The allocator's bookkeeping structures live adjacent to heap chunks. Overwriting them can cause crashes, double-frees, or exploitable heap layouts.
Adjacent object corruption: Whatever was malloc'd before or after query on the heap can be silently overwritten. In this codebase, that might be a StintDbo or LapDbo struct containing race telemetry data — or worse, a pointer.
Controlled write primitive: In a sufficiently adversarial scenario where use_id can be influenced by external input (e.g., from a network-facing session management layer), an attacker who can control the integer value and predict heap layout could turn this into an exploitable write primitive.

The Inconsistency That Made It Worse

What makes this particularly notable is that other functions in the same file already use asprintf() correctly for dynamic allocation. The vulnerability wasn't a matter of the developer not knowing about safer alternatives — asprintf() was already the established pattern in hoeldb.c. These two functions were simply inconsistent with the rest of the file.

The Companion Bug: malloc() Instead of calloc()

The PR also caught a secondary issue in getsessions() and getstints(). The row data arrays were allocated like this:

// VULNERABLE: Uninitialized memory
sess->rows = malloc(sizeof(SessionRowData) * result.nb_rows);
stint->rows = malloc(sizeof(StintRowData) * result.nb_rows);

Using malloc() for struct arrays means the memory contains whatever garbage was there before. If get_row_results() doesn't initialize every field (or if error handling exits early), the application could read and process uninitialized struct fields as valid telemetry data. This is a separate class of memory safety issue — not an overflow, but a use-of-uninitialized-memory bug.

The Fix

The fix makes three targeted changes, each addressing a specific memory safety concern.

Change 1: Replace malloc+sprintf with asprintf (Lines 121 and 149)

// BEFORE: Fixed buffer, no bounds checking
char* query = malloc(150 * sizeof(char));
sprintf(query, "select stint_id, driver_id, team_member_id, session_id, "
               "car_id, game_car_id, laps, valid_laps, best_lap_id "
               "FROM %s WHERE session_id=%i", "Stints", use_id);

// AFTER: Dynamic allocation, always exactly the right size
char* query;
asprintf(&query, "select stint_id, driver_id, team_member_id, session_id, "
                 "car_id, game_car_id, laps, valid_laps, best_lap_id "
                 "FROM %s WHERE session_id=%i", "Stints", use_id);

asprintf() is a GNU/BSD extension that formats a string and allocates exactly as much memory as needed — no guessing, no fixed buffers, no overflow possible. The buffer is always precisely strlen(result) + 1 bytes. This is exactly how the rest of hoeldb.c already builds query strings, so this change makes the file internally consistent.

The same transformation was applied to the getlaps() function at line 149, replacing the 250-byte fixed buffer with an asprintf() call.

Change 2: Replace malloc with calloc for Row Arrays (Lines 102 and 133)

// BEFORE: Uninitialized heap memory for struct arrays
sess->rows = malloc(sizeof(SessionRowData) * result.nb_rows);
stint->rows = malloc(sizeof(StintRowData) * result.nb_rows);

// AFTER: Zero-initialized memory, correct argument order
sess->rows = calloc(result.nb_rows, sizeof(SessionRowData));
stint->rows = calloc(result.nb_rows, sizeof(StintRowData));

calloc() zero-initializes the allocated memory, eliminating the uninitialized memory risk. Note also that calloc() takes (count, size) as arguments rather than (size * count) — this form also provides overflow protection on the multiplication itself, since calloc() implementations are required to detect and handle size overflow internally.

Prevention & Best Practices

Never Use sprintf() with Heap-Allocated Fixed Buffers in C

The fundamental rule: if you're using malloc() to allocate a buffer and then sprintf() to fill it, you need to either:
- Know the exact maximum size and verify it mathematically (error-prone, fragile under maintenance), or
- Use snprintf() with the buffer size as a parameter, or
- Use asprintf() / vasprintf() for fully dynamic allocation (preferred)

Use snprintf() When You Must Use Fixed Buffers

If asprintf() isn't available (e.g., on Windows without a compatibility layer), snprintf() is the safe alternative:

char query[150];
int written = snprintf(query, sizeof(query),
    "SELECT ... FROM %s WHERE session_id=%i", "Stints", use_id);
if (written < 0 || written >= (int)sizeof(query)) {
    // Handle truncation or error
    return ERROR_QUERY_TOO_LONG;
}

The regression test included in this PR demonstrates this pattern explicitly, testing 25 adversarial integer values including INT_MAX, INT_MIN, and common boundary values.

Use calloc() for Struct Arrays

When allocating arrays of structs that will be populated by external data (like database results), prefer calloc() over malloc(). Zero-initialization ensures that any fields not explicitly set by your population function won't contain garbage data that could be misinterpreted.

Establish and Enforce a File-Level Pattern

The fact that asprintf() was already used correctly elsewhere in hoeldb.c suggests a code review gap — these two functions were written (or copied) without following the established pattern. Code review checklists for C files should include: "Does every dynamic string construction use the same safe pattern established in this file?"

Static Analysis Tools That Would Catch This

Clang Static Analyzer: Detects potential buffer overflows in sprintf() calls
Coverity: Specifically flags BUFFER_SIZE_WARNING for fixed-size buffers with sprintf()
AddressSanitizer (ASan): Would catch the overflow at runtime during testing
Valgrind: Would detect uninitialized memory reads from the malloc() row arrays

Relevant Standards

CWE-121: Stack-based Buffer Overflow (the heap analog is CWE-122)
CWE-122: Heap-based Buffer Overflow — directly applicable here
CWE-676: Use of Potentially Dangerous Function (sprintf is explicitly listed)
CERT C Rule STR31-C: Guarantee that storage for strings has sufficient space for character data and the null terminator

Key Takeaways

getstints() and getlaps() in hoeldb.c both used malloc(N)+sprintf() while the rest of the file correctly used asprintf() — inconsistency within a single file is a red flag worth catching in code review.
The 150-byte buffer in getstints() provides only ~14 bytes of margin with INT_MAX as use_id — a seemingly safe buffer that becomes dangerous under edge-case integer values or minor future query modifications.
asprintf() eliminates the need to calculate or guess buffer sizes — it should be the default choice for dynamic SQL query construction in C codebases on Linux/BSD platforms.
calloc() is strictly safer than malloc() for struct arrays — zero initialization prevents uninitialized memory reads and makes the calloc(count, size) form protect against size multiplication overflow.
The regression test in this PR tests 25 adversarial integer values including INT_MAX, INT_MIN, and common bit patterns — this is the right level of thoroughness for a security-critical memory function.

Conclusion

This vulnerability in hoeldb.c is a textbook example of how memory safety issues accumulate in C codebases: not through dramatic architectural failures, but through small inconsistencies that compound over time. Two functions used a pattern (malloc + sprintf) that was already known to be unsafe and was already avoided elsewhere in the same file. The fix is clean and minimal — replacing four lines across two functions with asprintf() calls and switching malloc() to calloc() for row arrays.

The broader lesson for C developers is to treat sprintf() with a fixed-size buffer as a code smell that warrants immediate scrutiny. When you see malloc(N) followed by sprintf(), ask: "Can I prove this will never overflow under any valid input, including boundary values, and under any future maintenance?" If the answer is anything other than a confident yes with mathematical backing, reach for asprintf() or snprintf() instead.

Security in systems code is often about discipline and consistency — and this fix restores both.

cwe	CWE-122
fix	Replace malloc()+sprintf() with asprintf() for dynamic allocation; replace malloc() with calloc() for row buffers
risk	Heap memory corruption, potential arbitrary code execution or denial of service
language	C
root cause	Fixed-size buffers (150 and 250 bytes) allocated with malloc() written to with unchecked sprintf()
vulnerability	Heap-based Buffer Overflow via sprintf() into fixed-size malloc() buffer

Buffer Overflow in hoeldb.c: How sprintf() Threatened a Racing Sim's Database Layer

Answer Summary

Vulnerability at a Glance

Buffer Overflow in hoeldb.c: How sprintf() Threatened a Racing Sim's Database Layer

Introduction

The Vulnerability Explained

Fixed Buffers + sprintf() = Undefined Territory

What Happens When It Overflows?

The Inconsistency That Made It Worse

The Companion Bug: malloc() Instead of calloc()

The Fix

Change 1: Replace malloc+sprintf with asprintf (Lines 121 and 149)

Change 2: Replace malloc with calloc for Row Arrays (Lines 102 and 133)

Prevention & Best Practices

Never Use sprintf() with Heap-Allocated Fixed Buffers in C

Use snprintf() When You Must Use Fixed Buffers

Use calloc() for Struct Arrays

Establish and Enforce a File-Level Pattern

Static Analysis Tools That Would Catch This

Relevant Standards

Key Takeaways

Conclusion

Frequently Asked Questions

What is a heap-based buffer overflow?

How do you prevent buffer overflow in C with sprintf()?

What CWE is heap-based buffer overflow?

Is snprintf() enough to prevent buffer overflow in C?

Can static analysis detect buffer overflow from sprintf()?

View the Security Fix

Related Articles

How buffer overflow happens in C tar header parsing and how to fix it

How buffer overflow happens in C ieee80211_input() and how to fix it

How buffer overflow from unsafe string copy functions happens in C network interface code and how to fix it

How buffer overflow in FuzzIxml.c sprintf() happens in C and how to fix it

How buffer overflow happens in C HTML parsing and how to fix it

How buffer overflow in memcpy() happens in Node.js N-API bindings and how to fix it