
Stack Buffer Overflow in C++ gRPC Server: How sprintf() Enables Code Execution

A critical stack buffer overflow vulnerability was discovered in the ik-llama-cpp gRPC inference server, where three unguarded sprintf() calls wrote formatted statistics into fixed-size stack buffers without any length restrictions. If exploited, an attacker could overwrite return addresses and saved frame pointers, potentially achieving arbitrary code execution on the server. The fix replaces the unsafe sprintf() calls with length-aware alternatives, closing the door on this dangerous memory co

By orbisai0security
April 23, 2026
#buffer-overflow#cpp#memory-safety#grpc#cwe-121#secure-coding#critical-vulnerability


Introduction

There is a particular class of bug that has haunted C and C++ codebases for decades: one that was described in security literature in the 1970s, that the Morris worm exploited in 1988, and that still appears in production code today. The stack buffer overflow is not glamorous, but it is devastatingly effective. When it lands in a network-facing server component, the consequences can be severe: complete system compromise, data exfiltration, or a persistent foothold for an attacker.

This post examines a critical stack buffer overflow vulnerability discovered in grpc-server.cpp, the gRPC inference server component of ik-llama-cpp — and how a deceptively simple function, sprintf(), was at the center of it all.

If you write C or C++, or if you maintain infrastructure that runs native code, this is a vulnerability pattern you need to understand deeply.


Background: What Is ik-llama-cpp?

ik-llama-cpp is a high-performance C++ inference backend for large language models, built on top of the llama.cpp ecosystem. Its gRPC server component (grpc-server.cpp) handles incoming inference requests, manages token generation, and reports back statistics — including timing measurements and token counts — to clients.

These statistics are formatted and written as strings during the normal course of operation. That formatting step is exactly where the vulnerability lived.


The Vulnerability Explained

What Went Wrong

At lines 329, 343, and 355 of backend/cpp/ik-llama-cpp/grpc-server.cpp, the code used sprintf() to write formatted timing and token-count statistics into fixed-size stack-allocated buffers. Here is the core problem in plain terms:

  • A buffer of a fixed size (say, 256 bytes) is allocated on the stack.
  • sprintf() is called to write a formatted string into that buffer.
  • sprintf() does not know how large the buffer is — it has no way to enforce a size limit.
  • If the formatted output exceeds the buffer size, sprintf() happily keeps writing past the end of the buffer, into adjacent stack memory.
// Simplified illustration of the vulnerable pattern
char stats_buffer[256];  // Fixed-size stack buffer

// sprintf() has NO idea how big stats_buffer is
sprintf(stats_buffer, "Tokens: %lld, Time: %f ms, Rate: %f tok/s",
        token_count,        // Could be a very large number
        elapsed_time_ms,    // Floating point, unpredictable string length
        tokens_per_second); // Same problem

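In this server, the values being formatted included:
- Token counts: integers whose printed width is bounded (a 64-bit value formatted with %lld, as in the illustration above, needs at most 20 characters including the sign)
- Timing measurements: floating-point values whose string representations vary enormously in length (with %f, from 8 characters for 1.000000 to more than 300 for values near the top of the double range)

The integer fields alone cannot overflow a buffer of this size, but nothing in the code guaranteed that the floating-point fields would stay small, so the formatted output was not guaranteed to fit within the allocated buffer.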
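To see how far the formatted length can swing, one option is to ask snprintf() to measure the output without writing it: called with a null buffer and a size of 0, it returns the number of characters the full string would need. The sketch below does this for the simplified format string from the illustration above, which is not necessarily the format used in grpc-server.cpp. The names stats_length and worst_case_stats_length are hypothetical.

```cpp
#include <cassert>
#include <cfloat>
#include <climits>
#include <cstdio>

// Hypothetical helper: returns how many characters (terminator excluded) the
// illustrative stats format would need for the given values. Passing a null
// buffer with size 0 makes snprintf() measure without writing anything.
int stats_length(long long token_count, double elapsed_time_ms,
                 double tokens_per_second) {
    int needed = std::snprintf(nullptr, 0,
                               "Tokens: %lld, Time: %f ms, Rate: %f tok/s",
                               token_count, elapsed_time_ms, tokens_per_second);
    assert(needed >= 0);  // a negative return would signal a formatting error
    return needed;
}

// Hypothetical helper: the length produced by the largest finite inputs.
int worst_case_stats_length() {
    return stats_length(LLONG_MAX, DBL_MAX, DBL_MAX);
}
```

With everyday values such as stats_length(512, 1234.5, 414.7) the result fits comfortably in 256 bytes. worst_case_stats_length() needs several hundred characters, because %f prints every digit of the integer part of a double.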

Why the Stack Makes This Worse

To understand why this is critical rather than merely "bad," you need to understand what lives on the stack near a local buffer.

When a function is called, the CPU pushes a return address onto the stack — the memory address the program should jump back to when the function finishes. It also saves frame pointers that help the program navigate its own call stack. These are stored in memory that is physically adjacent to local variables like our stats_buffer.

Stack layout (simplified; the stack grows toward lower addresses):
┌─────────────────────────────────┐   Higher addresses
│  Caller's stack frame           │
├─────────────────────────────────┤
│  Saved return address  ◄────────┼── Overwriting this = code execution
├─────────────────────────────────┤
│  Saved frame pointer            │
├─────────────────────────────────┤
│  stats_buffer[256]   ◄──────────┼── sprintf() starts writing here
│  (our vulnerable buffer)        │
│  ...                            │
└─────────────────────────────────┘   Lower addresses (writes into the buffer move upward)

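When sprintf() overflows stats_buffer, it overwrites those saved addresses. An attacker who can influence the values being formatted, by crafting requests that produce specific token counts or by influencing timing values, can potentially steer what lands in the return address. The overwritten bytes are limited to characters the format string can emit (digits, punctuation, and its literal text), which makes a precise overwrite harder but does not rule it out. If the return address ends up holding an attacker-chosen value, then when the function returns, the CPU jumps to an attacker-chosen location instead of the legitimate caller.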

How Could This Be Exploited?

The exploit scenario requires an attacker to influence the numeric values that get formatted into the statistics buffer. In a gRPC inference server, there are plausible paths to this:

  1. Crafted inference requests: An attacker sends specially crafted requests designed to produce specific token counts or trigger specific timing measurements.
  2. Overflow the buffer: The formatted statistics string exceeds 256 bytes (or whatever the buffer size is), overflowing into the return address.
  3. Control execution: With the return address overwritten to point at attacker-controlled data (or existing code gadgets via ROP — Return Oriented Programming), arbitrary code executes in the context of the server process.

Real-world impact if exploited:
- Full compromise of the server running the inference backend
- Access to model weights, configuration, and any data the process can reach
- Lateral movement within the network if the server has internal access
- Persistent backdoor installation
- Denial of service (at minimum, crashing the server process)

This is why the vulnerability is rated critical. It's not a theoretical concern — stack buffer overflows via sprintf() have been exploited in real attacks for over 30 years.


The Fix

What Changed

The fix replaces the unsafe sprintf() calls with length-aware alternatives that accept a maximum output length parameter. The most common replacement in C/C++ is snprintf():

// BEFORE (vulnerable) — no length limit
char stats_buffer[256];
sprintf(stats_buffer, "Tokens: %lld, Time: %f ms, Rate: %f tok/s",
        token_count, elapsed_time_ms, tokens_per_second);

// AFTER (safe) — output truncated to buffer size
char stats_buffer[256];
snprintf(stats_buffer, sizeof(stats_buffer),
         "Tokens: %lld, Time: %f ms, Rate: %f tok/s",
         token_count, elapsed_time_ms, tokens_per_second);

The critical difference is sizeof(stats_buffer) — the second argument to snprintf(). This tells the function the maximum number of bytes it is allowed to write, including the null terminator. If the formatted output would exceed this limit, snprintf() truncates it rather than overflowing.

Applied to All Three Call Sites

The fix was applied to all three vulnerable locations — lines 329, 343, and 355. This is important. A partial fix that addresses only one or two of the three calls would leave the vulnerability exploitable through the remaining unpatched paths.

// Line 329 — fixed
snprintf(stats_buffer, sizeof(stats_buffer), /* format string */, /* args */);

// Line 343 — fixed
snprintf(stats_buffer, sizeof(stats_buffer), /* format string */, /* args */);

// Line 355 — fixed
snprintf(stats_buffer, sizeof(stats_buffer), /* format string */, /* args */);

Why This Fix Works

snprintf() enforces a hard ceiling on output length. No matter how large token_count is, no matter how many digits elapsed_time_ms expands to, the output will never exceed sizeof(stats_buffer) - 1 characters (the -1 accounts for the null terminator, which snprintf() appends whenever the size argument is nonzero). Output that does not fit is truncated rather than reported as an error.

The stack memory adjacent to the buffer — including those precious return addresses — is never touched. The overflow path is eliminated entirely.
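Because truncation is silent, it can help to check the value snprintf() returns. A minimal sketch, assuming a hypothetical helper named format_stats and the simplified format string used in this post: the array-reference parameter lets the compiler supply the buffer size, and the function reports whether the whole string fit.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats the stats line into a caller-owned array.
// N is deduced from the array type, so the size cannot drift out of sync.
// Returns true only when the complete string (plus terminator) fit.
template <std::size_t N>
bool format_stats(char (&out)[N], long long token_count,
                  double elapsed_time_ms, double tokens_per_second) {
    int written = std::snprintf(out, N,
                                "Tokens: %lld, Time: %f ms, Rate: %f tok/s",
                                token_count, elapsed_time_ms, tokens_per_second);
    // snprintf() returns the length the full output would have had;
    // a value >= N means the result was truncated to N - 1 characters.
    return written >= 0 && static_cast<std::size_t>(written) < N;
}
```

A caller can then decide what to do with a false result, for example logging that the statistics line was cut short instead of sending a partial string to clients.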

Modern C++ Alternatives

While snprintf() is the standard C fix and perfectly appropriate here, modern C++ offers additional options worth knowing:

// Using std::string and std::format (C++20)
#include <format>
std::string stats = std::format("Tokens: {}, Time: {} ms, Rate: {} tok/s",
                                 token_count, elapsed_time_ms, tokens_per_second);

// Using std::ostringstream (C++11 and later)
#include <sstream>
std::ostringstream oss;
oss << "Tokens: " << token_count
    << ", Time: " << elapsed_time_ms << " ms"
    << ", Rate: " << tokens_per_second << " tok/s";
std::string stats = oss.str();

These approaches eliminate the fixed-size buffer entirely, delegating memory management to the standard library and removing the overflow risk at a fundamental level.
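For codebases that cannot yet use C++20, a common middle ground is to let snprintf() measure the output first and then format into a std::string of exactly that size. The following is a sketch of that pattern, not the code from the fix. The names write_stats and format_stats_string are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: one place that owns the format string, so the
// measuring call and the writing call cannot disagree.
static int write_stats(char* buf, std::size_t size, long long token_count,
                       double elapsed_time_ms, double tokens_per_second) {
    return std::snprintf(buf, size,
                         "Tokens: %lld, Time: %f ms, Rate: %f tok/s",
                         token_count, elapsed_time_ms, tokens_per_second);
}

// Builds the stats line in a std::string sized to fit, with no fixed buffer.
std::string format_stats_string(long long token_count, double elapsed_time_ms,
                                double tokens_per_second) {
    // First pass: a null buffer and size 0 only measure the required length.
    int needed = write_stats(nullptr, 0, token_count, elapsed_time_ms,
                             tokens_per_second);
    if (needed < 0) {
        return std::string();  // formatting error; return an empty string
    }
    // Second pass: allocate room for the text plus the terminator, then write.
    std::string out(static_cast<std::size_t>(needed) + 1, '\0');
    write_stats(&out[0], out.size(), token_count, elapsed_time_ms,
                tokens_per_second);
    out.resize(static_cast<std::size_t>(needed));  // drop the terminator
    return out;
}
```

The cost is two formatting passes and a heap allocation, which is usually acceptable for a statistics line produced once per request.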


Prevention & Best Practices

1. Never Use sprintf() in New Code

There is no safe use of sprintf() when the output length is not provably bounded at compile time. Treat it as deprecated. Configure your compiler or linter to warn on its use:

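# GCC: warn when a sprintf() call may overflow its destination buffer
-Wformat-overflow=2

# Where the C library headers mark sprintf() as deprecated (recent Apple SDKs do, glibc does not),
# this turns every use into an error
-Werror=deprecated-declarations

# Or use -D_FORTIFY_SOURCE=2 for runtime buffer overflow detection (requires optimization)
-D_FORTIFY_SOURCE=2 -O2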

2. Use Length-Bounded Functions Consistently

Unsafe Function    Safe Replacement
sprintf()          snprintf()
strcpy()           strlcpy() where available, or strncpy() with explicit null termination
strcat()           strlcat() where available, or strncat()
gets()             fgets()
scanf("%s")        scanf("%255s") with explicit width
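strlcpy() is not available on every platform, and strncpy() leaves the destination unterminated when the source is too long. One portable alternative is to copy through snprintf() with a "%s" format. The sketch below assumes a hypothetical helper name, copy_bounded:

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: copies src into dst and always null-terminates.
// Returns true if the whole string fit, false if it was truncated.
template <std::size_t N>
bool copy_bounded(char (&dst)[N], const char* src) {
    int needed = std::snprintf(dst, N, "%s", src);
    return needed >= 0 && static_cast<std::size_t>(needed) < N;
}
```

Given char name[8], copy_bounded(name, "short") returns true, while copy_bounded(name, "much too long") returns false and leaves a terminated seven-character prefix in name.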

3. Prefer sizeof() Over Magic Numbers

When using snprintf(), always use sizeof(buffer) rather than a hardcoded number. If the buffer size changes in a future refactor, sizeof() automatically reflects the new size:

// Fragile — will be wrong if buffer size changes
snprintf(buf, 256, ...);

// Robust — always correct
snprintf(buf, sizeof(buf), ...);
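One caveat: sizeof only reports the buffer size while the variable is still an array. Once the buffer has been passed to a function as a char*, sizeof yields the size of the pointer. A sketch of the pitfall and one way around it, using hypothetical helper names:

```cpp
#include <cstddef>
#include <cstdio>

// Pitfall: inside this function buf is a pointer, so sizeof(buf) is the
// size of a pointer (commonly 8), not the size of the caller's array.
std::size_t size_seen_through_pointer(char* buf) {
    return sizeof(buf);
}

// Safer: take the array by reference so the compiler deduces its length.
template <std::size_t N>
int format_count(char (&buf)[N], const char* label, int value) {
    return std::snprintf(buf, N, "%s=%d", label, value);
}
```

When a function can only receive a pointer, passing the buffer length as an explicit parameter next to it keeps the bound available to snprintf().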

4. Enable Compiler and OS Mitigations

Modern toolchains and operating systems include mitigations that make buffer overflows harder to exploit, even when they occur:

  • Stack Canaries (-fstack-protector-strong): Place a random value between local variables and the return address. If it's overwritten, the program detects the corruption and terminates before the return.
  • ASLR (Address Space Layout Randomization): Randomizes the memory layout of the process, making it harder for attackers to predict where to redirect execution.
  • NX/DEP (No-Execute / Data Execution Prevention): Marks the stack as non-executable, preventing attackers from injecting and running shellcode directly.
  • SafeStack (Clang): Separates sensitive stack data (return addresses) from regular local variables.
# CMakeLists.txt — enable stack protection
target_compile_options(your_target PRIVATE
    -fstack-protector-strong
    -D_FORTIFY_SOURCE=2
)
target_link_options(your_target PRIVATE
    -Wl,-z,relro
    -Wl,-z,now
)

These mitigations are important, but they are not a substitute for fixing the vulnerability. A determined attacker with enough control can often bypass individual mitigations. The correct approach is defense in depth: fix the bug and enable the mitigations.

5. Static Analysis

Several tools can catch sprintf() misuse automatically:

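  • Clang-Tidy: The clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling check flags sprintf() and related calls, and cppcoreguidelines-pro-type-vararg flags any call to a C-style variadic function (snprintf() included).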
  • Coverity: Industry-standard static analyzer with strong buffer overflow detection.
  • CodeQL: GitHub's semantic code analysis engine can query for unsafe sprintf patterns.
  • AddressSanitizer (ASan): Runtime instrumentation that detects buffer overflows during testing:
# Compile with AddressSanitizer for testing
clang++ -fsanitize=address -g -o server grpc-server.cpp

6. Code Review Checklists

Add explicit buffer overflow checks to your C/C++ code review process:

  • [ ] Are all sprintf() calls replaced with snprintf()?
  • [ ] Does every snprintf() call use sizeof(buffer) as the size argument?
  • [ ] Are all fixed-size stack buffers that receive external or computed data reviewed for overflow?
  • [ ] Is the buffer size documented and justified?

Relevant Security Standards

  • CWE-121: Stack-based Buffer Overflow — the canonical classification for this vulnerability type
  • CWE-676: Use of Potentially Dangerous Function — covers sprintf() and similar unsafe APIs
  • OWASP: Buffer Overflow is documented as a vulnerability class in the OWASP community vulnerability catalog
  • SEI CERT C Coding Standard: Rule STR31-C — "Guarantee that storage for strings has sufficient space for character data and the null terminator"
  • MISRA C 2012: Rule 21.6 prohibits use of <stdio.h> input/output functions in safety-critical code

Conclusion

A small change, replacing sprintf with snprintf and adding a size argument at each of three call sites, closed a critical code execution vulnerability in a production inference server. That is the nature of memory safety bugs in C and C++: the gap between vulnerable and secure code is often small, but the consequences of leaving it open are enormous.

The key takeaways from this vulnerability:

  1. sprintf() is inherently unsafe when output length is not provably bounded. There is no good reason to use it in new code.
  2. Stack buffer overflows are not theoretical. They have been exploited reliably for decades and remain one of the most impactful vulnerability classes in native code.
  3. All instances must be fixed. Patching two of three vulnerable call sites is not a fix — it's a partial mitigation that leaves the system exploitable.
  4. Defense in depth matters. Fix the bug, but also enable stack canaries, ASLR, and NX. Use static analysis in your CI pipeline. Run AddressSanitizer in your test suite.
  5. Modern C++ reduces risk. std::string, std::format, and std::ostringstream eliminate fixed-size buffer concerns entirely. When writing new C++ code, prefer these over C-style character arrays.

Security vulnerabilities in inference servers are particularly sensitive because these systems often run with significant compute resources, handle proprietary model weights, and may have access to sensitive data from user queries. A compromised inference backend is not just a server problem — it's a data problem, a trust problem, and potentially a supply chain problem.

Write safe code. Review for memory safety. Automate detection. And when you find a sprintf() in a network-facing server, treat it with the urgency it deserves.


This vulnerability was identified and fixed as part of an automated security scanning process. The fix was verified by both automated re-scanning and manual code review.

Vulnerability ID: V-001 | Severity: Critical | CWE: CWE-121, CWE-676

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #9486
