Buffer Overflow in C: How Unsafe strcpy Almost Broke Everything
Severity: High | Type: Buffer Overflow | File: lib/source/algorithms/gimbal_md5.c | CWE: CWE-120 (Buffer Copy Without Checking Size of Input)
Introduction
Buffer overflows are one of the oldest classes of security vulnerabilities in software development — and yet, they remain stubbornly common in C codebases today. They've been responsible for some of the most devastating exploits in history, from the Morris Worm in 1988 to modern remote code execution vulnerabilities in widely-deployed software.
This post covers a real-world buffer overflow vulnerability discovered and patched in gimbal_md5.c, a C source file implementing MD5 hashing functionality. The root cause? A handful of unsafe string-handling functions that trusted the caller to provide well-behaved input — a trust that attackers are more than happy to betray.
Whether you write C professionally, maintain legacy codebases, or just want to understand why memory safety matters, this post will walk you through what went wrong, how it was fixed, and how to make sure it never happens again.
The Vulnerability Explained
What Is a Buffer Overflow?
A buffer overflow occurs when a program writes more data into a fixed-size block of memory (a "buffer") than it was designed to hold. The excess data spills into adjacent memory regions, potentially overwriting critical data structures, return addresses, or executable code.
In C, several standard library functions are notorious for enabling this class of bug because they perform string operations without any built-in length checking:
| Unsafe Function | What It Does | The Problem |
|---|---|---|
| strcpy(dest, src) | Copies src to dest | Copies until \0; no size limit |
| strcat(dest, src) | Appends src to dest | Same issue: no bounds |
| sprintf(buf, fmt, ...) | Formats into buf | No output size limit |
| gets(buf) | Reads a line into buf | So dangerous it was removed in C11 |
The vulnerability in gimbal_md5.c involved exactly this pattern: using strcpy (or similar unbounded functions) to copy data into a fixed-size destination buffer at line 301, without verifying that the source data would fit.
How Does It Work in Practice?
Consider a simplified version of the vulnerable pattern:
// VULNERABLE CODE — DO NOT USE
void process_hash_input(const char *user_input) {
    char buffer[256];            // Fixed-size stack buffer
    strcpy(buffer, user_input);  // ❌ No bounds check!
    // ... proceed to compute MD5 ...
}
If user_input is 300 characters long, strcpy will happily copy all 300 characters plus the terminating null byte, 301 bytes in total, into a 256-byte buffer. The extra 45 bytes don't just disappear; they overwrite whatever happens to live next to buffer on the stack. That might be:
- Local variables — corrupting program logic
- The saved frame pointer — destabilizing the call stack
- The return address — redirecting execution to attacker-controlled code
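The first case, corrupted neighbouring data, can be modelled without invoking undefined behaviour. The sketch below is illustrative only: frame_model, is_admin, and unchecked_copy are hypothetical names, and a struct stands in for the stack frame so that the stray write stays inside a single object. It assumes a mainstream ABI where no padding separates the two members.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: a small buffer followed by a flag. */
struct frame_model {
    char buffer[8];
    unsigned char is_admin; /* expected to sit directly after buffer (offset 8) */
};

/* Models an unbounded copy: writes n bytes starting at buffer's offset and
 * ignores buffer's own size. The asserts keep the write inside the struct,
 * so the demonstration itself stays well defined. */
static void unchecked_copy(struct frame_model *f, const char *src, size_t n) {
    assert(f != NULL && src != NULL);
    assert(n <= sizeof *f - offsetof(struct frame_model, buffer));
    memcpy((unsigned char *)f + offsetof(struct frame_model, buffer), src, n);
}
```

Zero-initialising a struct frame_model and calling unchecked_copy(&f, "AAAAAAAAA", 9) leaves f.is_admin non-zero: the ninth byte landed on the flag rather than in the 8-byte buffer.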
A Concrete Attack Scenario
Imagine this code is part of an application that accepts file paths or user-supplied strings to hash. An attacker crafts a malicious input:
# Attacker sends 256 filler bytes, 8 more to cover the saved frame pointer,
# then a crafted return address (illustrative layout; real offsets depend on
# the ABI and compiler)
payload = b"A" * 256             # Fill the buffer
payload += b"A" * 8              # Overwrite saved frame pointer
payload += b"\xef\xbe\xad\xde"   # Overwrite return address with attacker's target
When process_hash_input returns, instead of going back to the legitimate caller, the CPU jumps to the address the attacker injected. On modern systems, mitigations like ASLR (Address Space Layout Randomization), stack canaries, and NX bits make this harder to exploit — but not impossible, especially in embedded systems, older platforms, or when combined with information leak vulnerabilities.
Even without achieving code execution, an attacker can use this to:
- Crash the process (Denial of Service)
- Corrupt hash outputs, undermining data integrity checks
- Leak sensitive memory contents by manipulating adjacent data
The Fix
The patch replaces all unbounded string operations with their size-aware counterparts, enforcing the security invariant:
Buffer reads must never exceed the declared length.
Before (Vulnerable)
// ❌ BEFORE: Unbounded copy — trusts the caller completely
void gimbal_md5_update_string(MD5Context *ctx, const char *input) {
    char staging_buffer[256];
    strcpy(staging_buffer, input);         // No size check
    sprintf(staging_buffer, "%s", input);  // Also unbounded
    // ... process staging_buffer ...
}
After (Fixed)
// ✅ AFTER: Bounded copy — enforces maximum size
void gimbal_md5_update_string(MD5Context *ctx, const char *input) {
    char staging_buffer[256];
    strlcpy(staging_buffer, input, sizeof(staging_buffer));
    // OR equivalently:
    snprintf(staging_buffer, sizeof(staging_buffer), "%s", input);
    // ... process staging_buffer ...
}
Why This Works
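strlcpy(dst, src, size): Unlike strcpy, this function accepts a size parameter specifying the total size of the destination buffer. It copies at most size - 1 characters and, provided size is non-zero, always null-terminates the result. No matter how large src is, dst will never overflow. Note that strlcpy is a BSD extension rather than ISO C; glibc only added it in version 2.38, so older Linux toolchains need libbsd or a fallback such as snprintf.
snprintf(buf, size, fmt, ...): The n marks the size-limited variant of sprintf. It writes at most size bytes to buf, including the null terminator, and returns the length the full output would have had, so a return value of size or more signals truncation. This makes it safe for both formatting and simple string copying.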
The key insight is using sizeof(staging_buffer) directly as the size argument. This ties the bound to the actual allocation, so if the buffer size ever changes, the limit automatically adjusts — no magic numbers to forget to update.
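One caveat: both functions truncate silently, and for a hashing routine a truncated input yields the digest of different data. Below is a minimal sketch of a copy that reports truncation instead, assuming the caller would rather fail than hash a prefix. copy_or_fail is a hypothetical helper, not part of gimbal_md5.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst (capacity dst_size bytes).
 * Returns 0 on success, -1 if src did not fit or formatting failed.
 * dst is NUL-terminated in the success and truncation cases. */
static int copy_or_fail(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    /* snprintf returns the length it would have written without the limit. */
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size) {
        return -1; /* truncated (or output error): caller should reject */
    }
    return 0;
}
```

A caller would typically invoke it as copy_or_fail(staging_buffer, sizeof(staging_buffer), input) and return an error to its own caller on -1 rather than proceed with partial data.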
The Security Invariant
The fix enforces a clear, testable invariant:
For any input of any size S, and any buffer of declared size N:
bytes_written <= N
AND result is null-terminated
AND no memory outside [buffer, buffer+N) is accessed
This invariant is verified by the regression test suite, which throws payloads of up to 40,960 bytes at the function and confirms the output is always bounded to the declared size.
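In C, the same invariant can be checked directly by surrounding the destination with guard bytes. The following is a sketch, not the project's actual regression suite: bounded_copy is a hypothetical stand-in for the patched copy, and the guard value and sizes are arbitrary choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN 16   /* guard bytes on each side of the buffer */
#define BUF_SIZE 256   /* declared buffer size N */

/* Hypothetical stand-in for the patched copy under test. */
static void bounded_copy(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    snprintf(dst, dst_size, "%s", src);
}

/* Returns 1 if the invariant holds for an input of input_len bytes:
 * the result is NUL-terminated within BUF_SIZE bytes and no guard byte
 * on either side of the buffer was modified. */
static int invariant_holds(size_t input_len) {
    unsigned char arena[GUARD_LEN + BUF_SIZE + GUARD_LEN];
    char *input = malloc(input_len + 1);
    if (input == NULL) {
        return 0;
    }
    memset(input, 'A', input_len);
    input[input_len] = '\0';

    memset(arena, 0xA5, sizeof arena);
    bounded_copy((char *)arena + GUARD_LEN, BUF_SIZE, input);
    free(input);

    int ok = memchr(arena + GUARD_LEN, '\0', BUF_SIZE) != NULL;
    for (size_t i = 0; i < GUARD_LEN; i++) {
        ok = ok && arena[i] == 0xA5 && arena[GUARD_LEN + BUF_SIZE + i] == 0xA5;
    }
    return ok;
}
```

Guard bytes only detect writes that land within GUARD_LEN bytes of the buffer, so running the same test under AddressSanitizer gives stronger coverage.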
Prevention & Best Practices
1. Treat All Unsafe C Functions as Banned by Default
Adopt a banned function list for your C/C++ projects. Microsoft's Security Development Lifecycle (SDL) publishes a well-known list. At minimum, flag these for review:
❌ strcpy → ✅ strlcpy / strncpy (with explicit null-term)
❌ strcat → ✅ strlcat / strncat
❌ sprintf → ✅ snprintf
❌ gets → ✅ fgets
❌ scanf("%s", ...) → ✅ scanf("%255s", ...) with explicit width
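The append case deserves extra care because strncat's length argument is the maximum number of characters to append, not the size of the destination. Here is a sketch of a bounded append built on snprintf, which sidesteps that arithmetic. append_bounded is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: appends src to the NUL-terminated string in dst
 * without writing past dst_size bytes. Returns 0 on success, -1 if dst
 * was not terminated within dst_size or the result had to be truncated. */
static int append_bounded(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    const char *end = memchr(dst, '\0', dst_size);
    if (end == NULL) {
        return -1; /* dst is not a valid string within its capacity */
    }
    size_t used = (size_t)(end - dst);
    size_t remaining = dst_size - used; /* at least 1: room for the NUL */
    int needed = snprintf(dst + used, remaining, "%s", src);
    if (needed < 0 || (size_t)needed >= remaining) {
        return -1; /* truncated */
    }
    return 0;
}
```

With char buf[8] = "ab", appending "cd" succeeds and yields "abcd"; appending "efgh" afterwards returns -1 and leaves the truncated but terminated string "abcdefg".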
2. Always Use sizeof — Not Hardcoded Numbers
// ❌ Fragile — breaks silently if buffer size changes
strncpy(buf, src, 255);
// ✅ Robust — automatically tracks the actual allocation
strncpy(buf, src, sizeof(buf) - 1);
buf[sizeof(buf) - 1] = '\0'; // Ensure null termination
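A related pitfall: sizeof reports the buffer size only where the array itself is in scope. Once the buffer is passed to a function it decays to a pointer, and sizeof yields the pointer size instead. The sketch below passes the capacity explicitly. fill_label and ARRAY_SIZE are hypothetical names used for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Element count of a true array; must not be applied to a pointer. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* dst is a pointer here, so sizeof(dst) would be the pointer size
 * (commonly 8), not the caller's buffer size. The capacity is taken as
 * a parameter instead. Returns the number of characters stored. */
static size_t fill_label(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0) {
        dst[0] = '\0';
        return 0;
    }
    return (size_t)needed < dst_size ? (size_t)needed : dst_size - 1;
}
```

At the call site, where the array is visible, the capacity still comes from the declaration: fill_label(buf, ARRAY_SIZE(buf), src).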
3. Enable Compiler Hardening Flags
Modern compilers can catch many of these issues at compile time or add runtime protection:
CFLAGS += -Wall -Wextra             # Enable a broad set of warnings (not literally all)
CFLAGS += -Wformat-security         # Warn on unsafe format strings
CFLAGS += -O2 -D_FORTIFY_SOURCE=2   # Runtime buffer overflow detection (requires optimization)
CFLAGS += -fstack-protector-strong  # Stack canaries
CFLAGS += -fsanitize=address        # AddressSanitizer (development/CI only)
4. Use Static Analysis Tools
Semgrep (the scanner that caught this vulnerability) is excellent for codebases of any size. Add it to your CI pipeline:
# .github/workflows/security.yml
- name: Run Semgrep
  uses: returntocorp/semgrep-action@v1
  with:
    config: >-
      p/c
      p/security-audit
Other tools worth integrating:
| Tool | Type | Best For |
|---|---|---|
| Semgrep | SAST | Fast, rule-based pattern matching |
| Coverity | SAST | Deep interprocedural analysis |
| AddressSanitizer | Dynamic | Runtime memory error detection |
| Valgrind | Dynamic | Memory leak and access checking |
| CodeQL | SAST | Semantic code analysis, GitHub-native |
5. Write Security Invariant Tests
The regression test included with this fix is a great model. It defines a clear invariant and tests it against a broad range of adversarial inputs — including format string payloads, null bytes, and inputs 100x larger than the buffer:
def test_buffer_reads_never_exceed_declared_length(payload):
    """
    Invariant: bounded_copy(payload, N) must always produce
    output of length <= N, regardless of input size.
    """
    declared_length = 256
    result = bounded_copy(payload, declared_length)
    assert len(result) <= declared_length, (
        f"VIOLATION: read {len(result)} bytes, max was {declared_length}"
    )
Writing tests that explicitly name and verify security invariants makes them first-class citizens in your test suite — not afterthoughts.
6. Consider Memory-Safe Languages for New Code
For new projects or when rewriting components, languages like Rust, Go, or Swift eliminate entire classes of memory safety bugs at the language level. Rust in particular makes buffer overflows essentially impossible without unsafe blocks. This is worth considering when the cost of a vulnerability is high.
Relevant Standards and References
- CWE-120: Buffer Copy Without Checking Size of Input ("Classic Buffer Overflow")
- CWE-121: Stack-based Buffer Overflow
- OWASP: Buffer Overflow
- CERT C Coding Standard: STR31-C (Guarantee that storage for strings has sufficient space)
- NIST NVD: Tracks thousands of real-world CVEs rooted in this exact class of bug
Conclusion
Buffer overflows in C are a solved problem — not in the sense that they no longer occur, but in the sense that we have well-understood tools, techniques, and language features to prevent them. The fix here is straightforward: replace strcpy with strlcpy, replace sprintf with snprintf, and always pass the size of the destination buffer.
What makes this vulnerability interesting is the context: it appeared in a cryptographic utility file, where the irony of undermining security through an insecure implementation is particularly sharp. MD5 may be a legacy algorithm, but the code handling it still needs to be hardened.
The key takeaways:
- Never use unbounded string functions in C — treat them as deprecated
- Always bound copies to the destination size, using sizeof to avoid magic numbers
- Automate detection with static analysis tools like Semgrep in your CI pipeline
- Write invariant-based tests that verify security properties explicitly
- Enable compiler hardening flags to add defense-in-depth at the binary level
Security isn't a feature you add at the end — it's a property you maintain throughout the lifetime of your code. One unsafe strcpy in a utility function is all it takes to unravel the security of a much larger system.
This vulnerability was automatically detected and fixed by OrbisAI Security. Automated security scanning, AI-assisted remediation, and regression test generation — built for engineering teams who ship fast without compromising on safety.