Critical Buffer Overflow in OJ's fast.c: How an Unsafe strcpy Nearly Opened the Door to RCE
Severity: Critical | CVE Class: CWE-120 (Buffer Copy Without Checking Size of Input) | Affected Component: ext/oj/fast.c:92
Introduction
If your Ruby application parses JSON, and chances are it does, you may be using OJ (Optimized JSON), one of the most widely adopted JSON libraries in the Ruby ecosystem. OJ is beloved for its blazing-fast C-extension parser, but that speed comes with a responsibility: C code lives close to the metal, and a single unsafe memory operation can turn a JSON parser into an attacker's playground.
A critical vulnerability was recently discovered and patched in OJ's fast.c parser: an unbounded strcpy call at line 92 that copies attacker-controlled JSON data into a fixed-size buffer with no bounds checking. This is a textbook buffer overflow, the kind of bug that has haunted C codebases for decades and remains one of the most dangerous classes of vulnerabilities.
This post breaks down exactly what went wrong, how an attacker could have exploited it, and what the fix looks like, so you can write safer code and understand why memory safety matters even in high-level language ecosystems.
The Vulnerability Explained
What Is a Buffer Overflow?
A buffer overflow occurs when a program writes more data into a memory buffer than it was allocated to hold. The excess data spills into adjacent memory regions, potentially overwriting critical data structures, return addresses, or function pointers.
In C, the strcpy function is a notorious offender:
// DANGEROUS: strcpy copies until it hits a null terminator.
// It has NO idea how big the destination buffer is.
strcpy(destination, source);
strcpy will copy every byte from source into destination until it encounters a null byte (\0). If source is longer than the space allocated for destination, it keeps writing anyway, straight into whatever memory comes next.
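To make the size arithmetic concrete, here is a minimal sketch in C. The helper names strcpy_bytes_written and strcpy_would_overflow are hypothetical, introduced only for illustration; they are not part of OJ or the C standard library. The point is that strcpy writes the string plus its terminator, so a destination must hold at least strlen(source) + 1 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes strcpy would write for this source,
 * i.e. every character plus the terminating '\0'. */
size_t strcpy_bytes_written(const char *source)
{
    return strlen(source) + 1;
}

/* Hypothetical helper: returns 1 if strcpy into a destination of dest_size
 * bytes would write past its end, 0 if the copy would fit. */
int strcpy_would_overflow(size_t dest_size, const char *source)
{
    return strcpy_bytes_written(source) > dest_size;
}
```

A three-character string therefore needs a four-byte destination; forgetting the terminator is the classic off-by-one variant of this bug.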
The Vulnerable Code Pattern
At line 92 of ext/oj/fast.c, the parser contained an unsafe strcpy call where the source string was derived directly from attacker-controlled JSON input:
// VULNERABLE (simplified representation)
char dest_buffer[256];        // Fixed-size destination buffer
// 'source' comes from parsed JSON: the attacker controls this!
strcpy(dest_buffer, source);  // <-- No bounds check whatsoever
The critical problem here is the trust boundary violation: data flowing in from a JSON payload (which any external user can craft) is being copied directly into a fixed-size stack or heap buffer without any length validation.
How Could an Attacker Exploit This?
Let's walk through a concrete attack scenario:
Step 1: Craft a malicious payload
An attacker constructs a JSON document with an abnormally long string value designed to overflow the buffer:
{
"key": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[SHELLCODE_HERE]"
}
Step 2: Trigger the vulnerable code path
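The attacker sends this payload to any endpoint in your application that passes request data to the code in fast.c. In OJ that file backs the Oj::Doc API, so calls such as Oj::Doc.open() or Oj::Doc.parse() are the most direct route; whether Oj.load(), Oj.safe_load(), or a wrapper around them reaches it depends on the OJ version and configuration.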
Step 3: Memory corruption occurs
When strcpy copies the oversized string:
Memory layout (before overflow):
[dest_buffer: 256 bytes][saved_frame_pointer][return_address][...]
Memory layout (after overflow):
[dest_buffer: 256 bytes][AAAAAAAAAA...][ATTACKER_CONTROLLED_DATA]
                                        ^
                                        Return address overwritten!
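The layout above can be imitated without invoking undefined behavior. The sketch below is a simulation under stated assumptions: struct fake_frame and simulate_overflow are hypothetical names, the "return address" is just an 8-byte field placed after a 16-byte buffer, and the copy is clamped to the struct so the demo itself never writes out of bounds. A real stack frame is laid out by the compiler and will differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a stack frame: a small buffer followed by
 * bytes that play the role of the saved return address. */
struct fake_frame {
    char buffer[16];
    unsigned char saved_return[8];
};

/* Copies len payload bytes starting at frame->buffer, the way an unchecked
 * copy would. The length is clamped to the size of the struct so this
 * simulation stays inside memory it owns.
 * Returns 1 if any byte of saved_return was overwritten, 0 otherwise. */
int simulate_overflow(struct fake_frame *frame,
                      const unsigned char *payload, size_t len)
{
    if (len > sizeof(*frame)) {
        len = sizeof(*frame);
    }
    memcpy(frame, payload, len);
    return len > offsetof(struct fake_frame, saved_return);
}
```

A 16-byte payload fills the buffer exactly and leaves saved_return intact; anything longer replaces it with attacker-chosen bytes, which is the step the diagram marks.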
Step 4: Arbitrary code execution
By carefully crafting the overflow payload, a sophisticated attacker can:
- Overwrite the return address on the stack to redirect execution to attacker-supplied shellcode
- Corrupt heap metadata to manipulate future memory allocations
- Overwrite function pointers to hijack control flow
- Achieve Remote Code Execution (RCE), running arbitrary commands on the server
Real-World Impact
The impact of this vulnerability is severe because:
- JSON parsing is ubiquitous: virtually every modern web application parses JSON from external sources (API requests, webhooks, file uploads)
- OJ is widely deployed: with tens of millions of downloads, the blast radius is enormous
- No authentication required: any unauthenticated endpoint that accepts JSON could be a vector
- Full system compromise: successful exploitation means the attacker runs code with the same privileges as your application process
Consider a typical Rails API endpoint:
# This innocent-looking code could trigger the vulnerability
class ApiController < ApplicationController
  def create
    # params parsed via OJ under the hood
    data = Oj.load(request.body.read)
    # process data...
  end
end
An attacker hitting this endpoint with a crafted payload could potentially own the entire server.
The Fix
What Changed
The fix in ext/oj/fast.c removes the unsafe strcpy call and replaces it with a bounds-aware alternative. The correct approach in C is to use strncpy or, better yet, strlcpy (where available), combined with explicit length validation:
// SAFE: Always validate length before copying
size_t src_len = strlen(source);
if (src_len >= sizeof(dest_buffer)) {
    // Handle error: input too long
    rb_raise(rb_eArgError, "string value too long");
    return Qnil;  // Not reached: rb_raise does not return
}
// Now safe: we've verified source fits in destination
strncpy(dest_buffer, source, sizeof(dest_buffer) - 1);
dest_buffer[sizeof(dest_buffer) - 1] = '\0'; // Guarantee null termination
Or using the even safer strlcpy pattern:
// strlcpy always null-terminates (for a non-zero size) and returns the
// length of source, i.e. what *would* have been copied: use this to detect truncation
size_t copied = strlcpy(dest_buffer, source, sizeof(dest_buffer));
if (copied >= sizeof(dest_buffer)) {
    // Truncation occurred: handle appropriately
    rb_raise(rb_eArgError, "string value exceeds maximum length");
    return Qnil;  // Not reached: rb_raise does not return
}
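Because strlcpy is not available on every platform, a portable equivalent is sometimes written by hand. The following is a minimal sketch, not the code from the OJ patch: bounded_copy is a hypothetical name, and it assumes src is a valid null-terminated string. It mirrors the strlcpy contract described above by returning the full source length so callers can detect truncation the same way.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical portable stand-in for strlcpy.
 * Copies at most dst_size - 1 bytes of src into dst and always
 * null-terminates when dst_size is non-zero.
 * Returns strlen(src); a result >= dst_size means the copy was truncated. */
size_t bounded_copy(char *dst, const char *src, size_t dst_size)
{
    size_t src_len = strlen(src);

    if (dst_size > 0) {
        /* Copy the whole string if it fits, otherwise leave room for '\0'. */
        size_t n = (src_len < dst_size) ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A caller would compare the return value against the destination size, exactly as in the strlcpy pattern, and reject the input instead of silently accepting a truncated value.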
Why This Fix Works
The fix addresses the root cause by establishing a trust boundary between external input and internal memory operations:
| Aspect | Before (Vulnerable) | After (Fixed) |
|---|---|---|
| Length check | None | Explicit validation |
| Overflow possible | Yes | No |
| Attacker control | Full | Bounded |
| Error handling | Silent corruption | Raises exception |
The key insight is simple but profound: never trust the length of externally supplied data. Always validate before copying.
Prevention & Best Practices
1. Ban Unsafe String Functions in C Code
Establish a coding standard that prohibits known-unsafe C functions. Many teams use compiler warnings or static analysis to enforce this:
// NEVER use these without bounds checking:
strcpy(dst, src);
strcat(dst, src);
sprintf(buf, fmt, ...);
gets(buf);
// USE these safer alternatives (sizeof only works when dst/buf is an array, not a pointer):
strncpy(dst, src, sizeof(dst) - 1);
dst[sizeof(dst) - 1] = '\0';  // strncpy does not null-terminate on truncation
strncat(dst, src, sizeof(dst) - strlen(dst) - 1);
snprintf(buf, sizeof(buf), fmt, ...);
fgets(buf, sizeof(buf), stdin);
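Bounded functions only help if the caller checks for truncation. As a small illustration, the sketch below wraps snprintf and treats a truncated result as an error. The function join_path and its parameters are hypothetical, chosen only to show the pattern; the relevant fact is that snprintf returns the number of characters the full output would have needed, excluding the terminator.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "dir/name" into out.
 * Returns 0 on success, -1 if the result did not fit in out_size bytes
 * or snprintf reported an encoding error. */
int join_path(char *out, size_t out_size, const char *dir, const char *name)
{
    int needed = snprintf(out, out_size, "%s/%s", dir, name);

    /* needed >= out_size means the output was cut short. */
    if (needed < 0 || (size_t)needed >= out_size) {
        return -1;
    }
    return 0;
}
```

The buffer is still null-terminated after a truncated call (for a non-zero size), so the failure mode is a wrong value, not memory corruption, and the return code lets the caller refuse to use it.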
2. Validate All External Input Before Processing
Every byte that crosses a trust boundary (network, file, user input) must be validated:
// Validate length BEFORE any memory operation
if (input_length > MAX_ALLOWED_LENGTH) {
return ERROR_INPUT_TOO_LONG;
}
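A slightly fuller version of that check might look like the sketch below. It assumes the input arrives with an explicit length (as parsed JSON strings typically do) and uses a hypothetical limit of 255 bytes; validate_input and the result codes other than ERROR_INPUT_TOO_LONG are names invented for this example. It also rejects embedded null bytes, since a later strlen or strcpy on such data would see a shorter string than the length claims.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ALLOWED_LENGTH 255  /* hypothetical limit for this example */

enum validate_result {
    INPUT_OK = 0,
    ERROR_INPUT_NULL,
    ERROR_INPUT_TOO_LONG,
    ERROR_EMBEDDED_NUL
};

/* Checks untrusted input before any copy is attempted. */
enum validate_result validate_input(const char *input, size_t input_length)
{
    if (input == NULL) {
        return ERROR_INPUT_NULL;
    }
    if (input_length > MAX_ALLOWED_LENGTH) {
        return ERROR_INPUT_TOO_LONG;
    }
    /* A '\0' inside the declared length would make C string functions
     * disagree with input_length about where the data ends. */
    if (memchr(input, '\0', input_length) != NULL) {
        return ERROR_EMBEDDED_NUL;
    }
    return INPUT_OK;
}
```

With a 255-byte limit, a validated input plus its terminator fits a 256-byte destination like the one in the vulnerable pattern.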
3. Use Memory-Safe Languages Where Possible
Consider whether the performance-critical path actually needs to be in C. Languages like Rust provide memory safety guarantees at the language level:
// Safe Rust bounds-checks every access, so this class of overflow cannot occur
fn process_json_string(input: &str) -> Result<String, Error> {
    if input.len() > MAX_LENGTH {
        return Err(Error::InputTooLong);
    }
    // Safe string operations: no manual memory management
    Ok(input.to_string())
}
4. Enable Compiler Protections
Modern compilers and operating systems provide multiple layers of protection:
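# Enable stack canaries, fortified libc calls, PIE (so ASLR covers the executable), and full RELRO
# _FORTIFY_SOURCE only takes effect when optimization is enabled, hence -O2
gcc -O2 -fstack-protector-strong \
    -D_FORTIFY_SOURCE=2 \
    -pie -fPIE \
    -Wl,-z,relro,-z,now \
    your_code.c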
These don't prevent the vulnerability, but they make exploitation significantly harder.
5. Use Static Analysis Tools
Integrate static analysis into your CI/CD pipeline to catch these issues automatically:
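# Example: GitHub Actions with CodeQL
- name: Initialize CodeQL
  uses: github/codeql-action/init@v3
  with:
    languages: cpp
    queries: security-extended
# C/C++ is compiled, so CodeQL needs to observe a build
- name: Autobuild
  uses: github/codeql-action/autobuild@v3
- name: Perform CodeQL Analysis
  uses: github/codeql-action/analyze@v3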
Other excellent tools for C/C++ analysis:
- Coverity: deep static analysis for C/C++
- AddressSanitizer (ASan): runtime memory error detection
- Valgrind: memory debugging and leak detection
- Clang Static Analyzer: built into the LLVM toolchain
6. Fuzz Test Your Parsers
Parsers are especially vulnerable to malformed input. Fuzzing automatically generates edge-case inputs:
# Using AFL++ for fuzzing
afl-fuzz -i input_corpus/ -o findings/ -- ./your_parser @@
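For in-process fuzzing, a harness is usually a single entry point that feeds each generated input to the code under test. The sketch below uses the libFuzzer entry point LLVMFuzzerTestOneInput, which is a different driver from the file-based afl-fuzz invocation above. The target function copy_value and the 256-byte capacity are hypothetical stand-ins for a parser routine, not code from OJ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VALUE_CAPACITY 256  /* hypothetical buffer size */

/* Hypothetical routine under test: copies a value into a fixed buffer,
 * rejecting anything that does not fit. Returns 0 on success, -1 on reject. */
static int copy_value(char *dst, size_t dst_size,
                      const uint8_t *data, size_t size)
{
    if (size >= dst_size) {
        return -1;  /* would not leave room for the terminator */
    }
    if (size > 0) {
        memcpy(dst, data, size);
    }
    dst[size] = '\0';
    return 0;
}

/* libFuzzer calls this once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    char value[VALUE_CAPACITY];

    (void)copy_value(value, sizeof(value), data, size);
    return 0;
}
```

Assuming a clang toolchain, building with `clang -g -fsanitize=fuzzer,address harness.c` links the fuzzing driver and AddressSanitizer, so an out-of-bounds write in the target is reported as a crash as soon as the fuzzer generates an oversized input.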
Security Standards Reference
This vulnerability maps to several well-known security standards:
- CWE-120: Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')
- CWE-121: Stack-based Buffer Overflow
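- OWASP A06:2021: Vulnerable and Outdated Components (relevant to applications that ship an unpatched OJ; the 2021 Top 10 has no dedicated memory-corruption category)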
- CERT C Coding Standard STR31-C: Guarantee that storage for strings has sufficient space for character data and the null terminator
Conclusion
This vulnerability in OJ's fast.c is a stark reminder that security vulnerabilities don't respect language boundaries. You might write beautiful, safe Ruby code all day long, but if a C extension underneath you has an unsafe strcpy, your entire application's security posture is compromised.
The key takeaways from this vulnerability:
- Unsafe C functions like strcpy are never acceptable when handling externally-supplied data, full stop
- Trust boundaries matter: data from JSON payloads is attacker-controlled and must be treated with suspicion
- Defense in depth works: compiler protections like stack canaries and ASLR raise the bar for exploitation even when bugs slip through
- Automated tooling catches what humans miss: static analysis and fuzzing would likely have flagged this issue before it reached production
- Memory safety is a first-class concern: when performance allows, prefer memory-safe languages for parsing untrusted input
The security community's ability to find, responsibly disclose, and patch vulnerabilities like this one is what keeps the open-source ecosystem trustworthy. If you maintain C extensions or native libraries, consider auditing your code for similar patterns today. Your users are counting on you.
Found a security vulnerability? Practice responsible disclosure by contacting the project maintainers privately before going public. Most projects have a SECURITY.md or a security contact in their repository.
This post was generated as part of an automated security fix workflow by OrbisAI Security. Automated detection + human review = faster, safer software for everyone.