What is a buffer overflow in a C extension?

A buffer overflow occurs when a program writes more data into a fixed-size buffer than it can hold, overwriting adjacent memory. In C extensions like OJ's fast.c, this can happen when functions like strcpy() copy user-controlled data without checking the destination buffer's size.

How do you prevent buffer overflow in C Ruby extensions?

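Replace unbounded functions like strcpy(), strcat(), and sprintf() with length-bounded alternatives: snprintf(), strlcpy() where available, or memcpy() after an explicit size check. strncpy() also takes a length, but it does not null-terminate when it truncates, so it must be followed by an explicit terminator. Always validate the input length against the destination buffer's capacity before copying.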

What CWE is buffer overflow?

Classic buffer overflows, where input is copied without checking its size, are classified as CWE-120 ("Buffer Copy without Checking Size of Input"). Related entries include CWE-121 (stack-based), CWE-122 (heap-based), and the broader CWE-787 (Out-of-bounds Write). This specific vulnerability in OJ's fast.c is CWE-120.

Is input validation alone enough to prevent buffer overflow in C?

No. While input validation helps, buffer overflows in C require both input validation AND using safe, bounds-checked copy functions. Even with validation, a single missed code path using strcpy() can be exploited. Defense in depth requires safe APIs at every copy site.

Can static analysis detect buffer overflow from strcpy()?

Yes. Static analysis tools like Semgrep, Coverity, Clang's static analyzer, and cppcheck can flag uses of strcpy() with tainted (user-controlled) input. Orbis AppSec automatically detected this specific strcpy() usage in OJ's fast.c and opened a remediation PR.

Critical Buffer Overflow in OJ's fast.c: How an Unsafe strcpy Nearly Opened the Door to RCE

Severity: 🔴 Critical | Weakness Class: CWE-120 (Buffer Copy Without Checking Size of Input) | Affected Component: ext/oj/fast.c:92

Introduction

If your Ruby application parses JSON — and chances are it does — you may be using OJ (Optimized JSON), one of the most widely adopted JSON libraries in the Ruby ecosystem. OJ is beloved for its blazing-fast C-extension parser, but that speed comes with a responsibility: C code lives close to the metal, and a single unsafe memory operation can turn a JSON parser into an attacker's playground.

A critical vulnerability was recently discovered and patched in OJ's fast.c parser: an unbounded strcpy call at line 92 that blindly copies attacker-controlled JSON data into a fixed-size buffer with zero bounds checking. This is a textbook buffer overflow — the kind of bug that has haunted C codebases for decades and continues to be one of the most dangerous classes of vulnerabilities in existence.

This post breaks down exactly what went wrong, how an attacker could have exploited it, and what the fix looks like — so you can write safer code and understand why memory safety matters even in high-level language ecosystems.

The Vulnerability Explained

What Is a Buffer Overflow?

A buffer overflow occurs when a program writes more data into a memory buffer than it was allocated to hold. The excess data spills into adjacent memory regions, potentially overwriting critical data structures, return addresses, or function pointers.

In C, the strcpy function is a notorious offender:

// DANGEROUS: strcpy copies until it hits a null terminator.
// It has NO idea how big the destination buffer is.
strcpy(destination, source);

strcpy will copy every byte from source into destination until it encounters a null byte (\0). If source is longer than the space allocated for destination, it keeps writing anyway — straight into whatever memory comes next.

The Vulnerable Code Pattern

At line 92 of ext/oj/fast.c, the parser contained an unsafe strcpy call where the source string was derived directly from attacker-controlled JSON input:

// VULNERABLE (simplified representation)
char dest_buffer[256];  // Fixed-size destination buffer

// 'source' comes from parsed JSON — attacker controls this!
strcpy(dest_buffer, source);  // ❌ No bounds check whatsoever

The critical problem here is the trust boundary violation: data flowing in from a JSON payload (which any external user can craft) is being copied directly into a fixed-size stack or heap buffer without any length validation.
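A parser already knows where each string token starts and ends, so the length check does not have to rely on strlen() or on a terminator being present. The sketch below assumes a hypothetical copy_json_token() helper that receives the token as a pointer plus a byte length; OJ's actual internals may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a parsed JSON string token of known length
 * into a fixed-size buffer. Input that would not fit together with its
 * '\0' terminator is refused rather than silently truncated. */
static bool copy_json_token(char *dst, size_t dst_size,
                            const char *token, size_t token_len)
{
    assert(dst != NULL && token != NULL);

    if (dst_size == 0 || token_len >= dst_size) {
        return false;  /* would overflow: the caller must report an error */
    }
    memcpy(dst, token, token_len);  /* length comes from the parser, not strlen() */
    dst[token_len] = '\0';
    return true;
}
```

Callers should pass sizeof of a real array (not a pointer) as dst_size and turn a false return into a Ruby exception, the same way the fix uses rb_raise().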

How Could an Attacker Exploit This?

Let's walk through a concrete attack scenario:

Step 1 — Craft a malicious payload

An attacker constructs a JSON document with an abnormally long string value designed to overflow the buffer:

{
  "key": "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA[SHELLCODE_HERE]"
}
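For a local regression test, the oversized input only has to be long, not weaponized. A minimal sketch of a hypothetical test helper, make_long_string_payload(), that builds {"key":"AAA...A"} with a chosen number of A characters (the caller frees the result):

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical test helper: returns a heap-allocated JSON document
 * {"key":"AAA...A"} whose string value holds n 'A' characters.
 * Returns NULL on allocation failure or size overflow. Caller frees. */
static char *make_long_string_payload(size_t n)
{
    static const char prefix[] = "{\"key\":\"";
    static const char suffix[] = "\"}";
    size_t prefix_len = sizeof prefix - 1;  /* 8 bytes, without '\0' */
    size_t suffix_len = sizeof suffix - 1;  /* 2 bytes, without '\0' */

    if (n > SIZE_MAX - prefix_len - suffix_len - 1) {
        return NULL;
    }
    char *out = malloc(prefix_len + n + suffix_len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, prefix, prefix_len);
    memset(out + prefix_len, 'A', n);
    memcpy(out + prefix_len + n, suffix, suffix_len + 1);  /* copies the '\0' too */
    return out;
}
```

A 300-character value, for instance, exceeds the 256-byte buffer in the simplified example and should be rejected with an exception once the fix is in place.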

Step 2 — Trigger the vulnerable code path

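The attacker sends this payload to any endpoint that passes untrusted JSON into the vulnerable code path. In OJ, fast.c implements the Oj::Doc API (for example Oj::Doc.open), so any endpoint that hands request data to Oj::Doc, or to a wrapper around it, is a candidate vector. OJ's general-purpose entry points such as Oj.load() and Oj.safe_load() are built on its other parser modules, so check which API your code actually calls.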

Step 3 — Memory corruption occurs

When strcpy copies the oversized string:

Memory layout (before overflow):
[dest_buffer: 256 bytes][saved_frame_pointer][return_address][...]

Memory layout (after overflow):
[dest_buffer: 256 bytes][AAAAAAAAAA...][ATTACKER_CONTROLLED_DATA]
                                       ↑
                              Return address overwritten!
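The off-by-one that makes this layout bite is the terminator: strcpy writes strlen(source) + 1 bytes. A small sketch of that arithmetic, using a hypothetical bytes_past_end() helper, shows that a 256-byte buffer is already overrun by a 256-character string. Which saved values the extra bytes actually reach depends on the compiler's frame layout, padding, and any stack canary, so the diagram above is a simplification.

```c
#include <stddef.h>

/* Hypothetical helper: how many bytes strcpy(dst, src) would write past
 * the end of a dst_size-byte buffer, given src_len = strlen(src).
 * strcpy writes src_len characters plus the '\0' terminator. */
static size_t bytes_past_end(size_t src_len, size_t dst_size)
{
    size_t bytes_written = src_len + 1;

    return bytes_written > dst_size ? bytes_written - dst_size : 0;
}
```

For example, bytes_past_end(255, 256) is 0, bytes_past_end(256, 256) is 1 (only the '\0' spills), and bytes_past_end(300, 256) is 45.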

Step 4 — Arbitrary code execution

By carefully crafting the overflow payload, a sophisticated attacker may be able to:
- Overwrite the return address on the stack to redirect execution, either to attacker-supplied shellcode or, where the stack is non-executable, to existing code via return-oriented programming (ROP)
- Corrupt heap allocator metadata (if the buffer lives on the heap) to manipulate future memory allocations
- Overwrite function pointers to hijack control flow
- Achieve Remote Code Execution (RCE), running arbitrary commands on the server

Real-World Impact

The impact of this vulnerability is severe because:

- JSON parsing is ubiquitous: virtually every modern web application parses JSON from external sources (API requests, webhooks, file uploads)
- OJ is widely deployed: with tens of millions of downloads, the blast radius is enormous
- No authentication required: any endpoint reachable without authentication that feeds request JSON to the vulnerable parser could be a vector
- Full application compromise: successful exploitation means the attacker runs code with the same privileges as your application process

Consider a typical Rails API endpoint:

# This innocent-looking code hands an untrusted request body
# straight to OJ's fast parser (Oj::Doc, implemented in fast.c)
class ApiController < ApplicationController
  def create
    Oj::Doc.open(request.body.read) do |doc|
      # navigate and process doc...
    end
  end
end

An attacker hitting this endpoint with a crafted payload could potentially own the entire server.

The Fix

What Changed

The fix in ext/oj/fast.c removes the unsafe strcpy call and replaces it with a length-checked copy. The snippets below are simplified illustrations of that pattern rather than the literal patch: validate the length explicitly, then copy with strncpy or, better yet, strlcpy where it is available.

// SAFE: Always validate length before copying
size_t src_len = strlen(source);

if (src_len >= sizeof(dest_buffer)) {
    // Input plus its '\0' terminator would not fit: reject it.
    // rb_raise() does not return (it unwinds to Ruby), so no return is needed.
    rb_raise(rb_eArgError, "string value too long");
}

// Now safe: we've verified source fits in destination
strncpy(dest_buffer, source, sizeof(dest_buffer) - 1);
dest_buffer[sizeof(dest_buffer) - 1] = '\0';  // Guarantee null termination

Or using the even safer strlcpy pattern:

// strlcpy always null-terminates (for a non-zero size) and returns
// strlen(source), the length it tried to create: use this to detect truncation
size_t needed = strlcpy(dest_buffer, source, sizeof(dest_buffer));

if (needed >= sizeof(dest_buffer)) {
    // Truncation occurred: reject the input (rb_raise() does not return)
    rb_raise(rb_eArgError, "string value exceeds maximum length");
}
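strlcpy is not available everywhere: it is standard on the BSDs and macOS but only arrived in glibc 2.38. Where it is missing, a fallback with the same contract (always terminate when size > 0, return strlen(src) so callers can detect truncation) is short. This is a sketch under that assumption; portable_strlcpy is a hypothetical name, and a project should prefer the platform's own strlcpy when it exists.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback with strlcpy() semantics: copies at most
 * size - 1 bytes, always NUL-terminates when size > 0, and returns
 * strlen(src). A return value >= size means the copy was truncated. */
static size_t portable_strlcpy(char *dst, const char *src, size_t size)
{
    size_t src_len = strlen(src);

    if (size > 0) {
        size_t n = src_len < size - 1 ? src_len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

The truncation check from the strlcpy snippet carries over unchanged: compare the return value against sizeof(dest_buffer).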

Why This Fix Works

The fix addresses the root cause by enforcing a length check at the trust boundary between external input and internal memory operations:

Aspect	Before (Vulnerable)	After (Fixed)
Length check	❌ None	✅ Explicit validation
Overflow possible	❌ Yes	✅ No
Attacker control over write length	❌ Full	✅ Bounded
Error handling	❌ Silent corruption	✅ Raises exception

The key insight is simple but profound: never trust the length of externally supplied data. Always validate before copying.

Prevention & Best Practices

1. Ban Unsafe String Functions in C Code

Establish a coding standard that prohibits known-unsafe C functions. Many teams use compiler warnings or static analysis to enforce this:

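// ❌ NEVER use these on untrusted data (none takes a destination size):
strcpy(dst, src);
strcat(dst, src);
sprintf(buf, fmt, ...);
gets(buf);  // removed from the C standard in C11

// ✅ USE these bounded alternatives. dst and buf must be arrays here:
// if they are pointers, sizeof gives the pointer size, so pass the real capacity.
strncpy(dst, src, sizeof(dst) - 1);
dst[sizeof(dst) - 1] = '\0';  // strncpy does not terminate on truncation
strncat(dst, src, sizeof(dst) - strlen(dst) - 1);
snprintf(buf, sizeof(buf), fmt, ...);
fgets(buf, sizeof(buf), stdin);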
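With GCC or Clang, one way to turn the ban into a build failure is #pragma GCC poison, which makes any later use of the listed identifiers a compile error; it must come after the system headers are included. The replacements still need their results checked: snprintf returns the length it would have written, so a return value at or above the buffer size signals truncation. A minimal sketch, where format_key() is a hypothetical helper:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Ban the unbounded functions for the rest of this translation unit.
 * Placed after the #includes so the headers' own declarations are not
 * affected; any later use of these names fails to compile. */
#pragma GCC poison strcpy strcat sprintf gets

/* Hypothetical helper: format "prefix:name" into buf, reporting failure
 * instead of silently truncating. */
static bool format_key(char *buf, size_t buf_size,
                       const char *prefix, const char *name)
{
    int needed = snprintf(buf, buf_size, "%s:%s", prefix, name);

    /* Negative means an encoding error; >= buf_size means truncation. */
    return needed >= 0 && (size_t)needed < buf_size;
}
```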

2. Validate All External Input Before Processing

Every byte that crosses a trust boundary (network, file, user input) must be validated:

// Validate length BEFORE any memory operation
if (input_length > MAX_ALLOWED_LENGTH) {
    return ERROR_INPUT_TOO_LONG;
}

3. Use Memory-Safe Languages Where Possible

Consider whether the performance-critical path actually needs to be in C. Languages like Rust provide memory safety guarantees at the language level:

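const MAX_LENGTH: usize = 4096; // illustrative limit

#[derive(Debug)]
enum Error {
    InputTooLong,
}

// Safe Rust prevents out-of-bounds writes by design (accesses are bounds-checked)
fn process_json_string(input: &str) -> Result<String, Error> {
    if input.len() > MAX_LENGTH {
        return Err(Error::InputTooLong);
    }
    // Safe string operations: String sizes its own allocation
    Ok(input.to_string())
}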

4. Enable Compiler Protections

Modern compilers and operating systems provide multiple layers of protection:

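# Stack canaries, fortified libc calls, position-independent code
# (which lets ASLR randomize the binary), and read-only relocations.
# _FORTIFY_SOURCE only takes effect when optimization is enabled.
# For a shared-library extension, use -fPIC -shared instead of -pie -fPIE.
gcc -O2 -fstack-protector-strong \
    -D_FORTIFY_SOURCE=2 \
    -pie -fPIE \
    -Wl,-z,relro,-z,now \
    your_code.c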

These don't prevent the vulnerability, but they make exploitation significantly harder.

5. Use Static Analysis Tools

Integrate static analysis into your CI/CD pipeline to catch these issues automatically:

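# Example: GitHub Actions with CodeQL
- name: Initialize CodeQL
  uses: github/codeql-action/init@v3
  with:
    languages: cpp
    queries: security-extended

# C/C++ analysis needs a build; autobuild tries to detect one
- name: Build
  uses: github/codeql-action/autobuild@v3

- name: Perform CodeQL Analysis
  uses: github/codeql-action/analyze@v3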

Other useful tools for C/C++ analysis:
- Coverity: deep static analysis for C/C++
- Clang Static Analyzer: built into the LLVM toolchain
- AddressSanitizer (ASan): runtime memory error detection, complementary to static analysis
- Valgrind: runtime memory debugging and leak detection

6. Fuzz Test Your Parsers

Parsers are especially vulnerable to malformed input. Fuzzing automatically generates edge-case inputs:

# Using AFL++ for fuzzing
afl-fuzz -i input_corpus/ -o findings/ -- ./your_parser @@
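AFL++ runs a program that reads each input from a file (the @@ placeholder). LLVM's libFuzzer instead calls an in-process entry point named LLVMFuzzerTestOneInput. A minimal harness sketch follows; parse_string_value() is a hypothetical stand-in for the code under test, and a real harness would call into the extension's C parser instead. Building with clang -fsanitize=fuzzer,address adds AddressSanitizer, which reports an overflow at the faulting write rather than relying on a crash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical code under test: copies a string value into a fixed buffer.
 * Returns the C-string length on success, -1 if the value is too long. */
static int parse_string_value(const char *value, size_t len)
{
    char dest_buffer[256];

    /* Reject values that cannot fit together with a '\0' terminator. */
    if (len >= sizeof(dest_buffer)) {
        return -1;
    }
    if (len > 0) {
        memcpy(dest_buffer, value, len);
    }
    dest_buffer[len] = '\0';
    /* strlen stops early at any embedded '\0' in the fuzzed bytes. */
    return (int)strlen(dest_buffer);
}

/* libFuzzer entry point: called once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    (void)parse_string_value((const char *)data, size);
    return 0;  /* libFuzzer expects 0 for a normally processed input */
}
```

Without the fuzzer flag, the entry point is an ordinary function, which makes it easy to replay a single crashing input from a unit test.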

Security Standards Reference

This vulnerability maps to several well-known security standards:

CWE-120: Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')
CWE-121: Stack-based Buffer Overflow
OWASP A03:2021: Injection (a loose fit at best; the OWASP Top 10 has no dedicated memory-safety category)
CERT C Coding Standard STR31-C: Guarantee that storage for strings has sufficient space for character data and the null terminator

Conclusion

This vulnerability in OJ's fast.c is a stark reminder that security vulnerabilities don't respect language boundaries. You might write beautiful, safe Ruby code all day long, but if a C extension underneath you has an unsafe strcpy, your entire application's security posture is compromised.

The key takeaways from this vulnerability:

- Unsafe C functions like strcpy are never acceptable when handling externally supplied data, full stop
- Trust boundaries matter: data from JSON payloads is attacker-controlled and must be treated with suspicion
- Defense in depth works: compiler protections like stack canaries and ASLR raise the bar for exploitation even when bugs slip through
- Automated tooling catches what humans miss: static analysis and fuzzing are well suited to flagging issues like this before they reach production
- Memory safety is a first-class concern: when performance allows, prefer memory-safe languages for parsing untrusted input

The security community's ability to find, responsibly disclose, and patch vulnerabilities like this one is what keeps the open-source ecosystem trustworthy. If you maintain C extensions or native libraries, consider auditing your code for similar patterns today — your users are counting on you.

Found a security vulnerability? Practice responsible disclosure by contacting the project maintainers privately before going public. Most projects have a SECURITY.md or a security contact in their repository.

This post was generated as part of an automated security fix workflow by OrbisAI Security. Automated detection + human review = faster, safer software for everyone.

cwe	CWE-120
fix	Replaced unbounded strcpy() with a length-checked copy operation, eliminating the unconstrained write
risk	Heap/stack corruption leading to arbitrary code execution
language	C (Ruby native extension)
root cause	strcpy() used on attacker-controlled JSON string data with no bounds check on the destination buffer
vulnerability	Buffer Overflow via unbounded strcpy()
