Critical severity | 8 min read

How buffer overflow in URL parsing happens in C++ HTTP client and how to fix it

A critical buffer overflow vulnerability in the HTTP client's URL parsing function allowed attackers to overflow a stack-allocated host buffer through specially crafted URLs. The hostname length was clamped when it was too large but never checked for negative values, so a negative length could reach memcpy() as an enormous copy size. The resulting overflow could enable arbitrary code execution by overwriting the return address. The fix adds a negative-length check before the memcpy() operation so the hostname length always stays within the destination buffer.

By Orbis AppSec
Published June 15, 2026 | Reviewed June 15, 2026

Answer Summary

This is a stack-based buffer overflow (CWE-121) in C++ HTTP client code where the url_parse() function in source/net/http.cpp copies a hostname into a fixed-size buffer without fully validating the length. The code clamps any length of hsz or more to hsz - 1, so an over-long hostname (for example, more than 256 bytes) is simply truncated, but it never rejects a negative length. A negative int passed to memcpy() is converted to an enormous size_t, which can overflow the stack buffer, overwrite the return address, and potentially lead to arbitrary code execution. The fix adds an hl < 0 check so hl is validated in both directions before memcpy() and no write can go beyond the buffer boundaries.

Vulnerability at a Glance

CWE: CWE-121 (Stack-based Buffer Overflow)
Fix: Add negative value check and enforce bounds before memcpy()
Risk: Remote code execution through malicious URLs
Language: C++
Root cause: memcpy() length derived from untrusted URL input and checked only against the upper bound
Vulnerability: Stack-based buffer overflow in URL hostname parsing

Introduction

In the source/net/http.cpp file of an HTTP client library, we discovered a critical stack-based buffer overflow vulnerability in the url_parse() function at line 38. This function parses URLs and extracts the hostname component, but the bounds check on the hostname length before a memcpy() operation was incomplete: it clamped lengths that were too large and ignored negative ones. Once converted to size_t, a negative length becomes an enormous copy size that overflows the stack-allocated host buffer, can overwrite the function's return address, and could allow arbitrary code execution.

The vulnerable pattern (calculating a length from untrusted input and passing it to memcpy() without checking that it is non-negative) is a classic memory safety issue that continues to plague C/C++ codebases. It matters here because URL parsing is a common attack surface in network applications, and this code ran in production, processing URLs from potentially untrusted sources.

The Vulnerability Explained

Let's examine the vulnerable code from source/net/http.cpp lines 34-38:

else if (strncmp(p, "https://", 8) == 0) { p += 8; https = true; }
const char *h = p;
while (*p && *p != ':' && *p != '/') p++;
int hl = p - h; if (hl >= hsz) hl = hsz - 1;
memcpy(host, h, hl); host[hl] = '\0';

The problem occurs in this sequence:

  1. Line 35: The code marks the start of the hostname (h = p)
  2. Line 36: It advances p until it hits a colon, slash, or null terminator
  3. Line 37: It calculates the hostname length as hl = p - h
  4. Line 37: It truncates hl to hsz - 1 if hl >= hsz
  5. Line 38: It copies hl bytes using memcpy(host, h, hl) and terminates the string with host[hl] = '\0'

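The critical flaw: the code checks whether hl >= hsz but never checks whether hl is negative. Because p only moves forward from h, the pointer difference p - h is never negative by itself. However, an int cannot represent a difference larger than INT_MAX; on common platforms such a value wraps when stored in hl and can become negative. A caller passing hsz <= 0 also makes the clamp itself produce a negative value (hsz - 1). A negative int converted to size_t for memcpy() becomes a massive positive number.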

In other words, the original check is one-sided:
- The clamp if (hl >= hsz) hl = hsz - 1 correctly limits lengths that are too large, as long as hsz is positive
- Nothing rejects hl < 0, so a negative length flows straight into memcpy()
- The terminator write host[hl] = '\0' also uses the unchecked value, so a negative hl indexes before the start of host
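The conversion behind this flaw can be reproduced in isolation. The two helpers below (`as_copy_size` and `narrow_to_int` are illustrative names, not project code) model what memcpy() receives from a negative int length and what `int hl = p - h;` stores when the pointer difference does not fit in an int. The sketch assumes a 32-bit int, which is the common case.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Models what memcpy() receives when an int length is passed as its size_t
// argument: the value is converted modulo 2^N, where N is the width of size_t.
std::size_t as_copy_size(int hl) {
    return static_cast<std::size_t>(hl);
}

// Models `int hl = p - h;` when the pointer difference exceeds INT_MAX.
// The result is implementation-defined before C++20 and wraps modulo 2^32
// (for a 32-bit int) from C++20 on; GCC and Clang wrap in practice.
int narrow_to_int(long long diff) {
    return static_cast<int>(diff);
}
```

With these definitions, as_copy_size(-1) equals SIZE_MAX (0xFFFFFFFF on 32-bit targets, 0xFFFFFFFFFFFFFFFF on 64-bit ones), and with a 32-bit int narrow_to_int(3000000000LL) yields -1294967296, a negative length that the one-sided clamp lets through.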

How Could This Be Exploited?

An attacker who controls a URL that the client parses could start from a crafted URL with an oversized hostname:

http://xxxxxxxxxxxxxxxx[...4096 x's...]xxxxxxxxxxxx.com/path

When url_parse() processes this URL:

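  1. The host buffer (typically 256 bytes on the stack) receives the hostname
  2. The hl calculation results in 4096 (the attacker-controlled hostname length)
  3. The truncation hl = hsz - 1 sets hl to 255, so this URL on its own is copied safely
  4. The dangerous case is a negative hl, which the original code never rejects. Because p only moves forward from h, p - h is never negative; hl becomes negative only if the difference exceeds INT_MAX (2,147,483,647) and wraps when stored in an int, or if a caller passes an hsz of zero or less so that hsz - 1 is negative
  5. When passed to memcpy(), the negative int is converted to size_t; for hl = -1 this is 0xFFFFFFFF (4,294,967,295) on 32-bit systems or 0xFFFFFFFFFFFFFFFF on 64-bit systems
  6. memcpy() attempts to copy billions of bytes, immediately overflowing the stack buffer
  7. The overflow overwrites the return address and other data stored on the stack
  8. A copy this large usually faults and crashes the process before url_parse() returns; if control is redirected before that point, execution jumps to an attacker-controlled address
  9. In that case, the attacker achieves arbitrary code execution with the privileges of the HTTP client process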
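To see why the long hostname alone is not enough, the sketch below reproduces only the length arithmetic of the original code. `original_host_len` and `fixed_host_len` are hypothetical helpers: `raw_len` stands in for the pointer difference p - h, and a 32-bit int is assumed.

```cpp
#include <cassert>

// Reproduces only the length arithmetic of the original url_parse():
//   int hl = p - h; if (hl >= hsz) hl = hsz - 1;
// raw_len is a hypothetical stand-in for the pointer difference p - h.
int original_host_len(long long raw_len, int hsz) {
    int hl = static_cast<int>(raw_len);  // narrowing: wraps above INT_MAX (32-bit int)
    if (hl >= hsz) hl = hsz - 1;         // upper clamp only
    return hl;                           // a negative value passes straight through
}

// The same arithmetic with the added check from the fix.
int fixed_host_len(long long raw_len, int hsz) {
    int hl = original_host_len(raw_len, hsz);
    if (hl < 0) hl = 0;                  // never hand memcpy() a negative length
    return hl;
}
```

The 4096-character hostname from the example is clamped to 255 by either version. Only a length above INT_MAX, or a non-positive hsz (original_host_len(5, 0) returns -1), leaves hl negative, and fixed_host_len() maps those cases to 0.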

Real-World Impact

For this specific HTTP client component:

  • Remote Code Execution: Any service that passes untrusted URLs to this parser could be crashed, and potentially compromised, by a single malicious URL
  • No Authentication Required: The vulnerability is triggered during URL parsing, before any authentication checks
  • Production Code: This is not test code; it is in the production source/net/http.cpp file that handles real network requests
  • Multiple Instances: The PR notes similar patterns at lines 40, 159, 297, and 324 that may also be vulnerable

The Fix

The fix adds a critical negative value check before the memcpy() operation. Here's the before and after comparison:

Before (vulnerable code at line 38):

int hl = p - h; if (hl >= hsz) hl = hsz - 1;
memcpy(host, h, hl); host[hl] = '\0';

After (fixed code at lines 37-41):

int hl = p - h;
if (hl >= hsz) hl = hsz - 1;
if (hl < 0) hl = 0;
memcpy(host, h, hl);
host[hl] = '\0';
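For experimentation, here is a self-contained sketch of the host-extraction step with the fix applied. The signature `url_parse(url, host, hsz, &https)` and the scheme handling are assumptions for illustration; the real function in source/net/http.cpp is not shown in full. Unlike the fix, this sketch also rejects hsz < 1: with hsz == 0 the fixed code computes hl = 0 and still writes host[0], so it implicitly assumes room for at least the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified url_parse(): extracts only the hostname, using the
// same length arithmetic as the fixed code. Returns false when the destination
// cannot hold even an empty string (hsz < 1).
bool url_parse(const char *url, char *host, int hsz, bool *https) {
    if (url == nullptr || host == nullptr || https == nullptr || hsz < 1) return false;
    const char *p = url;
    *https = false;
    if (std::strncmp(p, "http://", 7) == 0) { p += 7; }
    else if (std::strncmp(p, "https://", 8) == 0) { p += 8; *https = true; }

    const char *h = p;                         // start of the hostname
    while (*p && *p != ':' && *p != '/') p++;  // stop at port, path, or end
    int hl = static_cast<int>(p - h);          // same int narrowing as the original
    if (hl >= hsz) hl = hsz - 1;               // leave room for the terminator
    if (hl < 0) hl = 0;                        // the fix: never pass a negative length on
    std::memcpy(host, h, static_cast<std::size_t>(hl));
    host[hl] = '\0';
    return true;
}
```

Called with a 16-byte buffer, "http://example.com/path" yields "example.com", a 26-character hostname is truncated to 15 characters, and hsz = 0 is rejected instead of writing host[0].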

How This Fix Solves the Problem

The fix introduces three key improvements:

  1. Explicit negative check: The new line if (hl < 0) hl = 0; ensures that if hl ends up negative (because an oversized pointer difference wrapped when stored in an int, or because hsz - 1 is negative), it is clamped to zero instead of being converted to a massive unsigned value.

  2. Separated validation logic: By breaking the length calculation and validation across multiple lines (37-39), the code is more maintainable and the security checks are explicit rather than hidden in a compound statement.

  3. Safe handling of empty hostnames: If hl is legitimately zero (empty hostname), memcpy(host, h, 0) copies nothing and host[0] = '\0' creates an empty string, which is the correct behavior.

The fix also adds a comprehensive regression test in tests/test_invariant_http.cpp that validates the security invariant: "Parsing any URL must not write beyond the host buffer boundary, regardless of hostname length in the input URL."

The test includes adversarial inputs:
- Valid URLs: http://example.com/path
- Boundary cases: 255-character hostnames
- Exploit attempts: 4096-character hostnames designed to trigger overflow
- Special character edge cases

This ensures that future code changes won't reintroduce the vulnerability.
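The invariant can be checked without a sanitizer by surrounding the destination with guard bytes and confirming they are untouched after each parse. The sketch below is not the project's tests/test_invariant_http.cpp; it is a hypothetical harness that embeds a simplified, fixed url_parse() (the three-argument signature is an assumption) and feeds it inputs from the categories listed above. The guarded region lives on the heap for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Simplified url_parse() with the fix applied (hypothetical signature).
static void url_parse(const char *url, char *host, int hsz) {
    const char *p = url;
    if (std::strncmp(p, "http://", 7) == 0) p += 7;
    else if (std::strncmp(p, "https://", 8) == 0) p += 8;
    const char *h = p;
    while (*p && *p != ':' && *p != '/') p++;
    int hl = static_cast<int>(p - h);
    if (hl >= hsz) hl = hsz - 1;
    if (hl < 0) hl = 0;
    std::memcpy(host, h, static_cast<std::size_t>(hl));
    host[hl] = '\0';
}

constexpr int kHostSize = 256;      // mirrors the 256-byte host buffer
constexpr int kGuard = 64;          // guard bytes on each side of it
constexpr char kSentinel = '\x7f';  // value the guards must still hold afterwards

// Invariant: parsing never writes outside host[0 .. kHostSize - 1] and always
// leaves a NUL-terminated string inside it.
bool host_write_stays_in_bounds(const std::string &url) {
    std::vector<char> region(kGuard + kHostSize + kGuard, kSentinel);
    char *host = region.data() + kGuard;
    url_parse(url.c_str(), host, kHostSize);
    for (int i = 0; i < kGuard; ++i) {
        if (region[i] != kSentinel) return false;                       // write before host
        if (region[kGuard + kHostSize + i] != kSentinel) return false;  // write after host
    }
    return std::memchr(host, '\0', kHostSize) != nullptr;
}

// Adversarial inputs in the categories described in the article.
std::vector<std::string> invariant_inputs() {
    return {
        "http://example.com/path",                         // valid URL
        "http://" + std::string(255, 'a') + "/",           // boundary: fits exactly
        "http://" + std::string(256, 'a') + "/",           // one byte too long
        "http://" + std::string(4096, 'x') + ".com/path",  // exploit attempt
        "https://:8080/",                                  // empty hostname
        "http://host%00name:80/",                          // special characters
    };
}
```

None of these inputs drives hl negative, so a harness like this mainly guards the upper clamp; the negative path is easier to cover by testing the length arithmetic directly.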

Prevention & Best Practices

To avoid buffer overflow vulnerabilities like this in C/C++ network code:

1. Always Validate Lengths Before Copying

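// BAD: Direct use of calculated length
int len = end - start;
memcpy(dest, start, len);   // a negative len becomes a huge size_t

// GOOD: Validate that the length is non-negative and leaves room for '\0'
ptrdiff_t len = end - start;
if (len < 0 || (size_t)len >= dest_size) {
    // Reject the input instead of copying
    return -1;
}
memcpy(dest, start, (size_t)len);
dest[len] = '\0';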
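The GOOD pattern can be packaged as a small helper so every call site gets the same checks. `copy_span` is a hypothetical name, not an existing API; it takes the destination size as size_t, rejects reversed or oversized ranges instead of silently truncating, and always NUL-terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies the half-open range [start, end) into dest as a C string.
// Both pointers must point into the same buffer. Returns false, leaving dest
// as an empty string, if the range is reversed or does not fit together
// with the terminating '\0'.
bool copy_span(char *dest, std::size_t dest_size, const char *start, const char *end) {
    if (dest == nullptr || dest_size == 0) return false;  // no room even for '\0'
    if (start == nullptr || end == nullptr || end < start) {
        dest[0] = '\0';
        return false;
    }
    std::size_t len = static_cast<std::size_t>(end - start);  // non-negative here
    if (len >= dest_size) {  // need one extra byte for the terminator
        dest[0] = '\0';
        return false;
    }
    std::memcpy(dest, start, len);
    dest[len] = '\0';
    return true;
}
```

Rejecting rather than truncating is a design choice: for hostnames, a silently shortened value can point a request at the wrong server, so failing loudly is usually the safer default.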

2. Use Safe String Functions

Prefer functions that take explicit size parameters:
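- strncpy() instead of strcpy() (strncpy() does not NUL-terminate the destination when the source fills it, so terminate it explicitly)
- snprintf() instead of sprintf()
- strncat() instead of strcat() (its count limits the characters appended, so it must account for what is already in the destination plus the terminator)

Better yet, use C++ std::string, which manages its own storage and grows as needed.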
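With std::string and std::string_view (C++17), the hostname can be extracted without a caller-sized buffer at all. The sketch below illustrates the approach and is not a drop-in replacement for url_parse(); `extract_host` and its default 255-character cap (mirroring the usable bytes of a 256-byte buffer) are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <optional>
#include <string>
#include <string_view>

// Extracts the hostname from a URL into a std::string. The string owns its
// storage, so there is no fixed-size buffer to overflow; max_len is a policy
// limit that rejects absurdly long hosts instead of truncating them.
std::optional<std::string> extract_host(std::string_view url, std::size_t max_len = 255) {
    // Strip an optional http:// or https:// prefix.
    for (std::string_view scheme : {std::string_view("http://"), std::string_view("https://")}) {
        if (url.substr(0, scheme.size()) == scheme) {
            url.remove_prefix(scheme.size());
            break;
        }
    }
    // The host ends at the first ':' (port) or '/' (path); npos means "to the end".
    std::string_view host = url.substr(0, url.find_first_of(":/"));
    if (host.size() > max_len) return std::nullopt;
    return std::string(host);
}
```

Because every length here is a size_t produced by the standard library, there is no signed int that could go negative on the way to a copy.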

3. Enable Compiler Security Features

Compile with protections enabled:

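g++ -O2 -fstack-protector-strong -D_FORTIFY_SOURCE=2 -Wformat -Wformat-security

These flags enable:
- Stack canaries: Detect stack buffer overflows at runtime, when the function returns
- FORTIFY_SOURCE: Add bounds checking to many standard library calls (memcpy(), strcpy(), sprintf(), and others) when the buffer size is known at compile time; it only takes effect with optimization enabled, hence -O2
- Format string protection: Warn about likely format string vulnerabilities at compile time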

4. Static Analysis Integration

Use tools to detect buffer overflow patterns:
- Clang Static Analyzer: Built into Clang, detects many memory safety issues
- Coverity: Commercial tool with deep taint analysis
- Semgrep: Open-source pattern matching for security issues
- CodeQL: GitHub's semantic code analysis engine

5. Fuzzing for Memory Safety

Integrate fuzzing into your CI/CD pipeline:

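# Using AFL++ to fuzz URL parsing; url_parser is an instrumented harness (e.g. built
# with afl-clang-fast++), and AFL++ replaces @@ with the path of each generated input
afl-fuzz -i testcases/ -o findings/ ./url_parser @@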

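Fuzzing, ideally with AddressSanitizer enabled (-fsanitize=address), is well suited to surfacing out-of-bounds writes in parsers like this one by generating URLs with extreme lengths and malformed structure.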
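A fuzz target for the parser can also be written as a libFuzzer-style entry point, which libFuzzer runs directly and which AFL++ documents support for as well (check its documentation for the exact build flags). The sketch embeds a simplified, fixed url_parse() with a hypothetical signature; built with -fsanitize=address, any write outside host would abort the run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Simplified url_parse() with the fix applied (hypothetical signature).
static void url_parse(const char *url, char *host, int hsz) {
    const char *p = url;
    if (std::strncmp(p, "http://", 7) == 0) p += 7;
    else if (std::strncmp(p, "https://", 8) == 0) p += 8;
    const char *h = p;
    while (*p && *p != ':' && *p != '/') p++;
    int hl = static_cast<int>(p - h);
    if (hl >= hsz) hl = hsz - 1;
    if (hl < 0) hl = 0;
    std::memcpy(host, h, static_cast<std::size_t>(hl));
    host[hl] = '\0';
}

// libFuzzer-style entry point: the fuzzer supplies arbitrary bytes, which are
// copied into a std::string so the parser always sees a NUL-terminated URL.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t *data, std::size_t size) {
    std::string url(reinterpret_cast<const char *>(data), size);
    char host[256];  // stack buffer matching the typical 256-byte host buffer
    url_parse(url.c_str(), host, static_cast<int>(sizeof host));
    assert(std::strlen(host) < sizeof host);  // result is terminated inside the buffer
    return 0;
}
```

Fuzzer input sizes are usually capped far below INT_MAX bytes, so a harness like this exercises the upper clamp and malformed structure rather than the wrap-around path.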

6. Follow OWASP Guidelines

Refer to:
- OWASP C-Based Toolchain Hardening Cheat Sheet: Compiler flags and security options
- CWE-121: Stack-based Buffer Overflow prevention techniques
- CERT C Coding Standard: STR31-C (Guarantee that storage for strings has sufficient space)

Key Takeaways

  • Never trust pointer arithmetic with untrusted input: The hl = p - h calculation in url_parse() derived a length from user-controlled URL data without validating it could be negative before using it in memcpy().

  • Bounds checking must include negative values: The original code checked if (hl >= hsz) but missed the critical if (hl < 0) check, allowing a negative int to be converted into a massive size_t copy length.

  • URL parsing is a critical attack surface: The source/net/http.cpp file handles network input, making it a prime target for exploitation, so any memory safety issue here can lead to remote code execution.

  • Similar patterns need review: The PR identified potentially vulnerable patterns at lines 40, 159, 297, and 324 in the same file, highlighting how buffer overflow vulnerabilities often cluster in codebases.

  • Regression tests preserve security fixes: The added test_invariant_http.cpp test with adversarial inputs (4096-character hostnames) ensures this vulnerability can't be accidentally reintroduced in future refactoring.

How Orbis AppSec Detected This

  • Source: Untrusted URL string from network input (HTTP request)
  • Sink: memcpy(host, h, hl) at source/net/http.cpp:38 with unchecked length parameter
  • Missing control: No validation that the calculated length hl = p - h is non-negative before casting to size_t for memcpy()
  • CWE: CWE-121 (Stack-based Buffer Overflow) / CWE-787 (Out-of-bounds Write)
  • Fix: Added explicit negative value check (if (hl < 0) hl = 0;) before the memcpy operation so a negative length can never be converted into a huge size_t

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.

Conclusion

This critical buffer overflow in the HTTP client's URL parsing function demonstrates how seemingly simple pointer arithmetic can create severe security vulnerabilities when handling untrusted input. The vulnerability, caused by missing validation that a calculated length could be negative, could have allowed remote code execution through a single malicious URL. The fix is simple (adding if (hl < 0) hl = 0;), yet it prevents a catastrophic security failure.

The broader lesson: memory safety in C/C++ requires constant vigilance. Every buffer copy, every pointer calculation, and every length validation must be scrutinized. Modern tools like static analyzers, fuzzing, and automated security scanning can catch these issues before they reach production. But ultimately, secure coding practices—validating all lengths, using safe APIs, and thinking adversarially about input—remain the foundation of memory-safe code.


Frequently Asked Questions

What is a stack-based buffer overflow?

A stack-based buffer overflow occurs when a program writes more data to a buffer located on the stack than it was allocated to hold, potentially overwriting adjacent memory including return addresses and enabling arbitrary code execution.

How do you prevent buffer overflow in C++?

Prevent buffer overflows by validating all input lengths before copying, using bounded string functions (strncpy, snprintf) carefully, preferring std::string over char arrays, enabling compiler and OS protections (stack canaries, ASLR), and bounds-checking all pointer arithmetic.

What CWE is buffer overflow in URL parsing?

This is CWE-121 (Stack-based Buffer Overflow), a specific form of CWE-787 (Out-of-bounds Write). It is also closely related to CWE-120 (Buffer Copy without Checking Size of Input).

Is input validation enough to prevent buffer overflow?

Input validation is necessary but not sufficient alone. You must also validate buffer sizes at copy time, use safe APIs that enforce bounds, and implement defense-in-depth with compiler protections like stack canaries and address space layout randomization (ASLR).

Can static analysis detect buffer overflow vulnerabilities?

Yes, modern static analysis tools can detect many buffer overflow patterns by tracking buffer sizes, analyzing pointer arithmetic, and identifying unsafe functions like strcpy() and memcpy() with unchecked lengths. However, complex cases may require dynamic analysis or fuzzing.

View the Security Fix

The pull request that fixed this vulnerability is PR #14.
