High severity · 6 min read

How unbounded input size denial-of-service happens in C lexer functions and how to fix it

A high-severity denial-of-service vulnerability was discovered in the PH7 lexer where the `PH7_TokenizePHP()` function accepted arbitrarily large input sizes without validation. An attacker could submit gigabyte-scale PHP code, causing proportional CPU and memory exhaustion. The fix introduces a configurable input size cap enforced before lexer processing begins.

By Orbis AppSec
Published June 23, 2026 · Reviewed June 23, 2026

Answer Summary

This is an unbounded input size denial-of-service vulnerability (CWE-400) in the C-based PH7 PHP lexer. The `PH7_TokenizePHP()` function accepted an `nLen` parameter of up to ~4GB with no upper-bound validation, allowing attackers to exhaust system resources. The fix adds a `PH7_MAX_INPUT_SIZE` constant and validates input length in `ProcessScript()` before invoking the lexer, rejecting oversized input with a compile error.

Vulnerability at a Glance

CWE: CWE-400 (Uncontrolled Resource Consumption)
Fix: Added input size validation in ProcessScript() with configurable PH7_MAX_INPUT_SIZE limit
Risk: System resource exhaustion leading to service unavailability
Language: C
Root cause: No upper bound validation on nLen parameter before lexer processing
Vulnerability: Unbounded Input Size Denial-of-Service

Introduction

In the PH7 PHP interpreter codebase, we discovered a high-severity denial-of-service vulnerability in src/ph7/lex.c. The PH7_TokenizePHP() function, which handles lexical analysis of PHP source code, accepted input sizes up to approximately 4GB through its nLen parameter, with no upper-bound validation.

This matters because the lexer iterates over every byte of input, allocating a SyToken structure for each token into the pOut SySet. When an attacker submits gigabyte-scale input, the system consumes proportional CPU time and memory, potentially crashing the application or making it unresponsive.

The vulnerability was particularly dangerous because the public API functions ph7_compile() and ph7_compile_v2() passed caller-supplied nLen values directly to the tokenizer without a size cap.

The Vulnerability Explained

The root cause lies in how the PH7 compilation pipeline handled input sizes. When a user calls ph7_compile() with PHP source code, the input length flows through ProcessScript() and eventually reaches PH7_TokenizePHP().

The nLen parameter is defined as sxu32 (an unsigned 32-bit integer), which means it can represent values up to approximately 4.29 billion bytes (~4GB). The lexer then processes this input byte-by-byte:

// Conceptual flow (not the actual PH7 source): the lexer walks the entire input
while (unread bytes remain in the nLen-byte input) {
    // Scan the next token
    // Allocate a SyToken structure for it
    // Add the token to the pOut SySet
}

The Attack Scenario

An attacker with access to the PH7 compilation API—either through direct library calls or via an embedding application that exposes compilation functionality—could exploit this vulnerability:

  1. Craft malicious input: Create a 1GB+ PHP source file containing repeated valid tokens (e.g., millions of $a variable references)
  2. Submit for compilation: Pass this oversized input to ph7_compile() or ph7_compile_v2()
  3. Resource exhaustion: The lexer attempts to process every byte, allocating memory for each token
  4. System impact: CPU spikes to 100%, memory consumption grows unbounded, system becomes unresponsive

For a local CLI tool like PH7, exploitation requires the attacker to control command-line arguments or input files. However, in embedded scenarios where PH7 processes untrusted PHP code, this becomes a serious availability risk.
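
For regression testing, the oversized input described in the attack scenario can be generated in memory rather than stored as a file. The following is a minimal sketch; the helper name `make_repeated_input` and the `"$a;"` fragment are illustrative assumptions and are not part of PH7.

```c
#include <stdlib.h>

/* Hypothetical test helper (not part of PH7): build a buffer of `len` bytes
 * by repeating a short, lexically valid PHP fragment. The caller frees it.
 * Returns NULL if len is 0 or allocation fails. The buffer is NOT
 * NUL-terminated, so the length must be passed alongside the pointer. */
char *make_repeated_input(size_t len)
{
    static const char unit[] = "$a;";          /* one variable reference */
    const size_t unit_len = sizeof(unit) - 1;  /* exclude the trailing NUL */
    char *buf;
    size_t i;

    if (len == 0) {
        return NULL;
    }
    buf = malloc(len);
    if (buf == NULL) {
        return NULL;
    }
    for (i = 0; i < len; i++) {
        buf[i] = unit[i % unit_len];
    }
    return buf;
}
```

A test would typically request a size just above the configured limit and check that compilation is rejected quickly, rather than allocating a full gigabyte.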

Real-World Impact

The impact is denial-of-service through resource exhaustion:
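- Memory exhaustion: Each token requires a SyToken allocation, so memory use grows in proportion to the token count; gigabyte-scale input made of short tokens can require gigabytes of token storage
- CPU exhaustion: Linear time complexity means processing 1GB takes roughly 1,000 times longer than processing 1MB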
- Service unavailability: The application hangs or crashes, affecting all users
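
To make the proportions concrete, the token storage can be estimated with simple arithmetic. The sketch below is a back-of-envelope helper, not PH7 code; the average token length and the per-token struct size are assumptions the caller supplies, since the real size of SyToken depends on the build.

```c
#include <stdint.h>

/* Rough upper bound on token storage for a given input size.
 * Both avg_token_len and token_struct_size are assumptions for
 * illustration. Returns UINT64_MAX if avg_token_len is 0 or if the
 * multiplication would overflow. */
uint64_t estimate_token_bytes(uint64_t input_len,
                              uint64_t avg_token_len,
                              uint64_t token_struct_size)
{
    uint64_t tokens;

    if (avg_token_len == 0) {
        return UINT64_MAX;
    }
    tokens = input_len / avg_token_len;
    if (token_struct_size != 0 && tokens > UINT64_MAX / token_struct_size) {
        return UINT64_MAX;
    }
    return tokens * token_struct_size;
}
```

For example, 1 GiB of input with an assumed 2-byte average token and an assumed 32-byte token struct gives 2^29 tokens and 16 GiB of token storage, which is why short repeated tokens are the worst case.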

The Fix

The fix introduces input size validation at the earliest possible point—before the lexer ever touches the data. Here's what changed in src/ph7/api.c:

New Configuration Option

A new configuration verb PH7_CONFIG_MAX_INPUT was added to allow runtime customization of the limit:

case PH7_CONFIG_MAX_INPUT: {
    /* Per-compile input byte cap (0 = use PH7_MAX_INPUT_SIZE default). */
    unsigned int nMax = va_arg(ap,unsigned int);
    pEngine->xConf.nMaxInput = (sxu32)nMax;
    break;
}
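
The verb-plus-argument shape of this configuration call can be seen in isolation in the sketch below. The names `demo_engine`, `demo_config`, and `DEMO_CONFIG_MAX_INPUT` are hypothetical stand-ins chosen so the example compiles on its own; they are not the PH7 types or API.

```c
#include <stdarg.h>
#include <stdint.h>

/* Hypothetical stand-ins for the engine and its config verb; these are
 * NOT the PH7 types. They only model the variadic "verb + argument" shape. */
enum { DEMO_CONFIG_MAX_INPUT = 1 };

typedef struct {
    uint32_t max_input;   /* 0 means "use the compiled-in default" */
} demo_engine;

/* Returns 0 on success, -1 for an unknown verb. */
int demo_config(demo_engine *engine, int verb, ...)
{
    va_list ap;
    int rc = 0;

    va_start(ap, verb);
    switch (verb) {
    case DEMO_CONFIG_MAX_INPUT:
        /* The caller must pass an unsigned int here: variadic arguments
         * carry no type information, so a mismatch is undefined behavior. */
        engine->max_input = (uint32_t)va_arg(ap, unsigned int);
        break;
    default:
        rc = -1;
        break;
    }
    va_end(ap);
    return rc;
}
```

A call such as `demo_config(&engine, DEMO_CONFIG_MAX_INPUT, 65536u)` sets a 64 KB cap. Because the argument type is not checked by the compiler, passing a `size_t` or `long` on a platform where those are wider than `unsigned int` would be a caller bug.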

Input Validation in ProcessScript()

The critical fix adds size validation in ProcessScript() before any compilation occurs:

/* Enforce input size cap before touching the lexer/compiler */
{
    sxu32 nLimit = pEngine->xConf.nMaxInput ? pEngine->xConf.nMaxInput : PH7_MAX_INPUT_SIZE;
    if( SyStringLength(pScript) > nLimit ){
        PH7_GenCompileError(&pVm->sCodeGen,E_ERROR,1,
            "Input size (%u bytes) exceeds the configured limit (%u bytes)",
            SyStringLength(pScript),nLimit);
    }
}
/* Compile the script */
if( pVm->sCodeGen.nErr == 0 ){
    PH7_CompileScript(pVm,&(*pScript),iFlags);
}
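
The decision logic in that check can be modeled as two small pure functions, which makes the boundary behavior easy to unit test. This is a standalone sketch: `DEMO_MAX_INPUT_SIZE` is an assumed value for illustration, since the actual value of PH7_MAX_INPUT_SIZE is not shown here, and the function names are hypothetical.

```c
#include <stdint.h>

/* Assumed default for illustration only; not the real PH7_MAX_INPUT_SIZE. */
#define DEMO_MAX_INPUT_SIZE (1024u * 1024u)

/* 0 means "not configured", so fall back to the compiled-in default. */
uint32_t effective_limit(uint32_t configured)
{
    return configured != 0 ? configured : DEMO_MAX_INPUT_SIZE;
}

/* Returns 1 if the input may proceed to the lexer, 0 if it must be rejected.
 * An input exactly at the limit is accepted, matching the `>` comparison. */
int input_size_ok(uint32_t input_len, uint32_t configured)
{
    return input_len <= effective_limit(configured);
}
```

One consequence of reserving 0 for "use the default" is that the option cannot express "no limit"; the largest configurable cap is the maximum value of the 32-bit type.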

Before vs. After

Before (Vulnerable):

/* Reset the error message consumer */
SyBlobReset(&pEngine->xConf.sErrConsumer);
/* Compile the script - NO SIZE CHECK! */
PH7_CompileScript(pVm,&(*pScript),iFlags);

After (Fixed):

/* Reset the error message consumer */
SyBlobReset(&pEngine->xConf.sErrConsumer);
/* Enforce input size cap before touching the lexer/compiler */
{
    sxu32 nLimit = pEngine->xConf.nMaxInput ? pEngine->xConf.nMaxInput : PH7_MAX_INPUT_SIZE;
    if( SyStringLength(pScript) > nLimit ){
        PH7_GenCompileError(&pVm->sCodeGen,E_ERROR,1,
            "Input size (%u bytes) exceeds the configured limit (%u bytes)",
            SyStringLength(pScript),nLimit);
    }
}
/* Compile the script */
if( pVm->sCodeGen.nErr == 0 ){
    PH7_CompileScript(pVm,&(*pScript),iFlags);
}

Test Infrastructure Update

The Makefile was also updated to include the new PHL_MAX_INPUT environment variable for stress testing:

-TEST_STRESS_CMD = PHL_MAX_ALLOC=1048576 "$(PHL_BIN)" "tests/phpt.php" \
+TEST_STRESS_CMD = PHL_MAX_ALLOC=1048576 PHL_MAX_INPUT=32768 "$(PHL_BIN)" "tests/phpt.php" \

This ensures stress tests run with a 32KB (32,768-byte) input limit, so oversized inputs cannot stall the test infrastructure.
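
How the test binary reads PHL_MAX_INPUT is not shown here. As a hedged sketch of one way to do it, the helper below parses a decimal byte limit strictly and falls back to 0 (meaning "use the default") on bad input; `parse_limit` and `limit_from_env` are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parse a decimal byte limit such as "32768".
 * Returns 0 (meaning "use the default") for NULL, empty, malformed,
 * signed, or out-of-range text. */
uint32_t parse_limit(const char *text)
{
    char *end = NULL;
    unsigned long value;

    /* Require a leading digit so strtoul never sees whitespace or a sign. */
    if (text == NULL || *text < '0' || *text > '9') {
        return 0;
    }
    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno != 0 || *end != '\0' || value > UINT32_MAX) {
        return 0;
    }
    return (uint32_t)value;
}

/* Read the limit from the environment variable named in the Makefile. */
uint32_t limit_from_env(void)
{
    return parse_limit(getenv("PHL_MAX_INPUT"));
}
```

Rejecting a leading sign matters because `strtoul` accepts "-1" and wraps it to a very large value, which would silently turn a typo into an effectively unlimited cap.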

Prevention & Best Practices

1. Validate Input Sizes Early

Always check input sizes at the API boundary, before any processing begins:

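#include <stddef.h>  /* size_t */

#define MAX_SAFE_INPUT_SIZE (1024 * 1024)  /* 1MB */
#define ERROR_INPUT_TOO_LARGE (-1)         /* illustrative error code */

int process_input(const char *data, size_t len) {
    if (len > MAX_SAFE_INPUT_SIZE) {
        return ERROR_INPUT_TOO_LARGE;
    }
    /* Now safe to process data[0..len-1] */
    (void)data;
    return 0;
}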

2. Make Limits Configurable

Different deployments have different resource constraints. Provide configuration options:

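/* Allow runtime configuration with sensible defaults.
   config_get_max_input() and DEFAULT_MAX_INPUT are placeholders
   for your application's own config accessor and default value. */
size_t max_input = config_get_max_input();
if (max_input == 0) {
    max_input = DEFAULT_MAX_INPUT;
}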

3. Document Resource Limits

Clearly document the limits in your API documentation so integrators know what to expect.

4. Use Static Analysis

Tools like Semgrep can flag calls that pass a size argument with no preceding upper-bound check. A rule along these lines is a starting point; it is deliberately broad, so expect false positives and tune it to your codebase:

rules:
  - id: unbounded-size-parameter
    languages: [c]
    severity: WARNING
    message: "Size parameter used without upper bound check"
    patterns:
      - pattern: $FUNC(..., $SIZE, ...)
      - pattern-not-inside: |
          if ($SIZE > $LIMIT) { ... }
          ...

Key Takeaways

  • Never pass user-controlled size parameters to resource-intensive functions without validation—the nLen parameter in PH7_TokenizePHP() should have been validated before reaching the lexer
  • Validate at the API boundary—the fix in ProcessScript() catches oversized input before PH7_CompileScript() is ever called
  • Make security limits configurable—the new PH7_CONFIG_MAX_INPUT option allows different deployments to set appropriate limits
  • Linear-time algorithms become DoS vectors with unbounded input—even O(n) complexity is dangerous when n can be 4 billion
  • Test with adversarial input sizes—the regression test explicitly checks behavior with 1MB input and verifies bounded execution time

How Orbis AppSec Detected This

  • Source: Caller-supplied nLen parameter passed to ph7_compile() and ph7_compile_v2() public API functions
  • Sink: PH7_TokenizePHP() function in src/ph7/lex.c:1100 which iterates over input and allocates tokens proportional to input size
  • Missing control: No upper bound validation on input size before lexer processing
  • CWE: CWE-400 (Uncontrolled Resource Consumption)
  • Fix: Added input size validation in ProcessScript() that compares input length against PH7_MAX_INPUT_SIZE (or configured limit) and generates a compile error for oversized input

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.

Conclusion

This vulnerability demonstrates why input validation must happen at the earliest possible point in your processing pipeline. The PH7 lexer's acceptance of unbounded input sizes created a denial-of-service risk that could exhaust system resources with a single malicious compilation request.

The fix is straightforward but effective: validate input size before any expensive operations begin, provide sensible defaults, and allow configuration for different deployment scenarios. By catching oversized input in ProcessScript() before it reaches PH7_CompileScript(), the system now fails fast with a clear error message instead of slowly consuming all available resources.

When building systems that process untrusted input—especially compilers, parsers, and interpreters—always ask: "What happens if someone sends me 4GB of data?" If the answer involves proportional resource consumption, you need bounds checking.

Frequently Asked Questions

What is unbounded input size denial-of-service?

A vulnerability where an application accepts arbitrarily large input without size limits, allowing attackers to exhaust CPU, memory, or other system resources by submitting oversized data.

How do you prevent unbounded input DoS in C?

Validate input sizes against defined upper bounds before processing, use configurable limits, and reject oversized input early with appropriate error messages.

What CWE is unbounded input denial-of-service?

CWE-400 (Uncontrolled Resource Consumption) covers this class of vulnerability where resource consumption is not properly limited.

Is checking input size at the API boundary enough to prevent this?

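For this class of issue, yes. Validating input size before any resource-intensive processing begins is the most effective mitigation, because it prevents the expensive operations from ever starting. The check only bounds cost in proportion to the limit, so the limit itself must be chosen to fit the deployment's CPU and memory budget.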

Can static analysis detect unbounded input vulnerabilities?

Yes, static analyzers can flag functions that accept size parameters without validation, especially when those parameters flow into loops or memory allocations.

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #262
