What is unbounded input size denial-of-service?

A vulnerability where an application accepts arbitrarily large input without size limits, allowing attackers to exhaust CPU, memory, or other system resources by submitting oversized data.

How do you prevent unbounded input DoS in C?

Validate input sizes against defined upper bounds before processing, use configurable limits, and reject oversized input early with appropriate error messages.

What CWE is unbounded input denial-of-service?

CWE-400 (Uncontrolled Resource Consumption) covers this class of vulnerability where resource consumption is not properly limited.

Is checking input size at the API boundary enough to prevent this?

Yes, for this class of bug: validating input size before any resource-intensive processing begins is the most effective mitigation, because the expensive operations never start.

Can static analysis detect unbounded input vulnerabilities?

Yes, static analyzers can flag functions that accept size parameters without validation, especially when those parameters flow into loops or memory allocations.

PH7 Lexer DoS Input Size Limit Fix

Introduction

In the PH7 PHP interpreter codebase, we discovered a high-severity denial-of-service vulnerability in src/ph7/lex.c. The PH7_TokenizePHP() function, which handles lexical analysis of PHP source code, accepted any input length its nLen parameter could express (just under 4 GiB), with no upper-bound validation.

This matters because the lexer scans every byte of input and allocates a SyToken structure for each token it appends to the pOut SySet. When an attacker submits gigabyte-scale input, CPU time and memory use grow in proportion to the input size, potentially crashing the application or making it unresponsive.

The vulnerability was particularly dangerous because the public API functions ph7_compile() and ph7_compile_v2() passed caller-supplied nLen values directly to the tokenizer without any size cap whatsoever.

The Vulnerability Explained

The root cause lies in how the PH7 compilation pipeline handled input sizes. When a user calls ph7_compile() with PHP source code, the input length flows through ProcessScript() and eventually reaches PH7_TokenizePHP().

The nLen parameter is defined as sxu32 (an unsigned 32-bit integer), which means it can represent values up to 4,294,967,295 bytes (just under 4 GiB). The lexer then processes this input byte-by-byte:

// Conceptual flow: the lexer walks the entire input, one token at a time
while (unread bytes remain in the nLen-byte input) {
    // Scan the next token (one or more bytes)
    // Allocate a SyToken structure for it
    // Append the token to the pOut SySet
}

The Attack Scenario

An attacker with access to the PH7 compilation API—either through direct library calls or via an embedding application that exposes compilation functionality—could exploit this vulnerability:

1. Craft malicious input: Create a 1GB+ PHP source file containing repeated valid tokens (e.g., millions of $a variable references)
2. Submit for compilation: Pass this oversized input to ph7_compile() or ph7_compile_v2()
3. Resource exhaustion: The lexer attempts to process every byte, allocating memory for each token
4. System impact: CPU spikes to 100%, memory consumption grows unbounded, system becomes unresponsive

When PH7 is used as a local CLI tool, exploitation requires the attacker to control command-line arguments or input files. In embedded scenarios where PH7 processes untrusted PHP code, however, this becomes a serious availability risk.
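For defensive testing, the oversized input described in the attack scenario can be generated in memory instead of being stored on disk. The following is a minimal sketch under stated assumptions: `make_repeated_script` is a hypothetical helper (not part of PH7), and the `<?php ` opening tag and the `$a;` filler are assumptions about what the lexer accepts as valid tokens.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical test helper (not part of PH7): build a script of at least
 * `target` bytes made of "<?php " followed by repeated "$a;" units.
 * Returns a malloc'ed, NUL-terminated buffer that the caller must free,
 * or NULL on failure. The exact length is stored in *out_len. */
static char *make_repeated_script(size_t target, size_t *out_len)
{
    static const char head[] = "<?php ";
    static const char unit[] = "$a;";
    const size_t head_len = sizeof(head) - 1;
    const size_t unit_len = sizeof(unit) - 1;

    assert(out_len != NULL);

    /* Round the body up to a whole number of units. */
    size_t body = target > head_len ? target - head_len : 0;
    size_t units = (body + unit_len - 1) / unit_len;
    if (units > (SIZE_MAX - head_len - 1) / unit_len)
        return NULL; /* the total length would overflow size_t */
    size_t len = head_len + units * unit_len;

    char *buf = malloc(len + 1);
    if (buf == NULL)
        return NULL;
    memcpy(buf, head, head_len);
    for (size_t i = 0; i < units; i++)
        memcpy(buf + head_len + i * unit_len, unit, unit_len);
    buf[len] = '\0';
    *out_len = len;
    return buf;
}
```

A test would pass the returned buffer and length to the compile entry point and check that the call is rejected once a limit is in place. Start with small targets (a few megabytes) so the test itself does not exhaust the machine.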

Real-World Impact

The impact is denial-of-service through resource exhaustion:
- Memory exhaustion: Each token requires a SyToken allocation, so token storage grows with the input; gigabyte-scale input can demand gigabytes of memory
- CPU exhaustion: Linear time complexity means processing 1GB takes roughly 1,000 times longer than processing 1MB
- Service unavailability: The application hangs or crashes, affecting all users
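To make the scaling concrete, here is a rough back-of-the-envelope estimate of token storage for a given input size. The 32-byte token record and the density of one token per 3 input bytes are illustrative assumptions, not measurements of PH7's SyToken, and `estimate_token_bytes` is a hypothetical helper.

```c
#include <stdint.h>

/* Illustrative assumptions, not measured PH7 values. */
#define ASSUMED_TOKEN_SIZE      32u /* bytes per token record */
#define ASSUMED_BYTES_PER_TOKEN 3u  /* input bytes consumed per token */

/* Estimate the bytes of token storage needed to lex `input_len` bytes. */
static uint64_t estimate_token_bytes(uint64_t input_len)
{
    uint64_t tokens = input_len / ASSUMED_BYTES_PER_TOKEN;
    return tokens * ASSUMED_TOKEN_SIZE;
}
```

Under these assumptions, a 1 GiB input (1,073,741,824 bytes) yields about 358 million tokens and more than 10 GiB of token storage, so the token array can outgrow the input itself by an order of magnitude.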

The Fix

The fix introduces input size validation at the earliest possible point—before the lexer ever touches the data. Here's what changed in src/ph7/api.c:

New Configuration Option

A new configuration verb PH7_CONFIG_MAX_INPUT was added to allow runtime customization of the limit:

case PH7_CONFIG_MAX_INPUT: {
    /* Per-compile input byte cap (0 = use PH7_MAX_INPUT_SIZE default). */
    unsigned int nMax = va_arg(ap,unsigned int);
    pEngine->xConf.nMaxInput = (sxu32)nMax;
    break;
}
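The verb-plus-variadic-argument style used above can be tried in isolation. The sketch below is a self-contained model, not PH7 code: `demo_engine`, `demo_config`, `DEMO_CONFIG_MAX_INPUT`, and the 1 MiB default are all hypothetical names and values chosen for illustration.

```c
#include <stdarg.h>
#include <stdint.h>

#define DEMO_DEFAULT_MAX_INPUT (1024u * 1024u) /* assumed default: 1 MiB */

enum { DEMO_CONFIG_MAX_INPUT = 1 }; /* hypothetical configuration verb */

typedef struct {
    uint32_t max_input; /* 0 means "use the default" */
} demo_engine;

/* Variadic configuration entry point in the verb-plus-arguments style.
 * Returns 0 on success, -1 for an unknown verb. */
static int demo_config(demo_engine *eng, int verb, ...)
{
    int rc = 0;
    va_list ap;
    va_start(ap, verb);
    switch (verb) {
    case DEMO_CONFIG_MAX_INPUT:
        /* The caller must pass an unsigned int here. */
        eng->max_input = (uint32_t)va_arg(ap, unsigned int);
        break;
    default:
        rc = -1;
        break;
    }
    va_end(ap);
    return rc;
}

/* Resolve the limit that applies to the next compile. */
static uint32_t demo_effective_limit(const demo_engine *eng)
{
    return eng->max_input ? eng->max_input : DEMO_DEFAULT_MAX_INPUT;
}
```

One design point worth noting: because the argument is fetched with `va_arg(ap, unsigned int)`, callers must pass a value of exactly that type. Passing a `size_t` or a `long` on a platform where those are wider is undefined behavior, so an explicit cast at the call site is a sensible habit.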

Input Validation in ProcessScript()

The critical fix adds size validation in ProcessScript() before any compilation occurs:

/* Enforce input size cap before touching the lexer/compiler */
{
    sxu32 nLimit = pEngine->xConf.nMaxInput ? pEngine->xConf.nMaxInput : PH7_MAX_INPUT_SIZE;
    if( SyStringLength(pScript) > nLimit ){
        PH7_GenCompileError(&pVm->sCodeGen,E_ERROR,1,
            "Input size (%u bytes) exceeds the configured limit (%u bytes)",
            SyStringLength(pScript),nLimit);
    }
}
/* Compile the script */
if( pVm->sCodeGen.nErr == 0 ){
    PH7_CompileScript(pVm,&(*pScript),iFlags);
}
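The snippet above does not return early. It records a compile error and lets the existing error counter gate the compile step. A minimal self-contained model of that control flow follows; `demo_vm`, `demo_compile`, and `demo_process_script` are hypothetical stand-ins, not PH7 functions.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int  n_err;         /* number of errors recorded so far */
    char last_err[128]; /* most recent error message */
    int  compiled;      /* set to 1 if the stub compile step ran */
} demo_vm;

/* Stand-in for the expensive lexer/compiler stage. */
static void demo_compile(demo_vm *vm)
{
    vm->compiled = 1;
}

/* Record an error for oversized input, then compile only if the VM is
 * still error-free. Returns 0 on success, -1 if any error was recorded. */
static int demo_process_script(demo_vm *vm, uint32_t script_len, uint32_t limit)
{
    if (script_len > limit) {
        snprintf(vm->last_err, sizeof vm->last_err,
                 "Input size (%u bytes) exceeds the configured limit (%u bytes)",
                 (unsigned)script_len, (unsigned)limit);
        vm->n_err++;
    }
    if (vm->n_err == 0)
        demo_compile(vm);
    return vm->n_err == 0 ? 0 : -1;
}
```

Note the boundary: with a strict `>` comparison, input whose length equals the limit is still accepted, which matches the comparison in the fix.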

Before vs. After

Before (Vulnerable):

/* Reset the error message consumer */
SyBlobReset(&pEngine->xConf.sErrConsumer);
/* Compile the script - NO SIZE CHECK! */
PH7_CompileScript(pVm,&(*pScript),iFlags);

After (Fixed):

/* Reset the error message consumer */
SyBlobReset(&pEngine->xConf.sErrConsumer);
/* Enforce input size cap before touching the lexer/compiler */
{
    sxu32 nLimit = pEngine->xConf.nMaxInput ? pEngine->xConf.nMaxInput : PH7_MAX_INPUT_SIZE;
    if( SyStringLength(pScript) > nLimit ){
        PH7_GenCompileError(&pVm->sCodeGen,E_ERROR,1,
            "Input size (%u bytes) exceeds the configured limit (%u bytes)",
            SyStringLength(pScript),nLimit);
    }
}
/* Compile the script */
if( pVm->sCodeGen.nErr == 0 ){
    PH7_CompileScript(pVm,&(*pScript),iFlags);
}

Test Infrastructure Update

The Makefile was also updated to include the new PHL_MAX_INPUT environment variable for stress testing:

-TEST_STRESS_CMD = PHL_MAX_ALLOC=1048576 "$(PHL_BIN)" "tests/phpt.php" \
+TEST_STRESS_CMD = PHL_MAX_ALLOC=1048576 PHL_MAX_INPUT=32768 "$(PHL_BIN)" "tests/phpt.php" \

This ensures stress tests run with a 32 KiB (32,768-byte) input limit, so the test infrastructure itself is not affected by oversized inputs.
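The article does not show how the test binary reads PHL_MAX_INPUT. As one possible approach, a limit can be parsed from an environment variable with strict validation, falling back to a default when the value is missing or malformed. `parse_limit` and `limit_from_env` below are hypothetical helpers.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal byte limit. Returns 0 and stores the value in *out on
 * success; returns -1 if the text is missing, does not start with a digit,
 * has trailing characters, or does not fit in 32 bits. */
static int parse_limit(const char *text, uint32_t *out)
{
    if (text == NULL || !isdigit((unsigned char)text[0]))
        return -1;
    char *end = NULL;
    errno = 0;
    unsigned long long v = strtoull(text, &end, 10);
    if (errno != 0 || *end != '\0' || v > UINT32_MAX)
        return -1;
    *out = (uint32_t)v;
    return 0;
}

/* Read a limit from the named environment variable, or use `fallback`
 * when the variable is unset or invalid. */
static uint32_t limit_from_env(const char *name, uint32_t fallback)
{
    uint32_t v;
    return parse_limit(getenv(name), &v) == 0 ? v : fallback;
}
```

Requiring a leading digit rejects signs and whitespace up front, which avoids the surprising way `strtoull` wraps negative input into large positive values.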

Prevention & Best Practices

1. Validate Input Sizes Early

Always check input sizes at the API boundary, before any processing begins:

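#include <stddef.h>

#define MAX_SAFE_INPUT_SIZE   (1024 * 1024)  /* 1 MiB */
#define ERROR_INPUT_TOO_LARGE (-1)           /* illustrative error code */

int process_input(const char *data, size_t len) {
    if (len > MAX_SAFE_INPUT_SIZE) {
        return ERROR_INPUT_TOO_LARGE;  /* reject before doing any work */
    }
    /* Now safe to process data[0..len) */
    (void)data;
    return 0;
}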
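The same principle applies one step earlier, when input is read from a file or stream: reading at most one byte past the limit is enough to detect oversized input without buffering all of it. The following is a sketch; `read_bounded` is a hypothetical helper, and it allocates the full limit up front on the assumption that the limit is small.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read at most `limit` bytes from `fp`. On success returns a malloc'ed
 * buffer (caller frees) and stores the byte count in *out_len. Returns
 * NULL if the stream holds more than `limit` bytes, or on a read or
 * allocation error. */
static char *read_bounded(FILE *fp, size_t limit, size_t *out_len)
{
    if (limit == SIZE_MAX)
        return NULL; /* limit + 1 would overflow */
    char *buf = malloc(limit + 1);
    if (buf == NULL)
        return NULL;
    /* Ask for one extra byte: getting it means the input is too large. */
    size_t n = fread(buf, 1, limit + 1, fp);
    if (n > limit || ferror(fp)) {
        free(buf);
        return NULL;
    }
    *out_len = n;
    return buf;
}
```

Checking the length this way keeps memory use capped at `limit + 1` bytes no matter how large the file is.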

2. Make Limits Configurable

Different deployments have different resource constraints. Provide configuration options:

// Allow runtime configuration with sensible defaults
size_t max_input = config_get_max_input();
if (max_input == 0) {
    max_input = DEFAULT_MAX_INPUT;
}

3. Document Resource Limits

Clearly document the limits in your API documentation so integrators know what to expect.

4. Use Static Analysis

Tools like Semgrep can flag functions that accept size parameters without validation. Create rules to catch patterns like:

rules:
  - id: unbounded-size-parameter
    languages: [c]
    severity: WARNING
    message: "Size parameter used without upper bound check"
    patterns:
      - pattern: $FUNC(..., $SIZE, ...)
      - pattern-not-inside: |
          if ($SIZE > $LIMIT) { ... }
          ...

Key Takeaways

- Never pass user-controlled size parameters to resource-intensive functions without validation: the nLen parameter in PH7_TokenizePHP() should have been validated before reaching the lexer
- Validate at the API boundary: the fix in ProcessScript() catches oversized input before PH7_CompileScript() is ever called
- Make security limits configurable: the new PH7_CONFIG_MAX_INPUT option allows different deployments to set appropriate limits
- Linear-time algorithms become DoS vectors with unbounded input: even O(n) complexity is dangerous when n can be 4 billion
- Test with adversarial input sizes: the regression test explicitly checks behavior with 1MB input and verifies bounded execution time

How Orbis AppSec Detected This

Source: Caller-supplied nLen parameter passed to ph7_compile() and ph7_compile_v2() public API functions
Sink: PH7_TokenizePHP() function in src/ph7/lex.c:1100 which iterates over input and allocates tokens proportional to input size
Missing control: No upper bound validation on input size before lexer processing
CWE: CWE-400 (Uncontrolled Resource Consumption)
Fix: Added input size validation in ProcessScript() that compares input length against PH7_MAX_INPUT_SIZE (or configured limit) and generates a compile error for oversized input

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix.

Conclusion

This vulnerability demonstrates why input validation must happen at the earliest possible point in your processing pipeline. The PH7 lexer's acceptance of unbounded input sizes created a denial-of-service risk that could exhaust system resources with a single malicious compilation request.

The fix is straightforward but effective: validate input size before any expensive operations begin, provide sensible defaults, and allow configuration for different deployment scenarios. By catching oversized input in ProcessScript() before it reaches PH7_CompileScript(), the system now fails fast with a clear error message instead of slowly consuming all available resources.

When building systems that process untrusted input—especially compilers, parsers, and interpreters—always ask: "What happens if someone sends me 4GB of data?" If the answer involves proportional resource consumption, you need bounds checking.

Vulnerability at a Glance

Vulnerability: Unbounded Input Size Denial-of-Service
CWE: CWE-400 (Uncontrolled Resource Consumption)
Language: C
Root cause: No upper bound validation on nLen parameter before lexer processing
Risk: System resource exhaustion leading to service unavailability
Fix: Added input size validation in ProcessScript() with configurable PH7_MAX_INPUT_SIZE limit
