Introduction
In the PH7 PHP interpreter codebase, we discovered a high-severity denial-of-service vulnerability in src/ph7/lex.c. The PH7_TokenizePHP() function, which handles lexical analysis of PHP source code, accepted input sizes of up to approximately 4GB through its nLen parameter, with no upper-bound validation.
This matters because the lexer iterates over every byte of input, allocating a SyToken structure for each token into the pOut SySet. When an attacker submits gigabyte-scale input, the system consumes proportional CPU time and memory, potentially crashing the application or making it unresponsive.
The vulnerability was particularly dangerous because the public API functions ph7_compile() and ph7_compile_v2() passed the caller-supplied nLen value straight through to the tokenizer without applying any size cap.
The Vulnerability Explained
The root cause lies in how the PH7 compilation pipeline handled input sizes. When a user calls ph7_compile() with PHP source code, the input length flows through ProcessScript() and eventually reaches PH7_TokenizePHP().
The nLen parameter is defined as sxu32 (an unsigned 32-bit integer), which means it can represent values up to approximately 4.29 billion bytes (~4GB). The lexer then processes this input byte-by-byte:
// Conceptual flow: the lexer walks the entire input
while (bytes remain in input of length nLen) {
    // Scan the next token, consuming one or more bytes
    // Allocate a SyToken structure for that token
    // Append the token to the pOut SySet
}
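To make the cost model concrete, here is a minimal sketch of a toy tokenizer that splits on whitespace. It is not PH7's lexer: ToyToken and toy_count_tokens() are hypothetical names invented for illustration, and the real SyToken carries more fields. The point is only that every byte is visited once and that the number of token records grows with the input.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for a lexer token; PH7's real SyToken differs. */
typedef struct {
    const char *start; /* first byte of the token */
    size_t      len;   /* token length in bytes */
} ToyToken;

/*
 * Count the tokens a whitespace-splitting toy lexer would emit.
 * Each byte is visited exactly once, so the running time is O(n).
 * A real lexer would also store one record per token, so memory
 * grows with the returned count.
 */
size_t toy_count_tokens(const char *src, size_t n)
{
    size_t i = 0, count = 0;
    assert(src != NULL || n == 0);
    while (i < n) {
        /* Skip whitespace between tokens. */
        while (i < n && isspace((unsigned char)src[i])) i++;
        if (i == n) break;
        /* Consume one token: a run of non-whitespace bytes. */
        while (i < n && !isspace((unsigned char)src[i])) i++;
        count++;
    }
    return count;
}

/* Storage needed if every token costs one ToyToken record. */
size_t toy_token_bytes(size_t tokens)
{
    return tokens * sizeof(ToyToken);
}
```

Calling toy_count_tokens("$a $a $a", 8) yields 3 tokens. Doubling the input doubles both the loop iterations and the token count, which is exactly the property an attacker relies on.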
The Attack Scenario
An attacker with access to the PH7 compilation API—either through direct library calls or via an embedding application that exposes compilation functionality—could exploit this vulnerability:
- Craft malicious input: Create a 1GB+ PHP source file containing repeated valid tokens (e.g., millions of $a variable references)
- Submit for compilation: Pass this oversized input to ph7_compile() or ph7_compile_v2()
- Resource exhaustion: The lexer attempts to process every byte, allocating memory for each token
- System impact: CPU spikes to 100%, memory consumption grows unbounded, system becomes unresponsive
For a local CLI tool like PH7, exploitation requires the attacker to control command-line arguments or input files. However, in embedded scenarios where PH7 processes untrusted PHP code, this becomes a serious availability risk.
Real-World Impact
The impact is denial-of-service through resource exhaustion:
- Memory exhaustion: Each token requires a SyToken allocation, so token storage grows in proportion to the token count; hundreds of millions of tokens can require gigabytes of memory
- CPU exhaustion: Linear time complexity means processing 1GB takes ~1000x longer than 1MB
- Service unavailability: The application hangs or crashes, affecting all users
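A rough back-of-the-envelope estimate shows how quickly the memory side adds up. The sketch below assumes a 32-byte token record, which is an assumption for illustration only; the actual figure depends on sizeof(SyToken) for your platform and build, so measure it before relying on the numbers. The function name estimate_token_memory() is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: 32 bytes per token record. Check sizeof(SyToken) on your build. */
#define ASSUMED_TOKEN_RECORD_BYTES 32u

/*
 * Estimate token-table memory for an input of input_bytes, given the
 * average number of source bytes per token (separators included).
 */
uint64_t estimate_token_memory(uint64_t input_bytes, uint64_t avg_token_len)
{
    assert(avg_token_len > 0);
    uint64_t tokens = input_bytes / avg_token_len;
    return tokens * ASSUMED_TOKEN_RECORD_BYTES;
}
```

Under these assumptions, a 1 GiB input made of the three-byte sequence "$a " repeated gives estimate_token_memory(1073741824, 3) = 11453246112 bytes, or roughly 10.7 GiB of token records. The bookkeeping can therefore outweigh the input itself by an order of magnitude.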
The Fix
The fix introduces input size validation at the earliest possible point—before the lexer ever touches the data. Here's what changed in src/ph7/api.c:
New Configuration Option
A new configuration verb PH7_CONFIG_MAX_INPUT was added to allow runtime customization of the limit:
case PH7_CONFIG_MAX_INPUT: {
    /* Per-compile input byte cap (0 = use PH7_MAX_INPUT_SIZE default). */
    unsigned int nMax = va_arg(ap,unsigned int);
    pEngine->xConf.nMaxInput = (sxu32)nMax;
    break;
}
Input Validation in ProcessScript()
The critical fix adds size validation in ProcessScript() before any compilation occurs:
/* Enforce input size cap before touching the lexer/compiler */
{
    sxu32 nLimit = pEngine->xConf.nMaxInput ? pEngine->xConf.nMaxInput : PH7_MAX_INPUT_SIZE;
    if( SyStringLength(pScript) > nLimit ){
        PH7_GenCompileError(&pVm->sCodeGen,E_ERROR,1,
            "Input size (%u bytes) exceeds the configured limit (%u bytes)",
            SyStringLength(pScript),nLimit);
    }
}
/* Compile the script */
if( pVm->sCodeGen.nErr == 0 ){
    PH7_CompileScript(pVm,&(*pScript),iFlags);
}
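The interplay between the configuration verb and the size check can be modeled in isolation. The following is a self-contained sketch of the same pattern, not PH7 code: every Demo/DEMO name is hypothetical, and the 10 MB default is an assumed value chosen for the example, not the real PH7_MAX_INPUT_SIZE.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout: this models the pattern, not PH7's types. */
#define DEMO_MAX_INPUT_SIZE   (10u * 1024u * 1024u) /* assumed default cap */
#define DEMO_CONFIG_MAX_INPUT 1
#define DEMO_OK               0
#define DEMO_TOO_LARGE        (-1)
#define DEMO_UNKNOWN_VERB     (-2)

typedef struct {
    uint32_t max_input; /* 0 means "use the compiled-in default" */
} DemoEngine;

/* Variadic configuration entry point that dispatches on a verb. */
int demo_config(DemoEngine *engine, int verb, ...)
{
    int rc = DEMO_OK;
    va_list ap;
    assert(engine != NULL);
    va_start(ap, verb);
    switch (verb) {
    case DEMO_CONFIG_MAX_INPUT:
        /* The caller must pass an unsigned int (or int), not a size_t. */
        engine->max_input = (uint32_t)va_arg(ap, unsigned int);
        break;
    default:
        rc = DEMO_UNKNOWN_VERB;
        break;
    }
    va_end(ap);
    return rc;
}

/* Resolve the limit in force: the configured value, or the default if 0. */
uint32_t demo_effective_limit(const DemoEngine *engine)
{
    return engine->max_input ? engine->max_input : DEMO_MAX_INPUT_SIZE;
}

/* Reject oversized input before any expensive work begins. */
int demo_check_input(const DemoEngine *engine, uint32_t len)
{
    return len > demo_effective_limit(engine) ? DEMO_TOO_LARGE : DEMO_OK;
}
```

Two details are worth noting. Input exactly at the limit is accepted, because the comparison is a strict greater-than, matching the check in ProcessScript(). And because the verb reads its argument with va_arg as an unsigned int, callers should pass a value of that type (for example 4096u); passing a wider type such as size_t through a variadic call would be undefined behavior on platforms where the two differ in size.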
Before vs. After
Before (Vulnerable):
/* Reset the error message consumer */
SyBlobReset(&pEngine->xConf.sErrConsumer);
/* Compile the script - NO SIZE CHECK! */
PH7_CompileScript(pVm,&(*pScript),iFlags);
After (Fixed):
/* Reset the error message consumer */
SyBlobReset(&pEngine->xConf.sErrConsumer);
/* Enforce input size cap before touching the lexer/compiler */
{
    sxu32 nLimit = pEngine->xConf.nMaxInput ? pEngine->xConf.nMaxInput : PH7_MAX_INPUT_SIZE;
    if( SyStringLength(pScript) > nLimit ){
        PH7_GenCompileError(&pVm->sCodeGen,E_ERROR,1,
            "Input size (%u bytes) exceeds the configured limit (%u bytes)",
            SyStringLength(pScript),nLimit);
    }
}
/* Compile the script */
if( pVm->sCodeGen.nErr == 0 ){
    PH7_CompileScript(pVm,&(*pScript),iFlags);
}
Test Infrastructure Update
The Makefile was also updated to include the new PHL_MAX_INPUT environment variable for stress testing:
-TEST_STRESS_CMD = PHL_MAX_ALLOC=1048576 "$(PHL_BIN)" "tests/phpt.php" \
+TEST_STRESS_CMD = PHL_MAX_ALLOC=1048576 PHL_MAX_INPUT=32768 "$(PHL_BIN)" "tests/phpt.php" \
This ensures stress tests run with a 32KB input limit, preventing test infrastructure from being affected by oversized inputs.
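The write-up does not show how the test binary turns PHL_MAX_INPUT into a limit, so the following is only a sketch of one reasonable way a host program might do it. The helper names parse_input_limit() and input_limit_from_env() are hypothetical. The sketch follows the convention used by the fix, where 0 means "fall back to the default", and treats anything malformed as 0 rather than guessing.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a decimal byte limit from text, such as an environment variable
 * value. Returns 0 (meaning "keep the default") when the text is missing,
 * malformed, or does not fit in an unsigned int.
 */
unsigned int parse_input_limit(const char *text)
{
    char *end = NULL;
    unsigned long value;

    /* Require a leading digit: strtoul alone would accept signs and spaces. */
    if (text == NULL || !isdigit((unsigned char)*text)) return 0;

    errno = 0;
    value = strtoul(text, &end, 10);
    if (errno != 0 || *end != '\0' || value > UINT_MAX) return 0;
    return (unsigned int)value;
}

/* Convenience wrapper: read the limit from a named environment variable. */
unsigned int input_limit_from_env(const char *name)
{
    assert(name != NULL);
    return parse_input_limit(getenv(name));
}
```

With PHL_MAX_INPUT=32768 in the environment, input_limit_from_env("PHL_MAX_INPUT") returns 32768, which a host could then hand to the engine through the PH7_CONFIG_MAX_INPUT verb. Rejecting values such as "-1" matters here, because strtoul would otherwise wrap a negative number into a very large limit and silently disable the protection.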
Prevention & Best Practices
1. Validate Input Sizes Early
Always check input sizes at the API boundary, before any processing begins:
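#include <stddef.h>

#define MAX_SAFE_INPUT_SIZE   (1024 * 1024) /* 1MB */
#define ERROR_INPUT_TOO_LARGE (-1)          /* illustrative error code */

int process_input(const char *data, size_t len) {
    if (len > MAX_SAFE_INPUT_SIZE) {
        return ERROR_INPUT_TOO_LARGE; /* reject before doing any work */
    }
    (void)data; /* ... now safe to process data[0..len) ... */
    return 0;
}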
2. Make Limits Configurable
Different deployments have different resource constraints. Provide configuration options:
// Allow runtime configuration with sensible defaults
size_t max_input = config_get_max_input();
if (max_input == 0) {
    max_input = DEFAULT_MAX_INPUT;
}
3. Document Resource Limits
Clearly document the limits in your API documentation so integrators know what to expect.
4. Use Static Analysis
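Tools like Semgrep can flag functions that accept size parameters without validation. A rule along the following lines illustrates the idea; as written it matches any call that is not preceded by a size check in the same block, so it needs tuning (for example, restricting $FUNC to known parsing entry points) before it is practical:
rules:
  - id: unbounded-size-parameter
    languages: [c]
    severity: WARNING
    message: "Size parameter used without upper bound check"
    patterns:
      - pattern: $FUNC(..., $SIZE, ...)
      - pattern-not-inside: |
          if ($SIZE > $LIMIT) { ... }
          ...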
Key Takeaways
- Never pass user-controlled size parameters to resource-intensive functions without validation: the nLen parameter in PH7_TokenizePHP() should have been validated before reaching the lexer
- Validate at the API boundary: the fix in ProcessScript() catches oversized input before PH7_CompileScript() is ever called
- Make security limits configurable: the new PH7_CONFIG_MAX_INPUT option allows different deployments to set appropriate limits
- Linear-time algorithms become DoS vectors with unbounded input: even O(n) complexity is dangerous when n can be 4 billion
- Test with adversarial input sizes: the regression test explicitly checks behavior with 1MB input and verifies bounded execution time
How Orbis AppSec Detected This
- Source: Caller-supplied nLen parameter passed to ph7_compile() and ph7_compile_v2() public API functions
- Sink: PH7_TokenizePHP() function in src/ph7/lex.c:1100 which iterates over input and allocates tokens proportional to input size
- Missing control: No upper bound validation on input size before lexer processing
- CWE: CWE-400 (Uncontrolled Resource Consumption)
- Fix: Added input size validation in ProcessScript() that compares input length against PH7_MAX_INPUT_SIZE (or configured limit) and generates a compile error for oversized input
Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.
Conclusion
This vulnerability demonstrates why input validation must happen at the earliest possible point in your processing pipeline. The PH7 lexer's acceptance of unbounded input sizes created a denial-of-service risk that could exhaust system resources with a single malicious compilation request.
The fix is straightforward but effective: validate input size before any expensive operations begin, provide sensible defaults, and allow configuration for different deployment scenarios. By catching oversized input in ProcessScript() before it reaches PH7_CompileScript(), the system now fails fast with a clear error message instead of slowly consuming all available resources.
When building systems that process untrusted input—especially compilers, parsers, and interpreters—always ask: "What happens if someone sends me 4GB of data?" If the answer involves proportional resource consumption, you need bounds checking.