Thread-Safe Tokenization: Fixing a Hidden strtok() Reentrancy Bug in Game Script Parsing
Introduction
At first glance, a call to strtok() looks harmless — it's a standard C library function taught in introductory programming courses. But lurking beneath its simple interface is a well-known trap: it uses global, shared state. In a game engine that parses user-supplied level scripts, this design flaw can become a serious security liability.
This post explores a high-severity vulnerability found in src/lvl_script_commands.c, where the use of strtok() in script command parsing created conditions for memory corruption. We'll walk through what went wrong, how it was fixed, and what every C developer should know about safe string tokenization.
The Vulnerability Explained
What is strtok() and Why Is It Dangerous?
strtok() is a C standard library function used to split strings into tokens based on a delimiter. Here's the catch: it maintains internal state using a static (global) pointer. This means:
- Only one tokenization operation can safely be "in progress" at any time.
- If a second call to strtok() occurs (from a nested function, a signal handler, or another thread), it corrupts the state of the first operation.
- In a complex parsing pipeline, this is almost impossible to reason about at a glance.
// Dangerous: strtok() stores state in a hidden global variable
char *flag = strtok(new_value, " ");
while (flag != NULL) {
    // If anything inside here calls strtok() again (directly or indirectly),
    // the outer loop's state is silently destroyed
    flag = strtok(NULL, " ");
}
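To make the failure concrete, here is a minimal, self-contained sketch of the clobbering effect. The helper count_parts() and the sample string are hypothetical and are not taken from the game's source; they stand in for any callee that happens to use strtok() internally.

```c
#include <string.h>

/* Hypothetical helper: counts comma-separated parts, using strtok() itself. */
static int count_parts(char *s)
{
    int n = 0;
    for (char *p = strtok(s, ","); p != NULL; p = strtok(NULL, ","))
        n++;
    return n;
}

/* Splits "a,b c,d e,f" on spaces and returns how many outer tokens were visited. */
int count_outer_tokens_unsafe(void)
{
    char line[] = "a,b c,d e,f";   /* three space-separated groups */
    int visited = 0;

    for (char *tok = strtok(line, " "); tok != NULL; tok = strtok(NULL, " ")) {
        visited++;
        count_parts(tok);          /* overwrites strtok()'s hidden position */
    }
    return visited;                /* the outer loop has lost its place */
}
```

With a conforming strtok(), the function visits a single group rather than three: the inner loop runs its tokenization to the end, so the outer loop's next strtok(NULL, " ") call finds nothing left. There is no crash and no warning, which is what makes the bug easy to miss.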
The Attack Surface: Malicious Level Files
This vulnerability lives inside set_power_configuration_check(), a function responsible for parsing configuration flags from level script commands. The game loads .lvl files supplied by users (e.g., custom maps or mods), and these files are parsed directly by this code.
An attacker who crafts a malicious level file can:
- Trigger reentrancy: Cause nested parsing calls that corrupt strtok()'s internal pointer mid-loop.
- Cause use-after-free or out-of-bounds reads: The corrupted pointer can point to already-freed or unintended memory regions.
- Chain with other vulnerabilities: When combined with the integer overflow in allocation (V-004) and a potential use-after-free (V-005) in the same codebase, this reentrancy bug becomes a stepping stone toward arbitrary code execution.
Real-World Impact
Consider this scenario:
- A player downloads a community-made level file from an untrusted source.
- The level file contains a specially crafted SET_POWER_CONFIGURATION command with a malformed flag string.
- During parsing, a nested call to strtok() corrupts the tokenizer's internal state.
- The loop reads a garbage pointer, accessing memory it shouldn't.
- Combined with heap layout manipulation from V-004, this becomes a controlled write primitive.
The result? A game mod that executes attacker-supplied code on the player's machine — a classic drive-by code execution scenario through a trusted-looking game file.
The Fix
Replacing strtok() with strtok_r()
The fix is clean and surgical: every call to strtok() is replaced with strtok_r(), the reentrant, thread-safe variant. Instead of relying on hidden global state, strtok_r() stores its progress in a caller-supplied pointer (saveptr), making the state explicit and local.
Before (Vulnerable)
// BEFORE: Global state, not reentrant
char *flag = strtok(new_value, " ");
while (flag != NULL)
{
    j = get_long_id(powermodel_castability_commands, flag);
    if (j < 0)
    {
        DEALLOCATE_SCRIPT_VALUE
        return;
    }
    flag = strtok(NULL, " "); // Relies on hidden global pointer
}
After (Fixed)
// AFTER: Explicit local state, fully reentrant
char *saveptr = NULL;
char *flag = strtok_r(new_value, " ", &saveptr);
while (flag != NULL)
{
    j = get_long_id(powermodel_castability_commands, flag);
    if (j < 0)
    {
        DEALLOCATE_SCRIPT_VALUE
        return;
    }
    flag = strtok_r(NULL, " ", &saveptr); // Uses our local saveptr
}
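The excerpt above depends on engine internals (get_long_id(), the command table, the cleanup macro), so it cannot be compiled on its own. The following is a rough, self-contained sketch of the same loop shape. The table known_flags, the helper lookup_flag(), and the function check_flags() are hypothetical stand-ins, and the flag names are invented for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r() under strict -std modes */
#include <string.h>

/* Hypothetical stand-in for the real command table. */
static const char *const known_flags[] = { "instant", "channeled", "passive" };

/* Hypothetical stand-in for get_long_id(): index of name, or -1 if unknown. */
static long lookup_flag(const char *name)
{
    for (size_t i = 0; i < sizeof known_flags / sizeof known_flags[0]; i++)
        if (strcmp(known_flags[i], name) == 0)
            return (long)i;
    return -1;
}

/* Returns 0 if every space-separated flag is known, -1 otherwise.
   The buffer is modified in place, as with any strtok-family call. */
int check_flags(char *value)
{
    char *saveptr = NULL;   /* tokenizer state lives on this stack frame */

    for (char *flag = strtok_r(value, " ", &saveptr);
         flag != NULL;
         flag = strtok_r(NULL, " ", &saveptr)) {
        if (lookup_flag(flag) < 0)
            return -1;      /* unknown flag: reject the whole value */
    }
    return 0;
}
```

Because saveptr is a local variable, nothing that lookup_flag() does, now or after a future refactor, can move the loop's position.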
Why This Works
| Property | strtok() | strtok_r() |
|---|---|---|
| State storage | Global static variable | Caller-provided pointer |
| Reentrant | ❌ No | ✅ Yes |
| Thread-safe | ❌ No | ✅ Yes |
| Nested call safe | ❌ No | ✅ Yes |
| Standardized by | ISO C and POSIX | POSIX (not ISO C) |
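By making the tokenizer state explicit (saveptr), the code no longer depends on what any other caller does with strtok(). Each parsing operation owns its own state, so nested calls cannot interfere with it, and neither can other threads as long as they are not tokenizing the same buffer. Note that strtok_r() still writes NUL bytes into the string it tokenizes, just as strtok() does.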
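As an illustration of what "owning its own state" buys, here is a small sketch of two tokenizations nested inside each other, each with a separate saveptr. The function names and the sample string are hypothetical and chosen only for the demonstration.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r() under strict -std modes */
#include <string.h>

/* Hypothetical helper: counts comma-separated parts with its own saveptr. */
static int count_parts_r(char *s)
{
    char *inner = NULL;
    int n = 0;
    for (char *p = strtok_r(s, ",", &inner); p != NULL;
         p = strtok_r(NULL, ",", &inner))
        n++;
    return n;
}

/* Splits "a,b c,d e,f" on spaces; returns the number of outer tokens visited
   and stores the total number of inner parts in *parts. */
int count_outer_tokens_safe(int *parts)
{
    char line[] = "a,b c,d e,f";
    char *outer = NULL;
    int visited = 0;

    *parts = 0;
    for (char *tok = strtok_r(line, " ", &outer); tok != NULL;
         tok = strtok_r(NULL, " ", &outer)) {
        visited++;
        *parts += count_parts_r(tok);   /* inner state is separate from outer */
    }
    return visited;
}
```

Here the outer loop visits all three groups and the inner helper counts two parts in each, six in total, because neither loop can see the other's saveptr.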
The fix was applied in two separate locations within set_power_configuration_check() — one for castability flags and one for properties flags — ensuring complete coverage of the vulnerable code paths.
Prevention & Best Practices
1. Ban strtok() in Security-Sensitive Code
The simplest rule: treat strtok() as deprecated in any code that handles untrusted input or operates in a multithreaded context. Add a linter rule or compiler warning to flag its use.
# Example: find strtok() calls; -w matches whole words only, so strtok_r() is not reported
grep -rnw 'strtok' src/
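A text search only helps when someone runs it. One way to make the ban self-enforcing, assuming the project builds with GCC or Clang, is the `#pragma GCC poison` directive, which turns any later use of an identifier into a compile-time error. The sketch below is an illustration of that approach, not something the original fix is known to include; count_tokens() is a hypothetical function.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r() under strict -std modes */
#include <string.h>

/* Must come after the system headers that declare the function. */
#pragma GCC poison strtok

/* From here on, using the poisoned identifier fails to compile.
   strtok_r is a different identifier and remains usable. */
int count_tokens(char *s, const char *delims)
{
    char *saveptr = NULL;
    int n = 0;

    for (char *t = strtok_r(s, delims, &saveptr); t != NULL;
         t = strtok_r(NULL, delims, &saveptr))
        n++;
    return n;
}
```

In a real codebase the pragma would more likely live in a shared project header that is included after the standard headers, so every translation unit picks it up.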
2. Always Use Reentrant Alternatives
| Avoid | Use Instead |
|---|---|
| strtok() | strtok_r() (POSIX) or strtok_s() (C11 Annex K, optional; Microsoft's strtok_s() takes different arguments) |
| strerror() | strerror_r() |
| localtime() | localtime_r() |
| rand() | rand_r() or platform CSPRNG |
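The reentrant variants in this table share one idea: the caller supplies the storage that the classic function would otherwise keep in a static buffer. As a small sketch of that pattern with localtime_r(), assuming a POSIX system, the hypothetical helper format_local_date() below formats a timestamp without touching shared state.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes localtime_r() under strict -std modes */
#include <stddef.h>
#include <time.h>

/* Formats t as YYYY-MM-DD in local time into out.
   Returns 0 on success, -1 on failure. */
int format_local_date(time_t t, char *out, size_t out_size)
{
    struct tm tm_buf;                       /* caller-owned result storage */

    if (localtime_r(&t, &tm_buf) == NULL)
        return -1;
    if (strftime(out, out_size, "%Y-%m-%d", &tm_buf) == 0)
        return -1;                          /* buffer too small */
    return 0;
}
```

With localtime(), the result would live in a static struct tm that a later call may overwrite, including a call made from another thread.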
3. Validate All Script/Config Input Before Parsing
Before tokenizing user-supplied strings, enforce constraints:
// Check length before parsing
if (strlen(new_value) > MAX_CONFIG_VALUE_LENGTH) {
    SCRIPT_ERRORF("Configuration value too long");
    return;
}
// Whitelist allowed characters
if (strspn(new_value, ALLOWED_FLAG_CHARS) != strlen(new_value)) {
    SCRIPT_ERRORF("Invalid characters in configuration value");
    return;
}
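The fragment above relies on names the engine would have to define. A self-contained version of the same two checks might look like the sketch below, where the limit of 256 bytes, the allowed character set, and the function name is_valid_config_value() are all assumptions chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical limits; real values depend on the script format. */
#define MAX_CONFIG_VALUE_LENGTH 256
#define ALLOWED_FLAG_CHARS "abcdefghijklmnopqrstuvwxyz_ "

/* Returns 1 if value is non-NULL, short enough, and made up only of
   allowed characters; returns 0 otherwise. */
int is_valid_config_value(const char *value)
{
    if (value == NULL)
        return 0;

    size_t len = strlen(value);
    if (len > MAX_CONFIG_VALUE_LENGTH)
        return 0;

    /* strspn() returns the length of the leading run of allowed characters;
       it equals len only when every character is allowed. */
    return strspn(value, ALLOWED_FLAG_CHARS) == len;
}
```

Returning a plain yes/no keeps the check separate from error reporting, so the caller can decide how to log the rejection and release any buffers it owns.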
4. Treat Level/Mod Files as Untrusted Input
Game engines often treat local files as implicitly trusted. This is a dangerous assumption in an era of:
- Modding communities sharing files on third-party platforms
- Malicious mods distributed through compromised accounts
- Social engineering attacks targeting gamers
Apply the same rigor to file parsing that you would to network input.
5. Use Static Analysis Tools
Tools that can catch strtok() misuse and related issues:
- Clang Static Analyzer and clang-tidy: general bug finding; clang-tidy's concurrency-mt-unsafe check flags calls to known thread-unsafe functions
- Coverity — flags thread-safety issues
- cppcheck — general C/C++ static analysis
- Semgrep — custom rules for banning specific function calls
- AddressSanitizer (ASan) — runtime detection of memory corruption
6. Understand the CWE Landscape
This vulnerability relates to:
- CWE-190: Integer Overflow or Wraparound (referenced in the PR for the broader vulnerability class)
- CWE-364: Signal Handler Race Condition (same class of reentrancy issues)
- CWE-119: Improper Restriction of Operations within the Bounds of a Memory Buffer
- CWE-416: Use After Free (chained vulnerability V-005)
Understanding how these CWEs chain together is critical for threat modeling complex parsers.
7. Fuzz Your Parsers
Level script parsers are a prime target for fuzzing. Tools like AFL++ or libFuzzer can generate malformed input that triggers exactly the kind of edge cases that lead to reentrancy corruption:
# Example: fuzz the script parser with AFL++
afl-fuzz -i seed_scripts/ -o findings/ -- ./game --parse-script @@
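For libFuzzer, the entry point is a function named LLVMFuzzerTestOneInput. The harness below is a minimal sketch: parse_flags() is a hypothetical stand-in for whichever engine function handles the script command, since the real parser's signature is not shown in this post.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes strtok_r() under strict -std modes */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Keeps the compiler from discarding the reads of each token. */
static volatile size_t sink;

/* Hypothetical stand-in for the engine's flag parser. */
static void parse_flags(char *value)
{
    char *saveptr = NULL;

    for (char *flag = strtok_r(value, " ", &saveptr); flag != NULL;
         flag = strtok_r(NULL, " ", &saveptr))
        sink += strlen(flag);   /* reads every byte of the token */
}

/* libFuzzer entry point: called once per generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    char *buf = malloc(size + 1);   /* private, NUL-terminated copy */
    if (buf == NULL)
        return 0;
    if (size > 0)
        memcpy(buf, data, size);
    buf[size] = '\0';

    parse_flags(buf);

    free(buf);
    return 0;
}
```

With Clang, a harness like this is typically built with `-fsanitize=fuzzer,address`, so that AddressSanitizer reports any out-of-bounds access the generated inputs provoke.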
Conclusion
The strtok() → strtok_r() migration might look like a minor code quality improvement, but in the context of a game engine parsing untrusted level files, it closes a real attack vector. The key lessons from this vulnerability are:
- Global state is a security risk — functions that hide state in static variables are inherently dangerous in complex, reentrant codebases.
- Parser code deserves adversarial scrutiny — any code that processes user-controlled files should be treated with the same care as a web application handling HTTP requests.
- Vulnerability chaining is real — this reentrancy bug alone might seem low-risk, but combined with integer overflows and use-after-free conditions in the same file, it becomes a pathway to code execution.
- The fix is simple; the discovery is hard — two-line changes like this one are easy to make once you know where to look. Automated security scanning and code review are essential to surface these issues before attackers do.
Secure coding in C isn't about avoiding the language — it's about knowing which functions carry hidden risks and choosing safer alternatives consistently. When in doubt, reach for the reentrant variant.
This vulnerability was identified and fixed by automated security scanning. Automated tools can catch subtle issues like non-reentrant function usage at scale — consider integrating security scanning into your CI/CD pipeline.