Shell Script JSON Injection: When printf Becomes a Security Risk
Vulnerability ID: V-004 | Severity: Medium | File:
scripts/openai_compat_report.sh
Introduction
Shell scripts are the duct tape of the software world — quick, powerful, and everywhere. They glue together pipelines, automate deployments, and call APIs. But that convenience comes with a hidden cost: shell scripts have no native understanding of structured data formats like JSON. When developers reach for printf or string interpolation to build JSON payloads, they're essentially constructing a structured document with a blunt instrument.
This is exactly what happened in scripts/openai_compat_report.sh. The script was building JSON payloads destined for an API using shell variable interpolation — and doing so without any escaping or sanitization. The result? A classic JSON injection vulnerability hiding in plain sight inside a shell script.
If you've ever written something like this in a shell script:
payload="{\"message\": \"$user_input\"}"
...then this post is for you.
The Vulnerability Explained
What Is JSON Injection in a Shell Script?
JSON injection occurs when user-controlled or externally-sourced data is embedded into a JSON structure without proper escaping. In compiled languages, developers often use JSON serialization libraries that handle escaping automatically. In shell scripts, there is no such safety net.
The vulnerable code in openai_compat_report.sh was constructing JSON payloads using printf and shell variable interpolation, then writing those payloads to output files used directly as API request bodies. Something conceptually like this:
# ⚠️ VULNERABLE pattern — do not use
printf '{"model": "%s", "prompt": "%s"}' "$MODEL" "$USER_CONTENT" > payload.json
curl -X POST https://api.example.com/v1/completions \
-d @payload.json
At first glance, this looks harmless. But consider what happens when $USER_CONTENT contains any of the following JSON special characters:
| Character | JSON Meaning | Effect if Unescaped |
|---|---|---|
" |
String delimiter | Terminates the string early |
\ |
Escape character | Corrupts the following character |
\n |
Newline | Breaks JSON string literals |
Control chars (\x00-\x1f) |
Reserved | Produces invalid JSON |
How Could It Be Exploited?
Let's walk through a concrete attack scenario.
Imagine $USER_CONTENT is populated from a file, an environment variable, or any external source that an attacker can influence. The attacker supplies the following string:
Hello", "role": "system", "injected": "true
The resulting JSON becomes:
{
"model": "gpt-4",
"prompt": "Hello", "role": "system", "injected": "true"
}
This is no longer the intended payload. The attacker has:
- Closed the
promptstring early with the"character - Injected new JSON fields (
role,injected) into the request - Potentially altered the API request semantics — for example, escalating a user message to a system-level instruction in an LLM API call
Even if the attacker can't fully control the outcome, they can reliably break the JSON structure, causing API errors, unexpected behavior, or information leakage through error messages.
What's the Real-World Impact?
The impact varies depending on what the API does with the payload, but the risks include:
- Semantic injection: Altering the meaning of an API request (e.g., changing a
userrole to asystemrole in an LLM API call) - Denial of service: Consistently malforming JSON to cause API failures and disrupt dependent workflows
- Data exfiltration: In some API designs, injected fields can trigger unexpected response content
- Bypassing application logic: If the API response influences downstream decisions, injected fields could manipulate those decisions
In the context of an OpenAI-compatible API report script, the most concerning scenario is prompt injection — where an attacker-controlled value modifies the system prompt or model parameters, potentially causing the LLM to behave in unintended ways.
The Fix
What Changed?
The fix addresses the root cause: shell variable interpolation into JSON strings without escaping. The solution involves sanitizing or escaping all variables before they are interpolated into JSON payloads.
The safest approach — and the one recommended here — is to use a dedicated JSON-aware tool like jq to construct payloads, rather than building them manually with printf.
Before (Vulnerable):
# ⚠️ No escaping — special characters in variables break JSON
generate_payload() {
local model="$1"
local content="$2"
printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
"$model" "$content" > "$OUTPUT_FILE"
}
After (Fixed):
# ✅ Use jq to safely construct JSON with proper escaping
generate_payload() {
local model="$1"
local content="$2"
jq -n \
--arg model "$model" \
--arg content "$content" \
'{
"model": $model,
"messages": [{"role": "user", "content": $content}]
}' > "$OUTPUT_FILE"
}
Why Does jq Solve the Problem?
When you pass a shell variable to jq using --arg, jq treats the value as a raw string and handles all necessary JSON escaping internally. It will:
- Escape double quotes (
"→\") - Escape backslashes (
\→\\) - Escape newlines and other control characters
- Produce syntactically valid JSON regardless of input content
This completely eliminates the injection surface. No matter what characters an attacker puts into the input variables, jq will encode them safely.
Alternative: Escaping with printf (If jq Is Unavailable)
If jq is not available in your environment, you can write a sanitization function. However, this approach is error-prone and not recommended for production use:
# ⚠️ Manual escaping — fragile, use jq instead
json_escape() {
local input="$1"
# Escape backslashes first, then quotes, then control characters
printf '%s' "$input" \
| sed 's/\\/\\\\/g' \
| sed 's/"/\\"/g' \
| sed ':a;N;$!ba;s/\n/\\n/g'
}
content=$(json_escape "$USER_CONTENT")
printf '{"model": "%s", "content": "%s"}' "$MODEL" "$content"
Even this approach can miss edge cases (null bytes, other control characters). Always prefer jq or a language with native JSON support.
Prevention & Best Practices
1. Never Construct JSON Manually in Shell Scripts
This is the cardinal rule. Shell scripts lack the type awareness and escaping libraries that make JSON construction safe in other languages. If you need to build JSON in a shell script, always use jq.
# ✅ Safe JSON construction with jq
jq -n --arg key "value with \"quotes\" and \\ backslashes" '{"key": $key}'
2. Validate and Sanitize All External Input
Before any external data touches a JSON payload, validate it:
# Validate that a value is a safe model name (alphanumeric + hyphens only)
validate_model_name() {
local model="$1"
if [[ ! "$model" =~ ^[a-zA-Z0-9_-]+$ ]]; then
echo "ERROR: Invalid model name: $model" >&2
exit 1
fi
}
3. Add Resource Limits for File-Based Inputs
As noted in the vulnerability description, the script also lacked size checks when loading files. Always enforce limits:
# Check file size before loading
MAX_FILE_SIZE=$((1 * 1024 * 1024)) # 1MB limit
check_file_size() {
local file="$1"
local size
size=$(stat -c%s "$file" 2>/dev/null || stat -f%z "$file")
if [[ "$size" -gt "$MAX_FILE_SIZE" ]]; then
echo "ERROR: File too large ($size bytes). Maximum allowed: $MAX_FILE_SIZE bytes." >&2
exit 1
fi
}
4. Validate JSON Structure After Construction
Even after using jq, verify the output is valid JSON before sending it:
# Validate JSON before using it
validate_json() {
local file="$1"
if ! jq empty "$file" 2>/dev/null; then
echo "ERROR: Generated payload is not valid JSON" >&2
exit 1
fi
}
5. Use Principle of Least Privilege for API Calls
Ensure the API keys and tokens used in scripts have the minimum required permissions. If an injection does occur, limited permissions reduce the blast radius.
6. Static Analysis and Linting
Use tools to catch these issues before they reach production:
- ShellCheck — Static analysis for shell scripts. Won't catch JSON injection directly, but flags many common scripting mistakes.
- Semgrep — Can be configured with custom rules to detect unsafe JSON construction patterns in shell scripts.
- Manual code review — Look for any
printf,echo, or string concatenation that produces JSON with unescaped variables.
7. Relevant Security Standards
This vulnerability maps to several well-known security standards:
- CWE-74: Improper Neutralization of Special Elements in Output Used by a Downstream Component (Injection)
- CWE-116: Improper Encoding or Escaping of Output
- OWASP: Injection — The classic injection category applies to JSON just as it does to SQL or HTML
- CWE-400: Uncontrolled Resource Consumption — For the resource exhaustion aspect of this vulnerability
A Note on Resource Exhaustion
The vulnerability description also flags a related issue: the script loads entire files into memory without size checks and parses JSON without depth limits. This is a separate but equally important concern.
A maliciously crafted import file with:
- Millions of entries → Memory exhaustion
- Deeply nested JSON (e.g., {"a":{"a":{"a":...}}} repeated thousands of times) → Stack overflow or extreme parse time
These are denial-of-service vectors that can take down the process or the host running the script. The fix should include:
# Limit JSON depth when parsing with jq
jq --argjson max_depth 10 'if (. | path(..) | length) > $max_depth then error("JSON too deeply nested") else . end' input.json
# Or simply enforce a file size limit before parsing
[[ $(wc -c < "$INPUT_FILE") -gt 1048576 ]] && { echo "File too large"; exit 1; }
Conclusion
The vulnerability in openai_compat_report.sh is a great reminder that injection vulnerabilities aren't limited to web applications. Shell scripts that construct structured data formats like JSON are just as susceptible — and often less scrutinized.
The key takeaways from this fix:
- Never build JSON with
printfand raw variable interpolation in shell scripts - Always use
jq --argto safely pass shell variables into JSON structures - Enforce file size and depth limits before parsing any externally-supplied data
- Validate inputs before they enter any structured format
- Run static analysis (ShellCheck, Semgrep) on all shell scripts, especially those that handle external data or make API calls
Security vulnerabilities in shell scripts are easy to overlook precisely because shell scripts feel "simple." But simplicity doesn't equal safety. Every time your script touches external data and produces output consumed by another system, you have an injection surface that deserves careful attention.
Secure coding isn't just for application developers — it's for everyone who writes code, including the humble shell script.
This vulnerability was identified and fixed as part of an automated security scanning process. Fix verified by scanner re-scan and LLM code review.
Found a security issue in your codebase? Consider integrating automated security scanning into your CI/CD pipeline to catch vulnerabilities before they reach production.