Shell Script JSON Injection: When `printf` Becomes a Security Risk

Vulnerability ID: V-004 | Severity: Medium | File: scripts/openai_compat_report.sh

Introduction

Shell scripts are the duct tape of the software world — quick, powerful, and everywhere. They glue together pipelines, automate deployments, and call APIs. But that convenience comes with a hidden cost: shell scripts have no native understanding of structured data formats like JSON. When developers reach for printf or string interpolation to build JSON payloads, they're essentially constructing a structured document with a blunt instrument.

This is exactly what happened in scripts/openai_compat_report.sh. The script was building JSON payloads destined for an API using shell variable interpolation — and doing so without any escaping or sanitization. The result? A classic JSON injection vulnerability hiding in plain sight inside a shell script.

If you've ever written something like this in a shell script:

payload="{\"message\": \"$user_input\"}"

...then this post is for you.

The Vulnerability Explained

What Is JSON Injection in a Shell Script?

JSON injection occurs when user-controlled or externally-sourced data is embedded into a JSON structure without proper escaping. In compiled languages, developers often use JSON serialization libraries that handle escaping automatically. In shell scripts, there is no such safety net.

The vulnerable code in openai_compat_report.sh was constructing JSON payloads using printf and shell variable interpolation, then writing those payloads to output files used directly as API request bodies. Something conceptually like this:

# ⚠️ VULNERABLE pattern — do not use
printf '{"model": "%s", "prompt": "%s"}' "$MODEL" "$USER_CONTENT" > payload.json
curl -X POST https://api.example.com/v1/completions \
  -d @payload.json

At first glance, this looks harmless. But consider what happens when $USER_CONTENT contains any of the following JSON special characters:

Character	JSON Meaning	Effect if Unescaped
`"`	String delimiter	Terminates the string early
`\`	Escape character	Corrupts the following character
`\n`	Newline	Breaks JSON string literals
Control chars (`\x00`-`\x1f`)	Reserved	Produces invalid JSON

How Could It Be Exploited?

Let's walk through a concrete attack scenario.

Imagine $USER_CONTENT is populated from a file, an environment variable, or any external source that an attacker can influence. The attacker supplies the following string:

Hello", "role": "system", "injected": "true

The resulting JSON becomes:

{
  "model": "gpt-4",
  "prompt": "Hello", "role": "system", "injected": "true"
}

This is no longer the intended payload. The attacker has:

Closed the prompt string early with the " character
Injected new JSON fields (role, injected) into the request
Potentially altered the API request semantics — for example, escalating a user message to a system-level instruction in an LLM API call

Even if the attacker can't fully control the outcome, they can reliably break the JSON structure, causing API errors, unexpected behavior, or information leakage through error messages.

What's the Real-World Impact?

The impact varies depending on what the API does with the payload, but the risks include:

Semantic injection: Altering the meaning of an API request (e.g., changing a user role to a system role in an LLM API call)
Denial of service: Consistently malforming JSON to cause API failures and disrupt dependent workflows
Data exfiltration: In some API designs, injected fields can trigger unexpected response content
Bypassing application logic: If the API response influences downstream decisions, injected fields could manipulate those decisions

In the context of an OpenAI-compatible API report script, the most concerning scenario is prompt injection — where an attacker-controlled value modifies the system prompt or model parameters, potentially causing the LLM to behave in unintended ways.

The Fix

What Changed?

The fix addresses the root cause: shell variable interpolation into JSON strings without escaping. The solution involves sanitizing or escaping all variables before they are interpolated into JSON payloads.

The safest approach — and the one recommended here — is to use a dedicated JSON-aware tool like jq to construct payloads, rather than building them manually with printf.

Before (Vulnerable):

# ⚠️ No escaping — special characters in variables break JSON
generate_payload() {
  local model="$1"
  local content="$2"
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
    "$model" "$content" > "$OUTPUT_FILE"
}

After (Fixed):

# ✅ Use jq to safely construct JSON with proper escaping
generate_payload() {
  local model="$1"
  local content="$2"
  jq -n \
    --arg model "$model" \
    --arg content "$content" \
    '{
      "model": $model,
      "messages": [{"role": "user", "content": $content}]
    }' > "$OUTPUT_FILE"
}

Why Does `jq` Solve the Problem?

When you pass a shell variable to jq using --arg, jq treats the value as a raw string and handles all necessary JSON escaping internally. It will:

Escape double quotes (" → \")
Escape backslashes (\ → \\)
Escape newlines and other control characters
Produce syntactically valid JSON regardless of input content

This completely eliminates the injection surface. No matter what characters an attacker puts into the input variables, jq will encode them safely.

Alternative: Escaping with `printf` (If `jq` Is Unavailable)

If jq is not available in your environment, you can write a sanitization function. However, this approach is error-prone and not recommended for production use:

# ⚠️ Manual escaping — fragile, use jq instead
json_escape() {
  local input="$1"
  # Escape backslashes first, then quotes, then control characters
  printf '%s' "$input" \
    | sed 's/\\/\\\\/g' \
    | sed 's/"/\\"/g' \
    | sed ':a;N;$!ba;s/\n/\\n/g'
}

content=$(json_escape "$USER_CONTENT")
printf '{"model": "%s", "content": "%s"}' "$MODEL" "$content"

Even this approach can miss edge cases (null bytes, other control characters). Always prefer jq or a language with native JSON support.

Prevention & Best Practices

1. Never Construct JSON Manually in Shell Scripts

This is the cardinal rule. Shell scripts lack the type awareness and escaping libraries that make JSON construction safe in other languages. If you need to build JSON in a shell script, always use jq.

# ✅ Safe JSON construction with jq
jq -n --arg key "value with \"quotes\" and \\ backslashes" '{"key": $key}'

2. Validate and Sanitize All External Input

Before any external data touches a JSON payload, validate it:

# Validate that a value is a safe model name (alphanumeric + hyphens only)
validate_model_name() {
  local model="$1"
  if [[ ! "$model" =~ ^[a-zA-Z0-9_-]+$ ]]; then
    echo "ERROR: Invalid model name: $model" >&2
    exit 1
  fi
}

3. Add Resource Limits for File-Based Inputs

As noted in the vulnerability description, the script also lacked size checks when loading files. Always enforce limits:

# Check file size before loading
MAX_FILE_SIZE=$((1 * 1024 * 1024))  # 1MB limit
check_file_size() {
  local file="$1"
  local size
  size=$(stat -c%s "$file" 2>/dev/null || stat -f%z "$file")
  if [[ "$size" -gt "$MAX_FILE_SIZE" ]]; then
    echo "ERROR: File too large ($size bytes). Maximum allowed: $MAX_FILE_SIZE bytes." >&2
    exit 1
  fi
}

4. Validate JSON Structure After Construction

Even after using jq, verify the output is valid JSON before sending it:

# Validate JSON before using it
validate_json() {
  local file="$1"
  if ! jq empty "$file" 2>/dev/null; then
    echo "ERROR: Generated payload is not valid JSON" >&2
    exit 1
  fi
}

5. Use Principle of Least Privilege for API Calls

Ensure the API keys and tokens used in scripts have the minimum required permissions. If an injection does occur, limited permissions reduce the blast radius.

6. Static Analysis and Linting

Use tools to catch these issues before they reach production:

ShellCheck — Static analysis for shell scripts. Won't catch JSON injection directly, but flags many common scripting mistakes.
Semgrep — Can be configured with custom rules to detect unsafe JSON construction patterns in shell scripts.
Manual code review — Look for any printf, echo, or string concatenation that produces JSON with unescaped variables.

7. Relevant Security Standards

This vulnerability maps to several well-known security standards:

CWE-74: Improper Neutralization of Special Elements in Output Used by a Downstream Component (Injection)
CWE-116: Improper Encoding or Escaping of Output
OWASP: Injection — The classic injection category applies to JSON just as it does to SQL or HTML
CWE-400: Uncontrolled Resource Consumption — For the resource exhaustion aspect of this vulnerability

A Note on Resource Exhaustion

The vulnerability description also flags a related issue: the script loads entire files into memory without size checks and parses JSON without depth limits. This is a separate but equally important concern.

A maliciously crafted import file with:
- Millions of entries → Memory exhaustion
- Deeply nested JSON (e.g., {"a":{"a":{"a":...}}} repeated thousands of times) → Stack overflow or extreme parse time

These are denial-of-service vectors that can take down the process or the host running the script. The fix should include:

# Limit JSON depth when parsing with jq
jq --argjson max_depth 10 'if (. | path(..) | length) > $max_depth then error("JSON too deeply nested") else . end' input.json

# Or simply enforce a file size limit before parsing
[[ $(wc -c < "$INPUT_FILE") -gt 1048576 ]] && { echo "File too large"; exit 1; }

Conclusion

The vulnerability in openai_compat_report.sh is a great reminder that injection vulnerabilities aren't limited to web applications. Shell scripts that construct structured data formats like JSON are just as susceptible — and often less scrutinized.

The key takeaways from this fix:

Never build JSON with printf and raw variable interpolation in shell scripts
Always use jq --arg to safely pass shell variables into JSON structures
Enforce file size and depth limits before parsing any externally-supplied data
Validate inputs before they enter any structured format
Run static analysis (ShellCheck, Semgrep) on all shell scripts, especially those that handle external data or make API calls

Security vulnerabilities in shell scripts are easy to overlook precisely because shell scripts feel "simple." But simplicity doesn't equal safety. Every time your script touches external data and produces output consumed by another system, you have an injection surface that deserves careful attention.

Secure coding isn't just for application developers — it's for everyone who writes code, including the humble shell script.

This vulnerability was identified and fixed as part of an automated security scanning process. Fix verified by scanner re-scan and LLM code review.

Found a security issue in your codebase? Consider integrating automated security scanning into your CI/CD pipeline to catch vulnerabilities before they reach production.

Shell Script JSON Injection: When printf Becomes a Security Risk

Shell Script JSON Injection: When `printf` Becomes a Security Risk

Introduction

The Vulnerability Explained

What Is JSON Injection in a Shell Script?

How Could It Be Exploited?

What's the Real-World Impact?

The Fix

What Changed?

Why Does `jq` Solve the Problem?

Alternative: Escaping with `printf` (If `jq` Is Unavailable)

Prevention & Best Practices

1. Never Construct JSON Manually in Shell Scripts

2. Validate and Sanitize All External Input

3. Add Resource Limits for File-Based Inputs

4. Validate JSON Structure After Construction

5. Use Principle of Least Privilege for API Calls

6. Static Analysis and Linting

7. Relevant Security Standards

A Note on Resource Exhaustion

Conclusion

View the Security Fix

Related Articles

Fixing NULL Pointer Dereference in eMMC Memory Allocation

Silent Data Destruction: The Hidden Danger in Upload Price Tier Logic

Slidev Resolver Vulnerability: When Themes Become Trojan Horses

Shell Script JSON Injection: When printf Becomes a Security Risk

Shell Script JSON Injection: When printf Becomes a Security Risk

Introduction

The Vulnerability Explained

What Is JSON Injection in a Shell Script?

How Could It Be Exploited?

What's the Real-World Impact?

The Fix

What Changed?

Why Does jq Solve the Problem?

Alternative: Escaping with printf (If jq Is Unavailable)

Prevention & Best Practices

1. Never Construct JSON Manually in Shell Scripts

2. Validate and Sanitize All External Input

3. Add Resource Limits for File-Based Inputs

4. Validate JSON Structure After Construction

5. Use Principle of Least Privilege for API Calls

6. Static Analysis and Linting

7. Relevant Security Standards

A Note on Resource Exhaustion

Conclusion

View the Security Fix

Related Articles

Fixing NULL Pointer Dereference in eMMC Memory Allocation

Silent Data Destruction: The Hidden Danger in Upload Price Tier Logic

Slidev Resolver Vulnerability: When Themes Become Trojan Horses

Shell Script JSON Injection: When `printf` Becomes a Security Risk

Why Does `jq` Solve the Problem?

Alternative: Escaping with `printf` (If `jq` Is Unavailable)