Shell Script JSON Injection: When printf Becomes a Security Risk

A medium-severity vulnerability was discovered and patched in `scripts/openai_compat_report.sh`, where shell-based JSON construction using `printf` and variable interpolation left API payloads open to injection attacks. Without proper escaping of special characters, attacker-controlled input could produce malformed JSON or silently alter API request semantics. This post breaks down how the vulnerability works, how it was fixed, and what every developer should know about safe JSON construction in shell scripts.

By orbisai0security
May 9, 2026

Vulnerability ID: V-004 | Severity: Medium | File: scripts/openai_compat_report.sh


Introduction

Shell scripts are the duct tape of the software world — quick, powerful, and everywhere. They glue together pipelines, automate deployments, and call APIs. But that convenience comes with a hidden cost: shell scripts have no native understanding of structured data formats like JSON. When developers reach for printf or string interpolation to build JSON payloads, they're essentially constructing a structured document with a blunt instrument.

This is exactly what happened in scripts/openai_compat_report.sh. The script was building JSON payloads destined for an API using shell variable interpolation — and doing so without any escaping or sanitization. The result? A classic JSON injection vulnerability hiding in plain sight inside a shell script.

If you've ever written something like this in a shell script:

payload="{\"message\": \"$user_input\"}"

...then this post is for you.


The Vulnerability Explained

What Is JSON Injection in a Shell Script?

JSON injection occurs when user-controlled or externally sourced data is embedded into a JSON structure without proper escaping. In most general-purpose languages, developers rely on JSON serialization libraries that handle escaping automatically. In shell scripts, there is no such safety net.

The vulnerable code in openai_compat_report.sh was constructing JSON payloads using printf and shell variable interpolation, then writing those payloads to output files used directly as API request bodies. Something conceptually like this:

# ⚠️ VULNERABLE pattern — do not use
printf '{"model": "%s", "prompt": "%s"}' "$MODEL" "$USER_CONTENT" > payload.json
curl -X POST https://api.example.com/v1/completions \
  -d @payload.json

At first glance, this looks harmless. But consider what happens when $USER_CONTENT contains any of the following JSON special characters:

Character                  | JSON Meaning      | Effect if Unescaped
"                          | String delimiter  | Terminates the string early
\                          | Escape character  | Corrupts the following character
\n (newline)               | Newline           | Breaks JSON string literals
Control chars (\x00-\x1f)  | Reserved          | Produces invalid JSON

How Could It Be Exploited?

Let's walk through a concrete attack scenario.

Imagine $USER_CONTENT is populated from a file, an environment variable, or any external source that an attacker can influence. The attacker supplies the following string:

Hello", "role": "system", "injected": "true

The resulting JSON (pretty-printed here for readability) becomes:

{
  "model": "gpt-4",
  "prompt": "Hello", "role": "system", "injected": "true"
}

This is no longer the intended payload. The attacker has:

  1. Closed the prompt string early with the " character
  2. Injected new JSON fields (role, injected) into the request
  3. Potentially altered the API request semantics — for example, escalating a user message to a system-level instruction in an LLM API call

Even if the attacker can't fully control the outcome, they can reliably break the JSON structure, causing API errors, unexpected behavior, or information leakage through error messages.
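The scenario above can be reproduced in a few lines. This is a minimal sketch: build_payload_unsafe is a hypothetical stand-in for the printf pattern, not the actual code from openai_compat_report.sh, and the model name is only an example value.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the vulnerable pattern: values are pasted into
# the %s slots of the format string with no JSON escaping at all.
build_payload_unsafe() {
  local model="$1" content="$2"
  printf '{"model": "%s", "prompt": "%s"}' "$model" "$content"
}

benign='Hello'
attack='Hello", "role": "system", "injected": "true'

# Benign input: two keys, as intended.
build_payload_unsafe "gpt-4" "$benign"; echo

# Hostile input: the embedded quote closes the prompt string early and the
# rest of the value is read as additional key/value pairs.
build_payload_unsafe "gpt-4" "$attack"; echo
```

The second payload is still syntactically valid JSON, which is what makes the problem easy to miss: a parser accepts it without complaint and sees four keys instead of two.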

What's the Real-World Impact?

The impact varies depending on what the API does with the payload, but the risks include:

  • Semantic injection: Altering the meaning of an API request (e.g., changing a user role to a system role in an LLM API call)
  • Denial of service: Consistently malforming JSON to cause API failures and disrupt dependent workflows
  • Data exfiltration: In some API designs, injected fields can trigger unexpected response content
  • Bypassing application logic: If the API response influences downstream decisions, injected fields could manipulate those decisions

In the context of an OpenAI-compatible API report script, the most concerning scenario is a form of prompt injection in which an attacker-controlled value adds or overrides request fields, such as a message role or model parameters, potentially causing the LLM to behave in unintended ways.


The Fix

What Changed?

The fix addresses the root cause: shell variable interpolation into JSON strings without escaping. The solution involves sanitizing or escaping all variables before they are interpolated into JSON payloads.

The safest approach — and the one recommended here — is to use a dedicated JSON-aware tool like jq to construct payloads, rather than building them manually with printf.

Before (Vulnerable):

# ⚠️ No escaping — special characters in variables break JSON
generate_payload() {
  local model="$1"
  local content="$2"
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' \
    "$model" "$content" > "$OUTPUT_FILE"
}

After (Fixed):

# ✅ Use jq to safely construct JSON with proper escaping
generate_payload() {
  local model="$1"
  local content="$2"
  jq -n \
    --arg model "$model" \
    --arg content "$content" \
    '{
      "model": $model,
      "messages": [{"role": "user", "content": $content}]
    }' > "$OUTPUT_FILE"
}

Why Does jq Solve the Problem?

When you pass a shell variable to jq using --arg, jq treats the value as a raw string and handles all necessary JSON escaping internally. It will:

  • Escape double quotes (" becomes \")
  • Escape backslashes (\ becomes \\)
  • Escape newlines and other control characters
  • Produce syntactically valid JSON regardless of input content

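This removes the structural injection surface: no matter what characters an attacker puts into the input variables, jq encodes them as the contents of a single JSON string. The text of that string is still attacker-controlled, so input validation remains worthwhile.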
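A quick way to see this is to feed jq the same hostile string used in the attack scenario and read the value back. The sketch below assumes jq is installed and on PATH; the variable names and the model value are illustrative.

```shell
#!/usr/bin/env bash
# Assumes jq is installed and on PATH.
attack='Hello", "role": "system", "injected": "true'

# --arg binds each shell value to a jq variable as a JSON string.
payload=$(jq -n --arg model "gpt-4" --arg prompt "$attack" \
  '{model: $model, prompt: $prompt}')

printf '%s\n' "$payload"

# Round-trip check: parsing the payload gives back the original input,
# and the object still has exactly two keys.
roundtrip=$(printf '%s' "$payload" | jq -r '.prompt')
key_count=$(printf '%s' "$payload" | jq 'keys | length')

[ "$roundtrip" = "$attack" ] && echo "prompt preserved as a single string"
echo "keys: $key_count"
```

Note that --arg always produces a JSON string. For numbers, booleans, or nested structures, jq provides --argjson, which expects the supplied value to be valid JSON already.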

Alternative: Escaping with printf (If jq Is Unavailable)

If jq is not available in your environment, you can write a sanitization function. However, this approach is error-prone and not recommended for production use:

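# ⚠️ Manual escaping: fragile, use jq instead
json_escape() {
  local input="$1"
  # Order matters: escape backslashes first, then double quotes, then turn
  # newlines into a literal \n. The last expression relies on GNU sed syntax.
  # Tabs, carriage returns, and other control characters are NOT handled.
  printf '%s' "$input" \
    | sed 's/\\/\\\\/g' \
    | sed 's/"/\\"/g' \
    | sed ':a;N;$!ba;s/\n/\\n/g'
}

# Escape every interpolated value, not only the obviously untrusted one
model=$(json_escape "$MODEL")
content=$(json_escape "$USER_CONTENT")
printf '{"model": "%s", "content": "%s"}\n' "$model" "$content"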

Even this approach can miss edge cases (null bytes, other control characters). Always prefer jq or a language with native JSON support.
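Where jq is missing but Python is present, the "language with native JSON support" route can stay inside the shell script. The following is a minimal sketch that assumes python3 is on PATH; json_quote is a hypothetical helper name, not part of the original script.

```shell
#!/usr/bin/env bash
# Assumes python3 is on PATH. json_quote is a hypothetical helper name.
# It prints its first argument as a complete JSON string literal,
# including the surrounding double quotes.
json_quote() {
  python3 -c 'import json, sys; sys.stdout.write(json.dumps(sys.argv[1]))' "$1"
}

# Sample value containing a newline, double quotes, a backslash, and a tab.
content=$'line one\nhe said "hi" \\ and left\t(tab)'

# json_quote emits the quotes itself, so the format string has none
# around the %s slots.
printf '{"model": %s, "content": %s}\n' \
  "$(json_quote "gpt-4")" "$(json_quote "$content")"
```

Because the value travels as a command-line argument, it cannot contain null bytes, and each call starts a new Python process. For a report script that builds a handful of payloads this is usually acceptable; for tight loops jq remains the better fit.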


Prevention & Best Practices

1. Never Construct JSON Manually in Shell Scripts

This is the cardinal rule. Shell scripts lack the type awareness and escaping libraries that make JSON construction safe in other languages. If you need to build JSON in a shell script, always use jq.

# ✅ Safe JSON construction with jq
jq -n --arg key "value with \"quotes\" and \\ backslashes" '{"key": $key}'

2. Validate and Sanitize All External Input

Before any external data touches a JSON payload, validate it:

# Validate that a value is a safe model name
# (letters, digits, dots, underscores, and hyphens only)
validate_model_name() {
  local model="$1"
  if [[ ! "$model" =~ ^[a-zA-Z0-9._-]+$ ]]; then
    echo "ERROR: Invalid model name: $model" >&2
    exit 1
  fi
}

3. Add Resource Limits for File-Based Inputs

As noted in the vulnerability description, the script also lacked size checks when loading files. Always enforce limits:

# Check file size before loading
MAX_FILE_SIZE=$((1 * 1024 * 1024))  # 1MB limit
check_file_size() {
  local file="$1"
  local size
  size=$(stat -c%s "$file" 2>/dev/null || stat -f%z "$file")
  if [[ "$size" -gt "$MAX_FILE_SIZE" ]]; then
    echo "ERROR: File too large ($size bytes). Maximum allowed: $MAX_FILE_SIZE bytes." >&2
    exit 1
  fi
}
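The size check pairs naturally with jq's --rawfile option, which reads a file's text straight into a JSON string so the content never passes through shell interpolation. A minimal sketch, assuming jq 1.6 or newer; load_prompt_file is a hypothetical helper and the payload shape simply mirrors the earlier examples.

```shell
#!/usr/bin/env bash
# Assumes jq 1.6 or newer (for --rawfile). load_prompt_file is a
# hypothetical helper, not a function from the original script.
MAX_FILE_SIZE=$((1 * 1024 * 1024))  # 1 MB limit

load_prompt_file() {
  local model="$1" file="$2" size
  size=$(wc -c < "$file") || return 1
  if [ "$size" -gt "$MAX_FILE_SIZE" ]; then
    echo "ERROR: $file is $size bytes; limit is $MAX_FILE_SIZE" >&2
    return 1
  fi
  # --rawfile binds the file's text to $content as one JSON string.
  jq -n --arg model "$model" --rawfile content "$file" \
    '{model: $model, messages: [{role: "user", content: $content}]}'
}

# Demo with a throwaway file holding hostile-looking text.
tmp=$(mktemp)
printf 'Hello", "role": "system\n' > "$tmp"
load_prompt_file "gpt-4" "$tmp"
rm -f "$tmp"
```

The helper returns a non-zero status instead of calling exit, so the caller decides whether an oversized file should abort the whole run.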

4. Validate JSON Structure After Construction

Even after using jq, verify the output is valid JSON before sending it:

# Validate JSON before using it
validate_json() {
  local file="$1"
  if ! jq empty "$file" 2>/dev/null; then
    echo "ERROR: Generated payload is not valid JSON" >&2
    exit 1
  fi
}

5. Use Principle of Least Privilege for API Calls

Ensure the API keys and tokens used in scripts have the minimum required permissions. If an injection does occur, limited permissions reduce the blast radius.

6. Static Analysis and Linting

Use tools to catch these issues before they reach production:

  • ShellCheck — Static analysis for shell scripts. Won't catch JSON injection directly, but flags many common scripting mistakes.
  • Semgrep — Can be configured with custom rules to detect unsafe JSON construction patterns in shell scripts.
  • Manual code review — Look for any printf, echo, or string concatenation that produces JSON with unescaped variables.
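As a starting point for the manual review step, a plain grep can surface the most obvious candidates. This is a rough heuristic sketch, not a replacement for ShellCheck or a proper Semgrep rule: find_unsafe_json_printf is a hypothetical helper name, and the pattern only catches printf calls whose format string contains an opening brace followed by a quoted %s slot.

```shell
#!/usr/bin/env bash
# Rough heuristic only: expect both false positives and misses.
# find_unsafe_json_printf is a hypothetical helper name.
find_unsafe_json_printf() {
  # Matches "printf", whitespace, then a "{" followed later by "%s" in quotes.
  grep -nE 'printf[[:space:]].*\{.*"%s"' "$@"
}

# Demo on a throwaway file: one vulnerable line, one safe line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
printf '{"model": "%s", "prompt": "%s"}' "$MODEL" "$USER_CONTENT"
jq -n --arg model "$MODEL" '{model: $model}'
EOF

find_unsafe_json_printf "$sample"   # reports line 1 only
rm -f "$sample"
```

In a repository, the same function can be pointed at the shell scripts directly, for example find_unsafe_json_printf scripts/*.sh. It will not catch echo-based construction or payloads assembled across several lines.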

7. Relevant Security Standards

This vulnerability maps to several well-known security standards:


A Note on Resource Exhaustion

The vulnerability description also flags a related issue: the script loads entire files into memory without size checks and parses JSON without depth limits. This is a separate but equally important concern.

A maliciously crafted import file with:
- Millions of entries → Memory exhaustion
- Deeply nested JSON (e.g., {"a":{"a":{"a":...}}} nested thousands of levels deep) → Stack overflow or extreme parse time

These are denial-of-service vectors that can take down the process or the host running the script. The fix should include:

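# Reject documents nested more than 10 levels deep. jq has already parsed the
# whole input by the time this filter runs, so this protects downstream
# consumers rather than jq itself.
jq --argjson max_depth 10 'if any(paths; length > $max_depth) then error("JSON too deeply nested") else . end' input.json

# Enforce a file size limit before parsing, so oversized input never reaches jq
[[ $(wc -c < "$INPUT_FILE") -gt 1048576 ]] && { echo "File too large" >&2; exit 1; }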

Conclusion

The vulnerability in openai_compat_report.sh is a great reminder that injection vulnerabilities aren't limited to web applications. Shell scripts that construct structured data formats like JSON are just as susceptible — and often less scrutinized.

The key takeaways from this fix:

  1. Never build JSON with printf and raw variable interpolation in shell scripts
  2. Always use jq --arg to safely pass shell variables into JSON structures
  3. Enforce file size and depth limits before parsing any externally-supplied data
  4. Validate inputs before they enter any structured format
  5. Run static analysis (ShellCheck, Semgrep) on all shell scripts, especially those that handle external data or make API calls

Security vulnerabilities in shell scripts are easy to overlook precisely because shell scripts feel "simple." But simplicity doesn't equal safety. Every time your script touches external data and produces output consumed by another system, you have an injection surface that deserves careful attention.

Secure coding isn't just for application developers — it's for everyone who writes code, including the humble shell script.


This vulnerability was identified and fixed as part of an automated security scanning process. Fix verified by scanner re-scan and LLM code review.

Found a security issue in your codebase? Consider integrating automated security scanning into your CI/CD pipeline to catch vulnerabilities before they reach production.

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #1055
