Back to Blog
high SEVERITY8 min read

SQL Injection in OceanBase Connector: How f-Strings Can Sink Your RAG Platform

A critical SQL injection vulnerability was discovered and patched in the OceanBase database connector used by a RAG (Retrieval-Augmented Generation) platform, where user-controlled filter expressions were directly embedded into SQL WHERE clauses using Python f-strings without any parameterization or validation. This flaw exposed the platform's entire knowledge base to complete compromise, including unauthorized data access, modification, and deletion. The fix replaces unsafe string interpolation

O
By orbisai0security
April 30, 2026

SQL Injection in OceanBase Connector: How f-Strings Can Sink Your RAG Platform

Severity: Critical | CVE Class: SQL Injection (CWE-89) | File: rag/utils/ob_conn.py


Introduction

There's a reason SQL injection has sat at the top of the OWASP Top 10 for over two decades. It's deceptively simple to introduce, devastatingly effective to exploit, and surprisingly easy to miss during code review — especially when it's hiding inside a modern Python f-string.

This post walks through a critical SQL injection vulnerability that was recently discovered and patched in the OceanBase database connector (ob_conn.py) of a Retrieval-Augmented Generation (RAG) platform. If you're building AI-powered applications backed by a knowledge base, this one should get your full attention.


What Is This Vulnerability?

At its core, this is a SQL injection vulnerability — specifically, one where user-controlled filter expressions were embedded directly into SQL WHERE clauses using Python f-strings. No parameterization. No allowlist validation. No escaping. Just raw user input flowing straight into a live database query.

The vulnerability was identified at multiple locations across ob_conn.py (lines 726, 777, 781, 787, 793, 821, and 827), meaning it wasn't an isolated mistake — it was a systemic pattern repeated throughout the file.

Why Should Developers Care?

If you're building a RAG application, your vector database or document store is your product. It contains the curated knowledge that powers your AI's responses. A SQL injection vulnerability here doesn't just leak a user table — it exposes your entire knowledge base to:

  • Unauthorized data exfiltration (an attacker reads everything)
  • Data manipulation (an attacker poisons your AI's knowledge)
  • Data destruction (an attacker deletes your knowledge base)
  • Privilege escalation (depending on database configuration)

This is why the security scanner flagged it as the most critical vulnerability in the codebase — it directly targets the platform's core business asset.


The Vulnerability Explained

Technical Details

The vulnerable pattern looks something like this:

# VULNERABLE CODE — Do not use this pattern
def search_documents(self, filter_expression: str, limit: int = 10):
    # filter_expression comes from user input or API response
    query = f"""
        SELECT * FROM documents
        WHERE {filter_expression}
        LIMIT {limit}
    """
    cursor.execute(query)
    return cursor.fetchall()

On the surface, this looks clean and Pythonic. The f-string is concise, readable, and feels natural to write. But here's the problem: filter_expression is a string that an attacker controls.

When you embed user input directly into a SQL statement, you're trusting the user to behave like a developer. They won't.

How Could It Be Exploited?

Consider what happens when an attacker passes this as a filter_expression:

1=1 UNION SELECT username, password, NULL FROM admin_users --

The resulting query becomes:

SELECT * FROM documents
WHERE 1=1 UNION SELECT username, password, NULL FROM admin_users --
LIMIT 10

The -- comments out the rest of the query. The UNION SELECT appends results from the admin_users table. The attacker now has credentials.

Or consider a destructive payload:

1=1; DROP TABLE documents; --

Which becomes:

SELECT * FROM documents
WHERE 1=1; DROP TABLE documents; --
LIMIT 10

Your entire knowledge base: gone.

Real-World Attack Scenario

Here's how an attack might unfold in practice:

  1. Reconnaissance: An attacker discovers the RAG platform accepts filter parameters through its API (e.g., a document search endpoint).

  2. Injection Testing: They submit a simple test payload like ' OR '1'='1 and observe whether the response changes — it does.

  3. Data Exfiltration: Using a tool like sqlmap or manual UNION-based injection, they enumerate table names, then systematically dump the entire knowledge base.

  4. Persistence or Destruction: Depending on their goal, they either exfiltrate the data quietly over time, or — if the database user has write permissions — they modify or delete records to corrupt the AI's outputs.

  5. The AI Becomes a Weapon: If the attacker can write to the knowledge base, they can inject malicious content that the RAG system will confidently serve to end users. This is a prompt injection via database poisoning — a particularly nasty second-order attack.

The Scope of the Problem

What makes this especially concerning is that the pattern appeared seven times across the same file. This suggests the vulnerability was introduced as a design pattern rather than a one-off oversight, meaning every query builder function that accepted filter expressions was affected.


The Fix

What Changed?

The fix replaces unsafe f-string interpolation with parameterized queries (also called prepared statements). This is the gold-standard defense against SQL injection.

Here's the before/after comparison:

Before (Vulnerable)

# BEFORE: Direct f-string interpolation — UNSAFE
def get_chunks_by_filter(self, filter_expr: str):
    query = f"SELECT * FROM chunks WHERE {filter_expr}"
    self.cursor.execute(query)
    return self.cursor.fetchall()

def delete_by_filter(self, filter_expr: str):
    query = f"DELETE FROM documents WHERE {filter_expr}"
    self.cursor.execute(query)
    self.conn.commit()

After (Fixed)

# AFTER: Parameterized queries with allowlist validation — SAFE
ALLOWED_FILTER_FIELDS = {"doc_id", "kb_id", "status", "created_at"}

def get_chunks_by_filter(self, doc_id: str, kb_id: str):
    # Use placeholders — the database driver handles escaping
    query = "SELECT * FROM chunks WHERE doc_id = %s AND kb_id = %s"
    self.cursor.execute(query, (doc_id, kb_id))
    return self.cursor.fetchall()

def delete_by_filter(self, doc_id: str):
    # Parameters are passed separately, never interpolated into the string
    query = "DELETE FROM documents WHERE doc_id = %s"
    self.cursor.execute(query, (doc_id,))
    self.conn.commit()

How Does This Solve the Problem?

The key insight is separation of code and data.

With f-strings, the user's input becomes part of the SQL statement itself. The database parser sees it as SQL code and executes it accordingly.

With parameterized queries, the SQL statement is sent to the database first as a template with placeholders (%s or ?). The user's values are sent separately as data. The database engine knows these are values, not code — no matter what characters they contain.

An attacker passing ' OR '1'='1 as a parameter value would cause the database to literally search for a doc_id equal to the string ' OR '1'='1 — which won't match anything. The injection attempt fails harmlessly.

Additional Defense: Allowlist Validation

For cases where dynamic field names are genuinely needed (e.g., sorting by a user-selected column), the fix also applies allowlist validation:

# Allowlist approach for dynamic field names (column names can't be parameterized)
SORTABLE_FIELDS = {"created_at", "updated_at", "doc_name", "relevance_score"}

def get_documents_sorted(self, sort_field: str, limit: int):
    if sort_field not in SORTABLE_FIELDS:
        raise ValueError(f"Invalid sort field: {sort_field}")

    # Only safe, pre-approved field names reach the query
    query = f"SELECT * FROM documents ORDER BY {sort_field} LIMIT %s"
    self.cursor.execute(query, (limit,))
    return self.cursor.fetchall()

Note that column/table names cannot be parameterized in most database drivers — this is where allowlists become essential.


Prevention & Best Practices

1. Always Use Parameterized Queries

This is non-negotiable. Every major database driver supports them:

# MySQL / OceanBase (mysql-connector-python)
cursor.execute("SELECT * FROM t WHERE id = %s", (user_id,))

# SQLite
cursor.execute("SELECT * FROM t WHERE id = ?", (user_id,))

# PostgreSQL (psycopg2)
cursor.execute("SELECT * FROM t WHERE id = %s", (user_id,))

# SQLAlchemy ORM (parameterization is automatic)
session.query(Document).filter(Document.id == user_id).all()

2. Use an ORM

Object-Relational Mappers like SQLAlchemy, Django ORM, or Tortoise ORM handle parameterization automatically. They're not a silver bullet, but they dramatically reduce the surface area for injection:

# SQLAlchemy — safe by default
documents = session.query(Document).filter(
    Document.kb_id == kb_id,
    Document.status == "active"
).all()

3. Apply the Principle of Least Privilege

Your database user should only have the permissions it needs:

  • A read-only search function → use a read-only DB user
  • Never use root or admin DB credentials in application code
  • Separate DB users for read vs. read/write operations

4. Input Validation (Defense in Depth)

Validate inputs before they reach the database layer:

import re

def validate_kb_id(kb_id: str) -> bool:
    # Only allow alphanumeric IDs and hyphens
    return bool(re.match(r'^[a-zA-Z0-9\-]{1,64}$', kb_id))

5. Enable SQL Query Logging in Development

Seeing the actual queries your application generates is invaluable:

# SQLAlchemy query logging
import logging
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)

6. Use Static Analysis Tools

Integrate security scanning into your CI/CD pipeline:

  • Bandit — Python-specific security linter, flags SQL injection patterns
  • Semgrep — Highly customizable static analysis with SQL injection rules
  • SonarQube — Enterprise-grade SAST with injection detection
  • CodeQL — GitHub's semantic code analysis engine
# Run Bandit on your codebase
pip install bandit
bandit -r rag/ -t B608  # B608 = SQL injection

7. Security Code Review Checklist

When reviewing database-related code, ask:

  • [ ] Are all user inputs passed as parameters, not interpolated?
  • [ ] Are dynamic identifiers (column/table names) validated against an allowlist?
  • [ ] Does the DB user have only the minimum required permissions?
  • [ ] Are error messages suppressed in production (to avoid leaking schema info)?
  • [ ] Is there logging/alerting for unusual query patterns?

Relevant Standards & References

Standard Reference
OWASP Top 10 A03:2021 – Injection
CWE CWE-89: SQL Injection
OWASP Cheat Sheet SQL Injection Prevention
NIST SP 800-53: SI-10 Information Input Validation

Conclusion

This vulnerability is a textbook reminder that modern syntax doesn't mean safe syntax. Python f-strings are elegant and powerful — but when they carry user input into a SQL statement, they're just as dangerous as the string concatenation patterns we've been warning against for 20 years.

The fix here isn't complicated. Parameterized queries have been the right answer since the 1990s, and every modern database driver supports them. The challenge is building team habits and code review processes that catch these patterns before they reach production.

Key Takeaways

  • 🚨 Never interpolate user input into SQL strings — not with f-strings, not with .format(), not with + concatenation
  • Always use parameterized queries — they're the only reliable defense against SQL injection
  • 🔒 Apply least privilege — limit what damage a successful injection can do
  • 🔍 Scan your codebase — tools like Bandit and Semgrep can find these patterns automatically
  • 📋 Validate allowlists for any case where dynamic identifiers are unavoidable

In RAG applications especially, your knowledge base is your crown jewel. Protect the queries that touch it with the same rigor you'd apply to any other critical security boundary.


This vulnerability was identified and patched as part of an automated security review. The fix was verified by re-scan and LLM-assisted code review before merging.

Found a security issue in your codebase? Consider integrating automated security scanning into your CI/CD pipeline to catch vulnerabilities before they reach production.

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #14470

Related Articles

low

SQL Injection via String Formatting: How Parameterized Queries Save the Day

A database query in DBeaver's Altibase extension was constructing SQL statements using `String.format()` with user-controlled input, creating a classic SQL injection vulnerability. The fix replaces the unsafe string interpolation with parameterized queries using `PreparedStatement`, ensuring user input is always treated as data rather than executable SQL. This type of vulnerability is deceptively simple to introduce but equally simple to fix once you know what to look for.

high

Shell Injection via Unsafe String Concatenation in gRPCurl Command Generation

A high-severity vulnerability was discovered in PaddleOCR's deployment configuration where model download URLs were specified using unencrypted `http://`, exposing users to man-in-the-middle attacks that could allow an attacker to intercept and replace model files with malicious ones. The fix upgrades all model download URLs to use `https://`, ensuring encrypted transmission and integrity of the downloaded files. This change is a critical security baseline for any application that downloads bina

high

Locking Down Docker: Preventing Privilege Escalation in Container Services

A high-severity privilege escalation vulnerability was discovered in a Docker Compose configuration where the `nginx` service lacked the `no-new-privileges` security option and was running with a writable root filesystem. These misconfigurations could allow a compromised container process to gain elevated permissions or download and execute malicious payloads. The fix applies defense-in-depth by adding `no-new-privileges:true`, enforcing a read-only root filesystem, and redirecting writable path

high

Thread-Safe Tokenization: Fixing strtok() Reentrancy in Game Script Parsing

A high-severity vulnerability was discovered in `lvl_script_commands.c` where the use of the non-reentrant `strtok()` function during level script parsing created conditions for memory corruption and potential arbitrary code execution. The fix replaces all `strtok()` calls with the thread-safe `strtok_r()` variant, eliminating shared global state that could be exploited through maliciously crafted level files. This change is part of a broader effort to harden the game's script parsing pipeline a

high

Securing rpcbind: How Unauthenticated RPC Registration Exposes NFS Infrastructure

A high-severity vulnerability was discovered in an NFS utilities configuration where rpcbind (port 111) accepted RPC service registrations without any authentication, allowing any network-accessible attacker to register malicious services under legitimate RPC program numbers and redirect NFS clients. The fix adds critical security documentation and network isolation guidance, ensuring operators understand that rpcbind must be protected by host-level firewalling or Kubernetes network policies to

high

GPIO Bounds Checking: Fixing an Out-of-Bounds Access in py32ioexp Driver

A high-severity out-of-bounds access vulnerability was discovered and patched in the `py32ioexp` Linux GPIO expander driver. The `py32io_gpio_direction_input()` function failed to validate a user-supplied pin offset against the chip's declared GPIO count, opening the door to memory corruption via the GPIO character device interface. A two-line bounds check now closes the vulnerability cleanly and efficiently.