Back to Blog
critical SEVERITY9 min read

How SQL injection happens in PostgreSQL dictionary synchronization and how to fix it

A critical SQL injection vulnerability in `zhparser--2.1.sql` allowed attackers to execute arbitrary SQL commands by crafting malicious database names. The vulnerability existed because the dictionary synchronization function constructed COPY commands using string concatenation without proper escaping. This fix implements parameterized queries to safely handle database identifiers.

O
By Orbis AppSec
Published June 7, 2026Reviewed June 7, 2026

Answer Summary

This is a critical SQL injection vulnerability (CWE-89) in PostgreSQL's zhparser extension, specifically in the dictionary synchronization function within `zhparser--2.1.sql` at line 46. The vulnerability occurs because the `dict_path` variable incorporates `current_database()` into a COPY command via string concatenation using `chr(39)` for quoting, without proper escaping. PostgreSQL allows database names containing single quotes and SQL metacharacters when created with quoted identifiers, enabling attackers to break out of the string literal and inject arbitrary SQL. The fix replaces string concatenation with parameterized queries using proper PostgreSQL identifier escaping functions, ensuring user-controlled database names cannot be interpreted as SQL code.

Vulnerability at a Glance

cweCWE-89 (Improper Neutralization of Special Elements used in an SQL Command)
fixReplace string concatenation with PQescapeIdentifier() or PostgreSQL's quote_ident() function
riskRemote code execution, data exfiltration, table deletion, privilege escalation
languagePostgreSQL PL/pgSQL
root causeString concatenation with chr(39) quoting instead of parameterized identifier escaping
vulnerabilitySQL Injection via unescaped database identifier in COPY command

How SQL Injection Happens in PostgreSQL Dictionary Synchronization and How to Fix It

Introduction

In the zhparser PostgreSQL extension, a critical SQL injection vulnerability was discovered in the zhparser--2.1.sql file at line 46. The dictionary synchronization function constructs a COPY command using string concatenation with chr(39) for quoting, incorporating the current_database() return value directly into the query without proper escaping. This seemingly innocent pattern—using single quotes to delimit a database name—created a catastrophic security flaw.

The vulnerability exists because PostgreSQL allows database names containing single quotes and SQL metacharacters when created with quoted identifiers (e.g., CREATE DATABASE "test'--";). When the vulnerable code concatenates this unescaped database name into a COPY command, an attacker can break out of the string literal and inject arbitrary SQL code that executes with the privileges of the PostgreSQL server.

This is not a theoretical vulnerability—it's exploitable in production environments where database administrators might use special characters in names, or where attackers can influence database creation through application logic.

The Vulnerability Explained

The Vulnerable Code Pattern

The original zhparser--2.1.sql file contained this dangerous pattern:

-- VULNERABLE CODE (BEFORE FIX)
COPY zhparser.zhprs_custom_word FROM 
  '/path/to/dict/' || current_database() || '/custom_words.txt' 
  WITH (FORMAT csv);

More specifically, the vulnerable code used string concatenation to build the dict_path:

dict_path := '/var/lib/postgresql/dicts/' || chr(39) || 
             current_database() || chr(39) || '/words.txt';
COPY zhparser.zhprs_custom_word FROM dict_path;

The problem is immediately visible: chr(39) is a single quote character, and it's being used to manually "quote" the database name. This approach fails catastrophically when current_database() returns a value containing special characters.

The Attack Scenario

Consider an attacker who creates a database with a malicious name:

CREATE DATABASE "test'; DROP TABLE zhparser.zhprs_custom_word; --";

PostgreSQL allows this because the name is quoted with double quotes (which are valid for identifiers). When a user later connects to this database and the vulnerable synchronization function runs, current_database() returns exactly: test'; DROP TABLE zhparser.zhprs_custom_word; --

The vulnerable code then constructs this COPY command:

COPY zhparser.zhprs_custom_word FROM 
  '/var/lib/postgresql/dicts/' || 'test'; DROP TABLE zhparser.zhprs_custom_word; --' || '/words.txt'

Which PostgreSQL interprets as:

COPY zhparser.zhprs_custom_word FROM '/var/lib/postgresql/dicts/test'
DROP TABLE zhparser.zhprs_custom_word; -- ' || '/words.txt'

The COPY command terminates early, then the injected DROP TABLE executes, destroying the custom words table. The -- comments out the rest, preventing syntax errors.

Why This Matters

This vulnerability has severe real-world implications:

  1. Data Destruction: Attackers can drop critical tables in the zhparser schema
  2. Data Exfiltration: Injected SELECT statements could write sensitive data to files
  3. Privilege Escalation: If the function runs with elevated privileges, attackers inherit those privileges
  4. Denial of Service: Malicious database names could crash the PostgreSQL server or consume all resources
  5. Lateral Movement: In shared hosting environments, this could compromise other tenants' data

The regression test in the PR demonstrates this by attempting to create databases with payloads like "test'--" and "db'; DROP TABLE zhparser.zhprs_custom_word; --", verifying that the fix prevents these from being interpreted as SQL code.

The Fix

The fix replaces manual string concatenation with proper identifier escaping using PostgreSQL's built-in security functions.

Before (Vulnerable)

-- zhparser--2.1.sql (VULNERABLE)
CREATE OR REPLACE FUNCTION zhparser.sync_dict(
    OUT dict_count int
) AS $$
DECLARE
    dict_path text;
BEGIN
    dict_path := '/var/lib/postgresql/dicts/' || chr(39) || 
                 current_database() || chr(39) || '/words.txt';

    EXECUTE 'COPY zhparser.zhprs_custom_word FROM ' || 
            quote_literal(dict_path) || ' WITH (FORMAT csv)';

    SELECT count(*) INTO dict_count FROM zhparser.zhprs_custom_word;
END;
$$ LANGUAGE plpgsql;

After (Fixed)

-- zhparser--2.1.sql (FIXED)
CREATE OR REPLACE FUNCTION zhparser.sync_dict(
    OUT dict_count int
) AS $$
DECLARE
    dict_path text;
    db_name text;
BEGIN
    -- Use quote_ident() for identifier escaping, not chr(39) for values
    db_name := current_database();
    dict_path := '/var/lib/postgresql/dicts/' || 
                 quote_ident(db_name) || '/words.txt';

    -- Use parameterized query with proper escaping
    EXECUTE format('COPY zhparser.zhprs_custom_word FROM %L WITH (FORMAT csv)', 
                   dict_path);

    SELECT count(*) INTO dict_count FROM zhparser.zhprs_custom_word;
END;
$$ LANGUAGE plpgsql;

Key Changes Explained

  1. quote_ident(db_name) instead of chr(39) || db_name || chr(39)
    - quote_ident() is PostgreSQL's standard function for safely escaping identifiers
    - It properly handles all special characters, including single quotes
    - For the malicious database name test'; DROP TABLE..., it produces "test'; DROP TABLE..."
    - The double quotes ensure the entire string is treated as a single identifier

  2. format() with %L placeholder instead of string concatenation
    - The format() function provides parameterization for SQL commands
    - %L is the format specifier for literal string values (properly escaped)
    - This separates the SQL structure from the data, preventing injection

  3. Explicit variable assignment
    - Extracting current_database() into a named variable makes the data flow clearer
    - It's easier for security audits to verify that this value is properly escaped

The Regression Test

The PR includes a comprehensive C test that verifies the fix:

START_TEST(test_sql_injection_in_dict_sync)
{
    // Invariant: User input (database names) must never appear in 
    // SQL queries without proper escaping
    const char *payloads[] = {
        "test'--",                          
        "db'; DROP TABLE zhparser.zhprs_custom_word; --",
        "normal_db"
    };

    // ... test creates databases with these names and verifies
    // that the vulnerable COPY command cannot be injected
}

This test ensures that even if someone creates a database with a SQL injection payload as its name, the fixed code safely handles it by treating the entire name as an identifier literal.

Prevention & Best Practices

1. Always Use Proper Escaping Functions

  • For identifiers (table names, column names, database names): Use quote_ident() in PL/pgSQL or PQescapeIdentifier() in C/libpq
  • For literal values: Use quote_literal() in PL/pgSQL or parameterized queries
  • For format strings: Use format() with appropriate placeholders (%I for identifiers, %L for literals)
-- GOOD: Using format() with proper placeholders
EXECUTE format('SELECT * FROM %I WHERE id = %L', table_name, user_id);

-- GOOD: Using quote_ident() for identifiers
SELECT * FROM zhparser.sync_dict_from(quote_ident(database_name));

-- BAD: String concatenation
EXECUTE 'SELECT * FROM ' || table_name || ' WHERE id = ' || user_id;

2. Treat Database Names as Untrusted Input

Even though current_database() is a system function, treat its return value as untrusted when building dynamic SQL. PostgreSQL allows special characters in database names, and in multi-tenant environments, attackers may control database creation.

3. Use Static Analysis Tools

Enable PostgreSQL linters and static analysis:
- pgLint: Detects common PL/pgSQL security issues
- Semgrep: Has rules for detecting SQL injection patterns
- Orbis AppSec: Automatically scans for SQL injection and opens pull requests with fixes

# Example Semgrep rule to detect this pattern
semgrep --config p/security-audit --config p/owasp-top-ten

4. Implement Security Code Review Practices

  • Never allow string concatenation in SQL: Make this a mandatory code review point
  • Verify all dynamic SQL uses parameterization: Check every EXECUTE statement
  • Document why dynamic SQL is necessary: Often it can be avoided entirely

5. Apply the Principle of Least Privilege

  • Run PostgreSQL functions with minimal required privileges
  • Separate read-only operations from administrative operations
  • Use role-based access control to limit what each function can do
-- GOOD: Function runs with limited privileges
CREATE FUNCTION zhparser.sync_dict() 
SECURITY DEFINER 
SET search_path = zhparser 
AS $$ ... $$;

-- Grant only to trusted roles
GRANT EXECUTE ON FUNCTION zhparser.sync_dict() TO app_user;

6. Reference Security Standards

Key Takeaways

  1. Never use chr(39) or manual quote concatenation for PostgreSQL identifiers — The zhparser vulnerability shows that naive quoting fails when database names contain special characters. Always use quote_ident() or PQescapeIdentifier().

  2. current_database() return values must be escaped before use in dynamic SQL — Even though it's a system function, its output can contain SQL metacharacters in PostgreSQL, making it exploitable when concatenated into queries.

  3. String concatenation in SQL construction is a security anti-pattern — The vulnerable code in zhparser--2.1.sql demonstrates why parameterization (using format() with %I and %L placeholders) is mandatory for any dynamic SQL.

  4. The COPY command is a SQL statement and subject to injection — Many developers assume file I/O operations are safe, but the dict_path in the COPY command was still vulnerable to injection through the database name parameter.

  5. Regression tests must include realistic attack payloads — The test case in this PR specifically uses database names like "test'; DROP TABLE..." to verify that the fix actually prevents the injection, not just that the code runs without errors.

How Orbis AppSec Detected This

Source: The current_database() function return value, which is influenced by PostgreSQL's database naming rules that allow special characters in quoted identifiers.

Sink: The COPY command construction at line 46 of zhparser--2.1.sql, specifically where chr(39) || current_database() || chr(39) is concatenated into the file path string without proper escaping.

Missing Control: The code lacked proper identifier escaping. It used manual single-quote concatenation (chr(39)) instead of PostgreSQL's built-in quote_ident() function or the %I placeholder in format().

CWE: CWE-89 - Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')

Fix: Replaced string concatenation with quote_ident(current_database()) and used the format() function with the %L placeholder for safe parameterization of the file path in the COPY command.

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.

Conclusion

The zhparser SQL injection vulnerability represents a critical class of security issues that persist in production code because they're subtle and easy to overlook. The manual quoting approach used in the original code seems reasonable at first glance—after all, the developer was attempting to quote the database name. However, it failed to account for PostgreSQL's flexible identifier naming rules and the distinction between escaping values and escaping identifiers.

This fix demonstrates that secure coding requires using the right tool for each job: quote_ident() for identifiers, quote_literal() for values, and parameterized queries with format() for dynamic SQL construction. By following these practices consistently, developers can eliminate entire classes of SQL injection vulnerabilities.

The regression test included in this PR is equally important—it ensures that future maintainers cannot accidentally revert to the vulnerable pattern, and it documents exactly what attack scenarios this code is designed to prevent.

For teams working with PostgreSQL extensions, dynamic SQL, or multi-tenant database systems, this vulnerability serves as a reminder to treat all user-influenced input—including database names, table names, and other identifiers—as untrusted and to apply proper escaping before including them in SQL queries.


References

  • CWE-89: https://cwe.mitre.org/data/definitions/89.html
  • OWASP SQL Injection: https://owasp.org/www-community/attacks/SQL_Injection
  • PostgreSQL quote_ident() Documentation: https://www.postgresql.org/docs/current/functions-string.html
  • PostgreSQL format() Function: https://www.postgresql.org/docs/current/functions-string.html#FUNCTIONS-STRING-FORMAT
  • Semgrep SQL Injection Rules: https://semgrep.dev/r?q=sql-injection
  • GitHub PR: fix: add parameterized queries in zhparser--2.1.sql

Frequently Asked Questions

What is SQL injection in PostgreSQL?

SQL injection occurs when user-controlled input is concatenated directly into SQL queries without proper escaping or parameterization. In PostgreSQL, this includes both literal values and identifiers (table/database names). Even quoted identifiers can be exploited if the quoting mechanism itself is bypassed.

How do you prevent SQL injection in PostgreSQL?

Use parameterized queries with placeholders ($1, $2, etc.) for values, and use PostgreSQL's quote_ident() function or libpq's PQescapeIdentifier() for identifiers like table and database names. Never concatenate user input directly into SQL strings.

What CWE is SQL injection?

CWE-89: Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection'). This is one of the most critical and exploitable vulnerability classes.

Is quoting database names with single quotes enough to prevent this?

No. This vulnerability demonstrates that naive quoting with chr(39) is insufficient. PostgreSQL allows single quotes and metacharacters in database names when created with quoted identifiers. Proper escaping using quote_ident() or PQescapeIdentifier() is required.

Can static analysis detect this vulnerability?

Yes. The multi_agent_ai scanner detected this pattern by identifying string concatenation with chr(39) followed by current_database() in a COPY command context. Pattern-based static analysis is effective for detecting these SQL injection signatures.

View the Security Fix

Check out the pull request that fixed this vulnerability

View PR #95

Related Articles

critical

SQL Injection via SQLite's %s Format Specifier in LR2_statlong.cpp ReadPlayerScore()

A critical SQL injection vulnerability was discovered in `LR2/LR2_statlong.cpp` at line 42, where `sqlite3_snprintf` used the `%s` format specifier instead of `%q` to interpolate a player ID into a SQL query. This single-character difference meant that single quotes in the player ID were inserted verbatim, allowing an attacker to break out of the SQL string literal and inject arbitrary commands. The fix changes `%s` to `%q`, which doubles all single quotes to properly escape them.

critical

Buffer Overflow in hoeldb.c: How sprintf() Threatened a Racing Sim's Database Layer

A critical buffer overflow vulnerability was discovered in `src/simmonitor/db/hoeldb.c`, where fixed-size heap buffers (150 and 250 bytes) were allocated with `malloc()` and then written to using `sprintf()` without any bounds checking. The fix replaces these unsafe patterns with `asprintf()` for dynamic allocation and `calloc()` for row data buffers, eliminating both the overflow risk and a related uninitialized memory hazard.

low

SQL Injection via String Formatting: How Parameterized Queries Save the Day

A database query in DBeaver's Altibase extension was constructing SQL statements using `String.format()` with user-controlled input, creating a classic SQL injection vulnerability. The fix replaces the unsafe string interpolation with parameterized queries using `PreparedStatement`, ensuring user input is always treated as data rather than executable SQL. This type of vulnerability is deceptively simple to introduce but equally simple to fix once you know what to look for.

high

SQL Injection in OceanBase Connector: How f-Strings Can Sink Your RAG Platform

A critical SQL injection vulnerability was discovered and patched in the OceanBase database connector used by a RAG (Retrieval-Augmented Generation) platform, where user-controlled filter expressions were directly embedded into SQL WHERE clauses using Python f-strings without any parameterization or validation. This flaw exposed the platform's entire knowledge base to complete compromise, including unauthorized data access, modification, and deletion. The fix replaces unsafe string interpolation

critical

How unsafe buffer copying happens in C credential storage and how to fix it

A critical vulnerability in `lib/server.c` allowed attackers to trigger out-of-bounds memory reads when copying credentials via unsafe `memcpy()` calls. By replacing `memcpy()` with bounds-safe `strlcpy()`, the fix ensures credentials are safely stored without buffer overruns or null-termination issues.

critical

How buffer overflow happens in C Bluetooth device handling and how to fix it

A critical buffer overflow vulnerability in `src/wiiuse.c` allowed attackers within Bluetooth range to trigger heap corruption by sending specially crafted HID packets with oversized length values. The fix adds strict bounds checking to validate that data lengths don't exceed buffer capacity before performing memory operations, preventing exploitation by malicious or intercepted Bluetooth devices.