Unsafe Dict Merge in Scapy: How __dict__.update() Opens the Door to Object Injection
Introduction
When building networked applications in Python, it's tempting to use convenient shortcuts to populate object attributes from parsed data. One such shortcut—self.__dict__.update(entries)—looks harmless at first glance. After all, it's just copying some keys and values into an object, right?
Wrong. When the source of those keys and values is an untrusted network packet or external input, this single line of code can become a critical security vulnerability. This post breaks down a real-world vulnerability discovered in scapy/scapy_pcp.py, explains how it could be exploited, and walks through what a proper fix looks like.
Whether you're a seasoned security engineer or a developer just beginning to think about secure coding practices, this vulnerability offers a powerful lesson: never merge untrusted data directly into your object's internal namespace.
The Vulnerability Explained
What Is __dict__.update() and Why Is It Dangerous?
In Python, every object has a __dict__ attribute—a dictionary that stores the object's instance attributes. When you write:
```python
self.__dict__.update(entries)
```
You are directly merging every key-value pair from entries into the object's attribute namespace. If entries comes from a trusted, controlled source (like a hardcoded config), this is fine. But if entries is derived from a parsed network packet or any other form of external input, you've just handed an attacker the keys to your object.
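To see the mechanics concretely, here is a minimal sketch (class and attribute names are hypothetical) of how an unfiltered merge lets external keys overwrite anything in an instance's namespace:

```python
class Session:
    def __init__(self):
        self.user = "guest"
        self.is_admin = False

# Simulated attacker-controlled dictionary, e.g. parsed from a packet
entries = {"user": "guest", "is_admin": True, "_token": "forged"}

s = Session()
s.__dict__.update(entries)  # no filtering: every key lands in the object
print(s.is_admin)           # True — attacker flipped an internal flag
print(s._token)             # "forged" — attacker created a private attribute
```

Nothing in `update()` distinguishes a legitimate field from an injected one; it is a raw dictionary merge.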
The Vulnerable Code
At line 37 of scapy/scapy_pcp.py, the vulnerable pattern looked something like this:
```python
# VULNERABLE CODE - Do not use
class PCPMessage:
    def __init__(self, entries):
        # Directly merging externally-supplied dictionary into object namespace
        self.__dict__.update(entries)  # ← Line 37: DANGEROUS
```
At first glance, this seems like a convenient way to initialize an object from parsed packet fields. In practice, it's a wide-open door for attackers.
How Could It Be Exploited?
Python objects have a number of special "dunder" (double-underscore) attributes and methods that control fundamental behavior:
| Attribute | What It Controls |
|---|---|
| `__class__` | The object's type/class |
| `__init__` | The constructor method |
| `__repr__` | String representation |
| `__reduce__` | Pickle serialization behavior |
| `__module__` | The module the class belongs to |
Because self.__dict__.update(entries) performs no filtering whatsoever, an attacker who can craft a malicious packet can inject any of these keys.
Consider a crafted packet payload that, when parsed, produces a dictionary like:
```python
malicious_entries = {
    "opcode": 1,                   # Legitimate field
    "lifetime": 3600,              # Legitimate field
    "__class__": SomeMaliciousClass,                                    # INJECTED
    "__init__": lambda self: exec("import os; os.system('rm -rf /')"),  # INJECTED
    "_internal_state": "corrupted",                                     # INJECTED internal attribute
}
```
When self.__dict__.update(malicious_entries) runs, all of these keys—including the dangerous ones—get written directly into the object.
Real-World Attack Scenario
Imagine a network service that:
1. Listens for incoming PCP (Port Control Protocol) packets
2. Parses each packet using Scapy
3. Creates a PCPMessage object from the parsed fields
4. Passes that object to downstream business logic
An attacker on the network sends a specially crafted packet. The parser extracts fields from it and builds a dictionary. That dictionary gets passed to PCPMessage.__init__(). Because of the unchecked __dict__.update(), the attacker's injected keys overwrite critical object attributes.
Depending on how the application uses the resulting object, this could lead to:
- Object state corruption: Internal counters, flags, or state variables get overwritten with attacker-controlled values
- Method hijacking: Overwriting callable attributes causes the application to execute attacker-supplied logic
- Denial of Service: Injecting oversized payloads or recursive structures exhausts memory/CPU (related to the input size constraints issue noted in V-008)
- Privilege escalation: In some application architectures, corrupting object state can bypass authorization checks
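The method-hijacking case deserves a concrete sketch (class and method names are hypothetical). Because plain functions defined on a class are non-data descriptors, an instance-dict entry of the same name shadows them for explicit attribute access:

```python
class PCPHandler:
    def validate(self):
        return "packet ok"

handler = PCPHandler()

# Attacker-supplied dictionary shadows the class method for this instance:
# handler.validate now resolves to the instance dict entry, not the class method.
malicious = {"validate": lambda: "attacker logic ran"}
handler.__dict__.update(malicious)

print(handler.validate())  # "attacker logic ran" — original method bypassed
```

Note the subtlety: implicit dunder invocation (e.g. `repr(obj)`) looks up methods on the type and skips the instance dict, but any code that calls `obj.method()` explicitly will hit the injected callable first.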
The Fix
What Needs to Change
The core problem is the complete absence of input validation and key filtering. A proper fix must ensure that:
- Only expected keys are accepted — Define an allowlist of valid attribute names
- Dunder attributes are explicitly blocked — Never allow `__`-prefixed keys from external input
- Values are validated — Check types and sizes before assignment
- Unexpected keys are rejected or logged — Don't silently ignore potentially malicious input
The Secure Pattern (After Fix)
Here is what a hardened version of this code should look like:
```python
# SECURE CODE - After fix
class PCPMessage:
    # Explicit allowlist of valid, expected fields
    ALLOWED_FIELDS = frozenset({
        "opcode",
        "lifetime",
        "result_code",
        "protocol",
        "internal_port",
        "external_port",
    })

    # Maximum allowed size for string/bytes fields
    MAX_FIELD_SIZE = 1024  # bytes

    def __init__(self, entries):
        if not isinstance(entries, dict):
            raise TypeError("entries must be a dictionary")

        for key, value in entries.items():
            # Keys must be strings before we can inspect them
            if not isinstance(key, str):
                raise TypeError(f"Field name must be a string, got {type(key).__name__}")

            # Block dunder and private attributes entirely
            if key.startswith("_"):
                raise ValueError(f"Illegal field name rejected: {key!r}")

            # Only accept explicitly allowlisted keys
            if key not in self.ALLOWED_FIELDS:
                raise ValueError(f"Unknown field rejected: {key!r}")

            # Enforce size constraints on string/bytes values
            if isinstance(value, (str, bytes)) and len(value) > self.MAX_FIELD_SIZE:
                raise ValueError(f"Field {key!r} exceeds maximum allowed size")

            # Safe to set — key is validated and allowlisted
            setattr(self, key, value)
```
Why This Fix Works
Let's walk through each defense layer:
1. Allowlist validation (ALLOWED_FIELDS)
Instead of accepting any key that arrives in the dictionary, we define exactly which keys are valid. Anything not on the list is rejected immediately. This is the classic allowlist over blocklist principle—far more robust than trying to enumerate all the bad things to block.
2. Dunder/private key blocking
The key.startswith("_") check ensures that even if someone somehow adds a new dunder attribute to Python in the future, it will still be blocked. Defense in depth.
3. setattr() instead of __dict__.update()
Using setattr(self, key, value) respects Python's attribute setting protocol, including any __setattr__ overrides you might add for additional validation. Direct __dict__ manipulation bypasses these safeguards entirely.
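This difference is easy to demonstrate with a small sketch (the `Guarded` class is hypothetical): a `__setattr__` guard intercepts `setattr()` calls, but a direct `__dict__` write never triggers it.

```python
class Guarded:
    def __setattr__(self, name, value):
        # Reject private/dunder attribute names at the protocol level
        if name.startswith("_"):
            raise ValueError(f"blocked: {name!r}")
        super().__setattr__(name, value)

g = Guarded()
setattr(g, "opcode", 1)          # goes through __setattr__: allowed
try:
    setattr(g, "_secret", "x")   # goes through __setattr__: rejected
except ValueError as e:
    print(e)

g.__dict__.update({"_secret": "x"})  # bypasses __setattr__ entirely
print(g._secret)                     # "x" — the guard never ran
```

Any validation you build into the attribute protocol is worthless if initialization code routes around it via `__dict__`.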
4. Size constraints
Enforcing MAX_FIELD_SIZE addresses the related Denial of Service vector (V-008) where oversized payloads could exhaust server resources during processing.
5. Type checking
isinstance(entries, dict) ensures we fail fast if something unexpected is passed in, rather than producing confusing errors downstream.
Prevention & Best Practices
1. Never Use __dict__.update() with Untrusted Data
This is the cardinal rule. If your data source is a network packet, a user-submitted form, an API request, or any other external input, never pass it directly to __dict__.update().
```python
# ❌ NEVER do this with external data
self.__dict__.update(untrusted_data)

# ✅ Always validate first
for key, value in untrusted_data.items():
    if key in ALLOWED_FIELDS:
        setattr(self, key, value)
```
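The validate-then-set loop can be factored into a small reusable helper; a minimal sketch (the `safe_update` name and field list are hypothetical) that rejects rather than silently skips bad keys:

```python
ALLOWED_FIELDS = frozenset({"opcode", "lifetime", "result_code"})

def safe_update(obj, entries, allowed=ALLOWED_FIELDS):
    """Copy only allowlisted, non-underscore keys onto obj via setattr()."""
    for key, value in entries.items():
        if not isinstance(key, str) or key.startswith("_") or key not in allowed:
            raise ValueError(f"rejected field: {key!r}")
        setattr(obj, key, value)

class Msg:
    pass

m = Msg()
safe_update(m, {"opcode": 1, "lifetime": 3600})
print(m.opcode)  # 1

try:
    safe_update(m, {"__class__": object})
except ValueError as e:
    print(e)  # rejected field: '__class__'
```

Raising on unexpected keys (instead of ignoring them) also gives you a natural hook for logging probe attempts.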
2. Use Data Validation Libraries
Libraries like Pydantic or marshmallow are purpose-built for this problem. They enforce schemas, validate types, and reject unexpected fields automatically:
```python
from pydantic import BaseModel, Field
from typing import Optional

class PCPMessage(BaseModel):
    opcode: int = Field(..., ge=0, le=255)
    lifetime: int = Field(..., ge=0, le=86400)
    result_code: Optional[int] = Field(None, ge=0, le=255)
    internal_port: int = Field(..., ge=0, le=65535)
    external_port: int = Field(..., ge=0, le=65535)

    class Config:
        # Reject any extra fields not defined in the model
        extra = "forbid"
```
With Pydantic, attempting to pass __class__ or any other unexpected key will raise a ValidationError automatically.
3. Apply the Principle of Least Privilege to Data
When parsing network packets, only extract and store the fields your application actually needs. Discard everything else at the parsing stage, before it ever reaches your business logic objects.
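A minimal sketch of this idea (field names are hypothetical): project the parsed dictionary down to exactly the fields downstream logic needs, so injected keys never survive the parsing stage.

```python
NEEDED_FIELDS = ("opcode", "lifetime", "internal_port")

def extract_fields(parsed):
    """Keep only the fields downstream logic needs; drop everything else."""
    return {k: parsed[k] for k in NEEDED_FIELDS if k in parsed}

raw = {"opcode": 1, "lifetime": 3600, "internal_port": 8080,
       "__class__": "junk", "extra_blob": "x" * 10000}
print(extract_fields(raw))
# {'opcode': 1, 'lifetime': 3600, 'internal_port': 8080}
```

Because the projection is an allowlist, anything an attacker adds to the packet is discarded by construction rather than filtered after the fact.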
4. Enforce Input Size Limits Early
Size validation should happen at the network/API boundary, not deep in your business logic:
```python
MAX_PAYLOAD_SIZE = 4096  # bytes

def handle_packet(raw_data: bytes):
    if len(raw_data) > MAX_PAYLOAD_SIZE:
        raise ValueError("Packet exceeds maximum allowed size")
    # ... proceed with parsing
```
5. Use Static Analysis Tools
Several tools can catch this class of vulnerability automatically:
- Bandit — Python security linter that flags dangerous patterns, including `__dict__` manipulation
- Semgrep — Highly configurable static analysis with rules for injection vulnerabilities
- PyLint with security plugins
- Snyk Code — AI-powered SAST that understands context
Run these tools in your CI/CD pipeline so vulnerabilities are caught before they reach production.
6. Know Your CWEs
This vulnerability maps to several well-documented weakness categories:
| CWE | Description |
|---|---|
| CWE-915 | Improperly Controlled Modification of Dynamically-Determined Object Attributes |
| CWE-20 | Improper Input Validation |
| CWE-400 | Uncontrolled Resource Consumption (DoS aspect) |
| CWE-94 | Improper Control of Generation of Code |
Familiarizing yourself with the CWE catalog is an excellent way to recognize vulnerability patterns before you accidentally introduce them.
7. OWASP References
This vulnerability is relevant to several OWASP categories:
- OWASP Top 10 A03:2021 – Injection: Attacker-controlled data influencing program logic
- OWASP Top 10 A04:2021 – Insecure Design: Lack of input validation at design level
- OWASP Top 10 A05:2021 – Security Misconfiguration: Overly permissive data handling
Conclusion
The self.__dict__.update(entries) pattern is a perfect example of how a single convenient line of code can introduce a serious security vulnerability. When entries comes from a network packet—as is the case in Scapy-based applications—you're essentially letting the network tell your object what it is and how it behaves.
The key takeaways from this vulnerability are:
- Treat all external data as hostile until proven otherwise
- Use allowlists, not blocklists, when validating input keys
- Never bypass Python's attribute protocol by writing directly to `__dict__`
- Enforce size constraints early to prevent resource exhaustion
- Use schema validation libraries like Pydantic to make safe-by-default data handling easy
Security vulnerabilities in network parsing code are particularly dangerous because they can often be triggered remotely without authentication. Taking the time to add proper input validation isn't just good practice—in network-facing code, it's essential.
The fix applied here is a great template for any Python code that needs to initialize objects from external data sources. Copy the pattern, adapt the allowlist to your domain, and your code will be significantly more resilient against this class of attack.
Found a security vulnerability in your codebase? Consider integrating automated security scanning into your CI/CD pipeline to catch issues like this before they reach production.