How Server-Side Request Forgery (SSRF) happens in Python requests.get() and how to fix it
Introduction
The models/common.py file in the YOLOv5 object detection framework handles model inference, including loading images from various sources. At line 882, when the input string starts with 'http', the code calls requests.get() to fetch the image without validating where that URL actually points. This created a critical SSRF vulnerability that could allow attackers interacting with the Flask REST API to probe internal infrastructure, access cloud metadata endpoints, and potentially exfiltrate sensitive credentials.
The vulnerability is particularly dangerous because YOLOv5 is widely deployed as a service behind REST APIs, meaning user-supplied image URLs are a standard input vector. An attacker doesn't need any special privileges — they just need to supply a crafted URL where an image URL is expected.
The Vulnerability Explained
Server-Side Request Forgery occurs when a server-side application fetches a resource from a URL that an attacker can control, and the application doesn't validate whether that URL targets internal resources.
In models/common.py, the vulnerable pattern looked like this:
# Vulnerable code - fetches from any URL without validation
if str(im).startswith('http'):
    response = requests.get(im)
    # Process image from response...
The problem is straightforward: if im is "http://169.254.169.254/latest/meta-data/iam/security-credentials/", the server dutifully makes that request from within the cloud environment, where the metadata endpoint is accessible. If the API then exposes the fetched body, either directly or inside an error message, the attacker learns the IAM role name and, with one follow-up request, its temporary credentials.
Concrete attack scenarios against this code:
- AWS credential theft: An attacker sends http://169.254.169.254/latest/meta-data/iam/security-credentials/ as an image URL to the YOLOv5 inference API. The server fetches IAM temporary credentials and returns them (or an error containing the response body).
- Internal port scanning: By sending URLs like http://192.168.1.1:8080/, http://192.168.1.1:3306/, etc., an attacker can map internal services by observing response times and error messages.
- Accessing internal services: URLs like http://localhost:6379/ could interact with internal Redis instances, potentially allowing data exfiltration or command execution.
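To see why each of these targets counts as internal, the standard-library ipaddress module can classify the hosts from the scenarios above. This is a minimal illustrative sketch: the helper name classify is hypothetical and not part of YOLOv5 or the patch, and 127.0.0.1 stands in for localhost because the module works on IP literals rather than names.

```python
import ipaddress


def classify(host: str) -> list[str]:
    """Return the internal-address categories an IP literal falls into."""
    addr = ipaddress.ip_address(host)
    flags = {
        "private": addr.is_private,
        "loopback": addr.is_loopback,
        "link_local": addr.is_link_local,
    }
    return [name for name, hit in flags.items() if hit]


# Hosts taken from the attack scenarios, plus one public address for contrast.
for host in ("169.254.169.254", "192.168.1.1", "127.0.0.1", "8.8.8.8"):
    print(host, classify(host))
```

An address with an empty category list is the only kind a URL fetcher exposed to user input should be allowed to contact.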
Additionally, the diff reveals a secondary issue on line 514 where eval(meta["names"]) was used to parse model metadata — an arbitrary code execution risk if an attacker can control model files:
# Also vulnerable - arbitrary code execution via eval()
stride, names = int(meta["stride"]), eval(meta["names"])
The Fix
The fix introduces two new functions and applies multiple security improvements:
1. DNS-resolution-based URL validation (_validate_ssrf_url)
def _validate_ssrf_url(url: str) -> None:
    """Raise ValueError if url resolves to any private/internal address."""
    hostname = urlparse(url).hostname or ""
    try:
        results = socket.getaddrinfo(hostname, None)
    except socket.gaierror as e:
        raise ValueError(f"Could not resolve hostname '{hostname}': {e}") from e
    for _family, _type, _proto, _canonname, sockaddr in results:
        addr = ipaddress.ip_address(sockaddr[0])
        if addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_reserved or addr.is_multicast:
            raise ValueError(f"Blocked request to internal address: {addr}")
This function resolves the hostname to actual IP addresses using socket.getaddrinfo(), then checks every resolved address against the properties of Python's ipaddress module. This approach is more robust than regex-based hostname matching because:
- It catches DNS names that resolve to private IPs (e.g., internal.attacker.com → 169.254.169.254)
- It works on parsed addresses (IPv4, IPv6, mapped addresses) rather than on one textual spelling of an IP
- It validates the actual network destination, not just the URL string
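IPv4-mapped IPv6 addresses such as ::ffff:169.254.169.254 deserve a separate look, because an IPv6 wrapper can hide an internal IPv4 target. One conservative option is to unwrap the mapped form explicitly and judge the embedded IPv4 address, rather than relying on how the IPv6 properties treat the mapped range. The sketch below assumes a hypothetical helper named is_internal; it is not part of the patch and reuses the same five property checks as _validate_ssrf_url.

```python
import ipaddress


def is_internal(ip_text: str) -> bool:
    """True if the address, after unwrapping IPv4-mapped IPv6, is non-public."""
    addr = ipaddress.ip_address(ip_text)
    # ::ffff:a.b.c.d carries an IPv4 address; evaluate that embedded address.
    # IPv4Address objects have no ipv4_mapped attribute, hence getattr.
    mapped = getattr(addr, "ipv4_mapped", None)
    if mapped is not None:
        addr = mapped
    return (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_reserved
        or addr.is_multicast
    )


print(is_internal("::ffff:169.254.169.254"))  # mapped metadata endpoint
print(is_internal("8.8.8.8"))                 # public resolver
```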
2. Redirect-safe fetching (_request_ssrf_url)
def _request_ssrf_url(url: str, max_redirects: int = 5):
    """Fetch a URL after validating against SSRF..."""
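This wrapper handles HTTP redirects safely: each redirect target is re-validated before it is followed. This prevents redirect-based bypasses, where the initial URL passes validation but the server behind it answers with a redirect to an internal address. Note that this is different from DNS rebinding, where a hostname resolves to a public address during validation and to an internal one when the request is made; re-validating redirects does not address that case unless the connection is also pinned to the validated address.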
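The per-hop logic can be sketched independently of any HTTP library. The following is not the body of _request_ssrf_url from the patch, which is not shown here; it is an illustration under assumptions. The function name fetch_with_validated_redirects and the injectable validate and fetch callables are hypothetical, chosen so the loop can be exercised without network access.

```python
from typing import Callable, Mapping, Tuple
from urllib.parse import urljoin

# (status_code, headers) returned by one HTTP request that does not auto-follow.
Response = Tuple[int, Mapping[str, str]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_with_validated_redirects(
    url: str,
    validate: Callable[[str], None],   # must raise ValueError for blocked URLs
    fetch: Callable[[str], Response],  # performs ONE request, no auto-redirects
    max_redirects: int = 5,
) -> Tuple[str, Response]:
    """Follow redirects manually, validating every hop before requesting it."""
    for _ in range(max_redirects + 1):
        validate(url)  # covers the first URL and each redirect target
        status, headers = fetch(url)
        # Plain dict lookup is case-sensitive; real clients may differ.
        location = headers.get("Location")
        if status not in REDIRECT_CODES or not location:
            return url, (status, headers)
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise ValueError(f"Exceeded {max_redirects} redirects")
```

With requests, fetch could wrap requests.get(u, allow_redirects=False) and return the response's status_code and headers, and validate could be _validate_ssrf_url.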
3. Replacing eval() with ast.literal_eval()
# Before (dangerous):
stride, names = int(meta["stride"]), eval(meta["names"])
# After (safe):
stride, names = int(meta["stride"]), ast.literal_eval(meta["names"])
This change ensures that model metadata parsing only accepts Python literals (strings, numbers, tuples, lists, dicts, booleans, None) and cannot execute arbitrary code.
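The difference is easy to demonstrate. In this small sketch, the function name parse_names and both sample strings are hypothetical stand-ins for a names metadata field; they are not taken from a real model file.

```python
import ast


def parse_names(raw: str):
    """Parse a metadata field as a Python literal, never as executable code."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError) as e:
        raise ValueError(f"names metadata is not a plain literal: {e}") from e


# A benign value parses into an ordinary dict.
print(parse_names("{0: 'person', 1: 'bicycle'}"))

# A value containing a function call is rejected instead of being executed.
try:
    parse_names("__import__('os').system('id')")
except ValueError as e:
    print("rejected:", e)
```

With eval(), the second string would have run a shell command on the inference host.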
New imports added:
import ipaddress
import socket
from urllib.parse import urljoin, urlparse # urljoin added
Prevention & Best Practices
1. Always validate URLs at the network level, not the string level
String-based validation (regex on hostnames) is easy to bypass with alternative IP encodings or attacker-controlled DNS names. Resolve the hostname and validate the IP:
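import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_url(url):
    hostname = urlparse(url).hostname
    if not hostname:
        return False  # no host to resolve: fail closed
    # socket.gaierror propagates if the name cannot be resolved
    for result in socket.getaddrinfo(hostname, None):
        addr = ipaddress.ip_address(result[4][0])  # result[4] is the sockaddr tuple
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            return False
    return True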
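One limitation of a check-then-fetch pattern like this is that the HTTP client resolves the hostname again when it connects, so the answer could differ from the one that was validated. A possible mitigation, sketched here under assumptions, is to resolve once and hand the vetted addresses to whatever opens the connection. The function name resolve_vetted_ips and its injectable resolver parameter are hypothetical, added only so the logic can be exercised without network access.

```python
import ipaddress
import socket


def resolve_vetted_ips(hostname, resolver=socket.getaddrinfo):
    """Resolve once; return the addresses only if every one of them is public."""
    vetted = []
    for result in resolver(hostname, None):
        addr = ipaddress.ip_address(result[4][0])  # result[4] is the sockaddr tuple
        if addr.is_private or addr.is_loopback or addr.is_link_local:
            raise ValueError(f"Blocked internal address: {addr}")
        # getaddrinfo can list an address once per socket type; keep it once.
        if str(addr) not in vetted:
            vetted.append(str(addr))
    return vetted
```

How the vetted address is then used (for example, connecting to the IP while keeping the original Host header and TLS server name) depends on the HTTP client and is outside this sketch.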
2. Validate after every redirect
HTTP 301/302 redirects can point to internal addresses. Disable automatic redirect following and validate each hop:
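_validate_ssrf_url(url)  # validate the first URL too
response = requests.get(url, allow_redirects=False)
hops = 0
while response.is_redirect:
    hops += 1
    if hops > 5:  # cap the chain so a redirect loop cannot run forever
        raise ValueError("Too many redirects")
    redirect_url = urljoin(response.url, response.headers['Location'])  # Location may be relative
    _validate_ssrf_url(redirect_url)  # Validate before following
    response = requests.get(redirect_url, allow_redirects=False)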
3. Never use eval() on external data
Replace eval() with ast.literal_eval() for parsing data structures, or use proper serialization formats (JSON, YAML with safe loading).
4. Apply network-level controls
Even with code-level validation, defense in depth matters:
- Use network policies to restrict outbound traffic from application servers
- Block metadata endpoint access via instance firewall rules
- Use IMDSv2 (token-required) on AWS to mitigate metadata theft
Key Takeaways
- requests.get() on user-supplied URLs is an SSRF sink: any code path where external input reaches an HTTP client must validate the resolved IP addresses, not just the URL string.
- DNS resolution is the correct validation layer: socket.getaddrinfo() plus the ipaddress module catches bypasses that string matching misses, such as attacker-controlled DNS names and alternative IP representations. Because a separate check-then-fetch resolves the name twice, DNS rebinding can still slip through unless the connection is pinned to the validated address.
- Redirect following must re-validate each hop: the _request_ssrf_url() function with max_redirects prevents attackers from using open redirects to reach internal services.
- eval() on model metadata was a hidden RCE: the ast.literal_eval() replacement removes arbitrary code execution through the names metadata field of a malicious model file, a supply-chain attack vector.
- ML inference APIs are high-value SSRF targets: because they naturally accept URLs as input (for images, models, datasets), they require explicit SSRF protection that general web frameworks don't provide by default.
How Orbis AppSec Detected This
- Source: User-supplied image URL string passed through the Flask REST API inference endpoint, flowing into the im parameter in models/common.py
- Sink: requests.get(im) at models/common.py:882, an unvalidated HTTP request to an attacker-controlled destination
- Missing control: No DNS resolution validation, no IP address range checking, no redirect validation before fetching arbitrary URLs
- CWE: CWE-918 (Server-Side Request Forgery)
- Fix: Added _validate_ssrf_url() function that resolves hostnames via socket.getaddrinfo() and blocks requests to any private, loopback, link-local, reserved, or multicast IP addresses
Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix.
Conclusion
This SSRF vulnerability in YOLOv5's models/common.py demonstrates a pattern common in ML/AI services: image processing pipelines that accept URLs as input without considering that the server's network position grants access to internal resources an external attacker shouldn't reach. The fix, resolving hostnames to IPs and validating each address against private ranges, is a widely recommended approach to SSRF prevention in Python. Combined with redirect validation and the bonus eval() → ast.literal_eval() hardening, this patch closes both a network-level and a code-execution attack surface.
If your application fetches resources from user-supplied URLs, implement DNS-resolution-based validation before every request. Don't trust the URL string — trust the resolved IP address.