Introduction
In the xxd hex dump utility, we discovered a critical buffer overflow vulnerability in src/xxd/xxd.c at line 576. The xxdline() function, which processes hex dump file input for the revert mode (xxd -r), used strcpy(z, l) to copy line data without any bounds checking. Since the source buffer l is populated directly from user-provided file input, an attacker could craft a hex dump file with lines exceeding the destination buffer z's size, triggering a stack buffer overflow.
This vulnerability is particularly dangerous because xxd is a command-line utility that processes arbitrary user-provided files, making the attack vector directly accessible to anyone who can supply input to the tool.
The Vulnerability Explained
The vulnerable code resided in the xxdline() function, which handles line-by-line processing of hex dump files during revert operations. Here's the problematic code:
static void xxdline(FILE *fp, char *l, char *colors, int nz)
{
static signed char zero_seen = 0;
if (!nz && zero_seen == 1) {
strcpy(z, l); // VULNERABLE: No bounds checking!
if (colors) {
memcpy(z_colors, colors, strlen(z));
}
The issue is straightforward but severe:
lis file-derived input: The bufferlcontains data read from a hex dump file provided by the userzis a fixed-size buffer: The destination bufferzhas a predetermined sizestrcpy()has no length limit: This function copies bytes until it encounters a null terminator, regardless of the destination buffer's capacity
Attack Scenario
An attacker could exploit this vulnerability with these steps:
- Create a malicious hex dump file with a line containing more than 256 bytes of hex characters
- Run
xxd -r malicious_file.hex > output.bin - When
xxdline()processes the oversized line,strcpy(z, l)writes beyondz's boundary - This corrupts the stack, potentially overwriting the return address
- With a carefully crafted payload, the attacker achieves arbitrary code execution
A simple proof-of-concept payload would be a hex dump file containing a single line with 512+ hex characters:
444444444444444444444444444444444444444444444444... (512+ '4' characters)
The regression test in the PR demonstrates this exact attack vector:
/* Generate oversized hex payloads */
char payload_256[513];
memset(payload_256, '4', 512);
payload_256[512] = '\0';
The Fix
The fix is elegant and follows C security best practices—replacing the unbounded strcpy() with the bounds-checked snprintf():
Before (Vulnerable)
strcpy(z, l);
After (Fixed)
snprintf(z, sizeof(z), "%s", l);
This change provides several security guarantees:
sizeof(z)enforces the buffer limit: The second argument explicitly tellssnprintf()the maximum number of bytes to write- Automatic null-termination: Unlike
strncpy(),snprintf()always null-terminates the output (if size > 0) - Truncation over corruption: If
lexceeds the buffer size, the data is truncated rather than overflowing
The PR also added a functional test to prevent regression:
it('handles long lines in revert mode', function()
t.skip(t.is_arch('s390x'), 'FIXME: xxd not built correctly on s390x with QEMU?')
local long_line = ('4'):rep(512) .. '\n'
fn.system({ testprg('xxd'), '-r' }, long_line)
eq(0, eval('v:shell_error'))
end)
This test ensures xxd gracefully handles 512-character lines without crashing—a direct verification that the buffer overflow is mitigated.
Note on Additional Vulnerable Locations
The PR notes that line 1115 in the same file uses a similar pattern and may need review. This highlights an important principle: when fixing one instance of an unsafe pattern, always search for similar patterns throughout the codebase.
Prevention & Best Practices
Immediate Actions
- Ban unsafe string functions: Configure your compiler and linters to warn on
strcpy(),strcat(),sprintf(), andgets() - Use bounds-checked alternatives:
-strcpy()→snprintf()orstrlcpy()
-strcat()→strncat()orstrlcat()
-sprintf()→snprintf()
-gets()→fgets()
Compiler Flags
Enable stack protection and buffer overflow detection:
gcc -fstack-protector-strong -D_FORTIFY_SOURCE=2 -O2 source.c
Static Analysis
Use tools that detect buffer overflow patterns:
- Semgrep: Rules for detecting unsafe C string functions
- Clang Static Analyzer: Built-in checks for buffer overflows
- Coverity: Commercial tool with deep buffer analysis
Code Review Checklist
When reviewing C code that handles external input:
- [ ] Are all string copies bounds-checked?
- [ ] Is
sizeof()used correctly (not on pointers)? - [ ] Are buffer sizes validated before use?
- [ ] Is input length checked before processing?
Key Takeaways
- Never use
strcpy()with file-derived input: Thexxdline()function processed user-provided hex dump files, makingstrcpy(z, l)a direct attack vector snprintf()is the safe replacement forstrcpy()in C: It enforces bounds and always null-terminates- Command-line utilities processing user files are high-risk: xxd's
-rmode reads arbitrary files, requiring defensive coding throughout - Search for pattern siblings: The PR notes line 1115 may have the same issue—one vulnerability often indicates more
- Regression tests prevent security fixes from being undone: The Lua test with 512-character input ensures this specific attack vector stays closed
How Orbis AppSec Detected This
- Source: File input processed by the xxd utility's revert mode (
xxd -r), specifically the line bufferlpopulated from user-provided hex dump files - Sink:
strcpy(z, l)atsrc/xxd/xxd.c:576in thexxdline()function - Missing control: No bounds checking between the file-derived input length and the destination buffer
z's capacity - CWE: CWE-120 (Buffer Copy without Checking Size of Input)
- Fix: Replaced
strcpy(z, l)withsnprintf(z, sizeof(z), "%s", l)to enforce the buffer size limit
Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.
Conclusion
Buffer overflows remain one of the most dangerous vulnerability classes in C programming, and this xxd vulnerability demonstrates why. A single strcpy() call without bounds checking created a critical security flaw that could be exploited through a crafted input file. The fix—replacing strcpy() with snprintf()—is simple but essential.
When working with C code that processes external input, always assume the input is malicious. Use bounds-checked functions, enable compiler protections, and implement regression tests for security fixes. The few extra characters of code for snprintf(z, sizeof(z), "%s", l) versus strcpy(z, l) are the difference between secure software and a critical vulnerability.