What is a buffer overflow vulnerability?

A buffer overflow occurs when a program writes data beyond the allocated memory boundary, potentially corrupting adjacent memory, crashing the application, or enabling arbitrary code execution.

How do you prevent buffer overflow in C?

Use bounds-checked functions like snprintf() instead of strcpy(), strncpy() instead of strcat(), and always validate input lengths before copying data into fixed-size buffers.

What CWE is buffer overflow?

Buffer overflow is classified as CWE-120 (Buffer Copy without Checking Size of Input), part of the broader CWE-119 family of memory corruption vulnerabilities.

Is using strncpy() enough to prevent buffer overflow?

strncpy() helps but has pitfalls—it doesn't null-terminate if the source exceeds the limit. snprintf() is safer as it always null-terminates and returns the number of characters that would have been written.

Can static analysis detect buffer overflow?

Yes, static analysis tools can detect many buffer overflow patterns, especially unsafe function usage like strcpy(), gets(), and sprintf(). However, complex data flows may require dynamic analysis or manual review.

How buffer overflow happens in C xxd utility

Introduction

In the xxd hex dump utility, we discovered a critical buffer overflow vulnerability in src/xxd/xxd.c at line 576. The xxdline() function, which processes hex dump file input for the revert mode (xxd -r), used strcpy(z, l) to copy line data without any bounds checking. Since the source buffer l is populated directly from user-provided file input, an attacker could craft a hex dump file with lines exceeding the destination buffer z's size, triggering a stack buffer overflow.

This vulnerability is particularly dangerous because xxd is a command-line utility that processes arbitrary user-provided files, making the attack vector directly accessible to anyone who can supply input to the tool.

The Vulnerability Explained

The vulnerable code resided in the xxdline() function, which handles line-by-line processing of hex dump files during revert operations. Here's the problematic code:

static void xxdline(FILE *fp, char *l, char *colors, int nz)
{
  static signed char zero_seen = 0;

  if (!nz && zero_seen == 1) {
    strcpy(z, l);  // VULNERABLE: No bounds checking!
    if (colors) {
      memcpy(z_colors, colors, strlen(z));
    }

The issue is straightforward but severe:

l is file-derived input: The buffer l contains data read from a hex dump file provided by the user
z is a fixed-size buffer: The destination buffer z has a predetermined size
strcpy() has no length limit: This function copies bytes until it encounters a null terminator, regardless of the destination buffer's capacity

Attack Scenario

An attacker could exploit this vulnerability with these steps:

Create a malicious hex dump file with a line containing more than 256 bytes of hex characters
Run xxd -r malicious_file.hex > output.bin
When xxdline() processes the oversized line, strcpy(z, l) writes beyond z's boundary
This corrupts the stack, potentially overwriting the return address
With a carefully crafted payload, the attacker achieves arbitrary code execution

A simple proof-of-concept payload would be a hex dump file containing a single line with 512+ hex characters:

444444444444444444444444444444444444444444444444... (512+ '4' characters)

The regression test in the PR demonstrates this exact attack vector:

/* Generate oversized hex payloads */
char payload_256[513];
memset(payload_256, '4', 512);
payload_256[512] = '\0';

The Fix

The fix is elegant and follows C security best practices—replacing the unbounded strcpy() with the bounds-checked snprintf():

Before (Vulnerable)

strcpy(z, l);

After (Fixed)

snprintf(z, sizeof(z), "%s", l);

This change provides several security guarantees:

sizeof(z) enforces the buffer limit: The second argument explicitly tells snprintf() the maximum number of bytes to write
Automatic null-termination: Unlike strncpy(), snprintf() always null-terminates the output (if size > 0)
Truncation over corruption: If l exceeds the buffer size, the data is truncated rather than overflowing

The PR also added a functional test to prevent regression:

it('handles long lines in revert mode', function()
  t.skip(t.is_arch('s390x'), 'FIXME: xxd not built correctly on s390x with QEMU?')
  local long_line = ('4'):rep(512) .. '\n'
  fn.system({ testprg('xxd'), '-r' }, long_line)
  eq(0, eval('v:shell_error'))
end)

This test ensures xxd gracefully handles 512-character lines without crashing—a direct verification that the buffer overflow is mitigated.

Note on Additional Vulnerable Locations

The PR notes that line 1115 in the same file uses a similar pattern and may need review. This highlights an important principle: when fixing one instance of an unsafe pattern, always search for similar patterns throughout the codebase.

Prevention & Best Practices

Immediate Actions

Ban unsafe string functions: Configure your compiler and linters to warn on strcpy(), strcat(), sprintf(), and gets()
Use bounds-checked alternatives:
- strcpy() → snprintf() or strlcpy()
- strcat() → strncat() or strlcat()
- sprintf() → snprintf()
- gets() → fgets()

Compiler Flags

Enable stack protection and buffer overflow detection:

gcc -fstack-protector-strong -D_FORTIFY_SOURCE=2 -O2 source.c

Static Analysis

Use tools that detect buffer overflow patterns:

Semgrep: Rules for detecting unsafe C string functions
Clang Static Analyzer: Built-in checks for buffer overflows
Coverity: Commercial tool with deep buffer analysis

Code Review Checklist

When reviewing C code that handles external input:

[ ] Are all string copies bounds-checked?
[ ] Is sizeof() used correctly (not on pointers)?
[ ] Are buffer sizes validated before use?
[ ] Is input length checked before processing?

Key Takeaways

Never use strcpy() with file-derived input: The xxdline() function processed user-provided hex dump files, making strcpy(z, l) a direct attack vector
snprintf() is the safe replacement for strcpy() in C: It enforces bounds and always null-terminates
Command-line utilities processing user files are high-risk: xxd's -r mode reads arbitrary files, requiring defensive coding throughout
Search for pattern siblings: The PR notes line 1115 may have the same issue—one vulnerability often indicates more
Regression tests prevent security fixes from being undone: The Lua test with 512-character input ensures this specific attack vector stays closed

How Orbis AppSec Detected This

Source: File input processed by the xxd utility's revert mode (xxd -r), specifically the line buffer l populated from user-provided hex dump files
Sink: strcpy(z, l) at src/xxd/xxd.c:576 in the xxdline() function
Missing control: No bounds checking between the file-derived input length and the destination buffer z's capacity
CWE: CWE-120 (Buffer Copy without Checking Size of Input)
Fix: Replaced strcpy(z, l) with snprintf(z, sizeof(z), "%s", l) to enforce the buffer size limit

Orbis AppSec automatically detected this vulnerability and opened a pull request with the fix. Try Orbis AppSec on your repositories to find and fix issues like this automatically.

Conclusion

Buffer overflows remain one of the most dangerous vulnerability classes in C programming, and this xxd vulnerability demonstrates why. A single strcpy() call without bounds checking created a critical security flaw that could be exploited through a crafted input file. The fix—replacing strcpy() with snprintf()—is simple but essential.

When working with C code that processes external input, always assume the input is malicious. Use bounds-checked functions, enable compiler protections, and implement regression tests for security fixes. The few extra characters of code for snprintf(z, sizeof(z), "%s", l) versus strcpy(z, l) are the difference between secure software and a critical vulnerability.

cwe	CWE-120
fix	Replace strcpy() with snprintf() using sizeof(z) as the size limit
risk	Remote code execution or denial of service via crafted input file
language	C
root cause	strcpy() used without bounds checking on file-derived input
vulnerability	Buffer Overflow (Classic)

How buffer overflow happens in C xxd utility and how to fix it

Answer Summary

Vulnerability at a Glance