hexdump and hexdump -C
While exploring null bytes in this issue I learned that the hexdump
command on macOS (and presumably other Unix systems) has a confusing default output.
Consider the following:
    $ echo -n 'abc\0' | hexdump
    0000000 6261 0063
    0000004
Compared to:
    $ echo -n 'a' | hexdump
    0000000 0061
    0000001
I’m using echo -n here to avoid adding an extra newline, which encodes as 0a.
My shell here is zsh; bash requires different treatment, see below.
How come abc\0 starts with 6261 whereas a starts with 0061?
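It turns out that hexdump's default format is 16-bit words in little-endian order, which is really confusing: the two bytes 61 62 are printed as the single word 6261, and a lone 61 is padded out to 0061.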
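The word grouping can be reproduced with od, which lets you pick the unit size explicitly. This is only a sketch: od is not used in the original post, and I'm assuming a POSIX od that accepts -An (no offset column) and -t (output type).

```shell
# Dump the same four bytes (a, b, c, NUL) in two different unit sizes.
# -tx1 prints one byte per unit; -tx2 prints 16-bit words in host byte order.
bytes=$(printf 'abc\0' | od -An -tx1)
words=$(printf 'abc\0' | od -An -tx2)

echo "bytes:$bytes"
echo "words:$words"
```

On a little-endian machine the words line should show the same 6261 0063 pairing as the default hexdump output, while the bytes line keeps the input order.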
hexdump -C
Using the -C option fixes this:
    $ echo -n 'a' | hexdump -C
    00000000 61 |a|
    00000001
    $ echo -n 'abc\0' | hexdump -C
    00000000 61 62 63 00 |abc.|
    00000004
C here stands for “canonical”.
In addition to causing hexdump to output byte by byte, it also includes an ASCII representation on the right-hand side.
Null bytes in Bash
Karl Pettersson pointed out that these examples won’t work on Bash.
I ran bash on my Mac and found the following:
    bash-3.2$ echo -n 'abc\0' | hexdump -C
    00000000 61 62 63 5c 30 |abc\0|
    00000005
    bash-3.2$ echo -n $'abc\0' | hexdump -C
    00000000 61 62 63 |abc|
    00000003
    bash-3.2$ printf 'abc\0' | hexdump -C
    00000000 61 62 63 00 |abc.|
    00000004
    bash-3.2$ printf $'abc\0' | hexdump -C
    00000000 61 62 63 |abc|
    00000003
So it looks like using printf 'abc\0' is the best recipe for Bash on macOS. I’m not sure if Bash on other platforms differs.
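The $'abc\0' results above are consistent with Bash being unable to carry a NUL byte inside a string argument, so the escape has to be expanded by the command that writes the output. A rough way to check this without reading a hex dump is to count bytes. The variable names here are my own, and I'm assuming wc -c and tr behave as POSIX describes.

```shell
# printf expands \0 in its own format string, so the NUL byte is created
# by printf and never has to pass through a shell string.
with_escape=$(printf 'abc\0' | wc -c | tr -d ' ')

# With '%s' the argument is printed verbatim, so the backslash and the 0
# come out as two ordinary characters.
literal=$(printf '%s' 'abc\0' | wc -c | tr -d ' ')

echo "escape in format string: $with_escape bytes"
echo "literal argument: $literal bytes"
```

The first count should be 4 (a, b, c and the null byte) and the second 5, matching the 61 62 63 00 and 61 62 63 5c 30 dumps shown earlier.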
Bill Mill suggested echo -ne for this:
    bash-3.2$ echo -ne 'abc\0' | hexdump -C
    00000000 61 62 63 00 |abc.|
    00000004
The -e option enables the interpretation of backslash escapes.