hexdump and hexdump -C
While exploring null bytes in this issue I learned that the hexdump command on macOS (and presumably other Unix systems) has a confusing default output.
Consider the following:

    $ echo -n 'abc\0' | hexdump
    0000000 6261 0063
    0000004

Compared to:

    $ echo -n 'a' | hexdump
    0000000 0061
    0000001

I’m using echo -n here to avoid adding an extra newline, which encodes as 0a.
My shell is zsh; bash requires different treatment, see below.
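How come abc\0 starts with 6261 whereas a starts with 0061?
It turns out hexdump's default format is 16-bit words in little-endian order, which is really confusing: the bytes 61 62 are read as the single word 6261, and a lone 61 is padded with a zero byte and read as 0061.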
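
To make the word grouping concrete, here is a minimal Python sketch that reproduces the two outputs above. It assumes the little-endian 16-bit grouping just described; `hexdump_default_words` is a hypothetical helper name, not part of hexdump, and it ignores the offset column.

```python
import struct


def hexdump_default_words(data: bytes) -> list:
    """Group bytes into 16-bit little-endian words, mimicking hexdump's default view."""
    # Pad an odd-length input with a zero byte so the last word is complete.
    if len(data) % 2:
        data += b"\x00"
    # "<H" is a little-endian unsigned 16-bit integer.
    words = struct.unpack(f"<{len(data) // 2}H", data)
    return [f"{w:04x}" for w in words]


print(hexdump_default_words(b"abc\x00"))  # ['6261', '0063']
print(hexdump_default_words(b"a"))        # ['0061']
```

Each pair of bytes is swapped when printed as one word, which is why the 61 for "a" ends up on the right.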
hexdump -C
Using the -C option fixes this:

    $ echo -n 'a' | hexdump -C
    00000000 61 |a|
    00000001
    $ echo -n 'abc\0' | hexdump -C
    00000000 61 62 63 00 |abc.|
    00000004

C here stands for “canonical”.
In addition to making hexdump output one byte at a time, it also includes an ASCII representation on the right-hand side.
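
As a rough illustration of that layout, the Python sketch below formats a short chunk of bytes in a similar style. `canonical_line` is a hypothetical helper, and the spacing is simplified: the real hexdump -C pads the hex column to a fixed width, which this does not attempt.

```python
def canonical_line(offset: int, chunk: bytes) -> str:
    """Format a chunk of bytes roughly the way hexdump -C displays a line."""
    # One two-digit hex value per byte, in the order the bytes appear.
    hex_part = " ".join(f"{b:02x}" for b in chunk)
    # Printable ASCII is shown as-is; everything else becomes a dot.
    ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
    return f"{offset:08x}  {hex_part}  |{ascii_part}|"


print(canonical_line(0, b"abc\x00"))  # 00000000  61 62 63 00  |abc.|
```

The null byte is not printable, so it shows up as a dot in the ASCII column.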
Null bytes in Bash
Karl Pettersson pointed out that these examples won’t work in Bash.
I ran bash on my Mac and found the following:

    bash-3.2$ echo -n 'abc\0' | hexdump -C
    00000000 61 62 63 5c 30 |abc\0|
    00000005
    bash-3.2$ echo -n $'abc\0' | hexdump -C
    00000000 61 62 63 |abc|
    00000003
    bash-3.2$ printf 'abc\0' | hexdump -C
    00000000 61 62 63 00 |abc.|
    00000004
    bash-3.2$ printf $'abc\0' | hexdump -C
    00000000 61 62 63 |abc|
    00000003

So it looks like using printf 'abc\0' is the best recipe for Bash on macOS. I’m not sure if Bash on other platforms differs.
Bill Mill suggested echo -ne for this:

    bash-3.2$ echo -ne 'abc\0' | hexdump -C
    00000000 61 62 63 00 |abc.|
    00000004

The -e option enables the interpretation of backslash escapes.
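
The difference between the five-byte and four-byte outputs in Bash comes down to whether the \0 sequence is interpreted or passed through literally. The Python sketch below is only an analogy, not how Bash or echo is implemented: a raw string stands in for the uninterpreted case and a normal string literal for the interpreted one.

```python
# No escape processing: the five characters a, b, c, backslash, 0.
literal = r"abc\0".encode("ascii")
# Escape processed: \0 becomes a single NUL byte.
interpreted = "abc\0".encode("ascii")

print(literal.hex(" "))      # 61 62 63 5c 30
print(interpreted.hex(" "))  # 61 62 63 00
```

The first line matches what Bash's plain echo -n produced, and the second matches printf and echo -ne.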