TechAnV Blog
Using iconv to convert the text encoding of a file
In sqlite-utils issue 439 I was testing against a CSV file that used UTF-16 little-endian encoding, also known as utf-16-le.
I converted it to UTF-8 using iconv like this:

    iconv -f UTF-16LE -t UTF-8 file-in-utf16le.csv > file-in-utf8.csv

The -f argument here is the input encoding and -t is the desired output encoding.
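To check that a conversion like this round-trips, here is a minimal sketch that builds a tiny UTF-16LE file and converts it back. The file names sample-utf16le.csv and sample-utf8.csv are made up for illustration, and it assumes an iconv that accepts the UTF-8 and UTF-16LE names shown in this post.

```shell
# Write a two-line CSV containing one non-ASCII character (e with acute accent,
# given as its UTF-8 bytes in octal), then encode it as UTF-16LE.
printf 'id,name\n1,caf\303\251\n' | iconv -f UTF-8 -t UTF-16LE > sample-utf16le.csv

# Convert it back to UTF-8, exactly as in the command above.
iconv -f UTF-16LE -t UTF-8 sample-utf16le.csv > sample-utf8.csv

# The sample has 15 characters, so the UTF-16LE file is 30 bytes;
# the UTF-8 file is smaller because ASCII characters take one byte each.
wc -c < sample-utf16le.csv
wc -c < sample-utf8.csv
cat sample-utf8.csv
```

Comparing byte counts is a quick sanity check: a file that is mostly ASCII should come out at roughly half the size after converting from UTF-16 to UTF-8.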
I figured out the -f argument should be UTF-16LE (after first trying and failing with utf-16-le) by running:

    iconv -l

This outputs all of the available encoding options. It’s a pretty long list, so I filtered it like this:

    % iconv -l | grep UTF
    UTF-8 UTF8
    UTF-8-MAC UTF8-MAC
    UTF-16
    UTF-16BE
    UTF-16LE
    UTF-32
    UTF-32BE
    UTF-32LE
    UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7

Discarding invalid characters
I picked up this tip from Ben Brandwood: you can also use iconv to fix problems when a file includes invalid UTF-8 characters.
The trick is to use the -c option, which iconv --help tells you will “discard unconvertible characters”.
Here’s Ben’s recipe:

    iconv -f utf-8 -t utf-8 -c FILE.txt -o NEW_FILE

Note that the input encoding (-f) and the output encoding (-t) are the same here. The -c option does all of the work.
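Here is a small sketch of that recipe in action, using a deliberately broken file. The names broken.txt and cleaned.txt are hypothetical. It writes the output with shell redirection rather than -o, on the assumption that not every iconv build supports -o, and it adds `|| true` because some implementations appear to return a non-zero exit status when they had to discard input.

```shell
# Create a file with one byte (0xFF, written as octal 377) that can never
# appear in valid UTF-8.
printf 'ok \377 bytes\n' > broken.txt

# -c drops anything that cannot be converted; "|| true" keeps a script running
# under "set -e" if this iconv reports the discarded byte as an error.
iconv -f utf-8 -t utf-8 -c broken.txt > cleaned.txt || true

# The invalid byte is gone; the two spaces that surrounded it remain.
cat cleaned.txt
```

Because -c removes bytes silently, it is worth comparing the sizes of the input and output files afterwards to see how much was dropped.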