Using iconv to convert the text encoding of a file
In sqlite-utils issue 439 I was testing against a CSV file that used UTF-16 little-endian encoding, also known as utf-16-le.
I converted it to UTF-8 using iconv like this:
```
iconv -f UTF-16LE -t UTF-8 file-in-utf16le.csv > file-in-utf8.csv
```
The -f argument here is the input encoding and -t is the desired output encoding.
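If you want to try this without a UTF-16 file on hand, here is a minimal round-trip sketch. The file names are hypothetical, and it assumes an iconv that accepts the UTF-8 and UTF-16LE encoding names used above. It builds a tiny CSV, makes a UTF-16LE copy, converts that copy back, and checks that the result matches the original byte for byte.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

# Build a small UTF-8 CSV; \303\251 is the UTF-8 byte sequence for "é".
printf 'id,name\n1,Caf\303\251\n' > original-utf8.csv

# Produce a UTF-16LE copy to stand in for the problem file.
iconv -f UTF-8 -t UTF-16LE original-utf8.csv > sample-utf16le.csv

# The conversion from this post: UTF-16LE in, UTF-8 out.
iconv -f UTF-16LE -t UTF-8 sample-utf16le.csv > roundtrip-utf8.csv

# cmp exits 0 only when the two files are byte-for-byte identical.
cmp original-utf8.csv roundtrip-utf8.csv && echo "round trip OK"
```

Comparing with cmp rather than eyeballing the output is deliberate: a terminal can hide encoding differences, while a byte comparison cannot.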
I figured out the -f argument should be UTF-16LE (after first trying and failing with utf-16-le) by running:
```
iconv -l
```
This outputs all of the available encoding options. It’s a pretty long list so I filtered it like this:
```
% iconv -l | grep UTF
UTF-8 UTF8
UTF-8-MAC UTF8-MAC
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UNICODE-1-1-UTF-7 UTF-7 CSUNICODE11UTF7
```

Discarding invalid characters
I picked up this tip from Ben Brandwood: you can also use iconv to fix problems when a file includes invalid UTF-8 characters.
The trick is to use the -c option, which iconv --help tells you will “discard unconvertible characters”.
Here’s Ben’s recipe:
```
iconv -f utf-8 -t utf-8 -c FILE.txt -o NEW_FILE
```
Note that the input encoding (-f) and the output encoding (-t) are the same here. The -c option does all of the work.
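To see the effect of -c on a small input, here is a hedged sketch with hypothetical file names. It writes a file containing one byte that is never valid in UTF-8, then runs the same utf-8 to utf-8 conversion. As a choice for this sketch it uses shell redirection in place of -o, and it tolerates a non-zero exit status, since iconv may report one when it has dropped something even though the output was written.

```shell
# Work in a throwaway directory so no real files are touched.
cd "$(mktemp -d)"

# \377 is the byte 0xFF, which can never appear in valid UTF-8.
printf 'abc\377def\n' > broken.txt

# -c discards the bytes that cannot be converted. The exit status may be
# non-zero when something was discarded, so it is ignored here.
iconv -f utf-8 -t utf-8 -c broken.txt > cleaned.txt || true

# Show what survived the clean-up.
cat cleaned.txt
```

Because -c drops data silently, it is worth keeping the original file around until you have checked what was removed.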