# Handling CSV files with wide columns in Python
Users were reporting the following error when using sqlite-utils to import some CSV files:
```
_csv.Error: field larger than field limit (131072)
```
It turns out the csv module in the Python standard library enforces a default field size limit of 131,072 characters (128KB) on individual columns, and any value longer than that raises this error.

You can raise that limit using the `csv.field_size_limit(new_limit)` function.

There’s one catch: the function doesn’t provide a way to say “no limit”, and it raises an OverflowError if you pass it a value larger than the maximum C long integer on your platform.
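Here’s a quick illustration of both behaviors; the `2 ** 64` value is just an arbitrary number chosen because it won’t fit in a signed C long on any common platform:

```python
import csv

print(csv.field_size_limit())  # default limit: 131072 (128 * 1024)

# Passing a value too large for a C long raises OverflowError
try:
    csv.field_size_limit(2 ** 64)
except OverflowError as ex:
    print(ex)  # e.g. "Python int too large to convert to C long"

# A valid call sets the new limit and returns the previous one
previous = csv.field_size_limit(10 * 1024 * 1024)
print(previous)  # 131072
```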
So how do you set it to the maximum possible value? There’s an extensive StackOverflow thread about this, with a number of different proposed solutions. Several of those use ctypes to find the correct value.
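For illustration, the ctypes trick boils down to something like this sketch (my paraphrase of one of the answers, not code quoted from the thread): ask ctypes for the platform’s largest unsigned long, then halve it to get the largest value a signed C long can hold:

```python
import csv
import ctypes

# c_ulong(-1) wraps around to the platform's maximum unsigned long;
# halving that gives the largest value that fits in a signed C long
csv.field_size_limit(ctypes.c_ulong(-1).value // 2)
```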
I didn’t want to add a ctypes dependency, out of paranoia that someone would try to use my library on a platform that didn’t support it (I don’t know if that paranoia has any basis at all). So I picked this pattern instead, suggested by StackOverflow user user1251007:
```python
import csv as csv_std
import sys

# Increase CSV field size limit to the maximum possible
# https://stackoverflow.com/a/15063941
field_size_limit = sys.maxsize

while True:
    try:
        csv_std.field_size_limit(field_size_limit)
        break
    except OverflowError:
        # Value too big for a C long - try one ten times smaller
        field_size_limit = int(field_size_limit / 10)
```
This appears to work just fine. On macOS `sys.maxsize` is accepted directly, and on other platforms the loop should settle on the largest field size limit that works.
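To sanity-check the pattern, here’s a hypothetical demo (the in-memory CSV and 500,000 character field are made up for illustration): after running the loop, a field nearly four times the default limit parses without an error:

```python
import csv
import io
import sys

# Raise the limit using the same loop as above
field_size_limit = sys.maxsize
while True:
    try:
        csv.field_size_limit(field_size_limit)
        break
    except OverflowError:
        field_size_limit = int(field_size_limit / 10)

# A made-up CSV with a ~500KB field - almost 4x the 131072 default
big_field = "x" * 500_000
rows = list(csv.reader(io.StringIO("id,body\n1," + big_field)))
print(len(rows[1][1]))  # 500000 - no _csv.Error raised
```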