TechAnV Blog
Handling CSV files with wide columns in Python
Users reported the following error when using sqlite-utils to import some CSV files:
    _csv.Error: field larger than field limit (131072)

It turns out the Python standard library csv module enforces a default field size limit on columns, and anything with more than 128KB of text in a column will raise an error.
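The error is easy to reproduce without sqlite-utils. The following is a minimal sketch that builds an in-memory CSV whose second column is one character longer than the current limit; the column names and the variables `wide_value` and `error_message` are made up for illustration.

```python
import csv
import io

# Read the limit currently in force (131072 unless something has changed it).
limit = csv.field_size_limit()

# Build a CSV in memory whose "body" column is one character over the limit.
wide_value = "x" * (limit + 1)
data = io.StringIO(f"id,body\n1,{wide_value}\n")

try:
    rows = list(csv.reader(data))
    error_message = None
except csv.Error as exc:
    # csv.Error is the public name for the _csv.Error seen in the traceback.
    error_message = str(exc)

print(error_message)
```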
You can raise this limit using the csv.field_size_limit(new_limit) function.
There’s one catch: the function doesn’t provide a way to say “no limit”. And it raises an OverflowError if you feed it a value that is larger than the C long integer size on your platform.
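As a rough illustration of that behaviour: calling the function with no argument returns the current limit, calling it with an argument sets a new limit and returns the old one, and a value too large for a C long raises OverflowError. The value 1_000_000 below is an arbitrary choice for the sketch, not a recommendation.

```python
import csv
import sys

# With no argument, field_size_limit() just reports the current limit.
previous = csv.field_size_limit()

# With an argument, it sets the new limit and returns the old one.
returned = csv.field_size_limit(1_000_000)
current = csv.field_size_limit()

# Twice sys.maxsize cannot fit in a C long on any common platform,
# so this call is expected to raise OverflowError.
try:
    csv.field_size_limit(sys.maxsize * 2)
    overflowed = False
except OverflowError:
    overflowed = True

# The limit is process-wide, so put the original value back.
csv.field_size_limit(previous)
```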
So how do you set it to the maximum possible value? There’s an extensive StackOverflow thread about this, with a number of different proposed solutions. Several of those use ctypes to find the correct value.
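For comparison, here is a sketch of the kind of ctypes-based approach that thread discusses. It relies on ctypes not range-checking integer values, so `c_ulong(-1)` wraps around to the largest unsigned long, and halving that gives the largest signed long. The variable names are mine, and this is not the approach sqlite-utils ended up using.

```python
import csv
import ctypes

# c_ulong(-1) wraps to ULONG_MAX; integer-halving it yields LONG_MAX,
# the largest value csv.field_size_limit() can accept on this platform.
max_c_long = ctypes.c_ulong(-1).value // 2

# Setting the limit returns the previous one, which is kept here so
# the caller can restore it later if needed.
previous = csv.field_size_limit(max_c_long)
```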
I didn’t want to add a ctypes dependency out of paranoia that someone would try to use my library on a platform that didn’t support it (I don’t know if that paranoia has any basis at all). So I picked this pattern, suggested by StackOverflow user user1251007:
    # csv_std is assumed to be the standard library csv module, imported under an alias
    import csv as csv_std
    import sys

    # Increase CSV field size limit to maximum possible
    # https://stackoverflow.com/a/15063941
    field_size_limit = sys.maxsize

    while True:
        try:
            csv_std.field_size_limit(field_size_limit)
            break
        except OverflowError:
            # Too large for a C long on this platform: shrink and retry
            field_size_limit = int(field_size_limit / 10)

This appears to work just fine. On macOS sys.maxsize works already, and on other platforms it should pick a field size limit that works.
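If you want to reuse the pattern, it can be wrapped in a function. The sketch below is one way to do that. The name `maximize_csv_field_size_limit` is hypothetical and not part of sqlite-utils or the standard library, and the 200,000-character field is just a value comfortably above the default limit.

```python
import csv
import io
import sys


def maximize_csv_field_size_limit():
    """Hypothetical helper: set the largest limit the platform accepts and return it."""
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            # Value does not fit in a C long here; shrink it and try again.
            limit //= 10


original_limit = csv.field_size_limit()
chosen_limit = maximize_csv_field_size_limit()

# A field larger than the default 131072-character limit now parses cleanly.
wide_value = "x" * 200_000
rows = list(csv.reader(io.StringIO(f"id,body\n1,{wide_value}\n")))

# The limit is global to the process, so restore it when done.
csv.field_size_limit(original_limit)
```

Because the limit is a process-wide setting, a library that raises it affects every other user of the csv module in the same process, which is worth keeping in mind when deciding whether to restore the original value.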