TechAnV Blog
Using io.BufferedReader to peek against a non-peekable stream
When building the --sniff option for sqlite-utils insert (which attempts to detect the correct CSV delimiter and quote character by looking at the first 2048 bytes of a CSV file), I needed to peek ahead in an incoming stream of data.
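Detecting a delimiter and quote character from a sample is something the standard library's csv.Sniffer class can do. Here is a minimal sketch assuming that approach; the sample string is invented for illustration and is not taken from sqlite-utils.

```python
import csv

# Invented sample: semicolon-delimited, with a quoted field that contains a comma
sample = 'id;name;city\n1;"Smith, J";"Paris"\n2;"Doe";"Oslo"\n'

# sniff() inspects the text and returns a Dialect describing its format
dialect = csv.Sniffer().sniff(sample)
print(repr(dialect.delimiter), repr(dialect.quotechar))
```

The only input the sniffer needs is a string, which is why reading the first chunk of the file up front is the crux of the problem.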
I use Click, and Click can automatically handle both files and standard input. The problem is that peeking ahead in a file is easy (you can call .read() and then .seek(0), or use the .peek() method directly), but peeking ahead in standard input is not: anything you consume from it is not available to rewind to later on.
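To make the difference concrete, here is a small sketch. io.BytesIO stands in for a regular file and an os.pipe() stands in for standard input; both are assumptions made just for this demo.

```python
import io
import os

# A regular file (io.BytesIO stands in for one here) can be rewound
f = io.BytesIO(b"a,b\n1,2\n")
file_head = f.read(4)
f.seek(0)
file_all = f.read()  # the full contents again, nothing lost

# A pipe, like standard input, cannot be rewound: once read, bytes are gone
r, w = os.pipe()
os.write(w, b"a,b\n1,2\n")
os.close(w)
with os.fdopen(r, "rb") as pipe:
    pipe_head = pipe.read(4)  # "peek" by reading the first line
    pipe_rest = pipe.read()   # only what was left after that read

print(file_head, file_all)
print(pipe_head, pipe_rest)
```

Anything handed to csv.reader after the pipe read would be missing its header row.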
Since my code works by passing a file-like object to the csv.reader() function, I needed a way to read the first 2048 bytes and then reset the stream, ready for that function to consume it from the beginning.
I figured out how to do that using the io.BufferedReader class. Here’s the pattern:
```python
import csv
import io
import sys

# Get a file-like object in binary mode
# fp = open("myfile.csv", "rb")
# Or from standard input (need to use .buffer here)
fp = sys.stdin.buffer

# Wrap it in a buffered reader with a 4096 byte buffer
buffered = io.BufferedReader(fp, buffer_size=4096)

# Wrap THAT in a text io wrapper that can decode to unicode
# (newline="" is what the csv module expects for file objects)
decoded = io.TextIOWrapper(buffered, encoding="utf-8", newline="")

# Now I can read the first 2048 bytes without consuming them.
# peek() may return more (or fewer) bytes than requested, so slice.
first_bytes = buffered.peek(2048)[:2048]

# But I can still pass the "decoded" object to csv.reader
reader = csv.reader(decoded)
for row in reader:
    print(row)
```

My implementation is in this commit.
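Putting the two halves together, here is a minimal sketch that sniffs the dialect from the peeked bytes and then parses the whole stream with it. The function name sniff_and_read is hypothetical, a pipe again stands in for standard input, and decoding the sample with errors="ignore" is my own assumption for coping with a multi-byte UTF-8 character split at the 2048-byte boundary.

```python
import csv
import io
import os


def sniff_and_read(raw, sample_size=2048):
    """Hypothetical helper: detect the CSV dialect from the first
    sample_size bytes of a binary stream, then parse the whole stream
    with it. Works even when raw cannot seek (a pipe, or stdin)."""
    buffered = io.BufferedReader(raw, buffer_size=4096)
    # peek() does not advance the stream position; it may return more or
    # fewer bytes than requested, so trim to sample_size
    sample = buffered.peek(sample_size)[:sample_size]
    # The cut may split a multi-byte UTF-8 character; drop the fragment
    dialect = csv.Sniffer().sniff(sample.decode("utf-8", errors="ignore"))
    # The peeked bytes are still in the buffer, so the reader sees them
    decoded = io.TextIOWrapper(buffered, encoding="utf-8", newline="")
    return list(csv.reader(decoded, dialect))


# Demo: write some CSV into a pipe, then read it back through the helper
r, w = os.pipe()
os.write(w, b'id;name;city\n1;"Smith, J";"Paris"\n2;"Doe";"Oslo"\n')
os.close(w)
with os.fdopen(r, "rb", buffering=0) as raw:
    rows = sniff_and_read(raw)
print(rows)
```

Because peek() never moves the read position, csv.reader still receives the header row even though the sniffer looked at it first.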