Using io.BufferedReader to peek against a non-peekable stream
When building the --sniff option for sqlite-utils insert (which attempts to detect the correct CSV delimiter and quote character by looking at the first 2048 bytes of a CSV file), I needed to peek ahead in an incoming stream of data.
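The detection step itself isn't shown here. Python's standard library ships csv.Sniffer for this kind of job, so here is a minimal sketch of sniffing a text sample with it. I'm assuming, not confirming, that sqlite-utils does something along these lines; the sample data and the restricted `delimiters` list are illustrative choices of mine.

```python
import csv

# Hypothetical sample: in practice this would be the first 2048 bytes of
# the incoming file, already decoded to text
sample = "id;name;age\n1;Cleo;5\n2;Pancakes;4\n"

# Restricting the candidate delimiters (an assumption for this sketch)
# stops letters that happen to appear once per line from being picked
dialect = csv.Sniffer().sniff(sample[:2048], delimiters=",;\t|")
print(repr(dialect.delimiter), repr(dialect.quotechar))

# The detected dialect can then be handed straight to csv.reader
rows = list(csv.reader(sample.splitlines(), dialect))
print(rows)
```

The important detail for what follows is that the sniffer only needs a sample, but csv.reader then needs the whole stream, including the bytes that went into that sample.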
I use Click, and Click can automatically handle both files and standard input. The problem I had is that peeking ahead in a file is easy (you can call .read() and then .seek(0), or use the .peek() method directly), but peeking ahead in standard input is not: anything you consume from it is not available to rewind to later on.
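To make the difference concrete, here is a small sketch that uses a temporary file for the easy case and an os.pipe() as a stand-in for piped standard input. Using a pipe to simulate stdin is my assumption, and the file name and data are made up.

```python
import os
import tempfile

CSV_BYTES = b"id,name\n1,Cleo\n2,Pancakes\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.csv")
    with open(path, "wb") as out:
        out.write(CSV_BYTES)

    with open(path, "rb") as fp:
        # Option 1: read ahead, then rewind to the start
        sample = fp.read(7)
        fp.seek(0)
        # Option 2: a file opened in "rb" mode is an io.BufferedReader,
        # whose peek() returns buffered bytes without moving the position
        peeked = fp.peek(7)[:7]
        file_contents = fp.read()  # still starts from the first byte

# A pipe behaves like piped-in standard input: it cannot be rewound
read_fd, write_fd = os.pipe()
os.write(write_fd, CSV_BYTES)
os.close(write_fd)
with os.fdopen(read_fd, "rb") as pipe:
    pipe_seekable = pipe.seekable()
    consumed = pipe.read(7)
    remaining = pipe.read()  # the first 7 bytes are gone for good

print(sample, peeked, pipe_seekable)
print(remaining)
```

With the file, every read can be undone; with the pipe, the seven bytes that were read are simply missing from everything that comes after.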
Since my code works by passing a file-like object to the csv.reader() function, I needed a way to read the first 2048 bytes and then reset the stream, ready for that function to consume it from the start.
I figured out how to do that using the io.BufferedReader class. Here's the pattern:
```python
import csv
import io
import sys

# Get a file-like object in binary mode
fp = open("myfile.csv", "rb")
# Or from standard input (need to use .buffer here)
# fp = sys.stdin.buffer

# Wrap it in a buffered reader with a 4096 byte buffer
# (it needs to be at least as large as the sample you want to peek at)
buffered = io.BufferedReader(fp, buffer_size=4096)

# Wrap THAT in a text io wrapper that can decode to unicode
# (newline="" is what the csv module expects for file objects)
decoded = io.TextIOWrapper(buffered, encoding="utf-8", newline="")

# Now I can peek at the first 2048 bytes without consuming them...
# (peek() can return more bytes than asked for, so slice the result)
first_bytes = buffered.peek(2048)[:2048]

# But I can still pass the "decoded" object to csv.reader
reader = csv.reader(decoded)
for row in reader:
    print(row)
```
My implementation is in this commit.
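Putting the pieces together, here is a self-contained sketch that sniffs the dialect from a peeked sample and then parses the full stream. `sniff_and_parse` is a hypothetical helper, not the code from that commit, and an os.pipe() stands in for standard input so the example runs without any piped input.

```python
import csv
import io
import os


def sniff_and_parse(binary_fp, sample_size=2048, buffer_size=4096):
    """Hypothetical helper: detect the CSV dialect from the first bytes of
    a binary stream that may not be seekable, then parse the whole stream."""
    buffered = io.BufferedReader(binary_fp, buffer_size=buffer_size)
    decoded = io.TextIOWrapper(buffered, encoding="utf-8", newline="")
    # peek() leaves the stream position alone; slice because it can
    # return more bytes than requested
    sample = buffered.peek(sample_size)[:sample_size]
    # errors="ignore" drops a multi-byte character cut in half by the slice
    dialect = csv.Sniffer().sniff(
        sample.decode("utf-8", errors="ignore"), delimiters=",;\t|"
    )
    # The reader still sees the stream from the very first byte
    rows = list(csv.reader(decoded, dialect))
    return dialect, rows


# Simulate piped standard input with a pipe, which cannot be rewound
read_fd, write_fd = os.pipe()
os.write(write_fd, b"id;name\n1;Cleo\n2;Pancakes\n")
os.close(write_fd)
with os.fdopen(read_fd, "rb") as pipe:
    dialect, rows = sniff_and_parse(pipe)

print(dialect.delimiter)
print(rows)  # the header row is still there, even though it was peeked at
```

One caveat from the io documentation: peek() performs at most one read on the underlying stream, so if that stream hands back data in small chunks, the sample can come back shorter than requested. For sniffing a delimiter, a slightly short sample is usually still enough.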