One-liner for running queries against CSV files with SQLite
I figured out how to run a SQL query directly against a CSV file using the sqlite3 command-line utility:
sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' \ 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count'This uses the special :memory: filename to open an in-memory database. Then it uses two -cmd options to turn on CSV mode and import the taxi.csv file into a table called taxi. Then it runs the SQL query.
Instead of setting the mode with .mode you can use .import -csv like this (thanks, Mark Lawrence):
sqlite3 :memory: -cmd '.import -csv taxi.csv taxi' \ 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count'You can get taxi.csv by downloading the compressed file from here and running:
7z e -aos taxi.csv.7zI figured this out while commenting on this issue.
The output looks like this:
"",128020,32.23715114825530,42228,17.02140167661511,1533197,17.64188330679992,286461,18.09758707114563,72852,17.91539587109234,25510,18.4527749901965,50291,17.27092481756726,32623,17.60029641663677,2,87.178,2,95.7059,1,113.6Add -cmd '.mode column' to output in columns instead:
$ sqlite3 :memory: -cmd '.mode csv' -cmd '.import taxi.csv taxi' -cmd '.mode column' \ 'SELECT passenger_count, COUNT(*), AVG(total_amount) FROM taxi GROUP BY passenger_count'passenger_count COUNT(*) AVG(total_amount)--------------- -------- ----------------- 128020 32.23715114825530 42228 17.02140167661511 1533197 17.64188330679992 286461 18.09758707114563 72852 17.91539587109234 25510 18.4527749901965 50291 17.27092481756726 32623 17.60029641663677 2 87.178 2 95.7059 1 113.6Or use -cmd '.mode markdown' to get a Markdown table:
| passenger_count | COUNT(*) | AVG(total_amount) |
|---|---|---|
| 128020 | 32.2371511482553 | |
| 0 | 42228 | 17.0214016766151 |
| 1 | 1533197 | 17.6418833067999 |
| 2 | 286461 | 18.0975870711456 |
| 3 | 72852 | 17.9153958710923 |
| 4 | 25510 | 18.452774990196 |
| 5 | 50291 | 17.2709248175672 |
| 6 | 32623 | 17.6002964166367 |
| 7 | 2 | 87.17 |
| 8 | 2 | 95.705 |
| 9 | 1 | 113.6 |
A full list of output modes can be seen like this:
% sqlite3 -cmd '.help mode'.mode MODE ?TABLE? Set output mode MODE is one of: ascii Columns/rows delimited by 0x1F and 0x1E box Tables using unicode box-drawing characters csv Comma-separated values column Output in columns. (See .width) html HTML <table> code insert SQL insert statements for TABLE json Results in a JSON array line One value per line list Values delimited by "|" markdown Markdown table format quote Escape answers as for SQL table ASCII-art table tabs Tab-separated values tcl TCL list elementsOther options
There are a whole bunch of other tools that can be used for this kind of thing!
My own sqlite-utils memory command can load data from JSON, CSV or TSV into an in-memory database and run a query against it. It’s a LOT slower than using sqlite3 directly though.
dsq is a tool that does this kind of thing (and a lot more). Author Phil Eaton compiled a collection of benchmarks of other similar tools, and his benchmarking script demonstrates how to use each one of them.