# Running PyPy on macOS using Homebrew
*Towards Inserting One Billion Rows in SQLite Under A Minute* includes this snippet:

> All I had to do was run my existing code, without any change, using PyPy. It worked and the speed bump was phenomenal. The batched version took only 2.5 minutes to insert 100M rows. I got close to 3.5x speed :)
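I take "batched" here to mean accumulating rows and flushing them with `executemany()` instead of issuing one `INSERT` per row. Here's a minimal sketch of that pattern using Python's `sqlite3` module; the schema, database name and batch size are illustrative choices of mine, not the post's actual code:

```python
import sqlite3

def insert_batched(rows, db_path="test.db", batch_size=100_000):
    # rows is any iterable of (id, name) tuples
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER, name TEXT)")
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            # One executemany() call per batch, instead of one INSERT per row
            conn.executemany("INSERT INTO t VALUES (?, ?)", batch)
            batch.clear()
    if batch:
        conn.executemany("INSERT INTO t VALUES (?, ?)", batch)
    conn.commit()
    conn.close()
```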
I decided to try this out against my own Python tool for inserting CSV files, `sqlite-utils`.
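As an aside, `sqlite-utils` can also be driven from Python rather than the command line. This sketch is roughly the library-level equivalent of the CLI invocation I use later on, assuming a comma-separated input file:

```python
import csv
import sqlite_utils

db = sqlite_utils.Database("pypy.db")
with open("/tmp/en.openfoodfacts.org.products.csv", newline="") as f:
    # insert_all() accepts any iterable of dicts and batches the inserts
    db["t"].insert_all(csv.DictReader(f))
```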
I installed PyPy using Homebrew:
```
brew install pypy3
```
Having run this, `pypy3` was available on my command-line.
I used that to create a PyPy virtual environment in my `/tmp` directory:
```
cd /tmp
pypy3 -m venv venv
source venv/bin/activate
```
Running `python --version` confirmed that this had worked:
```
% python --version
Python 3.7.13 (7e0ae751533460d5f89f3ac48ce366d8642d1db5, Apr 26 2022, 09:29:08)
[PyPy 7.3.9 with GCC Apple LLVM 13.1.6 (clang-1316.0.21.2)]
```
Then I installed `sqlite-utils` into that virtual environment like so:
```
pip install sqlite-utils
```
And confirmed the installation like this:
```
(venv) /tmp % which sqlite-utils
/private/tmp/venv/bin/sqlite-utils
(venv) /tmp % head $(which sqlite-utils)
#!/private/tmp/venv/bin/pypy3
# -*- coding: utf-8 -*-
import re
import sys
from sqlite_utils.cli import cli
if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(cli())
```
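The shebang on the first line is what matters here: the `sqlite-utils` command will run under the virtual environment's `pypy3`. To double-check which implementation is active from inside Python itself, something like this works, since `sys.implementation.name` reports `pypy` under PyPy and `cpython` under regular Python:

```python
import sys

# Prints e.g. "pypy 3.7.13 (...)" under PyPy, "cpython ..." under CPython
print(sys.implementation.name, sys.version)
```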
Then I tested an import against a large CSV file like so:
```
(venv) /tmp % time sqlite-utils insert pypy.db t /tmp/en.openfoodfacts.org.products.csv --csv
  [------------------------------------]    0%
  [###################################-]   99%
12.67s user 2.53s system 92% cpu 16.514 total
```
I tried the same thing using `sqlite-utils` under regular Python too:
```
~ % time sqlite-utils insert pydb t /tmp/en.openfoodfacts.org.products.csv --csv
  [------------------------------------]    0%
  [###################################-]   99%
12.74s user 2.40s system 93% cpu 16.172 total
```
Surprisingly, I didn't see any meaningful difference in performance between the two. But at least I know how to run things using PyPy now.
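If I wanted to dig into why, one first step might be timing the CSV-parsing half of the job on its own under each interpreter; if both finish in similar time, the bottleneck is probably in SQLite itself rather than in Python-level code. A rough sketch, reusing the same file path as above:

```python
import csv
import time

start = time.perf_counter()
with open("/tmp/en.openfoodfacts.org.products.csv", newline="") as f:
    # Parse every row into a dict (roughly the Python-level work that
    # happens before any INSERT), skipping the database entirely
    n = sum(1 for _ in csv.DictReader(f))
print(f"parsed {n:,} rows in {time.perf_counter() - start:.2f}s")
```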