Sort by number of JSON intersections

This post on Reddit asked how to run a query that takes a list of items (in this case ingredients) as the input and returns all rows with at least one of those items in a JSON list, ordered by the most matches.

I jumped on this as a great opportunity to demonstrate Datasette Lite as a tool for answering SQL questions.

Creating an example database in Datasette Lite

I started with the following SQL to create the demo table:

create table recipes (id integer primary key, name text, ingredients text);
insert into recipes values (1, 'Cake', '["flour", "sugar", "eggs"]');
insert into recipes values (2, 'Pancakes', '["flour", "eggs", "butter"]');
insert into recipes values (3, 'Waffles', '["flour", "milk", "eggs"]');
insert into recipes values (4, 'Pizza', '["flour", "sugar", "eggs", "cheese"]');

I actually got GitHub Copilot to write most of that for me:

In GitHub Copilot, I start with:

create table recipes (id integer primary key, name text, ingredients text);

I type "insert into" and it autocompletes to:

insert into recipes values (1, 'Pizza', 'Tomato Sauce, Cheese, and Pizza');

I edit that to 'Cake' instead and it fills in 'flour, sugar, eggs, butter'. Then I edit that to be a JSON array instead. It completes (with a few tweaks from me) with:

insert into recipes values (2, 'Pancakes', '["flour", "sugar", "eggs", "butter"]');
insert into recipes values (3, 'Pizza', '["vegetables", "cheese", "tomatoes", "sauce"]');
insert into recipes values (4, 'Spaghetti', '["pasta", "tomatoes", "sauce"]');

I saved that to a gist and opened it in Datasette Lite with this URL:

https://lite.datasette.io/?sql=https://gist.githubusercontent.com/simonw/1f8a91123ccefd8844187225b1832d7a/raw/5069075b86aa79358fbab3d4482d1d269077d632/recipes.sql

Solving the problem

Here’s the SQL recipe I came up with to answer the question:

select id, name, ingredients, (
  select json_group_array(value) from json_each(ingredients)
  where value in (select value from json_each(:p0))
) as matching_ingredients
from recipes
where json_array_length(matching_ingredients) > 0
order by json_array_length(matching_ingredients) desc

Here's an example of that query with an input for p0 of ["sugar", "cheese"], which returns:

id	name	ingredients	matching_ingredients
4	Pizza	["flour", "sugar", "eggs", "cheese"]	["sugar","cheese"]
1	Cake	["flour", "sugar", "eggs"]	["sugar"]
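To try the same query outside Datasette Lite, here is a minimal sketch using Python's sqlite3 module. It assumes your Python is linked against a SQLite build that includes the JSON functions (most recent builds do). The find_recipes() helper and the RECIPES list are hypothetical names I made up for this illustration; the SQL itself is the query from above, with the input list passed as a JSON string for the :p0 named parameter.

```python
import json
import sqlite3

# Same demo rows as the recipes table (ingredients kept as Python lists here).
RECIPES = [
    (1, "Cake", ["flour", "sugar", "eggs"]),
    (2, "Pancakes", ["flour", "eggs", "butter"]),
    (3, "Waffles", ["flour", "milk", "eggs"]),
    (4, "Pizza", ["flour", "sugar", "eggs", "cheese"]),
]

QUERY = """
select id, name, ingredients, (
  select json_group_array(value) from json_each(ingredients)
  where value in (select value from json_each(:p0))
) as matching_ingredients
from recipes
where json_array_length(matching_ingredients) > 0
order by json_array_length(matching_ingredients) desc
"""


def find_recipes(wanted):
    """Hypothetical helper: run the query against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table recipes (id integer primary key, name text, ingredients text)"
    )
    # Store each ingredient list as a JSON array string, as in the demo table.
    conn.executemany(
        "insert into recipes values (?, ?, ?)",
        [(id_, name, json.dumps(items)) for id_, name, items in RECIPES],
    )
    # :p0 is bound by name; the value is the JSON-encoded list of wanted items.
    rows = conn.execute(QUERY, {"p0": json.dumps(wanted)}).fetchall()
    conn.close()
    # Decode the matching_ingredients column back into a Python list.
    return [(id_, name, json.loads(matching)) for id_, name, _, matching in rows]


results = find_recipes(["sugar", "cheese"])
for row in results:
    print(row)
```

With the input ["sugar", "cheese"] this should list Pizza first (two matches) and Cake second (one match), and leave out Pancakes and Waffles.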

The key trick here is the bit that creates that matching_ingredients column, which uses a sub-select like this:

select json_group_array(value) from json_each(ingredients)
where value in (select value from json_each(:p0))

json_group_array(value) is an aggregation function that turns the results into a JSON array.
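As a rough illustration of how the two functions fit together, this sketch (again assuming a Python sqlite3 build with SQLite's JSON functions) first expands a JSON array into rows with json_each() and then collects those rows back into an array with json_group_array(). The variable names are just for this example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
ingredients = '["flour", "sugar", "eggs"]'

# json_each() is a table-valued function: it yields one row per array element.
rows = conn.execute("select value from json_each(?)", [ingredients]).fetchall()
values = [row[0] for row in rows]
print(values)

# json_group_array() aggregates those rows back into a single JSON array string.
(regrouped,) = conn.execute(
    "select json_group_array(value) from json_each(?)", [ingredients]
).fetchone()
regrouped_list = json.loads(regrouped)
print(regrouped_list)

# When no rows are aggregated the result is an empty JSON array,
# which is why json_array_length() can be used to spot recipes with no matches.
(empty,) = conn.execute(
    "select json_group_array(value) from json_each('[]')"
).fetchone()
empty_list = json.loads(empty)
print(empty_list)

conn.close()
```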

where value in (select value from json_each('["sugar", "cheese"]')) is the bit that calculates the intersection of the two JSON arrays.
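Here is that intersection step on its own, as a small sketch with no table involved. json_intersection() is a hypothetical helper name, and :left and :right are parameter names I picked for the illustration; it assumes a Python sqlite3 build with SQLite's JSON functions.

```python
import json
import sqlite3


def json_intersection(left, right):
    """Hypothetical helper: items of `left` that also appear in `right`."""
    conn = sqlite3.connect(":memory:")
    (result,) = conn.execute(
        """
        select json_group_array(value) from json_each(:left)
        where value in (select value from json_each(:right))
        """,
        {"left": json.dumps(left), "right": json.dumps(right)},
    ).fetchone()
    conn.close()
    return json.loads(result)


pizza = ["flour", "sugar", "eggs", "cheese"]
overlap = json_intersection(pizza, ["sugar", "cheese"])
no_overlap = json_intersection(["flour", "milk", "eggs"], ["sugar", "cheese"])
print(overlap)
print(no_overlap)
```

One thing to be aware of: the filter keeps every row from the left-hand array that has a match, so a value repeated in the left-hand array would show up more than once in the result.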

Then at the end I use json_array_length() twice: once in the where clause to remove rows with no overlap, and once in the order by clause to sort the rows with the most matching ingredients first.