Sort by number of JSON intersections#
This post on Reddit asked how to run a query that takes a list of items (in this case ingredients) as the input and returns all rows with at least one of those items in a JSON list, ordered by the most matches.
I jumped on this as a great opportunity to demonstrate Datasette Lite as a tool for answering SQL questions.
Creating an example database in Datasette Lite#
I started with the following SQL to create the demo table:
1create table recipes (id integer primary key, name text, ingredients text);2insert into recipes values (1, 'Cake', '["flour", "sugar", "eggs"]');3insert into recipes values (2, 'Pancakes', '["flour", "eggs", "butter"]');4insert into recipes values (3, 'Waffles', '["flour", "milk", "eggs"]');5insert into recipes values (4, 'Pizza', '["flour", "sugar", "eggs", "cheese"]');
I actually got GitHub Copilot to write most of that for me:
![In GitHub Copilot, I start with: create table recipes (id integer primary key, name text, ingredients text); I type "insert into" and it autocompletes to: insert into recipes values (1, 'Pizza', 'Tomato Sauce, Cheese, and Pizza'); I edit that to 'Cake' instead and it fills in "flour, sugar, eggs, butter' Then I edit that to be a JSON array instead It completes (with a few tweaks from me) with: insert into recipes values (2, 'Pancakes', '["flour", "sugar", "eggs", "butter"]'); insert into recipes values (3, 'Pizza', '["vegetables", "cheese", "tomatoes", "sauce"]'); insert into recipes values (4, 'Spaghetti', '["pasta", "tomatoes", "sauce"]');](https://user-images.githubusercontent.com/9599/185261271-cf8aff79-67dc-4359-89a2-5f8a857e944a.gif)
I saved that to a gist and opened it in Datasette Lite with this URL:
Solving the problem#
Here’s the SQL recipe I came up with to solve the question:
1select id, name, ingredients, (2 select json_group_array(value) from json_each(ingredients)3 where value in (select value from json_each(:p0))4) as matching_ingredients5from recipes6where json_array_length(matching_ingredients) > 07order by json_array_length(matching_ingredients) desc
Example of that query with an input for p0
of ["sugar", "cheese"]
, which returns:
id | name | ingredients | matching_ingredients |
---|---|---|---|
4 | Pizza | [“flour”, “sugar”, “eggs”, “cheese”] | [“sugar”,“cheese”] |
1 | Cake | [“flour”, “sugar”, “eggs”] | [“sugar”] |
The key trick here is the bit that creates that matching_ingredients
column, which uses a sub-select like this:
1select json_group_array(value) from json_each(ingredients)2where value in (select value from json_each(:p0))
json_group_array(value)
is an aggregation function that turns the results into a JSON array.
where value in (select value from json_each('["sugar", "cheese"]')
is the bit that calculates the intersection of the two JSON arrays.
Then at the end I use json_array_length()
to remove rows with no overlap and then sort with the most matching ingredients first.