520 words
3 minutes
Returning related rows in a single SQL query using JSON

Returning related rows in a single SQL query using JSON#

When building database-backed applications you’ll often find yourself wanting to return a row from the database along with its related rows.

A few examples:

  • Retrieving a list of congressional legislators and their terms, following a foreign key relationship
  • Return blog entries and their tags in one go, via a many-to-many table

You can do this in SQLite using the json_group_array() aggregation function. A couple of examples.

Legislators and their terms, via a foreign key#

Simplified schema for this database:

CREATE TABLE [legislators] (
[id] TEXT PRIMARY KEY,
[name] TEXT,
[bio_birthday] TEXT
);
CREATE TABLE [legislator_terms] (
[legislator_id] TEXT REFERENCES [legislators]([id]),
[type] TEXT,
[state] TEXT,
[start] TEXT,
[end] TEXT,
[party] TEXT
);

Here’s a query that returns each legislator along with a JSON array of their terms:

select
legislators.id,
legislators.name,
json_group_array(json_object(
'type', legislator_terms.type,
'state', legislator_terms.state,
'start', legislator_terms.start,
'end', legislator_terms.end,
'party', legislator_terms.party
)) as terms,
count(*) as num_terms
from
legislators join legislator_terms on legislator_terms.legislator_id = legislators.id
group by legislators.id
order by
id
limit
10

And the result:

Screenshot of a query result. There is a terms column containing a JSON list of terms.

Note that this query does group by legislators.id which is allowed in SQLite but may not work in other databases, which might require group by legislators.id, legislators.name instead.

Tags on blog entries, via a many-to-many table#

Simplified schema:

CREATE TABLE [blog_entry] (
[id] INTEGER PRIMARY KEY,
[title] TEXT
);
CREATE TABLE [blog_tag] (
[id] INTEGER PRIMARY KEY,
[tag] TEXT
);
CREATE TABLE [blog_entry_tags] (
[id] INTEGER PRIMARY KEY,
[entry_id] INTEGER,
[tag_id] INTEGER,
FOREIGN KEY([entry_id]) REFERENCES [blog_entry]([id]),
FOREIGN KEY([tag_id]) REFERENCES [blog_tag]([id])
);

Query to retrieve entries with their tags:

select
blog_entry.id,
blog_entry.title,
json_group_array(json_object('tag', blog_tag.tag)) as tags
from
blog_entry
join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
join blog_tag on blog_tag.id = blog_entry_tags.tag_id
group by
blog_entry.id
order by
blog_entry.id desc

Result:

idtitletags
8191I don’t know how to solve prompt injection[{“tag”:“ai”},{“tag”:“security”},{“tag”:“openai”}]
8190Weeknotes: Datasette Lite, s3-credentials, shot-scraper, datasette-edit-templates and more[{“tag”:“shotscraper”},{“tag”:“datasette”},{“tag”:“plugins”},{“tag”:“datasettelite”},{“tag”:“projects”},{“tag”:“s3credentials”},{“tag”:“weeknotes”}]
8189Prompt injection attacks against GPT-3[{“tag”:“ai”},{“tag”:“gpt3”},{“tag”:“security”},{“tag”:“openai”}]

There’s a subtle bug in the above: if an entry has no tags at all it will be excluded from the query results entirely.

You can fix that using left joins like this:

select
blog_entry.id,
blog_entry.title,
json_group_array(json_object('tag', blog_tag.tag)) as tags
from
blog_entry
left join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
left join blog_tag on blog_tag.id = blog_entry_tags.tag_id
where blog_entry.id < 4
group by
blog_entry.id
order by
blog_entry.id desc

This almost works, but it outputs the following returning {"tag": null} for entries with no tags:

idtitletags
3Todo list[{“tag”}]
2Blogging aint easy[{“tag”}]
1WaSP Phase II[{“tag”}]

David Fetter showed me the solution:

select
blog_entry.id,
blog_entry.title,
json_group_array(
json_object('tag', blog_tag.tag)
) filter (
where
blog_tag.tag is not null
) as tags
from
blog_entry
left join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
left join blog_tag on blog_tag.id = blog_entry_tags.tag_id
group by
blog_entry.id
order by
blog_entry.id

That extra filter on the aggregation does the trick!

Other databases#

Other databases are capable of the same thing, but using different functions. PostgreSQL has json_agg() for example, which is also available in Django as JSONBAgg.

Here’s an equivalent query in PostgreSQL syntax:

select
blog_entry.id,
title,
slug,
created,
coalesce(json_agg(json_build_object(blog_tag.id, blog_tag.tag)) filter (
where
blog_tag.tag is not null
), json_build_array()) as tags
from
blog_entry
left join blog_entry_tags on blog_entry.id = blog_entry_tags.entry_id
left join blog_tag on blog_entry_tags.tag_id = blog_tag.id
group by
blog_entry.id
order by
blog_entry.id

See that running here in django-sql-dashboard.

Returning related rows in a single SQL query using JSON
https://mranv.pages.dev/posts/returning-related-rows-in-a-single-sql-query-using-json/
Author
Anubhav Gain
Published at
2024-06-02
License
CC BY-NC-SA 4.0