229 words
1 minute
Using recursive CTEs to explore hierarchical Twitter threads
Anubhav Gain
2024-01-12

Using recursive CTEs to explore hierarchical Twitter threads#

This TIL adapted from a Gist I put together in 2019, before I started tracking TILs here.

My twitter-to-sqlite tool produced a SQLite table with an in_reply_to_status column that referenced another tweet ID, for recording reply-to conversations.

I wanted to find the “deepest” tweets in my database - the tweets at the end of the longest reply-to thread.

I started by adapting this recipe by Robin Houston. Here’s the query I came up with:

with recursive thread as (
select id, in_reply_to_status_id, 0 as depth
from tweets
where in_reply_to_status_id is null
union
select tweets.id, tweets.in_reply_to_status_id, 1 + thread.depth as depth
from thread join tweets on tweets.in_reply_to_status_id = thread.id)
select * from thread order by depth desc

This uses a recursive CTE to sythensize a thread table.

The result I got looked like this (truncated):

idin_reply_to_status_iddepth
1576674019239407616157667316348782182463
1576673163487821824157667286677017804862
1574621292988440580157437978257353113661
1574845776152432649157484567208737587361
1574846026602713104157484567208737587361
1574847148218322954157484567208737587361
1574848163017547777157484567208737587361
1576672866770178048157484567208737587361
1574338300923777024157426871321321062460

Sure enough, tweet 1576674019239407616 is a reply to a VERY long Twitter thread I had created about Stable Diffusion.

Matthew Somerville suggested the following improvement, which returns the full path of tweet IDs leading to that tweet:

with recursive thread as (
select id, in_reply_to_status_id, 0 as depth, id as ids
from tweets
where in_reply_to_status_id is null
union
select tweets.id, tweets.in_reply_to_status_id, 1 + thread.depth as depth, thread.ids || ',' || tweets.id as ids
from thread join tweets on tweets.in_reply_to_status_id = thread.id)
select * from thread where depth > 1 order by depth asc

The results look like this:

idin_reply_to_status_iddepthids
4609905293460887139824608471362,4608871398,4609905293
2756614208727564750598227563022963,27564750598,27566142087
2839272749828062128369228048800241,28062128369,28392727498
Using recursive CTEs to explore hierarchical Twitter threads
https://mranv.pages.dev/posts/using-recursive-ctes-to-explore-hierarchical-twitter-threads/
Author
Anubhav Gain
Published at
2024-01-12
License
CC BY-NC-SA 4.0