Downloading every video for a TikTok account

TikTok may or may not be banned in the USA within the next 24 hours or so. Here’s a pattern you can use to download all of the videos from a specific account.

Using yt-dlp directly#

After first publishing this TIL I found out you can point yt-dlp directly at an account page and it will download all videos for you:

yt-dlp 'https://www.tiktok.com/@username_goes_here' -o "downloads/%(title)s-%(id)s.%(ext)s"
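
The -o argument is an output template. yt-dlp templates use Python-style %(name)s placeholders, so ordinary Python string formatting gives a rough preview of the filenames to expect. This is only an approximation: yt-dlp also sanitizes characters that are unsafe in filenames, which this sketch does not attempt. The preview_filename helper is hypothetical, and the metadata values are copied from the example filenames later in this post.

```python
# Rough preview of how an output template expands.
# Assumption: no filename sanitization is needed for these values.
TEMPLATE = "downloads/%(title)s-%(id)s.%(ext)s"


def preview_filename(template, info):
    """Fill the %(name)s placeholders in `template` from the `info` dict."""
    return template % info


example = {"title": "#perte ", "id": "7204803049351695622", "ext": "mp4"}
print(preview_filename(TEMPLATE, example))
```

Including %(id)s in the template keeps filenames unique even when several videos share the same title.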

The rest of this TIL describes a more complex method.

Scrape the list of video URLs#

I used a variant of my Twitter scraping trick. Start by loading up a profile page - like https://www.tiktok.com/@ilgallinaio_special - in Firefox or Chrome or Safari.

Open up the DevTools and paste in the following JavaScript:

window.videoUrls = new Set();
function collect() {
  Array.from(document.querySelectorAll('a[href*="/video/"]'), el => el.href).forEach(href => {
    window.videoUrls.add(href);
  });
}
setInterval(collect, 500);

This will scan the page every half a second looking for links to TikTok videos - links with /video/ in their URL - and add those to a growing set called videoUrls.

Now switch to the “oldest” sort direction and scroll down the page until you reach the bottom. TikTok implements infinite-ish scrolling so this may take a while for an account with a lot of videos.

Once you get to the bottom, copy out the collected list of URLs. In Firefox I used this command for that:

copy(Array.from(window.videoUrls))

That copied the array of URLs to my clipboard. I then pasted them into a file and saved it as videos.json - the file contents looked something like this (but a lot longer):

[
"https://www.tiktok.com/@ilgallinaio_special/video/7204803049351695622",
"https://www.tiktok.com/@ilgallinaio_special/video/7204877634189151493",
"https://www.tiktok.com/@ilgallinaio_special/video/7205157890372537606",
"https://www.tiktok.com/@ilgallinaio_special/video/7205189803074211077"
]
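
Before kicking off a long download it can be worth sanity-checking the pasted list. The following is a minimal sketch, assuming every video URL contains /video/ followed by a numeric ID, as in the examples above. The unique_video_ids name is hypothetical.

```python
import json
import re

# Assumption: video URLs look like .../video/<digits>, as in the examples above.
VIDEO_ID = re.compile(r"/video/(\d+)")


def unique_video_ids(pasted_json):
    """Parse a JSON array of URLs and return unique video IDs in first-seen order."""
    ids = []
    for url in json.loads(pasted_json):
        match = VIDEO_ID.search(url)
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


sample = json.dumps([
    "https://www.tiktok.com/@ilgallinaio_special/video/7204803049351695622",
    "https://www.tiktok.com/@ilgallinaio_special/video/7204803049351695622",
    "https://www.tiktok.com/@ilgallinaio_special/video/7204877634189151493",
])
print(len(unique_video_ids(sample)), "unique videos")
```

If the number of unique IDs is lower than the number of URLs, the list contains duplicates or links that are not videos.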

Download them all with yt-dlp#

The yt-dlp Python program can download from TikTok. I ran it against all of the URLs in my videos.json file like this:

mkdir -p downloads
jq -r '.[]' videos.json | while read -r url; do
  uvx yt-dlp -o "downloads/%(title)s-%(id)s.%(ext)s" "$url"
  if [[ $? -eq 0 ]]; then
    echo "Successfully downloaded: $url"
  else
    echo "Failed to download: $url"
  fi
  sleep 1
done

This creates a downloads/ folder containing files with names like this:

#perte -7204803049351695622.mp4
#perte -7204877634189151493.mp4
#perte -7205189803074211077.mp4
#perte i galli moroseta🐓🐓🌸🍾🍾💪😅-7205157890372537606.mp4
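
Because each filename ends with the video ID, it is possible to work out which URLs still need downloading after a partial run. A minimal sketch, assuming filenames follow the title-id.ext pattern shown above and that each URL ends with the video ID (no query string). Both function names are hypothetical.

```python
import re


def downloaded_ids(filenames):
    """Extract the trailing video ID from names shaped like '<title>-<id>.<ext>'."""
    ids = set()
    for name in filenames:
        match = re.search(r"-(\d+)\.[^.]+$", name)
        if match:
            ids.add(match.group(1))
    return ids


def missing_urls(urls, filenames):
    """Return the URLs whose final path segment has no matching downloaded file."""
    have = downloaded_ids(filenames)
    return [u for u in urls if u.rstrip("/").rsplit("/", 1)[-1] not in have]


urls = [
    "https://www.tiktok.com/@ilgallinaio_special/video/7204803049351695622",
    "https://www.tiktok.com/@ilgallinaio_special/video/7204877634189151493",
]
files = ["#perte -7204803049351695622.mp4"]
print(missing_urls(urls, files))
```

In practice urls would come from videos.json and files from listing the downloads/ folder.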

Bonus: running Whisper against them#

I also ran this against an account that wasn’t just dancing chickens, and decided to use Whisper (running on macOS via mlx-whisper) to generate text transcripts so I could search that content later on.

Here’s the recipe I used for that, powered by uv run:

for f in *.mp4; do [[ ! -f "${f:r}.txt" ]] && echo "Processing $f" && uv run --with mlx-whisper mlx_whisper "$f"; done

This can be run multiple times - it checks whether a .txt file already exists and only processes .mp4 files that don’t have one yet. The ${f:r} expansion, which strips the file extension, is zsh syntax; in bash the equivalent is ${f%.*}.
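
The same "skip anything that already has a transcript" check can be sketched in Python, which is handy for previewing what a re-run would process. This assumes, as the loop above does, that each transcript sits next to its video with the same name and a .txt extension. The pending_videos name is hypothetical.

```python
from pathlib import Path


def pending_videos(folder):
    """Return .mp4 files in `folder` that have no sibling .txt transcript yet."""
    return sorted(
        path
        for path in Path(folder).glob("*.mp4")
        if not path.with_suffix(".txt").exists()
    )


for path in pending_videos("."):
    print("Still to process:", path.name)
```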

Extra bonus: adding a progress bar#

After I kicked this off against a larger account I realized a progress bar would be nice. I got ChatGPT o1 to write me this script:

#!/usr/bin/env python3
import sys
import time
import subprocess


def main():
    if len(sys.argv) < 3:
        print(f"Usage: {sys.argv[0]} <total> <shell_command>")
        sys.exit(1)
    total = int(sys.argv[1])
    # If your command may include spaces, you might need to do this:
    # shell_command = ' '.join(sys.argv[2:])
    # but for the simple example provided:
    shell_command = sys.argv[2]
    # -- Step 1: Get initial progress and record the time --
    try:
        initial_output = subprocess.check_output(shell_command, shell=True)
        done_initial = int(initial_output.strip())
    except Exception as e:
        print(f"Error running initial command: {shell_command}\n{e}")
        sys.exit(1)
    # Clamp in case the command returns something above the total or below zero
    if done_initial < 0:
        done_initial = 0
    if done_initial > total:
        done_initial = total
    time_initial = time.time()
    # Print one quick update before we start the loop
    print_progress(done_initial, total, 0, 0)
    # If we already reached (or exceeded) the total, exit immediately
    if done_initial >= total:
        print("\nDone!")
        sys.exit(0)
    # -- Step 2: Repeatedly poll the command to update progress --
    polling_interval = 1.0  # seconds between checks
    while True:
        time.sleep(polling_interval)
        # Fetch current progress
        try:
            output = subprocess.check_output(shell_command, shell=True)
            done = int(output.strip())
        except Exception as e:
            print(f"\nError running command: {shell_command}\n{e}")
            sys.exit(1)
        # Clamp done to never exceed total or go below 0
        if done < 0:
            done = 0
        if done > total:
            done = total
        # How much progress has been made since we started measuring?
        delta_done = done - done_initial
        delta_time = time.time() - time_initial
        # Print the progress bar
        print_progress(done, total, delta_done, delta_time)
        if done >= total:
            break
    print("\nDone!")


def print_progress(done, total, delta_done, delta_time):
    """
    Print a single-line progress bar with percentage and ETA (if possible).
    Overwrites the previous line via carriage return.
    """
    # Fraction complete
    fraction = done / total if total else 1.0
    # Build the bar
    bar_length = 50
    filled_length = int(bar_length * fraction)
    bar = "#" * filled_length + "-" * (bar_length - filled_length)
    # Compute ETA based only on new progress (delta_done)
    if delta_done > 0:
        time_per_item = delta_time / delta_done
        remaining = total - done
        eta_seconds = int(time_per_item * remaining)
        eta_string = format_eta(eta_seconds)
    else:
        # If no new items have completed since the script started, can't guess yet
        eta_string = "calculating..."
    progress_line = (
        f"\r[{bar}] {done}/{total} ({fraction*100:.1f}%) - ETA: {eta_string}"
    )
    print(progress_line, end='', flush=True)


def format_eta(seconds):
    """Convert number of seconds into a H:MM:SS or M:SS format string."""
    h = seconds // 3600
    m = (seconds % 3600) // 60
    s = seconds % 60
    if h > 0:
        return f"{h:d}:{m:02d}:{s:02d}"
    else:
        return f"{m:02d}:{s:02d}"


if __name__ == "__main__":
    main()

Which I can then run like this:

uv run progress.py 45 'ls *.mp4 | wc -l'

The 45 there is the expected number of downloads (found with jq length < videos.json). The ls *.mp4 | wc -l string is a command to run on each iteration to count how many items have been processed.

This command provides both a visible ASCII progress bar and an ETA prediction of when the program will finish, based on how many items have been processed and how quickly they appear to be running.
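
The ETA arithmetic is simple enough to check by hand. Here is a minimal sketch of the same calculation in isolation, with eta_seconds as a hypothetical helper that mirrors the logic in print_progress. It only counts items completed since the script started, so work finished before launch does not skew the rate.

```python
def eta_seconds(done_initial, done_now, total, elapsed_seconds):
    """Estimate remaining seconds from the rate observed since monitoring began.

    Returns None when no new items have completed yet, since no rate is known.
    """
    delta_done = done_now - done_initial
    if delta_done <= 0:
        return None
    time_per_item = elapsed_seconds / delta_done
    return int(time_per_item * (total - done_now))


# 10 items were already done at launch; 10 more finished in 120 seconds,
# so the rate is 12 seconds per item with 25 of 45 items remaining.
print(eta_seconds(10, 20, 45, 120))  # 300
```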

https://mranv.pages.dev/posts/downloading-every-video-for-a-tiktok-account/
Author
Anubhav Gain
Published at
2024-06-11
License
CC BY-NC-SA 4.0