Downloading every video for a TikTok account

TikTok may or may not be banned in the USA within the next 24 hours or so. Here’s a pattern you can use to download all of the videos from a specific account.

Using yt-dlp directly

After first publishing this TIL I found out you can point yt-dlp directly at an account page and it will download all videos for you:

yt-dlp 'https://www.tiktok.com/@username_goes_here' -o "downloads/%(title)s-%(id)s.%(ext)s"

The rest of this TIL describes a more complex method.

Scrape the list of video URLs

I used a variant of my Twitter scraping trick. Start by loading up a profile page - like https://www.tiktok.com/@ilgallinaio_special - in Firefox or Chrome or Safari.

Open up the DevTools and paste in the following JavaScript:

window.videoUrls = new Set();

function collect() {
    Array.from(document.querySelectorAll('a[href*="/video/"]'), el => el.href).forEach(href => {
        window.videoUrls.add(href);
    });
}

setInterval(collect, 500);

This will scan the page every half a second looking for links to TikTok videos - links with /video/ in their URL - and add those to a growing set called videoUrls.

Now switch to the “oldest” sort direction and scroll down the page until you reach the bottom. TikTok implements infinite-ish scrolling so this may take a while for an account with a lot of videos.

Once you get to the bottom, copy out the collected list of URLs. In Firefox I used this command for that:

copy(Array.from(window.videoUrls))

That copied the array of URLs to my clipboard. I then pasted them into a file and saved it as videos.json - the file contents looked something like this (but a lot longer):

[
  "https://www.tiktok.com/@ilgallinaio_special/video/7204803049351695622",
  "https://www.tiktok.com/@ilgallinaio_special/video/7204877634189151493",
  "https://www.tiktok.com/@ilgallinaio_special/video/7205157890372537606",
  "https://www.tiktok.com/@ilgallinaio_special/video/7205189803074211077"
]
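
Before handing the list to a downloader it can be worth sanity-checking it. The following is a minimal Python sketch, not part of the original workflow, that drops non-video links and duplicate video IDs. The helper names `video_id` and `clean_urls` are made up for illustration, and it assumes every video URL has the /video/<digits> shape shown in the sample above.

```python
from __future__ import annotations

import json
import re
from pathlib import Path

# Assumption: video URLs contain /video/<digits>, as in the sample file.
VIDEO_ID_RE = re.compile(r"/video/(\d+)")


def video_id(url: str) -> str | None:
    """Return the numeric ID from a TikTok video URL, or None if absent."""
    match = VIDEO_ID_RE.search(url)
    return match.group(1) if match else None


def clean_urls(urls: list[str]) -> list[str]:
    """Drop non-video links and repeated IDs, keeping first-seen order."""
    seen: set[str] = set()
    result = []
    for url in urls:
        vid = video_id(url)
        if vid is None or vid in seen:
            continue
        seen.add(vid)
        result.append(url)
    return result


if __name__ == "__main__":
    path = Path("videos.json")
    if path.exists():
        urls = clean_urls(json.loads(path.read_text()))
        print(f"{len(urls)} unique video URLs")
```

Matching on the ID rather than the full URL means the same video is only kept once even if it was collected with different query strings.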

Download them all with yt-dlp

The yt-dlp Python program can download from TikTok. I ran it against all of the URLs in my videos.json file like this:

mkdir -p downloads
jq -r '.[]' videos.json | while IFS= read -r url; do
    # Test yt-dlp's exit status directly rather than inspecting $? afterwards
    if uvx yt-dlp -o "downloads/%(title)s-%(id)s.%(ext)s" "$url"; then
        echo "Successfully downloaded: $url"
    else
        echo "Failed to download: $url"
    fi
    sleep 1
done

This creates a downloads/ folder containing files with names like this:

#perte -7204803049351695622.mp4
#perte -7204877634189151493.mp4
#perte -7205189803074211077.mp4
#perte i galli moroseta🐓🐓🌸🍾🍾💪😅-7205157890372537606.mp4
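
If you would rather drive the downloads from Python, the same loop can be sketched with `subprocess`. This is an illustration under assumptions rather than the method used above: `build_command` and `download_all` are hypothetical names, and it shells out to the same `uvx yt-dlp` command line with the same `-o` template. It also returns the failed URLs so they can be retried later.

```python
import json
import subprocess
import time
from pathlib import Path

# Same output template as the shell loop above.
OUTPUT_TEMPLATE = "downloads/%(title)s-%(id)s.%(ext)s"


def build_command(url, runner=("uvx", "yt-dlp")):
    """Build the argument list for one download; no shell quoting needed."""
    return [*runner, "-o", OUTPUT_TEMPLATE, url]


def download_all(urls, run=subprocess.run, delay=1.0):
    """Download each URL in turn and return the list of URLs that failed."""
    failed = []
    for url in urls:
        result = run(build_command(url))
        if result.returncode == 0:
            print(f"Successfully downloaded: {url}")
        else:
            print(f"Failed to download: {url}")
            failed.append(url)
        time.sleep(delay)  # brief pause between downloads, as in the shell loop
    return failed


if __name__ == "__main__":
    source = Path("videos.json")
    if source.exists():
        Path("downloads").mkdir(exist_ok=True)
        failures = download_all(json.loads(source.read_text()))
        print(f"{len(failures)} failed")
```

Passing the arguments as a list avoids any quoting problems with unusual characters in URLs, and the `run` parameter exists only so the loop can be exercised without touching the network.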

Bonus: running Whisper against them

I did this against an account that wasn’t just dancing chickens and decided to use Whisper running on macOS via mlx-whisper to generate text files with transcripts, so I could search that content later on.

Here’s the recipe I used for that, powered by uv run:

for f in *.mp4; do [[ ! -f "${f:r}.txt" ]] && echo "Processing $f" && uv run --with mlx-whisper mlx_whisper "$f"; done

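This can be run multiple times: it checks whether a .txt file already exists and only processes .mp4 files that have not yet been transcribed. The ${f:r} expansion, which strips the file extension, is zsh syntax, so run this in zsh (the default shell on recent versions of macOS).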
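
The "skip anything already transcribed" check is the part that makes the loop safe to re-run. Here is a small Python sketch of that check on its own, assuming (as the loop does) that each transcript shares its video's base name with a .txt extension. The function name `pending_videos` is hypothetical.

```python
from pathlib import Path


def pending_videos(folder="."):
    """Return .mp4 files in folder that have no matching .txt transcript yet."""
    return sorted(
        video
        for video in Path(folder).glob("*.mp4")
        if not video.with_suffix(".txt").exists()
    )


if __name__ == "__main__":
    for video in pending_videos():
        print(f"Would process {video.name}")
```

Running it before and after a transcription pass gives a quick view of how many files are still outstanding.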

Extra bonus: adding a progress bar

After I kicked this off against a larger account I realized a progress bar would be nice. I got ChatGPT o1 to write me this script:

#!/usr/bin/env python3

import sys
import time
import subprocess

def main():
    if len(sys.argv) < 3:
        print(f"Usage: {sys.argv[0]} <total> <shell_command>")
        sys.exit(1)

    total = int(sys.argv[1])
    # If your command may include spaces, you might need to do this:
    # shell_command = ' '.join(sys.argv[2:])
    # but for the simple example provided:
    shell_command = sys.argv[2]

    # -- Step 1: Get initial progress and record the time --
    try:
        initial_output = subprocess.check_output(shell_command, shell=True)
        done_initial = int(initial_output.strip())
    except Exception as e:
        print(f"Error running initial command: {shell_command}\n{e}")
        sys.exit(1)

    # Clamp in case the command returns something above the total or below zero
    if done_initial < 0:
        done_initial = 0
    if done_initial > total:
        done_initial = total

    time_initial = time.time()

    # Print one quick update before we start the loop
    print_progress(done_initial, total, 0, 0)

    # If we already reached (or exceeded) the total, exit immediately
    if done_initial >= total:
        print("\nDone!")
        sys.exit(0)

    # -- Step 2: Repeatedly poll the command to update progress --
    polling_interval = 1.0  # seconds between checks

    while True:
        time.sleep(polling_interval)

        # Fetch current progress
        try:
            output = subprocess.check_output(shell_command, shell=True)
            done = int(output.strip())
        except Exception as e:
            print(f"\nError running command: {shell_command}\n{e}")
            sys.exit(1)

        # Clamp done to never exceed total or go below 0
        if done < 0:
            done = 0
        if done > total:
            done = total

        # How much progress has been made since we started measuring?
        delta_done = done - done_initial
        delta_time = time.time() - time_initial

        # Print the progress bar
        print_progress(done, total, delta_done, delta_time)

        if done >= total:
            break

    print("\nDone!")


def print_progress(done, total, delta_done, delta_time):
    """
    Print a single-line progress bar with percentage and ETA (if possible).
    Overwrites the previous line via carriage return.
    """

    # Fraction complete
    fraction = done / total if total else 1.0

    # Build the bar
    bar_length = 50
    filled_length = int(bar_length * fraction)
    bar = "#" * filled_length + "-" * (bar_length - filled_length)

    # Compute ETA based only on new progress (delta_done)
    if delta_done > 0:
        time_per_item = delta_time / delta_done
        remaining = total - done
        eta_seconds = int(time_per_item * remaining)
        eta_string = format_eta(eta_seconds)
    else:
        # If no new items have completed since the script started, can't guess yet
        eta_string = "calculating..."

    progress_line = (
        f"\r[{bar}] {done}/{total} ({fraction*100:.1f}%) - ETA: {eta_string}"
    )
    print(progress_line, end='', flush=True)


def format_eta(seconds):
    """Convert number of seconds into a H:MM:SS or M:SS format string."""
    h = seconds // 3600
    m = (seconds % 3600) // 60
    s = seconds % 60

    if h > 0:
        return f"{h:d}:{m:02d}:{s:02d}"
    else:
        return f"{m:02d}:{s:02d}"


if __name__ == "__main__":
    main()

Which I can then run like this:

uv run progress.py 45 'ls *.mp4 | wc -l'

The 45 there is the expected number of downloads (found with jq length < videos.json). The ls *.mp4 | wc -l string is a command to run on each iteration to count how many items have been processed.

This command provides both a visible ASCII progress bar and an ETA prediction of when the program will finish, based on how many items have been processed and how quickly they appear to be running.
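
To make the ETA arithmetic concrete, here is the same calculation pulled out into a standalone function with some made-up numbers. The name `estimate_eta` is hypothetical. It mirrors the logic in `print_progress`, which only counts items completed since the script started.

```python
def estimate_eta(done_initial, done, total, elapsed_seconds):
    """Seconds remaining, or None when no new items have finished yet."""
    delta_done = done - done_initial
    if delta_done <= 0:
        return None
    seconds_per_item = elapsed_seconds / delta_done
    return int(seconds_per_item * (total - done))


# Hypothetical numbers: 10 files existed at start, 15 exist after 60 seconds,
# and 45 are expected in total. That is 12 seconds per file with 30 to go.
print(estimate_eta(done_initial=10, done=15, total=45, elapsed_seconds=60))  # 360
```

Ignoring the files that already existed when the script started keeps the estimate honest if you attach the progress bar to a job that is already partly done.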