Paginating through the GitHub GraphQL API with Python

(See also Building a self-updating profile README for GitHub on my blog)

For my auto-updating personal README I needed to fetch the latest release for every repository I have on GitHub. Since I have 316 public repos I wanted the most efficient way possible to do this. I decided to use the GitHub GraphQL API.

Their API allows you to fetch up to 100 repositories at once, and each one can return up to 100 releases. Since I only wanted the most recent release my query ended up looking like this:

query {
  viewer {
    repositories(first: 100, privacy: PUBLIC) {
      pageInfo {
        hasNextPage
        endCursor
      }
      nodes {
        name
        releases(last:1) {
          totalCount
          nodes {
            name
            publishedAt
            url
          }
        }
      }
    }
  }
}

This gives me back my first 100 repos, and for each one returns the most recent release (if a release exists).
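A GraphQL response mirrors the shape of the query, so reading the results means walking the same nesting. Here is a minimal sketch of that, using a hand-written sample dictionary: the repository names, release values, cursor and URL in it are made up for illustration, and latest_releases is a hypothetical helper, not part of my script.

```python
# Hypothetical sample shaped like the query above; every value here is made up.
sample_response = {
    "data": {
        "viewer": {
            "repositories": {
                "pageInfo": {"hasNextPage": True, "endCursor": "CURSOR-1"},
                "nodes": [
                    {
                        "name": "example-repo",
                        "releases": {
                            "totalCount": 3,
                            "nodes": [
                                {
                                    "name": "1.2",
                                    "publishedAt": "2020-01-01T00:00:00Z",
                                    "url": "https://example.com/releases/1.2",
                                }
                            ],
                        },
                    },
                    {
                        # A repo with no releases comes back with an empty list
                        "name": "no-releases-yet",
                        "releases": {"totalCount": 0, "nodes": []},
                    },
                ],
            }
        }
    }
}


def latest_releases(response):
    """Return (repo name, release name) pairs for repos that have a release."""
    pairs = []
    for repo in response["data"]["viewer"]["repositories"]["nodes"]:
        release_nodes = repo["releases"]["nodes"]
        if release_nodes:  # skip repos without any release
            pairs.append((repo["name"], release_nodes[0]["name"]))
    return pairs


print(latest_releases(sample_response))
```

Because the query asks for releases(last:1), the inner nodes list holds at most one item, which is why indexing [0] is enough once you know it is not empty.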

Just one problem: I needed to paginate through all 316. The way you do this with the GitHub GraphQL API is using the after: argument and the endCursor returned from pageInfo. You can send after:null to get the first page, then after:TOKEN where TOKEN is the endCursor from the previous results.
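The cursor loop itself can be sketched without touching the network. In the example below fake_fetch_page is a hypothetical stand-in for one API call, and it uses a plain integer offset as its cursor. Real GitHub cursors are opaque strings that you should only ever pass back unchanged.

```python
def fake_fetch_page(after, page_size=100, total=316):
    """Stand-in for one API call: returns (names, has_next_page, end_cursor).

    Assumption for illustration only: the cursor is an integer offset
    stored as a string. Real cursors are opaque.
    """
    start = 0 if after is None else int(after)
    end = min(start + page_size, total)
    names = ["repo-{}".format(i) for i in range(start, end)]
    return names, end < total, str(end)


def fetch_all(fetch_page):
    """Keep requesting pages until hasNextPage comes back false."""
    names = []
    requests_made = 0
    after = None  # None plays the role of after:null for the first page
    has_next_page = True
    while has_next_page:
        page, has_next_page, after = fetch_page(after)
        names.extend(page)
        requests_made += 1
    return names, requests_made


all_names, requests_made = fetch_all(fake_fetch_page)
print(len(all_names), requests_made)
```

With 316 items and pages of 100, the loop makes four requests: three full pages and a final page of 16.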

My Python code ended up looking like this (using python-graphql-client):

import json

from python_graphql_client import GraphqlClient

client = GraphqlClient(endpoint="https://api.github.com/graphql")


def make_query(after_cursor=None):
    # AFTER is replaced with the quoted cursor, or null for the first page
    return """
query {
  viewer {
    repositories(first: 100, privacy: PUBLIC, after:AFTER) {
      pageInfo {
        hasNextPage
        endCursor
      }
      nodes {
        name
        releases(last:1) {
          totalCount
          nodes {
            name
            publishedAt
            url
          }
        }
      }
    }
  }
}
""".replace(
        "AFTER", '"{}"'.format(after_cursor) if after_cursor else "null"
    )


def fetch_releases(oauth_token):
    repos = []
    releases = []
    repo_names = set()
    has_next_page = True
    after_cursor = None

    # Keep fetching pages until the API reports there are no more
    while has_next_page:
        data = client.execute(
            query=make_query(after_cursor),
            headers={"Authorization": "Bearer {}".format(oauth_token)},
        )
        # Debug output: dump the raw response for each page
        print()
        print(json.dumps(data, indent=4))
        print()
        for repo in data["data"]["viewer"]["repositories"]["nodes"]:
            # Skip repos without releases and any repo already seen
            if repo["releases"]["totalCount"] and repo["name"] not in repo_names:
                repos.append(repo)
                repo_names.add(repo["name"])
                releases.append(
                    {
                        "repo": repo["name"],
                        # Drop the repo name from the release title if present
                        "release": repo["releases"]["nodes"][0]["name"]
                        .replace(repo["name"], "")
                        .strip(),
                        # Keep just the date portion of the timestamp
                        "published_at": repo["releases"]["nodes"][0][
                            "publishedAt"
                        ].split("T")[0],
                        "url": repo["releases"]["nodes"][0]["url"],
                    }
                )
        # Read the pagination state for the next iteration
        has_next_page = data["data"]["viewer"]["repositories"]["pageInfo"][
            "hasNextPage"
        ]
        after_cursor = data["data"]["viewer"]["repositories"]["pageInfo"]["endCursor"]
    return releases
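An alternative to the string replacement above, which is not what my script does, is to declare the cursor as a GraphQL variable and send it alongside the query. The sketch below only builds the JSON request body. QUERY and build_payload are names I made up for this example, and the query is trimmed to the repository name to keep it short.

```python
import json

# Same pagination fields as before, but the cursor is a declared variable
QUERY = """
query ($after: String) {
  viewer {
    repositories(first: 100, privacy: PUBLIC, after: $after) {
      pageInfo {
        hasNextPage
        endCursor
      }
      nodes {
        name
      }
    }
  }
}
"""


def build_payload(after_cursor=None):
    """Build the request body; None serializes to JSON null for the first page."""
    return {"query": QUERY, "variables": {"after": after_cursor}}


first_page_body = json.dumps(build_payload())
next_page_body = json.dumps(build_payload("CURSOR-FROM-PREVIOUS-PAGE"))
print(first_page_body[-30:])
```

As far as I know, python-graphql-client's execute() also accepts a variables argument, so the same dictionary could be passed as variables={"after": after_cursor} instead of building the body by hand. The advantage is that the cursor never needs quoting or escaping inside the query text.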

Full code here: https://github.com/simonw/simonw/blob/50d4188f9f067b68b2203540f1983750d51800db/build_readme.py