Using curl to run GraphQL queries from the command line
I wanted to run a query against the GitHub GraphQL API using curl
on the command line, while keeping the query itself as readable as possible. Here’s the recipe I came up with (tested in both bash
and zsh
), with TOKEN replaced by my GitHub API personal access token:
curl -s https://api.github.com/graphql -X POST \-H "Authorization: Bearer TOKEN" \-H "Content-Type: application/json" \-d "$(jq -c -n --arg query '{ search(type: REPOSITORY, query: "user:simonw topic:git-scraping", first: 100) { repositoryCount nodes { __typename ... on Repository { nameWithOwner description defaultBranchRef { name target { ... on Commit { committedDate url message } } } } } }}' '{"query":$query}')"
As you can see, the GraphQL query itself is embedded in plain text inside a complex set of escaping tricks.
Building a JSON document with jq
I needed to encode the query as part of a JSON document that looks like this:
{"query":"\n{\n search(type: REPOSITORY, query: \"user:simonw topic:git-scraping\", first: 100) {\n repositoryCount\n nodes {\n __typename\n ... on Repository {\n nameWithOwner\n description\n defaultBranchRef {\n name\n target {\n ... on Commit {\n committedDate\n url\n message\n }\n }\n }\n }\n }\n }\n}"}
The jq
recipe handles the construction of that document for me:
jq -c -n --arg query '{ search(type: REPOSITORY, query: "user:simonw topic:git-scraping", first: 100) { repositoryCount nodes { __typename ... on Repository { nameWithOwner description defaultBranchRef { name target { ... on Commit { committedDate url message } } } } } }}' '{"query":$query}'
jq -c
means “compact syntax” - so the JSON itself is produced as a single line.
The -n
option stands for --null-input
- which is described as:
Don´t read any input at all! Instead, the filter is run once using null as the input. This is useful when using
jq
as a simple calculator or to construct JSON data from scratch.
Then --arg query '...'
sets a variable within jq
to the string representing my GraphQL query.
Finally I evaluate the jq
expression '{"query":$query}'
which constructs the final document with my GraphQL query as the value for the "query"
key.
Passing that to curl with ”$()”
Having constructed the JSON document, I needed to pass it to the curl -d
option to submit it to the server.
The recipe for doing that is:
-d "$(jq -c -n --arg query ...)"
I tried doing this with -d $(jq ...)
first, and it didn’t work - because whitespace inside the substition was treated as separate tokens passed to curl
.
Adding the wrapping double quotes caused the substition result to be treated as a single value.
I was worried that double quotes within the string itself would break out of the pattern, but this Stackoverflow answer reassured me otherwise:
Once one is inside
$(...)
, quoting starts all over from scratch.In other words,
"..."
and$(...)
can nest within each other. Command substitution,$(...)
, can contain one or more complete double-quoted strings.
The resulting (but truncated) JSON from the GraphQL query looks like this:
{ "data": { "search": { "repositoryCount": 22, "nodes": [ { "__typename": "Repository", "nameWithOwner": "simonw/csv-diff", "description": "Python CLI tool and library for diffing CSV and JSON files", "defaultBranchRef": { "name": "main", "target": { "committedDate": "2021-02-23T02:53:11Z", "url": "https://github.com/simonw/csv-diff/commit/33e0a5918283c02a339a1fb507fc7a9cda89a198", "message": "Handle missing JSON keys, refs #13" } } },
Combining it with sqlite-utils insert
My end goal was to create a SQLite database with a record for each of my GitHub repositories that were tagged git-scraping
that included the date of their most recent commit. Here’s how I did that:
curl https://api.github.com/graphql -X POST \-H "Authorization: Bearer ..." \-H "Content-Type: application/json" \-d "$(jq -c -n --arg query '{ search(type: REPOSITORY, query: "user:simonw topic:git-scraping", first: 100) { repositoryCount nodes { __typename ... on Repository { nameWithOwner description defaultBranchRef { name target { ... on Commit { committedDate url message } } } } } }}' '{"query":$query}')" \ | jq .data.search.nodes | sqlite-utils insert /tmp/github.db repos - --flatten
The line doing the work at the end is:
| jq .data.search.nodes | sqlite-utils insert /tmp/github.db repos - --flatten
This uses jq
to pull out the {"data": {"search": {"nodes": [...]
array from the returned JSON, then pipes that into sqlite-utils insert.
This line does the rest:
sqlite-utils insert /tmp/github.db repos - --flatten
That reads from standard input (-
) and creates a repos
table in the new github.db
SQLite file.
The --flatten
option at the end ensures that nested fields such as {"defaultBranchRef": {"target": {"committedDate": ...
are flattened to columns with names like defaultBranchRef_target_committedDate
.
The final table schema looks like this:
% sqlite-utils schema /tmp/github.dbCREATE TABLE [repos] ( [__typename] TEXT, [nameWithOwner] TEXT, [description] TEXT, [defaultBranchRef_name] TEXT, [defaultBranchRef_target_committedDate] TEXT, [defaultBranchRef_target_url] TEXT, [defaultBranchRef_target_message] TEXT);