Tailing Google Cloud Run request logs and importing them into SQLite
The gcloud CLI tool has the alpha ability to tail log files - but it’s a bit of a pain to set up.
You have to install two extras for it. First, this:
    gcloud alpha logging tail

That installs the functionality, but as the documentation will tell you:
To use gcloud alpha logging tail, you need to have Python 3 and the grpcio Python package installed.
Assuming you have Python 3, the problem you have to solve is which Python the gcloud tool is using to run. After digging around in the source code using cat $(which gcloud), I spotted the following:
    CLOUDSDK_PYTHON=$(order_python python3 python2 python2.7 python)

So it looks like (on macOS at least) it prefers to use the python3 binary if it can find it.
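If you want to confirm which interpreter that resolves to on your machine - and, once grpcio is installed in the next step, that it can actually be imported from there - a quick check along these lines should work (an optional sanity check, not part of the documented setup):

    # Which python3 will gcloud's wrapper script find?
    which python3

    # After installing grpcio (below), confirm it imports - the package's import name is grpc
    python3 -c "import grpc; print(grpc.__version__)"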
So this works to install grpcio somewhere it can see it:
    python3 -m pip install grpcio

Having done that, you can start running commands. gcloud logging logs list shows a list of logs:
    ~ % gcloud logging logs list
    NAME
    projects/datasette-222320/logs/cloudaudit.googleapis.com%2Factivity
    projects/datasette-222320/logs/cloudaudit.googleapis.com%2Fdata_access
    projects/datasette-222320/logs/cloudaudit.googleapis.com%2Fsystem_event
    projects/datasette-222320/logs/cloudbuild
    projects/datasette-222320/logs/clouderrorreporting.googleapis.com%2Finsights
    projects/datasette-222320/logs/cloudtrace.googleapis.com%2FTraceLatencyShiftDetected
    projects/datasette-222320/logs/run.googleapis.com%2Frequests
    projects/datasette-222320/logs/run.googleapis.com%2Fstderr
    projects/datasette-222320/logs/run.googleapis.com%2Fstdout
    projects/datasette-222320/logs/run.googleapis.com%2Fvarlog%2Fsystem

Then you can use gcloud alpha logging tail projects/datasette-222320/logs/run.googleapis.com%2Frequests to start tailing that log. Only you also need a CLOUDSDK_PYTHON_SITEPACKAGES=1 environment variable so that gcloud knows to look for the grpcio dependency.
    CLOUDSDK_PYTHON_SITEPACKAGES=1 \
      gcloud alpha logging tail projects/datasette-222320/logs/run.googleapis.com%2Frequests

The default format is verbose YAML. A log entry looks like this:
    httpRequest:
      latency: 0.123684963s
      remoteIp: 66.249.69.240
      requestMethod: GET
      requestSize: '510'
      requestUrl: https://www.niche-museums.com/browse/museums.json?_facet_size=max&country=United+States&_facet=osm_city&_facet=updated&_facet=osm_suburb&_facet=osm_footway&osm_city=Santa+Cruz
      responseSize: '6403'
      serverIp: 142.250.125.121
      status: 200
      userAgent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    insertId: 611171fe000a38a469d59595
    labels:
      instanceId: 00bf4bf02dab164592dbbb9220b56c3ce64cb0f1c1f37812d1d61e851a931e9964ba539c2ede42886773c82662cc28aa858749d2697f537ff7a61e7b
      service: niche-museums
    logName: projects/datasette-222320/logs/run.googleapis.com%2Frequests
    receiveTimestamp: '2021-08-09T18:20:46.935658405Z'
    resource:
      labels:
        configuration_name: niche-museums
        location: us-central1
        project_id: datasette-222320
        revision_name: niche-museums-00039-sur
        service_name: niche-museums
      type: cloud_run_revision
    severity: INFO
    timestamp: '2021-08-09T18:20:46.669860Z'
    trace: projects/datasette-222320/traces/306a0d6e7e055ba66172003a74c926c2

I decided to import the logs into a SQLite database so I could use Datasette to analyze them (hooray for facets).
Adding --format json switches the output to JSON - but it’s a pretty-printed array of JSON objects, something like this:
[ { "httpRequest": { "latency": "0.112114537s", "remoteIp": "40.77.167.88", "requestMethod": "GET", "requestSize": "534", "requestUrl": "https://datasette.io/content/repos?forks=0&_facet=homepage&_facet=size&_facet=open_issues&open_issues=3&size=564&_sort=readme_html", "responseSize": "72757", "serverIp": "216.239.38.21", "status": 200, "userAgent": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)" }, "insertId": "6111722f000b5b4c4d4071e2", "labels": { "instanceId": "00bf4bf02d1d7fe4402c3aff8a34688d9a910e6ee6d2545ceebc1edefb99461481e6d9f9ae8de4e907e3d18b98ea9c7f57b2abb527c8857d9163ed193db766c349a1ee", "service": "datasette-io" }, "logName": "projects/datasette-222320/logs/run.googleapis.com%2Frequests", "receiveTimestamp": "2021-08-09T18:21:36.061693305Z", "resource": { "labels": { "configuration_name": "datasette-io", "location": "us-central1", "project_id": "datasette-222320", "revision_name": "datasette-io-00416-coy", "service_name": "datasette-io" }, "type": "cloud_run_revision" }, "severity": "INFO", "timestamp": "2021-08-09T18:21:35.744268Z", "trace": "projects/datasette-222320/traces/016d640caf845fbf8709486bc8dff9c7" }]I want to stream the logs into sqlite-utils using newline-delimited JSON since that can insert while the data is still being tailed.
I ended up using two new jq recipes:
    cat example.json | jq -cn --stream 'fromstream(1|truncate_stream(inputs))'

This turns an [{"array": "of objects"}, {"like": "this one"}] into a stream of newline-delimited objects. I found the recipe here - I don’t understand it.
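Here’s a tiny illustration of what that recipe does, using a made-up two-object array rather than the real logs:

    echo '[{"id": 1}, {"id": 2}]' | jq -cn --stream 'fromstream(1|truncate_stream(inputs))'
    # Outputs one compact JSON object per line:
    # {"id":1}
    # {"id":2}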
As you can see above, the objects are nested. I want them as flat objects so that sqlite-utils insert will create a separate column for each nested value. I used this recipe for that.
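That flattening recipe appears in the full pipeline below. Applied on its own to a small nested object (an invented example, not real log data), it behaves like this:

    echo '{"resource": {"labels": {"location": "us-central1"}}, "severity": "INFO"}' \
      | jq -c '[leaf_paths as $path | {"key": $path | join("_"), "value": getpath($path)}] | from_entries'
    # {"resource_labels_location":"us-central1","severity":"INFO"}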
The end result was this:
    CLOUDSDK_PYTHON_SITEPACKAGES=1 gcloud alpha logging tail \
      projects/datasette-222320/logs/run.googleapis.com%2Frequests \
      --format json \
      | jq -cn --stream 'fromstream(1|truncate_stream(inputs))' \
      | jq -c '[leaf_paths as $path | {"key": $path | join("_"), "value": getpath($path)}] | from_entries' \
      | sqlite-utils insert /tmp/logs.db logs - --nl --alter --batch-size 1

That last line inserts the data into the /tmp/logs.db database file. --nl means "expect newline-delimited JSON", --alter means "add new columns if they are missing", --batch-size 1 means "commit after every record" (so I can see them in Datasette while they are streaming in).
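To get a feel for what those sqlite-utils flags do without waiting for real traffic, you can try them against a couple of hand-written records (the /tmp/demo.db database and rows table here are throwaway names, made up for illustration):

    printf '{"id": 1}\n{"id": 2, "note": "second record has an extra column"}\n' \
      | sqlite-utils insert /tmp/demo.db rows - --nl --alter --batch-size 1
    # --nl reads one JSON object per line; --alter adds the [note] column
    # when it first appears; --batch-size 1 commits after each record
    sqlite-utils schema /tmp/demo.db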
UPDATE: sqlite-utils 3.15 added a --flatten option which you can use instead of that second jq recipe, so this should work instead:
    CLOUDSDK_PYTHON_SITEPACKAGES=1 gcloud alpha logging tail \
      projects/datasette-222320/logs/run.googleapis.com%2Frequests \
      --format json \
      | jq -cn --stream 'fromstream(1|truncate_stream(inputs))' \
      | sqlite-utils insert /tmp/logs.db logs - --nl --alter --batch-size 1 --flatten

The resulting schema looks like this (via sqlite-utils schema /tmp/logs.db):
    CREATE TABLE [logs] (
       [httpRequest_latency] TEXT,
       [httpRequest_remoteIp] TEXT,
       [httpRequest_requestMethod] TEXT,
       [httpRequest_requestSize] TEXT,
       [httpRequest_requestUrl] TEXT,
       [httpRequest_responseSize] TEXT,
       [httpRequest_serverIp] TEXT,
       [httpRequest_status] INTEGER,
       [httpRequest_userAgent] TEXT,
       [insertId] TEXT,
       [labels_instanceId] TEXT,
       [labels_service] TEXT,
       [logName] TEXT,
       [receiveTimestamp] TEXT,
       [resource_labels_configuration_name] TEXT,
       [resource_labels_location] TEXT,
       [resource_labels_project_id] TEXT,
       [resource_labels_revision_name] TEXT,
       [resource_labels_service_name] TEXT,
       [resource_type] TEXT,
       [severity] TEXT,
       [timestamp] TEXT,
       [trace] TEXT,
       [httpRequest_referer] TEXT
    );

Then I ran datasette /tmp/logs.db to start exploring the logs. Faceting by resource_labels_service_name was particularly useful.
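As an aside on that --flatten option: you can see what it does in isolation by feeding it a single nested record (another throwaway example, with an invented /tmp/flatten-demo.db path):

    echo '[{"httpRequest": {"status": 200}, "severity": "INFO"}]' \
      | sqlite-utils insert /tmp/flatten-demo.db example - --flatten
    sqlite-utils schema /tmp/flatten-demo.db
    # Expect something like: CREATE TABLE [example] ([httpRequest_status] INTEGER, [severity] TEXT);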

The httpRequest_latency column contains text data that looks like 0.012572683s - thankfully if you cast it to a float the trailing s will be ignored. Here’s an example query showing the services with the highest average latency:
    select
      resource_labels_service_name,
      avg(cast(httpRequest_latency as float)) as avg_latency,
      count(*)
    from logs
    group by resource_labels_service_name
    order by avg_latency desc

Using the Logs explorer
Alternatively, you can use the Google Cloud logs explorer! It has pretty decent faceted search built in.
Here’s a query showing results from that log file:
resource.type="cloud_run_revision"log_name="projects/datasette-222320/logs/run.googleapis.com%2Frequests"Run that at https://console.cloud.google.com/logs/query - or here’s a link I can use to execute that directly (for the last 7 days): https://console.cloud.google.com/logs/query;query=resource.type%3D%22cloud_run_revision%22%0Alog_name%3D%22projects%2Fdatasette-222320%2Flogs%2Frun.googleapis.com%252Frequests%22;timeRange=P7D;?project=datasette-222320