Cloud Run Service alerting
Summary
This issue proposes a minimal set of core alerts for our Cloud Run services, aligned with our use of the gcp-deploy-boilerplate. The alerts outlined below are the intended set in principle; as part of this work, we will also assess how feasible each one is to implement in practice.
- 5xx error rate
  - Metric: Use run.googleapis.com/request_count to calculate the ratio of 5xx responses.
  - Conditions:
    - Critical: >2% for 10 mins
- P95 request latency
  - Metric: run.googleapis.com/request_latencies
  - Conditions:
    - Critical: p95 > 1000ms for 5 mins
- CPU utilisation
  - Metric: run.googleapis.com/container/cpu/utilizations
  - Conditions:
    - Critical: P90 CPU > 95% for 15 minutes
- Memory utilisation
  - Metric: run.googleapis.com/container/memory/utilizations
  - Conditions:
    - Critical: P90 Memory > 95% for 15 minutes
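
To make the "threshold sustained for N minutes" conditions concrete, here is a rough Python sketch of how each critical condition could be reasoned about locally. It is only an illustration, not how Cloud Monitoring evaluates policies. The names `AlertSpec`, `ALERTS`, `error_ratio` and `is_critical` are hypothetical, and the sketch assumes one sample per minute, with the error rate and utilisation thresholds expressed as fractions (0.95 for 95%) and latency in milliseconds.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AlertSpec:
    """One proposed alert: metric, critical threshold and sustain window."""
    name: str
    metric: str
    threshold: float     # fraction for ratios/utilisation, ms for latency
    window_minutes: int  # how long the breach must persist


# The four proposed critical conditions from the list above.
# Assumption: percentages are written as fractions (0.02 == 2%).
ALERTS = [
    AlertSpec("5xx error rate", "run.googleapis.com/request_count", 0.02, 10),
    AlertSpec("P95 request latency", "run.googleapis.com/request_latencies", 1000.0, 5),
    AlertSpec("CPU utilisation", "run.googleapis.com/container/cpu/utilizations", 0.95, 15),
    AlertSpec("Memory utilisation", "run.googleapis.com/container/memory/utilizations", 0.95, 15),
]


def error_ratio(count_5xx: int, count_total: int) -> float:
    """Share of requests that returned 5xx; 0.0 when there was no traffic."""
    return count_5xx / count_total if count_total else 0.0


def is_critical(spec: AlertSpec, per_minute_values: Sequence[float]) -> bool:
    """True if the last `window_minutes` samples all exceed the threshold.

    Assumes one sample per minute, ordered oldest first.
    """
    window = per_minute_values[-spec.window_minutes:]
    if len(window) < spec.window_minutes:
        return False  # not enough data to cover the full window
    return all(value > spec.threshold for value in window)


# Example: 3 errors per 100 requests, every minute, for 10 minutes.
ratios = [error_ratio(3, 100) for _ in range(10)]
print(is_critical(ALERTS[0], ratios))  # True: 3% > 2% for the full window
```

Requiring every sample in the window to breach the threshold mirrors the "for N mins" wording: a single minute back under the threshold resets the condition, which keeps short spikes from paging anyone.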
Edited by Ryan Kowalewski