
Cloud Run Service alerting

Summary

This issue proposes a minimal set of core alerts for our Cloud Run services, aligned with our use of the gcp-deploy-boilerplate. The alerts outlined below are proposals only at this stage. As part of this work, we will also assess the practical feasibility of implementing each one.

  • 5xx error rate
    • Metric: run.googleapis.com/request_count, used to calculate the ratio of 5xx responses to total responses.
    • Conditions:
      • Critical: >2% for 10 minutes
  • P95 request latency
    • Metric: run.googleapis.com/request_latencies
    • Conditions:
      • Critical: P95 > 1000ms for 5 minutes
  • CPU utilisation
    • Metric: run.googleapis.com/container/cpu/utilizations
    • Conditions:
      • Critical: P90 CPU > 95% for 15 minutes
  • Memory utilisation
    • Metric: run.googleapis.com/container/memory/utilizations
    • Conditions:
      • Critical: P90 Memory > 95% for 15 minutes
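
To make the proposed conditions concrete, the following is a minimal Python sketch of how the four thresholds and their "for N minutes" durations could be evaluated. It is illustrative only and does not call Cloud Monitoring: the names AlertRule, RULES, error_ratio and is_firing are hypothetical, the input is assumed to be one already-aggregated value per minute, and utilisation thresholds are assumed to be expressed as fractions (0.95 for 95%). The "5xx" key assumes request counts are grouped by response code class.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class AlertRule:
    """One proposed alert: fire when the value stays above threshold for the whole duration."""
    name: str
    metric: str
    threshold: float       # assumption: ratios as fractions, latency in milliseconds
    duration_minutes: int


# Thresholds and durations taken from the list above.
RULES = [
    AlertRule("5xx error rate", "run.googleapis.com/request_count", 0.02, 10),
    AlertRule("P95 request latency", "run.googleapis.com/request_latencies", 1000.0, 5),
    AlertRule("CPU utilisation", "run.googleapis.com/container/cpu/utilizations", 0.95, 15),
    AlertRule("Memory utilisation", "run.googleapis.com/container/memory/utilizations", 0.95, 15),
]


def error_ratio(counts_by_class: Mapping[str, int]) -> float:
    """Ratio of 5xx responses to all responses; 0.0 when there is no traffic."""
    total = sum(counts_by_class.values())
    if total == 0:
        return 0.0
    return counts_by_class.get("5xx", 0) / total


def is_firing(rule: AlertRule, per_minute_values: Sequence[float]) -> bool:
    """True only if every one of the last duration_minutes samples exceeds the threshold."""
    window = list(per_minute_values)[-rule.duration_minutes:]
    if len(window) < rule.duration_minutes:
        return False  # not enough data to cover the full duration
    return all(value > rule.threshold for value in window)
```

For example, ten consecutive one-minute samples with a 3% error ratio would trigger the 5xx rule, whereas a single minute dropping back to 1% within the window would not. Treating a no-traffic minute as a 0.0 ratio is a design choice of this sketch; whether missing data should instead be handled separately is one of the feasibility points to assess.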
Edited by Ryan Kowalewski