
Commit 37f05bf2 authored by Dr Rich Wareham

make use of our common monitoring module

Rather than ship our own monitoring module, make use of the
gcp-site-monitoring module. This effectively makes this module require
Terraform 0.13, so a major version bump is required.

Closes #13
parent 1db58428
Pipeline #58967 passed with stage
in 43 seconds
......@@ -18,6 +18,12 @@ editor access to the target project. You can use the ``gcloud`` command line
tool to set your personal credentials as application default credentials. See
the ``gcloud auth application-default`` command output for more information.
## Versioning
The `master` branch contains the tip of development and corresponds to the `v2`
branch. The `v1` branch will maintain source compatibility with the initial
release.
## Custom domain mapping
Setting the `dns_name` will create a domain mapping for the webapp. Before
......@@ -29,8 +35,16 @@ can be found in the DevOps division guidebook.
## Monitoring and Alerting
If the variable [alerting_email_address](variables.tf) is set, the module adds
basic uptime alerting via email for failing http polling. See [variables.tf](variables.tf)
for how to configure alerting and monitoring.
basic uptime *alerting* via email for failing http polling.
If the variable [disable_monitoring](variables.tf) is true, the module will
disable *monitoring*. This is different from disabling alerting: if no
alerting email addresses are provided, the uptime checks will still be
configured but no alerts will be sent if they fail. Disabling
monitoring also disables alerting, since without any monitoring there is nothing
to alert on.
See [variables.tf](variables.tf) for how to configure alerting and monitoring.
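The difference between the two settings can be sketched as follows. This is an illustration only: the module `source` path, the module names and the project ID are placeholders, and any other required inputs or provider configuration are omitted.

```tf
# Uptime checks are created, but no email alerts are sent because
# alerting_email_address is left unset.
module "webapp_checks_only" {
  source  = "./path/to/this/module" # placeholder path, not from this repository
  project = "my-project"            # placeholder project ID
}

# No uptime checks are created at all, so there is nothing to alert on.
module "webapp_unmonitored" {
  source  = "./path/to/this/module" # placeholder path, not from this repository
  project = "my-project"            # placeholder project ID

  disable_monitoring = true
}
```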
Note that the project containing resources to be monitored must be in a
Stackdriver monitoring workspace and this must be configured manually. At the
......
......@@ -6,4 +6,11 @@ locals {
# Should a DNS domain mapping be created?
domain_mapping_present = var.dns_name != ""
# Hosts to monitor. We use the automatic host from Cloud Run and any custom
# domain mapped host.
monitor_hosts = var.disable_monitoring ? [] : concat(
[trimsuffix(trimprefix(google_cloud_run_service.webapp.status[0].url, "https://"), "/")],
var.dns_name != "" ? [var.dns_name] : []
)
}
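As a rough illustration of how the `monitor_hosts` expression above turns a Cloud Run URL into a bare host name, the following standalone sketch applies the same functions to made-up values. The `example_*` names, the URL and the domain are hypothetical and not part of this module.

```tf
locals {
  # Hypothetical stand-in for google_cloud_run_service.webapp.status[0].url
  example_url = "https://webapp-abc123-ew.a.run.app"

  # Hypothetical stand-in for var.dns_name
  example_dns_name = "webapp.example.com"

  # Strip the scheme and any trailing slash, then append the custom domain
  # only if one is set.
  example_hosts = concat(
    [trimsuffix(trimprefix(local.example_url, "https://"), "/")],
    local.example_dns_name != "" ? [local.example_dns_name] : []
  )
}

output "example_hosts" {
  # Evaluates to ["webapp-abc123-ew.a.run.app", "webapp.example.com"]
  value = local.example_hosts
}
```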
......@@ -179,14 +179,28 @@ resource "google_cloud_run_domain_mapping" "webapp" {
}
module "uptime_monitoring" {
source = "./modules/monitoring"
project = var.project
email_address = var.alerting_email_address
uptime_timeout = var.alerting_uptime_timeout
uptime_period = var.alerting_uptime_period
monitored_domain = var.dns_name
polling_path = var.monitoring_path
enabled = var.alerting_enabled
for_each = toset(local.monitor_hosts)
source = "git::https://gitlab.developers.cam.ac.uk/uis/devops/infra/terraform/gcp-site-monitoring.git?ref=initial-implementation"
host = each.value
project = var.project
alert_email_addresses = var.alerting_email_address != "" ? [var.alerting_email_address] : []
uptime_check = {
# Accept either e.g. "60s" or 60 for timeout and periods for compatibility
# with previous releases.
timeout = tonumber(trimsuffix(var.alerting_uptime_timeout, "s"))
period = tonumber(trimsuffix(var.alerting_uptime_period, "s"))
path = var.monitoring_path
alert_enabled = var.alerting_enabled
}
tls_check = {
alert_enabled = var.alerting_enabled
}
providers = {
google = google.stackdriver
......
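The `tonumber(trimsuffix(..., "s"))` pattern used for `timeout` and `period` in the `uptime_check` block above can be tried in isolation. The sketch below uses a hypothetical variable, `example_timeout`, deliberately declared without a type constraint so that both `"60s"` and `60` are accepted. Terraform converts a number to a string before `trimsuffix` is applied.

```tf
variable "example_timeout" {
  # Hypothetical variable for illustration; accepts "60s" or 60.
  default     = "60s"
  description = "Timeout given either as a string with an 's' suffix or as a number."
}

locals {
  # "60s" -> "60" -> 60, and 60 -> "60" -> 60
  example_timeout_seconds = tonumber(trimsuffix(var.example_timeout, "s"))
}

output "example_timeout_seconds" {
  value = local.example_timeout_seconds
}
```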
# Basic email uptime alerting
This provides basic uptime alerting via email for failing http polling. See
[variables.tf](variables.tf) for how to configure this module.
Note that the project containing resources to be monitored must be in a
Stackdriver monitoring workspace and this must be configured manually. At the
time of writing there is no terraform support for this. This module will error
when applying if this is not so.
Stackdriver distinguishes between workspaces and projects within those
workspaces. Each workspace must have a host project and that project *must* be
the default project of the `google` provider used by this module. The project
which contains the resources being monitored should be specified via the
`project` variable.
If the workspace host project differs from the project which contains the
resources to be monitored, you can use a provider alias:
```tf
provider "google" {
project = "my-project"
}
provider "google" {
project = "stackdriver-host-project"
alias = "stackdriver"
}
module "uptime_monitoring" {
project = "my-project"
# ... other parameters ...
providers = {
google = google.stackdriver
}
}
```
locals {
# This is a hack to allow disabling everything. In tf 0.13 (in beta at the
# time of writing) count can be applied to the module inclusion phase so this
# won't be needed.
count = var.email_address == "" ? 0 : 1
}
resource "google_monitoring_uptime_check_config" "https" {
count = local.count
display_name = "https-uptime-check"
timeout = var.uptime_timeout
period = var.uptime_period
http_check {
path = var.polling_path
port = "443"
use_ssl = true
validate_ssl = true
}
monitored_resource {
type = "uptime_url"
labels = {
project_id = var.project
host = var.monitored_domain
}
}
# workaround - see https://github.com/terraform-providers/terraform-provider-google/issues/3133
lifecycle {
create_before_destroy = true
}
}
resource "google_monitoring_notification_channel" "notification_email" {
count = local.count
display_name = "Notifications Email"
type = "email"
labels = {
email_address = var.email_address
}
}
resource "google_monitoring_alert_policy" "uptime_alert" {
enabled = var.enabled
count = local.count
display_name = "HTTP uptime alert"
notification_channels = [google_monitoring_notification_channel.notification_email[count.index].id]
combiner = "OR"
conditions {
display_name = "http check failing for ${var.monitored_domain}${var.polling_path}"
condition_threshold {
filter = <<-EOT
metric.type="monitoring.googleapis.com/uptime_check/check_passed" AND
metric.label.check_id="${google_monitoring_uptime_check_config.https[count.index].uptime_check_id}" AND
resource.type="uptime_url"
EOT
duration = "60s"
comparison = "COMPARISON_GT"
threshold_value = "1"
trigger { count = "1" }
# I don't fully understand this stuff, but leaving this empty doesn't
# work, although it used to (either the API or the terraform provider
# has changed). This config was arrived at by following
# https://cloud.google.com/monitoring/uptime-checks via the dashboard,
# and then examining the differences that terraform wants to apply. It
# seems to work OK.
aggregations {
alignment_period = "120s"
group_by_fields = ["resource.*"]
cross_series_reducer = "REDUCE_COUNT_FALSE"
per_series_aligner = "ALIGN_NEXT_OLDER"
}
}
}
}
variable "email_address" {
default = ""
type = string
description = "Email address for alerts"
}
variable "monitored_domain" {
type = string
description = "domain component of url to be monitored"
}
variable "polling_path" {
type = string
default = "/"
description = "path component of url to be monitored"
}
variable "project" {
type = string
description = "Project being *monitored*. Resources are created in provider default project."
}
variable "uptime_timeout" {
type = string
default = "30s"
description = "timeout for http polling"
}
variable "uptime_period" {
type = string
default = "300s"
description = "Frequency of uptime checks"
}
variable "enabled" {
type = bool
default = true
description = "Whether the alerting policy is enabled"
}
......@@ -163,3 +163,15 @@ variable "template_annotations" {
template.
EOL
}
variable "disable_monitoring" {
default = false
description = <<-EOL
Optional. If true, do not create uptime checks. This is useful if, for
example, the service is configured to require authenticated invocations.
Note that this is different from not specifying an alerting email address.
If no alerting email address is specified the uptime checks are still
created, they just don't alert if they fail.
EOL
}