Allow for uptime check not to run for up to 120s, and explicitly monitor for uptime check failure
Allow up to 120s for the uptime check to run, and explicitly monitor for failing uptime checks so that we can alert early if a failure occurs.
Currently deployed to the identity project (staging) using https://gitlab.developers.cam.ac.uk/uis/devops/iam/deploy-identity/-/commit/b22148ed3cd88914f8392d877b51b75ae097657f which enables the changes made here.
To test, I undeployed the Function which proxies the card API monitoring request, which resulted in the below:
The top metric - number of successful uptime checks - falls to 0, and therefore below the threshold. The bottom metric - number of failing uptime checks - rises above the threshold of 0, and therefore the alert fires. Emails were sent only to wgd23 so that this testing does not worry the team.
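For reviewers, the two conditions described above can be sketched roughly as follows. This is an illustration only, not the deployed alert policy: the names `CheckResult` and `should_alert` are hypothetical, the success threshold of 1 is an assumption (the description only says the count falls to 0 and below threshold), and combining the two conditions with OR is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CheckResult:
    """One uptime check outcome (hypothetical type, for illustration only)."""
    timestamp: float  # seconds since some reference point
    passed: bool


def should_alert(
    results: List[CheckResult],
    now: float,
    window_s: float = 120.0,      # the 120s allowance from this change
    success_threshold: int = 1,   # assumed: fewer than 1 success in the window is a problem
    failure_threshold: int = 0,   # from the description: failures above 0 are a problem
) -> bool:
    """Return True if either condition is breached within the window."""
    # Only consider checks that completed within the last `window_s` seconds.
    recent = [r for r in results if now - window_s < r.timestamp <= now]
    successes = sum(1 for r in recent if r.passed)
    failures = sum(1 for r in recent if not r.passed)

    too_few_successes = successes < success_threshold
    explicit_failure = failures > failure_threshold
    # Assumption: either condition alone is enough to fire the alert.
    return too_few_successes or explicit_failure
```

The second condition is what allows early alerting in this sketch: an explicit failure trips the alert as soon as it is recorded, without waiting for the full 120s to pass with no successful check.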
Note: after merge (and rebasing branch v1 on master), we should redeploy deploy-identity from master on staging and production, to remove the test deployment from staging and deploy these changes to production.
Closes #1