Document agreed approach for backup testing
@av603, @mk2155, @rc118, @rjg21, @rk725 and I met to discuss a common approach to backup testing (https://gitlab.developers.cam.ac.uk/uis/devops/tools/meta/-/issues/2).
There were many product-specific considerations but we agreed on a minimally viable answer to the NFR:
- Backups are tested weekly by manually restoring the Google Cloud SQL instance in staging and observing any regressions in the staging instance.
- Disaster recovery from a clean deployment is scheduled quarterly. The process for DR will be product-specific, so the divisional recommendation is just that it be scheduled.
- Performance of these tasks is to be tracked with due-date issues in GitLab.
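As a rough illustration of the cadence above, the following Python sketch computes weekly and quarterly due dates and builds a payload in the shape the GitLab "create issue" REST endpoint accepts (`title`, `due_date`, `labels`). The function names, the issue titles and the `backup-testing` label are hypothetical and were not agreed in the meeting; the sketch makes no network calls.

```python
import calendar
from datetime import date, timedelta


def weekly_due_dates(start: date, count: int) -> list[date]:
    """Return `count` due dates spaced one week apart, beginning at `start`."""
    return [start + timedelta(weeks=i) for i in range(count)]


def add_months(d: date, months: int) -> date:
    """Shift `d` by whole months, clamping the day to the target month's length."""
    index = d.month - 1 + months
    year = d.year + index // 12
    month = index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def quarterly_due_dates(start: date, count: int) -> list[date]:
    """Return `count` due dates spaced three months apart, beginning at `start`."""
    return [add_months(start, 3 * i) for i in range(count)]


def issue_payload(title: str, due: date) -> dict[str, str]:
    """Build a body for POST /projects/:id/issues on the GitLab REST API.

    The label below is a hypothetical example, not an agreed convention.
    """
    return {
        "title": title,
        "due_date": due.isoformat(),  # GitLab expects YYYY-MM-DD
        "labels": "backup-testing",
    }


if __name__ == "__main__":
    for due in weekly_due_dates(date(2024, 1, 1), 3):
        print(issue_payload("Weekly backup restore test (staging)", due))
    for due in quarterly_due_dates(date(2024, 1, 31), 3):
        print(issue_payload("Quarterly DR exercise", due))
```

Whether such issues are created by hand or by a script is left to each product; the recommendation itself only asks that the tasks are tracked with due dates.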
Individual products may experiment with automating restores from Google managed backups via CI/Cloud Scheduler jobs but we'll wait for some operational experience before recommending this.
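For products that do want to experiment, a scheduled job would most likely end up calling the Cloud SQL Admin API `instances.restoreBackup` method. The sketch below only builds the request URL and body for that call and does not send it; authentication, choosing which backup run to restore, and waiting for the resulting operation are all left out. The function name and the example project, instance and backup run values are hypothetical.

```python
SQLADMIN_BASE = "https://sqladmin.googleapis.com/v1"


def build_restore_request(
    target_project: str,
    target_instance: str,
    backup_run_id: str,
    source_project: str,
    source_instance: str,
) -> tuple[str, dict]:
    """Return (url, json_body) for a Cloud SQL Admin API restoreBackup call.

    The URL names the instance being overwritten (the restore target), while
    restoreBackupContext names the instance and project the backup run was
    taken from.
    """
    url = (
        f"{SQLADMIN_BASE}/projects/{target_project}"
        f"/instances/{target_instance}/restoreBackup"
    )
    body = {
        "restoreBackupContext": {
            "backupRunId": backup_run_id,
            "project": source_project,
            "instanceId": source_instance,
        }
    }
    return url, body


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    url, body = build_restore_request(
        target_project="example-staging",
        target_instance="sql-staging",
        backup_run_id="1234567890",
        source_project="example-staging",
        source_instance="sql-staging",
    )
    print("POST", url)
    print(body)
```

A restore overwrites the target instance, so any job built on this would need guarding against being pointed at production by mistake.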
Moving forward, we want an opt-in service that automates restoring SQL dumps into Cloud SQL instance databases. (Scheduled restores from Google backups can be managed entirely within products.) This will form part of the wider work on refreshing SQL backups, tracked in https://gitlab.developers.cam.ac.uk/uis/devops/infra/sql-backup/-/issues/13.
In the first instance, document the minimally viable answer outlined above as a recommended approach.