- Jan 05, 2023
-
-
Dr Catherine Pitt authored
The incremental backups on some cluster head nodes are growing quite large, and most of the churn is the uncompressed MySQL dumpfile which changes with every backup and can be over 1GB. This commit compresses that data, which reduces the size of the file by 90% on at least one machine.
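The change amounts to something like this sketch; the dump command, options and output path here are illustrative rather than the exact prepare script:
    # Pipe the dump straight through gzip so the uncompressed ~1GB file never lands in the backup tree.
    mysqldump --all-databases --single-transaction | gzip > /var/backups/mysql/all-databases.sql.gz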
-
- Dec 19, 2022
-
-
Dr Adam Thorn authored
This can happen for a number of reasons:
- a child ZFS has a quota bigger than its parent
- we have over-provisioned, such that the sum of quotas is bigger than the disk
- an individual ZFS has a quota bigger than the disk
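A rough sketch of the kind of check this enables, using parseable zfs output (the pool name tank is illustrative):
    # Warn about any child ZFS whose quota exceeds its parent's quota, in raw bytes.
    zfs list -Hp -o name,quota -r tank | while read -r name quota; do
        parent=${name%/*}
        [ "$parent" = "$name" ] && continue                 # top-level dataset: no parent to compare against
        pquota=$(zfs get -Hp -o value quota "$parent")
        if [ "$pquota" -gt 0 ] && [ "$quota" -gt "$pquota" ]; then
            echo "WARNING: $name quota ($quota) exceeds parent quota ($pquota)"
        fi
    done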
-
Dr Adam Thorn authored
..and convert such things to human-friendly versions if required. This is to facilitate extra checks where we want to make numeric calculations involving quotas and other similar properties.
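For example (dataset name illustrative): raw bytes for the arithmetic, numfmt for the human-friendly rendering:
    quota_bytes=$(zfs get -Hp -o value quota tank/backup/somehost)   # raw bytes, e.g. 536870912000
    echo "quota is $(numfmt --to=iec "$quota_bytes")"                # human-friendly, e.g. 500G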
-
- Oct 31, 2022
-
-
Dr Adam Thorn authored
-
- Oct 26, 2022
-
-
Dr Adam Thorn authored
-
- May 25, 2022
-
-
Dr Adam Thorn authored
As of b4b89ed3 we need the prepare script to exit zero on success. We only use the stdout from this command, not the return code, for determining if the target is a xen VM.
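A sketch of the shape a prepare script now needs; the /proc/xen test is purely illustrative of the detection:
    # The caller keys off what we print, not our return code, so always exit 0 on success.
    if [ -d /proc/xen ]; then
        echo "xen"
    else
        echo "not-xen"
    fi
    exit 0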
-
- May 24, 2022
-
-
Dr Adam Thorn authored
This has been silently masking failures, which is a Bad Thing. The only errors I've spotted so far are ones hopefully fixed by ad91da90, with the effect that we've been backing up more things than we needed to (which is better than the opposite possibility, at least!)
-
- Apr 25, 2022
-
-
Dr Adam Thorn authored
Now that we use pygrub, these dirs are populated by quite a lot of files that we don't want to back up, but which are dynamically built by the kernel and related packages and so do not make it into the prepared "excludes" file.
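The fix is presumably along these lines; the paths and the excludes file location are illustrative:
    # depmod/update-initramfs output never appears in the dpkg .list files, so add
    # those paths to the excludes list explicitly.
    printf '%s\n' '/boot/initrd.img-*' '/lib/modules/*/modules.*' >> /etc/backup/excludes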
-
- Mar 10, 2022
-
-
Dr Adam Thorn authored
i.e. we can now simply run "delete from host where hostname='example.ch.private.cam.ac.uk';" without having to chase the foreign keys. I've made the equivalent change on our live backup servers with an ad hoc script.
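The underlying schema change is presumably of this shape; the database name, referencing table and constraint names here are invented for illustration:
    # Recreate a referencing table's foreign key with ON DELETE CASCADE so that
    # deleting a host row removes its dependent rows automatically.
    psql backups -c "ALTER TABLE backup_task DROP CONSTRAINT backup_task_hostname_fkey;"
    psql backups -c "ALTER TABLE backup_task ADD CONSTRAINT backup_task_hostname_fkey
        FOREIGN KEY (hostname) REFERENCES host(hostname) ON DELETE CASCADE;"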
-
- Mar 09, 2022
-
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
-
- Dec 20, 2021
-
-
Dr Adam Thorn authored
-
- Nov 17, 2021
-
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
https://tickets.ch.cam.ac.uk/rt/Ticket/Display.html?id=211460 e.g. RT 211460, 211465. spri-musuem-rt-2025 had been set up with an ad hoc script which was not properly tidying up after itself because it bailed early on a "set -e" error.
-
- Jul 14, 2021
-
-
Dr Adam Thorn authored
This will let us use zfs_target as the name of a subtest, which in turn means we would be able to separately log and graph multiple backup targets associated with a single host. This change does not affect the current parsing performed when we input data into postgres: it uses non-anchored regexps to identify SpaceUsed etc., so prepending extra text won't change anything.
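To illustrate the non-anchored matching (the line format here is only indicative):
    # The prefix added in front of the report line doesn't matter, because the match
    # for SpaceUsed is not anchored to the start of the line.
    echo "zfs_target tank/backup/somehost: SpaceUsed: 12345678" \
        | sed -n 's/.*SpaceUsed: \([0-9]*\).*/\1/p'            # -> 12345678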
-
- Jul 08, 2021
-
-
Dr Adam Thorn authored
We don't always need the role data, if the presumption is that we'll be doing a pg_restore in conjunction with an ansible role which creates all required roles. But having a copy of the role data will never hurt! It also gives us a straightforward way of restoring a database to a standalone postgres instance without having to provision a dedicated VM with the relevant ansible roles.
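Role data of this kind is typically captured with pg_dumpall's globals-only mode, e.g. (output path illustrative):
    # Cluster-wide objects (roles, tablespaces) only; per-database contents are dumped separately.
    pg_dumpall --globals-only > /var/backups/postgres/globals.sql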
-
Dr Adam Thorn authored
At present we use myriad one-off per host scripts to do a pg_dump, and they all do (or probably should do) the same thing. In combination with setting options in the host's backup config file, I think this single script covers all our routine pg backups.
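A minimal sketch of the shape such a script can take, assuming it runs as the postgres user; the dump directory and option handling are illustrative:
    DUMPDIR=${DUMPDIR:-/var/backups/postgres}
    mkdir -p "$DUMPDIR"
    # One custom-format dump per database, skipping the templates.
    for db in $(psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
        pg_dump --format=custom --file="$DUMPDIR/$db.pgdump" "$db"
    done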
-
Dr Adam Thorn authored
Previously we were just passing the hostname. Adding extra args should not impact any existing script, but will let us write better, more maintainable, deduplicated PRE scripts.
-
- Jun 18, 2021
-
-
Dr Adam Thorn authored
This is just a change to the packaging, not to the actual deployed contents of the package. This deb has quite a few conffiles which made unattended-upgrades flag the mistake when I tried to upgrade. We should now have the right list of conffiles: makedeb@08d12c3c
-
Dr Adam Thorn authored
I don't think there's a sensible default quota; the value for a workstation will be very different from that for a tiny VM, for example.
-
- Jun 15, 2021
-
-
Dr Catherine Pitt authored
prepare-nondebian does not work on RedHat machines running MySQL as the paths are different, so this provides a fixed version. prepare-nondebian has historically been used more widely than just on RedHat, hence the decision to provide a RedHat-specific version rather than just edit it.
-
- Jun 08, 2021
-
-
Dr Adam Thorn authored
This is needed on focal if a client is to be able to access snapshots over NFS. From the docs I don't see why we didn't also need this option on xenial, but empirically, we need it on focal. (e.g. RT-207229)
-
- May 12, 2021
-
-
Dr Catherine Pitt authored
The code that generates the command to unexport NFS filesystems could produce an invalid command. Leading spaces were not being stripped, and in cases where there is more than one backup target for a machine we need to unexport every target. Because we also had 'set -e' in operation at this point, the script would fail there and never clean up the moved ZFS. I don't mind if we fail to unexport; if that's subsequently a problem for removing the ZFS then the script will fail at that point. This change makes the script generate better exportfs -u commands and not exit if they fail.
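The cleanup now behaves roughly like this sketch, where list_targets_for_host, hostname and nfs_client stand in for however the real script obtains them:
    # Unexport every backup target for the host, trimming the leading whitespace the
    # list arrives with, and never let a failed unexport abort the rest of the cleanup.
    list_targets_for_host "$hostname" | sed 's/^[[:space:]]*//' | while read -r target; do
        exportfs -u "${nfs_client}:${target}" || true
    done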
-
- Apr 30, 2021
-
-
Dr Catherine Pitt authored
The code used to open a database connection for each thread and leave them open for as long as the scheduler ran. This worked reasonably well until we moved to PostgreSQL 13 on Focal, although the scheduler would fail if the database was restarted because there was no logic to reconnect after a connection dropped.

On Focal/PG13 the connection for the 'cron' thread steadily consumes memory until it has exhausted everything in the machine. This appears to be a Postgres change rather than a Perl DBI change: the problem can be reproduced by sitting in psql and running 'select * from backup_queue' repeatedly. Once or twice a minute an instance of this query will cause the connection to consume another MB of RAM which is not released until the database connection is closed. The cron thread runs that query every two seconds. My guess is it's something peculiar about the view that query selects from - the time interval thing is interesting. This needs more investigation.

But in the meantime I'd like to have backup servers that don't endlessly gobble RAM, so this change makes the threads connect to the database only when they need to, and closes the connection afterwards. This should also make things work better over database restarts but that's not been carefully tested.
-
- Jan 18, 2021
-
-
Dr Catherine Pitt authored
-
- Jan 06, 2021
-
-
Dr Adam Thorn authored
As of focal a bunch of top-level dirs are symlinks (e.g. /lib -> /usr/lib) but the deb packages still deploy files to the symlink rather than the real dir. Thus if we just take the contents of all the *.list files we end up not excluding lots of files that are in fact provided by debs.
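Roughly the transformation needed when building the excludes list, with the output path illustrative:
    # Keep each path from the dpkg .list files and also add its usrmerge-resolved form,
    # so files that really live under /usr are still excluded.
    cat /var/lib/dpkg/info/*.list \
        | sed -E 'p; s#^/(bin|sbin|lib|lib32|lib64|libx32)/#/usr/\1/#' \
        | sort -u > /etc/backup/excludes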
-
- Dec 11, 2020
-
-
Dr Catherine Pitt authored
This package depends on a postgres server being available, but annoyingly we can't use the 'postgresql' metapackage in the dependencies because on Ubuntu that depends on the specific distro-provided version, which usually isn't the one we want. So we have to add our supported Postgres versions one by one.
-
Dr Catherine Pitt authored
-
- Nov 09, 2020
-
-
Dr Catherine Pitt authored
Update prepare script for newer Postgres: the default prepare script now uses appropriate options if it detects Postgres 12 or higher. The error handling still needs work though.
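The detection is presumably along these lines; the exact option sets for each branch are not shown here:
    # Ask the server for its numeric version and choose dump options accordingly.
    ver=$(psql -At -c 'SHOW server_version_num')    # e.g. 120008 for 12.8
    if [ "$ver" -ge 120000 ]; then
        dump_opts="..."    # options appropriate for Postgres 12 and newer
    else
        dump_opts="..."    # options for older releases
    fi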
-
- Nov 06, 2020
-
-
Dr Catherine Pitt authored
-
- Oct 07, 2020
-
-
Dr Adam Thorn authored
We don't need these backed up, and having them in the list leads to an error due to not having show_compatibility_56 enabled
-
- Oct 06, 2020
-
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
Resolves #3
-
- Apr 07, 2020
-
- Dec 18, 2019
-
-
Dr Adam Thorn authored
-
- Jul 30, 2019
-
-
Dr Adam Thorn authored
We now have, for example, a cron job on cerebro-backup which calls these scripts, and /sbin is not on the $PATH there.
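One way to make the scripts safe under cron's minimal environment, as a sketch:
    # Don't rely on the caller's PATH; cron's default typically omits the sbin directories.
    export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin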
-
- Jul 23, 2019
-
-
Dr Catherine Pitt authored
For cerebro-backup, where we do backups by directory rather than of the whole server. We need to have one machine being backed up (cerebro-filestore) but a task per directory.
-
- Apr 23, 2019
-
-
Dr Catherine Pitt authored
The new-backup-rsnapshot script understands a 'postgres' argument, but this set up a postgres backup in an old style that we no longer use. This change updates it to do some of the work of setting up a new-style postgres backup and to tell the user what else they might need to edit to make it go; it varies quite a lot depending on the server.
-
- Jan 16, 2019
-
-
Dr Adam Thorn authored
Let's not worry if we haven't backed up a machine that has been offline for 3 months (instead of 6 months)
-