- Jan 05, 2023
-
-
Dr Catherine Pitt authored
The incremental backups on some cluster head nodes are growing quite large, and most of the churn is the uncompressed MySQL dumpfile which changes with every backup and can be over 1GB. This commit compresses that data, which reduces the size of the file by 90% on at least one machine.
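The change amounts to something like this sketch; the dump command, options and output path here are illustrative rather than the exact prepare script:
    # Pipe the dump straight through gzip so the uncompressed ~1GB file never lands in the backup tree.
    mysqldump --all-databases --single-transaction | gzip > /var/backups/mysql/all-databases.sql.gz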
-
- Dec 19, 2022
-
-
Dr Adam Thorn authored
This can happen for a number of reasons:
- a child ZFS has a quota bigger than its parent
- we have over-provisioned, such that the sum of quotas is bigger than the disk
- an individual ZFS has a quota bigger than the disk
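A rough sketch of the kind of check this enables, using parseable zfs output (the pool name tank is illustrative):
    # Warn about any child ZFS whose quota exceeds its parent's quota, in raw bytes.
    zfs list -Hp -o name,quota -r tank | while read -r name quota; do
        parent=${name%/*}
        [ "$parent" = "$name" ] && continue                 # top-level dataset: no parent to compare against
        pquota=$(zfs get -Hp -o value quota "$parent")
        if [ "$pquota" -gt 0 ] && [ "$quota" -gt "$pquota" ]; then
            echo "WARNING: $name quota ($quota) exceeds parent quota ($pquota)"
        fi
    done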
-
Dr Adam Thorn authored
..and convert such things to human-friendly versions if required. This is to facilitate extra checks where we want to make numeric calculations involving quotas and other similar properties.
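For example (dataset name illustrative): raw bytes for the arithmetic, numfmt for the human-friendly rendering:
    quota_bytes=$(zfs get -Hp -o value quota tank/backup/somehost)   # raw bytes, e.g. 536870912000
    echo "quota is $(numfmt --to=iec "$quota_bytes")"                # human-friendly, e.g. 500G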
-
- Oct 31, 2022
-
-
Dr Adam Thorn authored
-
- Oct 26, 2022
-
-
Dr Adam Thorn authored
-
- May 25, 2022
-
-
Dr Adam Thorn authored
As of b4b89ed3 we need the prepare script to exit zero on success. We only use the stdout from this command, not the return code, for determining if the target is a xen VM.
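A sketch of the shape a prepare script now needs; the /proc/xen test is purely illustrative of the detection:
    # The caller keys off what we print, not our return code, so always exit 0 on success.
    if [ -d /proc/xen ]; then
        echo "xen"
    else
        echo "not-xen"
    fi
    exit 0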
-
- May 24, 2022
-
-
Dr Adam Thorn authored
This has been silently masking failures, which is a Bad Thing. The only errors I've spotted so far are ones hopefully fixed by ad91da90, with the effect that we've been backing up more things than we needed to (which is better than the opposite possibility, at least!)
-
- Apr 25, 2022
-
-
Dr Adam Thorn authored
Now that we use pygrub, these dirs are populated by quite a lot of files that we don't want to back up, but which are dynamically built by the kernel and related packages and so do not make it into the prepared "excludes" file.
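The fix is presumably along these lines; the paths and the excludes file location are illustrative:
    # depmod/update-initramfs output never appears in the dpkg .list files, so add
    # those paths to the excludes list explicitly.
    printf '%s\n' '/boot/initrd.img-*' '/lib/modules/*/modules.*' >> /etc/backup/excludes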
-
- Mar 10, 2022
-
-
Dr Adam Thorn authored
i.e. we can now simply run "delete from host where hostname='example.ch.private.cam.ac.uk';" without having to chase the foreign keys. I've made the equivalent change on our live backup servers with an ad hoc script.
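The underlying schema change is presumably of this shape; the database name, referencing table and constraint names here are invented for illustration:
    # Recreate a referencing table's foreign key with ON DELETE CASCADE so that
    # deleting a host row removes its dependent rows automatically.
    psql backups -c "ALTER TABLE backup_task DROP CONSTRAINT backup_task_hostname_fkey;"
    psql backups -c "ALTER TABLE backup_task ADD CONSTRAINT backup_task_hostname_fkey
        FOREIGN KEY (hostname) REFERENCES host(hostname) ON DELETE CASCADE;"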
-
- Mar 09, 2022
-
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
-
- Dec 20, 2021
-
-
Dr Adam Thorn authored
-
- Nov 17, 2021
-
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
https://tickets.ch.cam.ac.uk/rt/Ticket/Display.html?id=211460 e.g. RT 211460, 211465. spri-musuem-rt-2025 had been set up with an ad hoc script which was not properly tidying up after itself because it bailed early on a "set -e" error.
-
- Jul 14, 2021
-
-
Dr Adam Thorn authored
This will let us use zfs_target as the name of a subtest, which in turn means we would be able to separately log and graph multiple backup targets associated with a single host. This change does not affect the current parsing performed when we input data into postgres: it uses non-anchored regexps to identify SpaceUsed etc., so prepending extra text won't change anything.
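To illustrate the non-anchored matching (the line format here is only indicative):
    # The prefix added in front of the report line doesn't matter, because the match
    # for SpaceUsed is not anchored to the start of the line.
    echo "zfs_target tank/backup/somehost: SpaceUsed: 12345678" \
        | sed -n 's/.*SpaceUsed: \([0-9]*\).*/\1/p'            # -> 12345678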
-
- Jul 08, 2021
-
-
Dr Adam Thorn authored
We don't always need the role data, if the presumption is that we'll be doing a pg_restore in conjunction with an ansible role which creates all required roles. But having a copy of the role data will never hurt! It also gives us a straightforward way of restoring a database to a standalone postgres instance without having to provision a dedicated VM with the relevant ansible roles.
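Role data of this kind is typically captured with pg_dumpall's globals-only mode, e.g. (output path illustrative):
    # Cluster-wide objects (roles, tablespaces) only; per-database contents are dumped separately.
    pg_dumpall --globals-only > /var/backups/postgres/globals.sql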
-
Dr Adam Thorn authored
At present we use myriad one-off per host scripts to do a pg_dump, and they all do (or probably should do) the same thing. In combination with setting options in the host's backup config file, I think this single script covers all our routine pg backups.
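A minimal sketch of the shape such a script can take, assuming it runs as the postgres user; the dump directory and option handling are illustrative:
    DUMPDIR=${DUMPDIR:-/var/backups/postgres}
    mkdir -p "$DUMPDIR"
    # One custom-format dump per database, skipping the templates.
    for db in $(psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
        pg_dump --format=custom --file="$DUMPDIR/$db.pgdump" "$db"
    done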
-
Dr Adam Thorn authored
Previously we were just passing the hostname. Adding extra args should not impact any existing script, but will let us write better, more maintainable, deduplicated PRE scripts.
-
- Jun 18, 2021
-
-
Dr Adam Thorn authored
This is just a change to the packaging, not to the actual deployed contents of the package. This deb has quite a few conffiles which made unattended-upgrades flag the mistake when I tried to upgrade. We should now have the right list of conffiles: makedeb@08d12c3c
-
Dr Adam Thorn authored
I don't think there's a sensible default quota; the value for a workstation will be very different from that for a tiny VM, for example.
-
- Jun 15, 2021
-
-
Dr Catherine Pitt authored
prepare-nondebian does not work on RedHat machines running MySQL as the paths are different, so this provides a fixed version. prepare-nondebian has historically been used more widely than just on RedHat, hence the decision to provide a RedHat-specific version rather than just edit it.
-
- Jun 08, 2021
-
-
Dr Adam Thorn authored
This is needed on focal if a client is to be able to access snapshots over NFS. From the docs I don't see why we didn't also need this option on xenial, but empirically, we need it on focal. (e.g. RT-207229)
-
- May 12, 2021
-
-
Dr Catherine Pitt authored
The code that generates the command to unexport NFS filesystems could produce an invalid command. Leading spaces were not being stripped, and in cases where there is more than one backup target for a machine we need to unexport every target. Because we also had 'set -e' in operation at this point, the script would fail there and never clean up the moved ZFS. I don't mind if we fail to unexport; if that's subsequently a problem for removing the ZFS then the script will fail at that point. This change makes the script generate better exportfs -u commands and not exit if they fail.
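The cleanup now behaves roughly like this sketch, where list_targets_for_host, hostname and nfs_client stand in for however the real script obtains them:
    # Unexport every backup target for the host, trimming the leading whitespace the
    # list arrives with, and never let a failed unexport abort the rest of the cleanup.
    list_targets_for_host "$hostname" | sed 's/^[[:space:]]*//' | while read -r target; do
        exportfs -u "${nfs_client}:${target}" || true
    done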
-
- Apr 30, 2021
-
-
Dr Catherine Pitt authored
The code used to open a database connection for each thread and leave them open for as long as the scheduler ran. This worked reasonably well until we moved to PostgreSQL 13 on Focal, although the scheduler would fail if the database was restarted because there was no logic to reconnect after a connection dropped.

On Focal/PG13 the connection for the 'cron' thread steadily consumes memory until it has exhausted everything in the machine. This appears to be a Postgres change rather than a Perl DBI change: the problem can be reproduced by sitting in psql and running 'select * from backup_queue' repeatedly. Once or twice a minute an instance of this query will cause the connection to consume another MB of RAM which is not released until the database connection is closed. The cron thread runs that query every two seconds. My guess is it's something peculiar about the view that query selects from - the time interval thing is interesting. This needs more investigation.

But in the meantime I'd like to have backup servers that don't endlessly gobble RAM, so this change makes the threads connect to the database only when they need to, and closes the connection afterwards. This should also make things work better over database restarts but that's not been carefully tested.
-
- Jan 18, 2021
-
-
Dr Catherine Pitt authored
-
- Jan 06, 2021
-
-
Dr Adam Thorn authored
As of focal a bunch of top-level dirs are symlinks (e.g. /lib -> /usr/lib) but the deb packages still deploy files to the symlink rather than the real dir. Thus if we just take the contents of all the *.list files we end up not excluding lots of files that are in fact provided by debs.
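Roughly the transformation needed when building the excludes list, with the output path illustrative:
    # Keep each path from the dpkg .list files and also add its usrmerge-resolved form,
    # so files that really live under /usr are still excluded.
    cat /var/lib/dpkg/info/*.list \
        | sed -E 'p; s#^/(bin|sbin|lib|lib32|lib64|libx32)/#/usr/\1/#' \
        | sort -u > /etc/backup/excludes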
-
- Dec 11, 2020
-
-
Dr Catherine Pitt authored
This package depends on a postgres server being available, but annoyingly we can't use the 'postgresql' metapackage in the dependencies because on Ubuntu that depends on the specific distro-provided version, which usually isn't the one we want. So we have to add our supported Postgres versions one by one.
-
Dr Catherine Pitt authored
-
- Nov 09, 2020
-
-
Dr Catherine Pitt authored
Update prepare script for newer Postgres: the default prepare script now uses appropriate options if it detects Postgres 12 or higher. The error handling still needs work though.
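The detection is presumably along these lines; the exact option sets for each branch are not shown here:
    # Ask the server for its numeric version and choose dump options accordingly.
    ver=$(psql -At -c 'SHOW server_version_num')    # e.g. 120008 for 12.8
    if [ "$ver" -ge 120000 ]; then
        dump_opts="..."    # options appropriate for Postgres 12 and newer
    else
        dump_opts="..."    # options for older releases
    fi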
-
- Nov 06, 2020
-
-
Dr Catherine Pitt authored
-
- Oct 07, 2020
-
-
Dr Adam Thorn authored
We don't need these backed up, and having them in the list leads to an error due to not having show_compatibility_56 enabled
-
- Oct 06, 2020
-
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
Resolves #3
-
- Apr 07, 2020
-
- Dec 18, 2019
-
-
Dr Adam Thorn authored
-
- Jul 30, 2019
-
-
Dr Adam Thorn authored
We now have, for example, a cron job on cerebro-backup which calls these scripts, and /sbin is not on the $PATH there.
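One way to make the scripts safe under cron's minimal environment, as a sketch:
    # Don't rely on the caller's PATH; cron's default typically omits the sbin directories.
    export PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin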
-
- Jul 23, 2019
-
-
Dr Catherine Pitt authored
For cerebro-backup, where we do backups by directory rather than of the whole server. We need to have one machine being backed up (cerebro-filestore) but a task per directory.
-
- Apr 23, 2019
-
-
Dr Catherine Pitt authored
The new-backup-rsnapshot script understands a 'postgres' argument, but this set up a postgres backup in an old style that we no longer use. This change updates it to do some of the work of setting up a new-style postgres backup and to tell the user what else they might need to edit to make it go; it varies quite a lot depending on the server.
-
- Jan 16, 2019
-
-
Dr Adam Thorn authored
Let's not worry if we haven't backed up a machine that has been offline for 3 months (instead of 6 months)
-