- Jul 09, 2021
-
-
Dr Adam Thorn authored
In some versions of backup_queue (I think just on splot4 now), we use backup_log.isrunning as part of the logic to determine whether a task should be enqueued. The problem is that scheduler.pl makes three writes on the table:

1) an insert when the task is queued (a trigger sets isrunning='t' here)
2) an update to set started_processing when the task begins (a trigger sets isrunning='f' here!)
3) an update to set ended_processing when the task finishes (a trigger again sets isrunning='f' here)

Thus, being careful to only set isrunning='f' when a backup task is finished (i.e. when we set ended_processing=now() in scheduler.pl) seems sensible, and empirically does seem to lead to the right backup_queue without duplicates.

This commit only affects new setups of backup servers; the change has been deployed to live servers with an ad hoc script I've run.

I think we only see this on splot4 because it has a very different definition of the backup_queue view to a) the one defined in this file, and b) the one that's on all the other backup servers. If I just try to replace the view on splot4, though, any attempt to select from it just times out, so there may be other relations on splot4 that need updating too. NB the obvious thing missing on splot4 is

    WHERE ((backup_log.backup_task_id = a.backup_task_id) AND (backup_log.ended_processing IS NULL))) < 1))

which feels like a hack but nonetheless ensures in practice that we don't get duplicate queued tasks.
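For new setups, the trigger behaviour described above might be sketched roughly as follows; this is an illustration of the intended logic rather than the deployed definition, and the database, function and trigger names are placeholders (only backup_log and its columns come from this message).

    psql -d backups <<'SQL'
    CREATE OR REPLACE FUNCTION backup_log_set_isrunning() RETURNS trigger AS $body$
    BEGIN
      IF TG_OP = 'INSERT' THEN
        NEW.isrunning := true;                      -- 1) task enqueued
      ELSIF NEW.ended_processing IS NOT NULL THEN
        NEW.isrunning := false;                     -- 3) task finished
      END IF;                                       -- 2) setting started_processing alone no longer clears the flag
      RETURN NEW;
    END;
    $body$ LANGUAGE plpgsql;

    DROP TRIGGER IF EXISTS backup_log_isrunning ON backup_log;
    CREATE TRIGGER backup_log_isrunning
      BEFORE INSERT OR UPDATE ON backup_log
      FOR EACH ROW EXECUTE PROCEDURE backup_log_set_isrunning();
    SQL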
-
- Jul 08, 2021
-
-
Dr Adam Thorn authored
We don't always need the role data, if the presumption is that we'll be doing a pg_restore in conjunction with an ansible role which creates all required roles. But, having a copy of the role data will never hurt! It also gives us a straightforward way of restoring a database to a standalone postgres instance without having to have provisioned a dedicated VM with the relevant ansible roles.
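For example (a sketch; the dump directory and database name are placeholders):

    pg_dumpall --globals-only > /srv/pg-dumps/globals.sql    # roles, tablespaces etc.
    pg_dump -Fc -f /srv/pg-dumps/mydb.dump mydb
    # Restoring to a standalone instance without the ansible-created roles:
    #   psql -f /srv/pg-dumps/globals.sql postgres
    #   pg_restore -C -d postgres /srv/pg-dumps/mydb.dump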
-
Dr Adam Thorn authored
At present we use myriad one-off per-host scripts to do a pg_dump, and they all do (or probably should do) the same thing. In combination with setting options in the host's backup config file, I think this single script covers all our routine pg backups.
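A rough sketch of the shape of such a script; the variable names here are illustrative rather than the actual options read from the host's backup config file.

    #!/bin/bash
    set -u
    DUMP_DIR="${PG_DUMP_DIR:-/var/backups/postgres}"   # illustrative option name
    mkdir -p "$DUMP_DIR"
    pg_dumpall --globals-only > "$DUMP_DIR/globals.sql"
    for db in $(psql -AtXc "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
        pg_dump -Fc -f "$DUMP_DIR/$db.dump" "$db"
    done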
-
Dr Adam Thorn authored
We were just passing the hostname. Adding extra args should not impact any existing script, but will let us write better, more maintainable, deduplicated PRE scripts.
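Illustrative only (the exact arguments are whatever the backup scripts now pass), but the shape of the change is:

    # before: pre.d/some-pre-script "$CLIENT_HOSTNAME"
    # after:  pre.d/some-pre-script "$CLIENT_HOSTNAME" "$TASK_NAME" "$BACKUP_ROOT"
    # ...so a shared PRE script can branch on more than the hostname:
    host="$1"; task="${2:-}"; backup_root="${3:-}"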
-
- Jun 29, 2021
-
-
Dr Catherine Pitt authored
This came about because a disk has failed on nest-backup, which only has subdirectory backups of nest-filestore-0, so move-machine.sh was not going to be helpful: it assumes all tasks for a machine are on the same zpool, which isn't true there. In this case I did the move by hand, but I have sketched out the steps in the script in the hope that next time we have to do this we'll do it by looking at the script and running bits by hand, then improve the script a bit, and continue until it's usable.
-
- Jun 18, 2021
-
-
Dr Adam Thorn authored
This is just a change to the packaging, not to the actual deployed contents of the package. This deb has quite a few conffiles which made unattended-upgrades flag the mistake when I tried to upgrade. We should now have the right list of conffiles: makedeb@08d12c3c
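A quick way to check the conffile list dpkg has recorded for the package (the package name here is a placeholder):

    dpkg-query -W -f='${Conffiles}\n' chem-backup-server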
-
Dr Adam Thorn authored
I don't think there's a sensible default quota; the value for a workstation will be very different to a tiny VM, for example.
-
- Jun 15, 2021
-
-
Dr Catherine Pitt authored
prepare-nondebian does not work on RedHat machines running MySQL as the paths are different, so this provides a fixed version. prepare-nondebian has historically been used more widely than just on RedHat, hence the decision to provide a RedHat-specific version rather than just edit the original.
-
- Jun 08, 2021
-
-
Dr Adam Thorn authored
This is needed on focal if a client is to be able to access snapshots over NFS. From the docs I don't see why we didn't also need this option on xenial, but empirically, we need it on focal. (e.g. RT-207229)
-
- May 12, 2021
-
-
Dr Catherine Pitt authored
The generation of the command to unexport NFS filesystems could generate an invalid command. Leading spaces were not being stripped, and in cases where there is more than one backup target for a machine we need to unexport every target. Because we also had 'set -e' in operation at this point, the script would fail there and never clean up the moved ZFS. I don't mind if we fail to unexport; if that's subsequently a problem for removing the ZFS then the script will fail at that point. This change makes the script generate better exportfs -u commands and not exit if they fail.
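The intended behaviour, roughly sketched; how the target list is derived and the '*' client spec are illustrative, not what the script actually does.

    machine="$1"
    zfs list -H -o mountpoint | grep -- "/${machine}" | sed 's/^[[:space:]]*//' |
    while read -r target; do
        # a failed unexport is logged but must not kill the script
        exportfs -u "*:${target}" || echo "warning: failed to unexport ${target}" >&2
    done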
-
- Apr 30, 2021
-
-
Dr Catherine Pitt authored
The code used to open a database connection for each thread and leave them open for as long as the scheduler ran. This worked reasonably well until we moved to PostgreSQL 13 on Focal, although the scheduler would fail if the database was restarted because there was no logic to reconnect after a connection dropped.

On Focal/PG13 the connection for the 'cron' thread steadily consumes memory until it has exhausted everything in the machine. This appears to be a Postgres change rather than a Perl DBI change: the problem can be reproduced by sitting in psql and running 'select * from backup_queue' repeatedly. Once or twice a minute an instance of this query will cause the connection to consume another MB of RAM which is not released until the database connection is closed. The cron thread runs that query every two seconds. My guess is it's something peculiar about the view that query selects from - the time interval thing is interesting. This needs more investigation.

But in the meantime I'd like to have backup servers that don't endlessly gobble RAM, so this change makes the threads connect to the database only when they need to, and closes the connection afterwards. This should also make things work better over database restarts but that's not been carefully tested.
-
- Jan 18, 2021
-
-
Dr Catherine Pitt authored
-
- Jan 06, 2021
-
-
Dr Adam Thorn authored
As of focal, a bunch of top-level dirs are symlinks (e.g. /lib -> /usr/lib) but the deb packages still deploy files to the symlink rather than the real dir. Thus, if we just take the contents of all the *.list files, we end up not excluding lots of files that are in fact provided by debs.
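A sketch of the normalisation needed: resolve each path from the dpkg .list files through the merged-/usr symlinks so that e.g. /lib/... matches the /usr/lib/... paths actually on disk (the output filename is a placeholder).

    cat /var/lib/dpkg/info/*.list \
      | xargs -d '\n' realpath -m -- \
      | sort -u > /tmp/files-provided-by-debs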
-
- Dec 11, 2020
-
-
Dr Catherine Pitt authored
This package depends on a postgres server being available, but annoyingly we can't use the 'postgresql' metapackage in the dependencies because on Ubuntu that depends on the specific distro-provided version, which usually isn't the one we want. So we have to add our supported Postgres versions one by one.
-
Dr Catherine Pitt authored
-
Dr Catherine Pitt authored
Can't say "keys $foo", must say "keys %{$foo}" to get the keys of a hash pointed to by a hash reference $foo. This broke with the perl in focal.
-
- Nov 16, 2020
-
-
Dr Adam Thorn authored
-
- Nov 09, 2020
-
-
Dr Catherine Pitt authored
Update prepare script for newer Postgres: the default prepare script now uses appropriate options if it detects Postgres 12 or higher. The error handling still needs work though.
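The version check is the interesting part; exactly which options change is specific to our prepare script, so only the branching pattern is sketched here.

    ver="$(psql -AtXc 'show server_version_num')"   # e.g. 120005 for 12.5
    if [ "${ver:-0}" -ge 120000 ]; then
        :   # Postgres 12+ options go here
    else
        :   # pre-12 options go here
    fi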
-
- Nov 06, 2020
-
-
Dr Catherine Pitt authored
-
Dr Catherine Pitt authored
Looks like Subiquity sets up a swapfile at /swap.img on Ubuntu 20.04 by default and we certainly don't want to back that up.
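The corresponding exclusion, assuming it lives in a standard rsnapshot config (fields in that file are tab-separated, hence printf rather than echo); the config path is a placeholder.

    grep -qE '^exclude[[:space:]]+/swap\.img' /etc/rsnapshot.conf \
      || printf 'exclude\t/swap.img\n' >> /etc/rsnapshot.conf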
-
- Oct 07, 2020
-
-
Dr Adam Thorn authored
We don't need these backed up, and having them in the list leads to an error due to not having show_compatibility_56 enabled
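The message doesn't spell out which databases these are, but the show_compatibility_56 error usually comes from the MySQL system schemas, so the filter looks something like this (the dump directory is a placeholder):

    mysql -NBe 'SHOW DATABASES' \
      | grep -Ev '^(information_schema|performance_schema|sys)$' \
      | while read -r db; do
            mysqldump --single-transaction "$db" > "/var/backups/mysql/${db}.sql"
        done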
-
- Oct 06, 2020
-
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
NB this is so we can put things in LOGFILE at earlier points than the script does at present
-
Dr Adam Thorn authored
We capture the error code at a few points in the script, and then (intend to) log something about what went wrong. But if we set -e we immediately bail before logging anything useful!
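The pattern the script relies on, which 'set -e' would short-circuit (run_backup_step and the LOGFILE usage are illustrative):

    run_backup_step
    rc=$?                   # under 'set -e' we would never reach this line on failure
    if [ "$rc" -ne 0 ]; then
        echo "backup step failed with exit code $rc" >> "$LOGFILE"
    fi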
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
Resolves #1
-
Dr Adam Thorn authored
Resolves #3
-
Dr Adam Thorn authored
On a new machine, there's nothing to remove and so rm will return an error. I'd like our prepare scripts not to return errors unnecessarily!
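For example (the path is illustrative): with -f, rm exits 0 even when there is nothing to delete yet on a freshly set-up machine.

    rm -f /var/backups/dumps/*.sql
    # equivalently: rm /var/backups/dumps/*.sql 2>/dev/null || true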
-
- Apr 07, 2020
-
-
A.J. Hall authored
- Dec 18, 2019
-
-
Dr Adam Thorn authored
-
- Jul 30, 2019
-
-
Dr Adam Thorn authored
We now have, for example, a cron job on cerebro-backup which calls these scripts, and /sbin is not on the $PATH there.
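Either fix works; the paths and binaries shown are illustrative.

    export PATH="/usr/local/sbin:/usr/sbin:/sbin:$PATH"   # near the top of the script
    # ...or call things by absolute path:
    /sbin/zfs list -H -o name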
-
- Jul 23, 2019
-
-
Dr Catherine Pitt authored
For cerebro-backup where we do backups by the directory rather than the whole server. We need to have one machine being backed up (cerebro-filestore) but a task per directory.
-
- Jul 15, 2019
-
-
Dr Adam Thorn authored
NB arguably we "should" be doing this via debian/rules and calling dh_fixperms. Doing that is left as an exercise for whoever volunteers to refactor the way we build all of our local debs!
-
- Apr 23, 2019
-
-
Dr Catherine Pitt authored
The new-backup-rsnapshot script understands a 'postgres' argument, but this set up a postgres backup in an old style that we no longer use. This change updates it to do some of the work of setting up a new style postgres backup and tell the user what else they might need to edit to make it go; it varies quite a lot depending on server.
-
- Jan 16, 2019
-
-
Dr Adam Thorn authored
Let's not worry if we haven't backed up a machine that has been offline for 3 months (instead of 6 months)
-
- Dec 06, 2018
-
-
Dr Adam Thorn authored
-