Commits · 871511b2738cecaad9ae9d0ab4fbba8f4ef2299e · Yusuf Hamied Department of Chemistry / COs / backup-scheduler

Jul 27, 2023

move zfs-rsync template config files out of /etc · 871511b2

Dr Adam Thorn authored 1 year ago

These are static files provided by our package, not config files.

871511b2

move prepare scripts out of /etc · ced34c01

Dr Adam Thorn authored 1 year ago

These are not config files, and we should not be modifying the package-provided
versions of these files. I'm leaving symlinks behind to make sure we don't break
all our existing backups though!

ced34c01

remove unused PRE scripts · 3602fd4b

Dr Adam Thorn authored 1 year ago

I've checked all our live backup servers and made sure these aren't in use
(and updated some backup configs to achieve this in some cases)

3602fd4b

Jul 24, 2023

add option to specify SSHOPTIONS to rsync tasks · 53f5ba49

Dr Adam Thorn authored 1 year ago

This could/should probably supersede the specific option for SSHPORT
as I think usage of that is minimal or perhaps even zero, but we'd
have to check if that's in use and make suitable updates to config
files before removing it.

53f5ba49
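
As a rough illustration of the option described above, the sketch below shows one way an SSHOPTIONS value could be folded into the remote-shell string that rsync receives via `-e`. The helper name `build_rsync_rsh`, the example option values and the way SSHPORT and SSHOPTIONS are combined are assumptions made for illustration; they are not taken from this repository.

```shell
# Hypothetical helper: build the remote-shell string handed to "rsync -e".
# SSHPORT and SSHOPTIONS mirror the option names in the commit message;
# how the real scripts combine them is an assumption of this sketch.
build_rsync_rsh() {
    rsh="ssh"
    if [ -n "${SSHPORT:-}" ]; then
        rsh="$rsh -p $SSHPORT"
    fi
    if [ -n "${SSHOPTIONS:-}" ]; then
        rsh="$rsh $SSHOPTIONS"
    fi
    printf '%s\n' "$rsh"
}

# Example invocation (not executed here; host and paths are made up):
#   SSHOPTIONS="-o ConnectTimeout=30 -p 2222"
#   rsync -a -e "$(build_rsync_rsh)" root@host.example:/ /backup/target/
```

Because a port can be given as `-p` inside SSHOPTIONS, a dedicated SSHPORT setting becomes redundant, which is the point the commit message makes.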

May 31, 2023
- report which backup tasks are missing in the xymon report · 7563b636
  Dr Adam Thorn authored 1 year ago
  
  0.9-ch102
  
  7563b636
May 30, 2023

ensure xymon dot is red if we have a backup task that has never completed · 5fbb64c2

Dr Adam Thorn authored 1 year ago

We had been raising a failure report if we had never seen a successful backup
for a host. However, when we have a host with more than one backup task, we
can have the situation where one backup is working OK but the other has never
completed correctly. This led to a green report as we had a non-zero number
of rows, but we require number_of_good_backups == number_of_tasks!

5fbb64c2
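
The difference between the two checks can be sketched as follows. This is a reconstruction from the commit message only: the function names are hypothetical and the real xymon script derives its counts from database rows rather than taking them as arguments.

```shell
# Reconstruction of the two checks described above; function and variable
# names are illustrative, not taken from the xymon script.

# Old behaviour: any successful backup for the host counted as healthy.
old_colour() {
    number_of_good_backups=$2
    if [ "$number_of_good_backups" -gt 0 ]; then echo green; else echo red; fi
}

# New behaviour: every task must have at least one successful backup.
new_colour() {
    number_of_tasks=$1
    number_of_good_backups=$2
    if [ "$number_of_good_backups" -eq "$number_of_tasks" ]; then
        echo green
    else
        echo red
    fi
}

# A host with two tasks, one of which has never completed:
old_colour 2 1   # prints "green"
new_colour 2 1   # prints "red"
```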

May 03, 2023

replace deprecated tempfile call with mktemp · d3c8ab49

Dr Adam Thorn authored 1 year ago

tempfile is a debian-ism and jammy warns us that use is deprecated.
mktemp has been available on all our debian/ubuntu machines for a long time
via coreutils (e.g. it was definitely in wheezy and trusty, and I'm pretty
sure since before then too)

d3c8ab49
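
A minimal sketch of the replacement pattern, assuming the script only needs a private scratch file that is removed on exit (the actual call sites in this repository are not shown here):

```shell
# Create a private temporary file with mktemp (coreutils) rather than
# the Debian-specific tempfile(1), and remove it when the script exits.
tmpfile=$(mktemp) || exit 1
trap 'rm -f "$tmpfile"' EXIT

echo "scratch data" > "$tmpfile"
wc -l < "$tmpfile"
```

`mktemp -d` does the same for a scratch directory, in which case the cleanup would be `rm -rf`.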

Jan 05, 2023

Redhat prepare script compresses MySQL backup · d2a7ea19

Dr Catherine Pitt authored 2 years ago

The incremental backups on some cluster head nodes are growing quite
large, and most of the churn is the uncompressed MySQL dumpfile which
changes with every backup and can be over 1GB. This commit compresses
that data, which reduces the size of the file by 90% on at least one
machine.

d2a7ea19
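
The idea can be sketched as a pipe through gzip. The helper name `compress_dump` is hypothetical, and the stand-in producer below is not a real database dump; in a prepare script the producer would be whatever command writes the MySQL dump to stdout.

```shell
# Hypothetical sketch: write a dump stream from stdin to a gzip-compressed
# file instead of storing it uncompressed.
compress_dump() {
    outfile=$1
    gzip -c > "$outfile"
}

demo_dir=$(mktemp -d)
# Stand-in for a database dump: repetitive SQL-like text compresses well.
yes 'INSERT INTO t VALUES (1, "example");' | head -n 10000 \
    | compress_dump "$demo_dir/dump.sql.gz"

# The original text is recoverable with gzip -dc.
gzip -dc "$demo_dir/dump.sql.gz" | wc -l
rm -rf "$demo_dir"
```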

Dec 19, 2022
- Raise xymon alert if quota for a ZFS exceeds available+used · 17296322
  Dr Adam Thorn authored 2 years ago
  
  This can happen for a number of reasons:
  - child ZFS has a quota bigger than its parent
  - we have over-provisioned such that the sum of quotas is bigger than the disk
  - an individual ZFS has a quota bigger than the disk

  0.9-ch98
  
  17296322
- rename variable in xymon script · ee1dd04d
  Dr Adam Thorn authored 2 years ago
  
  ee1dd04d
- Add generic thisproperty() in xymon script, analogous to thisquota() · 82023deb
  Dr Adam Thorn authored 2 years ago
  
  82023deb
- switch to internally using 'parsable (exact) values' for zfs commands · a04d74be
  Dr Adam Thorn authored 2 years ago
  
  ..and convert such things to human-friendly versions if required. This is to facilitate extra checks where we want to make numeric calculations involving quotas and other similar properties.
  a04d74be
- whitespace only: replace tabs with spaces · 62ab3e5d
  Dr Adam Thorn authored 2 years ago
  
  62ab3e5d
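
The Dec 19 commits above switch to exact byte values so that quotas can be compared numerically, then convert back to human-friendly figures for display. A minimal sketch of that approach follows. The function names are hypothetical, the byte counts are made up, and treating a quota of 0 as "no quota" is an assumption of this sketch; `zfs get -Hp` and `numfmt --to=iec` are the standard tools for obtaining exact values and formatting them.

```shell
# Hypothetical check: does a quota exceed what the dataset could ever hold?
# All three arguments are exact byte counts, e.g. as reported by
#   zfs get -Hp -o value quota pool/dataset
# (-H: no headers, -p: parsable exact values).
# A quota of 0 is treated here as "no quota" (assumption).
quota_exceeds_capacity() {
    quota=$1
    available=$2
    used=$3
    [ "$quota" -gt 0 ] && [ "$quota" -gt $((available + used)) ]
}

# Convert a byte count to a human-friendly figure for the report text.
human() {
    numfmt --to=iec "$1"
}

# Made-up figures: 2 TiB quota, 500 GiB available, 100 GiB used.
if quota_exceeds_capacity 2199023255552 536870912000 107374182400; then
    echo "quota $(human 2199023255552) exceeds available+used"
fi
```
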
Oct 31, 2022
- analyse-snaphot-usage: add optional second argument for specifying the initial snapshot · 563ae0b1
  Dr Adam Thorn authored 2 years ago
  
  0.9-ch97
  
  563ae0b1
Oct 26, 2022
- Add script to report the space that would be reclaimed by deleting older snapshots · 48485319
  Dr Adam Thorn authored 2 years ago
  
  0.9-ch96
  
  48485319
May 25, 2022

Ignore non-zero exit from systemd-detect-virt in prepare script · 73ad8e1d

Dr Adam Thorn authored 2 years ago

As of b4b89ed3 we need the prepare script to exit zero on success.

We only use the stdout from this command, not the return code, for
determining if the target is a xen VM.

73ad8e1d
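
A minimal sketch of the pattern, assuming the prepare script runs under `set -e`. The wrapper name `detect_virt` is hypothetical. `systemd-detect-virt` prints `none` and exits non-zero when no virtualisation is detected, so its status has to be discarded if only the output matters.

```shell
# Capture stdout from systemd-detect-virt but ignore its exit status,
# which would otherwise abort a script running under "set -e".
detect_virt() {
    systemd-detect-virt 2>/dev/null || true
}

(
    set -e
    virt=$(detect_virt)
    if [ "$virt" = "xen" ]; then
        echo "target is a xen VM"
    else
        echo "target is not a xen VM"
    fi
)
```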

May 24, 2022

Stop returning "true" from default prepare script · b4b89ed3

Dr Adam Thorn authored 2 years ago

This has been silently masking failures, which is a Bad Thing. The only
errors I've spotted so far are ones hopefully fixed by ad91da90, which
has the effect that we've been backing up more things than we needed
(which is better than the opposite possibility, at least!)

b4b89ed3
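
To illustrate the masking effect in isolation (the function names are made up and `false` stands in for any failing step in a prepare script):

```shell
# A script body whose last command is "true" always reports success.
prepare_masked() {
    false          # stand-in for a step that fails
    true           # the trailing "true" hides that failure
}

# Without it, the failing step's status is what the caller sees.
prepare_honest() {
    false
}

prepare_masked && echo "masked exit status: $?"
prepare_honest || echo "honest exit status: $?"
```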

More ensuring errors propagate upwards from the prepare script · 238e3bfc
Dr Adam Thorn authored 2 years ago

238e3bfc

Fix generation of list of package files to exclude in the standard prepare script · ad91da90

Dr Adam Thorn authored 2 years ago

c714a331 introduced an annoying bug here. If the filenames piped to xargs include
an apostrophe, xargs will complain and stop. We have thus, in practice, been
including all files that appear in the list after

/usr/share/sounds/ubuntu/ringtones/Sam's Song.ogg

!! Also, we've not been properly handling filenames with spaces in, and perhaps
other filenames too. We thus null-terminate the filenames ourselves.

Because I want to get to the point where we can have a sensible return code
from the prepare script, this commit also adds some set -e commands to try
to ensure errors bubble up. This is a little tedious to achieve due to all
the subshells in this script.

For the same reason, we split off the "diff" command into a "set +e" block
because diff returns non-zero if differences are found. We do not consider
that to be an error!

ad91da90
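
The xargs behaviour described above can be reproduced with a throwaway directory. This is a sketch rather than the code from the prepare script: the helper name `nul_terminate` is hypothetical, and converting newlines to NULs assumes no filename itself contains a newline.

```shell
# Filenames containing an apostrophe or a space, as in the commit message.
demo_dir=$(mktemp -d)
touch "$demo_dir/Sam's Song.ogg" "$demo_dir/two words.ogg"

# Newline-separated input: xargs applies its own quote and whitespace
# parsing, so the apostrophe is an unmatched quote and names are split
# at spaces. Either way the command fails.
if ! find "$demo_dir" -type f | xargs ls -d >/dev/null 2>&1; then
    echo "newline-separated list: xargs failed"
fi

# NUL-terminated input: each name reaches the command intact.
nul_terminate() {
    tr '\n' '\0'
}
find "$demo_dir" -type f | nul_terminate | xargs -0 ls -d

rm -rf "$demo_dir"
```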

Improve check of whether mysql is(/should be) running in default prepare script · 40f64d5c

Dr Adam Thorn authored 2 years ago

We think the intention of the old version of this block is

"if mysql is running, dump the databases".

The check has been buggy for a long long time: it reads the contents
of my.cnf ... which nowadays just has some !includedir directives.
This led to setting

SOCKET=""

and, it turns out, [ -S "" ] returns true.

The main intention of this part of the prepare script is to back up
vaguely normal machines running a simple mysql database, such as
small group webservers. We thus don't need to consider every
eventuality.

40f64d5c
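
One way an empty SOCKET makes the socket test succeed is when the variable is expanded without quotes. The old code is not shown here, so that is an assumption of the sketch below. With the expansion quoted, the same test on an empty string fails.

```shell
SOCKET=""

# Unquoted: the empty variable vanishes, leaving "[ -S ]". With a single
# argument, test only checks that the string "-S" is non-empty: true.
if [ -S $SOCKET ]; then echo "unquoted: true"; else echo "unquoted: false"; fi

# Quoted: the test asks whether the empty path is a socket: false.
if [ -S "$SOCKET" ]; then echo "quoted: true"; else echo "quoted: false"; fi
```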

Stop trying to dump ldap in prepare script (as we don't have any debian/ubuntu ldap servers!!) · b67e2b82
Dr Adam Thorn authored 2 years ago

b67e2b82

Cease trying to configure dateext logrotation in prepare script · dd28c9af

Dr Adam Thorn authored 2 years ago

On servers we configure this via ansible.

On workstations it is not having the desired effect because the stock
logrotate.conf includes the line ...

and so the simple grep thinks it's already configured! I'll add an
ansible task in due course to actually configure this.

dd28c9af

Apr 25, 2022

Make default prepare scripts skip /lib/{modules,firmware} on xen VMs · 491ba4b1

Dr Adam Thorn authored 2 years ago

Now that we use pygrub, these dirs are populated by quite a lot of files
that we don't want to back up but are dynamically built by the kernel/related
packages so do not make it into the prepared "excludes" file

491ba4b1

Mar 10, 2022

Ensure foreign keys will ON DELETE CASCADE when deleting a host · c206b71f

Dr Adam Thorn authored 3 years ago

i.e. we can now simply

delete from host where hostname='example.ch.private.cam.ac.uk';

without having to chase the foreign keys.

I've made the equivalent change on our live backup servers with
an ad hoc script.

c206b71f

Mar 09, 2022

Set TASKNAME correctly in send-backup-to-server.sh · 520764f0
Dr Adam Thorn authored 3 years ago

0.9-ch91

520764f0
Release 0.9-ch90 · 63abf952
Dr Adam Thorn authored 3 years ago

0.9-ch90

63abf952

Add script to move a backup to another server · 3d245d54

Dr Adam Thorn authored 3 years ago

i.e. send the ZFS, update db records and copy the config files. The
sql inserts should broadly mirror those done when setting up a new
backup, though with field values matching those in the source database
rather than just using the defaults.

This script has so far only been tested for the case of moving a "simple"
backup where a host has a single backup task and no special config.
It's quite likely there'll be bugs to fix for other cases that
we'll find in due course.

3d245d54

Remove move-machine.sh script · 56a40eed

Dr Adam Thorn authored 3 years ago

I'm about to add a script to send a backup to a different backup server.
It's thus probably best if the script names describe their functions in a
little more detail

56a40eed

Dec 20, 2021
- move-machine.sh: update/fix usage message · 0f146873
  Dr Adam Thorn authored 3 years ago
  
  0.9-ch89
  
  0f146873
Nov 17, 2021
- Update usage message to specify quotas should be set on sub-ZFSs, not the parent · 16445075
  Dr Adam Thorn authored 3 years ago
  
  0.9-ch88
  
  16445075
- Further improve help message for new postgres backups · 90ae5b4e
  Dr Adam Thorn authored 3 years ago
  
  0.9-ch87
  
  90ae5b4e
- Update help message for postgres targets to encourage usage of canonical script · 9d0bc09f
  Dr Adam Thorn authored 3 years ago
  
  e.g. RT 211460 (https://tickets.ch.cam.ac.uk/rt/Ticket/Display.html?id=211460) and RT 211465. spri-musuem-rt-2025 had been set up with an ad hoc script which was not properly tidying up after itself because it bailed out early on a "set -e" error.
  0.9-ch86
  
  9d0bc09f
Jul 14, 2021

Prepend reporting lines with the zfs target name · 6e536df5

Dr Adam Thorn authored 3 years ago

This will let us use zfs_target as the name of a subtest which
in turn means we would be able to separately log and graph multiple
backup targets associated with a single host.

This change does not affect the current parsing performed when
we input data into postgres: it uses non-anchored regexps to
identify SpaceUsed etc so prepending extra text won't change
anything

6e536df5
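
To see why a prefix is harmless to a non-anchored match, consider the sketch below. The report line format (`SpaceUsed: <number>`), the `: ` separator and both function names are assumptions made for illustration; the real reporting lines and the regexps used when loading data into postgres are not shown here.

```shell
# Prefix every reporting line on stdin with the ZFS target name.
prefix_report() {
    awk -v target="$1" '{ print target ": " $0 }'
}

# Non-anchored match: finds the field wherever it sits in the line.
extract_space_used() {
    sed -n 's/.*SpaceUsed[=: ]*\([0-9][0-9]*\).*/\1/p'
}

# The extracted value is the same with or without the prefix.
echo 'SpaceUsed: 12345' | extract_space_used
echo 'SpaceUsed: 12345' | prefix_report 'pool/host.example' | extract_space_used
```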

Jul 09, 2021

Partial fix for behaviour where we see multiple backups for one task running at once · 3941a9df

Dr Adam Thorn authored 3 years ago

In some versions of backup_queue (I think just on splot4 now), we use
backup_log.isrunning as part of the logic to determine if a task should be
enqueued. The problem is that scheduler.pl makes three writes on the table:

1) an insert when the task is queued (a trigger sets isrunning='t' here)
2) an update to set started_processing when the task begins (a trigger
sets isrunning='f' here!!!!)
3) an update to set ended_processing when the task finishes (a trigger
again sets isrunning='f' here)

Thus, being careful to only set isrunning='f' when a backup task is finished
(i.e. when we set ended_processing=now() in scheduler.pl) seems sensible, and
empirically does seem to lead to the right backup_queue without duplicates.

This commit will only affect new setups of backup servers; the change has been
deployed to live servers with an ad hoc script I've run.

I think we only see this on splot4 because it has a very different definition of
the backup_queue view to a) the one defined in this file, b) the one that's on
all the other backup servers. If I just try to replace the view on splot4, though,
any attempt to select from it just times out so there may be other relations on
splot4 that need updating too.

NB the obvious thing missing on splot4 is

WHERE ((backup_log.backup_task_id = a.backup_task_id) AND (backup_log.ended_processing IS NULL))) < 1))

which feels like a hack but nonetheless ensures in practice that we don't get
duplicate queued tasks.

3941a9df

Jul 08, 2021

Ensure pg-dump-script includes a dump of roles · 67d141b5

Dr Adam Thorn authored 3 years ago

We don't always need the role data, if the presumption is that we'll
be doing a pg_restore in conjunction with an ansible role which creates
all required roles. But, having a copy of the role data will never hurt!
It also gives us a straightforward way of restoring a database to a
standalone postgres instance without having to have provisioned a
dedicated VM with the relevant ansible roles.

67d141b5
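
A dump that also saves roles might look roughly like this. It is a sketch, not the contents of pg-dump-script: the function names, the DRY_RUN switch, the database name and the output paths are all hypothetical. `pg_dumpall --roles-only` and `pg_dump -Fc` are standard PostgreSQL options, and the custom format written by `-Fc` is the one pg_restore reads.

```shell
# Print the command instead of running it when DRY_RUN is set, so the
# sketch can be tried without a postgres server.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Hypothetical sketch: save role definitions alongside the database dump.
dump_database_with_roles() {
    db=$1
    outdir=$2
    run pg_dumpall --roles-only -f "$outdir/roles.sql"
    run pg_dump -Fc -f "$outdir/$db.dump" "$db"
}

DRY_RUN=1
dump_database_with_roles exampledb /tmp/pgdump-example
```

Restoring to a standalone instance would then mean loading `roles.sql` with psql before running pg_restore on the dump.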

Add a script to do a postgres backup via pg_dump · 5b4a8757

Dr Adam Thorn authored 3 years ago

At present we use myriad one-off per host scripts to do a pg_dump,
and they all do (or probably should do) the same thing. In combination
with setting options in the host's backup config file, I think
this single script covers all our routine pg backups.

5b4a8757

Call PRE and POST with same args as zfs-rsync.sh · e01d7ebc

Dr Adam Thorn authored 3 years ago

Previously we were just passing the hostname. Adding extra args should
not impact any existing script, but will let us write better, more
maintainable and deduplicated PRE scripts.

e01d7ebc
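
The change can be pictured as forwarding `"$@"` to the hooks. The sketch below is an assumption about the shape of the logic, not the scheduler's actual code: `run_hook` and `backup_task` are hypothetical names, and the arguments zfs-rsync.sh really takes are not listed here.

```shell
# Hypothetical sketch: run an optional hook with exactly the arguments
# the main backup step receives, not just the hostname.
run_hook() {
    hook=$1
    shift
    if [ -n "$hook" ] && [ -x "$hook" ]; then
        "$hook" "$@"
    fi
}

backup_task() {
    run_hook "${PRE:-}" "$@" || return 1
    echo "zfs-rsync.sh would run here with: $*"
    run_hook "${POST:-}" "$@"
}

# With no PRE or POST configured, only the main step runs.
backup_task host.example zfs/target
```

An existing PRE script that only reads `$1` keeps working, since the hostname is still the first argument in this sketch.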

Jun 29, 2021

Add an outline script for moving a whole zpool · 7ae60f97

Dr Catherine Pitt authored 3 years ago

This came about because a disk has failed on nest-backup, which only has
subdirectory backups of nest-filestore-0 and so move-machine.sh was not
going to be helpful - it assumes all tasks for a machine are on the same
zpool which isn't true there. In this case I did the move by hand, but
have sketched out the steps in the script in the hope that next time we
have to do this we'll do it by looking at the script and running bits by
hand, then improve the script a bit, and continue until it's usable.

7ae60f97

Jun 18, 2021

Release with correct list of conffiles · 92b25e88

Dr Adam Thorn authored 3 years ago

This is just a change to the packaging, not to the actual deployed
contents of the package. This deb has quite a few conffiles which
made unattended-upgrades flag the mistake when I tried to upgrade.

We should now have the right list of conffiles:

makedeb@08d12c3c

92b25e88

Add a reminder to set a quota on a newly-created backup · c08df044

Dr Adam Thorn authored 3 years ago

I don't think there's a sensible default quota; the value for
a workstation will be very different to a tiny VM, for example.

c08df044
