Commits · master · Yusuf Hamied Department of Chemistry / COs / backup-scheduler

Jan 21, 2025
- Fix install location of systemd unit file · fabc1d27
  Dr Catherine Pitt authored 1 month ago
  
  1.1.1
  
  fabc1d27
- Fix install file for change of init script · ddade81e
  Dr Catherine Pitt authored 1 month ago
  
  ddade81e
- Always build package · 4c9fbfab
  Dr Catherine Pitt authored 1 month ago
  
  4c9fbfab
- Replace init script with systemd unit · 9844b1ad
  Dr Catherine Pitt authored 1 month ago
  
  9844b1ad
Oct 23, 2024
- Release 1.0.8 · 95a761ae
  Dr Catherine Pitt authored 4 months ago
  
  1.0.8
  
  95a761ae
- Fix quoting in send-backup-to-server.sh · 5ae63e54
  Dr Catherine Pitt authored 4 months ago
  
  5ae63e54
Sep 25, 2024

ensure host config dir exists before copying exclude file to it · 036b518c

Dr Adam Thorn authored 5 months ago

Hmm. My newly-created backup was failing due to this dir missing, but it surely
isn't the first backup that has been added when we've had the current
version of the prepare script in place. Yet I see us calling mkdir in
the collection of old prepare scripts, and there's no attempt to create
it in the script that creates new backups ....
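
The fix described above can be sketched as follows. This is a minimal illustration, not the repository's prepare script: the function name, its arguments and the directory layout are assumptions. `mkdir -p` succeeds whether or not the directory already exists, so it is safe to call on every run.

```shell
# Hypothetical helper: copy an exclude file into a per-host config directory,
# creating the directory first so that cp cannot fail on a missing path.
install_exclude_file() {
    config_root=$1   # assumed: the directory holding one subdirectory per host
    host=$2
    exclude_src=$3

    # -p: create parents as needed, and do not fail if the directory exists
    mkdir -p "$config_root/$host" || return 1
    cp "$exclude_src" "$config_root/$host/exclude"
}
```

Calling the helper twice for the same host is harmless, which matters for a script that runs before every backup.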

036b518c

Aug 21, 2024

fix bug in xymon script when checking if no quota is set · 8f0f5b1b

Dr Adam Thorn authored 6 months ago

We had been looking for the string 'none', but we deliberately run a zfs
command that returns a numeric, parseable value, so we get the value 0 when
no quota is set. We'd been handling this properly for thisquota but not for
parentquota.
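
A sketch of the corrected check, under the assumption that both values come from a parseable zfs query. The helper name is hypothetical and the values below are stand-ins so the snippet runs without ZFS.

```shell
# Hypothetical helper. The value is assumed to come from a command such as
#   zfs get -H -p -o value quota "$dataset"
# where -p requests exact numeric output, so "no quota" is reported as 0
# (as the commit message describes) rather than as the string 'none'.
quota_is_unset() {
    [ "$1" = "0" ]
}

# Stand-in values; the real script would read both from zfs.
thisquota=0
parentquota=0

# Apply the same test to both values, which is the point of the fix.
for q in "$thisquota" "$parentquota"; do
    if quota_is_unset "$q"; then
        echo "no quota set"
    fi
done
```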

8f0f5b1b

Jul 10, 2024
- Update to 1.0.5 for fix · ef457d45
  Owen Johnson authored 8 months ago
  
  1.0.5
  
  ef457d45
- As root on dest. server sudo postgres to run psql · 95bd3861
  Owen Johnson authored 8 months ago
  
  95bd3861
Jun 10, 2024
- fix cron-thread.log contents · 3de479aa
  Dr Adam Thorn authored 9 months ago
  
  We were referencing a db field that doesn't exist.
  1.0.4
  
  3de479aa
May 24, 2024

Fix typo in distro name · a12fe67c
Dr. Frank Lee authored 9 months ago

1.0.3

a12fe67c
Reinstate per-machine SSH config on the preparation script · bad7fd41
Dr. Frank Lee authored 9 months ago

bad7fd41

Add -q flag to psql calls to suppress 'INSERT 0 1' etc · 59d01a57

Dr Catherine Pitt authored 9 months ago

Closes #6

When running psql commands to insert rows in the database, psql normally
returns a message about what it did, e.g. "INSERT 0 1" if it inserted a
row. This can be suppressed with -q . Several of the scripts use psql
commands to get primary keys from the database, inserting the row if
necessary. This can lead to the host id variable in the script being set
to 'INSERT 0 1 <thehostid>' which causes problems when this variable is
used in other SQL commands.

This always used to work; I suspect the thing that changed is our
upgrading to Postgres 16 on the backup servers, but I'm struggling to
see how as Postgres 13 seems to behave the same for me.
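
The failure mode can be illustrated without a database. In this sketch `fake_psql` is a stand-in written for illustration only; it mimics just the behaviour described above (a status line is printed unless `-q` is given) and the id `42` is made up.

```shell
# Stand-in for psql so the sketch runs without a database.
fake_psql() {
    if [ "$1" != "-q" ]; then
        echo "INSERT 0 1"   # the status message that -q suppresses
    fi
    echo "42"               # pretend this is the host id from the query
}

# Capturing the output without -q pollutes the variable with the status line.
noisy_id=$(fake_psql)
# With -q only the id is captured, so it is safe to reuse in later SQL.
quiet_id=$(fake_psql -q)

echo "without -q: $noisy_id"
echo "with -q:    $quiet_id"
```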

59d01a57

May 07, 2024
- Add script to call client-side postgres dump scripts · def5995e
  Dr. Frank Lee authored 10 months ago
  
  1.0.1
  
  def5995e
May 01, 2024
- Add the CI magic · 616e32a3
  Dr. Frank Lee authored 10 months ago
  
  1.0.0
  
  616e32a3
- Add new zfs-rsync-prep script · 5eb022fc
  Dr. Frank Lee authored 10 months ago
  
  5eb022fc
- Move to new packaging model · f88d6565
  Dr. Frank Lee authored 10 months ago
  
  f88d6565
Oct 23, 2023

add unique hostname constraint to host table · b902dac5

Dr Adam Thorn authored 1 year ago

We had ended up, somehow, with a few hosts on one backup server
which appeared twice in `host` - one with disabled=f and some backup
tasks as expected, and one with disabled=t. I manually (necessarily)
deleted the latter before adding this constraint on the live servers.

(all servers listed in zfs_backup_server.conf have had the db table
manually updated)

b902dac5

Release 1.0-ch112 · f16afbf4
Dr Adam Thorn authored 1 year ago

0.9-ch112

f16afbf4

don't mark hosts as disabled at the point they get created · 594be148

Dr Adam Thorn authored 1 year ago

I think the intention here was perhaps:

- create new host, marked as disabled
- finish setting up backups
- once done, mark host as enabled

...except the script runs as "set -e", so if something goes wrong
we just never get as far as enabling the host, which means not only
do no backups run but no failure reports get sent to xymon so we
don't even notice the failure. This is not good.

Given the entry of a row in the `host` table doesn't do much in and
of itself, I see no reason why we shouldn't just mark the host as
initially enabled. We won't try to actually perform a backup until
a `backup_task` has been created. Perhaps this leads to a brief
transient behaviour where xymon reports a backup as failing whilst
the script is still running - but OTOH the xymon report for a new
machine will always be red for "a while" until the first backup
has actually run OK.

594be148

don't bail (due to set -e) if ssh-keyscan fails · 20de2933

Dr Adam Thorn authored 1 year ago

We don't need to record the ssh host key in most cases given that we
generally deploy signed ssh host keys, but I suspect we might have the
occasional backup target where that doesn't apply (e.g. clusters?)

Regardless, if we can't scan the host key the right behaviour is for
the script to continue on and set up the backup. If the backup then
fails due to the absent host key, we will be alerted and take suitable
action. Right now, the failure mechanism is that we silently don't finish
setting up the backup, the backup never gets enabled, and we don't
realise we don't have a backup - eek.
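
One way to get this behaviour under `set -e` is to run the scan as the condition of an `if`, since a failing condition does not abort the script. The sketch below is an assumption about how that could look, not the actual change; the function name and the `KEYSCAN` variable are hypothetical, the latter existing only so the snippet can be exercised without network access.

```shell
# Hypothetical helper: try to record a host key, but never abort the caller.
record_host_key() {
    host=$1
    known_hosts=$2
    # A command used as an if-condition does not trigger "set -e" on failure.
    if ! ${KEYSCAN:-ssh-keyscan} "$host" >> "$known_hosts" 2>/dev/null; then
        echo "warning: could not scan host key for $host; continuing" >&2
    fi
    return 0
}
```

If the key really is needed, the first backup attempt fails visibly and is reported, rather than the setup stopping silently.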

20de2933

Sep 27, 2023

Xymon test for backups is valid for 3 hours · b883ad8d

Dr Catherine Pitt authored 1 year ago

The Xymon test that reports on backup status runs every 45 minutes. But
the status of an individual backup does not change very frequently - we
try to back most things up a few times a day. This change makes the
individual backup statuses valid for three hours, rather than the one
hour they were previously. This is to avoid getting purple dots when we
reboot a backup server and interrupt the 45 minute check, which then
won't run again for another 45 minutes causing a 90 minute gap between
reports for some hosts and hence purple dots.
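
The arithmetic behind this change, as a small sketch. The variable and function names are illustrative; only the numbers come from the commit message.

```shell
interval=45          # minutes between runs of the Xymon test
old_validity=60      # previous lifetime of each status, in minutes
new_validity=180     # lifetime after this change (three hours)

# If one run is interrupted by a reboot, the next report arrives two
# intervals after the previous one.
gap=$((interval * 2))

# A status goes purple when no fresh report arrives within its lifetime.
goes_purple() {      # usage: goes_purple GAP VALIDITY
    [ "$1" -gt "$2" ]
}

if goes_purple "$gap" "$old_validity"; then
    echo "old lifetime: a ${gap} minute gap causes purple dots"
fi
if ! goes_purple "$gap" "$new_validity"; then
    echo "new lifetime: a ${gap} minute gap is tolerated"
fi
```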

b883ad8d

Sep 08, 2023

Add script for moving a backup task rather than a host · 43774528

Dr Catherine Pitt authored 1 year ago

For machines like nest-backup and cerebro-backup we have lots of backup
tasks for the same host spread across several zpools, so
move-machine-to-zpool.sh can't be used to migrate the contents of a
failing zpool/disk. This adds a script to move an individual ZFS which is
the target of a backup task to another zpool.

It assumes all necessary parent ZFSes already exist on the target. If
they don't it fails.

It does not yet clean up the old ZFS, as the script has not had a lot of use.

43774528

Sep 04, 2023
- Release new package version · 4237c9b2
  Dr Catherine Pitt authored 1 year ago
  
  0.9-ch110
  
  4237c9b2
Sep 01, 2023

send-backup-to-server.sh copies additional config · 0f5bbdc6

Dr Catherine Pitt authored 1 year ago

We have started putting extra configuration for sshing to a host in a
file in the /etc/chem-zfs-backup-server/zfs-rsync.d/$HOSTNAME directory.
This updates the backup migration script to copy that as well as the
main config file for the machine. I've chosen to copy the entire
directory to catch other files we might want to add in future. There is
often an 'exclude' file in there that's autogenerated by the prepare
scripts, but copying that won't do any damage; it's just redundant
because it will be regenerated when the backup runs.

0f5bbdc6

Aug 30, 2023

Cease attempt to unexport ZFS when moving backup to a different zpool · 2dd5cd07

Dr Adam Thorn authored 1 year ago

1. The export is done via set sharenfs which means we shouldn't need
   to manually manage exports

2. This part of the script does not work because it tries to unexport
   the old export by looking up the db record, which we have already
   updated to refer to the new zpool.

2dd5cd07

Aug 23, 2023
- Release version with custom rsync command option · 94dc75f2
  Dr Catherine Pitt authored 1 year ago
  
  0.9-ch108
  
  94dc75f2
Aug 09, 2023

Allow setting of global rsync command and rsync args · 46533d5a

Dr Catherine Pitt authored 1 year ago

This adds a new config file which allows setting the command to use for
'rsync' and global options for that command. This is motivated by the
need to use an alternative rsync command on Jammy machines, as the
system one is too slow.

The option for global rsync arguments was added as a way to add the
'--trust-sender' flag to all backups to turn off certain checks that we
suspect to be the cause of the slowdown, but it didn't help enough to
fix the speed problem. Instead we are going to use our own package of an
older rsync from before the checking code was added, which of course
doesn't support --trust-sender so the global args are left blank.
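
A rough sketch of how such a config could feed into the command line. The variable names `RSYNC_COMMAND` and `RSYNC_GLOBAL_ARGS`, the helper, and the `-a` flag are assumptions for illustration; the commit does not show the real names or the rest of the rsync invocation.

```shell
# Hypothetical config variables, normally set by sourcing the new config file.
RSYNC_COMMAND=${RSYNC_COMMAND:-rsync}
RSYNC_GLOBAL_ARGS=${RSYNC_GLOBAL_ARGS:-}

# Print the command line that would be used for one backup task.
build_rsync_command() {
    src=$1
    dest=$2
    # RSYNC_GLOBAL_ARGS is deliberately unquoted so that several
    # space-separated options are split into separate words; when it is
    # empty it contributes nothing to the command line.
    echo "$RSYNC_COMMAND" $RSYNC_GLOBAL_ARGS -a "$src" "$dest"
}
```

Leaving the global arguments empty, as the commit message says is currently the case, yields the plain command with only the alternative binary swapped in.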

46533d5a

Jul 27, 2023

release ch104 · d3c7bb3e
Dr Adam Thorn authored 1 year ago

0.9-ch104

d3c7bb3e
restore distribution to focal · 20ea0258
Dr Adam Thorn authored 1 year ago
This was unintentionally removed in 53f5ba
20ea0258

Add ability to have a per-host ssh config file · 6fda699c

Dr Adam Thorn authored 1 year ago

Custom options need to be a file passed via -F because we want
to specify options for both ssh and scp. They don't have a compatible
set of CLI options but both take -F.

This supersedes 53f5ba49; I had only deployed SSHOPTIONS for one host which
I've updated.

This also removes the SSHPORT option, which had only been used in the config
for one host which I've updated.
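
A sketch of the idea, assuming a per-host file named `ssh_config` under a per-host directory. Both the file name and the helper are hypothetical; only the use of `-F` with ssh and scp comes from the commit message.

```shell
# Hypothetical helper: emit "-F <file>" only if a per-host config exists,
# so hosts without custom options keep the default ssh behaviour.
ssh_config_opt() {
    conf="$1/$2/ssh_config"
    if [ -f "$conf" ]; then
        printf '%s %s\n' -F "$conf"
    fi
}

# Usage (not executed here); the same option works for both commands:
#   opt=$(ssh_config_opt /etc/chem-zfs-backup-server/zfs-rsync.d "$host")
#   ssh $opt "$host" true
#   scp $opt somefile "$host":/tmp/
```

Settings such as a non-standard port can then live in that file, which is why a separate SSHPORT option is no longer needed.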

6fda699c

move zfs-rsync template config files out of /etc · 871511b2
Dr Adam Thorn authored 1 year ago
These are static files provided by our package, not config files.
871511b2

move prepare scripts out of /etc · ced34c01

Dr Adam Thorn authored 1 year ago

These are not config files, and we should not be modifying the package-provided
versions of these files. I'm leaving symlinks behind to make sure we don't break
all our existing backups though!

ced34c01

remove unused PRE scripts · 3602fd4b

Dr Adam Thorn authored 1 year ago

I've checked all our live backup servers and made sure these aren't in use
(and updated some backup configs to achieve this in some cases)

3602fd4b

Jul 24, 2023

release for jammy · 66f74986
Dr Adam Thorn authored 1 year ago

66f74986

add option to specify SSHOPTIONS to rsync tasks · 53f5ba49

Dr Adam Thorn authored 1 year ago

This could/should probably supersede the specific option for SSHPORT
as I think usage of that is minimal or perhaps even zero, but we'd
have to check if that's in use and make suitable updates to config
files before removing it.

53f5ba49

May 31, 2023
- report which backup tasks are missing in the xymon report · 7563b636
  Dr Adam Thorn authored 1 year ago
  
  0.9-ch102
  
  7563b636
May 30, 2023

ensure xymon dot is red if we have a backup task that has never completed · 5fbb64c2

Dr Adam Thorn authored 1 year ago

We had been raising a failure report if we had never seen a successful backup
for a host. However, when we have a host with more than one backup task, we
can have the situation where one backup is working OK but the other has never
completed correctly. This led to a green report as we had a non-zero number
of rows, but we require number_of_good_backups == number_of_tasks !
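
The difference between the old and new conditions, as a simplified sketch. The function names are hypothetical and the real xymon script presumably considers more than these two counts.

```shell
# $1 = number of backup tasks configured for the host
# $2 = number of those tasks that have completed successfully at least once

# Old behaviour (as described above): any successful task gave green.
old_colour() {
    if [ "$2" -gt 0 ]; then echo green; else echo red; fi
}

# New behaviour: every task must have a successful backup.
new_colour() {
    if [ "$2" -eq "$1" ]; then echo green; else echo red; fi
}

# A host with two tasks, only one of which has ever completed:
echo "old: $(old_colour 2 1)"
echo "new: $(new_colour 2 1)"
```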

5fbb64c2

May 03, 2023

replace deprecated tempfile call with mktemp · d3c8ab49

Dr Adam Thorn authored 1 year ago

tempfile is a debian-ism and jammy warns us that its use is deprecated.
mktemp has been available on all our debian/ubuntu machines for a long time
via coreutils (e.g. it was definitely in wheezy and trusty, and I'm pretty
sure since before then too)
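
For reference, a minimal example of the replacement pattern. The variable name and the cleanup trap are illustrative, not taken from the repository's scripts.

```shell
# Before (deprecated, Debian-specific):
#   scratch=$(tempfile)

# After: mktemp creates the file securely and prints its path.
scratch=$(mktemp) || exit 1

# Remove the scratch file when the script exits, however it exits.
trap 'rm -f "$scratch"' EXIT

echo "example" > "$scratch"
```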

d3c8ab49
