- Jan 21, 2025
Dr Catherine Pitt authored
Dr Catherine Pitt authored
Dr Catherine Pitt authored
Dr Catherine Pitt authored
- Oct 23, 2024
Dr Catherine Pitt authored
Dr Catherine Pitt authored
- Sep 25, 2024
Dr Adam Thorn authored
Hmm. My newly created backup was failing because this directory was missing, but it surely isn't the first backup added since the current version of the prepare script has been in place. Yet I see us calling mkdir in the collection of old prepare scripts, and there's no attempt to create it in the script that creates new backups...
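A minimal sketch of an idempotent fix, assuming the missing directory can simply be created before each backup runs. The message doesn't name the directory, so the path in the usage line and the function name `ensure_backup_dir` are hypothetical.

```bash
#!/bin/bash
# ensure_backup_dir DIR: create DIR (and any missing parents) if it is absent.
# mkdir -p succeeds silently when DIR already exists, so this is safe to call
# on every run, not only when a backup is first created.
ensure_backup_dir() {
    local dir=$1
    mkdir -p -- "$dir"
}

# Hypothetical usage (the real directory is not named in the commit message):
# ensure_backup_dir "/path/to/the/missing/dir"
```

Because `mkdir -p` is harmless when the directory already exists, calling it in the script that creates new backups would be cheap insurance.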
- Aug 21, 2024
Dr Adam Thorn authored
We had been looking for the string 'none', but we deliberately run a zfs command that returns a parseable numeric value, so we get the value 0 when no quota is set. We had been handling this correctly for thisquota but not for parentquota.
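A minimal sketch of treating both quotas the same way. It assumes the values come from `zfs get -H -p -o value quota`, where `-p` prints exact numbers and an unset quota comes back as 0. The helper names `get_quota` and `has_quota` are hypothetical.

```bash
#!/bin/bash
# get_quota DATASET: print the dataset's quota in bytes.
# -H drops the header, -p prints exact numeric values, and -o value prints
# only the value column, so an unset quota is printed as 0, not "none".
get_quota() {
    zfs get -H -p -o value quota "$1"
}

# has_quota VALUE: succeed if VALUE (as printed by get_quota) is a real quota.
has_quota() {
    [ "$1" != "0" ]
}

# Hypothetical usage, checking thisquota and parentquota identically:
# thisquota=$(get_quota "$dataset")
# parentquota=$(get_quota "${dataset%/*}")
# if has_quota "$parentquota"; then echo "parent has a quota"; fi
```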
- Jul 10, 2024
Owen Johnson authored
Owen Johnson authored
- Jun 10, 2024
Dr Adam Thorn authored
We were referencing a db field that doesn't exist.
- May 24, 2024
Dr. Frank Lee authored
Dr. Frank Lee authored
Dr Catherine Pitt authored
Closes #6. When psql runs commands that insert rows into the database, it normally prints a message about what it did, e.g. "INSERT 0 1" if it inserted a row. This can be suppressed with -q. Several of the scripts use psql commands to get primary keys from the database, inserting the row if necessary. This can lead to the host id variable in the script being set to 'INSERT 0 1 <thehostid>', which causes problems when this variable is used in other SQL commands. This always used to work. I suspect what changed is our upgrade to Postgres 16 on the backup servers, but I'm struggling to see how, as Postgres 13 seems to behave the same way for me.
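A minimal sketch of a get-or-insert helper with the status message suppressed by `-q`. It assumes a `host` table with `id` and `hostname` columns and connection details supplied by the usual PG* environment variables. The function name `get_host_id` and both column names are hypothetical. As a belt-and-braces check, it also rejects any non-numeric output, so stray status text fails loudly instead of leaking into later SQL.

```bash
#!/bin/bash
# get_host_id HOSTNAME: print the host's primary key, inserting a row if needed.
# -q suppresses status messages such as "INSERT 0 1"; -t prints tuples only;
# -A gives unaligned output; ON_ERROR_STOP makes SQL errors a non-zero exit.
# The SQL is read from stdin so that the :'hostname' variable is interpolated
# (psql does not interpolate variables in -c strings).
get_host_id() {
    local hostname=$1 id
    id=$(psql -q -t -A -v ON_ERROR_STOP=1 -v hostname="$hostname" <<'SQL'
INSERT INTO host (hostname)
    SELECT :'hostname'
    WHERE NOT EXISTS (SELECT 1 FROM host WHERE hostname = :'hostname');
SELECT id FROM host WHERE hostname = :'hostname';
SQL
    ) || return 1
    # Refuse anything that is not a bare integer, e.g. "INSERT 0 1" + newline + "42".
    case $id in
        ''|*[!0-9]*)
            echo "unexpected psql output for $hostname: $id" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$id"
}
```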
- May 07, 2024
Dr. Frank Lee authored
- May 01, 2024
Dr. Frank Lee authored
Dr. Frank Lee authored
Dr. Frank Lee authored
- Oct 23, 2023
Dr Adam Thorn authored
We had somehow ended up with a few hosts on one backup server that appeared twice in `host`: one with disabled=f and some backup tasks, as expected, and one with disabled=t. I (necessarily) deleted the latter manually before adding this constraint on the live servers. (All servers listed in zfs_backup_server.conf have had the db table updated manually.)
Dr Adam Thorn authored
Dr Adam Thorn authored
I think the intention here was perhaps:
- create the new host, marked as disabled
- finish setting up backups
- once done, mark the host as enabled
...except the script runs with "set -e", so if something goes wrong we never get as far as enabling the host. That means not only do no backups run, but no failure reports are sent to xymon, so we don't even notice the failure. This is not good. Given that adding a row to the `host` table doesn't do much in and of itself, I see no reason not to mark the host as enabled from the start. We won't actually try to perform a backup until a `backup_task` has been created. Perhaps this leads to brief transient behaviour where xymon reports a backup as failing while the script is still running, but OTOH the xymon report for a new machine will always be red for "a while" until the first backup has actually run OK.
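An illustrative sketch of the failure mode. The three steps are stand-ins for the real script's commands, and the function name `add_host_demo` is hypothetical. Under `set -e`, the first failing step ends the script, so the final "enable" step is never reached.

```bash
#!/bin/bash
# add_host_demo: mimic "insert disabled host, set up, then enable" under set -e.
# A separate `bash -e` process is used so errexit applies however this function
# is called (bash ignores set -e inside if conditions and && / || lists).
add_host_demo() {
    bash -e -c '
        echo "insert host row (disabled=t)"
        false   # any intermediate step that fails, e.g. ssh-keyscan
        echo "mark host enabled"
    '
}
```

Calling `add_host_demo` prints only the first line and exits with status 1. If the row is inserted with disabled=f instead, the same failure still stops the script, but the host is already enabled, so the missing backup surfaces as a failure report.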
Dr Adam Thorn authored
We don't need to record the ssh host key in most cases, given that we generally deploy signed ssh host keys, but I suspect we might have the occasional backup target where that doesn't apply (e.g. clusters?). Regardless, if we can't scan the host key, the right behaviour is for the script to carry on and set up the backup. If the backup then fails due to the absent host key, we will be alerted and can take suitable action. Right now, the failure mode is that we silently don't finish setting up the backup, the backup never gets enabled, and we don't realise we don't have a backup. Eek.
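A minimal sketch of a non-fatal host-key scan, assuming the keys are appended to a known_hosts-style file. The function name `record_host_key` and its arguments are hypothetical. A failed scan produces a warning rather than a non-zero status, so a `set -e` script continues and still enables the backup.

```bash
#!/bin/bash
# record_host_key HOST KNOWN_HOSTS: append HOST's scanned keys to KNOWN_HOSTS.
# Never fails: a scan that yields no keys only prints a warning on stderr.
record_host_key() {
    local host=$1 known_hosts=$2 keys
    # -H hashes the hostnames in the output. Rather than relying on the exit
    # status alone, check whether any keys were actually printed.
    keys=$(ssh-keyscan -H "$host" 2>/dev/null) || true
    if [ -n "$keys" ]; then
        printf '%s\n' "$keys" >>"$known_hosts"
    else
        echo "warning: could not scan ssh host key for $host; continuing" >&2
    fi
    return 0
}
```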
- Sep 27, 2023
Dr Catherine Pitt authored
The Xymon test that reports on backup status runs every 45 minutes, but the status of an individual backup does not change very frequently: we try to back most things up a few times a day. This change makes individual backup statuses valid for three hours rather than the one hour they were previously. This is to avoid purple dots when we reboot a backup server and interrupt the 45-minute check, which then won't run again for another 45 minutes, leaving a 90-minute gap between reports for some hosts.
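A minimal sketch of a status message with a three-hour lifetime. It assumes statuses are sent with the `xymon` client using the `status+LIFETIME` form, and that dots in the hostname are written as commas. The test name `backup` and the function name `xymon_status_msg` are hypothetical.

```bash
#!/bin/bash
# xymon_status_msg HOST TEST COLOR TEXT: build a status message that Xymon
# should treat as valid for three hours ("+3h") instead of the default lifetime.
xymon_status_msg() {
    local host=$1 test=$2 color=$3 text=$4
    # Xymon expects dots in the hostname to be replaced by commas.
    printf 'status+3h %s.%s %s %s\n' "${host//./,}" "$test" "$color" "$text"
}

# Hypothetical usage from a Xymon client script, where $XYMON and $XYMSRV
# are provided by the client environment:
# "$XYMON" "$XYMSRV" "$(xymon_status_msg "$host" backup green "last backup OK")"
```

With a 45-minute check interval, a three-hour lifetime leaves room for a missed run or two before a status goes purple.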
- Sep 08, 2023
Dr Catherine Pitt authored
For machines like nest-backup and cerebro-backup, we have many backup tasks for the same host spread across several zpools, so move-machine-to-zpool.sh can't be used to migrate the contents of a failing zpool or disk. This adds a script to move an individual ZFS that is the target of a backup task to another zpool. It assumes all necessary parent ZFSes already exist on the target; if they don't, it fails. It does not yet clean up the old ZFS, as it hasn't had much use.
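A minimal sketch of moving one dataset between zpools, assuming a snapshot plus `zfs send | zfs receive` is an acceptable transfer method (the commit doesn't say how the real script does it). All function names are hypothetical. Like the script described, it fails if the parent ZFS is missing on the target and leaves the old ZFS in place.

```bash
#!/bin/bash
set -o pipefail   # a failing `zfs send` should fail the whole pipeline

# target_dataset SRC NEWPOOL: the same dataset path, rooted on NEWPOOL.
target_dataset() {
    printf '%s/%s\n' "$2" "${1#*/}"
}

# parent_dataset NAME: NAME with its last path component removed.
parent_dataset() {
    printf '%s\n' "${1%/*}"
}

# move_dataset SRC NEWPOOL: copy SRC to the same path on NEWPOOL.
move_dataset() {
    local src=$1 newpool=$2 dst snap
    dst=$(target_dataset "$src" "$newpool")
    # Refuse to create missing parents; they must already exist on the target.
    if ! zfs list -H -o name "$(parent_dataset "$dst")" >/dev/null 2>&1; then
        echo "parent of $dst does not exist; create it first" >&2
        return 1
    fi
    snap="$src@move-$(date +%Y%m%d%H%M%S)"
    zfs snapshot "$snap" || return 1
    # A plain send does not carry dataset properties; `zfs send -p` would.
    zfs send "$snap" | zfs receive "$dst"
}
```

For example, `move_dataset tank/backups/host1/home pool2` would create `pool2/backups/host1/home`. Cleaning up the old dataset is left out, matching the commit.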
- Sep 04, 2023
Dr Catherine Pitt authored
- Sep 01, 2023
Dr Catherine Pitt authored
We have started putting extra configuration for sshing to a host in a file in the /etc/chem-zfs-backup-server/zfs-rsync.d/$HOSTNAME directory. This updates the backup migration script to copy that as well as the main config file for the machine. I've chosen to copy the entire directory to catch other files we might want to add in future. There is often an 'exclude' file in there that's autogenerated by the prepare scripts, but copying that won't do any damage; it's just redundant because it will be regenerated when the backup runs.
- Aug 30, 2023
Dr Adam Thorn authored
1. The export is done via zfs set sharenfs, which means we shouldn't need to manage exports manually.
2. This part of the script does not work: it tries to unexport the old export by looking up the db record, which we have already updated to refer to the new zpool.
- Aug 23, 2023
Dr Catherine Pitt authored
- Aug 09, 2023
Dr Catherine Pitt authored
This adds a new config file that allows setting the command to use for 'rsync' and global options for that command. This is motivated by the need to use an alternative rsync command on Jammy machines, as the system one is too slow. The option for global rsync arguments was added as a way to add the '--trust-sender' flag to all backups, turning off certain checks that we suspect cause the slowdown, but it didn't help enough to fix the speed problem. Instead, we are going to use our own package of an older rsync from before the checking code was added, which of course doesn't support --trust-sender, so the global args are left blank.
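A minimal sketch of how such a config file might be loaded, assuming it is a sourced shell fragment that sets two variables. The names `RSYNC_COMMAND` and `RSYNC_GLOBAL_ARGS`, the helper functions, and the example path are all hypothetical. Without a config file, it falls back to the system rsync with no extra arguments.

```bash
#!/bin/bash
# load_rsync_config FILE: set RSYNC_COMMAND and RSYNC_GLOBAL_ARGS, defaulting
# to the system rsync with no global arguments unless FILE overrides them.
load_rsync_config() {
    RSYNC_COMMAND=rsync
    RSYNC_GLOBAL_ARGS=()
    if [ -r "$1" ]; then
        # shellcheck source=/dev/null
        . "$1"
    fi
}

# run_rsync ARGS...: run the configured rsync with the global arguments first.
run_rsync() {
    "$RSYNC_COMMAND" "${RSYNC_GLOBAL_ARGS[@]}" "$@"
}

# A hypothetical config file might contain:
#   RSYNC_COMMAND=/opt/local-rsync/bin/rsync
#   RSYNC_GLOBAL_ARGS=()   # or (--trust-sender) with an rsync that supports it
```

Keeping the global arguments in a bash array means flags are passed through as separate words without relying on word-splitting a string.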
- Jul 27, 2023
Dr Adam Thorn authored
Dr Adam Thorn authored
This was unintentionally removed in 53f5ba
Dr Adam Thorn authored
Custom options need to be in a file passed via -F because we want to specify options for both ssh and scp. They don't share a compatible set of command-line options, but both accept -F. This supersedes 53f5ba49; I had only deployed SSHOPTIONS for one host, which I've updated. This also removes the SSHPORT option, which had only been used in the config for one host, which I've also updated.
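A minimal sketch of building the `-F` arguments once and reusing them for both ssh and scp. It assumes a per-host file inside the /etc/chem-zfs-backup-server/zfs-rsync.d/$HOSTNAME directory. The file name `ssh_config` and the function name `ssh_config_args` are hypothetical.

```bash
#!/bin/bash
# ssh_config_args HOST [CONFDIR]: print "-F" and the per-host ssh config path,
# one per line, if that file exists; print nothing otherwise.
ssh_config_args() {
    local host=$1 confdir=${2:-/etc/chem-zfs-backup-server/zfs-rsync.d}
    local cfg="$confdir/$host/ssh_config"   # hypothetical file name
    if [ -f "$cfg" ]; then
        printf '%s\n' -F "$cfg"
    fi
}

# Hypothetical usage: the same arguments work for both ssh and scp.
# mapfile -t SSH_ARGS < <(ssh_config_args "$host")
# ssh "${SSH_ARGS[@]}" "root@$host" true
# scp "${SSH_ARGS[@]}" somefile "root@$host:/tmp/"
```

One consequence worth keeping in mind: when ssh is given `-F`, it ignores the system-wide /etc/ssh/ssh_config, so the per-host file needs to carry any settings still required from there.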
Dr Adam Thorn authored
These are static files provided by our package, not config files.
Dr Adam Thorn authored
These are not config files, and we should not be modifying the package-provided versions of these files. I'm leaving symlinks behind to make sure we don't break all our existing backups though!
Dr Adam Thorn authored
I've checked all our live backup servers and made sure these aren't in use (updating some backup configs to achieve this where necessary).
- Jul 24, 2023
Dr Adam Thorn authored
Dr Adam Thorn authored
This could (or should) probably supersede the specific SSHPORT option, as I think usage of that is minimal or perhaps even zero, but we'd have to check whether it's in use and make suitable updates to config files before removing it.
- May 31, 2023
Dr Adam Thorn authored
- May 30, 2023
Dr Adam Thorn authored
We had been raising a failure report if we had never seen a successful backup for a host. However, when a host has more than one backup task, one backup can be working OK while another has never completed correctly. This led to a green report because we had a non-zero number of rows, but we require number_of_good_backups == number_of_tasks!
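A minimal sketch of the corrected rule, assuming the two counts have already been fetched from the database (the query isn't shown). The function name `backup_colour` is hypothetical. Treating a host with zero tasks as red is an assumption of this sketch, not something the commit specifies.

```bash
#!/bin/bash
# backup_colour TASKS GOOD: TASKS is the host's number of backup tasks and
# GOOD is how many of those tasks have a successful backup on record.
# Old rule: green whenever GOOD > 0. New rule: every task must be good.
backup_colour() {
    local tasks=$1 good=$2
    if [ "$tasks" -gt 0 ] && [ "$good" -eq "$tasks" ]; then
        echo green
    else
        echo red
    fi
}
```

Under the old rule, a host with two tasks and one good backup would have been green. Here `backup_colour 2 1` gives red.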
- May 03, 2023
Dr Adam Thorn authored
tempfile is a Debian-ism, and Jammy warns us that its use is deprecated. mktemp has been available on all our Debian/Ubuntu machines for a long time via coreutils (e.g. it was definitely in wheezy and trusty, and I'm pretty sure it was there before then too).
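A minimal sketch of the usual replacement pattern. `mktemp` creates a uniquely named file that only the owner can read or write and prints its path. The cleanup trap is an addition for illustration; the commit doesn't say whether the scripts clean up this way.

```bash
#!/bin/bash
# Create a private temporary file with mktemp (coreutils) instead of
# Debian's tempfile, and remove it when the script exits.
tmpfile=$(mktemp) || exit 1
trap 'rm -f -- "$tmpfile"' EXIT

echo "scratch data" >"$tmpfile"
```

For a temporary directory, `mktemp -d` works the same way, with `rm -rf` in the trap instead.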