- Jul 08, 2021
-
-
Dr Adam Thorn authored
We don't always need the role data if we assume that we'll be doing a pg_restore in conjunction with an ansible role which creates all required roles. But having a copy of the role data will never hurt! It also gives us a straightforward way of restoring a database to a standalone postgres instance without having to provision a dedicated VM with the relevant ansible roles.
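As a rough illustration (the paths and database name below are made up, not the script's actual layout), the role data can be captured alongside the normal dump like this:

```bash
# Illustrative only: paths and database name are not the script's real layout.
BACKUPDIR=/var/backups/postgres

# Roles (and their memberships/passwords) are global objects and are not
# included in pg_dump output, so dump them separately with pg_dumpall.
pg_dumpall --roles-only > "$BACKUPDIR/roles.sql"

# The database itself, in custom format so it can be fed to pg_restore later.
pg_dump --format=custom --file="$BACKUPDIR/mydb.dump" mydb
```

Restoring to a standalone instance would then be roughly `psql -f roles.sql` followed by a `pg_restore` of the database dump.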
-
Dr Adam Thorn authored
At present we use a myriad of one-off, per-host scripts to do a pg_dump, and they all do (or probably should do) the same thing. In combination with setting options in the host's backup config file, I think this single script covers all our routine pg backups.
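A minimal sketch of what such a shared script might look like, assuming it runs with superuser access to the cluster; the directory and variable names are purely illustrative and the real script's config handling may well differ:

```bash
#!/bin/bash
# Sketch of a single shared pg_dump wrapper driven by per-host options.
set -u

DUMPDIR=${DUMPDIR:-/var/backups/postgres}
mkdir -p "$DUMPDIR"

# Dump every non-template database, each in custom format.
for db in $(psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
    pg_dump --format=custom --file="$DUMPDIR/${db}.dump" "$db"
done

# Role definitions are per-cluster, so dump them once.
pg_dumpall --roles-only > "$DUMPDIR/roles.sql"
```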
-
Dr Adam Thorn authored
We were just passing the hostname. Adding extra args should not impact any existing script, but will let us write better, more maintainable, deduplicated PRE scripts.
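A hypothetical PRE script using the extra arguments might look like the following; the meaning of the second and third arguments is an assumption for illustration only:

```bash
#!/bin/bash
# Hypothetical PRE script. Previously only the hostname was passed, so
# per-host logic had to be hard-coded; with extra arguments one script
# can be shared. What $2 and $3 carry here is purely illustrative.
HOST=$1
TASK=${2:-}
TARGET=${3:-}

echo "Running pre-backup steps for task '$TASK' ($TARGET) on $HOST"
```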
-
- Jun 29, 2021
-
-
Dr Catherine Pitt authored
This came about because a disk has failed on nest-backup, which only has subdirectory backups of nest-filestore-0, so move-machine.sh was not going to be helpful: it assumes all tasks for a machine are on the same zpool, which isn't true there. In this case I did the move by hand, but I have sketched out the steps in the script in the hope that next time we have to do this we'll start by reading the script and running bits by hand, then improve the script a bit, and continue until it's usable.
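For reference, the by-hand move amounts to something like the following zfs send/receive outline; the pool and dataset names are made up:

```bash
# Rough outline of moving one task's backups to another zpool by hand.
SRC=tank0/backup/nest-filestore-0
DST=tank1/backup/nest-filestore-0

zfs snapshot "${SRC}@move"                         # a snapshot to send from
zfs send -R "${SRC}@move" | zfs receive -u "$DST"  # copy it, unmounted
# ...check the copy, repoint the backup config at the new pool, re-export
# the filesystem if needed, and only then destroy the old dataset:
# zfs destroy -r "$SRC"
```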
-
- Jun 18, 2021
-
-
Dr Adam Thorn authored
I don't think there's a sensible default quota; the value for a workstation will be very different from that for a tiny VM, for example.
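So the quota gets set explicitly per machine, along these lines (the dataset names and values are only examples):

```bash
# Per-machine quotas; values are illustrative, not defaults.
zfs set quota=1T  pool/backup/big-workstation
zfs set quota=20G pool/backup/tiny-vm
```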
-
- Jun 08, 2021
-
-
Dr Adam Thorn authored
This is needed on focal if a client is to be able to access snapshots over NFS. From the docs I don't see why we didn't also need this option on xenial, but empirically we need it on focal (see e.g. RT-207229).
-
- May 12, 2021
-
-
Dr Catherine Pitt authored
The generation of the command to unexport NFS filesystems could generate an invalid command. Leading spaces were not being stripped, and in cases where there is more than one backup target for a machine we need to unexport every target. Because we also had 'set -e' in operation at this point, the script would fail there and never clean up the moved ZFS. I don't mind if we fail to unexport; if that's subsequently a problem for removing the ZFS then the script will fail at that point. This change makes the script generate better exportfs -u commands and not exit if they fail.
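A sketch of the safer unexport loop, with $CLIENT and $TARGETS as illustrative names rather than the script's real variables:

```bash
# Strip whitespace, unexport every target, and don't let one failure
# abort the rest of the cleanup.
while read -r target; do
    target=$(echo "$target" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
    [ -z "$target" ] && continue
    exportfs -u "${CLIENT}:${target}" || echo "warning: failed to unexport ${target}"
done <<< "$TARGETS"
```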
-
- Apr 30, 2021
-
-
Dr Catherine Pitt authored
The code used to open a database connection for each thread and leave them open for as long as the scheduler ran. This worked reasonably well until we moved to PostgreSQL 13 on Focal, although the scheduler would fail if the database was restarted because there was no logic to reconnect after a connection dropped.

On Focal/PG13 the connection for the 'cron' thread steadily consumes memory until it has exhausted everything in the machine. This appears to be a Postgres change rather than a Perl DBI change: the problem can be reproduced by sitting in psql and running 'select * from backup_queue' repeatedly. Once or twice a minute an instance of this query will cause the connection to consume another MB of RAM which is not released until the database connection is closed. The cron thread runs that query every two seconds. My guess is it's something peculiar about the view that query selects from - the time interval thing is interesting. This needs more investigation.

But in the meantime I'd like to have backup servers that don't endlessly gobble RAM, so this change makes the threads connect to the database only when they need to, and closes the connection afterwards. This should also make things work better over database restarts, but that's not been carefully tested.
-
- Dec 11, 2020
-
-
Dr Catherine Pitt authored
Can't say "keys $foo"; must say "keys %{$foo}" to get the keys of the hash pointed to by the hash reference $foo. This broke with the Perl shipped in focal.
-
- Oct 06, 2020
-
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
NB this is so we can put things in LOGFILE at earlier points than the script does at present
-
Dr Adam Thorn authored
We capture the error code at a few points in the script, and then (intend to) log something about what went wrong. But if we set -e we immediately bail before logging anything useful!
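The pattern the script relies on is roughly the following (rsync and the variable names are illustrative); with 'set -e' the script would exit on the failing command before the log line ever runs:

```bash
# Capture the exit code ourselves and log it, rather than letting
# 'set -e' abort the script silently.
rsync -a "$SRC/" "$DEST/"
rc=$?
if [ "$rc" -ne 0 ]; then
    echo "$(date): rsync exited with code $rc" >> "$LOGFILE"
fi
```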
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
Resolves #1
-
- Apr 07, 2020
-
-
A.J. Hall authored
-
- Dec 18, 2019
-
-
Dr Adam Thorn authored
-
- Jul 30, 2019
-
-
Dr Adam Thorn authored
We now have, for example, a cron job on cerebro-backup which calls these scripts, and /sbin is not on the $PATH there.
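One way to make the scripts cron-safe is to set the PATH explicitly near the top, or to call the binaries by full path; a sketch:

```bash
# Cron runs with a minimal PATH, so include the sbin directories
# (alternatively, refer to binaries by full path, e.g. /sbin/zfs).
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
export PATH
```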
-
- Jul 23, 2019
-
-
Dr Catherine Pitt authored
For cerebro-backup, where we do backups by directory rather than by whole server: we need one machine being backed up (cerebro-filestore) but a separate task per directory.
-
- Apr 23, 2019
-
-
Dr Catherine Pitt authored
The new-backup-rsnapshot script understands a 'postgres' argument, but this set up a postgres backup in an old style that we no longer use. This change updates it to do some of the work of setting up a new-style postgres backup and to tell the user what else they might need to edit to make it go; the details vary quite a lot depending on the server.
-
- Jan 16, 2019
-
-
Dr Adam Thorn authored
Let's not worry if we haven't backed up a machine that has been offline for 3 months (instead of 6 months)
-
- Nov 06, 2018
-
-
Dr Adam Thorn authored
-
- Oct 22, 2018
-
-
Dr Adam Thorn authored
-
- Oct 18, 2018
-
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
From the comment in the script, I think this was an attempt to ignore an error about setting GIDs. The only place I saw that particular error occurring in practice was for loopback-mounted ISOs on OS builders, and we shouldn't be backing those up in the first place! Exit code 23 is "partial transfer due to error" and so can be due to problems other than failing to set GIDs.
-
- Jun 19, 2018
-
-
Dr Adam Thorn authored
-
- Jan 11, 2018
-
-
Dr Adam Thorn authored
-
- Aug 11, 2017
-
-
Dr Adam Thorn authored
The script runs some SQL which ultimately determines the time of the last backup that exited with code zero. If there has never been a successful backup, we ended up reporting a clear dot. Whilst this may lead to a few transitory red dots immediately after adding a new host, I think this is preferable to not realising for months that a host isn't being backed up! See ticket 154527.
-
- May 22, 2017
-
-
Dr Catherine Pitt authored
-
- Mar 29, 2017
-
-
Dr Adam Thorn authored
For offline machines, use the time between last backup and last change of conn dot status to decide on backup status
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
-
- Mar 28, 2017
-
-
Dr Adam Thorn authored
-
- Feb 14, 2017
-
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
-
- Jan 12, 2017
-
-
Dr Adam Thorn authored
-
Dr Adam Thorn authored
-
- Dec 08, 2016
-
-
Dr Adam Thorn authored
-
- Dec 02, 2016
-
-
Dr Adam Thorn authored
-
- Sep 07, 2016
-
-
Dr. Frank Lee authored
-