  1. 25 May, 2022 1 commit
  2. 24 May, 2022 6 commits
    • Stop returning "true" from default prepare script · b4b89ed3
      Dr Adam Thorn authored
      This has been silently masking failures, which is a Bad Thing. The only
      errors I've spotted so far are ones hopefully fixed by ad91da90, which
      has the effect that we've been backing up more things than we needed
      (which is better than the opposite possibility, at least!)
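The masking effect is easy to reproduce. A minimal sketch, not the real prepare script: both function names are hypothetical, and `false` stands in for any step that fails.

```shell
#!/bin/sh
# Hypothetical stand-ins for a prepare script; not the real script.

# Ends with "true", so the failing step is never reported to the caller.
prepare_masked() {
    false   # stands in for a step that fails
    true
}

# Returns as soon as the failing step fails, passing the failure on.
prepare_honest() {
    false || return 1
    echo "not reached"
}

masked_status=0
prepare_masked || masked_status=$?

honest_status=0
prepare_honest || honest_status=$?

echo "masked: $masked_status, honest: $honest_status"
```

The first function reports success even though a step failed; the second lets the caller see the failure.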
    • Fix generation of list of package files to exclude in the standard prepare script · ad91da90
      Dr Adam Thorn authored
      c714a331 introduced an annoying bug here. If the filenames piped to xargs include
      an apostrophe, xargs will complain and stop. We have thus, in practice, been
      including all files that appear in the list after
      /usr/share/sounds/ubuntu/ringtones/Sam's Song.ogg
      !! Also, we've not been properly handling filenames with spaces in, and perhaps
      other filenames too. We thus null-terminate the filenames ourselves.
      Because I want to get to the point where we can have a sensible return code
      from the prepare script, this commit also adds some set -e commands to try
      to ensure errors bubble up. This is a little tedious to achieve due to all
      the subshells in this script.
      For the same reason, we split off the "diff" command into a "set +e" block
      because diff returns non-zero if differences are found. We do not consider
      that to be an error!
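A rough sketch of the two points above, under invented file names and a throwaway temporary directory (none of this is taken from the prepare script itself): plain xargs gives up at the apostrophe, NUL-terminated input does not, and diff is run outside `set -e` so that "differences found" is not treated as a failure.

```shell
#!/bin/sh
# Sketch only: the file names and temporary layout are invented.
workdir=$(mktemp -d)
touch "$workdir/plain.ogg" "$workdir/Sam's Song.ogg" "$workdir/after.ogg"

# Newline-separated list, in the style of a package *.list file.
printf '%s\n' "$workdir/plain.ogg" "$workdir/Sam's Song.ogg" \
    "$workdir/after.ogg" > "$workdir/list"

# Plain xargs treats the apostrophe as an unmatched quote and stops.
naive_status=0
xargs ls < "$workdir/list" > /dev/null 2>&1 || naive_status=$?

# NUL-terminated names survive apostrophes and spaces alike.
handled=$(tr '\n' '\0' < "$workdir/list" | xargs -0 ls | wc -l)

set -e
printf 'a\n' > "$workdir/old"
printf 'b\n' > "$workdir/new"

# diff exits 1 when the inputs differ, so it runs in a "set +e" block;
# only a status above 1 means diff itself failed.
set +e
diff "$workdir/old" "$workdir/new" > /dev/null
diff_status=$?
set -e
if [ "$diff_status" -gt 1 ]; then
    echo "diff failed" >&2
    exit "$diff_status"
fi
set +e

rm -rf "$workdir"
echo "naive xargs status: $naive_status; handled with -0: $handled; diff status: $diff_status"
```

Checking for a status above 1 keeps genuine diff errors (for example an unreadable file) visible while still allowing differences.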
    • Improve check of whether mysql is(/should be) running in default prepare script · 40f64d5c
      Dr Adam Thorn authored
      We think the intention of the old version of this block is
      "if mysql is running, dump the databases".
      The check has been buggy for a long long time: it reads the contents
      of my.cnf ... which nowadays just has some !includedir directives.
      This led to setting
      and, it turns out, [ -S "" ] returns true.
      The main intention of this part of the prepare script is to back up
      vaguely normal machines running a simple mysql database, such as
      small group webservers. We thus don't need to consider every
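For reference, a small sketch of how an empty value can slip through a socket test. The original test line is not shown, so the variable name `socket` and the unquoted expansion are assumptions. With an empty, unquoted variable the shell is left with `[ -S ]`, a one-argument test that succeeds because the string "-S" is non-empty; the quoted form fails as expected.

```shell
#!/bin/sh
# "socket" is a hypothetical name; it stands in for a value parsed from
# my.cnf that came back empty.
socket=""

# Unquoted: the empty expansion vanishes, leaving [ -S ], which is a
# one-argument test that is true because the string "-S" is non-empty.
# shellcheck disable=SC2086
if [ -S $socket ]; then unquoted=true; else unquoted=false; fi

# Quoted: [ -S "" ] asks whether "" is a socket, which it is not.
if [ -S "$socket" ]; then quoted=true; else quoted=false; fi

echo "unquoted: $unquoted, quoted: $quoted"
```

Quoting the expansion, or checking that the value is non-empty first, avoids the false positive.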
    • Cease trying to configure dateext logrotation in prepare script · dd28c9af
      Dr Adam Thorn authored
      On servers we configure this via ansible.
      On workstations it is not having the desired effect because the stock
      logrotate.conf includes the line ...
      and so the simple grep thinks it's already configured! I'll add an
      ansible task in due course to actually configure this.
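To illustrate the kind of false positive described, here is a sketch against an invented sample config. The actual line from the stock logrotate.conf is not reproduced in the message, so the commented-out `#dateext` below is purely an assumption for the example.

```shell
#!/bin/sh
# Assumption for illustration: the sample config is invented, and the
# commented-out "#dateext" line is a guess at the kind of line that can
# fool a simple grep. The source does not show the actual line.
conf=$(mktemp)
printf '%s\n' 'weekly' 'rotate 4' '#dateext' > "$conf"

# A bare substring match cannot tell a comment from an active directive.
if grep -q 'dateext' "$conf"; then simple=configured; else simple=missing; fi

# Anchoring to the start of the line (allowing indentation) skips comments.
if grep -Eq '^[[:space:]]*dateext([[:space:]]|$)' "$conf"; then
    anchored=configured
else
    anchored=missing
fi

rm -f "$conf"
echo "simple grep: $simple, anchored grep: $anchored"
```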
  3. 25 Apr, 2022 1 commit
  4. 10 Mar, 2022 1 commit
  5. 09 Mar, 2022 4 commits
    • Release 0.9-ch90 · 63abf952
      Dr Adam Thorn authored
    • Add script to move a backup to another server · 3d245d54
      Dr Adam Thorn authored
      i.e. send the ZFS, update db records and copy the config files. The
      sql inserts should broadly mirror those done when setting up a new
      backup, though with field values matching those in the source database
      rather than just using the defaults.
      This script has so far only been tested for the case of moving a "simple"
      backup where a host has a single backup task and no special config.
      It's quite likely there'll be bugs to fix for other cases that
      we'll find in due course.
    • Remove move-machine.sh script · 56a40eed
      Dr Adam Thorn authored
      I'm about to add a script to send a backup to a different backup server.
      It's thus probably best if the script names describe their functions in a
      little more detail.
  6. 20 Dec, 2021 1 commit
  7. 17 Nov, 2021 3 commits
  8. 14 Jul, 2021 1 commit
    • Prepend reporting lines with the zfs target name · 6e536df5
      Dr Adam Thorn authored
      This will let us use zfs_target as the name of a subtest which
      in turn means we would be able to separately log and graph multiple
      backup targets associated with a single host.
      This change does not affect the current parsing performed when
      we input data into postgres: it uses non-anchored regexps to
      identify SpaceUsed etc so prepending extra text won't change
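A small sketch of why an unanchored pattern is unaffected by a prefix. The reporting line format is not shown in the message, so both sample lines and the helper names are hypothetical; only the SpaceUsed field name comes from the text above.

```shell
#!/bin/sh
# Hypothetical reporting lines: the real format is not shown in the source.
old_line='SpaceUsed: 1234'
new_line='tank/backups/host1: SpaceUsed: 1234'

# Unanchored pattern: matches wherever "SpaceUsed" appears in the line.
extract_unanchored() {
    printf '%s\n' "$1" | grep -Eo 'SpaceUsed: [0-9]+'
}

# Anchored pattern: only matches when the line starts with "SpaceUsed".
extract_anchored() {
    printf '%s\n' "$1" | grep -Eo '^SpaceUsed: [0-9]+'
}

echo "unanchored, old line: $(extract_unanchored "$old_line")"
echo "unanchored, new line: $(extract_unanchored "$new_line")"
echo "anchored,   old line: $(extract_anchored "$old_line")"
echo "anchored,   new line: $(extract_anchored "$new_line")"
```

Only the anchored variant loses the value once the target name is prepended.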
  9. 09 Jul, 2021 1 commit
    • Partial fix for behaviour where we see multiple backups for one task running at once · 3941a9df
      Dr Adam Thorn authored
      In some versions of backup_queue (I think just on splot4 now), we use
      backup_log.isrunning as part of the logic to determine if a task should be
      enqueued. The problem is that scheduler.pl makes three writes on the table:
      1) an insert when the task is queued (a trigger sets isrunning='t' here)
      2) an update to set started_processing when the task begins (a trigger
         sets isrunning='f' here!!!!)
      3) an update to set ended_processing when the task finishes (a trigger
         again sets isrunning='f' here)
      Thus, being careful to only set isrunning='f' when a backup task is finished
      (i.e. when we set ended_processing=now() in scheduler.pl) seems sensible, and
      empirically does seem to lead to the right backup_queue without duplicates.
      This commit will only affect new setups of backup servers; the change has been
      deployed to live servers with an ad hoc script I've run.
      I think we only see this on splot4 because it has a very different definition of
      the backup_queue view to a) the one defined in this file, b) the one that's on
      all the other backup servers. If I just try to replace the view on splot4, though,
      any attempt to select from it just times out so there may be other relations on
      splot4 that need updating too.
      NB the obvious thing missing on splot4 is
      WHERE ((backup_log.backup_task_id = a.backup_task_id) AND (backup_log.ended_processing IS NULL))) < 1))
      which feels like a hack but nonetheless ensures in practice that we don't get
      duplicate queued tasks.
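The three writes and their effect on isrunning can be laid out as a toy model. This is a shell simulation only: the real logic lives in database triggers, and the function and event names here are invented for the sketch.

```shell
#!/bin/sh
# Toy model of the three writes; the real logic lives in database triggers.
# Function and event names are invented for this sketch.

# Old behaviour: every update clears isrunning, including the "started" one.
old_trigger() {
    case "$1" in
        queued)  echo t ;;   # insert when the task is queued
        started) echo f ;;   # update setting started_processing
        ended)   echo f ;;   # update setting ended_processing
    esac
}

# New behaviour: only the update that sets ended_processing clears the flag,
# so the value set at insert time is still in place while the task runs.
new_trigger() {
    case "$1" in
        queued)  echo t ;;
        started) echo t ;;
        ended)   echo f ;;
    esac
}

for event in queued started ended; do
    echo "$event: old isrunning=$(old_trigger "$event"), new isrunning=$(new_trigger "$event")"
done
```

Under the old behaviour a task that has started but not finished looks idle, which is the window in which a duplicate could be enqueued.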
  10. 08 Jul, 2021 3 commits
    • Ensure pg-dump-script includes a dump of roles · 67d141b5
      Dr Adam Thorn authored
      We don't always need the role data, if the presumption is that we'll
      be doing a pg_restore in conjunction with an ansible role which creates
      all required roles. But, having a copy of the role data will never hurt!
      It also gives us a straightforward way of restoring a database to a
      standalone postgres instance without having to have provisioned a
      dedicated VM with the relevant ansible roles.
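As a rough illustration of the idea (not the contents of pg-dump-script, which is not shown here), role data comes from pg_dumpall because roles are cluster-wide, while the database itself is dumped with pg_dump. The database name, output directory and the `run` helper are placeholders; commands are printed rather than executed unless RUN=1 is set.

```shell
#!/bin/sh
# Sketch only: database name and output directory are placeholders, and
# this is not the repository's pg-dump-script.
dbname=${DBNAME:-exampledb}
outdir=${OUTDIR:-/tmp/pgdump-example}

# Print the command by default; execute it only when RUN=1.
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Roles are cluster-wide, so they come from pg_dumpall rather than pg_dump.
run pg_dumpall --roles-only --file="$outdir/roles.sql"

# Custom-format dump of a single database, restorable with pg_restore.
run pg_dump --format=custom --file="$outdir/$dbname.dump" "$dbname"
```

On a standalone instance the roles file would be loaded with psql before running pg_restore on the database dump.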
    • Add a script to do a postgres backup via pg_dump · 5b4a8757
      Dr Adam Thorn authored
      At present we use myriad one-off per host scripts to do a pg_dump,
      and they all do (or probably should do) the same thing. In combination
      with setting options in the host's backup config file, I think
      this single script covers all our routine pg backups.
    • Call PRE and POST with same args as zfs-rsync.sh · e01d7ebc
      Dr Adam Thorn authored
      Previously we were just passing the hostname. Adding extra args should
      not impact any existing script, but will let us write better/
      more maintainable/deduplicated PRE scripts.
  11. 29 Jun, 2021 1 commit
    • Add an outline script for moving a whole zpool · 7ae60f97
      Catherine Pitt authored
      This came about because a disk has failed on nest-backup, which only has
      subdirectory backups of nest-filestore-0 and so move-machine.sh was not
      going to be helpful - it assumes all tasks for a machine are on the same
      zpool which isn't true there. In this case I did the move by hand, but
      have sketched out the steps in the script in the hope that next time we
      have to do this we'll do it by looking at the script and running bits by
      hand, then improve the script a bit, and continue until it's usable.
  12. 18 Jun, 2021 3 commits
  13. 15 Jun, 2021 1 commit
    • Add prepare-redhat script · 83e54e26
      Catherine Pitt authored
      prepare-nondebian does not work on RedHat machines running MySQL as the
      paths are different, so providing a fixed version. prepare-nondebian has
      historically been used more widely than just RedHat, hence the decision
      to provide a RedHat-specific version and not just edit it.
  14. 08 Jun, 2021 1 commit
    • Add crossmnt to list of default NFS options · ca4eaf3b
      Dr Adam Thorn authored
      This is needed on focal if a client is to be able to access snapshots over NFS.
      From the docs I don't see why we didn't also need this option on xenial,
      but empirically, we need it on focal. (e.g. RT-207229)
  15. 12 May, 2021 1 commit
    • Fix a bug in the move-machine script · 75db08dc
      Catherine Pitt authored
      The generation of the command to unexport NFS filesystems could generate
      an invalid command. Leading spaces were not being stripped, and in cases
      where there is more than one backup target for a machine we need to
      unexport every target. Because we also had 'set -e' in operation at this
      point, the script would fail there and never clean up the moved ZFS. I
      don't mind if we fail to unexport; if that's subsequently a problem for
      removing the ZFS then the script will fail at that point.
      This change makes the script generate better exportfs -u commands and
      not exit if they fail.
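The pattern described can be sketched as follows. This is not the move-machine code: `unexport` is a stub standing in for `exportfs -u` so the example runs anywhere, it is made to fail on purpose, and the client and path names are invented.

```shell
#!/bin/sh
# Sketch only. "unexport" is a stub that stands in for "exportfs -u" so the
# example runs anywhere; client and path names are invented.
set -e

unexport() {
    echo "exportfs -u $1"
    return 1   # pretend the unexport failed
}

# Target list with stray leading spaces, one line per backup target.
targets='  client1:/tank/host1/root
   client1:/tank/host1/home'

attempted=0
failures=0
# The here-document feeds the loop without a subshell, so counters survive.
while IFS= read -r line; do
    # Strip leading whitespace so the generated command is valid.
    target=$(printf '%s\n' "$line" | sed 's/^[[:space:]]*//')
    [ -n "$target" ] || continue
    attempted=$((attempted + 1))
    # The "||" branch keeps set -e from aborting when one unexport fails.
    unexport "$target" || failures=$((failures + 1))
done <<EOF
$targets
EOF

set +e
echo "attempted: $attempted, failed: $failures"
```

Every target is attempted, and the script carries on to its cleanup steps even when an unexport fails.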
  16. 30 Apr, 2021 1 commit
    • Make database connections short-lived · e40c1a55
      Catherine Pitt authored
      The code used to open a database connection for each thread and leave
      them open for as long as the scheduler ran. This worked reasonably well
      until we moved to PostgreSQL 13 on Focal, although the scheduler would
      fail if the database was restarted because there was no logic to
      reconnect after a connection dropped.
      On Focal/PG13 the connection for the 'cron' thread steadily consumes
      memory until it has exhausted everything in the machine. This appears to
      be a Postgres change rather than a Perl DBI change: the problem can be
      reproduced by sitting in psql and running 'select * from backup_queue'
      repeatedly. Once or twice a minute an instance of this query will cause
      the connection to consume another MB of RAM which is not released until
      the database connection is closed. The cron thread runs that query every
      two seconds. My guess is it's something peculiar about the view that
      query selects from - the time interval thing is interesting.
      This needs more investigation.
      But in the meantime I'd like to have backup servers that don't endlessly
      gobble RAM, so this change makes the threads connect to the database
      only when they need to, and closes the connection afterwards. This
      should also make things work better over database restarts but that's
      not been carefully tested.
  17. 18 Jan, 2021 1 commit
  18. 06 Jan, 2021 1 commit
    • Pipe list of package-files through realpath, to resolve any symlinks · c714a331
      Dr Adam Thorn authored
      As of focal a bunch of top-level dirs are symlinks (eg /lib -> /usr/lib) but
      the deb packages still deploy files to the symlink rather than the real dir.
      Thus if we just take the contents of all the *.list files we end up not
      excluding lots of files that are in fact provided by debs
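A minimal sketch of the effect, using a throwaway tree that mimics the focal layout rather than the real file system. All paths are invented for the example, and the list is passed NUL-terminated so unusual file names do not trip up xargs.

```shell
#!/bin/sh
# Sketch only: builds a throwaway tree that mimics the focal layout, where
# /lib is a symlink to /usr/lib. All paths are invented for the example.
root=$(mktemp -d)
root=$(realpath "$root")   # normalise in case the temp dir is itself a symlink
mkdir -p "$root/usr/lib"
ln -s usr/lib "$root/lib"
touch "$root/usr/lib/libexample.so"

# A package list records the path through the symlink...
listed="$root/lib/libexample.so"

# ...whereas realpath reports where the file actually lives.
resolved=$(printf '%s\n' "$listed" | tr '\n' '\0' | xargs -0 realpath)

echo "listed:   $listed"
echo "resolved: $resolved"
rm -rf "$root"
```

Only the resolved path matches what a walk of the real directories would report, so the exclusion list needs to be built from resolved paths.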
  19. 11 Dec, 2020 3 commits
  20. 16 Nov, 2020 1 commit
  21. 09 Nov, 2020 1 commit
  22. 06 Nov, 2020 3 commits