  1. Jul 27, 2023
  2. Jul 24, 2023
    • add option to specify SSHOPTIONS to rsync tasks · 53f5ba49
      Dr Adam Thorn authored
      This could/should probably supersede the specific option for SSHPORT,
      as I think usage of that is minimal or perhaps even zero, but we'd
      have to check whether it's in use and make suitable updates to config
      files before removing it.
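A minimal sketch of how a general SSHOPTIONS setting could subsume SSHPORT, as the commit above suggests. The function and parameter names are hypothetical and are not taken from the repository; the only real interfaces assumed are rsync's `-e` option and ssh's `-p` and `-o` options.

```python
import shlex


def build_rsync_command(source, dest, ssh_port=None, ssh_options=""):
    """Build an rsync argv list (hypothetical helper, not the repo's code).

    ssh_port mirrors a dedicated SSHPORT-style setting; ssh_options mirrors
    a free-form SSHOPTIONS-style string. Both end up in rsync's -e argument.
    """
    ssh_cmd = ["ssh"]
    if ssh_port is not None:
        ssh_cmd += ["-p", str(ssh_port)]
    # Split the free-form option string the way a shell would.
    ssh_cmd += shlex.split(ssh_options)
    return ["rsync", "-a", "-e", shlex.join(ssh_cmd), source, dest]


# The same port can be expressed through either setting.
via_port = build_rsync_command("host:/data/", "/backup/", ssh_port=2222)
via_options = build_rsync_command("host:/data/", "/backup/", ssh_options="-p 2222")
```

Because both calls produce the same command line, a config file that sets the port through the options string would not need the dedicated port setting.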
  3. May 31, 2023
  4. May 30, 2023
    • ensure xymon dot is red if we have a backup task that has never completed · 5fbb64c2
      Dr Adam Thorn authored
      We had been raising a failure report if we had never seen a successful backup
      for a host. However, when a host has more than one backup task, we can have
      the situation where one backup is working OK but the other has never
      completed correctly. This led to a green report as we had a non-zero number
      of rows, but we require number_of_good_backups == number_of_tasks.
      0.9-ch101
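The difference between the old and new checks described in the commit above can be sketched as follows. This is an illustration only: the function names are hypothetical, and the real report is presumably driven by a database query rather than by Python.

```python
def old_colour(number_of_good_backups):
    """Old behaviour: any successful backup at all gave a green report."""
    return "green" if number_of_good_backups > 0 else "red"


def new_colour(number_of_tasks, number_of_good_backups):
    """New behaviour: every task must have at least one good backup."""
    return "green" if number_of_good_backups == number_of_tasks else "red"


# A host with two tasks, only one of which has ever completed.
before = old_colour(number_of_good_backups=1)
after = new_colour(number_of_tasks=2, number_of_good_backups=1)
```

The sketch does not consider a host with zero tasks; how that case should be reported is not stated in the commit.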
  5. May 03, 2023
    • replace deprecated tempfile call with mktemp · d3c8ab49
      Dr Adam Thorn authored
      tempfile is a debian-ism and jammy warns us that its use is deprecated.
      mktemp has been available on all our debian/ubuntu machines for a long time
      via coreutils (e.g. it was definitely in wheezy and trusty, and I'm pretty
      sure since before then too)
      0.9-ch100
  6. Jan 05, 2023
    • Redhat prepare script compresses MySQL backup · d2a7ea19
      Dr Catherine Pitt authored
      The incremental backups on some cluster head nodes are growing quite
      large, and most of the churn is the uncompressed MySQL dumpfile which
      changes with every backup and can be over 1GB. This commit compresses
      that data, which reduces the size of the file by 90% on at least one
      machine.
      0.9-ch99
  7. Dec 19, 2022
  8. Oct 31, 2022
  9. Oct 26, 2022
  10. May 25, 2022
  11. May 24, 2022
  12. Apr 25, 2022
  13. Mar 10, 2022
  14. Mar 09, 2022
  15. Dec 20, 2021
  16. Nov 17, 2021
  17. Jul 14, 2021
    • Prepend reporting lines with the zfs target name · 6e536df5
      Dr Adam Thorn authored
      This will let us use zfs_target as the name of a subtest which
      in turn means we would be able to separately log and graph multiple
      backup targets associated with a single host.
      
      This change does not affect the current parsing performed when
      we input data into postgres: it uses non-anchored regexps to
      identify SpaceUsed etc., so prepending extra text won't change
      anything.
      0.9-ch85
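The point about non-anchored regexps in the commit above can be illustrated with a small sketch. The report line format and the target name used here are assumptions made for the example; only the SpaceUsed keyword comes from the commit message.

```python
import re

# Hypothetical pattern: the keyword, any non-digit separator, then a number.
SPACE_USED = re.compile(r"SpaceUsed\D*(\d+)")


def space_used(line):
    """Return the SpaceUsed value from a report line, or None if absent."""
    match = SPACE_USED.search(line)  # search() is not anchored to the line start
    return int(match.group(1)) if match else None


plain = "SpaceUsed: 1024"
prefixed = "zpool1/somehost SpaceUsed: 1024"  # target name prepended

unanchored_plain = space_used(plain)
unanchored_prefixed = space_used(prefixed)
# match() only succeeds at the start of the string, so it acts as an
# anchored pattern and would have been broken by the new prefix.
anchored_prefixed = SPACE_USED.match(prefixed)
```

An unanchored search finds the same value in both lines, whereas an anchored match fails once text is prepended.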
  18. Jul 09, 2021
    • Partial fix for behaviour where we see multiple backups for one task running at once · 3941a9df
      Dr Adam Thorn authored
      In some versions of backup_queue (I think just on splot4 now), we use
      backup_log.isrunning as part of the logic to determine if a task should be
      enqueued. The problem is that scheduler.pl makes three writes on the table:
      
      1) an insert when the task is queued (a trigger sets isrunning='t' here)
      2) an update to set started_processing when the task begins (a trigger
         sets isrunning='f' here!!!!)
      3) an update to set ended_processing when the task finishes (a trigger
         again sets isrunning='f' here)
      
      Thus, being careful to only set isrunning='f' when a backup task is finished
      (i.e. when we set ended_processing=now() in scheduler.pl) seems sensible, and
      empirically does seem to lead to the right backup_queue without duplicates.
      
      This commit will only affect new setups of backup servers; the change has been
      deployed to live servers with an ad hoc script I've run.
      
      I think we only see this on splot4 because its definition of the backup_queue
      view is very different from both a) the one defined in this file and b) the one
      on all the other backup servers. If I just try to replace the view on splot4,
      though, any attempt to select from it times out, so there may be other relations
      on splot4 that need updating too.
      
      NB the obvious thing missing on splot4 is
      
      WHERE ((backup_log.backup_task_id = a.backup_task_id) AND (backup_log.ended_processing IS NULL))) < 1))
      
      which feels like a hack but nonetheless ensures in practice that we don't get
      duplicate queued tasks.
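The three writes listed in the commit above, and the effect of the fix on isrunning, can be modelled with a short sketch. This is a simplified model for illustration, not the trigger code or scheduler.pl itself, and the function name is hypothetical.

```python
def isrunning_after_each_write(fixed):
    """Replay the three scheduler writes and record isrunning after each one.

    fixed=False models the old trigger behaviour, fixed=True the behaviour
    after this commit. (Simplified model, not the real trigger.)
    """
    states = []

    # 1) insert when the task is queued: the trigger sets isrunning to true
    isrunning = True
    states.append(isrunning)

    # 2) update setting started_processing: the old trigger cleared the
    #    flag here; the fixed one leaves it alone
    if not fixed:
        isrunning = False
    states.append(isrunning)

    # 3) update setting ended_processing: the flag is cleared in both cases
    isrunning = False
    states.append(isrunning)

    return states


old_states = isrunning_after_each_write(fixed=False)
new_states = isrunning_after_each_write(fixed=True)
```

In the old model the task looks idle from the moment it starts processing, which is the window in which a queue definition relying on isrunning could enqueue a duplicate.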
  19. Jul 08, 2021
    • Ensure pg-dump-script includes a dump of roles · 67d141b5
      Dr Adam Thorn authored
      We don't always need the role data, if the presumption is that we'll
      be doing a pg_restore in conjunction with an ansible role which creates
      all required roles. But, having a copy of the role data will never hurt!
      It also gives us a straightforward way of restoring a database to a
      standalone postgres instance without having to have provisioned a
      dedicated VM with the relevant ansible roles.
      0.9-ch84
    • Add a script to do a postgres backup via pg_dump · 5b4a8757
      Dr Adam Thorn authored
      At present we use myriad one-off per-host scripts to do a pg_dump,
      and they all do (or probably should do) the same thing. In combination
      with setting options in the host's backup config file, I think
      this single script covers all our routine pg backups.
      0.9-ch83
    • Call PRE and POST with same args as zfs-rsync.sh · e01d7ebc
      Dr Adam Thorn authored
      We were just passing the hostname. Adding extra args should
      not impact any existing script, but will let us write better,
      more maintainable, deduplicated PRE scripts.
      0.9-ch82
  20. Jun 29, 2021
    • Add an outline script for moving a whole zpool · 7ae60f97
      Dr Catherine Pitt authored
      This came about because a disk has failed on nest-backup, which only has
      subdirectory backups of nest-filestore-0 and so move-machine.sh was not
      going to be helpful - it assumes all tasks for a machine are on the same
      zpool which isn't true there. In this case I did the move by hand, but
      have sketched out the steps in the script in the hope that next time we
      have to do this we'll do it by looking at the script and running bits by
      hand, then improve the script a bit, and continue until it's usable.
  21. Jun 18, 2021