  1. Jun 13, 2024
    • Retry failed hosts one at a time, not together · 4c6156f7
      Dr Catherine Pitt authored
      The cronjob script retries once if a run fails, as we sometimes see
      problems on the dom0s where they time out gathering facts. However, if
      some of the failed hosts have genuinely failed, this can cause
      problems. When the retry reaches a play with only one applicable host
      in the set being retried, and that host fails in that play (quite
      likely once we are into the specialist plays), ansible terminates the
      whole playbook because 100% of the hosts in that play failed. The
      results are then confusing: other hosts that needed a retry but had
      not yet reached their failing play, probably because it comes much
      later in the playbook, may report green rather than red, as they had
      not reached a changed or failing task when the playbook ended. The
      ansible-site test remains green unless the ansible-xymon2027 machine
      happened to be one of the failing hosts, which is unlikely.
      
      This updates the cronjob to retry the failed hosts one at a time.
      1.0-ch29
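The per-host retry described in this commit could be sketched roughly as below. This is an illustration, not the real cronjob script: the function name `retry_one_at_a_time`, its arguments, and the idea of reading failed hosts from a file with one hostname per line are all assumptions. `--limit` is the standard `ansible-playbook` option for restricting a run to given hosts.

```shell
#!/bin/sh
# Sketch only: retry_one_at_a_time and its arguments are hypothetical names,
# not taken from the real cronjob script.
# Usage: retry_one_at_a_time FAILED_HOSTS_FILE COMMAND [ARGS...]
# Runs COMMAND once per host listed in FAILED_HOSTS_FILE, appending
# "--limit HOST" each time, so that a failure on one host cannot end the
# run for the other hosts still waiting for their retry.
retry_one_at_a_time() {
    hosts_file=$1
    shift
    still_failing=""
    while IFS= read -r host; do
        # Skip blank lines in the host list.
        [ -n "$host" ] || continue
        # Redirect stdin so the command cannot swallow the rest of the list.
        if ! "$@" --limit "$host" </dev/null; then
            still_failing="$still_failing $host"
        fi
    done <"$hosts_file"
    # Report which hosts failed again; succeed only if none did.
    echo "still failing:$still_failing"
    [ -z "$still_failing" ]
}
```

A call might look like `retry_one_at_a_time failed-hosts.txt ansible-playbook site.yml`, where both file names are placeholders. Because each host gets its own playbook run, a host that genuinely fails only ends its own run.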
  2. Dec 02, 2022
  3. Nov 29, 2022
  4. Feb 07, 2022
    • Release 1.0-ch27 · 9bf6b7a2
      Dr Adam Thorn authored
      1.0-ch27
    • Don't set inventory by -i argument to ansible-playbook · 207e172c
      Dr Adam Thorn authored
      The scripts make a point of exporting ANSIBLE_INVENTORY, which can then
      optionally be overridden when sourcing our config file. Except that the
      value is then unconditionally overridden, because we've hard-coded
      "-i inventory.py".
      
      Being able to set an alternate ANSIBLE_INVENTORY is useful for
      testing changes to these scripts without necessarily using the real
      live inventory.
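The override order this commit restores might look something like the following sketch. The function name `setup_inventory`, the default path, and the config file argument are hypothetical; only the `ANSIBLE_INVENTORY` environment variable and the fact that an explicit `-i` takes precedence over it come from Ansible itself.

```shell
#!/bin/sh
# Sketch only: setup_inventory, the default path and the config file
# argument are hypothetical, not taken from the real scripts.
setup_inventory() {
    config_file=$1
    # Export a default inventory first ...
    ANSIBLE_INVENTORY=./inventory.py
    export ANSIBLE_INVENTORY
    # ... then let an optional config file reassign it. The variable stays
    # exported after reassignment. Pass a path containing a slash, since
    # "." searches PATH for bare file names.
    if [ -r "$config_file" ]; then
        . "$config_file"
    fi
}
```

After `setup_inventory`, running `ansible-playbook` without any `-i` argument would pick up whichever inventory ended up in `ANSIBLE_INVENTORY`, so a test config can point the scripts at a non-live inventory.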
    • cronjob: retry failed hosts once · 4bbc5b96
      Dr Adam Thorn authored
      We're seeing some particular tasks fail with
      
      "Unable to execute ssh command line on a controller due to: [Errno 12] Cannot allocate memory"
      
      This particularly affects the tasks which configure all the VLAN interfaces on our
      dom0s and/or fileservers. They always run to completion when subsequently re-run
      using the ./oneoff script, though.
      
      Whilst it would be good to understand why those tasks seem to trigger unusual memory
      usage, I've not made any headway in doing so. Let's make the cronjob script re-run
      the playbook on failed hosts for us to avoid spurious tickets.
    • Optionally provide debug logs · 3381efeb
      Dr Adam Thorn authored
      NB: this change was made and deployed a couple of weeks ago, but I
      seem to have failed to commit it. See also e.g.
      
      https://gitlab.developers.cam.ac.uk/ch/co/ansibleconf/-/commit/5e06fc9d597e2550458adf9e77281e71f71691ae
  5. Jan 09, 2022
  6. Jan 06, 2022
  7. Apr 26, 2021
  8. Feb 12, 2021
  9. Feb 10, 2021
  10. Nov 16, 2020
  11. May 11, 2020
  12. Apr 18, 2019
  13. Apr 17, 2019
  14. Mar 21, 2019
  15. Mar 20, 2019
  16. Nov 16, 2017
  17. Jun 06, 2017
  18. May 23, 2017
  19. May 22, 2017
  20. May 19, 2017