- Jun 13, 2024
Dr Catherine Pitt authored
The cronjob script attempts one retry if jobs fail, as we sometimes see problems on the dom0s where they time out gathering facts. However, if some of the failed hosts have genuinely failed, this can cause trouble: when the retry run reaches a play with only one applicable host in the retried set and fails on that host (quite likely once we are into the specialist plays), ansible terminates the whole playbook, because 100% of the hosts in that play failed. The results are then confusing: other hosts that needed a retry but had not yet reached their failing play, probably because it comes much later in the playbook, may report green rather than red, as they had not hit a changed or failing task when the playbook ended. The ansible-site test stays green unless the ansible-xymon2027 machine happens to be one of the failing hosts, which is unlikely. This updates the cronjob to retry the failed hosts one at a time.
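A minimal sketch of the one-at-a-time retry, assuming ansible is configured to write failed hosts to a site.retry file (the real script's file names and flags may differ):

    # Hypothetical: give each failed host its own ansible-playbook run, so
    # a play failing on 100% of one host aborts only that host's run.
    if [ -s site.retry ]; then
        while IFS= read -r host; do
            ansible-playbook site.yml --limit "$host"
        done < site.retry
    fi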
-
- Dec 02, 2022
Dr Adam Thorn authored
This is already in the cron script; having it in oneoff also seems sensible and useful. It's needed on jammy and I don't know why it wasn't needed in the bionic instance...
-
Dr Adam Thorn authored
-
- Nov 29, 2022
Dr Adam Thorn authored
-
- Feb 07, 2022
Dr Adam Thorn authored
-
Dr Adam Thorn authored
The scripts make a point of exporting ANSIBLE_INVENTORY, which can then optionally be overridden when sourcing our config file. Except that it was being unconditionally overridden anyway, because we had hard-coded "-i inventory.py" on the ansible-playbook command line. Being able to set an alternate ANSIBLE_INVENTORY is useful for testing changes to these scripts without necessarily using the real live inventory.
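A minimal sketch of the intended pattern, with illustrative file names (the config file here is hypothetical):

    # Export a default inventory, let the sourced config override it, and
    # pass no -i flag so ansible-playbook honours ANSIBLE_INVENTORY.
    export ANSIBLE_INVENTORY="${ANSIBLE_INVENTORY:-./inventory.py}"
    . ./config.sh   # hypothetical config file; may re-export the variable
    ansible-playbook site.yml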
-
Dr Adam Thorn authored
We're seeing some particular tasks fail with "Unable to execute ssh command line on a controller due to: [Errno 12] Cannot allocate memory", in particular the tasks which configure all the VLAN interfaces on our dom0s and/or fileservers. They always run to completion when subsequently re-run using the ./oneoff script, though. Whilst it would be good to understand why those tasks seem to trigger unusual memory usage, I've not made any headway in doing so. Let's make the cronjob script re-run the playbook on the failed hosts for us, to avoid spurious tickets.
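A minimal sketch of that bulk re-run, assuming retry files are enabled so failed hosts land in site.retry (the cronjob's actual names may differ):

    # Hypothetical: if the main run fails, re-run once limited to the
    # hosts recorded in the retry file before raising any tickets.
    ansible-playbook site.yml || \
        ansible-playbook site.yml --limit @site.retry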
-
Dr Adam Thorn authored
NB: the change was made and deployed a couple of weeks ago, but I seem to have failed to commit it. See also e.g. https://gitlab.developers.cam.ac.uk/ch/co/ansibleconf/-/commit/5e06fc9d597e2550458adf9e77281e71f71691ae
-
- Jan 09, 2022
Dr Adam Thorn authored
Needed as of d7d386d2
-
- Jan 06, 2022
Dr Adam Thorn authored
Hmm, if only we had some mechanism for putting the right things on PATH, PYTHONPATH etc. rather than having to faff with those environment variables manually, and then forgetting to update them when we switch the version of ansible we're using...
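The sarcasm presumably gestures at something like a virtualenv; a minimal sketch under that assumption (paths and version pin are illustrative):

    # Hypothetical: a venv pins the interpreter and ansible version, so
    # no script needs its PATH or PYTHONPATH hand-edited per upgrade.
    python3 -m venv /opt/ansible-venv
    /opt/ansible-venv/bin/pip install 'ansible-core==2.16.*'
    . /opt/ansible-venv/bin/activate
    ansible-playbook --version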
-
Dr Adam Thorn authored
These ceased to be useful when we switched over to a custom xymon-aware callback
-
Dr Adam Thorn authored
-
- Apr 26, 2021
Dr Catherine Pitt authored
As recommended by @ajh221, eval the output of ssh-agent -k when killing the agent. Although it works without, this is tidier: the commands ssh-agent -k prints also unset SSH_AUTH_SOCK and SSH_AGENT_PID in the calling shell.
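A one-line sketch of the tidier form (the surrounding script is assumed):

    # Evaluating the output of ssh-agent -k kills the agent and unsets
    # SSH_AUTH_SOCK and SSH_AGENT_PID in the caller's environment.
    eval "$(ssh-agent -k)"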
-
Dr Catherine Pitt authored
Our ansibleconf repository has some tasks which require sshing from one client machine to another. This fails when run by ansible-xymon because the ssh key isn't available to the client machines. This change starts up an ssh-agent and loads the key into it before running ansible so that this type of task works. The agent is killed at the end.
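A minimal sketch of that lifecycle, assuming agent forwarding is what makes the key reachable from the client machines (key path and flags are illustrative):

    # Hypothetical outline: start an agent, load the key, run the playbook
    # with forwarding so client-to-client ssh can use it, then kill it.
    eval "$(ssh-agent -s)"
    ssh-add /etc/ansible/keys/id_xymon   # hypothetical key path
    ansible-playbook site.yml --ssh-extra-args='-o ForwardAgent=yes'
    eval "$(ssh-agent -k)"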
-
- Feb 12, 2021
Dr Catherine Pitt authored
Have extended the xymon callback plugin and site.yml to handle this instead, see https://gitlab.developers.cam.ac.uk/ch/co/ansibleconf/-/commit/3aa092bc8b9c955172175ce494c7646fd50bea50
-
- Feb 10, 2021
Dr Catherine Pitt authored
-
Dr Catherine Pitt authored
-
Dr Catherine Pitt authored
The live server was running 1.0-ch18 but there's no 1.0-ch18 or 1.0-ch17 tag in the repo.
-
Dr Catherine Pitt authored
Otherwise we may not notice when the playbook starts to fail halfway through.
-
- Nov 16, 2020
Dr Adam Thorn authored
-
- May 11, 2020
A.J. Hall authored
-
- Apr 18, 2019
Dr Adam Thorn authored
-
- Apr 17, 2019
Dr Adam Thorn authored
-
- Mar 21, 2019
Dr Adam Thorn authored
-
- Mar 20, 2019
Dr Adam Thorn authored
-
Dr Adam Thorn authored
-
- Nov 16, 2017
Dr Catherine Pitt authored
-
- Jun 06, 2017
Dr Catherine Pitt authored
This is so I don't keep leaving the main logfile root-owned when running oneoff, which then mucks up the main cron run, which runs as xymon-ansible.
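A minimal sketch of one way to restore ownership, with a hypothetical user/path (the commit doesn't show the actual fix):

    # Hypothetical: hand the logfile back to the cron user after a
    # root-run oneoff, so the xymon-ansible cron run can write to it.
    chown xymon-ansible:xymon-ansible /var/log/ansible/site.log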
-
- May 23, 2017
Dr Catherine Pitt authored
-
- May 22, 2017
Dr Catherine Pitt authored
A spurious '-l' parameter to ansible-playbook had been introduced when I tidied up.
-
- May 19, 2017
Dr Catherine Pitt authored
- Add args check to oneoff script (sketched below)
- Make scripts slightly more configurable
- Use -f on rm commands
- Ensure permissions are correct on homedir
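A minimal sketch of such an args check (usage text is illustrative):

    # Hypothetical: refuse to run oneoff without an explicit target,
    # rather than silently acting on the whole inventory.
    if [ $# -lt 1 ]; then
        echo "usage: $0 <host-or-group> [ansible-playbook args...]" >&2
        exit 1
    fi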
-
Dr Catherine Pitt authored
-