Partial fix for behaviour where we see multiple backups for one task running at once
In some versions of backup_queue (I think just on splot4 now), we use backup_log.isrunning as part of the logic to determine whether a task should be enqueued. The problem is that scheduler.pl makes three writes to the table:

1) an insert when the task is queued (a trigger sets isrunning='t' here)
2) an update to set started_processing when the task begins (a trigger sets isrunning='f' here!!!!)
3) an update to set ended_processing when the task finishes (a trigger again sets isrunning='f' here)

Thus, being careful to set isrunning='f' only when a backup task has actually finished (i.e. when we set ended_processing=now() in scheduler.pl) seems sensible, and empirically it does seem to lead to the right backup_queue contents without duplicates; see the sketch below.

This commit will only affect new setups of backup servers; the change has been deployed to live servers with an ad hoc script I've run.

I think we only see this on splot4 because it has a very different definition of the backup_queue view from a) the one defined in this file, and b) the one that's on all the other backup servers. If I just try to replace the view on splot4, though, any attempt to select from it times out, so there may be other relations on splot4 that need updating too.

NB the obvious thing missing on splot4 is

    WHERE ((backup_log.backup_task_id = a.backup_task_id) AND (backup_log.ended_processing IS NULL))) < 1))

which feels like a hack but nonetheless ensures in practice that we don't get duplicate queued tasks.
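For anyone reading this later, here is a rough sketch of the trigger behaviour this implies. The function and trigger names are made up for illustration (they are not necessarily what's in this file), and I'm assuming isrunning is a boolean column on backup_log; the point is just that isrunning is only cleared once ended_processing has been set, not on every update.

    -- Sketch only: illustrative names, assumes backup_log(backup_task_id,
    -- started_processing, ended_processing, isrunning boolean).
    CREATE OR REPLACE FUNCTION backup_log_set_isrunning() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            NEW.isrunning := 't';              -- task has just been queued
        ELSIF NEW.ended_processing IS NOT NULL THEN
            NEW.isrunning := 'f';              -- task has finished
        END IF;
        -- NB: an update that only sets started_processing no longer
        -- touches isrunning, which is the whole point of the fix.
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER backup_log_isrunning
        BEFORE INSERT OR UPDATE ON backup_log
        FOR EACH ROW EXECUTE PROCEDURE backup_log_set_isrunning();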
For posterity, I've now also updated the definition of backup_queue on splot4 to match the other backup servers. It queries backup_log, and for historical reasons that table had grown rather huge on splot4. I've done a fair amount of housekeeping and trimmed ~4e6 rows, and splot4 can now query its (new) backup_queue view in a sensible time, O(second); the cleanup looked roughly like the sketch below.
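The housekeeping itself was nothing clever; it had roughly this shape, though the actual retention cutoff I used isn't recorded here and the interval below is purely illustrative:

    -- Illustrative only: drop long-finished rows, then reclaim space and
    -- refresh planner stats so the backup_queue view stays quick.
    DELETE FROM backup_log
     WHERE ended_processing IS NOT NULL
       AND ended_processing < now() - interval '1 year';

    VACUUM ANALYZE backup_log;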