Investigate using of create_before_destroy option for node pools
In https://gitlab.developers.cam.ac.uk/uis/devops/devhub/gitlab-runner-infrastructure/-/issues/31#note_599611 we had to perform an emergency upgrade of a node pool. In testing it was noted that Terraform destroys the existing node pool before creating the replacement.
Investigate using the `create_before_destroy` lifecycle meta-argument on the node pool resource to ensure that a new node pool is created before the old one is destroyed. This allows for workload migration.
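For illustration, a minimal sketch of what this could look like. The resource name, the `cluster_id` variable, the pool name and the `pool_generation` local are all hypothetical and not taken from the module; the suffix is there only on the assumption that the old and new pools must have different names while both exist.

```hcl
# Sketch only: names and values below are illustrative assumptions.
variable "cluster_id" {
  type        = string
  description = "ID of an existing GKE cluster (hypothetical input)."
}

locals {
  # Hypothetical counter, bumped by hand when a change forces pool re-creation.
  pool_generation = 1
}

resource "google_container_node_pool" "pool" {
  # The generation suffix keeps the replacement's name distinct from the old pool's.
  name       = "cluster-pool-g${local.pool_generation}"
  cluster    = var.cluster_id
  node_count = 1

  node_config {
    machine_type = "e2-standard-4"
  }

  lifecycle {
    # Create the replacement pool first, then destroy the old one.
    create_before_destroy = true
  }
}
```

The provider also offers a `name_prefix` argument on this resource as an alternative way of getting unique names; whether either approach fits the module is part of what needs investigating.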
Activity
- Dr Rich Wareham added teamCloud workflowNeeds Refinement labels
- Dmitrii Unterov changed title from Enable create-before-destroy for node pools to Investigate using of create_before_destroy option for node pools
- Maintainer
Refinement notes:
- check if there will be any downtime
- check if it is possible at all (i.e. having the second node pool with the same name)
- Author Owner
> check if it is possible at all (i.e. having the second node pool with the same name)
Mmm. Yes. We might want to have something like `local.pool_generation` or similar appended to the pool name, which we can bump if there's a change necessitating a pool re-creation.
- Maintainer
I finally got some time for experiments and got promising results (with no effort, love it).
I created a cluster with our module in my test environment, using the latest google provider. When I tried to change the node type in my project, I got this:
    # module.cluster.google_container_node_pool.cluster-pool-1[0] will be updated in-place
    ~ resource "google_container_node_pool" "cluster-pool-1" {
        ...
      ~ machine_type = "g1-small" -> "n1-standard-1"
        ...
      }
    Plan: 0 to add, 1 to change, 0 to destroy
I applied it and everything works like a charm.
After that I tried the same in our https://gitlab.developers.cam.ac.uk/uis/devops/devhub/gitlab-runner-infrastructure project (plan only, obviously not applied):
    # module.cluster.google_container_node_pool.cluster-pool-1[0] must be replaced
        ...
      ~ machine_type = "e2-standard-4" -> "n1-standard-1" # forces replacement
        ...
      }
    Plan: 1 to add, 0 to change, 1 to destroy.
The only difference is that my project is on the google v5 provider, while runner-infra is on `version = "~> 4.0"`.
I bumped the version here: https://gitlab.developers.cam.ac.uk/uis/devops/devhub/gitlab-runner-infrastructure/-/merge_requests/68 (and to do so I also had to bump the version in gcp-pubsub-to-ms-teams!3 (merged)).
After that the problem is gone. So I think there's nothing to do in the `gke-module` itself, as it has `version = ">= 4.0"` and works fine if the project that uses the module is on the google v5 provider.
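For context on why the two constraints behave differently: `~> 4.0` allows any 4.x release but excludes 5.0 and later, whereas `>= 4.0` also admits 5.x. A sketch of a root-module constraint that would select a v5 provider is below; the exact value used in the merge request is not shown in this thread, so `~> 5.0` is an assumption.

```hcl
terraform {
  required_providers {
    google = {
      source = "hashicorp/google"
      # Assumed constraint for illustration: any 5.x release, but not 6.0 or later.
      version = "~> 5.0"
    }
  }
}
```

Running `terraform init -upgrade` after changing a constraint lets Terraform select a newer provider release than the one recorded in the lock file.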
Edited by Dmitrii Unterov
- Dmitrii Unterov added priority3 Low workflowbacklog labels and removed workflowNeeds Refinement label
- Dmitrii Unterov changed iteration to Cloud Team Sprints Mar 13, 2024 - Mar 26, 2024
- GitLab Automation Bot removed iteration Cloud Team Sprints Mar 13, 2024 - Mar 26, 2024
- GitLab Automation Bot changed iteration to Cloud Team Sprints Mar 27, 2024 - Apr 9, 2024
- GitLab Automation Bot removed iteration Cloud Team Sprints Mar 27, 2024 - Apr 9, 2024
- GitLab Automation Bot changed iteration to Cloud Team Sprints Apr 10, 2024 - Apr 23, 2024
- GitLab Automation Bot removed iteration Cloud Team Sprints Apr 10, 2024 - Apr 23, 2024
- GitLab Automation Bot changed iteration to Cloud Team Sprints Apr 24, 2024 - May 7, 2024
- GitLab Automation Bot removed iteration Cloud Team Sprints Apr 24, 2024 - May 7, 2024
- GitLab Automation Bot changed iteration to Cloud Team Sprints May 8, 2024 - May 21, 2024
- Dmitrii Unterov assigned to @du228
- Dmitrii Unterov mentioned in merge request gcp-pubsub-to-ms-teams!3 (merged)
- Dmitrii Unterov added workflowReview Required label and removed workflowbacklog label
- Ryan Kowalewski added workflowRework label and removed workflowReview Required label
- Dmitrii Unterov added workflowReview Required label and removed workflowRework label
- Ryan Kowalewski added workflowDone label and removed workflowReview Required label
- Ryan Kowalewski closed