The module will currently create a VPC for the purpose of assigning a static external IP to the Cloud Run service, but it can't assign the Cloud Run service to an existing VPC for ingress protection purposes. This issue covers adding support for specifying an existing VPC (as a variable) and connecting the Cloud Run service to that VPC such that ingress traffic will only come from the VPC ("internal"). This should include setting up the connection to the VPC (either Serverless VPC Access connector or Direct VPC), and permitting the use of Private Service Connect (I don't think any changes are required to this module for that, but needs to be confirmed).
Using a VPC appears more straightforward with the Cloud Run v2 API, so the option to use Cloud Run v2 API should be supported as part of this change (and using v2 should be compulsory when using an existing VPC).
I have a couple of comments/thoughts on this proposal as follows.
Move to Cloud Run V2 as part of a new major release
My main thought is that I really don't want to be in a situation where we're trying to support both the google_cloud_run_service and google_cloud_run_v2_service in the same release. I propose that we drop support for the google_cloud_run_service resource and release the proposed changes as a breaking 9.0.0 release.
I don't think the creation of the VPC Connector should be included in this module. I'd like it to simply be noted as a prerequisite, in the same way that Google do on their own Cloud Run module.
My concerns with this are:
That these resources should usually (in our case) be created per-project, not per-Cloud Run service. For example, if you have three Cloud Run services in the same project/network and you want all of them to use a VPC connector for ingress protection (or the static IP stuff, see my comment below) you will end up either creating three lots of the same networking resources, or you'd have to pick one of the Cloud Run services to deploy the resources and then configure the other Cloud Run services to be dependant on the resources created by the first one. Both of these options are not ideal and potentially confusing.
That future developers may see that the a connector already exists and hook up to it for other use cases, not really understanding that it is managed by the cloud run module. In theory, although I appreciate it is unlikely, the Cloud Run module could then be removed from the code in the future, destroying the connector and breaking any other resources that had decided to also use the connector.
I do appreciate that we already have a VPC connector being created in this module in static_egress.tf, however, this is something that I also disagree with and want to refactor out of this module at some point as well. I feel like we might need a separate "gcp-networking" Terraform module to contain some of these per-project networking customisations so that we can still provide a user-friendly interface in the boilerplate etc. but the resources are managed via a central networking-specific module.
Moving forwards
I do appreciate that what I'm proposing is a lot more work than this issue initially requires and I do not want to slow down any development of projects that need this functionality. So, I wonder if in reality it would be better for us to start a new repo (gcp-cloud-run-app-v2??) to implement the larger changes I'm proposing alongside this repo, with a view to deprecating this repo at some point in the future when we transition projects to the new module? This would also tie in nicely with a number of other workflow/testing improvements that I have been wanting to develop a pattern for use with our Terraform modules as it would give me one to work on in anger.
Anyway, brain dump over. Interested in people's thoughts.
All good points. Particularly the "but what if you want multiple services and one VPC connector". Separately I'm thinking that we're going to end up just needing to let people pass the entire template block in wholesale to support things like sidecar containers, etc.
[...]
I don't think the creation of the VPC Connector should be included in this module. I'd like it to simply be noted as a prerequisite, in the same way that Google do on their own Cloud Run module.
This may derail the thread. What are the pros and cons of moving to Google's module instead of making a v2 one for us?
Ah, just to clarify, I am not proposing moving to Google's module. I was simply pointing out that they have made the decision to not implement to VPC connector in their code and instead list it as a prerequisite on their README.md. I think there is still lots of value in rolling our own module.
I am not proposing moving to Google's module. [...] I think there is still lots of value in rolling our own module.
Suppose, for the sake of discussion, I did propose moving to Google's module for Cloud Run v2 services. What do we lose which we don't also want to split out, like VPCs?
Ah, I just re-read your comment and realised you might actually be implying the "what if we did?" .
The Google module is such a thin wrapper around the google_cloud_run_service resource that I honestly would always recommend using the resource directly and avoiding all the bloat that comes with the module (i.e. endless dynamic blocks for options that we don't use). Also, the google module also doesn't support Cloud Run V2 yet .
I think there is value in running an in house module to add things like helpful defaults, enforce our own security policies, add custom IAM bindings etc.
Google's module only appears to support v1. I can't find an official module for v2.
That's a good reason :).
I think there is value in running an in house module to add things like helpful defaults, enforce our own security policies, add custom IAM bindings etc.
- so maybe we should identify what we'd like to keep and what we'd like to change. So, as a straw man, I'm hearing:
We like:
auto-configuration of a service account identity with the option to override with an externally supplied identity,
convenience variables for passing in SQL instances,
pre-deployment jobs,
ignoring certain annotations which Google likes to change under our feet,
optional auto-grant of SQL client roles to the service account identity, and
turnkey alerting and monitoring.
We'd be OK with removing:
domain mapping and related TLS certificate stuff in favour of always using an external load balancer,
VPC stuff in favour of a "paved-path" recommended approach for doing it outside of the module, and
secrets volume/environment variables in favour of a recommended pattern for template.
We'd like to add:
nearly a complete passthrough of the template argument in order to support volumes, sidecar containers, etc,
some way of configuring the traffic block to allow for blue-green/canary deployments.
I worry about dropping support for google_cloud_run_service, we have a lot of project using this and upgrading them will be an almighty task. I appreciate that we can warn people that we're deprecating v1, but I still think we're going to need to support this version for a long time to come.
I appreciate that we can warn people that we're deprecating v1, but I still think we're going to need to support this version for a long time to come.
I think that's true but I think we're talking about whether we add more features to the 8.x series or not? We can keep 8.x as a supported release and fix bugs as necessary to support Cloud Run v1 but if someone runs into some new feature they need adding, we carrot/stick them into moving to Cloud Run v2.
I think that's true but I think we're talking about whether we add more features to the 8.x series or not? We can keep 8.x as a supported release and fix bugs as necessary to support Cloud Run v1 but if someone runs into some new feature they need adding, we carrot/stick them into moving to Cloud Run v2.
^ This is exactly how I imagined it would work if we go the major version 9.0.0 route. Or we run the repos in parallel as I mentioned.
And I'd be OK with that living in a long-lived next branch while we get it in shape and releasing it as "the version" of the module once we're happy.
My only concern with this is the number of conflicts that we'll need to work through when we eventually go to merge back to main. That would be my only reservation in this particular instance.
removed support for domain mapping without an external load balancer
changed the ingress settings (to be closer to the Cloud Run v2 API)
changed format of the memory limit, CPU limit and timeout settings to match the Cloud Run v2 API
In addition I've changed the static IP feature to use Direct VPC rather than a serverless VPC connector (as the latter requires the use of an additional VM), which would change quite a bit of existing infra when deployed but I don't think is backwards compatible in terms of functionality or configuration settings.
The only feature I've added so far is the option to connect to an existing VPC via Direct VPC. There are lots of changes to how the Terraform resource is configured but that's mostly a case of rewiring that can be hidden from apps that use our module. If we wanted more backwards compatibility then we could accept the old ingress, memory limit, CPU limit and timeout settings and quietly convert them.
Google seem to be quietly shifting over to the Cloud Run v2 API (I notice most references to it seem to have disappeared from their documentation; meanwhile the docs for the Terraform resource just say to use v2), so we should probably look towards shifting our projects over to the new API where it's not too inconvenient. That probably pushes us in the direction of creating a new branch for the v2 API rather than a new repository, or if domain mapping without a load balancer is a rarely/never used feature then we could switch straight over to the Cloud Run v2 API in v9 of our Cloud Run module.
Side-note: we should be clear on the distinction between the Cloud Run v2 API and the 2nd generation execution environment. The latter can be used regardless of which API version is being used. This issue is about changing our Cloud Run module to use the v2 API (although we could add an option to use the 2nd generation execution environment at the same time).
Side-note: we should be clear on the distinction between the Cloud Run v2 API and the 2nd generation execution environment. The latter can be used regardless of which API version is being used. This issue is about changing our Cloud Run module to use the v2 API (although we could add an option to use the 2nd generation execution environment at the same time).
Oh, Lord. I hadn't thought about that confusion. Yes, the 2nd generation question is an orthogonal one and I think, for the vast majority of our cases, we either don't care or weakly prefer 1st generation for better scale-up-from-zero performance.
Just to add that Direct VPC and Domain Mapping are both currently in preview and are offered under the "Pre-GA Offerings Terms" only. And if the Domain Mapping service is anything to go by, we should definitely not be looking to use Direct VPC until it is GA. Domain Mapping has been around for a couple of years in Pre-GA now and it looks like it is being dropped in favour of a Global Application Load Balancer or Firebase Hosting.
Looking at the features, a Global Application Load Balancer is a reasonable alternative to Domain Mapping, with the latter having quite a few limitations in comparison. Direct VPC is better than Serverless VPC Access connecters in just about every respect (including being much cheaper).
It's not too hard to swap between them though. I've mainly gone with Direct VPC for now (in the malware scanner product) because it doesn't have the fixed VM cost (which I don't really want during development). Maybe we could have it as a configurable option?
Indeed, this was what was proposed in Cloud Team refinement today.
For reference here is the draft issue regarding planning the next major release of the GCP Cloud Run App Terraform module where this discussion will continue.
Dave Hartchanged title from Add VPC and Cloud Run v2 support to the Cloud Run App Terraform module to Add VPC and Cloud Run v2 API support to the Cloud Run App Terraform module
changed title from Add VPC and Cloud Run v2 support to the Cloud Run App Terraform module to Add VPC and Cloud Run v2 API support to the Cloud Run App Terraform module
Dave Hartchanged the descriptionCompare with previous version