This week, at Google Cloud Next, GCP announced an interesting new service: Cloud Run. My thoughts about the new Cloud Run service are a bit more complicated than this Twitter thread, so I’ve expanded on them in this blog and welcome your comments.
In this article, I’ll compare Google’s Cloud Run with AWS Lambda and API Gateway, because those are the provider and services I’m most familiar with. But my thoughts below are a general critique of Cloud Run relative to FaaS (including Google Cloud Functions) and to managed API services in general, regardless of the provider.
What is Cloud Run?
Google’s Cloud Run allows you to hand over a container image with a web server inside, and specify some combination of memory/CPU resources and allowed concurrency. The logic inside your container must be stateless.
Cloud Run then takes care of creating an HTTP endpoint, receiving requests and routing them to containers, and making sure enough containers are running to handle the volume of requests. While your containers are handling requests, you are billed in 100 ms increments.
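The container contract is minimal: Cloud Run sends traffic to the port named in the PORT environment variable (8080 by default), and your server just handles HTTP. A minimal sketch of such a stateless server, using only the Python standard library (a real service would use a production-grade server, but the contract is the same):

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """A stateless handler: nothing is kept between requests."""
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"Hello from Cloud Run\n")

def get_port() -> int:
    # Cloud Run injects the port to listen on via the PORT env var.
    return int(os.environ.get("PORT", "8080"))

def serve():
    # The container's entrypoint would call this and block forever.
    HTTPServer(("", get_port()), Handler).serve_forever()
```

That’s the whole deployment unit: a container image whose entrypoint starts this server.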
How is this different from AWS Lambda?
This sounds a lot like Lambda. How is it different? You are handling multiple requests within a single container. At a fundamental level, Cloud Run is just serving as a very fancy load balancer.
All of the web logic is inside your container: auth*, validation, error handling, the lot of it. But instead of measuring resource utilization or other metrics that are merely a proxy for load, Cloud Run understands requests and uses them directly as its measure of load when deciding how to scale and route.
Note: if you’re using GCP IAM and you’re running on the managed version of Cloud Run, the service can do the auth for you. There are no custom authorizers, nor do I expect there to be in the future, because why not just put it in your web server?
In Lambda, while you can set up API Gateway to do no validation or auth and pass everything through to your Lambda, you’d be missing out on a wealth of managed features that perform these functions for you.
What’s the Good?
So what’s good about Cloud Run?
- It’s going to make it very, very simple for people who are running containerized, stateless web servers today to get serverless benefits.
- Better scaling and fine-grained billing.
- It’s also dead simple to test locally, because inside your container is a fully-featured web server doing all the work.
What’s the Bad?
So what’s bad about Cloud Run? Inside your container is a fully-featured web server doing all the work!
- The point of serverless is to focus on business value, and the best way to do that is to use managed services for everything you can — ideally only resorting to custom code for your business logic.
- If we try to compare Cloud Run to what’s possible inside a Lambda function execution environment, we’re missing the point.
- The point is that the code you put inside Lambda, the code that you are liable for, can be smaller and more focused because so much of the logic can be moved into the services themselves.
The FaaS Model: Handling each request in isolation
API Gateway allows you to use custom authorizers. That means your code, in a Lambda that does nothing else, can reject a request before it ever reaches your handler.
- You don’t have to even think about that request touching your downstream web handling code.
- You don’t have to pay for invocation of the request-handler Lambda.
- You don’t even pay for the API Gateway request.
- You aren’t paying for evaluating auth on every request since that custom authentication response is cached by API Gateway.
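As a sketch of that model: an API Gateway Lambda authorizer receives a token and the method ARN, and returns an IAM policy that API Gateway enforces and caches. The token check below is a placeholder, not a real implementation (a real authorizer would verify a JWT, call an identity provider, etc.):

```python
EXPECTED_TOKEN = "let-me-in"  # placeholder; real auth logic goes here

def handler(event, context):
    """API Gateway TOKEN authorizer: return an Allow/Deny IAM policy."""
    token = event.get("authorizationToken", "")
    effect = "Allow" if token == EXPECTED_TOKEN else "Deny"
    return {
        "principalId": "user",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": effect,
                "Resource": event["methodArn"],
            }],
        },
    }
```

The point is what’s *not* here: no server, no routing, no request-handling code. This Lambda does auth and nothing else.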
API Gateway allows you to perform schema validation on incoming requests. If the request fails validation, your Lambda doesn’t get invoked. Your code doesn’t have to worry about malformed requests.
Note: The API Gateway model validation is sadly a little more complicated than I’ve described above. Expect a post from Richard Boyd and me on this topic in the near future.
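For a flavor of what that validation looks like: API Gateway models are JSON Schema (draft-04) documents evaluated before your Lambda is invoked. The field names below are illustrative, and the checker function is a crude stdlib stand-in for the gateway’s validator, just to show the effect:

```python
import json

# Illustrative JSON Schema (draft-04) model of the kind API Gateway
# evaluates before invoking the request-handler Lambda.
ORDER_MODEL = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "required": ["itemId", "quantity"],
    "properties": {
        "itemId": {"type": "string"},
        "quantity": {"type": "integer", "minimum": 1},
    },
}

def passes_validation(body: str, model: dict) -> bool:
    """Crude stand-in for the gateway's validator: checks required keys only."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and all(k in payload for k in model["required"])
```

When the gateway does this for you, a malformed request is rejected at the front door and your handler code never sees it.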
The FaaS model is that each request is handled in isolation. Some people complain about this. I’ve even heard someone claim that AWS is pushing Lambda because users’ inability to optimize resource usage across requests is lucrative for them — which is about the most outlandish conspiracy theory I’ve heard this side of flat-earthers.
But the slightly less efficient usage model comes with benefits: you never have to worry about cross-talk effects. In Lambda, I don’t have to think about whether one request might have an impact on another. Everything’s isolated. This makes it easy to reason about, and removes one more thing I need to think about in the development process.
Security is hard, and the ability to scope your code’s involvement with it as small as possible is a huge win. Beyond the security implications, it’s also fewer moving parts that are your responsibility.
Cloud Run is also not the same as Lambda’s custom runtimes. Beyond the fact that custom runtimes should be a last resort, they don’t require running a server. Instead, your runtime only needs an HTTP client to poll Lambda’s runtime API for invocations, which makes it clearer that your code is not acting as a tiny web server.
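To illustrate the difference: a custom runtime is an HTTP *client* polling Lambda’s runtime API, not a server accepting connections. A rough sketch of that loop, with error handling omitted:

```python
import json
import os
import urllib.request

RUNTIME_API_VERSION = "2018-06-01"

def next_invocation_url(api_host: str) -> str:
    # The runtime polls this endpoint with a plain HTTP GET; no listening socket.
    return f"http://{api_host}/{RUNTIME_API_VERSION}/runtime/invocation/next"

def response_url(api_host: str, request_id: str) -> str:
    return f"http://{api_host}/{RUNTIME_API_VERSION}/runtime/invocation/{request_id}/response"

def run_loop(handler):
    """Poll the runtime API: GET the next event, POST back the handler's result."""
    api_host = os.environ["AWS_LAMBDA_RUNTIME_API"]
    while True:
        with urllib.request.urlopen(next_invocation_url(api_host)) as resp:
            request_id = resp.headers["Lambda-Runtime-Aws-Request-Id"]
            event = json.load(resp)
        result = json.dumps(handler(event)).encode()
        req = urllib.request.Request(
            response_url(api_host, request_id), data=result, method="POST"
        )
        urllib.request.urlopen(req)
```

Each iteration handles exactly one invocation, which preserves the FaaS property of request isolation even in a custom runtime.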
Cloud Run is not FaaS
All this is to say that Cloud Run should not be seen as equivalent, or even analogous, to pure FaaS: Cloud Run fundamentally involves significantly more code ownership. Cloud Run is still a valid rung on the serverless ladder, but there are many rungs above it.
And that gets to my biggest concern. Cloud Run, and GCP in general, is giving people a system that will make them complacent with traditional architecture, rather than pushing them toward the immense benefits of shifting (however slowly) to a service-full architecture, one that offloads as many aspects of an application as possible to fully managed services.
Google’s strategy is to push Kubernetes as the solution to cloud architecture. And for good reason: Kubernetes is really good at solving people’s pain points while staying within the familiar architecture paradigm. And Google is doing a great job creating a Kubernetes layer on top of every possible base infrastructure.
But Kubernetes keeps us running servers. It removes the infrastructure notion of a server, but encourages us to keep running application servers, like the ones inside Cloud Run containers.
Google’s ability to put Kubernetes on-prem is going to satisfy developers, and this will potentially come at the cost of delaying organizational moves to the public cloud. The difference from an application development perspective will be less apparent and will hide the higher total cost of ownership for being on-prem.
While Cloud Run is going to enable better usage of existing web server infrastructure, it’s also going to provide a safety blanket for developers intimidated by the paradigm shift of FaaS and service-full architecture. This will further delay the shift to the more value-oriented approach to development.