If the solution architect has doubled down and adopted an API-driven microservice architecture as well, then running the stateless workloads in Kubernetes helps speed up development and delivery of services. An important rule of microservice architecture is that each microservice must own its domain data and logic, so subsets of data are starting to run in Kubernetes, closer to the microservice and typically easier to work with than the gnarly legacy databases running on-premises. This has led to the Decentralized Data Management model, as described in Martin Fowler’s Microservices architecture².
Therefore, as a sensible solution architect, you include a Kubernetes StatefulSet³ in your design, and these Kubernetes stateful constructs handle the state and the persistent data for your stateful application. A critical part of the StatefulSet is the storage class referenced by the persistent volume claim⁴ each pod makes of Kubernetes, and you end up with some form of cloud-specific block storage. Your stateful application is happy running with its persistent volume: you can kick the pod in the guts, and it will come back up with its state intact in the persistent volume. Happy days, job done.
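As a minimal sketch of how those pieces fit together, a StatefulSet’s `volumeClaimTemplates` give each pod its own persistent volume claim against a named storage class. All names, the image, and the sizes here are illustrative, not from any particular deployment:

```yaml
# Illustrative StatefulSet: each replica gets its own PVC,
# provisioned through the named storage class, and the PVC
# (and its data) survives pod restarts.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-db                  # hypothetical name
spec:
  serviceName: my-db
  replicas: 1
  selector:
    matchLabels:
      app: my-db
  template:
    metadata:
      labels:
        app: my-db
    spec:
      containers:
        - name: db
          image: postgres:13   # illustrative image
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:        # one PVC per pod
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp2  # cloud-specific block storage
        resources:
          requests:
            storage: 10Gi
```

Kill the pod and the replacement mounts the same claim, which is exactly the behaviour described above.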
Unfortunately, not quite. Let’s imagine that your workload is running on AWS EKS, the most widely used managed Kubernetes service. This particular flavour of Kubernetes does not auto-repair worker nodes: a node can simply go NotReady, the pods running on it are evicted and left Pending, and nothing new can be scheduled on that node. Not to worry, this is Kubernetes: when we notice the node is NotReady, we can just spin up another one and the workload will be rescheduled onto it. Happy days.
That is fine for the stateless workloads; they will move, spin up, and be in their happy place. However, the new worker node has come up in a different Availability Zone, because EKS spans zones for availability. The StatefulSet pod is scheduled on this node, its persistent volume claim asks for the persistent volume, and Kubernetes says it can’t have it: an EBS-backed persistent volume only exists in the Availability Zone it was created in. Oh dear. Now we have to re-create the persistent volume from a snapshot and do some system administration to get the StatefulSet operational again. Not ideal.
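One mitigation worth knowing about here is delayed volume binding: a storage class with `volumeBindingMode: WaitForFirstConsumer` defers provisioning until the consuming pod has been scheduled, so the volume is created in that pod’s zone rather than a random one. It helps at first provisioning; it does not move an already-created, zone-bound volume. The class name is hypothetical, and the provisioner assumes the AWS EBS CSI driver is installed:

```yaml
# Illustrative storage class: WaitForFirstConsumer delays
# provisioning until the pod is scheduled, so the EBS volume
# lands in the same Availability Zone as the pod.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-delayed-binding    # hypothetical name
provisioner: ebs.csi.aws.com   # assumes the AWS EBS CSI driver
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```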
I have described the EBS stuck volume problem⁵, but there are more issues with persistent storage that I won’t cover in this article. The good news is that there are many solutions to handle these issues, including Rook, incubating in the CNCF, and OpenEBS and Longhorn, which are in the CNCF sandbox. That only scratches the surface of the open-source persistent storage solutions; there are many more. There are also enterprise solutions like Portworx and StorageOS, which are more expensive but have more features and are also more performant. So if you are serious about performance, such as trying to run a performant database, you should have a look at these solutions.
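As one illustration of how these solutions sidestep zone-bound block storage, a replicated storage class keeps multiple copies of each volume on different nodes, so losing a node does not strand the data the way a single-zone EBS volume can. This sketch uses Longhorn; the parameter names follow Longhorn’s documented storage class, but treat the details as an assumption to verify against the current docs for your chosen solution:

```yaml
# Illustrative Longhorn storage class: each volume is stored as
# three replicas on different nodes in the cluster.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated   # hypothetical name
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"       # copies kept across nodes
  staleReplicaTimeout: "30"   # minutes before a dead replica is rebuilt
```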
These solutions open up an exciting new opportunity for the solution architect, as well as solving the cloud, hybrid, multi-cloud and on-premises persistent storage issues across Kubernetes and VMs. They also open up the possibility of using Kubernetes as a data control plane, which I find exciting: it leverages the power of Kubernetes to serve up data on-premises and in any cloud provider, and lets lightweight application containers run stateless and stateful workloads anywhere. For me and other like-minded solution architects, that is an exciting prospect, because it can unlock data from the gnarly legacy databases that tie workloads to on-premises data centres. Or how about disaster recovery, moving data to another data centre or cloud region using your data control plane? So many solutions can be added to the solution architect’s tool belt with persistent storage done right in Kubernetes.