Service Mesh for the developer workflow, a series

Christian Posta

Service mesh has largely been discussed from the perspective of architecture, SRE, and operations personas, as it offers an interesting way to solve difficult challenges that exist between services and applications. Developers stand to gain from service-mesh functionality as well, and in this series and the accompanying workshop, Nic Jackson (@sherrifjackson) and I (@christianposta) aim to make concrete how a service mesh helps developer workflows.

A service mesh is decentralized application infrastructure, typically implemented with sidecar proxies, that solves difficult service-to-service communication challenges such as request-level routing, resilience (timeouts, retries, circuit breaking), telemetry collection, and security regardless of what language or framework you use to implement the service.

In this architecture, requests flow to an application (and from an application) through its associated sidecar proxy. The sidecar proxy implements the difficult, but non-differentiated, heavy lifting when it comes to solving service-to-service communication challenges. It runs out of process from the application and implements these pieces of functionality consistently and independently from the chosen application programming language or framework.
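To make the interception idea concrete, here is a minimal Python sketch of the kind of timeout-and-retry policy a sidecar applies around a service. It is an illustration under stated assumptions: a real sidecar runs as a separate process and works at the network level, whereas this sketch wraps a handler in-process, and the `Sidecar` class, `timeout_s`, and `max_retries` are hypothetical names rather than any proxy's API.

```python
import concurrent.futures

# Hypothetical sketch: Sidecar, timeout_s and max_retries are illustrative
# names, not the API of any real proxy. A real sidecar runs out of process;
# this in-process wrapper only mimics the policy it applies.


class UpstreamError(Exception):
    """Raised when every attempt to reach the service has failed."""


class Sidecar:
    def __init__(self, app_handler, timeout_s=0.5, max_retries=2):
        self.app_handler = app_handler    # unmodified business logic
        self.timeout_s = timeout_s        # per-attempt timeout
        self.max_retries = max_retries    # retries after the first attempt
        self._pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def handle(self, request):
        attempts = 1 + self.max_retries
        last_error = None
        for _ in range(attempts):
            future = self._pool.submit(self.app_handler, request)
            try:
                return future.result(timeout=self.timeout_s)
            except Exception as exc:
                # Covers both timeouts and errors raised by the handler.
                # A timed-out attempt keeps running in its worker thread;
                # a real proxy would simply abandon the connection.
                last_error = exc
        raise UpstreamError(f"gave up after {attempts} attempts") from last_error


# The business logic knows nothing about retries or timeouts.
def inventory_handler(request):
    return {"item": request, "in_stock": True}


if __name__ == "__main__":
    print(Sidecar(inventory_handler).handle("widget"))
```

The design point is that `inventory_handler` contains no networking policy at all; changing the retry count or timeout means reconfiguring the wrapper, not editing the service.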

Consider what it means to abstract the networking code away from the business logic, especially when your team works in Java EE, another team in Spring, and yet another in Node.js. The business logic stays tied to each team's language or framework, while the networking code that connects all these services together is language agnostic. Each team is responsible for its own part of the whole application, but all the parts need to be connected to deliver a complete user experience.

Having sidecar proxies manage the traffic between services puts new capabilities into the hands of developers, including:

  • Application Resilience: Start offloading timeouts, retries, and circuit breaking to the mesh and get consistent behavior without changing application code and without maintaining a separate, often partial, implementation for each programming language.
  • Mutual TLS: This is not the first thing that comes to most developers’ minds, but a service mesh makes it easier to set up for your application.
  • Application Visibility: Previously, network engineers were the only ones with visibility into applications at the network level. With microservices, the network is the crux of how your application works, so that visibility matters to developers too. A service mesh collects metrics such as request throughput, latency, and failure rates at the proxies, without any change to application code.
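Circuit breaking, the least familiar of the resilience features above, can be sketched as a small state machine: after a number of consecutive failures the circuit opens and calls fail fast, and after a cool-down period a trial call is let through. This Python sketch uses hypothetical names (`CircuitBreaker`, `failure_threshold`, `reset_after_s`) and is not how any particular mesh implements the feature; the clock is injectable so the behavior can be checked without waiting.

```python
import time

# Hypothetical sketch of circuit-breaker logic a sidecar might apply.
# Class name, thresholds and state labels are illustrative assumptions.


class CircuitOpenError(Exception):
    """Raised when calls are rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=10.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock          # injectable for testing
        self.failures = 0           # consecutive failures seen so far
        self.opened_at = None       # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_after_s:
            return "half-open"      # cool-down elapsed: allow a trial call
        return "open"

    def call(self, fn, *args):
        if self.state == "open":
            raise CircuitOpenError("failing fast; upstream marked unhealthy")
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            # A failed trial call re-opens the circuit immediately.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success resets the failure count and closes the circuit.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast while the circuit is open spares a struggling upstream from a flood of requests that would likely time out anyway.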

And there are more opportunities to be explored. The more access we have to the protocol-level behavior of the network, and the more we can observe through metrics, the richer the network support can be, up to a point. In our workshop, we’ll show more examples, such as using these newfound abilities to debug our applications and services.

Let’s walk through two common scenes from a developer’s day: debugging an issue and testing the application.

Debugging applications

In a purely monolithic world, debugging was easier than with microservices. You still had to recreate everything about the production environment locally to reproduce an issue before trying to solve it, but your debugger was likely specific to the language your application was written in, and tools like VM snapshots made reproducing the issue possible.

In the world of microservices, reproducing the production environment locally becomes far harder. Finding the issue is like searching for a needle in a haystack: which service is it, among the potentially hundreds running and talking to each other? What was the issue, and how many services are affected? What if I didn’t write all of those services? What language is that app written in, and which debugger do I use?

How can service mesh help with debugging? The sidecars see every incoming and outgoing request for each service as end users interact with the application. By observing this traffic, you can establish a “normal” range of behavior and detect when something goes wrong, for example when a request reached a service but the traffic never proceeded to the next one.
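The two ideas in that paragraph, a learned “normal” range and a request that never reaches the next hop, can be sketched over telemetry the sidecars might report. This Python sketch assumes latency samples in milliseconds and hop records as `(request_id, service)` pairs; the three-standard-deviation threshold and all function names are illustrative assumptions, not how any particular mesh or tracing system detects anomalies.

```python
import statistics

# Hypothetical sketch over telemetry the sidecars might report.
# The 3-sigma rule and all names here are illustrative assumptions.


def learn_baseline(latencies_ms):
    """Summarize past latency samples (at least two) as (mean, stdev)."""
    return statistics.mean(latencies_ms), statistics.stdev(latencies_ms)


def is_anomalous(latency_ms, baseline, k=3.0):
    """Flag a sample more than k standard deviations from the mean."""
    mean, stdev = baseline
    return abs(latency_ms - mean) > k * stdev


def stalled_requests(hops, upstream, downstream):
    """Return IDs of requests seen by `upstream` but never by `downstream`."""
    reached_up = {rid for rid, svc in hops if svc == upstream}
    reached_down = {rid for rid, svc in hops if svc == downstream}
    return sorted(reached_up - reached_down)
```

For example, with hop records `[("r1", "frontend"), ("r1", "orders"), ("r2", "frontend")]`, calling `stalled_requests(hops, "frontend", "orders")` returns `["r2"]`: the request that entered the frontend but never reached the orders service.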

The service mesh can help both by providing invaluable network telemetry, such as requests per second, latency, and failure counts, and by aiding distributed tracing, which gives a visual understanding of how services interact with each other. With a service mesh in place, you could also use the mesh to record the traffic that flows through it. That traffic could be saved on failure events and replayed in lower (testing) environments, a powerful way for developers to debug services with their familiar tooling without impacting production request flows.
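Recording and replaying traffic is not something every mesh offers out of the box, so treat the following as a hypothetical Python sketch of the idea: keep a rolling window of recent requests, snapshot it when a failure event occurs (assumed here to be any 5xx status), and later feed the snapshot to a handler running in a test environment. `TrafficRecorder` and `replay` are illustrative names.

```python
from collections import deque

# Hypothetical sketch of "record on failure, replay elsewhere".
# TrafficRecorder and replay are illustrative names, not a real mesh API.


class TrafficRecorder:
    def __init__(self, capacity=100):
        self.recent = deque(maxlen=capacity)  # rolling window of recent requests
        self.captures = []                    # snapshots saved on failure events

    def observe(self, request, status):
        self.recent.append((request, status))
        if status >= 500:                     # assumption: 5xx marks a failure
            self.captures.append(list(self.recent))


def replay(capture, handler):
    """Send recorded requests, in order, to a handler in a test environment."""
    return [handler(request) for request, _status in capture]
```

Capturing the requests leading up to a failure, not just the failing one, gives the developer the context that often explains why the failure happened.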

Proactive testing of applications

For monolithic applications, developers would run a series of tests whose scope widened at each stage as their code was integrated into the monolith, finishing with regression tests to make sure the entire code base still worked as intended. This made for much longer testing and QA cycles, which slowed the rate at which developers could ship new features to end users; on the flip side, the application was tested and static at the time of release.

With microservices, you face potentially hundreds of services being updated and deployed independently of each other. An individual developer may be responsible for only a few of them, so beyond unit testing, how do you run a full integration or system test on something that is always changing? Never mind regression testing: against which of the hundred changing systems do you check whether your feature regressed? Microservices upend how we approach testing.

How can service mesh help? With so much of the application’s behavior tied to the network between services, there is an opportunity to use that network to test your application. Imagine injecting faults or adding latency to a section of your application to see how your services respond and how that behavior affects their downstream services. Chaos Engineering, popularized by Netflix, proactively introduces bad behavior into a production system to uncover weak points before they become customer-facing issues. It also complements your testing by making testing in production explicit rather than implicit. Chaos Engineering becomes even more interesting when you no longer need to change application code, import libraries, or write language-specific chaos tests. A service mesh lets you manipulate this operational behavior without touching application code, and because all of your services run on a shared service mesh, you can apply these tests across all of them.
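Fault injection at the proxy layer can be sketched as a wrapper that, for a configurable percentage of requests, adds a delay or aborts with an error before the application handler ever runs. Meshes such as Istio expose delay and abort faults declaratively; this Python sketch only mimics the idea in-process, and `FaultInjector`, `abort_pct`, `delay_pct`, and `delay_s` are hypothetical names. The random source and sleep function are injectable so experiments are reproducible.

```python
import random
import time

# Hypothetical sketch of proxy-level fault injection: the application
# handler is untouched. All names and parameters are illustrative.


class InjectedFault(Exception):
    """Simulated failure, standing in for e.g. an HTTP 503 from the proxy."""


class FaultInjector:
    def __init__(self, handler, abort_pct=0.0, delay_pct=0.0, delay_s=0.0,
                 rng=None, sleep=time.sleep):
        self.handler = handler        # unmodified application handler
        self.abort_pct = abort_pct    # % of requests that fail outright
        self.delay_pct = delay_pct    # % of requests that get extra latency
        self.delay_s = delay_s        # size of the injected delay
        self.rng = rng if rng is not None else random.Random()
        self.sleep = sleep            # injectable so tests need not really wait

    def handle(self, request):
        if self.rng.random() * 100 < self.delay_pct:
            self.sleep(self.delay_s)
        if self.rng.random() * 100 < self.abort_pct:
            raise InjectedFault("fault injected by proxy")
        return self.handler(request)
```

Wrapping, say, a checkout handler with `delay_pct=10, delay_s=5.0` would let you watch how its callers cope with a slow dependency, without a single change to the checkout code itself.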

While we don’t profess that service mesh is the answer to all challenges facing a developer, we do believe it is not solely in the realm of ops and has the potential to improve the developer workflow. The interesting part is to think of service mesh as a vehicle to deliver functionality — so what kind of functionality could we build? We look forward to going into these areas in more detail in future blog posts — in the meantime, don’t hesitate to reach out to ask us questions.

Also, if you are going to KubeCon San Diego, Nic Jackson and I are giving a tutorial on this very topic on Thursday, Nov 21 at 2:25 PM. Get the session details here: https://sched.co/Uaeb

The idea for the tutorial and this blog series came from a recent discussion between Nic Jackson, Developer Advocate at HashiCorp, and me about the technologies we were working with. As we both come from developer backgrounds and are both excited about service mesh, we wanted to combine our experience to discuss how service mesh could improve developer workflows.

Originally published at https://medium.com on November 5, 2019.