Perform Canary Deployments with AWS App Mesh on Amazon ECS Fargate

Yi Ai

In this article, I will walk you through all the steps required to perform canary deployments on Amazon ECS / Fargate with AWS App Mesh.

Canary deployment is a pattern for rolling out releases to a subset of users or servers, so that new features and other updates can be tested before they go live for the entire user base.

We are going to work with a Flask RESTful API. Once a new release of the API is signed off, only a few users are routed to the new version. If no errors are reported, we roll out the new version to the rest of the users.

The project contains two services:

  • /api – api handler
  • /api-gateway – api gateway

You will need the following before starting:

  • Basic understanding of Docker
  • Basic understanding of CloudFormation
  • An AWS account
  • The latest aws-cli installed
  • aws-cli configured to support App Mesh APIs
  • jq installed

You can find the complete project in my GitHub repo.

First, we will set up a VPC with public subnets. If you would like to set up ECS tasks in a VPC with private subnets and a NAT gateway, please read this tutorial from the AWS team.

Now, let’s start with a CloudFormation template, ecs-vpc.yaml:
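A minimal sketch of the networking pieces is shown below. Resource names and CIDR ranges here are illustrative; the real template in the repo also creates the ECS cluster, IAM roles, and the stack outputs (such as TaskIamRoleArn) that the later scripts read back.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: VPC with two public subnets for the ECS/Fargate sample (sketch)
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsSupport: true
      EnableDnsHostnames: true
  InternetGateway:
    Type: AWS::EC2::InternetGateway
  GatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway
  PublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [0, !GetAZs ""]
      CidrBlock: 10.0.0.0/24
      MapPublicIpOnLaunch: true
  PublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      AvailabilityZone: !Select [1, !GetAZs ""]
      CidrBlock: 10.0.1.0/24
      MapPublicIpOnLaunch: true
```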

then run the aws cloudformation create-stack command to create a stack:

$ aws cloudformation create-stack --stack-name flask-sample --template-body file://ecs-vpc.yaml --profile YOUR_PROFILE --region YOUR_REGION

AWS App Mesh is a service mesh that provides application-level networking to make it easy for your services to communicate with each other across multiple types of compute infrastructure. App Mesh standardizes how your services communicate, giving you end-to-end visibility and ensuring high-availability for your applications.

The following CloudFormation template, app-mesh.yaml, will be used to create a mesh, a virtual service, a virtual router, the corresponding route, and virtual nodes for our api application:
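A rough sketch of the key resources follows. The mesh, router, and node names match the route shown later in this article, but the listener port and service-discovery settings are illustrative assumptions, not the repo's exact values.

```yaml
Mesh:
  Type: AWS::AppMesh::Mesh
  Properties:
    MeshName: !Ref AppMeshMeshName
ApiVirtualRouter:
  Type: AWS::AppMesh::VirtualRouter
  DependsOn: Mesh
  Properties:
    MeshName: !Ref AppMeshMeshName
    VirtualRouterName: api-vr
    Spec:
      Listeners:
        - PortMapping:
            Port: 5000        # illustrative; Flask's default port
            Protocol: http
ApiV1VirtualNode:
  Type: AWS::AppMesh::VirtualNode
  DependsOn: Mesh
  Properties:
    MeshName: !Ref AppMeshMeshName
    VirtualNodeName: api-vn
    Spec:
      Listeners:
        - PortMapping:
            Port: 5000
            Protocol: http
      ServiceDiscovery:
        DNS:
          Hostname: api.flask-sample.local   # illustrative service-discovery name
```

The v2 virtual node (api-v2-vn) is defined the same way, pointing at the v2 service's discovery name.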

Run the aws cloudformation create-stack command to create the mesh stack:

$ aws cloudformation create-stack --stack-name flask-app-mesh --template-body file://app-mesh.yaml --profile YOUR_PROFILE --region YOUR_REGION

Before we can deploy our services, we need to push the Docker images to ECR repositories so that ECS can pull them when running the task definitions.

Go to the api/ directory and create a bash script to deploy the api image to the ECR api repository:

$ ./api/

Then move to the api-gateway/ directory and create a bash script to deploy the gateway image to the ECR gateway repository:

$ ./api-gateway/

A task definition is a blueprint that describes how a Docker container should launch. We need to create ECS task definitions for our gateway and api handlers, and make the tasks compatible with App Mesh and X-Ray.

Below is example JSON for the Amazon ECS task definition of the Flask gateway:
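A hedged sketch of what such a task definition contains is shown here. The `<...>` values are placeholders filled in by the registration script, and the Envoy image tag varies by region and version; the proxyConfiguration properties are the standard App Mesh settings for routing the app container's traffic through the Envoy sidecar.

```json
{
  "family": "flask-gateway",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "256",
  "memory": "512",
  "taskRoleArn": "<TASK_ROLE_ARN>",
  "executionRoleArn": "<EXECUTION_ROLE_ARN>",
  "proxyConfiguration": {
    "type": "APPMESH",
    "containerName": "envoy",
    "properties": [
      { "name": "AppPorts", "value": "5000" },
      { "name": "IgnoredUID", "value": "1337" },
      { "name": "ProxyIngressPort", "value": "15000" },
      { "name": "ProxyEgressPort", "value": "15001" },
      { "name": "EgressIgnoredIPs", "value": "169.254.170.2,169.254.169.254" }
    ]
  },
  "containerDefinitions": [
    {
      "name": "app",
      "image": "<GATEWAY_IMAGE>",
      "essential": true,
      "portMappings": [{ "containerPort": 5000, "protocol": "tcp" }]
    },
    {
      "name": "envoy",
      "image": "public.ecr.aws/appmesh/aws-appmesh-envoy:<ENVOY_VERSION>",
      "essential": true,
      "user": "1337",
      "environment": [
        { "name": "APPMESH_RESOURCE_ARN", "value": "<GATEWAY_VIRTUAL_NODE_ARN>" },
        { "name": "ENVOY_LOG_LEVEL", "value": "<ENVOY_LOG_LEVEL>" }
      ]
    },
    {
      "name": "xray-daemon",
      "image": "amazon/aws-xray-daemon",
      "essential": true,
      "portMappings": [{ "containerPort": 2000, "protocol": "udp" }]
    }
  ]
}
```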

Now, we need to create a bash script that registers the api gateway task definition, then run the command ./

#!/bin/bash
set -ex

# AWS_PROFILE is expected to be set in your environment
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null && pwd )"
AWS_DEFAULT_REGION="YOUR_REGION"

cluster_stack_output=$(aws --profile "${AWS_PROFILE}" --region "${AWS_DEFAULT_REGION}" \
    cloudformation describe-stacks --stack-name "flask-sample" \
    | jq '.Stacks[].Outputs[]')

task_role_arn=($(echo $cluster_stack_output \
    | jq -r 'select(.OutputKey == "TaskIamRoleArn") | .OutputValue'))
echo ${task_role_arn}

execution_role_arn=($(echo $cluster_stack_output \
    | jq -r 'select(.OutputKey == "TaskExecutionIamRoleArn") | .OutputValue'))

ecs_service_log_group=($(echo $cluster_stack_output \
    | jq -r 'select(.OutputKey == "ECSServiceLogGroup") | .OutputValue'))

envoy_log_level="debug"

GATEWAY_IMAGE="$( aws ecr describe-repositories \
    --repository-name flask-gateway --region ${AWS_DEFAULT_REGION} \
    --profile ${AWS_PROFILE} --query '[repositories[0].repositoryUri]' --output text )"

# Gateway Task Definition
task_def_json=$(jq -n \
    --arg SERVICE_LOG_GROUP "${ecs_service_log_group}" \
    --arg TASK_ROLE_ARN "${task_role_arn}" \
    --arg EXECUTION_ROLE_ARN "${execution_role_arn}" \
    --arg ENVOY_LOG_LEVEL "${envoy_log_level}" \
    --arg GATEWAY_IMAGE "${GATEWAY_IMAGE}" \
    -f "${DIR}/task-definition-gateway.json")

task_def_arn=$(aws --profile "${AWS_PROFILE}" --region "${AWS_DEFAULT_REGION}" \
    ecs register-task-definition \
    --cli-input-json "${task_def_json}" \
    --query '[taskDefinition.taskDefinitionArn]' --output text)

The task definitions for api v1 and v2 are very similar to the one above; you can find the bash scripts in the GitHub repo under /api/.

Creating Services that Run the Task Definitions

The command to create an ECS service takes quite a few parameters, so it is easier to use a CloudFormation template as input. Let’s create an ecs-service.yaml file with the following:
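The gateway service resource in that template might look roughly like this. Every `!Ref` name here is an assumed parameter of the template (cluster, task definition ARN, target group, subnets, and security group would be passed in or imported from the earlier stacks), so treat this as a sketch rather than the repo's exact file.

```yaml
ApiGatewayService:
  Type: AWS::ECS::Service
  Properties:
    Cluster: !Ref ECSCluster
    LaunchType: FARGATE
    DesiredCount: 1
    TaskDefinition: !Ref TaskDefinitionArn
    LoadBalancers:
      - ContainerName: app
        ContainerPort: 5000
        TargetGroupArn: !Ref TargetGroup
    NetworkConfiguration:
      AwsvpcConfiguration:
        AssignPublicIp: ENABLED
        Subnets:
          - !Ref PublicSubnet1
          - !Ref PublicSubnet2
        SecurityGroups:
          - !Ref ServiceSecurityGroup
```

The api v1 and v2 services follow the same shape, minus the load balancer attachment, since they are reached through the mesh rather than directly from the ALB.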

Next, create a bash script and run the following command:

$ ./

Now that we have set up everything we need, we can go to the AWS console to review what we have created.

CloudFormation Console
ECS Task Definition
ECS Cluster

Once we have deployed our api application, we can curl the frontend service (gateway) to test it. To get the endpoint, open the AWS EC2 console; on the navigation pane, under LOAD BALANCING, choose Load Balancers, select the load balancer we just created, and find the DNS name, which is the endpoint. Then run the curl command:

$ curl
{
  "todo": {
    "task": "build an API"
  },
  "version": "1"
}

Notice that all the services of the application are reflecting version 1. Now, it’s time for us to perform a canary deployment of api v2.

We can manage the target weights (WeightedTargets) in the app-mesh.yaml ApiRoute as below:

ApiRoute:
  Type: AWS::AppMesh::Route
  DependsOn:
    - ApiVirtualRouter
    - ApiV1VirtualNode
    - ApiV2VirtualNode
  Properties:
    MeshName: !Ref AppMeshMeshName
    VirtualRouterName: api-vr
    RouteName: api-route
    Spec:
      HttpRoute:
        Action:
          WeightedTargets:
            - VirtualNode: api-vn
              Weight: 2
            - VirtualNode: api-v2-vn
              Weight: 1
        Match:
          Prefix: "/"

and re-deploy the template. Alternatively, open the AWS App Mesh console, choose the mesh we created, select Virtual Routers, open the route, choose Edit, and set the target traffic weights in the Targets section as below:
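To see what a 2:1 weight split means in practice, here is a small simulation in plain Python (not App Mesh code) of how weighted targets distribute requests: roughly two thirds of traffic reaches v1 and one third reaches the v2 canary.

```python
import random

# Weighted targets as configured in the ApiRoute: api-vn has weight 2,
# api-v2-vn has weight 1, so traffic splits roughly 2:1 between them.
TARGETS = [("api-vn", 2), ("api-v2-vn", 1)]

def pick_target(rng=random):
    nodes = [name for name, _ in TARGETS]
    weights = [weight for _, weight in TARGETS]
    # random.choices performs exactly this kind of weighted selection
    return rng.choices(nodes, weights=weights, k=1)[0]

counts = {"api-vn": 0, "api-v2-vn": 0}
for _ in range(30_000):
    counts[pick_target()] += 1

ratio = counts["api-vn"] / counts["api-v2-vn"]
```

Shifting more traffic to v2 is then just a matter of editing the two Weight values and re-deploying; setting v1's weight to 0 completes the rollout.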

AWS X-Ray helps us to monitor and analyze distributed microservice applications through request tracing, providing an end-to-end view of requests traveling through the application so we can identify the root cause of errors and performance issues. We’ll use X-Ray to provide a visual map of how App Mesh is distributing traffic and inspect traffic latency through our routes.

In the task definition step, we already defined an X-Ray container in the task definitions; however, X-Ray can only run in local mode with Fargate, so we need to manually create X-Ray segments to track traffic between the gateway and api v1 & v2.

We will manually create a trace ID and a segment, then pass the trace ID and segment ID as the parent ID to the api handlers. The gateway/ should look like:
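As a sketch of the ID plumbing involved (the helper names here are hypothetical, and the real gateway code in the repo also emits the segment documents through the X-Ray daemon), the trace and parent IDs the gateway propagates downstream can be built like this:

```python
import binascii
import os
import time

def new_trace_id() -> str:
    # X-Ray trace IDs have the form:
    # 1-<8 hex chars: epoch seconds>-<24 hex chars: random>
    return "1-{:08x}-{}".format(
        int(time.time()), binascii.hexlify(os.urandom(12)).decode()
    )

def new_segment_id() -> str:
    # Segment IDs are 16 hexadecimal characters
    return binascii.hexlify(os.urandom(8)).decode()

def trace_header(trace_id: str, segment_id: str) -> str:
    # Sent downstream as the X-Amzn-Trace-Id header so the api handlers
    # record their segments under the gateway's segment as the parent
    return "Root={};Parent={};Sampled=1".format(trace_id, segment_id)

header = trace_header(new_trace_id(), new_segment_id())
```

The api handlers read this header, open their own segments with the received segment ID as parent, and the X-Ray console stitches the pieces into one trace.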

After all services are deployed successfully, we can open the AWS X-Ray console and monitor the traffic we send to the application frontend (gateway) when we request the api application on the /todos route.

It is quickest to use the CloudFormation Console to delete the following stacks:

  • flask-ecs-services
  • flask-app-mesh
  • flask-sample