Using the New Client-Side Metrics Feature in the AWS SDK for Java v2

We are pleased to announce the preview release of the metrics module for AWS SDK for Java v2!

The metrics module enables you to collect and publish key performance metrics recorded automatically by the SDK as you use it. These metrics help you detect and diagnose issues in your applications, such as increased API call latency and startup time. You can even use the metrics to monitor trends in your SDK usage over time and tune the SDK for optimal performance and resource usage.

Collecting and Publishing Metrics

Out of the box, you can start collecting and publishing metrics to Amazon CloudWatch with just a few lines of code:

First, add the cloudwatch-metric-publisher module to your pom.xml:


<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>cloudwatch-metric-publisher</artifactId>
    <version>2.13.52-PREVIEW</version>
</dependency>

Then, create and set the publisher on the clients you want to publish metrics for:

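import software.amazon.awssdk.metrics.publishers.cloudwatch.CloudWatchMetricPublisher;
import software.amazon.awssdk.services.s3.S3Client;

// Create the publisher, then register it in the client's override configuration.
CloudWatchMetricPublisher cloudWatchMetricPublisher = CloudWatchMetricPublisher.create();
S3Client s3 = S3Client.builder()
        .overrideConfiguration(o -> o.addMetricPublisher(cloudWatchMetricPublisher))
        .build();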

That’s all there is to it. Now, as you use the SDK, it will record metrics for each API call and hand them to the CloudWatchMetricPublisher, which sends them to the CloudWatch service in the background. You can then view and access the data through the CloudWatch SDK or console, or create dashboards and alarms to monitor your SDK usage.

Publishing Metrics to a Custom Location

If you’d like to publish the SDK’s metrics to a place other than CloudWatch, we’ve got you covered. All you need to do is implement the MetricPublisher interface to publish metrics to a location of your choice. A simple publisher that logs the metrics to a file is just a few lines of code away:

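import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import software.amazon.awssdk.metrics.MetricCollection;
import software.amazon.awssdk.metrics.MetricPublisher;

public class LoggingPublisher implements MetricPublisher {
    private static final Logger LOG = LoggerFactory.getLogger("software.amazon.awssdk.example.metrics");

    // Called by the SDK with the metrics recorded for an API call.
    @Override
    public void publish(MetricCollection metricCollection) {
        LOG.info("Metrics: {}", metricCollection);
    }

    // Nothing to release: this publisher holds no resources of its own.
    @Override
    public void close() {
    }
}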

As we’ll see below, the metrics recorded by the SDK give enough insight that even a simple publisher like this one can be a big help.
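LoggingPublisher does its work directly inside publish(), which is fine for a local logger. A publisher that sends metrics to a remote service would more typically queue them and upload them in batches from a background thread, which is also why close() matters: it is the last chance to flush what is still queued. The sketch below illustrates that pattern without any SDK types so it runs on its own; BatchingPublisher, its uploader callback, and the flush period are hypothetical names and values, and this is not code taken from the SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical, SDK-free illustration of a buffering publisher.
// In a real MetricPublisher, T would be MetricCollection and the uploader would call your backend.
final class BatchingPublisher<T> implements AutoCloseable {
    private final ConcurrentLinkedQueue<T> pending = new ConcurrentLinkedQueue<>();
    private final Consumer<List<T>> uploader;
    private final ScheduledExecutorService scheduler;

    BatchingPublisher(Consumer<List<T>> uploader, long flushPeriodMillis) {
        this.uploader = uploader;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "metrics-flush");
            t.setDaemon(true); // don't keep the JVM alive just for metrics; close() flushes instead
            return t;
        });
        scheduler.scheduleAtFixedRate(this::flush, flushPeriodMillis, flushPeriodMillis, TimeUnit.MILLISECONDS);
    }

    // Cheap and non-blocking: just enqueue the item for the next flush.
    void publish(T item) {
        pending.add(item);
    }

    // Drain everything queued so far and hand it to the uploader as one batch.
    synchronized void flush() {
        List<T> batch = new ArrayList<>();
        for (T item = pending.poll(); item != null; item = pending.poll()) {
            batch.add(item);
        }
        if (!batch.isEmpty()) {
            uploader.accept(batch);
        }
    }

    // Stop the periodic task, then flush so queued items are not lost.
    @Override
    public void close() {
        scheduler.shutdown();
        flush();
    }
}
```

To apply the pattern with the SDK, you would implement MetricPublisher, forward each MetricCollection from publish() into a buffer like this one, and do the network call in the uploader. A production version should also catch exceptions thrown by the upload, because an exception from a scheduleAtFixedRate task suppresses all of that task's later runs.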

Metrics in Action

Let’s imagine we have a simple application that downloads a file from S3 using credentials for a given IAM role, obtained from STS.

public class TestApp {
    
    ...

    public static void main(String[] args) throws IOException {
        TestApp testApp = new TestApp();
        for (int i = 0; i < 3; ++i) {
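            // ResponseTransformer.toFile fails if the target file already exists,
            // so remove any copy left over from a previous iteration or run.
            Files.deleteIfExists(Paths.get(KEY));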
            testApp.downloadToFile(BUCKET, KEY, ROLE_ARN);
        }
    }
}

Here, the downloadToFile method simply downloads the object using the given role ARN:

public void downloadToFile(String bucket, String key, String roleArn) {
    AwsCredentialsProvider roleProvider = getRoleProvider(roleArn);
    GetObjectRequest request = createGetObjectRequest(bucket, key, roleProvider);
    s3.getObject(request, ResponseTransformer.toFile(Paths.get(key)));
}

As awesome engineers, we want this method to be as efficient as possible. Without metrics, however, it’s difficult to know what we need to change or optimize to lower the latency without a lot of digging. This is where metrics come to the rescue.

Looking at the output of our simple LoggingPublisher, we can see that a large portion of time is spent fetching credentials. The first fetch is especially long, at 0.84 seconds, because of the additional overhead of establishing a TCP connection to STS.

2020-06-25 17:22:55,363 [main] INFO  software.amazon.awssdk.example.metrics -  Metrics: MetricCollection(name=ApiCall,
metrics=[MetricRecord(metric=ServiceId, value=S3), MetricRecord(metric=OperationName, value=GetObject),
MetricRecord(metric=CredentialsFetchDuration, value=PT0.841487422S), ...

2020-06-25 17:22:55,739 [main] INFO  software.amazon.awssdk.example.metrics -  Metrics: MetricCollection(name=ApiCall,
metrics=[MetricRecord(metric=ServiceId, value=S3), MetricRecord(metric=OperationName, value=GetObject),
MetricRecord(metric=CredentialsFetchDuration, value=PT0.325807755S), ...

2020-06-25 17:22:56,141 [main] INFO  software.amazon.awssdk.example.metrics -  Metrics: MetricCollection(name=ApiCall,
metrics=[MetricRecord(metric=ServiceId, value=S3), MetricRecord(metric=OperationName, value=GetObject),
MetricRecord(metric=CredentialsFetchDuration, value=PT0.321498448S), ...
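The duration values in these records are printed in ISO-8601 form, the same text that java.time.Duration.toString() produces: PT0.841487422S means 0.841487422 seconds. If you want to compare runs without reading the logs by eye, a small helper can pull a metric back out of a logged line. This is a minimal sketch; LogDurations and extract are hypothetical names, and it assumes the MetricRecord(metric=..., value=...) layout shown above, which is just the logged collection's toString() output rather than a stable format.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts one duration metric from a line logged by LoggingPublisher.
// Assumes the "MetricRecord(metric=<name>, value=<ISO-8601 duration>)" layout shown above.
final class LogDurations {
    private LogDurations() {
    }

    static Optional<Duration> extract(String logLine, String metricName) {
        Pattern pattern = Pattern.compile(
                "MetricRecord\\(metric=" + Pattern.quote(metricName) + ", value=(PT[^)]+)\\)");
        Matcher matcher = pattern.matcher(logLine);
        // Duration.parse accepts the ISO-8601 text that Duration.toString() produces.
        return matcher.find() ? Optional.of(Duration.parse(matcher.group(1))) : Optional.empty();
    }
}
```

On the first log entry above, LogDurations.extract(line, "CredentialsFetchDuration").map(Duration::toMillis) gives 841.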

A cool thing about the StsAssumeRoleCredentialsProvider we’re using is that it caches credentials internally to avoid making unnecessary API requests. With this in mind, we can make a quick and simple optimization: reuse one credentials provider per role rather than creating a new one for each S3 request:

// Keyed by role ARN; a ConcurrentHashMap keeps lookups safe if downloads run on several threads.
private final Map<String, AwsCredentialsProvider> providerCache = new ConcurrentHashMap<>();

public void downloadToFile(String bucket, String key, String roleArn) {
    AwsCredentialsProvider roleProvider = getCachedCredentialsProvider(roleArn);
    GetObjectRequest request = createGetObjectRequest(bucket, key, roleProvider);
    s3.getObject(request, ResponseTransformer.toFile(Paths.get(key)));
}

...

// Create the provider for a role only once, so its cached credentials are reused on later calls.
private AwsCredentialsProvider getCachedCredentialsProvider(String roleArn) {
    return providerCache.computeIfAbsent(roleArn, arn -> getRoleProvider(arn));
}
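Why does reusing the provider help? A caching provider keeps the credentials it fetched and only makes the remote call again when they are missing or close to expiring, so a fresh provider per request throws that cache away every time. The sketch below models the idea with plain Java; ExpiringCache, Expiring, the fetcher, and the refresh margin are hypothetical stand-ins, and the SDK’s own credential caching is more involved than this.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// A fetched value together with the instant at which it stops being valid.
final class Expiring<V> {
    final V value;
    final Instant expiresAt;

    Expiring(V value, Instant expiresAt) {
        this.value = value;
        this.expiresAt = expiresAt;
    }
}

// Hypothetical, simplified model of a provider that caches what it fetched.
final class ExpiringCache<T> {
    private final Supplier<Expiring<T>> fetcher; // stands in for the expensive remote call
    private final Clock clock;
    private final Duration refreshMargin;        // refetch once we are this close to expiry
    private Expiring<T> cached;

    ExpiringCache(Supplier<Expiring<T>> fetcher, Clock clock, Duration refreshMargin) {
        this.fetcher = fetcher;
        this.clock = clock;
        this.refreshMargin = refreshMargin;
    }

    synchronized T get() {
        Instant refreshAt = cached == null ? null : cached.expiresAt.minus(refreshMargin);
        // Only pay for the remote call when nothing is cached or the value is about to expire.
        if (refreshAt == null || !clock.instant().isBefore(refreshAt)) {
            cached = fetcher.get();
        }
        return cached.value;
    }
}
```

With a value valid for an hour, repeated get() calls trigger a single fetch, which mirrors what we hope to see in CredentialsFetchDuration after the change.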

Looks good! Let’s run the code again to see how it performs now.

2020-06-25 17:27:47,045 [main] INFO  software.amazon.awssdk.example.metrics -  Metrics: MetricCollection(name=ApiCall,
metrics=[MetricRecord(metric=ServiceId, value=S3), MetricRecord(metric=OperationName, value=GetObject),
MetricRecord(metric=CredentialsFetchDuration, value=PT0.953062913S), ...

2020-06-25 17:27:47,082 [main] INFO  software.amazon.awssdk.example.metrics -  Metrics: MetricCollection(name=ApiCall,
metrics=[MetricRecord(metric=ServiceId, value=S3), MetricRecord(metric=OperationName, value=GetObject),
MetricRecord(metric=CredentialsFetchDuration, value=PT0.000030022S), ...

2020-06-25 17:27:47,115 [main] INFO  software.amazon.awssdk.example.metrics -  Metrics: MetricCollection(name=ApiCall,
metrics=[MetricRecord(metric=ServiceId, value=S3), MetricRecord(metric=OperationName, value=GetObject),
MetricRecord(metric=CredentialsFetchDuration, value=PT0.000014517S), ...

We can see that the first request still takes a while to fetch credentials because it doesn’t have cached credentials yet. However, the following two requests spend virtually no time fetching credentials because we’re reusing the same provider that already has the credentials cached internally. Nice!
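To put numbers on the improvement, we can add up the CredentialsFetchDuration values from both runs. This sketch uses only java.time; CredentialTimeComparison is a hypothetical helper, and its inputs are the values logged above. Three calls is a tiny sample, and the first call’s cost varies between runs (0.84 seconds in the first run, 0.95 in the second), so treat the totals as illustrative.

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.List;

// CredentialsFetchDuration values copied from the two runs logged above.
final class CredentialTimeComparison {
    static final List<String> BEFORE = Arrays.asList("PT0.841487422S", "PT0.325807755S", "PT0.321498448S");
    static final List<String> AFTER = Arrays.asList("PT0.953062913S", "PT0.000030022S", "PT0.000014517S");

    // Parses each ISO-8601 duration and adds them up.
    static Duration total(List<String> isoDurations) {
        Duration sum = Duration.ZERO;
        for (String iso : isoDurations) {
            sum = sum.plus(Duration.parse(iso));
        }
        return sum;
    }

    public static void main(String[] args) {
        Duration before = total(BEFORE);
        Duration after = total(AFTER);
        System.out.println("before: " + before.toMillis() + " ms");
        System.out.println("after:  " + after.toMillis() + " ms");
        System.out.println("saved:  " + before.minus(after).toMillis() + " ms");
    }
}
```

Running main reports 1488 ms of credential fetching before the change and 953 ms after, about 535 ms saved across three calls, with nearly all of the remaining time spent in the first call.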

Conclusion

I hope that this short post and example have shown the utility of having detailed metrics to monitor, diagnose, and optimize SDK usage and performance, and I invite you to try it out for yourself! Head over to our GitHub repository, where you can find the SDK and give feedback through the Issues page. Be sure to also check out our developer guide, which has more in-depth information on using the metrics module.