Empirically test and measure queries from Amazon DynamoDB

This post describes how our team inexpensively provided simple output for the empirical analysis of Amazon DynamoDB queries. Our goal was to take a legacy DynamoDB database, transform the data in a novel way, and then store it in a new DynamoDB database. After eight months of hard work on our project, we decided to stop reasoning about our system theoretically, and to start empirically testing and measuring it instead.

We had proven correctness for a myriad of legacy use cases, but the system’s performance was ultimately unacceptable because it was expensive and slow. Many legacy services sat on top of our work. We felt stuck and exhausted from reading code and trying to follow data through the many legacy services, so we decided to approach our project in a different way.

The empirical mindset

You may be used to the development cycle of writing code, running it, and iterating on it until you get a desired result. If the answer is wrong, you go back to the code, load some data in your mental memory, and trace the execution. If you’re especially stymied, you add a few println statements to track the state of the system.

On a small scale, this theoretical approach is manageable. On a large scale, it can be nearly impossible to reason about data across several systems. That’s why you should abandon the theoretical for the empirical for large-scale cases.

Our team wrote some inexpensive and simple code using RequestMetricCollector to log standardized DynamoDB queries. Standardizing queries means creating a form of the query as a string without the data that makes it unique—stripping out all data until only the fields, table name, and index names remain.

The RequestMetricCollector action can be used to intercept calls to AWS with the AWS SDK. This approach should work for other AWS services as well. It exposes a collectMetrics function that has generic request and response objects. The documentation focuses on the possibilities for metrics, but non-metric data is available as well. It’s low cost because it doesn’t need to integrate with any other services or systems.

All the work is done in memory. It’s natural because the output is a list of standardized queries that can be read by a human. There is no need to put the query data into another data store for analysis.

The diagram below starts with a complex architecture composed of many services that are inherently difficult to reason about. One of the components calls out to DynamoDB to persist data. Intercept these calls with RequestMetricsCollector and log information about each request, which helps you solve some of the problems listed in the next section.

ALT Text: A workflow diagram illustrating how Service 1 interacts with Service 2, 3, 4, etc., and how those services interact with Service N, which connects with the DynamoDB table. The DynamoDB table interacts with the AWS SDK (RequestMetricsCollector), which in turn logs on your host/server.

Complex architecture:

Diagram index

  1. A complex architecture with many interconnected services
  2. Implementing collectMetrics(request, response)
  3. Logs on your host/server, for example:
    PerformanceMetrics: @@ 4 @@ {TableName: LogTest,Key: {uuid={S: 73bed7db-55d8-4ecc-b516-ddb0672bc31a,}},} @@ {TableName: LogTest,Key: {uuid={S:,}},}

Common use cases for empirical testing

The empirical testing method helps solve the following types of problems:

  • Performance analysis: Decisions to tune your DynamoDB tables for one type of query means that other types of queries are not as efficient. You must make performance tradeoffs. How can you know the reasons for poor performance? What specific query parameters are causing queries to run more slowly than you expect? An empirical approach helps answer these questions.
  • Refactoring of legacy systems that sit above your DynamoDB database: The word “legacy” usually implies a system that you, as an engineer, are not familiar with. It’s hard enough debugging your own code, much less code written by several other engineers. Working backwards from your reads and writes to DynamoDB from the original request can be daunting.For our project, our team found that we weren’t able to successfully connect legacy input to our system to the reads and writes happening at the DynamoDB level. Not only did our empirical approach help us work with a smaller problem space (much smaller than the theoretical), but it gave us the exact shape of the problem.
  • Root cause analysis for data-related bugs: When you look for the root cause of bugs, the explanation often appears when you observe bad, or wrong, queries in your logs. Suddenly, the solution becomes clear and the bug can be quickly fixed. The Example code section generates logs with standardized queries and an example query at the DynamoDB layer to help you find the root cause for your bug. The code helps to remove the temptation of guesswork from only reading code. The other benefit of the approach is that the code can be injected temporarily, and then removed quickly without impact.

Example code

In this section, I look at three Java classes that enable you to quickly log DynamoDB request metadata. Two classes are meant to be integrated into your code – DDBQueryLogger.java and DDBQueryInfo.java. The third class is an example AWS Lambda function written in Java. I include information about how to re-create this test in your own AWS account.

After you have created the classes in your code base, how do you wire it up? Locate the code (or configuration) that constructs your DynamoDB client and add the DDBQueryLogger metric collector to that code or configuration. The LogTest.java class shows an example by using the “withMetricsCollector()” function. When you are ready to disable logging, you can remove this line.

Now, consider how the two classes do their work. You are intentionally trading performance and accuracy for quick insights.

One example is the regular expressions that you are using. They are computationally expensive, but easy to work with and understand. Other concerns, like the thread safety of the in-memory data structure, are given a “best effort” approach to yield quick insights.

The following two classes go through the following steps:

  • Log every request to DynamoDB from your system.
  • Standardize the textual representation of the query.
  • Add the result to an in-memory data structure.
  • Flush the in-memory data structure to your logs every five minutes.

Further analysis of the output can be done offline.

DDBQueryLogger.java:

package example;

import com.amazonaws.Request;
import com.amazonaws.Response;
import com.amazonaws.metrics.RequestMetricCollector;

import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

public class DDBQueryLogger extends RequestMetricCollector {

    private static final long TIMER_DELAY = 0L;
    private static final long TIMER_PERIOD = 100; // 5 * 60 * 1000;
    private static final int MAX_HASH_MAP_SIZE = 10000;
    protected static final ConcurrentHashMap<String, DDBQueryInfo> ddbQueryInfos = new ConcurrentHashMap<>();

    // These regular expressions strip out all query specific values. This helps
    // us “standardize” the textual representation of the query. For example,
    // “uuid={S:123}” would become “uuid={S:}”. This makes it possible to group
    // classes of queries together.
    private static final Pattern N = Pattern.compile("(?<=N:)(.*?)(?=,)");
    private static final Pattern S = Pattern.compile("(?<=S:)(.*?)(?=,)");

    private static final TimerTask emitLogTask = new TimerTask() {
        /**
         * Loop over all standardized DDB queries in memory for this host and emit our DDBQueryInfo object.
         */
        public void run() {
            for (final DDBQueryInfo ddbQueryInfo : ddbQueryInfos.values()) {
                System.out.println("PerformanceMetrics: " + ddbQueryInfo.toString());
            }
        }
    };

    private static final Timer timer = new Timer("Timer");

    static {
        timer.scheduleAtFixedRate(emitLogTask, TIMER_DELAY, TIMER_PERIOD);
        System.out.println("Timer started.");
    }

    @Override
    public void collectMetrics(Request<?> request, Response<?> response) {
        if (request == null || request.getOriginalRequest() == null) {
            return;
        }

        // Preventing out-of-memory errors. If you are standardizing queries correctly, you should never come close to this.
        if (ddbQueryInfos.keySet().size() > MAX_HASH_MAP_SIZE) {
            System.out.println("DDBQueryLogger in-memory data structure is too big. Standardized queries are broken.");
            return;
        }

        final String ddbQuery = standardizeQuery(request.getOriginalRequest().toString());

        final DDBQueryInfo infoOld = ddbQueryInfos.get(ddbQuery);

        final DDBQueryInfo infoNew = new DDBQueryInfo();
        infoNew.setExampleQuery(request.getOriginalRequest().toString());
        infoNew.setStandardizedQuery(ddbQuery);

        if (infoOld != null) {
            infoNew.getCount().set(infoOld.getCount().incrementAndGet());
        } else {
            infoNew.getCount().set(1L);
        }

        ddbQueryInfos.put(ddbQuery, infoNew);
    }

    /**
     * Take a DDB query (as a string) and remove all values that make it unique.
     * @param query toString() of the "original request" from the AWS SDK.
     * @return Standardized DDB query.
     */
    private String standardizeQuery(final String query) {
        return N.matcher(S.matcher(query).replaceAll("")).replaceAll("");
    }

}

DDBQueryInfo.java:

package example;

import java.util.concurrent.atomic.AtomicLong;

public class DDBQueryInfo {
    private AtomicLong count;
    private String exampleQuery;
    private String standardizedQuery;

    public AtomicLong getCount() { return count; }
    public void setCount(AtomicLong count) { this.count = count; }

    public String getExampleQuery() { return exampleQuery; }
    public void setExampleQuery(String exampleQuery) { this.exampleQuery = exampleQuery; }

    public String getStandardizedQuery() { return standardizedQuery; }
    public void setStandardizedQuery(String standardizedQuery) { this. standardizedQuery = standardizedQuery; }

    public DDBQueryInfo() {
        this.count = new AtomicLong();
    }

    /**
     * "@@" should help us tokenize a log entry.
     * @return A parsable string representing the object state.
     */
    @Override
    public String toString() {
        return String.format("@@ %s @@ %s @@ %s", count, exampleQuery, standardizedQuery);
    }
}

LogTest.java:

package example;

import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
import com.amazonaws.services.lambda.runtime.Context; 
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class LogTest implements RequestHandler<Integer, String> {

    public String handleRequest(Integer myCount, Context context) {
    	for (int i = 0; i < 5; i++) {
    		getItem();
    	}

        return "Log test done.";
    }

    public void getItem() {
        HashMap<String, AttributeValue> key = new HashMap<>();

        key.put("uuid", new AttributeValue("73bed7db-55d8-4ecc-b516-ddb0672bc31a"));

        GetItemRequest request = new GetItemRequest()
            .withKey(key)
            .withTableName("LogTest");

        // Make sure that your DynamoDB client references the DDBQueryLogger class.
        // You might consider turning it on/off with a feature flag.
        final AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder
        	.standard()
        	.withMetricsCollector(new DDBQueryLogger())
        	.build();

        try {
            Map<String, AttributeValue> item = ddb.getItem(request).getItem();
        } catch (AmazonServiceException e) {
            System.err.println(e.getErrorMessage());
        }
    }

}

Testing with a Lambda function

I used Gradle to package up the three classes in this post to run against a DynamoDB table called LogTest. The table had one field named uuid, with the hardcoded value from the sample code in it. For information about how to package up a Lambda function written in Java with Gradle, see Creating a ZIP Deployment Package for a Java Function.

Here is the build.gradle file that I used:

apply plugin: 'java'

repositories {
  mavenCentral()
}

dependencies {
    compile (
        'com.amazonaws:aws-lambda-java-core:1.1.0',
        'com.amazonaws:aws-lambda-java-events:1.1.0',
        'com.fasterxml.jackson.core:jackson-core:2.9.8',
        'com.amazonaws:aws-java-sdk-dynamodb'
    )
}

task buildZip(type: Zip) {
    from compileJava
    from processResources              
    into('lib') {
        from configurations.compileClasspath
    }           
}

build.dependsOn buildZip

Code explanation

The mission of the code is to easily integrate with an existing system (a one-line change to the DynamoDB client object), perform some measurements, and then easily come out. You don’t want to introduce any new dependencies in the system under test. Here is a list of the design decisions made:

  • Store all DynamoDB queries in memory. You have checks to prevent it from growing indefinitely. (DDBQueryLogger.java, lines 47–50).
  • Standardize queries to DynamoDB. This allows you to group classes of queries that makes the memory footprint small, and make analysis tractable. (DDBQueryLogger.java, lines 60–64).
  • Flush the data structure to logs periodically. It makes collecting data a bit more painful, but prevents additional dependencies and complexity.
  • Keep an example for every standardized query. You might be able to re-use the data you see to dig in on a specific use case.

Understanding the output

The output of the DDBQueryInfo class is arbitrary. It reflects my personal preference for quickly working with logs. You are encouraged to change the output to suit your needs (DDBQueryInfo.java, line 29).

Here’s an example from running the LogTest Lambda function:

START RequestId: 5e22204c-d91c-4899-924d-abaef69589f4 Version: $LATEST
Timer started.
PerformanceMetrics: @@ 1 @@ {TableName: LogTest,Key: {uuid={S: 73bed7db-55d8-4ecc-b516-ddb0672bc31a,}},} @@ {TableName: LogTest,Key: {uuid={S:,}},}
PerformanceMetrics: @@ 1 @@ {TableName: LogTest,Key: {uuid={S: 73bed7db-55d8-4ecc-b516-ddb0672bc31a,}},} @@ {TableName: LogTest,Key: {uuid={S:,}},}
PerformanceMetrics: @@ 1 @@ {TableName: LogTest,Key: {uuid={S: 73bed7db-55d8-4ecc-b516-ddb0672bc31a,}},} @@ {TableName: LogTest,Key: {uuid={S:,}},}
PerformanceMetrics: @@ 1 @@ {TableName: LogTest,Key: {uuid={S: 73bed7db-55d8-4ecc-b516-ddb0672bc31a,}},} @@ {TableName: LogTest,Key: {uuid={S:,}},}
PerformanceMetrics: @@ 2 @@ {TableName: LogTest,Key: {uuid={S: 73bed7db-55d8-4ecc-b516-ddb0672bc31a,}},} @@ {TableName: LogTest,Key: {uuid={S:,}},}
PerformanceMetrics: @@ 2 @@ {TableName: LogTest,Key: {uuid={S: 73bed7db-55d8-4ecc-b516-ddb0672bc31a,}},} @@ {TableName: LogTest,Key: {uuid={S:,}},}
PerformanceMetrics: @@ 4 @@ {TableName: LogTest,Key: {uuid={S: 73bed7db-55d8-4ecc-b516-ddb0672bc31a,}},} @@ {TableName: LogTest,Key: {uuid={S:,}},}
PerformanceMetrics: @@ 4 @@ {TableName: LogTest,Key: {uuid={S: 73bed7db-55d8-4ecc-b516-ddb0672bc31a,}},} @@ {TableName: LogTest,Key: {uuid={S:,}},}
END RequestId: 5e22204c-d91c-4899-924d-abaef69589f4
REPORT RequestId: 5e22204c-d91c-4899-924d-abaef69589f4	Duration: 9607.21 ms	Billed Duration: 9700 ms 	Memory Size: 512 MB	Max Memory Used: 139 MB	

By using the @@ string as a delimiter, ignoring the PerformanceMetrics: prefix, you can see that there are three fields in this record:

  • The number of times you’ve seen the standardized version of the DynamoDB query.
  • An example of the standardized query string, which has the unique values of the query removed. I’ve found this useful when tracing data through the code.
  • The standardized query.

With these fields, you can begin to answer questions about how our code is calling DynamoDB. It is especially useful for considering what type of indexes you might want to add to your table, or why certain use cases run slower than you’d like.

The code in this post does not attempt to diagnose slow queries directly. Consider modifying DDBQueryLogger.java at line 52 to include other data like IOP consumption.

Though only the query requests to DynamoDB are logged, you can see which fields in “hot” or frequently called queries might benefit from being moved into an indexed field.

Conclusion

This empirical approach is a useful way to analyze DynamoDB queries. The code in this post should be adaptable to your specific problem and use case.

For example, you might extend the code to also track consumed capacity at a finer level of detail. You might also change the output format (DDBQueryInfo.java, line 29) to something you’re more comfortable with. Hopefully, this post has helped you solve some of your problems so you can keep using DynamoDB to delight your customers.

 


About the Author

 

Ryan Meyer is a Senior Engineer with Amazon Digital Goods.