Understanding Amazon DynamoDB encryption by using AWS Key Management Service and analysis of API calls with Amazon Athena

As applications evolve to be more scalable for the web, customers are adopting flexible data structures and database engines for their use cases. NoSQL data stores have become increasingly popular because their flexible data model suits modern applications. Amazon DynamoDB is a fast and flexible NoSQL database service that can provide consistent single-digit millisecond latency at scale. As you adopt DynamoDB for web-scale workloads, it’s important that you understand the security controls available within DynamoDB.

You can use various capabilities to run DynamoDB securely. Amazon VPC endpoints provide secure access to DynamoDB tables for applications running in a VPC. AWS Identity and Access Management (IAM) policies provide fine-grained access control that regulates access to the items and attributes stored in DynamoDB tables. You can also use Transport Layer Security (TLS) endpoints to encrypt data in transit.

For encryption of data at rest, you can choose one of two customer master key (CMK) options to encrypt your tables. The AWS-owned CMK is the default encryption type. AWS-owned CMKs are a collection of CMKs that an AWS service owns and manages for use in multiple AWS accounts, and they are not in your AWS account. AWS-managed CMKs, on the other hand, are keys stored in your account that are created, managed, and used on your behalf by an AWS service that integrates with AWS Key Management Service (AWS KMS).

Server-side encryption at rest using the AWS-owned CMK is enabled by default on all DynamoDB tables, and DynamoDB has encrypted all previously unencrypted existing tables by using the AWS-owned CMK. Alternatively, you can choose to encrypt some or all of your tables by using an AWS-managed CMK. In addition, you can use client-side encryption to protect data before sending it to DynamoDB.

In this blog post, we cover the mechanics of server-side encryption by using an AWS-managed CMK. We also discuss tracking API calls to AWS KMS by using AWS CloudTrail and Amazon Athena to understand the distribution of calls made (CreateGrant vs. Decrypt).

Create a DynamoDB table

Let’s begin by creating a DynamoDB table with the AWS-managed CMK. The --sse-specification parameter, with Enabled=true and SSEType=KMS, defines the method of encryption. In this case, it is the AWS-managed CMK.

aws dynamodb create-table \
    --table-name ratings \
    --attribute-definitions AttributeName=player,AttributeType=S \
        AttributeName=rating,AttributeType=N \
    --key-schema AttributeName=player,KeyType=HASH AttributeName=rating,KeyType=RANGE \
    --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
    --sse-specification Enabled=true,SSEType=KMS

In the following response output from the AWS CLI command, the SSEDescription Status is ENABLED, SSEType is KMS, and KMSMasterKeyArn contains the Amazon Resource Name (ARN) of the KMS key used for server-side encryption.

Response output

{
    "TableDescription": {
        "TableArn": "arn:aws:dynamodb:us-east-1:904672585901:table/ratings", 
        "AttributeDefinitions": [
            {
                "AttributeName": "player", 
                "AttributeType": "S"
            }, 
            {
                "AttributeName": "rating", 
                "AttributeType": "N"
            }
        ], 
        "ProvisionedThroughput": {
            "NumberOfDecreasesToday": 0, 
            "WriteCapacityUnits": 5, 
            "ReadCapacityUnits": 5
        }, 
        "TableSizeBytes": 0, 
        "TableName": "ratings", 
        "TableStatus": "CREATING", 
        "TableId": "e6befed6-86c8-4b4d-b800-9be1062eb67b", 
        "SSEDescription": {
            "Status": "ENABLED", 
            "KMSMasterKeyArn": "arn:aws:kms:us-east-1:904672585901:key/af886ef7-08d3-4c1b-bd14-70d2b722e165", 
            "SSEType": "KMS"
        }, 
        "KeySchema": [
            {
                "KeyType": "HASH", 
                "AttributeName": "player"
            }, 
            {
                "KeyType": "RANGE", 
                "AttributeName": "rating"
            }
        ], 
        "ItemCount": 0, 
        "CreationDateTime": 1552353132.722
    }
}

Note: If you don’t see SSEDescription in the response for a table with server-side encryption, try updating to the latest AWS CLI.

Verify encryption for the table

If you want to verify a table’s encryption method, you can use the describe-table API call or the DynamoDB console.

aws dynamodb describe-table --table-name ratings \
     --query 'Table.{TableName:TableName, TableStatus:TableStatus, SSEDescription:SSEDescription}' --output json

The --query parameter filters the response so that only the necessary attributes are printed, as follows. You can see that the table is ACTIVE and that the Status attribute in the SSEDescription object is ENABLED, with KMS as the SSEType.

Response output

{
    "TableStatus": "ACTIVE", 
    "TableName": "ratings", 
    "SSEDescription": {
        "Status": "ENABLED", 
        "KMSMasterKeyArn": "arn:aws:kms:us-east-1:904672585901:key/af886ef7-08d3-4c1b-bd14-70d2b722e165", 
        "SSEType": "KMS"
    }
}
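
Because describe-table prints JSON, you can also check the encryption setting from a script, for example as part of a compliance check. The following minimal Python sketch parses that JSON, accepting either the full output (with a top-level Table object) or the filtered output shown above. The function name check_sse is hypothetical. The sketch also assumes that a missing SSEDescription means the table uses the default AWS-owned CMK; confirm that against your own output before relying on it.

```python
import json


def check_sse(describe_table_output: str) -> str:
    """Summarize encryption settings from `aws dynamodb describe-table` JSON.

    Works with the full output (attributes nested under "Table") and with
    the filtered output produced by the --query expression shown above.
    """
    doc = json.loads(describe_table_output)
    table = doc.get("Table", doc)  # full output nests attributes under "Table"
    sse = table.get("SSEDescription")
    if sse is None:
        # Assumption: no SSEDescription means the default AWS-owned CMK.
        return "default (AWS-owned CMK)"
    if sse.get("Status") == "ENABLED" and sse.get("SSEType") == "KMS":
        return "KMS: " + sse.get("KMSMasterKeyArn", "<unknown key>")
    return "SSE status: " + str(sse.get("Status"))


# Usage (shell): aws dynamodb describe-table --table-name ratings > out.json
# then: print(check_sse(open("out.json").read()))
```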

How server-side encryption works

Now that we know the ratings table is created with AWS KMS server-side encryption, let’s look at the workflow for server-side encryption.

Workflow for server-side encryption

These are the steps in the server-side encryption process, as shown in the preceding diagram:

  1. The owner of the table uses the CreateTable API call with server-side encryption set to AWS KMS.
  2. When the CreateTable API request is received, DynamoDB authenticates the request.
  3. DynamoDB uses the AWS-managed CMK as the top-level key. Because DynamoDB has to use this key for server-side encryption, it first makes a set of CreateGrant API calls to AWS KMS.
  4. DynamoDB uses the CMK to generate a table key, which is a unique key for each table. This table key is used to generate data encryption keys that are used to encrypt underlying structures in the table.
  5. The plaintext key material and the encrypted key material are sent to DynamoDB.
  6. The plaintext table key is cached in DynamoDB.

The following diagram shows the hierarchy of server-side encryption keys used by DynamoDB. DynamoDB uses the AWS-managed CMK in each AWS Region in your AWS account as the top-level key to generate and encrypt a unique table key for each table. DynamoDB uses the table key to generate data encryption keys and then uses the data encryption keys to encrypt table data and the underlying structures in a table.

The hierarchy of server-side encryption keys used by DynamoDB
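
This post does not describe how DynamoDB generates data encryption keys from a table key, so the following Python sketch substitutes a standard, well-documented technique, HKDF with SHA-256 (RFC 5869), purely to make the hierarchy concrete: one table-level secret yields many distinct per-structure keys. The function names, the info label format, and the random stand-in for the table key are all hypothetical; DynamoDB's internal derivation may differ, and in the real service the CMK never leaves AWS KMS.

```python
import hashlib
import hmac
import os


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    if length > 255 * hashlib.sha256().digest_size:
        raise ValueError("length too large for HKDF-SHA256")
    # Extract: an empty salt is replaced by HashLen zero bytes, per RFC 5869.
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()
    # Expand: T(i) = HMAC(PRK, T(i-1) || info || i), concatenated and truncated.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_data_key(table_key: bytes, table_name: str, structure_id: str) -> bytes:
    """Hypothetical: derive a data encryption key for one underlying structure."""
    return hkdf_sha256(table_key, f"{table_name}/{structure_id}".encode())


# Stand-in for the plaintext table key that AWS KMS generates under the CMK.
table_key = os.urandom(32)
k1 = derive_data_key(table_key, "ratings", "partition-0001")
k2 = derive_data_key(table_key, "ratings", "partition-0002")
```

The point of the hierarchy is that only the small table key has to be protected by AWS KMS; every data key can be regenerated from it.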

Now that we have created the table, let’s look at what happens when you add an item with the PutItem API call.

The mechanics while using the PutItem API call

When using the PutItem API call:

  1. The user issues a PutItem call to add data to a DynamoDB table.
  2. DynamoDB authenticates the user’s request.
  3. DynamoDB verifies that the user has the necessary permissions to write data to the DynamoDB table.
  4. Depending on the data being encrypted, DynamoDB identifies the right data encryption key to encrypt the data. To avoid calling AWS KMS for every DynamoDB operation, DynamoDB caches the plaintext table key in memory for each principal. The table key is refreshed once every five minutes per client connection with active traffic. If DynamoDB gets a request for the cached table key after five minutes of inactivity, it sends a new request to AWS KMS to decrypt the table key.
  5. Encrypted data and encrypted key material are stored in DynamoDB.
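
The caching behavior in step 4 can be sketched as a small time-to-live cache. The following Python sketch is a simplified model, not DynamoDB's implementation: entries are keyed by principal and table, a cached plaintext table key is reused until it is five minutes old, and decrypt_table_key is a hypothetical stand-in for the AWS KMS Decrypt call that DynamoDB makes on your behalf. The injectable clock exists only so the behavior can be tested.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL_SECONDS = 300  # five minutes, per the description above


class TableKeyCache:
    """Sketch of an in-memory, per-principal cache of plaintext table keys.

    `decrypt_table_key` is a hypothetical stand-in for the AWS KMS Decrypt
    call; it runs only on a cache miss or when the cached entry is stale.
    """

    def __init__(self, decrypt_table_key: Callable[[str], bytes],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._decrypt = decrypt_table_key
        self._clock = clock
        # (principal, table) -> (plaintext table key, time it was fetched)
        self._entries: Dict[Tuple[str, str], Tuple[bytes, float]] = {}

    def get(self, principal: str, table: str) -> bytes:
        now = self._clock()
        entry = self._entries.get((principal, table))
        if entry is not None and now - entry[1] < CACHE_TTL_SECONDS:
            return entry[0]  # cache hit: no call to AWS KMS
        key = self._decrypt(table)  # miss or stale entry: one Decrypt call
        self._entries[(principal, table)] = (key, now)
        return key
```

Keying the cache by principal means two different callers each trigger their own Decrypt call, even for the same table.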

Now that we have inserted data into the DynamoDB table, let’s look at the mechanics of retrieving the data with the GetItem API call.

The mechanics of retrieving the data with the GetItem API call

When using the GetItem API call:

  1. The user issues a GetItem call to retrieve data from the DynamoDB table.
  2. DynamoDB authenticates the user request.
  3. DynamoDB verifies that the user has the necessary permissions to read data from the DynamoDB table.
  4. The request for retrieving the data is made.
  5. Encrypted data is retrieved.
  6. DynamoDB caches the plaintext table keys for each principal in memory. If DynamoDB gets a request for the cached table key after five minutes of inactivity, it sends a new request to AWS KMS to decrypt the table key.
  7. Decrypted plaintext key material is retrieved.
  8. Data is decrypted by using received plaintext key material.
  9. Plaintext data is sent to the user by using HTTPS (for the TLS endpoint only).
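
Because the plaintext table key is cached, the number of AWS KMS calls you later see in CloudTrail can be much smaller than the number of GetItem or PutItem requests. The following Python sketch estimates that count for one principal and one table under a simplified model (an assumption, not a service guarantee): the cached key is reused until it is five minutes old, and the next request after that triggers one new Decrypt call. The function name is hypothetical.

```python
from typing import Iterable, Optional

REFRESH_SECONDS = 300  # five minutes


def expected_decrypt_calls(request_times: Iterable[float]) -> int:
    """Estimate KMS Decrypt calls for one principal's requests to one table.

    `request_times` are request timestamps in seconds. A new Decrypt call is
    counted whenever no key is cached or the cached key is at least five
    minutes old.
    """
    calls = 0
    fetched_at: Optional[float] = None
    for t in sorted(request_times):
        if fetched_at is None or t - fetched_at >= REFRESH_SECONDS:
            calls += 1
            fetched_at = t
    return calls


# One request per second for 20 minutes (1,200 requests):
print(expected_decrypt_calls(range(1200)))
```

Under this model, 1,200 requests spread over 20 minutes lead to only four Decrypt calls, at 0, 300, 600, and 900 seconds.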

Note: CloudTrail logs are necessary for the next section. Ensure that your account has a CloudTrail trail that delivers log files to Amazon S3. For more information, see Getting Started with CloudTrail.

Analyze KMS key usage using CloudTrail logs and Athena

CloudTrail records API calls and publishes log files to Amazon S3. Account activity is tracked as an event in the CloudTrail log file. Each event contains information such as who performed the action, the date and time of the action, and the resources affected. Each CloudTrail log file contains multiple events in JSON format. CloudTrail records the API calls that DynamoDB makes to AWS KMS, including the CreateGrant calls that create grants on the CMK, the call that generates a table key, and the Decrypt calls. In this post, we use Athena, an interactive SQL query service, to analyze CloudTrail logs stored in Amazon S3 and understand the calls made to AWS KMS and DynamoDB.
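
If you want to inspect a few log files before setting up Athena, the following Python sketch reads CloudTrail log files that you have copied to a local directory (for example, with aws s3 sync). It assumes the standard CloudTrail layout of gzip-compressed JSON files ending in .json.gz, each with a top-level Records array, and it filters on the same eventSource and sourceIPAddress fields that the Athena queries later in this post use. The function names are hypothetical.

```python
import gzip
import json
from pathlib import Path
from typing import Iterable, Iterator, List


def iter_records(log_dir: str) -> Iterator[dict]:
    """Yield events from every CloudTrail log file (*.json.gz) under log_dir."""
    for path in sorted(Path(log_dir).rglob("*.json.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            # Each log file is one JSON document with a "Records" array.
            yield from json.load(fh).get("Records", [])


def kms_calls_from_dynamodb(records: Iterable[dict]) -> List[dict]:
    """Keep only the AWS KMS events that DynamoDB made on your behalf."""
    return [r for r in records
            if r.get("eventSource") == "kms.amazonaws.com"
            and r.get("sourceIPAddress") == "dynamodb.amazonaws.com"]
```

For large volumes of logs, Athena is the better fit, because it scans the files in place in Amazon S3.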

The following sample queries list calls made to DynamoDB tables for a date range, the number of calls to AWS KMS, and distribution by API call type. Before we can run queries, though, we need to create an external table in Athena that describes the structure of CloudTrail logs.

Create a table in Athena

Use the following CREATE EXTERNAL TABLE command in the Athena console to create the table. Replace the Amazon S3 bucket name and location with your Amazon S3 bucket name and location.

CREATE EXTERNAL TABLE cloudtrail_logs_<s3 bucket name> (
eventversion STRING,
userIdentity STRUCT<
  type:STRING,
  principalid:STRING,
  arn:STRING,
  accountid:STRING,
  invokedby:STRING,
  accesskeyid:STRING,
  userName:STRING,
  sessioncontext:STRUCT<
    attributes:STRUCT<
      mfaauthenticated:STRING,
      creationdate:STRING>,
    sessionIssuer:STRUCT<
      type:STRING,
      principalId:STRING,
      arn:STRING,
      accountId:STRING,
      userName:STRING>>>,
eventTime STRING,
eventSource STRING,
eventName STRING,
awsRegion STRING,
sourceIpAddress STRING,
userAgent STRING,
errorCode STRING,
errorMessage STRING,
requestParameters STRING,
responseElements STRING,
additionalEventData STRING,
requestId STRING,
eventId STRING,
resources ARRAY<STRUCT<
  ARN:STRING,
  accountId:STRING,
  type:STRING>>,
eventType STRING,
apiVersion STRING,
readOnly STRING,
recipientAccountId STRING,
serviceEventDetails STRING,
sharedEventID STRING,
vpcEndpointId STRING
)
PARTITIONED BY(year string, month string, day string)
ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://<Your CloudTrail s3 bucket>/AWSLogs/<AWS_Account_ID>/CloudTrail/<region>/';

As shown in the preceding code example, CloudTrail delivers logs to Amazon S3 under this prefix: s3://<your-cloudtrail-s3-bucket>/AWSLogs/<AWS-account-number>/CloudTrail/<region>/<year>/<month>/<day>/. The Athena external table that we created is partitioned by year, month, and day, as specified in the PARTITIONED BY clause of the preceding code example.

The next step is to add partitions to the table with the following command, which you can run directly in the Athena console. Here, I add the partition with year='2019', month='03', and day='11'. Partitioning your data can restrict the amount of data scanned by each query, which improves performance and reduces cost. For more information about partitioning data in Athena, see Partitioning Data in the Athena User Guide.

ALTER TABLE cloudtrail_logs_<s3 bucket name>
ADD PARTITION (year='2019', month='03', day='11')
LOCATION 's3://<your-cloudtrail-s3-bucket>/AWSLogs/<account-number>/CloudTrail/<region>/2019/03/11/';
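
Adding one partition at a time gets tedious when you want to query several days. The following Python sketch is a hypothetical helper (add_partition_statements is not part of any AWS tool) that builds one ALTER TABLE ... ADD IF NOT EXISTS PARTITION statement per day, following the S3 prefix layout described earlier; you would then run each statement in the Athena console. The table, bucket, account, and Region values in the usage lines are placeholders.

```python
from datetime import date, timedelta
from typing import List


def add_partition_statements(table: str, bucket: str, account_id: str,
                             region: str, start: date, end: date) -> List[str]:
    """Build one Athena ADD PARTITION statement per day in [start, end]."""
    statements = []
    day = start
    while day <= end:
        y, m, d = f"{day.year:04d}", f"{day.month:02d}", f"{day.day:02d}"
        # Matches the CloudTrail prefix .../CloudTrail/<region>/<year>/<month>/<day>/
        location = (f"s3://{bucket}/AWSLogs/{account_id}/CloudTrail/"
                    f"{region}/{y}/{m}/{d}/")
        statements.append(
            f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION (year='{y}', month='{m}', day='{d}') "
            f"LOCATION '{location}';")
        day += timedelta(days=1)
    return statements


# Placeholder values; replace them with your own.
for stmt in add_partition_statements("cloudtrail_logs_example", "example-bucket",
                                     "111122223333", "us-east-1",
                                     date(2019, 3, 11), date(2019, 3, 13)):
    print(stmt)
```

IF NOT EXISTS makes the statements safe to rerun if some of the partitions already exist.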

Now that we have created the external partitioned table in Athena and added partitioned data, let’s execute a few queries.

Example queries

In the following example queries, I use the partition values year='2019', month='03', and day='11' from the preceding ALTER TABLE command. Change these values to match your Athena partitions.

Example query 1: This query returns API calls made to DynamoDB on the specified date from the CloudTrail logs table by filtering on eventsource = 'dynamodb.amazonaws.com'. It limits the number of records returned to 1,000.

SELECT * FROM cloudtrail_logs_<S3 bucket name> 
WHERE year='2019' AND month='03' AND day='11'
AND eventsource = 'dynamodb.amazonaws.com'
LIMIT 1000;

Example query 2: The previous query retrieved all columns for events with eventsource = 'dynamodb.amazonaws.com'. Now, let’s select specific columns and filter for API calls made to the event source kms.amazonaws.com from dynamodb.amazonaws.com. The output should show the calls that DynamoDB made to AWS KMS.

SELECT eventtime, eventname, sourceipaddress, useridentity, requestparameters FROM cloudtrail_logs_<S3 bucket name> 
WHERE year='2019' AND month='03' AND day='11'
AND eventsource = 'kms.amazonaws.com'
AND sourceipaddress = 'dynamodb.amazonaws.com'
LIMIT 1000;
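
To see the distribution by API call type mentioned earlier (for example, CreateGrant vs. Decrypt), you can count events by eventName. The following Python sketch applies the same two filters as example query 2 to CloudTrail records that are already parsed from JSON; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def kms_call_distribution(records: Iterable[dict]) -> Counter:
    """Count AWS KMS API calls made by DynamoDB, grouped by eventName."""
    return Counter(
        r.get("eventName")
        for r in records
        # Same filters as example query 2: KMS events whose source is DynamoDB.
        if r.get("eventSource") == "kms.amazonaws.com"
        and r.get("sourceIPAddress") == "dynamodb.amazonaws.com"
    )
```

In Athena, you can get the same result by selecting eventname and COUNT(*) with the filters from example query 2 and adding GROUP BY eventname.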

Example query 3: Let’s review the API calls made to AWS KMS for a specific table. The following query filters on the table name in the encryption context of each request. Replace your-table-name with the name of the DynamoDB table you want to query. Ordering the results by eventtime gives you a timeline of the API calls made to AWS KMS. You should see Decrypt events in the eventname column of the Athena output.

SELECT eventtime, eventname, sourceipaddress, useridentity, requestparameters FROM cloudtrail_logs_<S3 bucket name> 
WHERE year='2019' AND month='03' AND day='11'
AND eventsource = 'kms.amazonaws.com'
AND sourceipaddress = 'dynamodb.amazonaws.com'
AND REPLACE(JSON_EXTRACT_SCALAR(requestparameters, '$.encryptionContext.aws:dynamodb:tableName'),'"','') = 'your-table-name'
ORDER BY eventtime
LIMIT 1000;
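
The same per-table timeline can be built from parsed CloudTrail records. The following Python sketch reads the aws:dynamodb:tableName entry of the encryption context, as example query 3 does, and sorts matches by eventTime. It accepts requestParameters either as an object (as in the raw log files) or as a JSON string (as Athena exposes the column). The function names are hypothetical.

```python
import json
from typing import Iterable, List, Optional


def table_name_of(record: dict) -> Optional[str]:
    """Return the DynamoDB table name from a KMS event's encryption context."""
    params = record.get("requestParameters") or {}
    if isinstance(params, str):  # Athena exposes this column as a JSON string
        params = json.loads(params)
    context = params.get("encryptionContext") or {}
    return context.get("aws:dynamodb:tableName")


def kms_timeline(records: Iterable[dict], table: str) -> List[dict]:
    """KMS calls that DynamoDB made for `table`, ordered by eventTime."""
    matches = [r for r in records
               if r.get("eventSource") == "kms.amazonaws.com"
               and r.get("sourceIPAddress") == "dynamodb.amazonaws.com"
               and table_name_of(r) == table]
    # CloudTrail eventTime values are ISO 8601 UTC strings, so they sort as text.
    return sorted(matches, key=lambda r: r.get("eventTime", ""))
```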

Example query 4: In each AWS Region, DynamoDB uses a unique AWS-managed CMK in your account to generate table keys. This query helps you identify the tables that use a specific key for server-side encryption at rest. Replace arn:aws:kms:your-region:your-account-number:key/your-key-id with the ARN of the KMS CMK in the AWS Region you are interested in. The Athena output should show the eventname and the DynamoDB tables that use the KMS key.

SELECT eventtime, eventname, REPLACE(JSON_EXTRACT_SCALAR(requestparameters, '$.encryptionContext.aws:dynamodb:tableName'),'"','') as ddbtbl FROM cloudtrail_logs_<S3 bucket name> 
WHERE year='2019' AND month='03' AND day='11'
AND eventsource = 'kms.amazonaws.com'
AND sourceipaddress = 'dynamodb.amazonaws.com'
AND resources[1].arn = 'arn:aws:kms:your-region:your-account-number:key/your-key-id' 
ORDER BY ddbtbl
LIMIT 1000;
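
To map every key to its tables at once, rather than checking one key ARN at a time, you can group parsed CloudTrail records by the key ARN in their resources list. The following Python sketch mirrors the filters of example query 4 and uses the first resource entry, like resources[1].arn in the query (Athena arrays are 1-based). The function name is hypothetical.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Set


def tables_by_kms_key(records: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each KMS key ARN to the DynamoDB tables whose KMS calls used it."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for r in records:
        if (r.get("eventSource") != "kms.amazonaws.com"
                or r.get("sourceIPAddress") != "dynamodb.amazonaws.com"):
            continue
        params = r.get("requestParameters") or {}
        if isinstance(params, str):  # Athena exposes this column as a string
            params = json.loads(params)
        table = (params.get("encryptionContext") or {}).get("aws:dynamodb:tableName")
        resources = r.get("resources") or []
        if table and resources:
            # First resource entry, like resources[1].arn in example query 4.
            result[resources[0].get("ARN")].add(table)
    return dict(result)
```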

Summary

In this blog post, we outlined encryption options with DynamoDB and walked through creating a DynamoDB table with server-side encryption using the AWS-managed CMK. We reviewed DynamoDB API workflows and AWS KMS interactions when creating a table, adding an item to a table, and retrieving an item from a table with encryption enabled. We also looked at the hierarchy of encryption keys used with DynamoDB. We then used Athena to analyze CloudTrail logs and retrieve relevant information, including KMS API call activity for DynamoDB tables, the numbers and types of API calls, and the mapping of KMS keys to DynamoDB tables. Altogether, this should give you further insight into DynamoDB encryption and its interaction with AWS KMS.


About the Authors

Sai Sriparasa is a Sr. Big Data & Security Consultant with AWS Professional Services. He works with our customers to provide strategic and tactical big data solutions with an emphasis on automation, operations, governance & security on AWS. In his spare time, he follows sports and current affairs.

Prahlad Rao is a Solutions Architect with AWS focused on databases and big data. He works with enterprise customers to help them navigate their cloud journey and optimize applications for the cloud.