Building a GraphQL interface to Amazon QLDB with AWS AppSync: Part 1

Amazon Quantum Ledger Database (QLDB) is a purpose-built database for use cases that require an authoritative data source. Amazon QLDB maintains a complete, immutable history of all changes committed to the database (referred to as a ledger). Amazon QLDB fits well in finance, eCommerce, inventory, government, and numerous other applications.

Pairing Amazon QLDB with services such as AWS AppSync allows you to safely expose data and that data’s history for mobile applications, websites, or a data lake. This post explores a reusable approach for integrating Amazon QLDB with AWS AppSync to power an example government use case.

To add Amazon QLDB as a data source for AWS AppSync, you use an AWS Lambda function to connect to the database. The following diagram illustrates the architecture of this solution.

For this post, you add Amazon QLDB as a data source to AWS AppSync using a Department of Motor Vehicles (DMV) use case, which is available in Getting Started with the Amazon QLDB Console. In addition to connecting the Amazon QLDB data source, you also write a simple query.

A follow-up post explores more advanced Amazon QLDB operations, such as mutating data and retrieving history. For information about integrating AWS AppSync with Amazon ElastiCache and Amazon Neptune, see Integrating alternative data sources with AWS AppSync: Amazon Neptune and Amazon ElastiCache.

Getting to know AWS AppSync

AWS AppSync is a managed service for building data-rich applications using GraphQL. Clients of an AWS AppSync API can select the exact data needed, which allows you to build rich, flexible APIs that can combine data from multiple data sources. AWS AppSync also enables real-time and offline use cases without the need to manage scaling.

When building an API in AWS AppSync, you start by defining a GraphQL schema. The schema defines the shape of data types available in your API and the operations that you can perform via that API. GraphQL operations include queries (reading data), mutations (writing data), and subscriptions (receiving real-time updates). Each operation is backed by a data source. AWS AppSync supports a variety of data sources out-of-the-box, including Amazon DynamoDB, Amazon Elasticsearch Service, HTTP endpoints, and Lambda. The flexibility of Lambda functions allows you to create a wide variety of data sources, including for Amazon QLDB.

In addition to a data source, each GraphQL operation is associated with a resolver. Resolvers are composed of two mapping templates written in Apache Velocity Template Language (VTL). The request mapping template defines how AWS AppSync should query or mutate data in the data source; the response mapping template defines how to return the result of the operation to the client. GraphQL operations typically use the JSON data format to communicate with clients. The following diagram illustrates this architecture.

The full breadth of functionality in AWS AppSync is beyond the scope of this post. For more information, see the AWS AppSync Developer Guide. You can also explore AWS Amplify, a development platform for building mobile and web applications, which includes support for AWS AppSync.

Building the DMV API in AWS AppSync

The first step in constructing a GraphQL API in AWS AppSync is to specify the schema, which defines the shape of data and operations available in the API. The complete code is available on the GitHub repo.

For this post, you initially include five GraphQL types and one query in the schema. See the following code:

type Person {
   FirstName: String!
   LastName: String!
   DOB: AWSDate
   GovId: ID!
   GovIdType: String
   Address: String
}

type Owner {
   PersonId: ID!
}

type Owners {
   PrimaryOwner: Person!
   SecondaryOwners: [Person]
}

type Vehicle {
   VIN: ID!
   Type: String
   Year: Int
   Make: String
   Model: String
   Color: String
}

type VehicleRegistration {
   VIN: ID!
   LicensePlateNumber: String!
   State: String
   City: String
   PendingPenaltyTicketAmount: Float
   ValidFromDate: AWSDateTime!
   ValidToDate: AWSDateTime!
   Owners: Owners
}

type Query {
   getVehicle(vin:ID!): Vehicle
}

schema {
   query: Query
}

If you have experience working with relational databases and SQL, working with Amazon QLDB may feel similar. Like a relational database, Amazon QLDB organizes data in tables. Three of the GraphQL types in the schema map to a table of the same name, with the addition of two types (Owner and Owners) that represent nested data.

The sample code for this post deploys both the necessary AWS resources and a small dataset. The Amazon QLDB ledger (similar to a database in relational databases) contains four tables and example data. See the following screenshot.

When you review the schema and tables in the ledger, you can see that the types and fields in the schema align closely with the tables and document attributes in the ledger.

Querying for Vehicle Data

The DMV API currently supports one query to access data: getVehicle. The getVehicle query takes a single parameter, the vehicle identification number (VIN), and returns data about that vehicle.

The following code shows the GraphQL query to retrieve information about the 2019 Mercedes CLK 350 in the DMV dataset. GraphQL allows you to specify the fields included in the result (if they’re part of the overall data type). In the following code, the result includes the make, model, and year, but not color and other attributes:

query GetVehicle {
  getVehicle(vin: "1C4RJFAG0FC625797") {
    Make
    Model
    Year
  }
}

Each AWS AppSync operation (query or mutation) is associated with a data source and a resolver. Amazon QLDB isn’t directly integrated with AWS AppSync out-of-the-box, but you can use Lambda to enable Amazon QLDB as a data source.

Building an Amazon QLDB data source

A single integration function manages all interactions between AWS AppSync and Amazon QLDB in the example application, though you may choose to implement it another way. Interacting with Amazon QLDB requires a driver (similar to a relational database, Amazon ElastiCache, or Neptune) that you package in your integration function. The function also needs IAM permission to perform queries on the Amazon QLDB ledger. Amazon QLDB isn’t in an Amazon VPC, though you could also use a Lambda data source to integrate AWS AppSync with a database that’s in a VPC.

Amazon QLDB currently offers drivers in Java and previews of Node.js and Python. This post uses Java to build the Amazon QLDB integration function because the Java driver is the most mature of the three, though Lambda also supports either of the other options. This post also uses AWS Serverless Application Model (SAM) to simplify management of the function and the AWS SAM CLI to build it.

To attach the integration function to the AWS AppSync API, you add it as a new Lambda data source that references the function and provide a service role that allows AWS AppSync to invoke the function. For this post, you perform this work in AWS CloudFormation but you can also connect via the AWS CLI and the AWS Management Console. The following code is the applicable portion of the CloudFormation template:

QLDBIntegrationDataSource:
  Type: AWS::AppSync::DataSource
  Properties:
    ApiId: !GetAtt DmvApi.ApiId
    Name: QldbIntegration
    Description: Lambda function to integrate with QLDB
    Type: AWS_LAMBDA
    ServiceRoleArn: !GetAtt AppSyncServiceRole.Arn
    LambdaConfig:
      LambdaFunctionArn: !GetAtt QLDBIntegrationFunction.Arn

Attaching a resolver

After you create the Amazon QLDB data source, you can define the resolver for the getVehicle query. The first part of the resolver is the request mapping, which defines how AWS AppSync interacts with the data source. The request mapping template is defined in JSON and includes a common envelope for all Lambda data sources. For the Amazon QLDB integration function, the payload field includes the specifics of this particular query. See the following code:

{
  "version": "2017-02-28",
  "operation": "Invoke",
  "payload": {
    "action": "Query",
    "payload": [
      {
        "query": "SELECT * FROM Vehicle AS t WHERE t.VIN = ?",
        "args": [ "$context.args.vin" ]
      }
    ]
  }
}

When the getVehicle query is called, AWS AppSync invokes the integration function and passes the contents of the outer payload field as the event. In this use case, AWS AppSync also replaces $context.args.vin (which can also be written with the shorter alias $ctx.args.vin) with the value passed as the vin argument in the query (in the preceding query, the value is 1C4RJFAG0FC625797).
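The effect of that substitution can be pictured with a small Java sketch. This is only an illustration under simplifying assumptions: AWS AppSync evaluates the template with a full VTL engine, whereas the hypothetical RequestTemplateSketch class below does a plain text replacement of $context.args.<name> placeholders and performs no escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of what template evaluation achieves for this resolver.
// It is NOT how AWS AppSync works internally (AppSync runs a VTL engine).
final class RequestTemplateSketch {

    // Replaces each "$context.args.<name>" placeholder with the argument value.
    // Naive on purpose: no escaping, and names that are prefixes of other
    // names would collide.
    static String render(String template, Map<String, String> args) {
        String rendered = template;
        for (Map.Entry<String, String> arg : args.entrySet()) {
            rendered = rendered.replace("$context.args." + arg.getKey(), arg.getValue());
        }
        return rendered;
    }

    public static void main(String[] argv) {
        String template = "{ \"query\": \"SELECT * FROM Vehicle AS t WHERE t.VIN = ?\", "
                + "\"args\": [ \"$context.args.vin\" ] }";

        Map<String, String> args = new LinkedHashMap<>();
        args.put("vin", "1C4RJFAG0FC625797");

        // The placeholder in "args" is replaced; the "?" in the query is left
        // for the database driver to bind.
        System.out.println(render(template, args));
    }
}
```

Note that the VIN travels in the args array rather than being spliced into the PartiQL text, so the query string itself stays constant.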

The integration function takes an action argument and another payload that contains the actual query. The structure of the invocation payload is flexible, but the invoked Lambda function needs to understand it. For this use case, the integration function expects a payload with the following schema:

{
  "action": "STRING_VALUE", /* required - always "Query" */
  "payload": [
    {
      "query": "STRING_VALUE", /* required – PartiQL query */
      "args": [
        "STRING_VALUE" /* optional – one or more arguments */
      ]
    }
    /* optional - additional queries (covered in subsequent post) */
  ]
}
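One way to read that schema is as a small validation step at the start of the function. The following Java sketch assumes the event has already been deserialized into a Map; the IntegrationRequestSketch and Query names are hypothetical and are not taken from the sample application, whose own classes may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only.
final class IntegrationRequestSketch {

    // One PartiQL statement plus its positional arguments.
    static final class Query {
        final String query;
        final List<String> args;

        Query(String query, List<String> args) {
            this.query = query;
            this.args = args;
        }
    }

    // Converts the raw event into typed queries, enforcing the required fields.
    static List<Query> parse(Map<String, Object> event) {
        if (!"Query".equals(event.get("action"))) {
            throw new IllegalArgumentException("Unsupported action: " + event.get("action"));
        }
        Object payload = event.get("payload");
        if (!(payload instanceof List)) {
            throw new IllegalArgumentException("payload must be a list of queries");
        }

        List<Query> queries = new ArrayList<>();
        for (Object item : (List<?>) payload) {
            if (!(item instanceof Map)) {
                throw new IllegalArgumentException("each payload entry must be an object");
            }
            Map<?, ?> entry = (Map<?, ?>) item;
            Object text = entry.get("query");
            if (!(text instanceof String)) {
                throw new IllegalArgumentException("query is required");
            }
            // "args" is optional, so a missing value becomes an empty list.
            List<String> args = new ArrayList<>();
            Object rawArgs = entry.get("args");
            if (rawArgs instanceof List) {
                for (Object arg : (List<?>) rawArgs) {
                    args.add(String.valueOf(arg));
                }
            }
            queries.add(new Query((String) text, args));
        }
        return queries;
    }

    public static void main(String[] argv) {
        Map<String, Object> query = new HashMap<>();
        query.put("query", "SELECT * FROM Vehicle AS t WHERE t.VIN = ?");
        query.put("args", Arrays.asList("1C4RJFAG0FC625797"));

        Map<String, Object> event = new HashMap<>();
        event.put("action", "Query");
        event.put("payload", Arrays.asList(query));

        for (Query q : parse(event)) {
            System.out.println(q.query + " with args " + q.args);
        }
    }
}
```

Rejecting unknown actions up front keeps malformed resolver templates from reaching the ledger at all.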

If you’re familiar with SQL, the query included in the preceding request mapping should look familiar. For this use case, you query for all attributes in the Vehicle table where the VIN attribute matches the supplied value. The value of the VIN argument is passed from AWS AppSync in the $context variable available to the resolver. For this use case, AWS AppSync transforms the variable to the actual value before invoking the Lambda function. For more information about resolver mapping templates, see Resolver Mapping Template Context Reference.

You can test this query yourself using the Query Editor in the Amazon QLDB console and replace the question mark with a valid VIN in the ledger. See the following screenshot.

Exploring the Amazon QLDB integration function

After transforming the getVehicle request mapping template, AWS AppSync invokes the Amazon QLDB integration function. This section explores the implementation of the function.

Connecting to Amazon QLDB

Before you can execute queries on the ledger, you need to establish a connection to it. The integration function uses PooledQldbDriver, which is an Amazon QLDB best practice. For more information about best practices, see What is Amazon QLDB? For more information about the driver, see Amazon QLDB Java Driver 1.1.0 API Reference on the Javadocs website. In the Lambda function, the driver is initialized in a static code block so that it isn’t created on every invocation. This is a Lambda best practice because creating the connection is a relatively slow process.

To instantiate a connection, use the builder object provided by the PooledQldbDriver class, passing the name of the ledger. The ledger is named vehicle-registration; that name is passed via a Lambda environment variable (QLDB_LEDGER). See the following code:

private static PooledQldbDriver createQLDBDriver() {
    AmazonQLDBSessionClientBuilder builder =    
      AmazonQLDBSessionClientBuilder.standard();

    return PooledQldbDriver.builder()
             .withLedger(System.getenv("QLDB_LEDGER"))
             .withRetryLimit(3)
             .withSessionClientBuilder(builder)
             .build();
}
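The benefit of initializing the driver statically can be seen in a sketch that uses only the Java standard library. Everything here is a stand-in: StaticInitSketch is a hypothetical class, a String takes the place of the real driver, and the fallback ledger name is an assumption so the example runs without the QLDB_LEDGER environment variable set.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Lambda handler class; no AWS libraries involved.
final class StaticInitSketch {

    // Counts how often the (stand-in) expensive resource is built.
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    // Built once when the class is loaded, then shared by every invocation
    // that runs in the same execution environment.
    private static final String DRIVER = createDriver();

    private static String createDriver() {
        BUILD_COUNT.incrementAndGet();
        return "driver-for-" + ledgerName();
    }

    // Reads the ledger name from the environment; the fallback is an
    // assumption made for this example.
    static String ledgerName() {
        String name = System.getenv("QLDB_LEDGER");
        return name != null ? name : "vehicle-registration";
    }

    // Stand-in for the handler method: reuses DRIVER instead of rebuilding it.
    static String handleRequest(String input) {
        return DRIVER + " handled " + input;
    }

    public static void main(String[] argv) {
        System.out.println(handleRequest("first invocation"));
        System.out.println(handleRequest("second invocation"));
        System.out.println("driver built " + BUILD_COUNT.get() + " time(s)");
    }
}
```

Both calls share the same driver instance, so the setup cost is paid only on a cold start.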

As mentioned earlier, Amazon QLDB doesn’t require a VPC but a caller needs IAM permission to execute queries for the particular ledger. An IAM policy such as the following grants the Lambda function appropriate access to Amazon QLDB:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "qldb:SendCommand"
            ],
            "Resource": "arn:aws:qldb:REGION:ACCOUNT_ID:ledger/vehicle-registration",
            "Effect": "Allow"
        }
    ]
}

Executing a query

To transact with Amazon QLDB, you need to create a session via the driver. The integration function creates a new session on each invocation of the Lambda function. See the following code:

private QldbSession createQldbSession() {
    return DRIVER.getSession();
}

With a session, you can begin to interact with the Amazon QLDB ledger. Amazon QLDB supports the PartiQL query language, which provides SQL-compatible query access across structured, semi-structured, and nested data. You can run multiple queries within a single transaction.

To promote reusability, the Amazon QLDB integration function allows multiple queries in a single AWS AppSync query or mutation. This post focuses on single-query operations, but a later post discusses how to use multiple Amazon QLDB queries for more complex transactions.

To run a query on Amazon QLDB, create a transaction and execute each query of interest. See the following code:

private String executeTransaction(Query query) {
  // Single-element holder: a lambda cannot assign to a local variable directly.
  // The default value is returned if the transaction cannot be executed.
  final String[] result = { "{}" };

  try (QldbSession qldbSession = createQldbSession()) {
    qldbSession.execute((ExecutorNoReturn) txn -> {
      result[0] = executeQuery(txn, query);
    }, (retryAttempt) -> LOGGER.info("Retrying due to OCC conflict..."));
  } catch (QldbClientException e) {
    LOGGER.error("Unable to create QLDB session: {}", e.getMessage());
  }

  return result[0];
}


private String executeQuery(TransactionExecutor txn, Query query) {
  final List<IonValue> params = new ArrayList<IonValue>();
  query.getArgs().forEach((a) -> {
    try {
      params.add(MAPPER.writeValueAsIonValue(a));
    } catch (IOException e) {
      LOGGER.error("Could not write value as Ion: {}", a);
    }
  });

  // Execute the query and transform response to JSON string...
  List<String> json = new ArrayList<String>();
  txn.execute(query.getQuery(), params).iterator().forEachRemaining(r -> {
    String j = convertToJson(r.toPrettyString());
    json.add(j);
  });

  return json.toString();
}
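The second argument to execute is called whenever the driver retries a transaction after an optimistic concurrency control (OCC) conflict. The following sketch shows the general shape of such a retry loop using only the Java standard library. It is not the driver's implementation: OccRetrySketch and ConflictException are hypothetical names, and treating the limit as the number of retries after the first attempt is an assumption.

```java
import java.util.function.IntConsumer;
import java.util.function.Supplier;

// Hypothetical illustration of retry-on-conflict; not the Amazon QLDB driver's code.
final class OccRetrySketch {

    // Stand-in for the exception raised when a commit hits an OCC conflict.
    static final class ConflictException extends RuntimeException {
    }

    // Runs the transaction, retrying up to retryLimit times after a conflict.
    // onRetry receives the retry attempt number before each retry.
    static <T> T executeWithRetry(Supplier<T> transaction, int retryLimit, IntConsumer onRetry) {
        int attempt = 0;
        while (true) {
            try {
                return transaction.get();
            } catch (ConflictException e) {
                if (attempt >= retryLimit) {
                    throw e; // retries exhausted: surface the conflict to the caller
                }
                attempt++;
                onRetry.accept(attempt);
            }
        }
    }

    public static void main(String[] argv) {
        int[] calls = { 0 };

        // Fails twice with a conflict, then succeeds on the third call.
        String result = executeWithRetry(() -> {
            if (calls[0]++ < 2) {
                throw new ConflictException();
            }
            return "[]";
        }, 3, attempt -> System.out.println("Retrying due to OCC conflict, attempt " + attempt));

        System.out.println("result after " + calls[0] + " calls: " + result);
    }
}
```

Because the transaction body can run more than once, it is safest to keep it free of side effects outside the ledger.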

Query results are returned from Amazon QLDB in Amazon Ion, which is a superset of JSON. AWS AppSync, however, requires that data be passed in JSON format. You can convert from Ion to JSON with a derivation of an Ion Cookbook recipe. See the following code:

private String convertToJson(String ionText) {
    StringBuilder builder = new StringBuilder();
    try (IonWriter jsonWriter = IonTextWriterBuilder.json()
                                   .withPrettyPrinting().build(builder)) {
        IonReader reader = IonReaderBuilder.standard().build(ionText);
        jsonWriter.writeValues(reader);
    } catch (IOException e) {
        LOGGER.error(e.getMessage());
    }
    return builder.toString();
}

A future post covers further details of the Lambda function. For more information, see the GitHub repo.

Results from the Amazon QLDB function are returned as part of a JSON response. The actual result from Amazon QLDB is string-encoded when returned from the function. See the following code:

"result": {
  "result": "[\n{\n  \"VIN\":\"1C4RJFAG0FC625797\",\n  \"Type\":\"Sedan\",\n  \"Year\":2019,\n  \"Make\":\"Mercedes\",\n  \"Model\":\"CLK 350\",\n  \"Color\":\"White\"\n}]",
  "success": true
}
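To see why the inner result is a string rather than nested JSON, consider how a list of JSON documents looks once it is embedded as a string value. The sketch below is a simplified illustration using only the Java standard library; StringEncodedResultSketch is a hypothetical class, and the way the sample function actually builds its response may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of string-encoding a JSON array inside a JSON object.
final class StringEncodedResultSketch {

    // Escapes text so it can be embedded as a JSON string value.
    static String quote(String text) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    // List.toString() yields "[doc1, doc2]", which is valid JSON array text
    // as long as every element is itself a JSON document.
    static String envelope(List<String> documents) {
        return "{\"result\": " + quote(documents.toString()) + ", \"success\": true}";
    }

    public static void main(String[] argv) {
        List<String> documents = Arrays.asList("{\n  \"VIN\":\"1C4RJFAG0FC625797\"\n}");
        System.out.println(envelope(documents));
    }
}
```

The quotes and line breaks of each document end up escaped, which is why the consumer has to parse the string a second time before it can read individual fields.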

Resolving the Result

Before it returns the result to the caller, AWS AppSync applies the second part of the resolver, which is the response mapping template. Just as the request mapping template shapes the request sent to the data source, the response mapping template shapes the response returned to the caller.

AWS AppSync makes the result of calling the data source available in the same $context object as the query parameters discussed earlier. For this case, the result is found in the result field specifically. To map the result from Amazon QLDB to a valid AWS AppSync result, the mapping template uses a built-in utility function to parse the “stringified” JSON result from the integration function and returns the first result as a JSON object. The following code is a simplified version of the getVehicle response mapping template:

#set( $result = $util.parseJson($ctx.result.result) )
$util.toJson($result[0])

Because you can uniquely tie resolvers to AWS AppSync operations, the request and response mapping templates provide quite a bit of flexibility based on the use case. For this post, you can expect only a single result (or an error). Other operations may return an array of results or some other response; you can customize these via the mapping template.

The following code is the result of your original getVehicle query. The shape of the result is a subset of the Vehicle type in your schema, based on the fields selected in the request:

{
  "data": {
    "getVehicle": {
      "Make": "Mercedes",
      "Model": "CLK 350",
      "Year": 2019
    }
  }
}
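Conceptually, the field selection acts as a projection over the full Vehicle document. The following Java sketch illustrates that idea with plain maps. It is not how AWS AppSync implements GraphQL selection sets, and FieldSelectionSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a GraphQL selection set as a projection.
final class FieldSelectionSketch {

    // Keeps only the requested fields, in the order they were requested.
    static Map<String, Object> select(Map<String, Object> document, List<String> fields) {
        Map<String, Object> selected = new LinkedHashMap<>();
        for (String field : fields) {
            if (document.containsKey(field)) {
                selected.put(field, document.get(field));
            }
        }
        return selected;
    }

    public static void main(String[] argv) {
        // The full document, as returned by the data source.
        Map<String, Object> vehicle = new LinkedHashMap<>();
        vehicle.put("VIN", "1C4RJFAG0FC625797");
        vehicle.put("Type", "Sedan");
        vehicle.put("Year", 2019);
        vehicle.put("Make", "Mercedes");
        vehicle.put("Model", "CLK 350");
        vehicle.put("Color", "White");

        // Only the fields named in the GetVehicle query survive.
        System.out.println(select(vehicle, Arrays.asList("Make", "Model", "Year")));
    }
}
```

The data source still returns the whole document; the trimming happens after the response mapping template has produced its result.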

Conclusion

This post walked you through using AWS AppSync, Lambda, and Amazon QLDB to perform a relatively straightforward query. To implement the getVehicle query, you authored an AWS AppSync resolver, attached a Lambda integration function, and queried Amazon QLDB.

You can take advantage of the inherent benefits of Amazon QLDB and AWS AppSync by integrating them. You can use the managed ledger from Amazon QLDB for use cases that require a verifiable transaction log and interact with the ledger from a variety of clients via AWS AppSync.

Visit Building a GraphQL interface to Amazon QLDB with AWS AppSync: Part 2, where I expand on the capabilities of the DMV API, including multi-step queries, mutations, and querying for data changes in the ledger. For a complete working example, see the GitHub repo.

About the Author

Josh Kahn is a Principal Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.