How Waves runs user queries and recommendations at scale with Amazon Neptune

This is a guest post by Pavel Vasilyev, Director of Solutions Architecture at ClearScale, an APN Premier Consulting Partner that provides a full range of cloud professional services.

When executive management at Waves, a Y Combinator-backed mobile dating app, realized the company was outgrowing its existing IT architecture on Google Cloud, they decided it was time to migrate to AWS. Waves partnered with ClearScale to carry out the migration and to implement new features with services including Amazon Neptune, a purpose-built graph database engine. Neptune enables developers to build and run applications that rely on highly connected datasets, which is the functionality that Waves needed.

The Waves app had been experiencing several challenges due to rapid growth of the user base. These challenges included reliability and latency issues caused by specific database queries that the app’s embedded recommendation engine ran. Additionally, Waves’s infrastructure expenses were scaling faster than anticipated, with no plateau on the horizon.

In this post, we discuss how ClearScale migrated Waves’s workloads to AWS and used Neptune to build a sophisticated recommendation engine capable of handling massive query volumes. We also explain how ClearScale enhanced Waves’s overall architecture using other AWS features.

The project

Waves and ClearScale worked together to scope an engagement consisting of five stages:

  1. Static data migration from Cloud Storage to Amazon Simple Storage Service (Amazon S3)
  2. User profile data migration from Cloud Firestore to Neptune
  3. Implementation of Neptune Streams to capture graph changes
  4. Implementation of AWS AppSync and GraphQL to transmit database queries
  5. Optimization of end-user experience with Amazon Cognito and Amazon Pinpoint

Migrations had to be executed with extreme caution so that existing users experienced little disruption. Additionally, the final state had to empower Waves developers to deploy new functionality and improvements over the long term.

Migrating static data to Amazon S3

Waves previously stored static data, such as user profile photos, in Google Cloud Platform (GCP) Cloud Storage. ClearScale migrated this data to Amazon S3, a highly scalable and durable object storage service. Amazon S3 offers a range of storage classes designed for different access patterns, so you can match storage cost to how often the data is read.

ClearScale’s database experts used a transient Amazon EMR cluster and the DistCp command to execute the migration. DistCp uses MapReduce to copy data in parallel between file systems or clusters. The team first copied the static data from Cloud Storage to HDFS on the EMR cluster and then copied it from HDFS to Amazon S3.
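The two-hop copy described above can be sketched as a pair of DistCp invocations. This is a minimal illustration, not the commands ClearScale ran: the bucket names and the HDFS staging path are hypothetical, and the script only builds and prints the command lines rather than running them on a cluster.

```python
import shlex


def distcp_command(source_uri: str, target_uri: str) -> list[str]:
    """Build a `hadoop distcp` invocation that copies source_uri to target_uri."""
    return ["hadoop", "distcp", source_uri, target_uri]


# Hypothetical locations, for illustration only.
GCS_SOURCE = "gs://example-waves-static/photos"
HDFS_STAGING = "hdfs:///staging/photos"
S3_TARGET = "s3://example-waves-static/photos"

# Hop 1: Cloud Storage -> HDFS on the EMR cluster.
# Hop 2: HDFS -> Amazon S3.
steps = [
    distcp_command(GCS_SOURCE, HDFS_STAGING),
    distcp_command(HDFS_STAGING, S3_TARGET),
]

for step in steps:
    print(shlex.join(step))
```

On a real cluster, each command would be run on the primary node or submitted as an EMR step.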

To allow connections to GCP Cloud Storage, the team configured SSH access to the cluster and added the required properties to the /etc/hadoop/conf/core-site.xml configuration file. The value of the google.cloud.auth.service.account.json.keyfile property pointed to the JSON key file provided by the Waves team.
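As a rough sketch of that configuration step, the following script renders property entries in the format core-site.xml expects. Only the keyfile property is named in this post. The other three properties are the ones commonly documented for the Cloud Storage connector for Hadoop and are included here as assumptions, and the key file path is hypothetical.

```python
import xml.etree.ElementTree as ET

# Only the keyfile property comes from the post; the rest are assumed
# from the Cloud Storage connector documentation. The path is hypothetical.
GCS_PROPERTIES = {
    "fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "google.cloud.auth.service.account.enable": "true",
    "google.cloud.auth.service.account.json.keyfile": "/home/hadoop/gcs-key.json",
}


def property_element(name: str, value: str) -> ET.Element:
    """Create one <property> element in Hadoop configuration format."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


def render_properties(properties: dict[str, str]) -> str:
    """Render the properties as XML, one <property> element per line."""
    return "\n".join(
        ET.tostring(property_element(name, value), encoding="unicode")
        for name, value in properties.items()
    )


rendered = render_properties(GCS_PROPERTIES)
print(rendered)
```

The printed elements would be placed inside the existing configuration element of core-site.xml.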

Migrating user profile data to Neptune

Waves initially stored user profile data in Google Cloud Firestore. Because the total data load was relatively small at the time of the migration, ClearScale was able to move all information to Neptune using a separate Amazon Elastic Compute Cloud (Amazon EC2) instance and the node-firestore-import-export (MIT license) utility.

Data was downloaded in JSON format and converted to a Neptune-compatible format using a custom Node.js utility. The final step of the migration was a POST request to the bulk loader endpoint of the Neptune cluster, which starts the load job. For more information, see Neptune Loader Command.
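The conversion and load steps might look roughly like the sketch below. ClearScale's utility was written in Node.js and is not shown in this post, so this is an independent illustration. It assumes the Gremlin CSV load format (with ~id and ~label system columns) and a hypothetical profile schema with name and age fields. The endpoint, bucket, and IAM role ARN are placeholders, and the request object is built but never sent.

```python
import csv
import io
import json
import urllib.request


def profiles_to_vertex_csv(profiles: dict[str, dict]) -> str:
    """Convert exported profile documents (keyed by document ID) into a
    Neptune Gremlin CSV vertex file. The name/age columns are hypothetical."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["~id", "~label", "name:String", "age:Int"])
    for doc_id, fields in profiles.items():
        writer.writerow([doc_id, "user", fields["name"], fields["age"]])
    return buffer.getvalue()


def loader_request(endpoint: str, source: str, role_arn: str, region: str):
    """Build (but do not send) the POST request that starts a bulk load.
    The loader reads its input files from Amazon S3."""
    body = {
        "source": source,
        "format": "csv",
        "iamRoleArn": role_arn,
        "region": region,
        "failOnError": "FALSE",
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/loader",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


vertex_csv = profiles_to_vertex_csv({"u1": {"name": "Ana", "age": 29}})
request = loader_request(
    endpoint="neptune.example.com",                           # placeholder
    source="s3://example-waves-import/vertices/",             # placeholder
    role_arn="arn:aws:iam::123456789012:role/NeptuneLoad",    # placeholder
    region="us-east-1",
)
print(vertex_csv)
print(request.full_url)
```

Relationships between profiles would go into a separate edge file with ~from and ~to columns, loaded the same way.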

Why Neptune?

ClearScale chose Neptune because the graph database service can store and query billions of relationships with minimal latency. Social networking and dating apps, like Waves, require this functionality to present users with quality profile matches. Neptune supports mainstream graph models and query languages, and allows for read replicas, point-in-time recovery, and continuous backups. Neptune is also a fully managed graph database service, which means that development teams can offload burdensome administrative tasks, such as hardware provisioning, configuration, and software patching.

Implementing Neptune Streams to capture graph changes

Neptune also includes Neptune Streams, a mechanism for logging database changes. The feature tracks every graph change and logs entries synchronously with the transactions that cause those changes. With Neptune Streams, Waves developers can add code to the application that reacts to changes and delivers updates to the front-end experience. Users can retrieve change records using an HTTP REST API.
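To make the REST call concrete, here is a minimal sketch that builds the URL for reading property-graph change records. It assumes the propertygraph/stream path and the limit, commitNum, opNum, and iteratorType query parameters from the Neptune Streams documentation. The endpoint name is a placeholder, and no request is actually sent.

```python
from urllib.parse import urlencode


def stream_url(
    endpoint: str,
    commit_num: int,
    op_num: int = 1,
    limit: int = 100,
    iterator_type: str = "AT_SEQUENCE_NUMBER",
) -> str:
    """Build the URL that reads change records starting at the given
    commit number and operation number."""
    query = urlencode(
        {
            "limit": limit,
            "commitNum": commit_num,
            "opNum": op_num,
            "iteratorType": iterator_type,
        }
    )
    return f"https://{endpoint}:8182/propertygraph/stream?{query}"


# Placeholder endpoint, for illustration only.
print(stream_url("neptune.example.com", commit_num=7))
```

A polling consumer would typically remember the last event ID it processed and request records after that position on the next call.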

ClearScale developed a filtering component that analyzes the Neptune Streams log and processes events based on specific actions. For example, a change in a user’s geolocation triggers a recalculation of the list of suggested profiles presented to the individual. A similar process occurs for sending notification messages.
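A filter of this kind could be sketched as follows. ClearScale's component is not shown in this post, so the code is only an illustration: it assumes the property-graph record layout from the Neptune Streams documentation (an op field plus a data object with id, type, and key, where "vp" marks a vertex property), and the latitude and longitude property names are hypothetical.

```python
# Hypothetical property names that hold a user's geolocation.
LOCATION_KEYS = {"latitude", "longitude"}


def users_needing_new_suggestions(records: list[dict]) -> set[str]:
    """Return the IDs of user vertices whose location properties were
    written in this batch of stream records."""
    user_ids = set()
    for record in records:
        data = record["data"]
        is_property_write = record["op"] == "ADD" and data["type"] == "vp"
        if is_property_write and data.get("key") in LOCATION_KEYS:
            user_ids.add(data["id"])
    return user_ids


sample_records = [
    {"op": "ADD", "data": {"id": "u1", "type": "vp", "key": "latitude",
                           "value": {"value": 40.7, "dataType": "Double"}}},
    {"op": "ADD", "data": {"id": "u1", "type": "vp", "key": "longitude",
                           "value": {"value": -74.0, "dataType": "Double"}}},
    {"op": "ADD", "data": {"id": "u2", "type": "vp", "key": "bio",
                           "value": {"value": "hi", "dataType": "String"}}},
    {"op": "REMOVE", "data": {"id": "u3", "type": "vp", "key": "latitude",
                              "value": {"value": 1.0, "dataType": "Double"}}},
]

print(users_needing_new_suggestions(sample_records))
```

Collecting IDs into a set means a user whose latitude and longitude both change in one batch triggers a single recalculation rather than two.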

Implementing AWS AppSync and GraphQL

ClearScale implemented AWS AppSync so that the Waves app could send database queries from its mobile application. The managed service uses GraphQL to orchestrate data flows between several data sources. It was particularly useful in simplifying how user behaviors in the Waves app were sent and stored in the application’s server component.

At the application level, HTTP requests are sent to the AWS AppSync API, which validates them against a defined GraphQL schema. AWS AppSync uses request and response mapping templates to map to and from the data source accordingly. Using AWS AppSync allowed ClearScale to focus on the application logic and add a decoupling layer to the architecture, which simplified the development process.
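For illustration, a request to an AWS AppSync GraphQL endpoint is a JSON POST that carries a query and its variables. The sketch below builds such a request without sending it. The likeProfile mutation, its fields, and the API URL are hypothetical and not taken from the Waves schema, and the Authorization header assumes the API is configured for Amazon Cognito user pool authorization.

```python
import json
import urllib.request

# Hypothetical mutation; the real Waves schema is not shown in this post.
LIKE_MUTATION = """
mutation LikeProfile($targetId: ID!) {
  likeProfile(targetId: $targetId) {
    matched
  }
}
"""


def appsync_request(api_url: str, id_token: str, query: str, variables: dict):
    """Build (but do not send) a GraphQL POST request for AWS AppSync."""
    payload = {"query": query, "variables": variables}
    return urllib.request.Request(
        url=api_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": id_token,  # token issued by the user pool
        },
        method="POST",
    )


graphql_request = appsync_request(
    api_url="https://example.appsync-api.us-east-1.amazonaws.com/graphql",
    id_token="example-token",
    query=LIKE_MUTATION,
    variables={"targetId": "u2"},
)
print(json.loads(graphql_request.data)["variables"])
```

If the variables do not match the types declared in the schema, AWS AppSync rejects the request before any mapping template or data source is invoked.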

Optimizing end-user experience with Amazon Cognito and Amazon Pinpoint

The final component of the project involved the implementation of Amazon Cognito and Amazon Pinpoint to enhance the front-end experience. Amazon Cognito allows Waves to authorize new user profiles through the users’ personal devices. Amazon Pinpoint enables the app to send tailored push notifications directly to users.

Because ClearScale migrated an existing application, it was crucial to transition current users first. Existing user profiles were pre-created in the Amazon Cognito user pool to provide a seamless transition between the old and new environments. New users are registered in Amazon Cognito during the sign-up process.
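Pre-creating users can be done with the Amazon Cognito AdminCreateUser API. The sketch below only builds the parameters for that call. It assumes that profiles are keyed by email address and that invitation messages should be suppressed during the migration, neither of which is stated in this post, and the user pool ID is a placeholder.

```python
def cognito_create_user_params(user_pool_id: str, profile: dict) -> dict:
    """Build keyword arguments for the AdminCreateUser API call.

    With boto3 installed and credentials configured, they could be passed as
    boto3.client("cognito-idp").admin_create_user(**params).
    """
    return {
        "UserPoolId": user_pool_id,
        "Username": profile["email"],
        "UserAttributes": [
            {"Name": "email", "Value": profile["email"]},
            {"Name": "email_verified", "Value": "true"},
        ],
        # Assumption: migrated users should not receive an invitation message.
        "MessageAction": "SUPPRESS",
    }


# Placeholder pool ID and profile, for illustration only.
params = cognito_create_user_params(
    "us-east-1_EXAMPLE", {"email": "ana@example.com"}
)
print(params["Username"])
```

Running this over the exported profile list before cutover would leave every existing user with an account in the pool on their first sign-in to the new environment.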

The Waves team also wanted to keep push notifications in the application. ClearScale implemented Amazon Pinpoint to fulfill this requirement.

Summary

In this post, we covered how ClearScale migrated a mobile application from Google Cloud to AWS and implemented several cloud services to maximize performance, scalability, and reliability.

Central to the engagement was Neptune, a purpose-built graph database engine that is especially useful for applications involving recommendation engines, fraud detection, and knowledge graphs. The Waves dating app uses Neptune in combination with Amazon Cognito, Amazon Pinpoint, and AWS AppSync to streamline the front-end user experience and grow to meet future demand.

 


About the Author

Pavel Vasilyev, Director of Solutions Architecture

Pavel leads the Solutions Architecture team at ClearScale. Pavel has more than 12 years of hands-on experience in designing and deploying scalable, highly available, and fault-tolerant systems. He holds a Master’s degree in Engineering and Technology in Informatics and Computer Science.