How to use ProxySQL with open source platforms to split SQL reads and writes on Amazon Aurora clusters

The blog post How to set up a single pgpool endpoint for reads and writes with Amazon Aurora PostgreSQL introduces an architecture that uses the read and write split capabilities of Amazon Aurora PostgreSQL endpoints. This type of architecture works great for Aurora PostgreSQL clusters, but what if you are using Amazon Aurora MySQL clusters for your database? This post complements the original one by introducing ProxySQL as an intermediate layer for achieving read and write splits with Aurora MySQL endpoints.

Amazon Aurora provides endpoints for the primary DB instance (the cluster endpoint) and for Read Replicas (the reader endpoint). Aurora updates the cluster endpoint automatically so that it always points to the primary instance. Aurora uses the reader endpoint to distribute connections for read operations across all available Read Replicas by using DNS round-robin. Additionally, you can use Amazon Aurora custom endpoints to further segregate the traffic for your application.

Amazon Aurora Replicas typically have less than 100 ms of replication lag. So, if your application can tolerate that lag, it can use both the cluster and reader endpoints to take advantage of a horizontally scaled database, as shown following.

The preceding diagram shows the current architecture, in which the application decides which endpoint to use. However, managing two database endpoints, one for reads and one for writes, adds complexity to the application. Some third-party drivers handle this split for a narrow set of use cases, but the approach in this post applies more broadly to read and write splitting.

In this post, we show how to use ProxySQL to provide a single MySQL-compatible Aurora endpoint that automatically directs write traffic to the cluster endpoint and read traffic to the reader endpoint. The following diagram shows the proposed solution based on ProxySQL middleware.

Architecture

ProxySQL is a GNU General Public License (GPL)–licensed MySQL proxy middleware that sits between a MySQL database and the database clients. In this example, we deploy it using the architecture in the following figure.

This diagram shows how you can use ProxySQL as middleware to provide a single endpoint for a MySQL-compatible Aurora cluster. The Aurora cluster consists of one primary instance and two Aurora Read Replicas that are deployed across two Availability Zones and two private subnets. The cluster has a security group that allows incoming data only from the ProxySQL instances.

ProxySQL is deployed behind a Network Load Balancer in an Auto Scaling group. For demonstration purposes in this post, the group runs a single active instance, which the Auto Scaling group replaces automatically if it fails. You can change this configuration by updating the master.yaml file (under Resources.ProxySQLStack.Properties.Parameters), as shown following:

MinSize: 2
MaxSize: 4
DesiredCapacity: 2

This configuration provisions two ProxySQL instances for regular load and scales up to four instances during peak loads. Running multiple instances also keeps the database reachable if one of the ProxySQL nodes is disrupted.

ProxySQL is also isolated in private subnets with a security group that permits access only from allowed Classless Inter-Domain Routing (CIDR) blocks.

The access subnets host a Network Load Balancer that provides a consistent endpoint for ProxySQL. So, your database endpoint doesn’t change if the ProxySQL instance fails and the Auto Scaling group creates a new one.

Deploying with AWS CloudFormation

The AWS CloudFormation templates that are used in this example are located in this GitHub repository. The example uses several nested AWS CloudFormation templates to deploy the virtual private cloud (VPC) infrastructure, the security groups, the Aurora cluster, and the ProxySQL middleware. Using nested stacks lets you break up a large stack into several reusable components. If you aren’t familiar with AWS CloudFormation, review the AWS CloudFormation documentation.

For complete instructions on deploying the templates, see the README file on GitHub. The following sections contain some of the highlights.

Creating the Amazon Aurora cluster

The following AWS CloudFormation snippets show how to create a three-node Amazon Aurora cluster when the VPC infrastructure and security groups are ready. Amazon Aurora sets one node as the primary and the other two nodes as Read Replicas.

  DBAuroraCluster:
    Type: "AWS::RDS::DBCluster"
    Properties:
      DatabaseName: !Ref DatabaseName
      Engine: aurora-mysql
      MasterUsername: !Ref DatabaseUser
      MasterUserPassword: !Ref DatabasePassword
      VpcSecurityGroupIds: 
        - !Ref DBFirewall
      Tags:
        - Key: Project
          Value: !Ref ProjectTag
  DBAuroraOne:
    Type : "AWS::RDS::DBInstance"
    Properties:
      DBClusterIdentifier: !Ref DBAuroraCluster
      Engine: aurora-mysql
      DBInstanceClass: !Ref DbInstanceSize
      Tags:
        - Key: Project
          Value: !Ref ProjectTag
  DBAuroraTwo:
    Type : "AWS::RDS::DBInstance"    
    Properties:
      DBClusterIdentifier: !Ref DBAuroraCluster
      Engine: aurora-mysql
      DBInstanceClass: !Ref DbInstanceSize
      Tags:
        - Key: Project
          Value: !Ref ProjectTag
  DBAuroraThree:
    Type : "AWS::RDS::DBInstance"    
    Properties:
      DBClusterIdentifier: !Ref DBAuroraCluster
      Engine: aurora-mysql
      DBInstanceClass: !Ref DbInstanceSize
      Tags:
        - Key: Project
          Value: !Ref ProjectTag 

Deploying ProxySQL

In the AWS CloudFormation template that deploys ProxySQL, you set up a Network Load Balancer and an Auto Scaling group. The launch configuration for the Auto Scaling group deploys and configures ProxySQL using the AWS CloudFormation cfn-init tool.

In this solution, we use the Amazon Linux AMI for each Amazon EC2 instance. The Amazon Linux AMI includes packages and configurations that provide seamless integration with AWS. The repositories are available in all AWS Regions, and you can access them using yum. Hosting repositories in each Region allows you to deploy updates quickly and without any data transfer charges. If you want more information about updating your EC2 instance software, see Updating Instance Software in the Amazon Elastic Compute Cloud User Guide.

First, you need to install the MySQL client:

sudo yum install -y mysql

Next, download the ProxySQL RPM package:

wget -P /tmp/ https://github.com/sysown/proxysql/releases/download/v2.0.0-rc2/proxysql-rc2-2.0.0-1-centos7.x86_64.rpm

Then, install the package:

#Install proxysql
sudo yum install -y /tmp/proxysql-rc2-2.0.0-1-centos7.x86_64.rpm 

Finally, start ProxySQL:

sudo service proxysql start

Configuring ProxySQL

You can do most of the ProxySQL configuration using the MySQL client by connecting locally to the admin interface with admin credentials. The default user name and password for ProxySQL are both admin.

We strongly recommend that you change these credentials after the initial configuration.

The following command configures initial read and write split options in ProxySQL:

mysql -u admin -padmin -h 127.0.0.1 -P6032 main < proxysql.sql

Although the options contained in the proxysql.sql file are good for a quick split of read and write traffic, you can configure more sophisticated query routing rules in ProxySQL. The following sections explain the contents of the proxysql.sql file.

First, you ensure that the mysql_servers and mysql_replication_hostgroups tables are clean for your config. The mysql_servers table contains entries for all the backend endpoints that are to be proxied by ProxySQL. The mysql_replication_hostgroups table contains the configuration for automatic identification and segregation of read-only endpoints.

Before version 2, ProxySQL checked the read_only variable to distinguish reader endpoints from writer endpoints, whereas Amazon Aurora distinguishes readers from writers based on the innodb_read_only variable. ProxySQL version 2.0 added support for specifying which variable to check, using the check_type column in the mysql_replication_hostgroups table.
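As a mental model, the effect of check_type can be sketched as follows. This is a hypothetical simplification for illustration, not ProxySQL's actual implementation; the function name is an assumption, and the hostgroup numbers match those used later in this post:

```python
# Hypothetical sketch of how ProxySQL's monitor places a backend into a
# hostgroup based on the variable named by check_type. Function and
# variable names are illustrative, not ProxySQL internals.

WRITER_HOSTGROUP = 10
READER_HOSTGROUP = 20

def assign_hostgroup(server_variables, check_type="innodb_read_only"):
    """Return the hostgroup for a backend.

    check_type='read_only' suits classic MySQL replication; for Amazon
    Aurora, replicas report innodb_read_only=1 and the writer reports 0,
    so check_type='innodb_read_only' is the right choice.
    """
    return READER_HOSTGROUP if server_variables.get(check_type) else WRITER_HOSTGROUP
```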

delete from mysql_servers where hostgroup_id in (10,20);
delete from mysql_replication_hostgroups where writer_hostgroup=10;

Next, you configure your endpoints in the mysql_servers table:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('${ClusterEndpoint}',10,3306,1000,2000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('${ReaderEndpoint}',20,3306,1000,2000);

Then, you configure the mysql_replication_hostgroups table to define the writer and reader host groups using the check_type value of “innodb_read_only”:

INSERT INTO mysql_replication_hostgroups (writer_hostgroup,reader_hostgroup,comment,check_type) VALUES (10,20,'aws-aurora','innodb_read_only');

Here, you write the configuration back to the disk so that it persists if the ProxySQL service is restarted:

LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;

Next, you ensure that the mysql_query_rules table is clean for your config:

delete from mysql_query_rules where rule_id in (50,51);

Then, you define the query routing rules for your needs. You can define rules to support a variety of use cases, including sharding, digest-based routing, and read/write splitting. This example shows a basic read/write split configuration:

INSERT INTO mysql_query_rules (rule_id,active,match_digest,destination_hostgroup,apply) VALUES (50,1,'^SELECT.*FOR UPDATE$',10,1), (51,1,'^SELECT',20,1);

These two rules direct all write traffic, including locking SELECT ... FOR UPDATE statements, to the writer host group, and all other SELECT traffic to the reader host group.
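To make the first-match semantics concrete, here is a hypothetical Python simulation of how these two rules classify statements. It assumes ProxySQL's defaults: rules are evaluated in rule_id order, matching is case-insensitive (re_modifiers defaults to CASELESS), and a query that matches no rule goes to the user's default_hostgroup (10 in this setup):

```python
import re

# Illustrative simulation of the two routing rules above; this is not
# ProxySQL code, just a sketch of its first-match evaluation.
RULES = [
    (50, re.compile(r"^SELECT.*FOR UPDATE$", re.IGNORECASE), 10),  # locking reads -> writer
    (51, re.compile(r"^SELECT", re.IGNORECASE), 20),               # plain reads -> reader
]
DEFAULT_HOSTGROUP = 10  # default_hostgroup from the mysql_users entry

def route(query):
    """Return the destination hostgroup for a statement."""
    for _rule_id, pattern, hostgroup in RULES:
        if pattern.search(query):
            return hostgroup
    return DEFAULT_HOSTGROUP
```

For example, route("SELECT * FROM Persons") lands on the reader host group, while an INSERT matches no rule and falls through to the writer.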

Again, you write the configuration back to the disk so that it persists if the ProxySQL service is restarted:

LOAD MYSQL QUERY RULES TO RUNTIME; SAVE MYSQL QUERY RULES TO DISK;

Next, you ensure that the mysql_users table is clean for your config:

delete from mysql_users where username='${DatabaseUser}';

Next, you define your backend users in the ProxySQL config:

insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('${DatabaseUser}','${DatabasePassword}',1,10,'${DatabaseName}',1);

Again, you write the configuration back to the disk so that it persists if the ProxySQL service is restarted:

LOAD MYSQL USERS TO RUNTIME; SAVE MYSQL USERS TO DISK;

Next, you configure the monitoring user in ProxySQL. ProxySQL uses this user for backend monitoring, and it needs the USAGE and REPLICATION CLIENT privileges:

UPDATE global_variables SET variable_value='${DatabaseUser}' WHERE variable_name='mysql-monitor_username';
UPDATE global_variables SET variable_value='${DatabasePassword}' WHERE variable_name='mysql-monitor_password';

Then, you configure the backend MySQL version and tweak some monitoring intervals:

UPDATE global_variables SET variable_value='${BackendMySQLVersion}' WHERE variable_name='mysql-server_version';
UPDATE global_variables SET variable_value='2000' WHERE variable_name IN ('mysql-monitor_connect_interval','mysql-monitor_ping_interval','mysql-monitor_read_only_interval');

Again, you write the configuration back to the disk so that it persists if the ProxySQL service is restarted:

LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;

You should provide the following variables to the proxysql.sql file for the setup:

- ClusterEndpoint: the Aurora cluster endpoint. This is also the writer host for ProxySQL.
- ReaderEndpoint: the Aurora reader endpoint. This is also the reader host for ProxySQL.
- DatabaseUser: the Aurora database user. This is also used as the monitoring user for your cluster. You can replace it with a custom user; if you create a custom user for monitoring, be sure to update the proxysql.sql config file.
- DatabasePassword: the Aurora database password. This is also used as the password for the monitoring user for your cluster. If you create a custom user for monitoring, be sure to update the corresponding password in the proxysql.sql config file.
- DatabaseName: the Aurora database name.
- BackendMySQLVersion: the Amazon Aurora database MySQL version. This is the version that ProxySQL returns to your consuming application, so ensure that it matches the backend. The default setting for ProxySQL version 2.0 is 5.5.30.
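If you run proxysql.sql by hand instead of through the CloudFormation deployment, you must substitute the ${...} placeholders first. The following is a minimal sketch using Python's string.Template; the endpoint and credential values shown are placeholders for illustration only, not real resources:

```python
from string import Template

# Placeholder values for illustration only; use your own cluster's
# endpoints and credentials.
variables = {
    "ClusterEndpoint": "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com",
    "ReaderEndpoint": "mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com",
    "DatabaseUser": "proxysqluser",
    "DatabasePassword": "example-password",
    "DatabaseName": "proxysqlexample",
    "BackendMySQLVersion": "5.7.12",
}

# Substitute every ${Variable} placeholder in one pass.
line = ("INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) "
        "VALUES ('${ClusterEndpoint}',10,3306,1000,2000);")
rendered = Template(line).substitute(variables)
```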

Also note that the default port and interface that ProxySQL provides for client connections are as follows:

0.0.0.0:6033 # Requires service restart if changed.

And following is the default setting for the administration interface of ProxySQL:

0.0.0.0:6032 # Requires service restart if changed.

Testing the configuration

Now you can issue a few SQL statements and confirm that ProxySQL is directing traffic as expected.

Start by creating a table and inserting a few rows:

CREATE TABLE Persons (
    PersonID int,
    LastName varchar(255),
    FirstName varchar(255),
    Address varchar(255),
    City varchar(255)
);

INSERT INTO Persons (PersonID, LastName, FirstName, Address, City)
VALUES (1, 'Doe', 'John', '123 Main Street', 'North York');
INSERT INTO Persons (PersonID, LastName, FirstName, Address, City)
VALUES (2, 'Doe', 'Jane', '456 Second Street', 'Toronto');
INSERT INTO Persons (PersonID, LastName, FirstName, Address, City)
VALUES (3, 'Smith', 'Robert', '789 Third Street', 'Vaughan');

Checking in ProxySQL stats schema, you can see that these statements are directed to the Amazon Aurora primary host group (host group 10):

mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "select hostgroup, schemaname, username, digest, digest_text, count_star from stats_mysql_query_digest;"

hostgroup | schemaname | username | digest | digest_text | count_star
10 | proxysqlexample | proxysqluser | 0xCF52DCD38A9B9942 | CREATE TABLE Persons ( PersonID int, LastName varchar(?), FirstName varchar(?), Address varchar(?), City varchar(?) ) | 1
10 | proxysqlexample | proxysqluser | 0xE7B5E8C714313F56 | INSERT INTO Persons (PersonID, LastName, FirstName, Address, City) VALUES (?, ?, ?, ?, ?) | 3
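Note that digest_text shows each statement with its literals collapsed to ?, which is why the three INSERT statements share one row with a count_star of 3. A rough sketch of that normalization follows; ProxySQL's actual digest algorithm handles many more cases:

```python
import re

def normalize(query):
    """Rough sketch of digest normalization: collapse string and numeric
    literals to '?'. ProxySQL's real algorithm is more thorough."""
    query = re.sub(r"'[^']*'", "?", query)  # string literals -> ?
    query = re.sub(r"\b\d+\b", "?", query)  # numeric literals -> ?
    return query
```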

Next, issue a query:

select * from Persons;

Checking in ProxySQL stats schema, you can see that this statement is directed to the Amazon Aurora reader endpoint (host group 20):

mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "select hostgroup, schemaname, username, digest, digest_text, count_star from stats_mysql_query_digest;"

hostgroup | schemaname | username | digest | digest_text | count_star
10 | proxysqlexample | proxysqluser | 0xCF52DCD38A9B9942 | CREATE TABLE Persons ( PersonID int, LastName varchar(?), FirstName varchar(?), Address varchar(?), City varchar(?) ) | 1
10 | proxysqlexample | proxysqluser | 0xE7B5E8C714313F56 | INSERT INTO Persons (PersonID, LastName, FirstName, Address, City) VALUES (?, ?, ?, ?, ?) | 3
20 | proxysqlexample | proxysqluser | 0xDEB542EDC426A35F | select * from Persons | 1

As a final test, confirm that ProxySQL handles SELECT statements that are embedded in a transaction:

BEGIN;
    INSERT INTO Persons (PersonID, LastName, FirstName, Address, City)
VALUES (4, 'Garg', 'Raj', '234 Fourth Street', 'North York');

    INSERT INTO Persons (PersonID, LastName, FirstName, Address, City)
VALUES (5, 'Doe', 'John', '373 Fifth Street', 'Brampton');
    SELECT * from Persons where PersonID > 3;
COMMIT;

These statements should all hit the primary node because the embedded SELECT queries rows that are written within the same transaction. You can again see from the stats that things are working as expected.

mysql -u admin -padmin -h 127.0.0.1 -P6032 -e "select hostgroup, schemaname, username, digest, digest_text, count_star from stats_mysql_query_digest;"

hostgroup | schemaname | username | digest | digest_text | count_star
10 | proxysqlexample | proxysqluser | 0xCF52DCD38A9B9942 | CREATE TABLE Persons ( PersonID int, LastName varchar(?), FirstName varchar(?), Address varchar(?), City varchar(?) ) | 1
10 | proxysqlexample | proxysqluser | 0xE7B5E8C714313F56 | INSERT INTO Persons (PersonID, LastName, FirstName, Address, City) VALUES (?, ?, ?, ?, ?) | 5
20 | proxysqlexample | proxysqluser | 0xDEB542EDC426A35F | select * from Persons | 1
10 | proxysqlexample | proxysqluser | 0xFAD1519E4760CBDE | BEGIN | 1
10 | proxysqlexample | proxysqluser | 0x379FAD0823D3FF7B | SELECT * from Persons where PersonID > ? | 1
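The SELECT inside the transaction reached host group 10 even though it matches rule 51, because the mysql_users entry set transaction_persistent to 1: once a transaction is open, every statement sticks to the connection's current host group. A hypothetical sketch of that session behavior (illustrative only, not ProxySQL internals):

```python
import re

WRITER_HOSTGROUP = 10
READER_HOSTGROUP = 20

class SessionRouter:
    """Hypothetical sketch of routing with transaction_persistent=1:
    while a transaction is open, all statements, including SELECTs,
    stay on the writer instead of following the SELECT rule."""

    def __init__(self):
        self.in_transaction = False

    def route(self, query):
        q = query.strip()
        if re.match(r"(BEGIN|START TRANSACTION)\b", q, re.IGNORECASE):
            self.in_transaction = True
            return WRITER_HOSTGROUP
        if re.match(r"(COMMIT|ROLLBACK)\b", q, re.IGNORECASE):
            self.in_transaction = False
            return WRITER_HOSTGROUP
        if self.in_transaction:
            return WRITER_HOSTGROUP  # transaction_persistent pins the session
        return READER_HOSTGROUP if re.match(r"SELECT\b", q, re.IGNORECASE) else WRITER_HOSTGROUP
```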

We recommend that you do a full regression test with your own database clients before using this configuration in production. If the scalability of the ProxySQL middleware concerns you, consider using a clustered ProxySQL instance, where the Auto Scaling group allows multiple instances and scales based on load.

Summary: Striving for simplicity

In this post, you learned how to use ProxySQL to provide a single endpoint for an Amazon Aurora cluster that automatically directs read traffic to the reader endpoint. You can use this technique to simplify how your application handles connections to Amazon Aurora.

If you need more advanced routing options, such as the ability to route database connections based on custom logic, you can explore the query routing capabilities of ProxySQL in further detail. With custom logic, you can handle more advanced routing, such as sending a small percentage of queries to a newer schema for canary testing or weight-based routing.

In the future, keep an eye on Amazon Aurora Serverless, which scales the database backend automatically without requiring the configuration of replicas.


About the Authors

Tulsi Garg is a Senior Solutions Architect for the AWS Worldwide Public Sector Canada team. He works with Canadian public sector customers (government, nonprofit organizations, startups, and educational institutes) to help accelerate their journey to the cloud. He is passionate about IoT and has been known to use his daughters as an excuse to buy tech toys.

 

 

 

Diego Magalhães is an AWS Senior Solutions Architect – Worldwide Public Sector, working with Education customers throughout Canada. Always looking for new challenges, his latest one was trading the sunny weather in Brazil for the cold Toronto winters. ☃️