Do you have transactional applications that have high uptime requirements? Do you want a relational database in the cloud to help you meet those requirements? The newly launched Amazon Aurora Multi-Master, is designed for applications that need a relational database resilient to node failures and highly available for both reads and writes.
Amazon Aurora is a relational database service that combines the speed and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases. The MySQL-compatible edition of Aurora delivers up to five times the throughput of standard MySQL running on the same hardware. It enables existing MySQL applications and tools to run without requiring modification. Amazon Aurora supports one writer and up to 15 read replicas in one or more Availability Zones and Regions.
Amazon Aurora Multi-Master is available for the MySQL-compatible edition of Aurora. Every node in the cluster is a writer node, giving you additional power to run transactional applications that have high uptime requirements.
In this post, I review what you should know to make the best possible use of the Aurora Multi-Master for MySQL-compatible edition database.
Aurora Multi-Master is designed to achieve high availability and ACID transactions across a cluster of database nodes with configurable read after write consistency. At its core, an Aurora cluster consists of a set of compute (database) nodes and a shared storage volume. The storage volume consists of six storage nodes placed in three Availability Zones for high availability and durability of user data. Every database node in the cluster is a writer node that can run read and write statements.
The following diagram shows a logical view of a two-node Aurora Multi-Master cluster.
You can see in the diagram that there is no single point of failure in the cluster. Applications can use any writer node for their read/write and DDL needs. A database change made by a writer node is written to six storage nodes in three Availability Zones, providing data durability and resiliency against storage node and Availability Zone failures. The writer nodes are all functionally equal, and a failure of one writer node does not affect the availability of the other writer nodes in the cluster.
Replication and quorum
When the application layer starts a transaction and makes a database change, the writer node that processes the change achieves cross-cluster consistency by proposing the change to all six storage nodes. Each storage node then checks if the proposed change conflicts with a change in flight or a previously committed change and either confirms the change or rejects it.
If the writer node that proposed the change receives a positive confirmation from a quorum of storage nodes, it does two things. First, it commits the change in the storage layer, causing each storage node to commit the change. It then replicates the change records to every other writer node in the cluster using a low latency, peer-to-peer replication protocol. The peer writer nodes, upon receiving the change, apply the change to their in-memory cache (buffer pool).
If the writer node that proposed the change does not receive a positive confirmation from a quorum of storage nodes, it cancels the entire transaction and raises an error to the application layer. The application can then retry the transaction.
From an implementation standpoint, writer nodes propose changes to pages. A page is a block of memory that contains a set of rows and is 16 KB in size. Pages have an ID and a log sequence number (LSN). Writer nodes update pages in the storage nodes by sending redo change records to the storage layer.
Upon successfully committing changes to the storage layer, writer nodes replicate the redo change records to peer writer nodes for buffer pool refresh in the peer node. For more information about buffer pool, pages, and a general understanding of the InnoDB storage engine, see the InnoDB storage engine documentation.
Aurora Multi-Master improves upon the high availability of the single-master version of Amazon Aurora because all of the nodes in the cluster are read/write nodes.
With single-master Aurora, a failure of the single writer node requires the promotion of a read replica to be the new writer. In the case of Aurora Multi-Master, the failure of a writer node merely requires the application using the writer to open connections to another writer.
You can engineer for high uptime by appropriately allocating applications to writers (see the section on conflict avoidance) and by re-distributing connections to healthy writers upon detection of a writer failure. When designing for high availability, make sure that you are not overloading writers. Implement health checks of writer nodes in the application and move connections from failed writers to healthy writers.
One way to accomplish this in a language like Java is to implement a singleton class that manages database connections. This class can encapsulate health checks. The class can also maintain a writer assignment map that maps writer nodes to applications or application components. This allows concurrent writers to process non-overlapping updates, preventing conflicts (see the next section for scenarios that can cause conflicts). Just request connections from the Singleton class without worrying about choosing a writer. Ensure that the class is thread-safe with appropriate guardrails for preventing connection issues in a multi-threaded environment.
The following diagrams demonstrate how connection re-distribution by the application tier upon writer node failure and subsequent recovery can provide for continuous database availability. Figure 1 shows the application with established database connections to Writer 1 and Writer 2. Figure 2 shows a failure of Writer 1, taking it out of the cluster. The application re-establishes (moves) connections that were on the failed Writer 1 to Writer 2. Figure 3 shows Writer 1 repaired and back online. The application moves connections that were moved to Writer 2 back to Writer 1.
Conflicts arise when concurrent transactions or writes executing on different writer nodes attempt to modify the same set of pages. Conflict resolution in a distributed system requires a foolproof mechanism to establish causality and ordering of events that happen across the system.
Many systems centralize writes to detect and prevent conflicts at the expense of performance. In such a system, every database node that accepts writes must forward the write to a central master and wait for an acknowledgment. While parallelism for writes is achieved, the solution is rife with problems as the central write master can become a performance bottleneck.
Conflict detection in Aurora Multi-Master is distributed and is not plagued by the performance problems seen in systems where a centralized agent is responsible for proxying writes to the database. Writers propose changes to pages to the storage nodes. Each storage node compares the LSN (think of this as a page version) of the page submitted by the writer node with the LSN of the page on the node. It approves the change if they are the same and rejects the change with a conflict if the storage node contains a more recent version of the page.
Multiple writers can concurrently propose a change to a page. The first writer to receive a quorum of approvals wins and can continue its transaction. Other losing writers abort the change and roll back their transactions. Writer transactions modify rows in pages and storage nodes compare entire pages to detect conflicts. If the rows are in the same page, two concurrent writers that are modifying two different rows in a table can still conflict. We recommend following a few best practices when designing applications for conflict avoidance.
- Avoid performing overlapping page updates from concurrent writers. If you have a sharded database, it would be a good idea to assign a writer to a shard and update shards through the assigned writer. The mapping is only logical in the application layer. Physically, the data in the storage volume is visible to all writer nodes. Transactions that span shards can still be executed from a single writer.
- When a conflict is raised by the database node to the application layer, retry the transaction. Techniques such as exponential backoff give the buffer pool time to catch up with replication and reflect the most recent change to the page touched by the transaction, increasing the chances of success.
- Based on your own application’s design and needs, route queries to writers in a way that achieves an acceptable write conflict ratio, equal writer utilization, and the best possible availability.
Here are a few examples of scenarios that create conflicts. Aurora Multi-Master returns the MySQL error 1213 (deadlock) when the storage layer detects a conflict and rejects a change. The following scenarios are simulated using the MySQL CLI and a two-node Aurora Multi-Master cluster.
Scenario 1: Adjacent rows
Two writers concurrently update adjacent rows in table t1. These rows are stored in the same page by InnoDB. The two concurrent transactions create a physical conflict in the storage layer, causing one of the sessions to abort with a deadlock error.
Scenario 2: Same row concurrently
Two writers concurrently update the same row in table t1. If the two sessions are executed from the same writer, it results in a logical conflict in one of the sessions, a lock wait condition, and eventually a lock wait timeout.
In Aurora Multi-Master, when the two sessions are concurrently executed from two different writers, it causes one of the sessions to abort with a deadlock error. In the scenario below, Writer 1 begins a transaction and updates a row in table t1 and before it commits the transaction, Writer 2 attempts to update the same row in table t1 and fails with a deadlock error.
Scenario 3: Same row with lag
Two writers update the same row, a few seconds apart from each other. In this scenario, Session 1 issues an update statement which changes the row with id = 1 in table t1 and implicitly auto-commits the change. A few seconds later, Writer 2 attempts to update the same row in table t1. Both transactions succeed without any conflicts because there is enough time for replication in the cluster to catch up between the writes.
Global read-after-write consistency
Read consistency is the desire for reads to reflect the state of the most recent, committed changes regardless of which node is used for writing and for reading. Most implementations provide read-after-write consistency if the read happens on the same node that wrote to the database. Reads that happen on other nodes immediately after a write may not see the write change for a few milliseconds, depending on replication lag.
Aurora Multi-Master gives you the flexibility to choose between a default node-level consistency mode or a cluster-wide consistency mode called global read-after-write (GRAW). With GRAW, applications can perform cluster-wide consistent reads on top of the most recent data. There is a slight penalty to pay in terms of read performance, but it’s an acceptable tradeoff for applications that need strongly consistent reads.
In the default mode, read-after-write consistency is guaranteed if the read is performed on the same writer node. The other nodes reflect the state of the write within a few milliseconds (replication lag).
Aurora Multi-Master for the MySQL-compatible edition of Aurora can be a great solution for MySQL workloads that require multiple writers with robust conflict management. It also supports high uptime, application scenarios such as Active-Active deployments, and SaaS provider use cases, such as when multi-tenant databases with one database per tenant must be run.
At launch, Aurora Multi-Master supports two node clusters in a single Region. Support for more writer nodes and placement of writers in multiple Regions is planned for future releases.
About the Author
Mukund Sundararajan is a Solutions Architect at Amazon Web Services.